Difference between revisions of "ISFDB:Data Consistency/Serial Mismatches"

From ISFDB
Jump to navigation Jump to search
(Moved the Serial Mismatches section to this page)
 
(Moved the discussion to the Talk page)
Line 1: Line 1:
 +
See the [http://www.isfdb.org/wiki/index.php?title=Talk:Data_Consistency/Serial_Mismatches&action=edit Talk  page] for associated discussions and code samples.
 +
 
{| border="1" cellpadding="2" style="border-collapse:collapse; text-align:left"
 
{| border="1" cellpadding="2" style="border-collapse:collapse; text-align:left"
 
! Title
 
! Title
Line 295: Line 297:
 
|
 
|
 
|}
 
|}
 
 
:: Some of these don't look like problems. "New Destinies" IS (well, WAS) a magazine, but to allow for the multiple printings, we've had to class it as "Anthology". No consistency problem there really. [[User:BLongley|BLongley]] 15:50, 18 Sep 2007 (CDT)
 
:: Using "Serial" to group several pieces within the same book looks wrong though - abuse "Series" for that, I'd say, unless they really are reprints of magazine entries. [[User:BLongley|BLongley]] 15:50, 18 Sep 2007 (CDT)
 
:: "(Excerpt)" looks wrong too, unless they're numbered excerpts leading to a whole work. [[User:BLongley|BLongley]] 15:50, 18 Sep 2007 (CDT)
 
:: "(Complete Novel)" is a known problem: I know that's what it says in many magazines, but reprints of the exact same work don't cut the mustard by today's standards. Create "Novella/Novellette" Variants of the magazine entries before inclusion in works where they're considered Short-Fiction maybe? Or make the "(Complete Novel)" titles variants of what we'd call them now? [[User:BLongley|BLongley]] 15:50, 18 Sep 2007 (CDT)
 
:: I'm not sure where this is going, but I'd sort this lot by title. "Journal from Ellipsia" still confuses me with double-entries here. [[User:BLongley|BLongley]] 15:50, 18 Sep 2007 (CDT)
 
 
:::Posting the Python code here as per Bill's request:
 
 
<pre>
 
# initialize the variables that will keep track of different type mismatches
 
bad_title = 0
 
bad_pub = 0
 
bad_xref = 0
 
mismatches = {}
 
exceptions = {}
 
# import the MySQLdb module that will allow Python to access MySQL
 
import MySQLdb
 
# establish a connection to MySQL and call it 'db'
 
db = MySQLdb.connect('localhost', username, password, 'ISFDB')
 
# create a cursor for the pub-content table
 
xref_cursor = db.cursor()
 
# create a cursor for the title table
 
title_cursor = db.cursor()
 
# create a cursor for the pubs table
 
pubs_cursor = db.cursor()
 
# initialize the list that will hold xref information
 
xref = []
 
# initialize the dictionary that will hold title information
 
titles = {}
 
# initialize the dictionary that will hold publication information
 
pubs = {}
 
# retrieve all rows in the pub_content (cross-reference) table
 
response = xref_cursor.execute("select * from pub_content;")
 
i = 0
 
while 1:
 
    #retrieve one row of pub_content data
 
    row = xref_cursor.fetchone()
 
    #break out of the loop when we finish the last retrieved row
 
    if row == None:
 
      break
 
    i = i +1
 
    if i % 25000 == 0:
 
        print "Imported ",i," cross-reference records"
 
    #xref_id = int(row[0])
 
    if type(row[1]) is not long:
 
        bad_xref = bad_xref + 1
 
        #print type(title_id)
 
        continue
 
    title_id = int(row[1])
 
    if type(row[2]) is not long:
 
        bad_xref = bad_xref + 1
 
        #print pub_id
 
        continue
 
    pub_id = int(row[2])
 
    xref.append([title_id,pub_id])
 
print "Retrieved ",len(xref)," cross-reference records"
 
 
# now retrieve all rows in the titles table
 
response = title_cursor.execute("select title_id,title_ttype,title_title from titles;")
 
i = 0
 
while 1:
 
    #retrieve one row of titles data
 
    row = title_cursor.fetchone()
 
    #break out of the loop when we finish the last retrieved row
 
    if row == None:
 
      break
 
    i = i +1
 
    if i % 25000 == 0:
 
        print "Imported ",i," title records"
 
    title_id = int(row[0])
 
    titles[title_id] = row[1]
 
       
 
# now retrieve all rows in the pubs table
 
response = pubs_cursor.execute("select pub_id,pub_ctype,pub_title from pubs;")
 
i = 0
 
while 1:
 
    #retrieve one row of titles data
 
    row = pubs_cursor.fetchone()
 
    #break out of the loop when we finish the last retrieved row
 
    if row == None:
 
      break
 
    i = i +1
 
    if i % 25000 == 0:
 
        print "Imported ",i," publications records"
 
    pub_id = int(row[0])
 
    pubs[pub_id] = row[1]
 
 
#iterate over the xref list and compare title/publication type
 
i = 0
 
for index in range(len(xref)):
 
    i = i + 1
 
    if i % 25000 == 0:
 
        print "Analyzed ",i," cross-reference records"
 
    xref_record = xref[index]
 
    title_id = xref_record[0]
 
    pub_id = xref_record[1]
 
    title_type = titles.get(title_id , "bad key")
 
    pub_type = pubs.get(pub_id , "bad key")
 
    if (title_type == "bad key"):
 
        bad_xref = bad_xref + 1
 
        continue
 
    if (pub_type == "bad key"):
 
        bad_xref = bad_xref + 1
 
        continue
 
    if (title_type == "CHAPTERBOOK"):
 
        mismatches["chapterbooks"] = mismatches.get("chapterbooks" , 0) + 1
 
        continue
 
    if pub_type == title_type:
 
        continue
 
    if (pub_type == "None") or (pub_type == ""):
 
        mismatches["missing publication type"] = mismatches.get("missing publication type" , 0) + 1
 
        continue
 
    if (title_type == "ESSAY") or (title_type == "REVIEW") or (title_type == "COVERART") or (title_type == "INTERIORART") or (title_type == "BACKCOVERART") or (title_type == "INTERVIEW"):
 
        continue
 
    if title_type == "OMNIBUS":
 
        if pub_type != "OMNIBUS":
 
            mismatches["omnibuses"] = mismatches.get("omnibuses" , 0) + 1
 
        continue
 
    if (title_type == "ANTHOLOGY") and (pub_type != "ANTHOLOGY"):
 
        mismatches["anthologies"] = mismatches.get("anthologies" , 0) + 1
 
        continue
 
    if (title_type == "EDITOR"):
 
        if (pub_type != "MAGAZINE") and (pub_type != "FANZINE") :
 
            mismatches["editor records"] = mismatches.get("editor records" , 0) + 1
 
        continue
 
    if (title_type == "SERIAL"):
 
        if (pub_type != "MAGAZINE") and (pub_type != "FANZINE") :
 
            mismatches["serials"] = mismatches.get("serials" , 0) + 1
 
            exceptions["serials", (title_id , pub_id)] = ""
 
        continue
 
    if (title_type == "COLLECTION") and (pub_type != "COLLECTION"):
 
        mismatches["collections"] = mismatches.get("collections" , 0) + 1
 
        continue
 
    if (title_type == "NOVEL"):
 
        if pub_type == "NOVEL":
 
            continue
 
        if pub_type == "OMNIBUS":
 
            continue
 
        if (pub_type == "MAGAZINE") or (pub_type == "FANZINE"):
 
            mismatches["novel/magazines"] = mismatches.get("novel/magazines" , 0) + 1
 
            continue
 
        mismatches["novels (other)"] = mismatches.get("novels (other)" , 0) + 1
 
        continue
 
    if (title_type == "SHORTFICTION"):
 
        if (pub_type == "NONFICTION") or (pub_type == "NOVEL"):
 
            mismatches["short fiction"] = mismatches.get("short fiction" , 0) + 1
 
        continue
 
    if (title_type == "POEM"):
 
        if (pub_type == "NONFICTION"):
 
            mismatches["poems"] = mismatches.get("poems" , 0) + 1
 
        continue
 
    if (title_type == "NONFICTION"):
 
        if (pub_type != "NONFICTION"):
 
            mismatches["nonfiction"] = mismatches.get("nonfiction" , 0) + 1
 
        continue
 
    if (title_type == "NONGENRE"):
 
        if (pub_type != "NOVEL"):
 
            mismatches["nongenre"] = mismatches.get("nongenre" , 0) + 1
 
        continue
 
    mismatches["uncategorized"] = mismatches.get("uncategorized" , 0) + 1
 
    #print title_data[0][1],title_type," vs. ",pub_data[0][1],pub_type
 
 
print "Bad title records: ",bad_title
 
print "Bad publication records: ",bad_pub
 
print "Bad cross-references: ",bad_xref
 
for key in exceptions:
 
    title_id = key[1][0]
 
    pub_id = key[1][1]
 
    response = title_cursor.execute("select title_ttype,title_title from titles where title_id = "+str(title_id)+";")
 
    row1 = title_cursor.fetchall()
 
    response = pubs_cursor.execute("select pub_ctype,pub_title,pub_tag from pubs where pub_id = "+str(pub_id)+";")
 
    row2 = pubs_cursor.fetchall()
 
    print "|-"
 
    print "|[http://www.isfdb.org/cgi-bin/title.cgi?"+str(title_id)+" "+str(row1[0][1])+"]"
 
    print "|"+str(row1[0][0])
 
    print "|[http://www.isfdb.org/cgi-bin/pl.cgi?"+str(row2[0][2])+" "+str(row2[0][1])+"]"
 
    print "|"+str(row2[0][0])
 
 
for key in mismatches:
 
    print "Mismatched",key,":",mismatches.get(key , 0)
 
 
db.close()
 
</pre>
 
:::[[User:Ahasuerus|Ahasuerus]] 23:45, 18 Sep 2007 (CDT)
 
 
:Some of the title appear to be correct. The titles have appeared in serial form in collections and anthologys first, but may not have been published as complete novels. So there listing as a serial maybe correct, they just have not been published in the usual magazine form.[[User:Kraang|Kraang]] 06:12, 20 Sep 2007 (CDT)
 
 
::The point about sorting by title is well taken, I will add it next when I have a moment. I will also add a Status column, where we could enter "FIXED" as we update the data. [[User:Ahasuerus|Ahasuerus]] 23:44, 19 Sep 2007 (CDT)
 
 
:::The table has been resorted by title and a "Fixed?" column has been added. However, my attempts to teach myself Python turned out to be more painful than expected (something to do with old dogs and new tricks), so I ended up converting the MySQL database to a more familiar format and parsing it using vintage 1960s technologies. I'll attack other type mismatches shortly. [[User:Ahasuerus|Ahasuerus]] 23:27, 29 Sep 2007 (CDT)
 

Revision as of 22:56, 30 September 2007

See the Talk page for associated discussions and code samples.

Title Title Type Publication Publication Type Fixed?
Balance and Power (Part 1) SERIAL Bardic Voices One ANTHOLOGY
Briar Patch (Part 1 of 2) SERIAL New Destinies VII, Spring 1989 ANTHOLOGY
Briar Patch (Part 2 of 2) SERIAL New Destinies, Volume VIII / Fall 1989 ANTHOLOGY
From Snow Crash: Chapter 1 (Excerpt) SERIAL Postmodern American Fiction ANTHOLOGY
From The Crying of Lot 49 (Excerpt) SERIAL Postmodern American Fiction ANTHOLOGY
From White Noise: Chapter 17 (Excerpt) SERIAL Postmodern American Fiction ANTHOLOGY
From White Noise: Chapter 7 (Excerpt) SERIAL Postmodern American Fiction ANTHOLOGY
From White Noise: Chapter 9 (Excerpt) SERIAL Postmodern American Fiction ANTHOLOGY
Iron (Part 1 of 2) SERIAL Far Frontiers, Volume VII/ Winter 1986 ANTHOLOGY
Iron (Part 2 of 2) SERIAL Inconstant Star COLLECTION
Iron (Part 2 of 2) SERIAL New Destinies, Volume I/Spring 1987 ANTHOLOGY
Iron (Part 2 of 2) SERIAL The Man-Kzin Wars ANTHOLOGY
Journal from Ellipsia (Excerpt) SERIAL SF 12 ANTHOLOGY FIXED
Journal from Ellipsia (Excerpt) SERIAL SF 12 ANTHOLOGY FIXED
Sleipnir (Excerpt) SERIAL Science Fiction Writer's Market Place and Sourcebook NOVEL FIXED
Stand on Zanzibar (Excerpt) SERIAL The Road to Science Fiction 3: From Heinlein to Here ANTHOLOGY FIXED
Stand on Zanzibar (Excerpt) SERIAL The Road to Science Fiction 3: From Heinlein to Here ANTHOLOGY FIXED
The Bridge of Light SERIAL The Bridge of Light NOVEL FIXED
The Dragon Masters (Complete Novel) SERIAL Stories from the Hugo Winners, Volume 2 ANTHOLOGY
The Dragon Masters (Complete Novel) SERIAL The Hugo Winners, Volume 2 ANTHOLOGY
The Dragon Masters (Complete Novel) SERIAL The Hugo Winners Volume One 1962-1967 ANTHOLOGY
The Dragon Masters (Complete Novel) SERIAL The Hugo Winners, Volumes One and Two ANTHOLOGY
The Dragon Masters (Complete Novel) SERIAL The Jack Vance Treasury COLLECTION
The Dragon Masters (Complete Novel) SERIAL The Hugo Winners Volume One 1962-1967 ANTHOLOGY
The Eye of the Heron (Complete Novel) SERIAL Millennial Women ANTHOLOGY FIXED
The Left Hand of Darkness (Excerpt) SERIAL The Road to Science Fiction 3: From Heinlein to Here ANTHOLOGY FIXED
The Left Hand of Darkness (Excerpt) SERIAL The Road to Science Fiction 3: From Heinlein to Here ANTHOLOGY FIXED
The Mad Lands - Part 1: Death Wish SERIAL Adventure ANTHOLOGY
The Man on the Meteor - Part 3 (chapters 5-7) SERIAL Conflict Center: Naator NOVEL
The Pillars of the Earth Part 1 of 3 SERIAL The Pillars of the Earth Part 1 of 3 NOVEL
The Pillars of the Earth Part 2 of 3 SERIAL The Pillars of the Earth Part 2 of 3 NOVEL
The Pillars of the Earth Part 3 of 3 SERIAL The Pillars of the Earth Part 3 of 3 NOVEL
The Sleeper and the Seer (Excerpt from King of the World's Edge) SERIAL The Merlin Chronicles ANTHOLOGY
The Sleeper and the Seer (Excerpt from King of the World's Edge) SERIAL The Merlin Chronicles ANTHOLOGY
Untitled: Visible Light (part 1 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 1 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 2 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 2 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 3 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 3 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 4 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 4 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 5 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 5 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 6 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 6 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 7 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 7 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION