ISFDB:Data Consistency/Serial Mismatches

From ISFDB
< ISFDB:Data Consistency
Revision as of 22:53, 30 September 2007 by Ahasuerus (talk | contribs) (Moved the Serial Mismatches section to this page)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search
Title Title Type Publication Publication Type Fixed?
Balance and Power (Part 1) SERIAL Bardic Voices One ANTHOLOGY
Briar Patch (Part 1 of 2) SERIAL New Destinies VII, Spring 1989 ANTHOLOGY
Briar Patch (Part 2 of 2) SERIAL New Destinies, Volume VIII / Fall 1989 ANTHOLOGY
From Snow Crash: Chapter 1 (Excerpt) SERIAL Postmodern American Fiction ANTHOLOGY
From The Crying of Lot 49 (Excerpt) SERIAL Postmodern American Fiction ANTHOLOGY
From White Noise: Chapter 17 (Excerpt) SERIAL Postmodern American Fiction ANTHOLOGY
From White Noise: Chapter 7 (Excerpt) SERIAL Postmodern American Fiction ANTHOLOGY
From White Noise: Chapter 9 (Excerpt) SERIAL Postmodern American Fiction ANTHOLOGY
Iron (Part 1 of 2) SERIAL Far Frontiers, Volume VII/ Winter 1986 ANTHOLOGY
Iron (Part 2 of 2) SERIAL Inconstant Star COLLECTION
Iron (Part 2 of 2) SERIAL New Destinies, Volume I/Spring 1987 ANTHOLOGY
Iron (Part 2 of 2) SERIAL The Man-Kzin Wars ANTHOLOGY
Journal from Ellipsia (Excerpt) SERIAL SF 12 ANTHOLOGY FIXED
Journal from Ellipsia (Excerpt) SERIAL SF 12 ANTHOLOGY FIXED
Sleipnir (Excerpt) SERIAL Science Fiction Writer's Market Place and Sourcebook NOVEL FIXED
Stand on Zanzibar (Excerpt) SERIAL The Road to Science Fiction 3: From Heinlein to Here ANTHOLOGY FIXED
Stand on Zanzibar (Excerpt) SERIAL The Road to Science Fiction 3: From Heinlein to Here ANTHOLOGY FIXED
The Bridge of Light SERIAL The Bridge of Light NOVEL FIXED
The Dragon Masters (Complete Novel) SERIAL Stories from the Hugo Winners, Volume 2 ANTHOLOGY
The Dragon Masters (Complete Novel) SERIAL The Hugo Winners, Volume 2 ANTHOLOGY
The Dragon Masters (Complete Novel) SERIAL The Hugo Winners Volume One 1962-1967 ANTHOLOGY
The Dragon Masters (Complete Novel) SERIAL The Hugo Winners, Volumes One and Two ANTHOLOGY
The Dragon Masters (Complete Novel) SERIAL The Jack Vance Treasury COLLECTION
The Dragon Masters (Complete Novel) SERIAL The Hugo Winners Volume One 1962-1967 ANTHOLOGY
The Eye of the Heron (Complete Novel) SERIAL Millennial Women ANTHOLOGY FIXED
The Left Hand of Darkness (Excerpt) SERIAL The Road to Science Fiction 3: From Heinlein to Here ANTHOLOGY FIXED
The Left Hand of Darkness (Excerpt) SERIAL The Road to Science Fiction 3: From Heinlein to Here ANTHOLOGY FIXED
The Mad Lands - Part 1: Death Wish SERIAL Adventure ANTHOLOGY
The Man on the Meteor - Part 3 (chapters 5-7) SERIAL Conflict Center: Naator NOVEL
The Pillars of the Earth Part 1 of 3 SERIAL The Pillars of the Earth Part 1 of 3 NOVEL
The Pillars of the Earth Part 2 of 3 SERIAL The Pillars of the Earth Part 2 of 3 NOVEL
The Pillars of the Earth Part 3 of 3 SERIAL The Pillars of the Earth Part 3 of 3 NOVEL
The Sleeper and the Seer (Excerpt from King of the World's Edge) SERIAL The Merlin Chronicles ANTHOLOGY
The Sleeper and the Seer (Excerpt from King of the World's Edge) SERIAL The Merlin Chronicles ANTHOLOGY
Untitled: Visible Light (part 1 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 1 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 2 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 2 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 3 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 3 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 4 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 4 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 5 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 5 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 6 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 6 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 7 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION
Untitled: Visible Light (part 7 of 7) SERIAL The Collected Short Fiction of C. J. Cherryh COLLECTION


Some of these don't look like problems. "New Destinies" IS (well, WAS) a magazine, but to allow for the multiple printings, we've had to class it as "Anthology". No consistency problem there really. BLongley 15:50, 18 Sep 2007 (CDT)
Using "Serial" to group several pieces within the same book looks wrong though - abuse "Series" for that, I'd say, unless they really are reprints of magazine entries. BLongley 15:50, 18 Sep 2007 (CDT)
"(Excerpt)" looks wrong too, unless they're numbered excerpts leading to a whole work. BLongley 15:50, 18 Sep 2007 (CDT)
"(Complete Novel)" is a known problem: I know that's what it says in many magazines, but reprints of the exact same work don't cut the mustard by today's standards. Create "Novella/Novellette" Variants of the magazine entries before inclusion in works where they're considered Short-Fiction maybe? Or make the "(Complete Novel)" titles variants of what we'd call them now? BLongley 15:50, 18 Sep 2007 (CDT)
I'm not sure where this is going, but I'd sort this lot by title. "Journal from Ellipsia" still confuses me with double-entries here. BLongley 15:50, 18 Sep 2007 (CDT)
Posting the Python code here as per Bill's request:
# initialize the variables that will keep track of different type mismatches
bad_title = 0
bad_pub = 0
bad_xref = 0
mismatches = {}
exceptions = {}
# import the MySQLdb module that will allow Python to access MySQL
import MySQLdb
# establish a connection to MySQL and call it 'db'
db = MySQLdb.connect('localhost', username, password, 'ISFDB')
# create a cursor for the pub-content table
xref_cursor = db.cursor()
# create a cursor for the title table
title_cursor = db.cursor()
# create a cursor for the pubs table
pubs_cursor = db.cursor()
# initialize the list that will hold xref information
xref = []
# initialize the dictionary that will hold title information
titles = {}
# initialize the dictionary that will hold publication information
pubs = {}
# retrieve all rows in the pub_content (cross-reference) table
response = xref_cursor.execute("select * from pub_content;")
i = 0
while 1:
    #retrieve one row of pub_content data
    row = xref_cursor.fetchone()
    #break out of the loop when we finish the last retrieved row
    if row == None:
       break
    i = i +1
    if i % 25000 == 0:
        print "Imported ",i," cross-reference records"
    #xref_id = int(row[0])
    if type(row[1]) is not long:
        bad_xref = bad_xref + 1
        #print type(title_id)
        continue
    title_id = int(row[1])
    if type(row[2]) is not long:
        bad_xref = bad_xref + 1
        #print pub_id
        continue
    pub_id = int(row[2])
    xref.append([title_id,pub_id])
print "Retrieved ",len(xref)," cross-reference records"

# now retrieve all rows in the titles table
response = title_cursor.execute("select title_id,title_ttype,title_title from titles;")
i = 0
while 1:
    #retrieve one row of titles data
    row = title_cursor.fetchone()
    #break out of the loop when we finish the last retrieved row
    if row == None:
       break
    i = i +1
    if i % 25000 == 0:
        print "Imported ",i," title records"
    title_id = int(row[0])
    titles[title_id] = row[1]
        
# now retrieve all rows in the pubs table
response = pubs_cursor.execute("select pub_id,pub_ctype,pub_title from pubs;")
i = 0
while 1:
    #retrieve one row of titles data
    row = pubs_cursor.fetchone()
    #break out of the loop when we finish the last retrieved row
    if row == None:
       break
    i = i +1
    if i % 25000 == 0:
        print "Imported ",i," publications records"
    pub_id = int(row[0])
    pubs[pub_id] = row[1]

#iterate over the xref list and compare title/publication type
i = 0
for index in range(len(xref)):
    i = i + 1
    if i % 25000 == 0:
        print "Analyzed ",i," cross-reference records"
    xref_record = xref[index]
    title_id = xref_record[0]
    pub_id = xref_record[1]
    title_type = titles.get(title_id , "bad key")
    pub_type = pubs.get(pub_id , "bad key")
    if (title_type == "bad key"):
        bad_xref = bad_xref + 1
        continue
    if (pub_type == "bad key"):
        bad_xref = bad_xref + 1
        continue
    if (title_type == "CHAPTERBOOK"):
        mismatches["chapterbooks"] = mismatches.get("chapterbooks" , 0) + 1
        continue
    if pub_type == title_type:
        continue
    if (pub_type == "None") or (pub_type == ""):
        mismatches["missing publication type"] = mismatches.get("missing publication type" , 0) + 1
        continue
    if (title_type == "ESSAY") or (title_type == "REVIEW") or (title_type == "COVERART") or (title_type == "INTERIORART") or (title_type == "BACKCOVERART") or (title_type == "INTERVIEW"):
        continue
    if title_type == "OMNIBUS":
        if pub_type != "OMNIBUS":
            mismatches["omnibuses"] = mismatches.get("omnibuses" , 0) + 1
        continue
    if (title_type == "ANTHOLOGY") and (pub_type != "ANTHOLOGY"):
        mismatches["anthologies"] = mismatches.get("anthologies" , 0) + 1
        continue
    if (title_type == "EDITOR"):
        if (pub_type != "MAGAZINE") and (pub_type != "FANZINE") :
            mismatches["editor records"] = mismatches.get("editor records" , 0) + 1
        continue
    if (title_type == "SERIAL"):
        if (pub_type != "MAGAZINE") and (pub_type != "FANZINE") :
            mismatches["serials"] = mismatches.get("serials" , 0) + 1
            exceptions["serials", (title_id , pub_id)] = ""
        continue
    if (title_type == "COLLECTION") and (pub_type != "COLLECTION"):
        mismatches["collections"] = mismatches.get("collections" , 0) + 1
        continue
    if (title_type == "NOVEL"):
        if pub_type == "NOVEL":
            continue
        if pub_type == "OMNIBUS":
            continue
        if (pub_type == "MAGAZINE") or (pub_type == "FANZINE"):
            mismatches["novel/magazines"] = mismatches.get("novel/magazines" , 0) + 1
            continue
        mismatches["novels (other)"] = mismatches.get("novels (other)" , 0) + 1
        continue
    if (title_type == "SHORTFICTION"):
        if (pub_type == "NONFICTION") or (pub_type == "NOVEL"):
            mismatches["short fiction"] = mismatches.get("short fiction" , 0) + 1
        continue
    if (title_type == "POEM"):
        if (pub_type == "NONFICTION"):
            mismatches["poems"] = mismatches.get("poems" , 0) + 1
        continue
    if (title_type == "NONFICTION"):
        if (pub_type != "NONFICTION"):
            mismatches["nonfiction"] = mismatches.get("nonfiction" , 0) + 1
        continue
    if (title_type == "NONGENRE"):
        if (pub_type != "NOVEL"):
            mismatches["nongenre"] = mismatches.get("nongenre" , 0) + 1
        continue
    mismatches["uncategorized"] = mismatches.get("uncategorized" , 0) + 1
    #print title_data[0][1],title_type," vs. ",pub_data[0][1],pub_type

print "Bad title records: ",bad_title
print "Bad publication records: ",bad_pub
print "Bad cross-references: ",bad_xref
for key in exceptions:
    title_id = key[1][0]
    pub_id = key[1][1]
    response = title_cursor.execute("select title_ttype,title_title from titles where title_id = "+str(title_id)+";")
    row1 = title_cursor.fetchall()
    response = pubs_cursor.execute("select pub_ctype,pub_title,pub_tag from pubs where pub_id = "+str(pub_id)+";")
    row2 = pubs_cursor.fetchall()
    print "|-"
    print "|[http://www.isfdb.org/cgi-bin/title.cgi?"+str(title_id)+" "+str(row1[0][1])+"]"
    print "|"+str(row1[0][0])
    print "|[http://www.isfdb.org/cgi-bin/pl.cgi?"+str(row2[0][2])+" "+str(row2[0][1])+"]"
    print "|"+str(row2[0][0])

for key in mismatches:
    print "Mismatched",key,":",mismatches.get(key , 0)

db.close()
Ahasuerus 23:45, 18 Sep 2007 (CDT)
Some of the title appear to be correct. The titles have appeared in serial form in collections and anthologys first, but may not have been published as complete novels. So there listing as a serial maybe correct, they just have not been published in the usual magazine form.Kraang 06:12, 20 Sep 2007 (CDT)
The point about sorting by title is well taken, I will add it next when I have a moment. I will also add a Status column, where we could enter "FIXED" as we update the data. Ahasuerus 23:44, 19 Sep 2007 (CDT)
The table has been resorted by title and a "Fixed?" column has been added. However, my attempts to teach myself Python turned out to be more painful than expected (something to do with old dogs and new tricks), so I ended up converting the MySQL database to a more familiar format and parsing it using vintage 1960s technologies. I'll attack other type mismatches shortly. Ahasuerus 23:27, 29 Sep 2007 (CDT)