Difference between revisions of "ISFDB:Data Consistency"

From ISFDB
Jump to navigation Jump to search
(→‎Pseudonym consistency: Disbled in the software; cleanup report exists; deleting)
 
(46 intermediate revisions by 7 users not shown)
Line 6: Line 6:
 
* [[ISFDB:Invalid characters in Publication titles]] 14 as of 2007-11-27. [[User:Ahasuerus|Ahasuerus]] 19:36, 9 Dec 2007 (CST)
 
* [[ISFDB:Invalid characters in Publication titles]] 14 as of 2007-11-27. [[User:Ahasuerus|Ahasuerus]] 19:36, 9 Dec 2007 (CST)
  
===Malformed ISBNs===
 
* [[ISFDB:Bad ISBN List]] - Need a script to find ISBNs that fail [http://en.wikipedia.org/wiki/ISBN ISBN consistency validation].
 
  
As of 2006-12-18, 162 ISBNs failed consistency validation and 73 failed length validation. As of 2007-11-27, the data (excluding magazines, which are handled separately) looked very different:
 
  
Valid ISBN - 86910 (83.95%)
 
Catalog ID - 6608 (6.38%)
 
No ISBN - 7955 (7.68%)
 
B000 (from Amazon.com) - 1019 (0.98%)
 
ISBN is not 10 digits - 836 (0.81%)
 
Fails checksum validation - 200 (0.19%)
 
  
The difference is presumably due to a more granular validation script. The actual Wikified lists of bad records will be posted shortly. [[User:Ahasuerus|Ahasuerus]] 22:21, 9 Dec 2007 (CST)
 
  
:ISBN fields that don't start with '#', and aren't 10 digits long" This section of the malformed ISBN field is done, but it should probably be checked again to see what was missed or whats new.[[User:Kraang|Kraang]] 20:47, 3 Jan 2008 (CST)
 
  
::Thanks, I'll run the validation script again early next week. I will have to modify it first to allow for 13 digit ISBNs. [[User:Ahasuerus|Ahasuerus]] 20:56, 3 Jan 2008 (CST)
 
  
:::When you do that can you do the Stray Pubs. again?[[User:Kraang|Kraang]] 21:04, 3 Jan 2008 (CST)
 
  
::::I think that was Bill's SQL script, the one that he linked below, wasn't it? If so, he can run it any time while I won't be able to run any scripts until next week. [[User:Ahasuerus|Ahasuerus]] 21:09, 3 Jan 2008 (CST)
 
 
===Duplicate ISBNs===
 
*Need a script to find duplicate ISBNs.
 
 
:: This query: <pre>select pub_id, pub_isbn, count(pub_isbn), pub_title from pubs group by pub_isbn having count(pub_isbn) > 1;</pre> returns 3710 rows, although some of them are non-ISBN duplicates such as "None" or "no ISBN".
 
:: But since the ISFDB captures multiple printings in multiple publication records, not all duplicate ISBNs are errors. This query finds publications with the same ISBN but different titles:<pre>select pub_id, pub_isbn, count(pub_isbn), pub_title from pubs group by pub_isbn having count(distinct pub_title) > 1;</pre>and returns 657 rows at the moment. Looking at the results of this query, more of them seem to be true duplicates.
 
 
Improved SQL excluding magazines, "ISBNs" flagged as not being true iSBNs by a '#' prefix, and trying to remove variants where one version is just a subset of another (e.g. "Title" as opposed to "Title, Series Name" or "Series Name: Title") gives 315 duplicate titles currently. <br>
 
<pre>
 
  select pub_isbn, MIN(pub_title), MAX(pub_title), count(*)
 
  from pubs
 
  where pub_isbn IS NOT NULL
 
  and  pub_isbn NOT LIKE '%Unknown%'
 
  and  pub_isbn NOT LIKE '%No ISBN%'
 
  and  pub_isbn NOT LIKE '%N/A%'
 
  and  pub_isbn NOT LIKE '%None%'
 
  and  pub_isbn NOT LIKE '#%'
 
  and  pub_isbn  != ''
 
  and  pub_ctype != 'MAGAZINE'
 
  group by pub_isbn
 
  having count(distinct pub_title) > 1
 
  AND INSTR(MIN(pub_title), MAX(pub_title)) = 0
 
  AND INSTR(MAX(pub_title), MIN(pub_title)) = 0;
 
</pre>
 
 
Many are just minor punctuation differences. The following may be worth a closer look though:<br>
 
<pre>
 
2253143 The Ancient Trash
 
6470238 The Gap into Ruin: This Day All Gods Die The Runes of Sorcery
 
26151707 Prisoners of Power Roadside Picnic / Tale of the Troika
 
30624363 Labyrinth The Dark Crystal
 
60083611 The Giant Surprise The Giants, the Mice and the Marsh-Wiggles (Step into Narnia)
 
60133821 Nebula Winners Fourteen The Great Science Fiction Series
 
99115913 The Antelope Company at Large The Secret Visitors Fight Back
 
140056513 Heroes and Villains The Infernal Desire Machines of Doctor Hoffman
 
246137762 Starborne The Wild
 
312198930 Greetings, Carbon-Based Bipeds: Collected Works 1934-1998 Greetings, Carbon-Based Bipeds: Collected Works      1944-1998
 
312857586 Ender's Shadow Shadow of the Giant
 
330250337 Decade the 1940s Decade the 1950s
 
330375660 Silver Screen Storming the Bastille
 
345300378 A Fine and Private Place The Last Unicorn
 
345439031 The Witch Queen The Witch's Honour
 
370328280 The Spook's Apprentice The Spook's Secret
 
373625375 Deathlands: Demons of Eden Nightmare Passage
 
373720483 Brandyjack Rebels of Merka
 
380007568 Bring the Jubilee Strange Relations
 
380761602 Cold Chills Cold Shocks
 
038513679X In Joy Still Felt: The Autobiography of Isaac Asimov, 1954-1978 In Memory Yet Green: The Autobiography of Isaac Asimov,  1920-1954
 
399124098 The Dancers of Arun The Northern Girl
 
425032116 Callahan's Crosstime Saloon Farewell to Yesterday's Tomorrow
 
425098982 Playmates The Maddening
 
426064453 The Masks of Time Vornan-19
 
439176824 Such a Pretty Face The Fall
 
441006973 Forever Free The Forever War
 
441289037 The Girl from Farris's The Girl From Hollywood
 
441777678 The Multiple Man The Space Swimmers
 
441873464 Galactic Bounty War World
 
445206187 Hot Sky at Midnight Star of Gypsies
 
446607924 Courage of Falcons (The Secret Texts - Book 3) The End of Magic
 
451145747 Of Kings and Quests Of Quests and Kings
 
517551845 Men, Martians and Machines The Joy Makers
 
553229281 Lord Valentine's Castle Majipoor Chronicles
 
553292463 Foundation and Empire The Trinity Paradox
 
553562428 Illusion Reluctant Voyagers
 
583119999 The King of the Swords The Knight of the Swords
 
586043136 The Grain Kings The Passing of the Dragons
 
586049681 Homeworld Wheelworld
 
059303564X Star Wars: The Truce at Bakura The Courtship of Princess Leia
 
593045580 The Kissed The Standing Dead
 
671015443 Vulcan's Forge Vulcan's Heart
 
671461494 Hitchhiker's Guide to Europe: The Galaxy The Hitchhiker's Guide to the Galaxy
 
671603833 My Friend, the Vampire The Ordinary Princess
 
671653938 The Bug Life Chronicles Tower to the Sky
 
067169801X The Bug Life Chronicles Tower to the Sky
 
701167181 Angel Times Merrick: A Novel
 
708882633 Madbond The Burning Realm
 
712650504 Firebird The Venging
 
712650512 Firebird The Venging
 
722159048 Dragonflight Dragonquest
 
739407929 Journeys of the Catechist The Sarantine Mosaic
 
747253145 Mirage Tides
 
749710152 After the Plague Come Lucky April
 
812551257 Chaos in Lagrangia The Lagrangists
 
821732684 Stones Voodoo Fury
 
825695716 The Drowned World When the Sky Burned
 
861402030 The Light Fantastic The Wizards & the Warriors
 
884043797 L. Ron Hubbard Presents Writers of the Future, Volume V Prosthetic Lady
 
886773520 Magic's Pawn Magic's Promise
 
886778611 First Rider's Call The Survivors
 
088733010X Lyonesse II: the Green Pearl Stitch in Snow
 
088733055X Bitter Ends Last Rites
 
909106118 An Unusual Angle Dreamworks: Strange New Stories
 
937986585 The Sea of the Ravens Yellow Men Sleep
 
940841320 Seeker's Mask Wizard of the Pigeons
 
963094440 Now We Are Sick: An Anthology of Nasty Verse Temporary Walls
 
1557420262 George Alec Effinger Stanislaw Lem
 
1565041585 The Road to Science Fiction 5: The British Way The Road to Science Fiction 6: Around the World
 
158846864X Arcana: Unearthed Children of the Ruin Children of the Rune: Tales From the Land of the Diamond Throne
 
1593150296 Everquest II (Everquest) Ocean of Tears
 
1596871032 Green Lantern: Sleepers: Book 2 Green Lantern: Sleepers: Book 3
 
1841190802 The Mammoth Book of Awesome Comic Fantasy The Mammoth Book of Historical Whodunnits: Brand New Collection
 
184149108X Maul Y: A Chromosome
 
1844351084 A Life in Pieces The Purpura Pawn (Professor Bernice Summerfield Series)
 
1857231082 The Black Unicorn Wizard at Large
 
1857231848 Treaty at Doona Treaty Planet
 
1857982371 The Black Gryphon White Gryphon
 
1857984382 Diaspora Prince of Demons
 
1880325071 From Hell The Beast That Shouted Love at the Heart of the World
 
1885865082 Doing Time Erotica Vampirica
 
1887666001 Blood Wedding Vampire Dreams
 
</pre>
 
[[User:BLongley|BLongley]] 13:23, 10 Dec 2007 (CST)
 
: [http://www.isfdb.org/cgi-bin/pl.cgi?YCHRMSM2003 Y: A Chromosome] could be a working title of [http://www.isfdb.org/cgi-bin/pl.cgi?MLKWDNFZHT2004 Maul]. And tp of ''Maul'' was published 2003. No info about ''Y: A Chromosome'' in ''Maul''? --[[User:Roglo|Roglo]] 13:48, 12 Jan 2008 (CST)
 
 
:: Not really, although Y Chromosomes are relevant to the plot. [http://whitescreenofdespair.blogspot.com/2003_10_01_archive.html This supports] the "working title" hypothesis though. [[User:BLongley|BLongley]] 15:15, 12 Jan 2008 (CST)
 
 
===Stray Publications===
 
*Need a script that finds "novel" Publications whose Author(s) do not match the Author(s) in the associated Title record.
 
* [[ISFDB:Missing Titles]] - Need a script that finds Publications without an associated Title record.
 
*Titles A-Z Fixed(only stray magazines left) (30 Dec 2007) [[User:Kraang|Kraang]] 18:15, 30 Dec 2007 (CST)
 
: See also [http://www.isfdb.org/wiki/index.php/Talk:Database_Schema#Let.27s_try_it_from_the_other_side here] for related SQL scriptiness. [[User:BLongley|BLongley]] 15:01, 10 Dec 2007 (CST)
 
 
===Type mismatches===
 
*Need a script that finds book length Publications whose type doesn't match the type of the associated Title records, e.g. "collection" vs. "novel", "novel" vs. "omnibus", etc.
 
  
 
===Missing data===
 
===Missing data===
 
*Need a script that finds missing data, e.g. missing page counts, missing pb/tp/hc data, 0000-00-00 dates, etc.
 
*Need a script that finds missing data, e.g. missing page counts, missing pb/tp/hc data, 0000-00-00 dates, etc.
  
===Duplicate Tags===
 
 
A script that finds tags in use by more than one publication:
 
 
<pre>
 
select pub_tag, count(*)
 
from pubs
 
group by pub_tag having count(pub_tag) > 1;
 
</pre>
 
 
: Now appear to be 100% fixed - yay! [[User:BLongley|BLongley]]
 
  
:: Broken again - would a magazine editor please look at "The Most Thrilling Science Fiction Ever Told, Summer 1969"? We appear to have 3 editor records for two pubs sharing the same tag. [[User:BLongley|BLongley]] 09:30, 12 Jan 2008 (CST)
 
  
:::Good point, thanks! Michael and I are working on ''The Most Thrilling Science Fiction Ever Told'' and hope to have the whole run cleaned up by the end of January. [[User:Ahasuerus|Ahasuerus]] 11:27, 14 Jan 2008 (CST)
 
  
Results from backup January 13, 2008:
 
<pre>
 
+------------------+----------+
 
| pub_tag          | count(*) |
 
+------------------+----------+
 
| THRLLNGSFSUM1969 |        2 |
 
+------------------+----------+
 
</pre>
 
  
The 3 editor records for this tag have been merged.  Not sure how that happened, except that when I was creating stubs for later entry of contents I may have duplicated the tag that I created for this particular issue in two other issues.  I check to see if all of the other issues have editor records and that they all have individual tags. [[User:Mhhutchins|Mhhutchins]] 12:21, 14 Jan 2008 (CST)
 
  
===Duplicate title-publication records===
 
 
According to this script, the recent database dump has 136 cases of a title being in the same publication more than once. In most cases the page numbers of both entries are NULL but in a few cases the page numbers are distinct.
 
<pre>
 
select a.pubc_id, b.pubc_id, a.pub_id, a.title_id, a.pubc_page, b.pubc_page
 
    from pub_content a, pub_content b
 
    where a.pub_id = b.pub_id and a.pubc_id < b.pubc_id and a.title_id = b.title_id
 
    order by a.title_id;
 
</pre>
 
[[User:WimLewis|WimLewis]] 16:59, 25 Mar 2007 (CDT)
 
 
: I tried correcting one of these through the front-end: unfortunately TitleRemove takes out BOTH copies and you have to manually re-add the other. I think this is best resolved in SQL, especially as it doesn't seem to destroy the presentation much - you can lose a few page numbers, e.g. [http://www.isfdb.org/cgi-bin/pl.cgi?THTRMNLBCH1974 try viewing and editing this pub]. [[User:BLongley|BLongley]] 09:26, 9 Apr 2007 (CDT)
 
 
::I was fighting recently with such publication [http://www.isfdb.org/cgi-bin/pl.cgi?RTOSFBW1998 Road to Science Fiction 5] where two essays are listed under the same title (I added '[#2]' to the second title. So I started to look for similar cases and later found this section.
 
<pre>
 
select distinct p.pub_id, left(p.pub_title, 36) pub, left(t.title_title, 36) title,
 
        t.title_ttype ttype, pc1.pubc_page page1, pc2.pubc_page page2
 
    from pub_content pc1, pub_content pc2, titles t, pubs p
 
    where pc1.title_id = pc2.title_id and pc1.pub_id = pc2.pub_id and pc1.pubc_id < pc2.pubc_id
 
        and t.title_id = pc1.title_id and p.pub_id = pc1.pub_id order by 2;
 
</pre>
 
Note that if page1 = page2, you will see the title on publication listing as only one. To see the two instances, you have to open it for editing. If page1 != page2, you will two identical titles on pub's listing, but when you open this pub for editing, the page numbers will be reset to the same page number for both titles (and so one instance will disappear from the listing after changes are submitted and accepted).
 
* If you open such 'repeated title's' bibliography, you'll see the publication listed twice.
 
* You can't remove a single instance of such title from a pub; when you enter 'Remove titles from pub' and check only one for removal, they will be removed both. You have to remember to re-add one instance of the title again.
 
* We can't do much about duplicated EDITOR records (e.g. ''Pulphouse: The Hardback Magazine''), as they are not shown in pubs editor.
 
--[[User:Roglo|Roglo]] 11:12, 14 Jan 2008 (CST)
 
 
:You can fix/delete Editor records, but it's a little tricky. First you have to change the Title Type from EDITOR to something else -- I usually change them to POEM to make it obvious -- and then you can Remove Title and Delete Title. Admittedly, it's rather painful if you can't approve your own submissions. [[User:Ahasuerus|Ahasuerus]] 11:30, 14 Jan 2008 (CST)
 
 
::I've found the easier way to remove a duplicate content, is to just delete it and create a new entry.  Just have to remember to merge the new record with any pre-existing records. As for duplicate pubs, some of them I'm cloning and then deleting the original record. [[User:Mhhutchins|Mhhutchins]] 15:15, 23 Jan 2008 (CST)
 
 
<pre>
 
+--------+--------------------------------------+--------------------------------------+---------+-------+-------+
 
| pub_id | pub                                  | title                                | ttype  | page1 | page2 |
 
+--------+--------------------------------------+--------------------------------------+---------+-------+-------+
 
|  56490 | Amazing Stories, March 1934          | Editorial: Progress in Material Econ | ESSAY  | NULL  | NULL  |
 
| 124121 | Count Brass                          | Count Brass                          | OMNIBUS | NULL  | NULL  |
 
|  60092 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR  | NULL  | NULL  |
 
|  60096 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR  | NULL  | NULL  |
 
|  60099 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR  | NULL  | NULL  |
 
|  60094 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR  | NULL  | NULL  |
 
|  60093 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR  | NULL  | NULL  |
 
|  60097 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR  | NULL  | NULL  |
 
|  60100 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR  | NULL  | NULL  |
 
|  60095 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR  | NULL  | NULL  |
 
|  60098 | Pulphouse: The Hardback Magazine Iss | Pulphouse: The Hardback Magazine - 1 | EDITOR  | NULL  | NULL  |
 
|  60102 | Pulphouse: The Hardback Magazine: Is | Pulphouse: The Hardback Magazine - 1 | EDITOR  | NULL  | NULL  |
 
|  60101 | Pulphouse: The Hardback Magazine: Is | Pulphouse: The Hardback Magazine - 1 | EDITOR  | NULL  | NULL  |
 
+--------+--------------------------------------+--------------------------------------+---------+-------+-------+
 
</pre>
 
 
===Dangling title-publication records===
 
 
Likewise, there are many entries in the pub_content table whose pub_id or title_id is either NULL or refers to a nonexistent record:
 
 
<pre>
 
select * from pub_content pc where not exists (select * from pubs pu where pc.pub_id = pu.pub_id);
 
select * from pub_content pc where not exists (select * from titles ti where pc.title_id = title_id);
 
</pre>
 
[[User:WimLewis|WimLewis]] 16:59, 25 Mar 2007 (CDT)
 
 
Results from backup January 13, 2008:
 
747 rows; 90 rows
 
 
===Disallowed URLs===
 
 
See [[ISFDB:Data Consistency/Disallowed URLs]].
 
  
 
==Titles==
 
==Titles==
===Date mismatches===
 
*Need a script that finds Titles whose date doesn't match the date of the first published Publication record associated with the Title.
 
  
:Too many, so I broke it down. Even this list of titles where there's a publication of a title in a year prior to the title year (so that we don't get problems with title year of XXXX-00-00) yields over 600 results.
 
select t.title_id, t.title_title, MIN(pub_year), title_copyright
 
from pubs p, pub_content pc, titles t
 
where p.pub_id = pc.pub_id
 
and  pc.title_id = t.title_id
 
and pub_year != '0000-00-00'
 
group by t.title_id, t.title_title
 
HAVING YEAR(MIN(pub_year)) < YEAR(title_copyright)
 
order by 2;
 
:Unfortunately they'll have to be manually corrected: e.g. [http://www.isfdb.org/cgi-bin/pl.cgi?WLVSFTHCLL1993 this looks like a duff pub date] rather than a duff title date. [[User:BLongley|BLongley]] 13:29, 9 Apr 2007 (CDT)
 
  
Results from backup January 27, 2008:
 
949 rows in set (14.42 sec)
 
Using slightly more liberal approach: HAVING YEAR(MIN(pub_year)) < YEAR(title_copyright)-1
 
467 rows in set (13.48 sec)
 
  
But I think we should start with this one:
 
<pre>
 
select pub_id, pub_title, pub_year from pubs where pub_year != '0000-00-00' and YEAR(pub_year) < 1700 ;
 
  
+--------+-----------------------------------------+------------+
 
| pub_id | pub_title                              | pub_year  |
 
+--------+-----------------------------------------+------------+
 
| 251425 | Timeless Stories for Today and Tomorrow | 0197-08-00 |
 
+--------+-----------------------------------------+------------+
 
1 row in set (0.20 sec)
 
</pre>
 
Link: [http://www.isfdb.org/cgi-bin/pl.cgi?251425 Timeless Stories for Today and Tomorrow] (at least the title is relevant...) --[[User:Roglo|Roglo]] 13:43, 28 Jan 2008 (CST)
 
(Timeless Stories...)FIXED
 
And the same for titles (but now we have quite a few valid XV-XVII century titles)
 
<pre>
 
mysql> select title_id, title_title, title_copyright, title_ttype from titles where title_copyright != '0000-00-00' and YEAR(title_copyright) < 1400 ;
 
+----------+------------------------------------+-----------------+--------------+
 
| title_id | title_title                        | title_copyright | title_ttype  |
 
+----------+------------------------------------+-----------------+--------------+
 
|    97038 | Timaios (Excerpt)                  | 0001-00-00      | SHORTFICTION |
 
|  122930 | The Amazons                        | 0500-00-00      | ESSAY        |
 
|  346941 | Cover: The Heidelberg Cylinder    | 0200-10-00      | COVERART    |
 
|  358191 | Cover: A Nomad of the Time Streams | 0199-00-00      | COVERART    |
 
|  361501 | Cover: The Dispossessed            | 0071-00-00      | COVERART    |
 
|  535853 | Cover: Strange Wine                | 0197-06-00      | COVERART    |
 
+----------+------------------------------------+-----------------+--------------+
 
6 rows in set (0.27 sec)
 
</pre>
 
Probably pub dates were fixed and these covers were left behind. Links:
 
* [http://www.isfdb.org/cgi-bin/title.cgi?346941 Cover: The Heidelberg Cylinder] FIXED
 
* [http://www.isfdb.org/cgi-bin/title.cgi?358191 Cover: A Nomad of the Time Streams] FIXED
 
* [http://www.isfdb.org/cgi-bin/title.cgi?361501 Cover: The Dispossessed] FIXED
 
* [http://www.isfdb.org/cgi-bin/title.cgi?535853 Cover: Strange Wine] FIXED
 
  
===Titles that point at themselves===
 
I just looked up a Gardner Dozois collection and most of the stories said "by Gardner Dozois as by [Gardner Dozois]." Looking into this found many titles where the parent field was pointing at itself. It was easy enough to fix but I suspect there are others lurking out there.  While "make variant" will no longer let you point a title to itself I suspect you can still get this if you merge two titles and let the parent title pointer stay. [[User:Marc Kupper|Marc Kupper]] 22:25, 22 Dec 2006 (CST)
 
 
: Like these? There don't seem too many. What is the "easy enough" fix?
 
select title_id, title_title from titles where title_parent = title_id;
 
+----------+--------------------------------+
 
| title_id | title_title                    |
 
+----------+--------------------------------+
 
|    63117 | Command Performance            |
 
|    63296 | Earth, Farewell!              |
 
|    92579 | Werewolves in Sheep's Clothing |
 
+----------+--------------------------------+
 
[[User:BLongley|BLongley]] 13:41, 9 Apr 2007 (CDT)
 
 
:The second and third were easy to fix. Just removed the variant using "0" as the parent title.  The first has another variant involved, so I have no idea how to fix it. [[User:Mhhutchins|Mhhutchins]] 21:22, 11 Jan 2008 (CST)
 
 
Results in the 2008-01-13, 2008-01-27, 2008-02-03 and 2008-02-10 backups: Empty set
 
 
===Titles that point at variants===
 
A variation on the ''Titles that point at themselves'' theme: variants where the parent is also variant (these include variants that point at themselves).
 
select t.title_id as id, left(t.title_title, 40) as title, if(t.title_id=tp.title_id, 'SAME AS', ' ') same,
 
        tp.title_id as pid, left(tp.title_title, 40) as parent from titles t, titles tp
 
    where tp.title_id = t.title_parent and tp.title_parent != 0;
 
Results in the January 6, 2008 backup:
 
33 rows in set
 
(''Earth, Farewell!'' and ''Werewolves in Sheep's Clothing'' were fixed by Mhhutchins after the backup's date) --[[User:Roglo|Roglo]] 03:47, 12 Jan 2008 (CST)
 
::Some of these variants exist so that series with only short stories will appear on the authors summary page such as this one created for me "| 629713 | Kéthani Stories                          | SAME AS | 629713 | Kéthani ". At the moment series without a novel will not display.[[User:Kraang|Kraang]] 07:16, 12 Jan 2008 (CST)
 
:::Amazing! So we have to careful with stories belonging to series. BTW perhaps it's time to add [http://www.amazon.com/K%C3%A9thani-Eric-Brown/dp/184416473X Kéthani book] to ISFDB. --[[User:Roglo|Roglo]] 10:24, 12 Jan 2008 (CST)
 
::::The author's web page mentioned this book as a future publication. Once it's in the data base the variant novel of itself can be deleted.[[User:Kraang|Kraang]] 20:16, 12 Jan 2008 (CST)
 
:::::Most of these are variants created by the use of pseudonyms, so it's natural that the title would be the same.  I fixed some of them that were variants of variants (the title changed as well as the authorship, i.e. I at the Keyhole and The Dawning Shadow.) [[User:Mhhutchins|Mhhutchins]] 23:25, 12 Jan 2008 (CST)
 
::::::Problem with such variants is that bibliography of the top-level title won't show publications of a variant of a variant. --[[User:Roglo|Roglo]] 11:44, 14 Jan 2008 (CST)
 
 
Results in the January 13, 2008 backup:
 
<pre>
 
+--------+------------------------------------------+------+--------+------------------------------------------+
 
| id    | title                                    | same | pid    | parent                                  |
 
+--------+------------------------------------------+------+--------+------------------------------------------+
 
|  10336 | Isaac Asimov: The Foundations of Science |      | 205681 | Isaac Asimov: The Foundations of Science |
 
| 187058 | The English at the North Pole (UK 1874)  |      |  11766 | Les adventures du Capitaine Hatteras    |
 
| 187449 | Earthman, Go Home!                      |      |  62699 | A Plague of Masters                      |
 
| 188881 | The Servants of Twilight                |      |  15998 | The Servants of Twilight                |
 
| 189387 | The Galactiad                            |      |  23185 | The Galactiad (1983 US)                  |
 
| 190807 | Name Your Pleasure                      |      |  68751 | Name Your Pleasure                      |
 
| 190915 | The Man from Beyond                      |      |  71427 | The Man from Beyond                      |
 
| 192624 | Industrial Revolution                    |      |  96831 | Industrial Revolution                    |
 
| 206311 | The Immortals (+1 story)                |      | 206391 | The Immortals (+1 story)                |
 
| 530209 | Thou Good and Faithful                  |      | 190322 | Thou Good and Faithful                  |
 
| 414851 | 2001: A Space Odyssey                    |      |  97665 | Sentinel of Eternity                    |
 
| 529009 | Playing Santa Clause                    |      | 191114 | Playing Santa Clause                    |
 
| 501841 | The Devil of East Lupton                |      | 191217 | The Devil of East Lupton, Vermont        |
 
| 529005 | The Destiny Gorilla                      |      | 191113 | The Destiny Gorilla                      |
 
| 528993 | The Yellow Cobra                        |      | 191110 | The Yellow Cobra                        |
 
| 528997 | In High Society                          |      | 191111 | In High Society                          |
 
| 528989 | The Mandarin Ruby                        |      | 191109 | The Mandarin Ruby                        |
 
| 529001 | Playing Journalist                      |      | 191112 | Playing Journalist                      |
 
| 529013 | The Turkish Menace                      |      | 191115 | The Turkish Menace                      |
 
| 528981 | The Alleys of Singapore                  |      | 191107 | The Alleys of Singapore                  |
 
| 528985 | The Jade Monkey                          |      | 191108 | The Jade Monkey                          |
 
| 623589 | Ullr Uprising                            |      | 540473 | Ullr Uprising                            |
 
| 685885 | Last Laugh                              |      | 187587 | Last Laugh                              |
 
| 589469 | Beyond the Barrier of Space              |      | 187173 | Beyond the Barrier of Space              |
 
| 592761 | Appendix: "Return..." A Preface          |      | 118825 | Appendix: "Return..." A Preface          |
 
| 759881 | Heavyplanet                              |      | 577597 | Heavyplanet                              |
 
| 833677 | Martian Spaceships Invade New York      |      | 193655 | New York Invaded                        |
 
+--------+------------------------------------------+------+--------+------------------------------------------+
 
27 rows in set (0.61 sec)
 
</pre>
 
 
:I believe I've fixed most of these variants of variants, but please do another check with the next backup.  Thanks. [[User:Mhhutchins|Mhhutchins]] 17:54, 14 Jan 2008 (CST)
 
Results from backup January 20, 2008:
 
<pre>
 
+--------+--------------------+------+--------+--------------------+
 
| id    | title              | same | pid    | parent            |
 
+--------+--------------------+------+--------+--------------------+
 
| 190807 | Name Your Pleasure |      |  68751 | Name Your Pleasure |
 
| 623589 | Ullr Uprising      |      | 540473 | Ullr Uprising      |
 
| 755753 | Join Now          |      | 187646 | Join Now          |
 
+--------+--------------------+------+--------+--------------------+
 
3 rows in set (3.78 sec)
 
</pre>
 
 
::Fixed (fingers crossed!) [[User:Mhhutchins|Mhhutchins]] 10:16, 23 Jan 2008 (CST)
 
Results from backup January 27, 2008:
 
Empty set (1.09 sec)
 
Results from backup February 03, 2008: Empty set
 
 
===Titles which are variants of non-existent titles===
 
I'm not sure if this should be here or on the [[ISFDB:Repair_Variants]] page (or somewhere else) but we have such amazing titles:
 
 
select title.title_id, title.title_title, title.title_ttype, title.title_parent
 
  from titles title left outer join titles parent on title.title_parent = parent.title_id
 
  where title.title_parent <> 0 and isnull(parent.title_id) ;
 
 
+----------+----------------------------------------------+--------------+--------------+
 
| title_id | title_title                                  | title_ttype  | title_parent |
 
+----------+----------------------------------------------+--------------+--------------+
 
|  188868 | Psi-Man                                      | NOVEL        |        15726 |
 
|  188911 | Demon Sword                                  | NOVEL        |        17547 |
 
|  188973 | Ship of Madness                              | NOVEL        |        19511 |
 
|  189427 | The Aquilliad                                | NOVEL        |        25556 |
 
|  189430 | The Throne of Madness: The Inquestor Trilogy | NOVEL        |        25650 |
 
|  187205 | From Outer Space                            | NOVEL        |        33294 |
 
|  189496 | There Will Be War                            | ANTHOLOGY    |        34841 |
 
|  189512 | Aftermath: Thieves’ World                    | ANTHOLOGY    |        36679 |
 
|  189560 | The Aquiliad                                | COLLECTION  |        39607 |
 
|  187369 | The Beach Where Time Began                  | SHORTFICTION |        57067 |
 
|  187550 | The Best of Times                            | SHORTFICTION |        64884 |
 
|  190728 | The Ultimate Wish                            | SHORTFICTION |        67642 |
 
|  187612 | The Immortals                                | SHORTFICTION |        68738 |
 
|  187839 | Dead on Sunday                              | NONGENRE    |      110723 |
 
|  193085 | Deservedly Dead                              | NONGENRE    |      110728 |
 
|  193086 | Death and the Delinquent                    | NONGENRE    |      110729 |
 
|  193087 | Death Served Up Cold                        | NONGENRE    |      110730 |
 
|  193088 | A Ceremonial Death                          | NONGENRE    |      110731 |
 
|  193089 | Here's to the Newly Dead                    | NONGENRE    |      110732 |
 
|  187850 | The Book of Andre Norton                    | COVERART    |      144456 |
 
|  187851 | Definitely Maybe                            | COVERART    |      146385 |
 
|    99435 | The Vacant Lot                              | SHORTFICTION |      208521 |
 
+----------+----------------------------------------------+--------------+--------------+
 
22 rows in set (0.17 sec)
 
 
Links:                                                         
 
 
Looks like they are never displayed on their author's page, because: 1) they are variants, 2) they don't have a parent to be displayed next to... So chances are they were added again. --[[User:Roglo|Roglo]] 16:37, 10 Jan 2008 (CST)
 
:All have been fixed. Most were variants because of pseudonyms (Somtow=Sucharitkul, Lindholm=Hobb, Tepper=Oliphant, etc.), but I'm not sure how the variant links between them were broken.  So I removed them as variants (used "0" as the parent record), then started all over again by creating a new variant, then merging them with the title record under the author's canonical name. Thanks to Roglo for the script that uncovered them. [[User:Mhhutchins|Mhhutchins]] 17:14, 10 Jan 2008 (CST)
 
 
Results in the January 13, 2008 backup: Empty set
 
 
Results in the January 27, 2008 backup: Empty set
 
 
Results in the February 03, 2008 backup: Empty set
 
 
===Authorless titles===
 
Two mini-quests for today: titles with authors missing. First, titles that don't match anything in the canonical_author table. I check if they have any pubs (it doesn't depend on having an author):
 
 
select tid, title_title, title_ttype, pubc_id, p.pub_id, pub_tag
 
    from (select t.title_id tid, t.title_title, t.title_ttype
 
              from titles t left outer join canonical_author ca on t.title_id = ca.title_id
 
                  where ca.title_id is null) tmp
 
          left outer join pub_content pc on pc.title_id = tmp.tid
 
          left outer join pubs p on p.pub_id = pc.pub_id;
 
 
Results in the January 6, 2008 backup, in the January 13, 2008 backup:
 
+--------+---------------------------------------------------------+-------------+---------+--------+------------+
 
| tid    | title_title                                            | title_ttype | pubc_id | pub_id | pub_tag    |
 
+--------+---------------------------------------------------------+-------------+---------+--------+------------+
 
| 281671 | The Magazine of Fantasy & Science Fiction, January 2004 | EDITOR      |  350911 | 112701 | FSFJAN2004 |
 
| 406811 | Cover: Planets of Wonder                                | COVERART    |    NULL |  NULL | NULL      |
 
| 406821 | Cover: A World Out of Time                              | COVERART    |    NULL |  NULL | NULL      |
 
| 837920 | Quotation                                              | ESSAY      |    NULL |  NULL | NULL      |
 
+--------+---------------------------------------------------------+-------------+---------+--------+------------+
 
4 rows in set (3.93 sec)
 
 
I have no idea what the first record is; the other 3 authorless and publess titles should be safe to delete. All 4 generate python errors while trying to match reviews by author and title. Anyway, links:
 
 
* [http://www.isfdb.org/cgi-bin/title.cgi?281671 The Magazine of Fantasy & Science Fiction, January 2004] EDITOR [http://www.isfdb.org/cgi-bin/pl.cgi?FSFJAN2004 Pub]
 
* [http://www.isfdb.org/cgi-bin/title.cgi?406811 Cover: Planets of Wonder] COVERART
 
* [http://www.isfdb.org/cgi-bin/title.cgi?406821 Cover: A World Out of Time] COVERART
 
* [http://www.isfdb.org/cgi-bin/title.cgi?837920 Quotation] ESSAY
 
 
And now titles which have their id in the canonical_author table, but the author_id from the canonical_author record is not an id of any existing author. This is more complicated, because you can have multiple canonical_author records for any title, and some of them may point to nonexistent authors. So I try to find titles where one of the authors doesn't exist, but display all the authors.
 
 
select t.title_id, t.title_title, t.title_ttype, ca.author_id as nonexistent, ca2.author_id, a2.author_canonical
 
    from titles t,
 
          (canonical_author ca left outer join authors a on ca.author_id = a.author_id),
 
          (canonical_author ca2 left outer join authors a2 on ca2.author_id = a2.author_id)
 
    where t.title_id = ca.title_id and t.title_id = ca2.title_id and a.author_id is null ;
 
 
Results in the January 6, 2008 backup:
 
+----------+-----------------------------+--------------+-------------+-----------+--------------------+
 
| title_id | title_title                | title_ttype  | nonexistent | author_id | author_canonical  |
 
+----------+-----------------------------+--------------+-------------+-----------+--------------------+
 
|  190423 | Hunter, Come Home          | SHORTFICTION |      11231 |    11231 | NULL              |
 
|  190423 | Hunter, Come Home          | SHORTFICTION |      11231 |    53491 | Richard M. McKenna |
 
|  192212 | The Man the Worlds Rejected | SHORTFICTION |      19035 |    19035 | NULL              |
 
|  192212 | The Man the Worlds Rejected | SHORTFICTION |      19035 |    68421 | Gordon Dickson    |
 
|  198701 | Strictly Confidential      | SHORTFICTION |      19035 |    19035 | NULL              |
 
|  431031 | Spacepaw                    | NOVEL        |      19035 |    68421 | Gordon Dickson    |
 
|  431031 | Spacepaw                    | NOVEL        |      19035 |    19035 | NULL              |
 
+----------+-----------------------------+--------------+-------------+-----------+--------------------+
 
7 rows in set (5.02 sec)
 
 
We can't really clean the canonical_author table (adding and removing authors in the title editor doesn't help) but ''Strictly Confidential'' generates python error and we can fix that by adding at least one existing author (looks like it is Gordon Dickson again?). Links to all 4 titles:
 
* [http://www.isfdb.org/cgi-bin/title.cgi?190423 Hunter, Come Home] SHORTFICTION by Richard M. McKenna and '''NULL'''
 
* [http://www.isfdb.org/cgi-bin/title.cgi?192212 The Man the Worlds Rejected] SHORTFICTION by Gordon Dickson and '''NULL'''
 
* [http://www.isfdb.org/cgi-bin/title.cgi?198701 Strictly Confidential] SHORTFICTION by '''NULL''' (the one interesting)
 
* [http://www.isfdb.org/cgi-bin/title.cgi?431031 Spacepaw] NOVEL by Gordon Dickson and '''NULL'''
 
--[[User:Roglo|Roglo]] 10:46, 11 Jan 2008 (CST)
 
 
: "Strictly Confidential" looks interesting: you can narrow that down to the pub:
 
 
  select * from pub_content pc
 
  where pc.title_id = 198701
 
 
  pubc_id title_id pub_id pubc_page
 
  251922 198701 89769 112
 
 
: And it's [http://www.isfdb.org/cgi-bin/pl.cgi?89769 here], verified, and so theoretically the verifier could tell us if it's Gordon Dickson or Gordon '''R.''' Dickson. [[User:BLongley|BLongley]] 12:47, 11 Jan 2008 (CST)
 
 
::"Strictly Confidential" fixed.  Attributed to "Gordon Dickson", so the pub record was edited, then a variant was created for "Gordon R. Dickson", and then merged with the existing title record. [[User:Mhhutchins|Mhhutchins]] 23:33, 12 Jan 2008 (CST)
 
 
2008-02-10 backup results:
 
 
<pre>
 
+--------+---------------------------------------------------------+-------------+---------+--------+------------+
 
| tid    | title_title                                            | title_ttype | pubc_id | pub_id | pub_tag    |
 
+--------+---------------------------------------------------------+-------------+---------+--------+------------+
 
| 281671 | The Magazine of Fantasy & Science Fiction, January 2004 | EDITOR  |  350911 | 112701 | FSFJAN2004 |
 
| 406811 | Cover: Planets of Wonder                                | COVERART |    NULL |  NULL | NULL      |
 
| 406821 | Cover: A World Out of Time                              | COVERART |    NULL |  NULL | NULL      |
 
| 837920 | Quotation                                              | ESSAY    |    NULL |  NULL | NULL      |
 
+--------+---------------------------------------------------------+-------------+---------+--------+------------+
 
</pre>
 
 
and
 
 
<pre>
 
+----------+-----------------------------+--------------+-------------+-----------+--------------------+
 
| title_id | title_title                | title_ttype  | nonexistent | author_id | author_canonical  |
 
+----------+-----------------------------+--------------+-------------+-----------+--------------------+
 
|  190423 | Hunter, Come Home          | SHORTFICTION |      11231 |    11231 | NULL              |
 
|  190423 | Hunter, Come Home          | SHORTFICTION |      11231 |    53491 | Richard M. McKenna |
 
|  192212 | The Man the Worlds Rejected | SHORTFICTION |      19035 |    19035 | NULL              |
 
|  192212 | The Man the Worlds Rejected | SHORTFICTION |      19035 |    68421 | Gordon Dickson    |
 
|  198701 | Strictly Confidential      | SHORTFICTION |      19035 |    68421 | Gordon Dickson    |
 
|  198701 | Strictly Confidential      | SHORTFICTION |      19035 |    19035 | NULL              |
 
|  431031 | Spacepaw                    | NOVEL        |      19035 |    68421 | Gordon Dickson    |
 
|  431031 | Spacepaw                    | NOVEL        |      19035 |    19035 | NULL              |
 
+----------+-----------------------------+--------------+-------------+-----------+--------------------+
 
</pre>
 
  
 
===Safe to auto-merge identical titles?===
 
===Safe to auto-merge identical titles?===
 
Related to the Gardner Dozois collection was that for nearly all of the titles I found had two title records that were identical other than one was a parent of the other. If it seems safe it seems it would safe some work to to do a sweep for title records that are identical and to auto-merge them. [[User:Marc Kupper|Marc Kupper]] 22:25, 22 Dec 2006 (CST)
 
Related to the Gardner Dozois collection was that for nearly all of the titles I found had two title records that were identical other than one was a parent of the other. If it seems safe it seems it would safe some work to to do a sweep for title records that are identical and to auto-merge them. [[User:Marc Kupper|Marc Kupper]] 22:25, 22 Dec 2006 (CST)
  
===EDITOR Titles===
 
  
There are many Magazine Publications that have no EDITOR Title associated with them. This makes them invisible from their Editor's biblio page. Missing EDITOR Titles need to be identified and added throughout. [[User:Ahasuerus|Ahasuerus]] 20:20, 27 Dec 2006 (CST)
 
  
Also need a script to find all non-MAGAZINE Publications that contain an EDITOR Title. [[User:Ahasuerus|Ahasuerus]] 17:22, 29 Mar 2007 (CDT)
+
=== Tags ===
  
 +
It would be nice to standardize user-generated tags.
 +
 +
====Synonyms====
  
Hmm, this SQL script: <pre>
+
-----------------------------
select * from pubs pu
+
* hard science fiction
    where pub_ctype not in ( 'MAGAZINE', 'FANZINE' ) and
+
* hard sf
          exists (select * from titles t, pub_content pc
+
-----------------------------
                      where pu.pub_id = pc.pub_id and
+
* history of sf
                            pc.title_id = t.title_id and
+
* history-of-sf
                            t.title_ttype = 'EDITOR');
+
-----------------------------
</pre> turns up only one publication, ''The Discworld Companion'', which appears to be fixed already. The inverse query turns up a couple thousand rows, but the ones I checked are also already fixed --- my db dump must be getting out of date...  --[[User:WimLewis|WimLewis]] 18:26, 29 Mar 2007 (CDT)
+
* juvenile sf
 +
* juvenile-sf
 +
-----------------------------
 +
* recursive sf
 +
* meta sf
 +
-----------------------------
 +
* 'young-adult humorous sf' -> 'humorous sf' + 'young-adult sf'
  
=== Publess REVIEWs and INTERVIEWs ===
+
etc etc
Reviews and interviews without content records. Those that have variants (e.g. pseudonymous reviews) with are not included:
 
<pre>
 
select t.title_id, t.title_title, t.title_ttype
 
    from titles t
 
        left outer join pub_content pc on t.title_id = pc.title_id
 
        left outer join titles vt on t.title_id = vt.title_parent
 
    where (t.title_ttype = 'REVIEW' or t.title_ttype = 'INTERVIEW')
 
        and pc.title_id is null and vt.title_id is null;
 
</pre>
 
(if a variant didn't have any pub_content record, it would be included on this list; query including titles having variants without pub_content records is
 
<pre>
 
select t.title_id, t.title_title, t.title_ttype
 
    from titles t
 
        left outer join pub_content pc on t.title_id = pc.title_id
 
        left outer join titles vt on t.title_id = vt.title_parent
 
        left outer join pub_content vpc on vt.title_id = vpc.title_id
 
    where (t.title_ttype = 'REVIEW' or t.title_ttype = 'INTERVIEW')
 
        and pc.title_id is null and vpc.title_id is null;
 
</pre>
 
with the same result for the latest backup). --[[User:Roglo|Roglo]] 15:02, 10 Feb 2008 (CST)
 
  
Results from backup February 03, 2008:
+
some one should probably come up with a standard then make some filters for input and run the filters on the existing tags.
<pre>
 
+----------+------------------------------------------------------------------------+-------------+
 
| title_id | title_title                                                            | title_ttype |
 
+----------+------------------------------------------------------------------------+-------------+
 
|  741225 | Test (ï)                                                              | INTERVIEW  |
 
|  103753 | Christopher Priest: The Interrogation                                  | INTERVIEW  |
 
|  505561 | World Peace Through Celluloid ...an environmentally fiendly interview? | INTERVIEW  |
 
|  510861 | The Making of a Legend                                                | INTERVIEW  |
 
|  715953 | Draconian Inheritance                                                  | INTERVIEW  |
 
|  618333 | The Worlds of Fantasy Calendar 1976                                    | REVIEW      |
 
|  163150 | The Painter King                                                      | REVIEW      |
 
|  427041 | I Robot (audio recording)                                              | REVIEW      |
 
|  318461 | The Trikon Deception0                                                  | REVIEW      |
 
|  459871 | The Star Maker                                                        | REVIEW      |
 
|  462791 | Star Wars (film soundtrack)                                            | REVIEW      |
 
|  492621 | The Petrified Planet                                                  | REVIEW      |
 
|  524457 | Conan the Conqueror: The Hyborean Age                                  | REVIEW      |
 
|  530705 | Make Room, Make Room! (Part 3 of 3)                                    | REVIEW      |
 
|  586725 | Films: A Funky, Faustian, Filmorian Fantom                            | REVIEW      |
 
|  586773 | Films: A Funky, Faustian, Filmorian Fantom                            | REVIEW      |
 
|  603109 | Tales of Mystery and Imagination                                      | REVIEW      |
 
|  633445 | Exlporers of the Infinite: Shapers of Science Fiction                  | REVIEW      |
 
|  634649 | The Best Science Fiction: 1955                                        | REVIEW      |
 
|  636541 | The Remarkable Exploits of Lancelot Biggs: Spaceman                    | REVIEW      |
 
|  645509 | The Mad Reader                                                        | REVIEW      |
 
|  645517 | The Treasury of Science Fiction Classics                              | REVIEW      |
 
|  645533 | Celestial Space, Inc.                                                  | REVIEW      |
 
|  645589 | The Best from Fantasy and Science Fiction, Fourth Series              | REVIEW      |
 
|  648513 | Introducing Astronomy                                                  | REVIEW      |
 
|  671413 | The Tale of Dan de lion                                                | REVIEW      |
 
|  689373 | Galactic Patrol                                                        | REVIEW      |
 
|  849483 | The Science Fiction Novel: Imagination and Social Criticism            | REVIEW      |
 
|  741221 | Test (ï)                                                              | REVIEW      |
 
|  761213 | Battle for the Stars                                                  | REVIEW      |
 
|  761221 | Time Is the Simplest Thing                                            | REVIEW      |
 
+----------+------------------------------------------------------------------------+-------------+
 
31 rows in set (0.72 sec)
 
</pre>
 
Some of these have awards and so probably should be left for now. Links:
 
* [http://www.isfdb.org/cgi-bin/title.cgi?741225 Test (ï)] INTERVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?103753 Christopher Priest: The Interrogation] INTERVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?505561 World Peace Through Celluloid ...an environmentally fiendly interview?] INTERVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?510861 The Making of a Legend] INTERVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?715953 Draconian Inheritance] INTERVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?618333 The Worlds of Fantasy Calendar 1976] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?163150 The Painter King] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?427041 I Robot (audio recording)] REVIEW (DELETED)
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?318461 The Trikon Deception0] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?459871 The Star Maker] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?462791 Star Wars (film soundtrack)] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?492621 The Petrified Planet] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?524457 Conan the Conqueror: The Hyborean Age] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?530705 Make Room, Make Room! (Part 3 of 3)] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?586725 Films: A Funky, Faustian, Filmorian Fantom] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?586773 Films: A Funky, Faustian, Filmorian Fantom] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?603109 Tales of Mystery and Imagination] REVIEW (DELETED)
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?633445 Exlporers of the Infinite: Shapers of Science Fiction] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?634649 The Best Science Fiction: 1955] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?636541 The Remarkable Exploits of Lancelot Biggs: Spaceman] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?645509 The Mad Reader] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?645517 The Treasury of Science Fiction Classics] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?645533 Celestial Space, Inc.] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?645589 The Best from Fantasy and Science Fiction, Fourth Series] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?648513 Introducing Astronomy] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?671413 The Tale of Dan de lion] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?689373 Galactic Patrol] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?849483 The Science Fiction Novel: Imagination and Social Criticism] REVIEW
 
* [http://www.isfdb.org/cgi-bin/title.cgi?741221 Test (ï)] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?761213 Battle for the Stars] REVIEW
 
* Fixed [http://www.isfdb.org/cgi-bin/title.cgi?761221 Time Is the Simplest Thing] REVIEW
 
  
::Fixed all but the two tests by Bill Drussaï. [[User:Mhhutchins|Mhhutchins]] 17:02, 10 Feb 2008 (CST)
 
  
==Awards==
 
* [[ISFDB:Repair_Awards]] Need a script that locates bad/missing mapping records between the awards and titles tables.
 
 
==Series==
 
 
* [[ISFDB:Data Consistency/Duplicate Series Names]] keeps track of duplicate series names
 
:All fixed as of 2008-03-23. [[User:Ahasuerus|Ahasuerus]] 22:29, 24 Mar 2008 (CDT)
 
* [[ISFDB:Data Consistency/Series Numbering Issues]] lists suspicious, missing and duplicate series numbers
 
  
 
== Authors ==
 
== Authors ==
  
=== Authors in the Author Directory that don't appear to have written anything ===
 
[[ISFDB:Authors that only exist due to reviews]]
 
I've cleared up most of the "A"s, either by correcting/regularising the name in the review, or adding the missing publication or variant. The above page I created has links to each review causing a problem - many authors can be cleared up with ONE review fix, so have away at them! [[User:BLongley|BLongley]] 13:54, 29 Dec 2007 (CST)
 
  
===Authors who are (probably) doubled===
 
I don't know how we can browse authors summary pages by ids. Advanced search gives access only to editing authors.
 
  
These have the same canonical names and their legal names are either equal or at least one legal name is NULL (so they have to be checked before merging):
 
  
select a1.author_canonical canonical, a1.author_legalname legal, count(1)
 
    from authors a1, authors a2
 
    where a1.author_canonical = a2.author_canonical and a1.author_id <> a2.author_id
 
      and (a1.author_legalname = a2.author_legalname or IsNull(a1.author_legalname) or IsNull(a2.author_legalname))
 
    group by 1, 2;
 
  
Results from backup January 27, 2008:
 
<pre>
 
+-------------+--------------+----------+
 
| canonical  | legal        | count(1) |
 
+-------------+--------------+----------+
 
| Philip Kent | NULL        |        2 |
 
| Simon Clark | NULL        |        1 |
 
| Simon Clark | Clark, Simon |        1 |
 
+-------------+--------------+----------+
 
3 rows in set (1.70 sec)
 
</pre>
 
Looks like one ''Simon Clark'' is new. How do these get created? --[[User:Roglo|Roglo]] 13:15, 28 Jan 2008 (CST)
 
  
:You can't do it by changing author names in Titles and Publications since the software will assume that you want to use the existing author record instead of creating a new one. However, when you change an Author's name in "Author Data", the software doesn't check whether there is another author record with the same name. Thus, if we have "Simon Clark" and "Simon Clarke" on file and the latter is changed to "Simon Clark", we will end up with two "Simon Clark" records. When reviewing submissions that attempt to modify Author records, moderators have no way of telling whether another Author record with the same name already exists, so they often slip through. [[User:Ahasuerus|Ahasuerus]] 15:00, 28 Jan 2008 (CST)
 
 
:P.S. I have deleted the results from the last two passes, but they are available via this page's History if anybody needs them. [[User:Ahasuerus|Ahasuerus]] 15:00, 28 Jan 2008 (CST)
 
 
2008-02-10 results:
 
 
<pre>
 
+-------------+--------------+----------+
 
| canonical  | legal        | count(1) |
 
+-------------+--------------+----------+
 
| Simon Clark | NULL        |        1 |
 
| Simon Clark | Clark, Simon |        1 |
 
+-------------+--------------+----------+
 
</pre>
 
 
::Fixed on 2008-02-29. [[User:Ahasuerus|Ahasuerus]] 23:57, 29 Feb 2008 (CST)
 
 
=== Pseudonym consistency ===
 
 
There are 22 authors in the latest dump who are self-pseudonyms (they have a pseudonym record pointing back to themselves):
 
<pre>
 
select ps.pseudo_id, ps.author_id, au.author_canonical
 
    from pseudonyms ps, authors au
 
    where ps.author_id = ps.pseudonym and ps.author_id = au.author_id;
 
</pre>
 
 
 
There are two pairs of authors who are pseudonyms of each other:
 
* Avram Davidson (author_id 501) and Ellery Queen (author_id 4921), twice
 
* Douglas Stapleton (author_id 12921) and Doug Stapleton (author_id 12922)
 
 
There may also be longer cycles...  [[User:WimLewis|WimLewis]] 22:28, 25 Mar 2007 (CDT)
 
 
==Database integrity constraints==
 
 
It looks liky MySQL does support the standard SQL database-integrity features like '''ADD CONSTRAINT foo FOREIGN KEY bar'''. Some of the inconsistencies in the db could be made impossible by adding the appropriate constraints (after the existing inconsistencies are cleaned up). I don't know the schema well enough to write the constraints myself or I'd suggest some here. --[[User:WimLewis|WimLewis]] 15:36, 29 Mar 2007 (CDT)
 
  
 
==Title vs. Publication Type Consistency==
 
==Title vs. Publication Type Consistency==
Line 731: Line 75:
  
 
*Collections:
 
*Collections:
**[[Data Consistency/Collection-Novel Mismatches|Collection-Novel mismatches]] -- NEW as of 2008-04-06
+
**[[ISFDB:Data Consistency/Collection-Novel Mismatches|Collection-Novel mismatches]]
**[[Data Consistency/Collection-Anthology Mismatches|Collection-Anthology mismatches]] -- FIXED as of 2008-04-06
+
**[[ISFDB:Data Consistency/Collection-Anthology Mismatches|Collection-Anthology mismatches]]
**[[Data Consistency/Collection-Omnibus Mismatches|Collection-Omnibus mismatches]] -- FIXED as of 2008-04-06
+
**[[ISFDB:Data Consistency/Collection-Omnibus Mismatches|Collection-Omnibus mismatches]]
**[[Data Consistency/Collection-Other Mismatches|Collection-Other mismatches]] -- FIXED as of 2008-04-06
+
**[[ISFDB:Data Consistency/Collection-Other Mismatches|Collection-Other mismatches]]
  
 
*Novels:
 
*Novels:
**[[Data Consistency/Novel-Magazine Mismatches|Novel-Magazine mismatches]] -- FIXED as of 2008-04-13
+
**[[ISFDB:Data Consistency/Novel-Magazine Mismatches|Novel-Magazine mismatches]]
**[[Data Consistency/Novel-Anthology Mismatches|Novel-Anthology mismatches]] -- FIXED as of 2008-04-06
+
**[[ISFDB:Data Consistency/Novel-Anthology Mismatches|Novel-Anthology mismatches]]
**[[Data Consistency/Novel-Collection Mismatches|Novel-Collection mismatches]] -- NEW as of 2008-04-06
+
**[[ISFDB:Data Consistency/Novel-Collection Mismatches|Novel-Collection mismatches]]
**[[Data Consistency/Novel-Other Mismatches|Novel-Other mismatches]] -- FIXED as of 2008-04-13
+
**[[ISFDB:Data Consistency/Novel-Other Mismatches|Novel-Other mismatches]]
  
 
*Short fiction:
 
*Short fiction:
**[[Data Consistency/Short Fiction-Non-fiction Mismatches|Short fiction-Non-fiction mismatches]] -- Not much new as of 2008-04-06, will need to rerun in a few months
+
**[[ISFDB:Data Consistency/Short Fiction-Non-fiction Mismatches|Short fiction-Non-fiction mismatches]]
**[[Data Consistency/Short Fiction-Novel Mismatches|Short fiction-Novel mismatches]] -- NEW as of 2008-04-06
+
**[[ISFDB:Data Consistency/Short Fiction-Novel Mismatches|Short fiction-Novel mismatches]]
  
 
*Non-fiction:
 
*Non-fiction:
**[[Data Consistency/Non-fiction-Novel Mismatches|Non-fiction-Novel mismatches]] -- NEW as of 2008-04-06 (All fixed 2008-04-11)
+
**[[ISFDB:Data Consistency/Non-fiction-Novel Mismatches|Non-fiction-Novel mismatches]]
**[[Data Consistency/Non-fiction-Other Mismatches|Non-fiction-Other mismatches]] -- FIXED as of 2008-04-06
+
**[[ISFDB:Data Consistency/Non-fiction-Other Mismatches|Non-fiction-Other mismatches]]
  
 
*Other:
 
*Other:
*[[Data Consistency/Non-genre Mismatches|Non-genre mismatches]] NEW as of 2008-04-06
+
*[[ISFDB:Data Consistency/Non-genre Mismatches|Non-genre mismatches]]
*[[Data Consistency/Serial Mismatches|Serial mismatches]] -- NEW as of 2008-04-06
+
*[[ISFDB:Data Consistency/Serial Mismatches|Serial mismatches]]
*[[Data Consistency/Omnibus Mismatches|Omnibus mismatches]] -- NEW as of 2008-04-06 (All fixed 2008-04-11)
+
*[[ISFDB:Data Consistency/Omnibus Mismatches|Omnibus mismatches]]
*[[Data Consistency/Anthology Mismatches|Anthology mismatches]] -- NEW as of 2008-04-06
+
*[[ISFDB:Data Consistency/Anthology Mismatches|Anthology mismatches]]
*[[Data Consistency/Editor Mismatches|Editor mismatches]] - NEW as of 2008-04-06
+
*[[ISFDB:Data Consistency/Editor Mismatches|Editor mismatches]]
  
 
==Pseudonyms in Collections==
 
==Pseudonyms in Collections==
Line 761: Line 105:
 
The following page lists all known Collection Publications that include pseudonymous Titles as of the August 11, 2007 backup. Although this is not always indicative of an error, we estimate that a significant percentage of these occurrences need to be fixed.
 
The following page lists all known Collection Publications that include pseudonymous Titles as of the August 11, 2007 backup. Although this is not always indicative of an error, we estimate that a significant percentage of these occurrences need to be fixed.
  
*[[Data Consistency/Pseudonyms in Collections|Pseudonyms in Collections]]
+
*[[ISFDB:Data Consistency/Pseudonyms in Collections|Pseudonyms in Collections]] NEW as of 2008-05-06
  
 
==Serial Dates==
 
==Serial Dates==
Line 767: Line 111:
 
The following page lists all known Serial records whose Title dates do not match the dates of the Publications that they appeared in as of the 2008-02-10 backup.
 
The following page lists all known Serial records whose Title dates do not match the dates of the Publications that they appeared in as of the 2008-02-10 backup.
  
*[[Data Consistency/Serial Dates|Serial Dates]]
+
*[[ISFDB:Data Consistency/Serial Dates|Serial Dates]]
 +
 
 +
[[Category:Bibliographic Projects|Data Consistency]]

Latest revision as of 19:28, 19 February 2015

The Data Consistency project is a place to coordinate efforts to identify and repair data consistencies, including Stray Publications, malformed ISBNs, etc.

Publication Records

Invalid characters in Publication titles





Missing data

  • Need a script that finds missing data, e.g. missing page counts, missing pb/tp/hc data, 0000-00-00 dates, etc.




Titles

Safe to auto-merge identical titles?

Related to the Gardner Dozois collection was that for nearly all of the titles I found had two title records that were identical other than one was a parent of the other. If it seems safe it seems it would safe some work to to do a sweep for title records that are identical and to auto-merge them. Marc Kupper 22:25, 22 Dec 2006 (CST)


Tags

It would be nice to standardize user-generated tags.

Synonyms


  • hard science fiction
  • hard sf

  • history of sf
  • history-of-sf

  • juvenile sf
  • juvenile-sf

  • recursive sf
  • meta sf

  • 'young-adult humorous sf' -> 'humorous sf' + 'young-adult sf'

etc etc

some one should probably come up with a standard then make some filters for input and run the filters on the existing tags.


Authors

Title vs. Publication Type Consistency

This section documents mismatches between Title types and associated Publication types:

Pseudonyms in Collections

The following page lists all known Collection Publications that include pseudonymous Titles as of the August 11, 2007 backup. Although this is not always indicative of an error, we estimate that a significant percentage of these occurrences need to be fixed.

Serial Dates

The following page lists all known Serial records whose Title dates do not match the dates of the Publications that they appeared in as of the 2008-02-10 backup.