Difference between revisions of "Rules and standards discussions/Archive/Archive01"

(Archive 22 sections)
 
(Archive Non-linear Series Numbering)
 
==Serial Data Display==
 
It looks like Serial records that are not associated with book Titles are not displayed as part of [http://www.isfdb.org/cgi-bin/ea.cgi?Arthur_H._Landis Long Works] ''or'' [http://www.isfdb.org/cgi-bin/eas.cgi?Arthur_H._Landis Short Works] listings, to use Arthur H. Landis' ''Let There Be Magick!'' (which was revised as ''A World Called Camelot'', btw) as an example. The serial ''is'' displayed on the [http://www.isfdb.org/cgi-bin/ae.cgi?Arthur_H._Landis Alphabetical] and [http://www.isfdb.org/cgi-bin/ch.cgi?Arthur_H._Landis Chronological pages], though. Is this by design or happenstance? [[User:Ahasuerus|Ahasuerus]] 20:38, 20 Aug 2006 (CDT)
 
==Non-linear Series Numbering==

Take a look at the way Tony Abbott and his publisher number his popular YA series, [http://www.tonyabbottbooks.com/secrets_of_droon.html ''The Secrets of Droon'']. To quote the relevant part:

*'''16'''. ''The Knights of Silversnow''. [description follows]
*'''Special Edition #1'''. ''The Magic Escapes''. This first ever Special Edition picks up right where ''The Knights of Silversnow'' left off and is a Droon adventure like never before, pitting Eric, Julie, Keeah, and Neal against a brand new and particularly mysterious villain. [...]
*'''17'''. [...]
*'''18'''. [...]
etc.

Then we have ''Special Edition #2'' between 21 and 22, ''Special Edition #3'' between 25 and 26 and ''Special Edition #4'' between 28 and 29.

I suppose the logical thing to do would be to call ''Special Edition #1'' volume 17 in the series; the book that is labeled ''Volume 17'' would then become volume 18 according to our numbering scheme, etc. Of course, it would also confuse the heck out of everybody :-(

I guess the question is what is the least painful way to catalog this weirdness that wouldn't break the display logic? Do we (or can we) support "16a" or anything along those lines? [[User:Ahasuerus|Ahasuerus]] 18:55, 18 Dec 2006 (CST)
:Number them both 16; since the display code seems to sort by Series # and then by publication date, that will at least get the specials into the correct spots in your list. I don't think there's an easy solution other than hidden ordinals or some other mechanism for explicitly defining the order. For example, I had suggested earlier sorting unnumbered items by date in the middle of the list, but that may not get the specials into the correct spot. [[User:Marc Kupper|Marc Kupper]] 21:15, 18 Dec 2006 (CST)

::I just tried that and [http://www.isfdb.org/cgi-bin/ea.cgi?Tony_Abbott it seems to work]. I actually tried "16+" as the second one, but it stripped off the "+". [[User:Mike Christie|Mike Christie]] [[User_talk:Mike Christie|(talk)]] 21:21, 18 Dec 2006 (CST)

:::Apparently, it doesn't like decimals either :( Al, do you think we could allow "16.1" or would it be abusable? I would really prefer the relationship to be immediately obvious (in part so that helpful editors wouldn't try to correct it), but I am not sure how to accomplish it. We could have a separate ''Secrets Of Droon Special Edition'' subseries, but that would obscure the link between the four "special edition" books and the main series. [[User:Ahasuerus|Ahasuerus]] 22:21, 18 Dec 2006 (CST)
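
The workaround discussed above (give the special edition the same number as the book it follows and let the date break the tie) can be illustrated with a small sketch. This is not the ISFDB display code; the `Title` type, its field names, and the month values are assumptions made up for the example, and only the relative order of the dates matters.

```python
from typing import NamedTuple


class Title(NamedTuple):
    series_num: int  # integer series number as entered by the editor
    first_pub: str   # first publication date as "YYYY-MM" (placeholder values below)
    name: str


def series_order(titles):
    """Sort by series number first, then by first publication date."""
    return sorted(titles, key=lambda t: (t.series_num, t.first_pub))


# Months are invented placeholders; only their relative order is meaningful.
droon = [
    Title(17, "2003-01", "Dream Thief"),
    Title(16, "2002-03", "The Magic Escapes"),          # Special Edition #1, entered as 16
    Title(16, "2002-02", "The Knights of Silversnow"),
    Title(15, "2002-01", "The Moon Scroll"),
]

ordered = [t.name for t in series_order(droon)]
```

The special lands directly after book 16 only because it was published later than book 16 and shares its number, which is why the approach depends on accurate dates.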
::::A hack that comes to mind is to display the series numbers modulo 1000, with 0 being displayed as blank. You could then number them:
::::* 6 The Sleeping Giant of Goll (2000) with Tim Jessell
::::* 7 Into the Land of the Lost (2000)
::::* 14 Voyage of the Jaffa Wind (2002) with David Merrell and Tim Jessell
::::* 15 The Moon Scroll (2002) with Tim Jessell
::::* 16 The Knights Of Silversnow (2002)
::::* 1000 The Magic Escapes (2002)
::::* 1017 Dream Thief (2003)
::::* 1019 The Coiled Viper (2003)
::::* 1021 Flight of the Genie (2004)
::::* 2000 Wizard or Witch?
::::* 2022 The Isle of Mists
::::* 2023 The Fortress of the Treasure Queen
::::* 2024 The Race To Doobesh (2005)
::::* 2025 The Riddle Of Zorfendorf Castle (2005)
::::* 3000 Voyagers of the Silver Sand
::::* 3026 Moon Dragon
::::* 3027 The Chariot of Queen Zara
::::A low-tech but still somewhat user-friendly way to manage this is to add an edit-series-order page that looks like the following. Up/down would be links pointing at the cgi, which would be passed the series, item, and direction, and would then recalculate the 1000x numbering and repaint. Or you could use radio buttons. [[User:Marc Kupper|Marc Kupper]] 00:06, 19 Dec 2006 (CST)
::::'''Control Series order'''
::::{|
! align="left"|Move!! align="right"|# !! align="left"|Title
|-
|down|| align="right"|6||The Sleeping Giant of Goll (2000) with Tim Jessell
|-
|up/down|| align="right"|7||Into the Land of the Lost (2000)
|-
|up/down|| align="right"|14||Voyage of the Jaffa Wind (2002) with David Merrell and Tim Jessell
|-
|up/down|| align="right"|15||The Moon Scroll (2002) with Tim Jessell
|-
|up/down|| align="right"|16||The Knights Of Silversnow (2002)
|-
|up/down|| align="right"|&nbsp;||The Magic Escapes (2002)
|-
|up/down|| align="right"|17||Dream Thief (2003)
|-
|up/down|| align="right"|19||The Coiled Viper (2003)
|-
|up/down|| align="right"|21||Flight of the Genie (2004)
|-
|up/down|| align="right"|&nbsp;||Wizard or Witch?
|-
|up/down|| align="right"|22||The Isle of Mists
|-
|up/down|| align="right"|23||The Fortress of the Treasure Queen
|-
|up/down|| align="right"|24||The Race To Doobesh (2005)
|-
|up/down|| align="right"|25||The Riddle Of Zorfendorf Castle (2005)
|-
|up/down|| align="right"|&nbsp;||Voyagers of the Silver Sand
|-
|up/down|| align="right"|26||Moon Dragon
|-
|up|| align="right"|27||The Chariot of Queen Zara
|}
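
The modulo-1000 numbering proposed above could be recalculated along these lines. This is a rough sketch, not existing ISFDB code: the function names are hypothetical, and it assumes printed series numbers stay below 1000 and increase within each block between unnumbered specials.

```python
def renumber(labels):
    """Turn an ordered list of printed numbers into stored sort numbers.

    `labels` holds each title's printed series number in display order,
    with None for an unnumbered special. Every special starts a new block
    of 1000, so sorting by the stored number reproduces the list order.
    """
    stored = []
    block = 0
    for label in labels:
        if label is None:
            block += 1000
            stored.append(block)
        else:
            stored.append(block + label)
    return stored


def display_label(stored):
    """Show the stored number modulo 1000, with 0 rendered as blank."""
    label = stored % 1000
    return "" if label == 0 else str(label)


# Printed numbers from part of the Droon list; None marks a Special Edition.
droon_labels = [16, None, 17, 19, 21, None, 22]
droon_stored = renumber(droon_labels)
droon_shown = [display_label(n) for n in droon_stored]
```

An up/down link on the edit page would then only need to move a special within the ordered list and call `renumber` again.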
(unindent) Playing with the series numbers won't help much, as the data types are integers, so the database won't accept strings, or "1A", or "1.5", or anything other than "1". The fundamental problem is that there needs to be an ordinal that describes the order the series items are printed in, and a label that describes the title's series number. At present ordinal=label.

Ordinals are pretty much perfect for determining the printing order, as MySQL can then do the ordering without any postprocessing. An easier change than trying to sort floating point or strings would be to add a label field. From the original example, the ordinals would not be displayed (shown here in parentheses), but would control the ordering. The labels would be displayed:

* (16) '''16'''. ''The Knights of Silversnow''. [description follows]
* (17) '''Special Edition #1'''. ''The Magic Escapes''. This first ever Special Edition picks up right where ''The Knights of Silversnow'' left off and is a Droon adventure like never before, pitting Eric, Julie, Keeah, and Neal against a brand new and particularly mysterious villain. [...]
* (18) '''17'''. [...]
* (19) '''18'''. [...]

This would require some additional tools to perform inserts between two adjacent ordinals. [[User:Alvonruff|Alvonruff]] 06:33, 19 Dec 2006 (CST)
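
To make the ordinal/label split concrete, here is a minimal in-memory sketch of an insert between two adjacent ordinals. It is an illustration under assumptions, not the ISFDB schema: `SeriesEntry`, its fields, and `insert_after` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SeriesEntry:
    ordinal: int  # hidden integer that controls the sort order
    label: str    # series number as printed, shown to the user
    title: str


def insert_after(entries, after_ordinal, label, title):
    """Insert a new entry right after `after_ordinal`.

    Every entry with a larger ordinal is shifted up by one to make room,
    so ordinals stay unique integers and no fractional values are needed.
    """
    for entry in entries:
        if entry.ordinal > after_ordinal:
            entry.ordinal += 1
    entries.append(SeriesEntry(after_ordinal + 1, label, title))
    entries.sort(key=lambda e: e.ordinal)
    return entries


series = [
    SeriesEntry(16, "16", "The Knights of Silversnow"),
    SeriesEntry(17, "17", "Dream Thief"),
]
insert_after(series, 16, "Special Edition #1", "The Magic Escapes")
listing = [(e.ordinal, e.label) for e in series]
```

In a database the shift would presumably be a single update that increments every ordinal greater than the insertion point within the series, followed by the insert of the new row.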
:If we pursue this approach (which seems to reflect reality better than any previously proposed alternative), then we will need to make very clear to the editors why we have two fields for ''Series Number''. We will also need to have a standard for when this mechanism can and can't be used to insert sub-series into the main series. I can see how it could be very tempting to stuff everything into the main series under certain circumstances. [[User:Ahasuerus|Ahasuerus]] 15:53, 21 Dec 2006 (CST)

::I would not show people the second set of ordinals but rather would let them move items in the series up/down. Sub-series are an interesting problem and hmm – it almost seems like we may need hidden anchors in the series list to allow people to position sub-series. This is getting messy – the existing mechanism is defined in the title records and the only title-to-title ordering is the Series # ordinal and title_copyright (first-pub-date). The desire seems to be able to create external lists that have any structure/order and numbering method (decimal, Roman numerals, etc.) plus there is a second desire that a title can be a member of more than one series. With that in mind it seems a case could be made for an entirely external series mechanism that references title records. [[User:Marc Kupper|Marc Kupper]] 01:09, 23 Dec 2006 (CST)

:::Series ordering and series numbering have been a messy area for a long time. For example, there is (a) publication order, (b) internal chronological order and (c) preferred reading order; it can be argued that all three have value to ISFDB users and should be displayed independently if they differ. Also, note Feature request 90001 and Bug 30014. I am sure we will revisit this area post-beta, but for now we probably want to save this discussion on some Help template's Talk page or some such. [[User:Ahasuerus|Ahasuerus]] 11:14, 23 Dec 2006 (CST)

::::I agree on the revisit and have cut/pasted the entire thread to the [[Bibliographic_Rules/Archive/Archive01|Archive]] and also into [[Feature:90001]]. [[User:Marc Kupper|Marc Kupper]] 16:40, 23 Dec 2006 (CST)

Revision as of 18:40, 23 December 2006

Work dates for books first published as serials

For serials that were later published in book form, we give the first book publication as the "work" date, right? E.g.:

The Skylark of Space (1946) with Lee Hawkins Garby

  Magazine/Anthology Appearances:
  The Skylark of Space (Part 1 of 3) (1928) with Lee Hawkins Garby
  The Skylark of Space (Part 2 of 3) (1928) with Lee Hawkins Garby
  The Skylark of Space (Part 3 of 3) (1928) with Lee Hawkins Garby

This looks pretty good, but what happens when we don't have serial information properly linked (or at all), e.g.:

Skylark of Valeron (1949)

? This can be misleading since the original magazine publication was in 1933/1934, but now it looks like the work first appeared in 1949 -- and, as a matter of fact, it has already misled one long term contributor on rasf. Ahasuerus 19:44, 30 Apr 2006 (CDT)

I'm open to whatever reduces confusion User:Alvonruff
After reviewing a few more serial-heavy authors, it looks like there is no hard and fast rule in the current version of ISFDB. In some cases, the first magazine publication is given as the "Work" date, in other cases it's the first book publication. Most encyclopedias try to provide both data elements (see Clute/Nicholls). I suspect that this issue will be discussed some more when we tackle the larger issue of how serials should be displayed within ISFDB. Ahasuerus 14:14, 2 May 2006 (CDT)
I updated the help to say that it should be the book date; the magazine date can be noted in the note field if it differs. It sounds like a serialization enhancement, to connect serials to the novels they serialize, would be a nice-to-have long-term feature. Mike Christie 19:48, 30 Nov 2006 (CST)
Except that this already seems to be in place, as I should have known. Does that work lexically? I see nothing in the data to connect serials to their novels. Mike Christie 06:34, 1 Dec 2006 (CST)
Yes, it's currently done as a lexical match on the Title and Author name. It's not perfect -- variant titles, pseudonyms, etc have been known to confuse it and there are some bugs -- but it's all we have for now. Also, there are many cases when serializations were done after the fact, sometimes decades after the fact. Think of the Verne and Wells serials in Amazing in the late 1920s or the reprint digests of the 1960s/early 1970s. Ahasuerus 13:34, 1 Dec 2006 (CST)
Thanks for the explanation. I'll make some additional notes in the help; I think a variant title in the magazine serialization would be a justification for a title record with no corresponding publication, so I'll mention that too. Mike Christie
Sounds about right, I was thinking about this very issue the other day :) Ahasuerus 15:01, 1 Dec 2006 (CST)
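
The lexical match on Title and Author described above might look roughly like the sketch below. It is only an illustration, not the ISFDB code: the function names, the "(Part N of M)" suffix pattern (modelled on the Skylark listing above), and the case-insensitive comparison are all assumptions.

```python
import re

# Matches a trailing "(Part 1 of 3)" style suffix on a serial title.
SERIAL_SUFFIX = re.compile(r"\s*\(Part \d+ of \d+\)\s*$")


def base_title(serial_title):
    """Strip the part suffix so the serial can be compared to a book title."""
    return SERIAL_SUFFIX.sub("", serial_title)


def match_serials(novels, serials):
    """Map each (title, author) novel to the serial titles that match it.

    A serial matches only when its base title and author are identical to
    the novel's (ignoring case), so variant titles and pseudonyms are missed.
    """
    index = {}
    for title, author in serials:
        key = (base_title(title).lower(), author.lower())
        index.setdefault(key, []).append(title)
    return {(t, a): index.get((t.lower(), a.lower()), []) for t, a in novels}


novels = [
    ("The Skylark of Space", "E. E. Smith"),
    ("A World Called Camelot", "Arthur H. Landis"),
]
serials = [
    ("The Skylark of Space (Part 1 of 3)", "E. E. Smith"),
    ("The Skylark of Space (Part 2 of 3)", "E. E. Smith"),
    ("The Skylark of Space (Part 3 of 3)", "E. E. Smith"),
    ("Let There Be Magick!", "Arthur H. Landis"),  # serialized under a variant title
]
matches = match_serials(novels, serials)
```

The Landis serial stays unlinked because its title differs from the book's, which is the kind of miss a variant title record would be needed to repair.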

Fixups vs. Collections of linked stories vs. Omnibuses

If a novel was first published as a series of stories and they were later combined to make a novel, it's a "fixup". Easy enough. However, if the stories that comprise the final "novel" were themselves longish, does it make the resulting novel a fixup or an omnibus? Or is the determining factor whether the stories were originally published in magazines or as standalone books?

This is not a high priority issue, but there are some complexities around de Camp/Pratt's Incomplete Enchanter stories and such, so I figured I should mention it here before we forget. Also, Jack Williamson's _Seetee Ship_ is listed as a novel and then as a collection ( http://www.isfdb.org/cgi-bin/pl.cgi?STSHP531951), which makes you wonder how many "collections" we have that are really "collections of linked stories" and whether some of them are actually fixups. Ahasuerus 23:07, 30 Apr 2006 (CDT)


This will be a long response that will look off-topic at the beginning. I've been thinking about the verification issue, and verification of bibliographic data falls into two distinct camps: objective and subjective. Objective data is easily found and can be verified without argument. Publication data falls into the objective camp; a book has a verifiable ISBN or it doesn't; it has a verifiable title printed on its cover; it is either unpaginated or has page numbers printed on pages, allowing one to determine page count. Subjective data is more difficult to find, and even when two people agree, a third can argue that it is incorrect (for instance, numerous wars have been fought right here in the ISFDB on the numbering, naming, and ordering of the books by Larry Niven). Often subjective data cannot be found in the text itself (such as series information), or requires some degree of work to calculate (such as whether or not a piece of short fiction is a short story, novelette, or novella).

Oh, absolutely! This is one of the main reasons why librarians traditionally stick to "objective data". MARC-21 has weak-ish support for "subjective" data, but it's mostly about describing physical objects, i.e. books, magazines and other holdings. This makes it very hard to track multiple "editions" of what we would call the same "work" since "editions" are really "subjective" links between different physical objects. For example, if two books are identical except that one of them says "second edition" or "second printing", is it really a different "edition"?
OCLC has similar problems, but they have been working on them -- see their recently added support for links between "editions" and their "fiction" project. Ahasuerus 21:31, 1 May 2006 (CDT)

The classification of a work into novels, collections, anthologies, etc at first seems objective, but there are numerous corner cases which make that classification quickly become subjective. If we start with traditional definitions for novels, collections, anthologies, etc, we start to see odd cases, such as:

  • Fixups. Objectively a collection of short fiction; subjectively a novel.
Genre bibliographers often try to distinguish between "true" fixups, i.e. books that read like a single novel, although perhaps awkwardly so, and "collections of linked stories", which don't. I am not sure we want to add support for this notion, it sounds like more trouble than it's worth. Perhaps we should call the latter collections and add a free text note when applicable? Ahasuerus 21:31, 1 May 2006 (CDT)
  • Collections of novels by different authors. These are popular with certain publishing houses. If the works were shorter, this would be an anthology. If the works are novel length, is this now an omnibus? What about novella length?
Well, the term "omnibus" is typically reserved for reprints of previously published novel length works between the same covers. The reason for it (IIRC) is that cheap-ish omnibus reprints were quite popular back in the 1930s; you can still find Thorne Smith's "Three Deckers" in used bookstores and it's been 75 years. For our purposes, I would argue that the same basic rule applies and a brand new collection of longish works would be an anthology. "Omnibuses" is what SFBC does and Eric Flint compiles at Baen.
Of course, there are always borderline cases. Sometimes volume 3 will only appear in an omnibus reprint with volumes 1 and 2 or some such. But oh well :) Ahasuerus 21:31, 1 May 2006 (CDT)
  • A traditional collection, with a novel thrown in. Collection or omnibus?
Using the "reprint" rule above, I would be inclined to call it a collection, unless it has been published earlier -- some Jack Williamson and Keith Laumer reprints that I have been sifting through come to mind -- in which case, um, an omnibus, I suppose. But a reprint collection with no novels in it is just a collection, so I guess a collection has to have at least 1 reprint novel length work to be an omnibus. Except that some of Williamson's early "novels" were really more like novellas. Sigh, where is aspirin in this place? :( Ahasuerus 21:31, 1 May 2006 (CDT)
  • Peter F. Hamilton - a book is published in one form in the UK. The book is split into two books for publication in the USA. Is the UK book version considered an omnibus now?
As a first edition, I doubt it would count. On the other hand, if we could add support for "1+2" and similar series numbers, then we could call it a series and the UK version would become "1+2" and the US version will be "1" and "2" separately. I am not sure I like it, though, because that would result in a ton of series in Eugene Sue's or even Lafferty's case, which may not be justified. Ahasuerus 21:31, 1 May 2006 (CDT)
  • Boxed set - Single ISBN. Omnibus? Doesn't feel like one.
This one is interesting since it really adds a whole new layer. We now have a physical object that consists of multiple physical objects. Multiple ISBNs too... Ahasuerus 21:31, 1 May 2006 (CDT)
  • What is a work like The Mind's Eye which is a series of essays on artificial intelligence, structured with essays on a specific topic, each followed by a work of short fiction as an example? Anthology? Nonfiction?
Any number of SF textbooks that include short stories to illustrate their point would also fall under this category. Normally, I would be inclined to file them under "fiction" (anthology or collection depending on the number of authors), but what if non-fiction is 50-80% of the text? Ahasuerus 21:31, 1 May 2006 (CDT)

Fixups mess us up because we subjectively know that the works have history as shortfiction, even though all we can objectively determine from the fixup itself is that it is a novel.

We presumably want to record that the fixup was based on stories A, B and C somewhere, but I am not sure where this information is best kept. Let me toy with the Seetee series a little and see what I think. Ahasuerus 21:31, 1 May 2006 (CDT)

Since categorization of this sort is going to be subjective, I propose we write definitions for the existing categories (and invent new ones if necessary), and document exceptions here as we find them. I've started the section below, and thrown in some starter definitions. Alvonruff 12:46, 1 May 2006 (CDT)

I've updated the help to reflect the above rules. Mike Christie 05:28, 1 Dec 2006 (CST)

Bibliographic Category Definitions

  • Anthology - A collection of short fiction, by different authors.
  • Collection - A collection of short fiction, with at least one author in common.
  • Interview - An essay in which one person interviews another, and writes up the result. Ranges from verbatim transcriptions to highly-edited musings by the interviewee.
  • Magazine - A publication with an intended regular publication schedule. Typically does not have an ISBN, and is not usually bound in a manner consistent with books.
    • "Hardcover magazines" (e.g. "The Pulphouse") can be a pain since they look like hardcover books :( Ahasuerus 21:56, 1 May 2006 (CDT)
  • Nonfiction - Any collection of material, either by a single author, or by multiple authors, which would not be considered fiction.
  • Novel - A work of fiction whose length is greater than 40000 words.
    • Keep in mind that these definitions have been known to change and may change again. Ahasuerus 21:56, 1 May 2006 (CDT)
  • Novella - A work whose length is greater than 17500 words and less than 40000 words.
  • Novelette - A work whose length is greater than 7500 words and less than 17500 words.
  • Omnibus - A collection of novels.
    • See above for my take on omnibuses as reprint artifacts :) Ahasuerus 21:56, 1 May 2006 (CDT)
  • Poem - Hmmm....
    • Poems are easy, but if we ever decide to segregate "poem collections" into a separate category, things can get pretty tricky since poems are often published together with short stories. Ahasuerus 21:56, 1 May 2006 (CDT)
  • Review - An essay in which one author provides a critical review of another author's works.
  • Serial - A work of at least two parts, in which a longer work is published serially, typically in a magazine.
    • At least two parts? Don't we currently list some "complete novels" of the 1930s-1950s as serials? By the way, the more I think about it, the more doubts I have about the fact that we display serial information on the Long Works page. Do people really need to know this up front? Especially when the author's bibliography is long and screen real estate is at a premium? Ahasuerus 21:56, 1 May 2006 (CDT)
  • Shortfiction - A work with a length ranging from short story to novella.
  • Short Story - A work whose length is less than 7500 words.
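
As a quick illustration of the length thresholds listed above, a classifier could look like this. It is a sketch only: the function name is hypothetical, and because the definitions leave counts of exactly 7500, 17500 and 40000 unassigned, the code assumes those boundary values fall into the longer category.

```python
def length_category(word_count):
    """Return the length category for a work of fiction by word count.

    Assumption: exact boundary values (7500, 17500, 40000) are placed in
    the longer category, since the definitions do not say where they go.
    """
    if word_count < 0:
        raise ValueError("word count cannot be negative")
    if word_count < 7500:
        return "Short Story"
    if word_count < 17500:
        return "Novelette"
    if word_count < 40000:
        return "Novella"
    return "Novel"
```

For example, `length_category(12000)` returns "Novelette", while a 90000-word work is classed as a "Novel".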

Alternative Approaches to Identifying and Capturing Biblio Data

There are multiple mutually overlapping approaches to capturing biblio data:

  1. Person-centric. Identify a person of interest (author, editor, artist, reviewer, etc.) and find all records that are applicable. Use paper bibliographies (Bleiler, Tuck, Clute, Reginald, etc.) for older editions, Web bibliographies (Locus, Contento, Oz biblios pre-1993 at edeion.net, etc.) for additional info, www.bookfinder.com and www.addall.com (used.addall.com for used books) to find everything that is currently for sale, library search engines (OCLC, Bookwhere, Sigla, etc.) for library holdings, etc. Watch out for bad data on Amazon, misspellings, etc.
  2. Content-centric. Take a series of related books, e.g. Ace Doubles (see WP) or everything by DAW (see Steven H. Silver's Web page) and check multiple overlapping sources. Ditto for shared worlds, single author series, etc.
  3. Source-centric. Take a single source (book, Web page, online bookstore, publishers' online catalogs, etc) and transcribe the data.
  4. Web spiders. Start with a single Web source, e.g. Locus Online, and use the New Arrival/Links sections to identify more sources, and then use these new sources to find more biblio sources, etc.
  5. Anything else?

Different editors will have different priorities, so they will choose different paths. We may want to use this (or similar) categorization scheme to keep track of editors and what they are doing. A matrix, perhaps? Encourage editors to list their interests on their User pages and then build a matrix around it? Ahasuerus 10:04, 1 May 2006 (CDT)

  1. Date-centric. Locate books published within a specific time period (includes forthcoming books).
  2. Series-centric. Some publication series are set within large corporate-owned universes (for instance Star Wars) that cut across multiple authors.
  3. Publisher-centric (subset of Content-centric above). Since ISBN prefixes are handed out to specific publishers, and some publishers specialize in genre fiction, it is possible to data mine the ISBN address space for missing books. (I've done this..) Alvonruff 12:57, 1 May 2006 (CDT)
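
Mining a publisher's ISBN range, as mentioned in the last item, could be sketched as follows. This is an illustration under assumptions, not the tool Alvonruff describes: the function names are hypothetical, the prefix is passed as digits without hyphens, and the example uses 10-digit ISBNs with the standard modulo-11 check digit.

```python
def isbn10_check_digit(first9):
    """Compute the ISBN-10 check character for the first nine digits."""
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbns_for_prefix(prefix):
    """Yield every valid ISBN-10 that starts with the given prefix digits."""
    if not prefix.isdigit() or not 1 <= len(prefix) <= 8:
        raise ValueError("prefix must be 1 to 8 digits")
    width = 9 - len(prefix)
    for n in range(10 ** width):
        first9 = prefix + str(n).zfill(width)
        yield first9 + isbn10_check_digit(first9)


def missing_isbns(prefix, known):
    """List the ISBNs in a prefix's range that are not in `known`."""
    return [isbn for isbn in isbns_for_prefix(prefix) if isbn not in known]


# An 8-digit prefix leaves one free digit, so the range holds ten ISBNs.
candidates = missing_isbns("03064061", {"0306406152"})
```

Each candidate would still have to be looked up in an external source, since many numbers in a publisher's range are never assigned to a book.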
On a related note, where do we keep information about our progress in the case of each author/source/series/etc? I suggest we create a Wiki page for each one -- unless (potentially) 50,000+ pages will break something in the Wiki software -- and also have a rough template with checkboxes or free text fields for all the regular suspects (Tuck, Clute, etc). That way we can see what has and hasn't been done for each author/source/series/etc. Spam might be a problem, though; we will need some way of protecting it without making it hard for legitimate editors to record what they have done. How does Wikipedia handle all that porn/gambling/drugs spam anyway? Oh yes, and we may need more than one template -- no point in checking Tuck and Reginald if the author wasn't even born in 1974 :)
  • I'm not so worried about breakage as slowly moving Wikipedia articles over to the ISFDB. It would be easy enough to automatically add a link to the author bibliography that would point to a unique page in the Wiki (even though it may not yet exist).
Well, WP articles cover all kinds of non-biblio data that we wouldn't want, e.g. detailed bio data, "Criticism and interpretations", etc. Right now I am trying to figure out what the arguments are for and against making this verification information available as part of this Wiki vs. part of the ISFDB itself. The more automation we propose to put in, the more it looks like it would be better suited for the ISFDB proper. Ahasuerus 15:27, 3 May 2006 (CDT)
  • For publications (objective data), the goal is to add a verification checkbox, which means that someone has verified the data with the primary source. We'll know who verified it. We can use some number of checks as a threshold to lock most of the fields in the record (stuff like notes or cover artist might be left unlocked).
Since publications, works and authors do not map onto each other neatly (technically speaking, it's a many-to-many-to-heck-who-knows relationship), I can't think of a way to consolidate these checkboxes in a Wiki matrix that wouldn't break something. Let me create a sample Wiki page or three and see where it takes us. Ahasuerus 15:27, 3 May 2006 (CDT)
  • For titles (subjective data), we add the matrix of secondary sources as you've described. We can score each author using some as-yet-to-be-discovered heuristic that relies on the number of boxes checked, and generate some percent confidence/verified in the bibliography that is displayed on the author's page. Another app can show authors with low confidence scores. Also gives readers a feel for the veracity of a given bibliography, which is needed when Dissembler builds accidental bibliographies over time that have never been touched by human hands.
Sounds like a very useful suite of tools! Ahasuerus 15:27, 3 May 2006 (CDT)
  • Authors should have modification timestamps. Whenever a database change occurs that affects their summary bibliography, the timestamp is modified. That will allow patrols of the database. Alvonruff 06:05, 2 May 2006 (CDT)
As I said above, the more I think about it, the more it looks like these checkboxes and matrixes would be happier in the ISFDB. But I will still create a few pages in the Wiki first. If nothing else, they will serve as a sandbox where we can do proof of concept stuff. Ahasuerus 15:27, 3 May 2006 (CDT)
Also, some sources can be data mined once (checklists), others twice or even three times (Tuck: 1. extract work data; 2. extract English language publication data; 3. extract foreign language publication data), and then we have dynamic sources like Web sites and online bibliographies, etc that need to be checked periodically, perhaps in an automated fashion.
And speaking of automation, do we have a custom Z39.50 client/spider/aggregator/what have you? There are multiple commercial ones like Bookwhere that we could buy a license for (need to check their prices), then there is a free one at Sigla (have to click on the tiny Union Jack to change the language to English), but it's a little awkward. Anything else? Ahasuerus 21:42, 1 May 2006 (CDT)
We'll have to take care, as harvesting data from for-pay resources is probably a violation of Terms and Conditions. Once the ISFDB is running smoothly, I'll be spending most of my time on this particular topic. Alvonruff 06:05, 2 May 2006 (CDT)
Well, Sigla is a free service and the data that you access is just public library (all 1,600 of them) holdings, so I don't think that would be an issue. Bookwhere and other commercial software packages might be a problem, but we would need to look it up. Ahasuerus 15:17, 3 May 2006 (CDT)

Proposed Scope of the Project

Note: This is very much a first draft subject to merciless discussion and change.

Let's see if we can describe what we are cataloging and what we are not cataloging.

Definitions

  1. Speculative fiction is defined to include:
    • science fiction, including works:
      • set in a future that is now in the past
      • that deal with technological advances that were futuristic at the time they were published
    • fantasy fiction
    • alternative history
    • utopian fiction as long as it is recognizably fiction and not a treatise
    • non-genre speculative fiction
    • fabulations
    • magic realism
    • slipstream
    • proto-science fiction, including but not limited to:
      • lost world tales
      • fantastic voyages
      • scientific romances
      • pre-historic romances
      • future war stories
      • the older the book, the more likely we are to include it even if it is borderline eligible. This is caused by the fact that there were relatively few Works published prior to 1800 and by the difficulties with distinguishing between speculative and non-speculative fiction (or even fiction and non-fiction) when you are dealing with pre-1800 Works
    • the supernatural (with an inclusionist bias), including but not limited to:
      • supernatural horror
      • ghost stories
      • gothic novels with supernatural elements
      • occult fiction
  2. Speculative fiction is defined to exclude:
    • techno-thrillers, political thrillers and satires set in a future indistinguishable from the present (?)
    • fairy tales with no known author (?)
    • animal books for very young children (?)
    • comic books, manga, and graphic novels
    • games, game guides and game paraphernalia -- but works of fiction based on games are included
    • philosophical works of speculative nature unless written as a work of fiction (with an inclusionist bias)

Rules of Acquisition

  1. In - Works of speculative fiction originally published in English, including works published within and outside the genre. "Published" is defined as published by:
    • professional publishers
    • small presses
    • prozines
    • semi-prozines
    • unpublished works by established authors, e.g. John Taine's manuscripts? Or do we just mention them in their respective Wikipedia articles? On their ISFDB Wiki page?
    • print on demand??
    • vanity publishers??
    • fanzines??
    • newspaper publications??
    • ?
  2. In - Foreign language translations of speculative fiction works originally published (or written but not published - Bulmer, Dibell, etc) in English. Support for derivative works (sequels-by-other hands, collections and omnibuses that have no direct analogs in English, etc) may need to be enhanced.
  3. In - English language translations of works of speculative fiction originally published in foreign languages. In these cases, we will also provide information about the original foreign language work.
  4. In - Works of speculative fiction published in a foreign language that haven't been translated into English, but whose author's other works have been translated into English. This is done to make it easier for people who are interested in, e.g., Lem or Barbet to see as full a picture of the author's work as possible.
  5. Debatable - Works of speculative fiction published in a foreign language that haven't been translated into English and whose author's other works have not been translated into English. Arguments for exclusion: avoid duplicating the efforts of foreign language bibliographers in a field where we can't realistically compete with them. (True? False? Revisit if/when we have foreign language editors with extensive expertise in the field who would be willing to merge their biblios into the ISFDB?)
    • Debatable - Works by otherwise ineligible foreign language authors that were only published in a foreign language but that are part of an otherwise English language series. For example, there are numerous Russian language sequels to Conan. Also, foreign language sequels-by-other-hands to prominent works of SF that are otherwise ineligible. I am thinking of things like a few German and Czech language sequels to Jules Verne's works here.
  6. In - Works about speculative fiction published in the English language and their foreign language translations.
  7. In - Works (both fiction and non-fiction) that are not related to speculative fiction, but were produced by authors who have otherwise published works either of or about speculative fiction over a certain threshold (see below). This will include short fiction, but exclude non-fiction that was not published as a standalone book. Thus, Poul Anderson's book about thermonuclear weapons will be included, but Benford's and Forward's professionally published scientific articles will be excluded.
  8. Out - Works that are not related to speculative fiction by authors who have not published works either of or about speculative fiction over a certain threshold. This "certain threshold" is hard to define, but we need to draw the line in a way that would exclude Winston Churchill, who published at least one work of borderline speculative fiction. The goal here is to avoid cataloging everything ever published by James Fenimore Cooper, Robert Louis Stevenson, Honore de Balzac and other popular authors. Instead, we would want to catalog their speculative fiction works only.
  9. In - Otherwise unrelated works found in any publication that will be cataloged based on other criteria. This is done to avoid creating incomplete biblio records for magazines, anthologies, etc.
  10. Debatable - Individual letters to the editor published in magazines. Arguments for inclusion: some of the better and more useful print biblios include them; some of the letters were intrinsically interesting, e.g. there was a letter exchange between Philip Jose Farmer and Marion Zimmer Bradley in a mid-1950s pulp magazine that provided a significant amount of background information.
  11. Debatable - Convention programs, guides, etc. We definitely want any convention-published "real books", but probably not the ephemera. What about the book length stuff that cons put out that doesn't have any fiction, but has a lot of related information?
  12. Debatable - Academia-produced magazines. Can we realistically compete with, say, the SFRD?

Ahasuerus 14:42, 4 May 2006 (CDT)

Pseudonym Support

I don't know what the latest redesign/beefing up of pseudonym support involves, but it occurs to me that pseudonyms are a perfect example of "subjective vs. objective" data. The objective component is what's on the book's cover and is usually pretty unambiguous except in cases where the cover says something like "Book 2 in the new Lord Tedric series! By Gordon Eklund based on E.E. 'Doc' Smith's work!". The name may be the real author's legal name or a slightly modified form of his legal name (dropping the middle name, "Bill" instead of "William", etc) or a recognizable form of the legal name ("Lawrence Watt-Evans" for "Lawrence Watt Evans") or something else entirely. Sometimes it may be a joke, e.g. the Farmer/Vonnegut episode. Sometimes it is not listed at all. Sometimes it can be ludicrous -- as when the cover lists a long dead person as the author of a new book, e.g. Alex Raymond is credited as the sole author on the covers of some 1970s/1980s Flash Gordon books. But ludicrous or not, the name on the cover is something that we can all agree on and enter into the database.

On the other hand, Title Author data is subjective. Sometimes it's easy to tell that "Robert Heinlein" and "Robert A. Heinlein" are the same person. Sometimes it's not easy at all, like in the case of the two Dominic Greens. Sometimes the pseudonym may not be disclosed for years like in that Mel Odom episode. Sometimes it may never be disclosed since the records of who wrote what as "Victor Appleton" or as "Roy Rockwood" may never be found. Sometimes "Lewis Padgett" meant "Kuttner and Moore" and sometimes it meant "Kuttner". And so on and so forth.

So, with this in mind, do we want to have two completely separate data elements for the two types of Author data, i.e. Title Author and Publication Author? If so, we may want to keep in mind that ISFDB users do not think in terms of objective vs. subjective data. The questions that they want answered fall on both sides of the fence. Is this book that I have in my hands a first edition? When was the book first published? Is the name on the cover a pseudonym and, if so, who is the real author? What other books/stories has that person written? We may have to be careful to support these types of "mixed type" queries in the search logic.

It looks like the current version of the search engine does a good job of it, but sometimes I am not sure I understand why it does what it does. For example, a search on "Barnett" retrieves both "Lisa A. Barnett" and "Lisa Barnett", but the latter one has no biblio data associated with her. Ahasuerus 15:46, 4 May 2006 (CDT)

Data Acquisition and Verification -- Proof of Concept

Richard Cowper has been created. I will abuse it in various ways to get a feel for what we need. Ahasuerus 15:11, 5 May 2006 (CDT)

I may rename this page so that we don't have potential collisions in the Wiki article namespace (to something like Author:Richard Cowper or Bibliographic Notes:Richard Cowper). If we name them strictly after the author we might have problems later (we don't have one yet, but would if someone used the pseudonym Main Page). I'd like to automagically put a link on each author bibliography that would point to such a bibliographic discussion area (they wouldn't be created automatically though). Alvonruff 15:24, 5 May 2006 (CDT)
Sounds good to me! BTW, Richard Cowper should be a good guinea pig. He was not a very prolific author, therefore he is more manageable than a Silverberg or an Asimov. On the other hand, he wrote both under his legal name and as "Richard Cowper", had a decent cross-section of genre and non-genre novels, series, stories, non-fiction, translations, chapbooks (including one that didn't have an ISBN since it was included in a limited edition as an add-on, see Locus), essays, reviews, interviews, etc. Should be an interesting exercise. Ahasuerus 16:09, 5 May 2006 (CDT)
You should now be able to create new Bibliographic Notes pages in the Wiki by clicking on the Bibliographic Notes link in the author's bibliography. The wiki links are of the form Author:author name. I chose this so that later we could also implement Title:title name if we so desire. Moved Cowper accordingly. Alvonruff 22:20, 5 May 2006 (CDT)
Great! :-) BTW, I am going to remove the Guide to Supernatural Fiction from the checklist. 360ish early authors of supernatural fiction are probably not worth the real estate and navigation issues. We should be able to check their lists manually. Ahasuerus 20:49, 7 May 2006 (CDT)

Data Verification Matrix - Layout

I know you're still experimenting - vertical seems more readable than horizontal. User:Alvonruff

That's what I am thinking as well, but there is no space in the current incarnation of "Vertical" for Publication data :-( I am also wondering if less computer-savvy folks will be able to maintain tables without making a huge mess every other page. Ahasuerus 20:40, 7 May 2006 (CDT)
At worst we'll use these as matrix designs and fold them into the database proper, so that editors won't have to bother with table layouts. Alvonruff 05:38, 8 May 2006 (CDT)
Having played with various ideas overnight, I suspect that we will have to fold "Verification data" into the core ISFDB database for a number of reasons:
  1. There is no reliable way of extracting "confidence level" data out of what is essentially a free text table.
  2. Wiki tables may be a little easier for non-computer savvy people to handle than HTML tables, but they are still significantly harder than checkboxes.
  3. We can capture username/date/time information much more reliably in the database.
  4. The current (Wiki-centric) design makes it hard to handle Works vs. Publications. On the other hand, we could associate verification information directly with Works and Publications in the database -- just add a "Verification information" link to each Work's and Publication's page.
  5. The Wiki-centric design would either necessitate adding all Verification-related Wiki pages to the distribution copy of the ISFDB backup or, barring that, it would result in the distribution version lacking some ISFDB functionality, i.e. "confidence levels".
Here is my Proposed layout for the Data Verification Form:
  1. The screen will be split into two halves.
  2. The top half will contain all previously entered verification information, e.g. "Tuck -- verified by User:XYZ on May 01, 2006 with comment: N/A, all work post-1968" or "Reginald 1700-1974 -- verified by User:ABC on June 04, 2006, User:GHF on July 30, 2006 with comment: Series data also checked".
    1. Multiple rows of verification data for the same source will be allowed.
    2. Previously entered Verification information will not be editable by users, i.e. it will be "Write once".
    3. Abbreviated source names like "Tuck" will be hyperlinked to their associated ISFDB Work entries.
  3. The bottom half will be for entering new Verification data:
    1. There will be one dropdown box on the left hand side for the Bibliographic Source, a checkbox in the center and an optional free text field on the right hand side for comments.
    2. There will also be a button called "New Bibliographic Source" displayed under the last row of fields, just like there is a "New Author" button for Works. Clicking on it will result in a new row being added.
    3. The dropdown box will contain a list of pre-approved Bibliographic Sources, "Visual Inspection" and "Additional Source" (wording?). The last one will be used for sources that are not pre-listed in the dropdown box.
Does this approach sound more reasonable than the current Wiki layout? Ahasuerus 12:37, 8 May 2006 (CDT)
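
For discussion purposes, here is a minimal Python sketch of how the proposed form's data might be modelled once it lives in the database rather than in Wiki tables. Everything in it is an assumption made for illustration: the names VerificationRecord, VerificationLog and APPROVED_SOURCES, and the field names, are hypothetical and not part of the ISFDB schema. It only tries to show the "write once" rows, multiple rows per source, and the pre-approved source list.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical contents of the Bibliographic Source dropdown box.
APPROVED_SOURCES = (
    "Tuck",
    "Reginald 1700-1974",
    "Visual Inspection",
    "Additional Source",
)


@dataclass(frozen=True)  # frozen: a saved row cannot be edited ("write once")
class VerificationRecord:
    source: str
    verifier: str
    verified_on: date
    comment: str = ""


@dataclass
class VerificationLog:
    """All verification rows attached to one Work or Publication."""
    rows: list = field(default_factory=list)

    def add(self, source, verifier, verified_on, comment=""):
        if source not in APPROVED_SOURCES:
            raise ValueError(f"unknown bibliographic source: {source}")
        record = VerificationRecord(source, verifier, verified_on, comment)
        self.rows.append(record)  # append only; existing rows are never changed
        return record

    def for_source(self, source):
        """Return every row entered for one source (several are allowed)."""
        return [r for r in self.rows if r.source == source]


log = VerificationLog()
log.add("Tuck", "XYZ", date(2006, 5, 1), "N/A, all work post-1968")
log.add("Reginald 1700-1974", "ABC", date(2006, 6, 4))
log.add("Reginald 1700-1974", "GHF", date(2006, 7, 30), "Series data also checked")
```

Because each row carries the username and date as structured fields, a "confidence level" could later be derived by counting rows per source, which is the part that is hard to do reliably with free-text Wiki tables.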

Where will we put errors found in the references? Like: "Contento anth/coll checked, except it had a bad ISBN (checksum failed)." or "page count of [1] doesn't match that of [5]". User:Alvonruff

That's a good point. Probably in the same checkbox to make them easier to find. BTW, I am not sure I like the four tildes in every box, they gobble up a lot of real estate, but I guess we need a date/time stamp of some kind.
As an aside, even physical possession of a copy doesn't guarantee that you will transcribe the data correctly. There is one notorious issue of Unknown with a Frank Belknap Long story missing in the table of contents. Ahasuerus 20:40, 7 May 2006 (CDT)

Data verification -- Series data

Series data, series order in particular, is probably the most subjective and the most contentious area. The underlying problem appears to be that "Publication order" != "author's intended order" != "internal chronological order" != "reader's preferred order". For example, in the case of Doc Smith's Lensman saga:

  • publication order -- 1,3,4,5,6,2
  • author's intended order -- at first 3,4,5,6, eventually 1,2,3,4,5,6
  • internal chronological order -- 1,2,3,4,5,6
  • reader's preferred order (majority opinion) - 3,4,5,6,1,2

Nested series and series entries published differently by different publishers, e.g. Conan, are a whole different headache.

At the very least I think we may want to add a "Series Notes" field to capture these issues. Eventually, we may want to have support for "Series verification". Ahasuerus 11:08, 9 May 2006 (CDT)
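
To make the "several competing orders" problem concrete, here is a small Python sketch that stores more than one ordering for the same series instead of a single series number per book. The structure and the names LENSMAN_ORDERS and position are hypothetical, chosen only for illustration; the orderings themselves are the ones listed above, with the author's intended order left out because it changed over time.

```python
# Hypothetical structure: one series, several named orderings of the same books.
# Each list gives the books (labelled 1-6 as in the list above) in reading order.
LENSMAN_ORDERS = {
    "publication": [1, 3, 4, 5, 6, 2],
    "internal chronology": [1, 2, 3, 4, 5, 6],
    "reader's preferred": [3, 4, 5, 6, 1, 2],
}


def position(order_name, book):
    """Return the 1-based position of a book within the named ordering."""
    return LENSMAN_ORDERS[order_name].index(book) + 1


# Book 2 is last in publication order but second in internal chronology.
print(position("publication", 2), position("internal chronology", 2))
```

A free-text "Series Notes" field would capture the same information informally; a structure like this would only be worth the trouble if the display or search logic needs to sort by a chosen order.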

Data Verification -- Available Sources

As I was watching (admiringly) Al enter C. S. Lewis' and Arthur Conan Doyle's biblio data from Reginald 1 and Tuck, I began to wonder how many sources different editors/moderators may have ready access to. Is it a safe assumption that everybody has at least Tuck (1-3), Reginald (1-3), and both Clutes? Probably not since I remember Mike mentioning that he didn't have the Reginalds and access to sources will likely get more diverse and unpredictable as we add more editors to the roster. Should we have a little matrix or a checklist of "Who Has What" so that we could ask fellow editors to help with lacunae? Ahasuerus 21:00, 22 May 2006 (CDT)

I was just thinking this morning that a good "interview question" for potential editors would be "how many of the references listed in the Print Bibliographies section of Sources of Bibliographic Information do you own?" (I'm still missing Reginald 3 myself). Otherwise I have everything there, plus Bleiler's original Checklist of Fantastic Literature, as well as Bleiler's Science Fiction: The Gernsback Years for magazine work (totally awesome, with extremely detailed synopsis, as well as a theme/motif index). A matrix would be nice, as it would tell at a glance what the group reference library looks like. Alvonruff 21:19, 22 May 2006 (CDT)
Ah, I see, that explains the lack of Reginald-3 references :) Sure, I can help with that, I have all 3 Reginalds, Tuck, Day, the Clutes and other goodies handy. Plus a few, well, OK, many thousands of SF books/magazines available for "physical" verification once we have the projected Verification tools in place. BTW, I have been reading up on Python in between other projects. "Significant white space" certainly brought back memories :)
Hm, I suppose I should go through the bibliographies that I have here and add the more useful ones to the list. Ah, here is one! Ahasuerus 22:13, 22 May 2006 (CDT)

Lessons Learned

Here are a few things that I think I learned while cleaning up Richard Cowper and Kris Neville (quite thoroughly in the former case and somewhat thoroughly in the latter case) and then doing a first pass ("internal consistency check") on Zelazny, Bear and Benford:

  • A comprehensive review of all biblio data for a Work is even more time consuming than I suspected.
  • Tuck is pretty good pre-1968, as we know, but not perfect.
  • The Reginalds only cover first editions, don't list ISBNs and stop in 1991.
  • There are some errors in the Locus database - not many, but they need double-checking as well.
  • Contento updated his lists in 2005 and may do so again, so it's a moving target; perhaps we could capture it offline (HTTrack?) and do some type of comparison once a year to make sure nothing has changed?
  • Sigla is buggy -- and inevitably slow -- but quite useful. The ability to view a dozen copies of the same Publication record is very nice. It also captures a lot of Danish, Swedish, German, Russian, etc translations, which may help down the road once we beef up translation support. It's difficult to use it as a single entity for Data Verification purposes since the subset of library catalogs that respond in any given case may be subtly different. A proprietary Z39.50 search engine would probably be even better. Need to think about this some more.
  • The fiction-oriented version of OCLC is very nice -- similar to Sigla, but faster and better organized since they have local copies of the records that they search/display -- but can be misleading when they try to derive subjective data from objective data and mess up.

The bottom line is that Publication level data entry will be VERY time consuming and, what's worse, require domain knowledge. We may need an extra level of verification hierarchy, although I am not sure how best to implement it. We will also need to have Data Verification tools before we undertake it on a grand scale or else we will have the same kind of unverifiable mess that WP often has.

It might be more efficient to enter as many Works as we realistically can while improving pseudonym, collection/anthology and translation support. After all, we had less than half of Cowper's and Neville's Works when I started a few days ago and they are not particularly obscure. As a side benefit, we may be able to attract more knowledgeable editors once the data is clean enough to encourage (rather than discourage) a casual user/visitor.

Therefore I propose that we first concentrate our data entry efforts on:

  • Internal consistency cleanup
  • Adding missing Works
  • Improving software support for content editing/pseudonyms/translators as well as single author collection editors
  • Adding missing series information, currently a major weakness

Of course, if somebody is just dying to enter every edition of Dune into the database, more power to him :) Ahasuerus 19:55, 15 May 2006 (CDT)

I found xISBN to be a really valuable tool for finding alternate editions. It frequently picks up translations. (And speaking of Dune, here's an xISBN list for one version.) grendel|khan 02:52, 26 May 2006 (CDT)
This is awesome. I just added xISBN support to Dissembler. And its first test case... Dune. Alvonruff 20:57, 26 May 2006 (CDT)

Translators (moved from Ahasuerus' Talk page)

There's a method to deal with translators on a per-title basis, but usually translated editions aren't added as new titles. See the other publications for Starman Jones, which are sometimes translations. grendel|khan 17:48, 16 May 2006 (CDT)

You could argue that translations are really derived works and constitute a separate layer between titles (Works) and Publications. However, in most cases, as you note above, they are currently handled as just another edition of the Work in question. Unfortunately, Publications don't have built-in support for translators' names, so you have to do it via Notes, at least for now. Ahasuerus 18:48, 16 May 2006 (CDT)

The Czech edition I just found (which apparently doesn't have a translated title, according to WorldCat)

FYI, OCLC links are session-specific, so they don't mean anything outside of your current session :) But yes, I found the title and then looked it up in the National Library of the Czech Republic. Here is the underlying MARC-21 code:

   001 cpk19980296848
   005 19971208000000.0
   010 ## $a 80-85782-63-4 $b brož.
   100 ## $a 19980525d1996^^^^m^^^0czey0103^^^^||
   101 0# $a cze
   102 ## $a CZ
   105 ## $a yyyy^^^^000ay
   106 ## $a z
   200 1# $a Starman Jones $f Robert A. Heinlein $g [z angličtiny přeložil Tomáš Kokoška]
   210 ## $a Praha $c Classic And $d 1996
   215 ## $a 233 s. $d 19 cm
   225 2# $a Science fiction
   410 #0 $1 2001 $a Science fiction
   454 #0 $1 2001 $a Starman Jones
   608 ## $a vědecko-fantastické romány
   675 ## $a 820(73)-31 $9 undef
   700 #1 $a Heinlein $b Robert A. $g Robert Anson $f 1907-1988 $3 jn19990003337 $4 070
   702 #1 $a Kokoška $b Tomáš $4 730
   801 #0 $b OLA001
   801 #2 $b ABA001
   801 #2 $b OLA001
   901 ## $o 19980609 $a 80-85782-63-4 $f [1. vyd.] $g Na rubu titulního listu uvedeno chybně: ISBN: 80-85782-63-4
   909 ## $a 000296848

Interestingly enough, the publisher is listed as "Classic And". Also, you entered the city as "Praha", which is the Czech name for Prague. That's what I have been doing with German language translations as well, but I suppose we will need a standard at some point. By the way, "brož." is short for "brožovaný", which means "paperback" -- see http://www.slovnik.cz/
D'oh! I thought that was a publisher name. That'll teach me to just copy in the first word that looks like a proper noun. grendel|khan 19:07, 16 May 2006 (CDT)
Yup, foreign languages can be treacherous. I can (sort of) help with Romance and Slavic languages, but my German is extremely rusty and forget about the Finno-Ugric subfamily. Ahasuerus 19:27, 16 May 2006 (CDT)

(see publication record STRMNJNSBV1996) has a translator noted in the notes for now; let me know if that sort of thing should be split off into separate title records--that doesn't seem like the right thing to do. grendel|khan 17:48, 16 May 2006 (CDT)

It's something to discuss with Al when he has a little bit of free time :) Ahasuerus 18:48, 16 May 2006 (CDT)
Ah, "support for translator annotations" is scheduled for June 4 or so. Spiffy. I really should check before leaving these notes... grendel|khan 19:19, 16 May 2006 (CDT)

Cyrillic support

To quote User:Grendelkhan's last addition to the bugs page:

I'm having some trouble with a Russian-language version of Starman Jones, publication HHHPWLVGLD2002, "Астронавт Джонс". (I think it transliterates to "Astronavt Dzhons"),

FYI, there are automatic Cyrillic converters on the net. Try it on "Астронавт Джонс" and you will see that it does indeed transliterate as "Astronavt Dzhons".

I entered the publisher as "Центрполиграф", and it appears that was in the submission confirmation screen, but it shows up as "Центрполи&". See the submission confirmation page for Publication Update #38462.

I don't think there is anything terribly special about Cyrillic vis-à-vis other non-English characters aside from a somewhat bewildering variety of encodings (over two dozen, last I checked) that have been used over the last twentysomething years. I would guess that Al is using Unicode while the record that you were trying to enter was using an older encoding and some encoding transformation went awry. Al has indicated that there are still some problems with Unicode support in the code, e.g. there are problems with distinguishing between different types of apostrophes. Ahasuerus 18:57, 16 May 2006 (CDT)
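
As an illustration of what such a converter does, here is a minimal Python sketch of letter-by-letter transliteration. The table is one common informal romanization of the Russian alphabet; other schemes differ in details (for example in how they render "й", "х" or the soft sign), so the mapping and the function name transliterate should be read as assumptions for this example, not as a standard the ISFDB has adopted.

```python
# One common informal romanization of Russian; other schemes differ in details.
TABLE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text):
    """Replace each Russian letter with its Latin equivalent, keeping case."""
    out = []
    for ch in text:
        latin = TABLE.get(ch.lower())
        if latin is None:
            out.append(ch)  # spaces, digits and non-Cyrillic text pass through
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("Астронавт Джонс"))  # Astronavt Dzhons
print(transliterate("Центрполиграф"))    # Tsentrpoligraf
```

This only covers transliteration for display or search; it says nothing about the encoding problem behind the truncated publisher name, which would need to be traced in the submission code itself.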

Ace paperbacks (copied from Grendelkhan's Talk page)

Post-1968 Ace paperbacks are tricky since, as Mike Christie wrote in the relevant Wikipedia article just a couple of weeks ago:

In January 1969, Ace switched to a numeric coding system. The code depended on the title of the book; or specifically on the first significant word in the title. For example, Tom Purdom's The Barons of Behavior was published by Ace in about 1972 as serial number 04760. The first letter of "Barons" is "B", so the code assigned is fairly early in the numeric range 00000 to 99999. This procedure for assigning numeric codes was in use at Ace at least into the early 1990's, and may still be in use today. For Ace doubles, one of the titles was selected and used to determine what serial number should be used. For example, 11560 is the Ace double The Communipaths by Suzette Haden Elgin, backed with Louis Trimble's The Noblest Experiment in the Galaxy. The serial number here is derived from The Communipaths; a serial number derived from the Trimble would have been about 58000.
For the later numeric series titles, the number is also part of the ISBN. To form the ISBN (if it exists) for one of these books one prefixes "0" for English language/US, and "441" (Ace's publisher number), to the serial number. The last digit can then be calculated with an ISBN check digit calculator. For example, Christopher Stasheff's Escape Velocity has serial number 21599; the ISBN is 0-441-21599-8.

Having said that, Pandora's Books, Ltd., a well established genre bookstore, claims that this edition was published in 1970. I suggest that regardless of whether we put "1970" in the Year field or the Note field, we add a comment to the effect that the exact date is not known at this time. Ahasuerus 08:20, 17 May 2006 (CDT)
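
The ISBN construction described in the quoted passage can be sketched in a few lines of Python. The function names isbn10_check_digit and ace_isbn are made up for this example; the check digit rule is the standard ISBN-10 one (weights 10 down to 2, modulo 11), and the sketch assumes a five-digit Ace serial number that really does have a corresponding ISBN, which the quote notes is not always the case.

```python
def isbn10_check_digit(first_nine):
    """Check digit for the first nine digits of an ISBN-10 ('X' stands for 10)."""
    if len(first_nine) != 9 or not first_nine.isdigit():
        raise ValueError("expected exactly nine digits")
    # Weights run from 10 (first digit) down to 2 (ninth digit).
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def ace_isbn(serial):
    """Build the ISBN for a five-digit Ace serial number, per the quoted rule."""
    body = "0" + "441" + serial  # group "0" + Ace publisher number "441" + serial
    return f"0-441-{serial}-{isbn10_check_digit(body)}"


print(ace_isbn("21599"))  # 0-441-21599-8
```

For the Stasheff example the weighted sum is 157, which leaves a remainder of 3 modulo 11, so the check digit is 8, matching the ISBN given in the quote.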

Web publications

Gutenberg (and similar) scans of previously published Works can be easily handled with a URL or two, as Al indicates elsewhere. However, what are we going to do about previously published Works that are available for download for a fee, especially if there are multiple competing vendors out there? On the one hand, you can get increasingly obscure SF online if you are willing to pay $3-10, and the information may be valuable to the users ("I finally get to re-read that "Astonishing Stories" short-short that I have been looking for since 1951!"). And it's not like we don't have links to B&N, Amazon, etc already. On the other hand, links to commercial resources can present additional challenges.

And while we are on the subject, what are we going to do about Authors the bulk of whose work has been published online, e.g. Paul Marlowe, but not on paper? Marlowe's site is linked to by a number of (traditionally) published Authors and he seems to be reasonably well respected, but are we really equipped to support massive linkage to Web sites, free and otherwise? Ahasuerus 19:16, 17 May 2006 (CDT)

Would it be a terrible perversion of the database design to enter Project Gutenberg etexts as just more publications? So they'd be dated the date of the PG release, their ID would be the etext number, and their publisher would be "Project Gutenberg"? They sort of are a publishing outfit. Could that be integrated with some little sprinkle of magic that would make "Project Gutenberg"-published titles clickable? grendel|khan 22:15, 17 May 2006 (CDT)
It's the easiest from a tools point of view - we'd just be adding 1 column to the pubs table, and adding support for an elink to the etext location in the editing tools. We would need to lock down how we want to treat chapter books, since some popular etexts have been short fiction. And there is the issue of what we do with entries that used to point to something, but have since met their demise. Alvonruff 05:04, 18 May 2006 (CDT)
Yes, that could very easily get out of hand. Unlike paper editions, e-texts are notorious URL-hoppers and do we really want to keep a dozen dead links per title? OTOH, if we don't, then will we end up with a Dead Link Spider constantly trawling for broken links and maintaining a list of "last time accessed" hits etc? There may be some freeware spiders that could be impressed into service if we decide to go down that path. Ahasuerus 18:02, 19 May 2006 (CDT)
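
If we did go down the spider path, the core of it is small. The following Python sketch uses only the standard library; the function names http_status and check_links, and the idea of keeping a "last time accessed" dictionary keyed by URL, are assumptions for illustration rather than a description of any existing ISFDB tool. Note that some servers do not answer HEAD requests properly, so a real spider would probably need a GET fallback.

```python
from datetime import datetime, timezone
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status for url, or None if the request fails outright."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, timeout, malformed URL, etc.


def check_links(urls, last_ok, fetch=http_status, now=None):
    """Record successful accesses in last_ok (url -> datetime); return dead URLs."""
    now = now or datetime.now(timezone.utc)
    dead = []
    for url in urls:
        status = fetch(url)
        if status is not None and 200 <= status < 400:
            last_ok[url] = now
        else:
            dead.append(url)
    return dead
```

The fetch parameter is there so the logic can be exercised without touching the network, e.g. by passing the get method of a dictionary of canned statuses.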

Vaporware

Here is a good example of potentially confusing vaporware. Does the "Notes"-centric approach look reasonable? Do we want to come up with a standard disclaimer that we could cut-and-paste into all of these "Last Dangerous Visions" wannabes' entries? Ahasuerus 18:02, 19 May 2006 (CDT)

Something like?:
   This book has been announced, but never published. Please do not submit further
   publication information on this title unless you have sighted a physical copy of
   the book.
Alvonruff 18:29, 19 May 2006 (CDT)
Sure, that should work. Do we want to make it a checkbox on the Edit Work page that only a Moderator can modify? And that would be greyed out when a non-privileged user is in the Edit Work screen? Ahasuerus 18:39, 19 May 2006 (CDT)
I just added a vaporware note for whirlwind and its publication, but I don't know whether there is also a vaporware flag to tell Dissembler that it need not add new publications, or whether adding a note is sufficient. Also, should I source the vaporware note, given that I confirmed the publication's status with the author? Marc Kupper 03:16, 18 Nov 2006 (CST)
Al and I have discussed ways of letting Dissembler know not to add certain things to the database. So far he has added a list of publishers to ignore (mostly RPGs and comics), but anything more complicated (e.g. see the tale of two Dominic Greens) is currently done by hand. Al may be adding more smarts to the algorithm shortly, though, so it may be worth asking him directly. Ahasuerus 12:53, 18 Nov 2006 (CST)

Pre-ISBN serial numbers

Some of the pre-ISBN serial numbers in the database have a "#" prefixed to them. I think the rule should be that the serial number (e.g. 256 for the Ballantine first of Blish's "A Case of Conscience") should be entered as is, without a "#" sign. Mike Christie 22:20, 23 May 2006 (CDT)

I can go with that. The "#" symbol denoted that it was a catalog number and not an ISBN - the only place it's used anymore is in suppressing the "Buy This Book At" links, which don't work well with catalog numbers. Alvonruff 04:57, 24 May 2006 (CDT)
Could that be handled by checking for at least ten characters in the string? I doubt there are many ten character serial numbers out there. Mike Christie 08:30, 24 May 2006 (CDT)
That's exactly how I'm going to do it, although we have to match on a length of 13 as well for the upcoming ISBN-13 switch at the end of the year. There's the question of matching *exactly* on 10 and 13, or greater than 9. In the early 70's, Ace created some catalog numbers which look like ISBNs, but were of the form <PublisherId>-<CatalogNumber>-<Price>. These didn't have the mandatory checksum, so they look like ISBNs but actually have a variable length based on the price. Alvonruff 12:47, 24 May 2006 (CDT)
Keep in mind, folks, that these are really two different data elements. Many, if not most, books have both an ISBN and a catalog number. Some people, especially collectors, may be interested in this information and, besides, sometimes it may help to distinguish between different printings or even editions. See, for example, the way DAW changed their catalog numbers for different printings a while back.
As a general rule of thumb, using the same field to store different data elements is asking for trouble. You wouldn't believe the kinds of problems we ran into the last time I decided to store ISBN numbers and ICBM launch codes in the same bucket :( Ahasuerus 16:33, 30 May 2006 (CDT)
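
One way to read the length-and-checksum idea discussed above is sketched below in Python. It treats a string as an ISBN only if it has exactly 10 or 13 characters after stripping hyphens and spaces and its check digit is valid; anything else is treated as a catalog number. The function names are hypothetical, and the Ace-style identifier used in the usage note is a made-up example of the PublisherId-CatalogNumber-Price form, not a number taken from a real book.

```python
def is_valid_isbn10(s):
    """True if s is ten characters with a valid ISBN-10 check digit."""
    if len(s) != 10 or not s[:9].isdigit():
        return False
    if not (s[9].isdigit() or s[9] in "Xx"):
        return False
    digits = [int(c) for c in s[:9]] + [10 if s[9] in "Xx" else int(s[9])]
    # Weights run from 10 down to 1; a valid ISBN-10 sums to a multiple of 11.
    return sum(d * w for d, w in zip(digits, range(10, 0, -1))) % 11 == 0


def is_valid_isbn13(s):
    """True if s is thirteen digits with a valid ISBN-13 check digit."""
    if len(s) != 13 or not s.isdigit():
        return False
    # Weights alternate 1, 3, 1, 3, ...; a valid ISBN-13 sums to a multiple of 10.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
    return total % 10 == 0


def classify(raw):
    """Return 'isbn10', 'isbn13' or 'catalog' for a raw identifier string."""
    s = raw.replace("-", "").replace(" ", "")
    if is_valid_isbn10(s):
        return "isbn10"
    if is_valid_isbn13(s):
        return "isbn13"
    return "catalog"
```

With this sketch, classify("0-441-21599-8") gives "isbn10", classify("256") gives "catalog", and a made-up Ace-style number such as "441-04760-75" also comes out as "catalog" because its checksum fails even though it has ten digits. The checksum is not a complete answer: roughly one ten-digit catalog number in eleven will pass it by coincidence, which is another argument for storing the two data elements in separate fields.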

Changing the priority of the Award linking task?

Awards are currently linked to Works lexically, so whenever a Work's Title is changed (which happens often, what with subtitles, "the", etc), the Award link is lost. As we ramp up our editing/cleanup effort, a lot of Award links will likely be lost and have to be rebuilt manually at some point. Would it be possible to bump up the fix for this problem within the list of priorities? Or, barring that, allow editors to modify Award records to update Title names so that they would match? Ahasuerus 08:45, 24 May 2006 (CDT)
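
To illustrate why lexical linking is fragile, here is a toy Python sketch comparing an Award record that stores the Title text with one that stores a record identifier. The tables, the field names and the id 101 are all hypothetical; the actual ISFDB tables are not shown in this discussion.

```python
# Hypothetical in-memory stand-ins for the Title and Award tables.
titles = {101: "Sample Novel"}
award_by_text = {"award": "Example Award", "title": "Sample Novel"}
award_by_id = {"award": "Example Award", "title_id": 101}


def resolve_by_text(award):
    """Lexical link: find the Title whose text equals the award's stored text."""
    return next((tid for tid, text in titles.items() if text == award["title"]), None)


def resolve_by_id(award):
    """Id link: follow the stored identifier, whatever the Title text is now."""
    return award["title_id"] if award["title_id"] in titles else None


# An editor cleans up the Title, e.g. restoring a dropped article.
titles[101] = "The Sample Novel"

print(resolve_by_text(award_by_text))  # None: the lexical link is lost
print(resolve_by_id(award_by_id))      # 101: the id link survives the edit
```

The interim workaround suggested above, letting editors update the Title text stored in the Award record, amounts to repairing award_by_text by hand after every such edit.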

Serials display in Long Works Bibliography

To quote what I wrote above a few weeks ago:

the more I think about it, the more doubts I have about the fact that we display serial information on the Long Works page. Do people really need to know this up front? Especially when the author's bibliography is long and screen real estate is at a premium?

After playing with the data some more, I see even more problems with the current approach as exemplified by Jack Williamson's Long List Bibliography. First, there is the real estate problem mentioned above. In Williamson's case, serial data takes up over 70 lines of screen space, which makes the data harder to absorb. Second, the current algorithm matches serials against both novels and collections, which leads to problems -- see Williamson's "The Alien Intelligence", which is listed once as a novel and once as a collection, both times with the matching Serial list. Finally, are we really supplying useful information to the users by displaying half the Publication-level record (no magazine title, just the part number and the year) up front? After all, the data is readily available from the Work's page, where the user can also view other Publications for this Work. Is there some extra benefit to the current way of doing Serials that I am missing? Ahasuerus 16:45, 30 May 2006 (CDT)

Clarifying the purpose of the ISFDB Wiki to new users

Some of our users may not be quite sure what the ISFDB Wiki will be used for, namely its "data collection and verification" aspect -- see this note left on my Talk page. Do you think we should have a canned message automatically displayed on each empty Wiki page that belongs to the "Author" namespace to avoid confusion? Update: I have posted a response on the user's Talk page. Ahasuerus 19:51, 1 Jun 2006 (CDT)

P.S. Also note this User page, this Author page and this Author page. Ahasuerus 19:53, 1 Jun 2006 (CDT)
I also think similar text could appear on the Main Page; something derived from the note Ahasuerus just wrote would work well. It certainly seems to be a common misapprehension. Mike Christie 20:36, 1 Jun 2006 (CDT)

The fine line between information and self-promotion

Where should we draw the line between bibliographic information in "Synopsis"/"Notes" and self-promotion? Would you say that this Synopsis is OK? Ahasuerus 15:47, 17 Jun 2006 (CDT)

Translations Redux

Reviewing the discussion above, I note that the issue of handling translations appears to be still open. Do we list them as "Works" or as "Publications" under their parent Work?

The best argument (that I can think of) for listing translations as Publications is to avoid cluttering Work level bibliographies with dozens of unreadable (to most people) titles. Think of the mess we would have on our hands if we listed every translation of Heinlein's Works as Variant Titles!

The best argument (that I can think of) for listing translations as Works is the fact that there are foreign language omnibuses and collections that have no English language analogs (no pun intended!)

Or do we split the difference and enter translations as Publications if there is a parent Work and as Works if there is none? If so, then how do we handle translators? Ahasuerus 14:04, 19 Jun 2006 (CDT)

How about saying that the Title/Work is the canonical name, usually the first edition? If that's in a foreign language, that's the work.
Actually, that opens yet another can of worms. Say you have a Work by an English language Author that was published in a foreign language first, e.g. Charles Sheffield's Convergence. Should we use the foreign language title as the canonical name when it's first published? And do we then change the canonical name to the English language version when it comes out in English? What about the books that were originally written in English, but have only been published in other languages, e.g. volumes 4 and 5 of "Ansen Dibell"'s The King of Kantmorie series or the last three "cycles" of Ken Bulmer's Prescot books? Ahasuerus 14:50, 19 Jun 2006 (CDT)
So a German Asimov collection that has no corresponding English language version is a Title (and a Publication); a German translation of The Rest of the Robots is just a publication with a variant title. Mike Christie 14:12, 19 Jun 2006 (CDT)
One possible problem with this approach is that it's not always easy to tell whether a foreign language collection is based on an existing English language one or a brand new compilation. For example, OCLC Fiction Finder lists "Il Twonk, Il Tempo E la Follia: Racconti Di Fantascienza" as an Italian Kuttner/Moore collection. Since we don't know which stories are collected in this book, how can we be sure that it's a new Work and not a Publication? Granted, we may have the same problem with obscure English language variant titles, but it is likely to be much more pronounced with translations.
The other possible problem is that sometimes foreign publishers will drop a few stories from a collection for space or cost reasons. For example, German language translations are always longer than English language originals because German words are on average longer than English words. (Robert Jordan's books that were barely publishable as a single paperback in the US had to be split in 2 in Germany). You can see how German publishers may be inclined to drop a story or three from some collections. Do we call this abridged collection a Variant Title and add "(abridged)" in parentheses the way we would handle a partial English language reprint? Ahasuerus 14:50, 19 Jun 2006 (CDT)

Translations: Foreign language originals

A forthcoming book (see Rotten Tomatoes for a discussion of the recent movie based on the novel) by Sergei_Lukyanenko presents a related problem. Dissembler entered a slightly garbled version of its English language title, Night Watch, as the Work title. However, if we follow Mike's suggestion above, we would want to change the canonical Work title to the Russian language original, Nochnoi Dozor. Then, if we follow the precedent set by Jules_Verne et al, we would make Night Watch a Variant title while other, non-English language, translations will become mere Publications.

However, since Night Watch is book one in Lukyanenko's Watch series, what should we do with the other volumes? The second volume is easy since it's already listed on Amazon.com as a projected 2007 title. The canonical title will be Dnevnoi Dozor and Day Watch will become a Variant title. We can always zap the latter if the book turns out to be vaporware just like we can zap other announced-but-never-published books. But what about volumes 3 and 4, Sumerechnyi Dozor and Posledni Dozor respectively, which are yet to be announced in English? Their Russian titles will become the canonical titles, but do we want to list the titles' literal translations, Dusk Watch and Last Watch, as variant titles until and unless they get changed by Lukyanenko's American publisher(s)? As a general rule, I am leery of "working titles" which can change 3 times by the time the book comes out, but I can see how a translated title could be useful. Or do we enter these translated titles in the Notes field and hope that our users find them? Ahasuerus 18:17, 29 Jun 2006 (CDT)

Expansions by other hands

Jim Kjelgaard was a notable YA author some 50 years ago. One of his books, Fire-Hunter, is about "a Paleolithic youth, who's expelled from his tribe for innovation". Standard issue YA pre-historic adventures follow. The book is listed in Reginald1 and I entered it earlier today since our provisional rules (see above) explicitly include pre-historic fiction. So far so good.

However, it turns out that Fire-Hunter was one of Jim Baen's favorite books and when his publishing house became big/stable/profitable enough to allow risking personal favorites, he decided to bring it back in print. Since the original text was only 40,000 words, he hired David Drake to expand it to 65,000 words to bring it in line with the expectations of his readers circa 1990. (As an aside, one wonders how much wordage would have been added 15 years later.) The end result was The Hunter Returns as by Kjelgaard and Drake.

The question then is how do we enter this new Work so that it makes sense to the users? The way I currently have it listed, it looks OK on Kjelgaard's Long Works page, but it doesn't appear on David Drake's Long Works page. Interestingly enough, it does appear on Drake's alphabetical and chronological pages. Is this a bug in the Long Works algorithm or is there more to it? And does my overall approach, i.e. making it a Variant Title and marking it as "expanded", make sense? Ahasuerus 16:53, 2 Jul 2006 (CDT)

Author-specific Notes

Once upon a time, in a database far far away, there was support for Author Notes. The records are still there, but they are no longer displayed by the software since the idea is to move all free text data to Wikipedia. Although it sounds like a perfectly reasonable idea in theory, what will we do about Author biblios that could really use a free text comment? For example, David_Mack the Star Trek guy is not the same person as David (W.) Mack the comic book guy, which causes no end of confusion. There are other "[2]"s in the database as well. Shouldn't we have something in the Author section of the Long Works biblio that would alert our users to these kinds of issues? Ahasuerus 16:10, 12 Jul 2006 (CDT)

And to make matters worse, I have just discovered that David (W.) Mack is doing the cover art for David (A.) Mack's upcoming (10/06) Wolverine novel. That's just plain evil :( Ahasuerus 17:06, 12 Jul 2006 (CDT)
That is beyond kismet - that's a downright conspiracy. Well, it's easy enough to revive the field... a bibliographic note in the wiki is too oblique? Alvonruff
I added a note to David (A.) Mack's Wiki article, but I suspect that it will be seen only by our editors and not by regular users. Ahasuerus 19:35, 12 Jul 2006 (CDT)

Serial Data Display

It looks like Serial records that are not associated with book Titles are not displayed as part of Long Works or Short Works listings, to use Arthur H. Landis' Let There Be Magick! (which was revised as A World Called Camelot, btw) as an example. The serial is displayed on the Alphabetical and Chronological pages, though. Is this by design or happenstance? Ahasuerus 20:38, 20 Aug 2006 (CDT)

Non-linear Series Numbering

Take a look at the way Tony Abbott and his publisher number his popular YA series, The Secrets of Droon. To quote the relevant part:

  • 16. The Knights of Silversnow. [description follows]
  • Special Edition #1. The Magic Escapes. This first ever Special Edition picks up right where The Knights of Silversnow left off and is a Droon adventure like never before, pitting Eric, Julie, Keeah, and Neal against a brand new and particularly mysterious villain. [...]
  • 17. [...]
  • 18. [...]

etc.

Then we have Special Edition #2 between 21 and 22, Special Edition #3 between 25 and 26 and Special Edition #4 between 28 and 29.

I suppose the logical thing to do would be to call Special Edition #1 volume 17 in the series; the book that is labeled Volume 17 would then become volume 18 according to our numbering scheme, etc. Of course, it would also confuse the heck out of everybody :-(

I guess the question is what is the least painful way to catalog this weirdness that wouldn't break the display logic? Do we (or can we) support "16a" or anything along those lines? Ahasuerus 18:55, 18 Dec 2006 (CST)

Number them both 16; since the display code seems to sort by Series # and then by publication date, that will at least get the specials in the correct spots in your list. I don't think there's an easy solution other than hidden ordinals or some other mechanism for explicitly defining the order. For example, I had suggested earlier to sort unnumbered items by date in the middle of the list, but that may not get the specials in the correct spot. Marc Kupper 21:15, 18 Dec 2006 (CST)
I just tried that and it seems to work. I actually tried "16+" as the second one, but it stripped off the "+". Mike Christie (talk) 21:21, 18 Dec 2006 (CST)
Apparently, it doesn't like decimals either :( Al, do you think we could allow "16.1" or would it be abusable? I would really prefer the relationship to be immediately obvious (in part so that helpful editors wouldn't try to correct it), but I am not sure how to accomplish it. We could have a separate Secrets Of Droon Special Edition subseries, but that would obscure the link between the four "special edition" books and the main series. Ahasuerus 22:21, 18 Dec 2006 (CST)
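The "number them both 16" workaround relies on a two-level sort: Series # first, publication date second. A minimal sketch of that ordering follows. The `Book` tuple and its field names are assumptions made for illustration, and the month/day parts of the dates are placeholders (this thread gives only the years), so treat the data as hypothetical.

```python
from collections import namedtuple

# Hypothetical record layout; not the actual ISFDB title record.
Book = namedtuple("Book", ["series_num", "pub_date", "title"])

books = [
    Book(17, "2003-01-01", "Dream Thief"),
    # Special Edition #1, entered as 16 so that it sorts next to volume 16.
    Book(16, "2002-10-01", "The Magic Escapes"),
    Book(16, "2002-06-01", "The Knights Of Silversnow"),
    Book(15, "2002-01-01", "The Moon Scroll"),
]

# Sort by series number, then by publication date within the same number.
# ISO-style date strings compare correctly as plain strings.
ordered = sorted(books, key=lambda b: (b.series_num, b.pub_date))

for b in ordered:
    print(b.series_num, b.title)
```

Note that the special lands after volume 16 only if its recorded date is later. If both records carry the same date (for example, a bare year), the tie is not resolved by this key, which is one reason the result can look fragile to later editors.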
A hack that comes to mind is to display the series numbers modulo 1000 with 0 being displayed as blank. You could then number them
  • 6 The Sleeping Giant of Goll (2000) with Tim Jessell
  • 7 Into the Land of the Lost (2000)
  • 14 Voyage of the Jaffa Wind (2002) with David Merrell and Tim Jessell
  • 15 The Moon Scroll (2002) with Tim Jessell
  • 16 The Knights Of Silversnow (2002)
  • 1000 The Magic Escapes (2002)
  • 1017 Dream Thief (2003)
  • 1019 The Coiled Viper (2003)
  • 1021 Flight of the Genie (2004)
  • 2000 Wizard or Witch?
  • 2022 The Isle of Mists
  • 2023 The Fortress of the Treasure Queen
  • 2024 The Race To Doobesh (2005)
  • 2025 The Riddle Of Zorfendorf Castle (2005)
  • 3000 Voyagers of the Silver Sand
  • 3026 Moon Dragon
  • 3027 The Chariot of Queen Zara
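The modulo idea can be sketched in a few lines: the stored integer controls the sort, and the displayed number is the stored value modulo 1000, with 0 shown as blank. The function name `display_label` is an assumption for illustration; the stored values are taken from the list above.

```python
def display_label(stored: int) -> str:
    """Return the visible series number for a stored sort value.

    The stored value modulo 1000 is what the reader sees; a remainder
    of 0 marks an unnumbered item (a "special") and is shown as blank.
    """
    shown = stored % 1000
    return "" if shown == 0 else str(shown)


# Stored values from the list above, deliberately out of order.
stored_values = [1017, 16, 2000, 1000, 2022, 1021, 1019]

# Sorting uses the raw stored integer; only the display is reduced.
labels = [display_label(n) for n in sorted(stored_values)]
print(labels)
```

One consequence of this scheme is that a visible number can never reach 1000, and every special consumes one "thousand" block, so the hack only works for series that stay well within those limits.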
A low tech but still somewhat user friendly way to manage this is to add an edit-series-order page that looks like the following. Up/down would be links pointing at the cgi that would get passed the series, item, and direction which would then recalculate the 1000x numbering and repaint. Or, you could use radio buttons. Marc Kupper 00:06, 19 Dec 2006 (CST)
Control Series order
Move     #   Title
down     6   The Sleeping Giant of Goll (2000) with Tim Jessell
up/down  7   Into the Land of the Lost (2000)
up/down  14  Voyage of the Jaffa Wind (2002) with David Merrell and Tim Jessell
up/down  15  The Moon Scroll (2002) with Tim Jessell
up/down  16  The Knights Of Silversnow (2002)
up/down      The Magic Escapes (2002)
up/down  17  Dream Thief (2003)
up/down  19  The Coiled Viper (2003)
up/down  21  Flight of the Genie (2004)
up/down      Wizard or Witch?
up/down  22  The Isle of Mists
up/down  23  The Fortress of the Treasure Queen
up/down  24  The Race To Doobesh (2005)
up/down  25  The Riddle Of Zorfendorf Castle (2005)
up/down      Voyagers of the Silver Sand
up/down  26  Moon Dragon
up       27  The Chariot of Queen Zara
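What the up/down links would have to do on the server side can be sketched roughly as follows: swap the chosen item with its neighbour, then recalculate the 1000x numbers from the new order. This is only a sketch under stated assumptions. The names `move` and `renumber` and the `(visible number or None, title)` item layout are hypothetical, and the real cgi would read and write title records rather than an in-memory list.

```python
from typing import List, Optional, Tuple

# (visible series number, or None for an unnumbered special; title)
Item = Tuple[Optional[int], str]


def move(items: List[Item], index: int, direction: str) -> List[Item]:
    """Return a copy of items with the entry at index moved one step."""
    if direction not in ("up", "down"):
        return list(items)
    target = index + (1 if direction == "down" else -1)
    if not 0 <= target < len(items):
        return list(items)  # already at the top or bottom: nothing to do
    result = list(items)
    result[index], result[target] = result[target], result[index]
    return result


def renumber(items: List[Item]) -> List[int]:
    """Recalculate stored values: each special starts a new 1000 block."""
    block = 0
    stored = []
    for number, _title in items:
        if number is None:
            block += 1
            stored.append(block * 1000)
        else:
            stored.append(block * 1000 + number)
    return stored


items = [
    (16, "The Knights Of Silversnow"),
    (None, "The Magic Escapes"),
    (17, "Dream Thief"),
]
before = renumber(items)
moved = move(items, 1, "up")
after = renumber(moved)
```

Note one limitation of this sketch: the stored value of a numbered item is derived from its visible number, so moving two numbered items past each other within the same block would not change their sort order. Only the position of the specials is really adjustable this way.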

(unindent) Playing with the series numbers won't help too much as the data types are integers, so the database won't accept strings, or "1A", or "1.5", or anything other than "1". The fundamental problem is that there needs to be an ordinal that describes the order the series items are printed in, and a label that describes the title's series number. At present ordinal=label.

Ordinals are pretty much perfect for determining the printing order, as MySQL can then do the ordering without any postprocessing. An easier change than trying to sort floating point or strings would be to add a label field. From the original example, the ordinals would not be displayed (shown here in parentheses), but would control the ordering. The labels would be displayed:

  • (16) 16. The Knights of Silversnow. [description follows]
  • (17) Special Edition #1. The Magic Escapes. This first ever Special Edition picks up right where The Knights of Silversnow left off and is a Droon adventure like never before, pitting Eric, Julie, Keeah, and Neal against a brand new and particularly mysterious villain. [...]
  • (18) 17. [...]
  • (19) 18. [...]

This would require some additional tools to perform inserts between two adjacent ordinals. Alvonruff 06:33, 19 Dec 2006 (CST)
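As a rough illustration of the ordinal-plus-label idea and of the "insert between adjacent ordinals" tool, here is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table name `series_entry`, its columns, and the helper `insert_at` are assumptions made for this example, not the actual ISFDB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ordinal: hidden integer that controls the order.
# label:   free-text series number that is actually displayed.
conn.execute(
    "CREATE TABLE series_entry (ordinal INTEGER, label TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO series_entry VALUES (?, ?, ?)",
    [
        (16, "16", "The Knights of Silversnow"),
        (17, "17", "Dream Thief"),
    ],
)


def insert_at(conn, ordinal, label, title):
    """Open a gap at the given ordinal, then insert the new entry there."""
    conn.execute(
        "UPDATE series_entry SET ordinal = ordinal + 1 WHERE ordinal >= ?",
        (ordinal,),
    )
    conn.execute(
        "INSERT INTO series_entry VALUES (?, ?, ?)", (ordinal, label, title)
    )


insert_at(conn, 17, "Special Edition #1", "The Magic Escapes")

# The database does the ordering; only label and title would be shown.
rows = conn.execute(
    "SELECT ordinal, label, title FROM series_entry ORDER BY ordinal"
).fetchall()
for ordinal, label, title in rows:
    print(f"({ordinal}) {label}. {title}")
```

Shifting every later ordinal by one keeps the column a plain integer, at the cost of touching all following rows on each insert; that seems acceptable for series of this size.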

If we pursue this approach (which seems to reflect reality better than any previously proposed alternative), then we will need to make very clear to the editors why we have two fields for Series Number. We will also need to have a standard for when this mechanism can and can't be used to insert sub-series into the main series. I can see how it could be very tempting to stuff everything into the main series under certain circumstances. Ahasuerus 15:53, 21 Dec 2006 (CST)
I would not show people the second set of ordinals but rather would let them move items in the series up/down. Sub-series are an interesting problem and hmm – it almost seems like we may need hidden anchors in the series list to allow people to position sub-series. This is getting messy – the existing mechanism is defined in the title records and the only title-to-title ordering is the Series # ordinal and title_copyright (first-pub-date). The desire seems to be for external lists that can have any structure/order and numbering method (decimal, Roman numerals, etc.), plus there is a second desire that a title can be a member of more than one series. With that in mind it seems a case could be made for an entirely external series mechanism that references title records. Marc Kupper 01:09, 23 Dec 2006 (CST)
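One way to picture an external series mechanism is a list of entries that point at title records instead of a Series # stored on the title itself. The sketch below is purely hypothetical: the class names, fields, and placeholder titles are all assumptions made for illustration, not a proposed schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SeriesEntry:
    title_id: int  # reference to a title record
    label: str     # free-form numbering: "16", "16a", "IV", or ""


@dataclass
class Series:
    name: str
    # List order is the display order; no number needs to be sortable.
    entries: List[SeriesEntry] = field(default_factory=list)


def series_containing(title_id: int, all_series: List[Series]) -> List[str]:
    """Names of every series that references the given title."""
    return [
        s.name
        for s in all_series
        if any(e.title_id == title_id for e in s.entries)
    ]


# Placeholder data, not real records.
main = Series("Main Series", [SeriesEntry(1, "1"), SeriesEntry(2, "1a")])
specials = Series("Special Editions", [SeriesEntry(2, "I")])

memberships = series_containing(2, [main, specials])
```

Because the ordering lives in the list and the label is free text, the same title can appear in several series with a different label in each.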
Series ordering and series numbering have been a messy area for a long time. For example, there is (a) publication order, (b) internal chronological order and (c) preferred reading order; it can be argued that all three have value to ISFDB users and should be displayed independently if they differ. Also, note Feature request 90001 and Bug 30014. I am sure we will revisit this area post-beta, but for now we probably want to save this discussion on some Help template's Talk page or some such. Ahasuerus 11:14, 23 Dec 2006 (CST)
I agree on the revisit and have cut/pasted the entire thread to the Archive and also into Feature:90001. Marc Kupper 16:40, 23 Dec 2006 (CST)