Difference between revisions of "Help:How to work with Records Built by Robots"

Revision as of 14:07, 14 July 2018

General Issues with Records Built by Robots

Bibliographic records built by robots present unique challenges.

Existence

The first question to ask yourself when reviewing a record built by a robot is whether the book has actually been published or is about to be published. ISFDB robots use a number of sources to find SF-related books, notably library catalogs and online bookstores like Amazon. Some of these sources list books which were announced at some point in the past but were later canceled, renamed or otherwise changed.

If a record built by a robot provides a link to the Web page that served as the source of the data, check the linked page to see if the ISBN looks like it may have been canceled. ISBN cancellation is very likely if the linked record is very sparse, e.g. there is no page count, no publisher, no cover scan, no price, etc. Note that canceled ISBNs are not deleted from Amazon's databases. Instead, Amazon changes their publication date to a date up to 10-15+ years in the future.
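The warning signs above can be sketched as a small Python check. This is only an illustration, not part of any ISFDB robot: the field names (`pages`, `publisher`, `cover_url`, `price`, `pub_date`) and both thresholds are assumptions chosen for the example, and a flagged record still needs to be verified by hand.

```python
from datetime import date

# Hypothetical field names for the data scraped from the source page.
SPARSE_FIELDS = ("pages", "publisher", "cover_url", "price")


def looks_canceled(record, today, years_ahead=10, min_missing=3):
    """Return True if a source record shows the signs of a canceled ISBN.

    years_ahead and min_missing are illustrative thresholds (assumptions),
    not values defined by ISFDB.
    """
    # Count how many of the usual fields are absent or empty.
    missing = sum(1 for field in SPARSE_FIELDS if not record.get(field))
    pub_date = record.get("pub_date")
    # A publication date pushed far into the future suggests cancellation.
    far_future = pub_date is not None and pub_date.year - today.year >= years_ahead
    return far_future or missing >= min_missing
```

For example, a record whose only populated field is a near-term publication date would be flagged because it is sparse, while a fully populated record dated 2035 and checked in 2018 would be flagged because of its date.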

Eligibility

The second question which frequently arises when dealing with records created by robots is whether the book is within the scope of the project and should be listed by ISFDB. Many comic books, manga, RPG modules, etc. are listed by Amazon and other sources in a way that makes them look like SF books. More often than not, our robots are not able to tell the difference. In addition, a book labeled "horror" may be a "psychological horror" novel with no SF elements and therefore not eligible for inclusion in ISFDB (unless the author is over that hard-to-define "certain threshold" mentioned by ISFDB:Policy). Similarly, Amazon may file a non-genre book under "fantasy" when it's actually an "erotic fantasy", under "ghosts" because the title is "Ghosts of the Past", and so on.

Data Quality Issues

Records built by robots can have various data quality issues. The data entry clerks used by online bookstores and even by some libraries tend to make a lot of data entry errors. These errors include, but are not limited to, reversing first and last names, misspelling names and titles, entering irrelevant information as part of the title, and assigning incorrect or misleading subject headings to titles.

Publisher Issues

  • The publisher name used by the robot may be incorrect. This frequently happens when a US store is selling UK books or vice versa. It can also happen when an online bookstore uses the name of the distribution company instead of the name of the publisher. Sometimes a publisher goes out of business and some of its announced books are later released by another publisher.
  • The stated publisher is not disambiguated with a country-specific suffix, e.g. the record says "Tor" instead of "Tor (UK)".
  • The publisher name is missing. If the book looks like it was probably published by a traditional publisher, this is an indication that the ISBN has been canceled or delayed. On the other hand, if the book was self-published, it's possible that the author chose not to use a publisher name, so further research is advised. Amazon's Look Inside can be particularly helpful.
  • The publisher name may be -- as far as ISFDB is concerned -- the name of a Publication Series. For example, a robot may create a record for a book published by "Harlequin Nocturne", which we view as a publication series published by Harlequin.
  • The publisher name doesn't exist in ISFDB. Sometimes it indicates that the robot has found a brand new publisher. However, in most cases it means that the record contains an altered or corrupted version of another publisher name that ISFDB already has on file, e.g. "Berkley" instead of "Berkley Books". When this happens, make sure to research the publisher and correct the data.
  • Amazon's branch "CreateSpace" facilitates self-publishing. Some people who publish books via CreateSpace form a publishing company of their own and include its name in the publication while other people do not. In most cases Amazon lists CreateSpace-published books as by "CreateSpace" regardless of whether another publisher name was specified in the book. For this reason it is important to check what's stated in the book using Amazon's Look Inside. If another publisher name is specified, change the value in the Publisher field to that name. If no publisher name is specified, leave the field blank.

Price Issues

The price recorded by the robot may be incorrect. When a book is originally published in one country and offered for sale in another country (e.g. US/UK), the price may be listed in a different currency. When the price field of a robotic record contains an unlikely looking number like "£4.83", it usually indicates a conversion of a foreign price. Note, however, that the converse is not always true: it's possible for a record to have a normal-looking price even if the book was published in another country. This is because booksellers often adjust prices of imported books to look more normal. Also, some publishers have a US office and a UK office, which can make it challenging to distinguish cases of "simultaneous publication" from imports.
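As a rough illustration of what "unlikely looking" can mean, the Python sketch below flags prices whose cents portion is not one of a few common list-price endings. The set of endings is an assumption made for this example, not an ISFDB rule, and, as noted above, a normal-looking price does not prove that the book is not an import.

```python
import re

# Assumed set of "normal" list-price endings; adjust to taste.
COMMON_ENDINGS = {"00", "25", "49", "50", "95", "99"}


def looks_converted(price):
    """Return True if the price looks like a converted foreign price."""
    # Accept an optional non-digit currency prefix, then digits.dd
    match = re.fullmatch(r"\D*(\d+)\.(\d{2})", price.strip())
    if match is None:
        return False  # not a format this sketch understands
    return match.group(2) not in COMMON_ENDINGS
```

With these assumptions, `looks_converted("£4.83")` returns `True` and `looks_converted("$7.99")` returns `False`.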

Page Count

The page counts listed by Amazon are based on publisher estimates produced months ahead of publication. They almost invariably differ from the actual page count of the published book. Publisher Web pages, library catalogs and OCLC frequently have more accurate page counts.

Format

Robots often derive the format information from book dimensions and other data provided by third parties. Sometimes their calculations are incorrect. Also, robots do not always have access to the same data that humans do. If a robot-generated record doesn't have a value in the format field, it's worth checking the source of the robot's information to see if a human can figure it out.

Image URL

A robot can't tell whether an image is good or bad. For example, Amazon may display a placeholder image or even a blank image which says something like "Image to be unveiled prior to publication". These placeholder URLs need to be deleted from the publication record.

Publication Series

Robots generally do not have access to Publication Series information, so it needs to be added manually if available.

Series

Most records built by robots do not include series information. Note, however, that some sources embed the series name in the title, usually in parentheses, e.g. "Empire of the Dragon (Event Group Thriller)". Embedded series names need to be moved manually from the title field to the series field.
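The split can be sketched in Python as follows. This is a minimal illustration under the assumption that the series name is the final parenthesized group in the title; the function name is hypothetical. A trailing parenthetical is not always a series name, so the result is only a suggestion for a human editor to confirm.

```python
import re


def split_embedded_series(title):
    """Split 'Title (Series)' into (title, series).

    Returns (title, None) when there is no trailing parenthesized group.
    """
    match = re.fullmatch(r"(.*\S)\s*\(([^()]+)\)\s*", title)
    if match is None:
        return title, None
    return match.group(1), match.group(2).strip()
```

For the example above, `split_embedded_series("Empire of the Dragon (Event Group Thriller)")` returns `("Empire of the Dragon", "Event Group Thriller")`.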