User:Fixer

From ISFDB
Revision as of 13:48, 26 November 2022 by Ahasuerus (talk | contribs) (→‎Fixer Dumps and Lists: 2022-11-26 XML dumps)

Fixer is a robot account used to create submissions via the Web API. It is currently maintained by User:Ahasuerus.

ISBNs ready for manual submission

See User:Fixer/Public for lists of books which are ready for manual submission.

Fixer's Queues

Fixer Dumps and Lists

The linked zipped file contains 5 text files: 3 XML files and 2 plain-text lists. The 2 "dump" files are XML files containing all of Fixer's internal data in a mostly self-explanatory format. The "Ranks" data elements show the highest monthly rankings for each Amazon browse node; the "Key" attribute is the browse node number. The third XML file contains brief descriptions of Amazon's SF-related browse nodes.

The 2 non-XML files are lists of all ISBNs and ASINs known to Fixer. Each line of the ISBN file contains 4 "|"-delimited fields: ISBN-10 (for 978 ISBN-13s), ISBN-13, priority, linked ASIN (if available). Each line of the ASIN file contains 3 "|"-delimited fields: ASIN, linked ISBN-13 (if available), priority. Valid priorities are as follows:

    • Queue "n" (for "New") is for ISBNs/ASINs which haven't been prioritized yet
    • Queue 0 is for ISBNs/ASINs which Fixer doesn't have enough information about to create a submission
    • Queue 1 is for ISBNs/ASINs associated with major publishers and established authors (highest priority)
    • Queue 2 is for self-published and other "minor" authors whose other books are already cataloged in ISFDB
    • Queue 3 is the lowest priority queue and mostly contains books by self-published authors not in ISFDB
    • Queue 8 is for ISBNs/ASINs which have been submitted to the main ISFDB system; they may have been approved or rejected
    • Queue 9 is for ISBNs/ASINs which have been rejected on Fixer's side without an ISFDB submission having been created; reasons for rejections include the book being non-genre/never published and the book already existing in the ISFDB database
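
As an illustration of the two list formats described above, here is a minimal Python sketch of a line parser. It assumes that an unavailable field (the ISBN-10 of a 979 ISBN, or a missing linked ASIN or ISBN-13) is left empty and that priorities appear as the literal strings "n", "0", "1", "2", "3", "8" and "9"; the record classes and function names are hypothetical and are not part of Fixer.

```python
from dataclasses import dataclass
from typing import List, Optional

# Priority codes from the list above; assumed to appear as these literal strings.
VALID_PRIORITIES = {"n", "0", "1", "2", "3", "8", "9"}


@dataclass
class IsbnRecord:
    isbn10: Optional[str]  # only present for 978 ISBN-13s
    isbn13: str
    priority: str
    asin: Optional[str]    # linked ASIN, if available


@dataclass
class AsinRecord:
    asin: str
    isbn13: Optional[str]  # linked ISBN-13, if available
    priority: str


def _split(line: str, expected: int) -> List[str]:
    """Split one "|"-delimited line and check the field count."""
    fields = [f.strip() for f in line.rstrip("\r\n").split("|")]
    if len(fields) != expected:
        raise ValueError(f"expected {expected} fields, got {len(fields)}: {line!r}")
    return fields


def _check_priority(priority: str) -> str:
    if priority not in VALID_PRIORITIES:
        raise ValueError(f"unknown priority: {priority!r}")
    return priority


def parse_isbn_line(line: str) -> IsbnRecord:
    """ISBN file: ISBN-10, ISBN-13, priority, linked ASIN."""
    isbn10, isbn13, priority, asin = _split(line, 4)
    return IsbnRecord(isbn10 or None, isbn13, _check_priority(priority), asin or None)


def parse_asin_line(line: str) -> AsinRecord:
    """ASIN file: ASIN, linked ISBN-13, priority."""
    asin, isbn13, priority = _split(line, 3)
    return AsinRecord(asin, isbn13 or None, _check_priority(priority))
```

For example, parse_isbn_line("0000000000|9780000000002|1|B000000000") would yield a record in Queue 1 with a linked ASIN; the identifiers in that line are made up for illustration.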

Prioritized Cleanup Tasks as of 2020-03

  1. Waiting Authors. Straightforward, well understood, limited scope, requires minimal work on the maintainer's side. A few authors may be time-consuming, e.g. Victor Appleton and Chuck Tingle.
  2. Waiting Publishers. 9,255 outstanding ISBNs and ASINs.
  3. 2013-2015 n-p's and 1-p's. The maintainer will need to prioritize around 3,000 n-p ISBNs. Fairly high impact.
  4. Pre-2013 n-p's. With some exceptions, requires the maintainer to prioritize many thousands of n-p's per year. Doable but time-consuming. Submitted ISBNs require more research due to the records' age.
  5. 2012-2017 n-e's. Requires the maintainer to prioritize 1,000-2,000 e-ISBNs per month. Doable but time-consuming. Many ISBNs are no longer recognized by the Amazon API; they will need to be moved to Priority 0 for now and researched later.
  6. Pre-2012 n-e's. 100-1,000 e-ISBNs per month, most of them probably no longer recognized by the Amazon API.

How Fixer Works

Monthly Process

Here is how Fixer's monthly download/upload process works as of 2020-03:

  1. The software package known as "Fixer" resides on the ISFDB development server, which is separate from the main ISFDB server. Fixer can't be moved to the main ISFDB server without a complete rewrite due to numerous software compatibility issues.
  2. Every month Fixer queries Amazon.com and Amazon UK for new SF books, where "new" means "since Fixer was last run". The captured data is stored in Fixer's internal database, which is separate from the main ISFDB database. Note that the data that Amazon sends back to Fixer is not always the same as the data displayed on Amazon's Web pages. Among other things, this means that Fixer doesn't have access to cover artists or anthology editors.
  3. If the ISBN of a newly captured book has been previously submitted to ISFDB or rejected, then the ISBN is ignored. Otherwise Fixer adds the ISBN and its publication data to the main "queue".
  4. Fixer tries to determine whether to use the US data or the UK data for each ISBN. For example, if Amazon.com says that the publisher is "Baen" and Amazon UK says that the publisher is "Unknown", then Fixer uses the US record. If the data is incomplete or if the publisher is active on both sides of the Atlantic, manual assignment is required.
  5. For each captured ISBN, Fixer determines whether the ISBN should be automatically assigned to Queue 3 (see below for an explanation of Fixer's queues). This is currently done for:
    • ISBNs starting with 2-9 (non-English pubs)
    • audio, CD and MP3 books
    • books published by the better known vanity publishers
  6. Fixer examines all captured records and separates "likely high priority" ISBNs from "likely low priority" ISBNs. High priority ISBNs are associated with major publishers and/or authors who have records in ISFDB.
  7. Fixer massages the captured data. The process includes assigning publication format codes, regularizing publisher names, adding publication series (if available) and so on.
  8. The robot maintainer (User:Ahasuerus as of 2020) manually reviews the "high priority" list and then the "low priority" list and assigns ISBNs to the following queues:
    • Queue "n" (for "New") is for ISBNs/ASINs which haven't been prioritized yet
    • Queue 0 is for ISBNs/ASINs which Fixer doesn't have enough information about to create a submission
    • Queue 1 is for ISBNs/ASINs associated with major publishers and established authors (highest priority)
    • Queue 2 is for self-published and other "minor" authors whose other books are already cataloged in ISFDB
    • Queue 3 is the lowest priority queue and mostly contains books by self-published authors not in ISFDB
    • Queue 8 is for ISBNs/ASINs which have been submitted to the main ISFDB system; they may have been approved or rejected
    • Queue 9 is for ISBNs/ASINs which have been rejected on Fixer's side without an ISFDB submission having been created; reasons for rejections include the book being non-genre/never published and the book already existing in the ISFDB database
  9. Each numbered queue is further subdivided into a "paper" queue, an "ebook" queue, an "audio" queue and an "other" queue based on each book's format. Each resulting queue is further subdivided into an "AddPub" queue and a "NewPub" queue.
  10. Note that Fixer also captures publications which have Amazon-assigned ASINs but no ISBN. They are currently stored in a separate database and are not prioritized, so they are not submitted as part of the monthly Fixer cycle.
  11. The robot maintainer uses the ISFDB Web API to create batches of 20-200 NewPub or AddPub submissions at a time based on the date in Queue 1. Submissions are created using Fixer's ISFDB account and are automatically put on hold on behalf of the moderator who has volunteered to work on them (User:Anniemod as of 2020).
  12. The volunteer moderator reviews and approves Fixer's submissions. See Help:How to work with Records Built by Robots and Help:Screen:Moderator#Moderating_Automated_Submissions for a discussion of the challenges associated with processing robotic submissions.
  13. The volunteer moderator provides feedback to the robot maintainer if she notices recurring errors in Fixer's logic, e.g. incorrectly regularized publisher names.
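
The automatic parts of steps 3, 5 and 9 can be sketched in Python as follows. This is an illustration only, not Fixer's actual code. It assumes that "previously submitted or rejected" corresponds to Queues 8 and 9, that the "starting with 2-9" rule is applied to the ISBN-10 form, and that format and publisher arrive as plain strings. All function names, the VANITY_PUBLISHERS set and its single entry are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for Fixer's list of better known vanity publishers.
VANITY_PUBLISHERS = {"Example Vanity Press"}
AUDIO_FORMATS = {"audio", "cd", "mp3"}
FORMAT_GROUPS = {"paper", "ebook", "audio"}  # anything else goes to "other"


def should_skip(isbn13: str, known_queues: Dict[str, str]) -> bool:
    """Step 3: ignore ISBNs already submitted (Queue 8) or rejected (Queue 9)."""
    return known_queues.get(isbn13) in {"8", "9"}


def auto_queue(isbn10: Optional[str], book_format: str, publisher: str) -> Optional[str]:
    """Step 5: return "3" for automatic Queue 3 cases, else None (manual review)."""
    if isbn10 and isbn10[0] in "23456789":  # non-English registration groups
        return "3"
    if book_format.lower() in AUDIO_FORMATS:
        return "3"
    if publisher in VANITY_PUBLISHERS:
        return "3"
    return None


def sub_queue(priority: str, book_format: str, submission_type: str) -> Tuple[str, str, str]:
    """Step 9: key a record by (queue, format group, AddPub/NewPub)."""
    if submission_type not in {"AddPub", "NewPub"}:
        raise ValueError(f"unknown submission type: {submission_type!r}")
    group = book_format.lower()
    if group not in FORMAT_GROUPS:
        group = "other"
    return (priority, group, submission_type)
```

Records for which auto_queue returns None would still go through the manual review described in step 8; the sketch does not attempt to model the high/low priority split or the US/UK data selection.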

Maintaining the "Clean" lists

Fixer maintains lists of Clean Authors, Waiting Authors, and Waiting Publishers. Any moderator can volunteer to work on one or more "clean" authors or publishers. To volunteer, leave a message on the robot maintainer's Talk page.

Activity