User:Fixer/Python MySQL Migration

From ISFDB
< User:Fixer
Revision as of 15:12, 24 December 2023 by Ahasuerus (talk | contribs) (2023-12-24 update)
Jump to navigation Jump to search
  • 2023-08-01: Since the current version of the Fixer software depends on Windows 10 to function and Windows 10 will no longer be getting security patches after 2025-10-25, we need to migrate all of Fixer's processes to Linux, Python and MySQL. At this time, the ISBN/ASIN identification process and the Fixer-Amazon API use Python.
  • 2023-08-02: Tentative migration plan is as follows:
    1. Move Fixer's data from "data stores" to MySQL tables on Linux. At that point the Fixer software should be talking to MySQL using ODBC. - The ODBC step was removed in late 2023
    2. Rewrite the current version of the Fixer software in Python 3, which will eliminate Windows dependencies and make it possible for other developers to maintain/upgrade Fixer's code if something happens to me. The UI will still be "roll-and-scroll" at that point.
    3. Upgrade Fixer's Amazon interface to support multiple concurrent users, which will require a single-threaded request queue due to Amazon throttling.
    4. Upgrade the UI from "roll-and-scroll" to regular HTML.
    5. Integrate the new HTML interface with the ISFDB code; leverage ISFDB/Wiki account authentication.
    6. Deploy the new Fixer on isfdb2.org and make it available to moderators.
  • 2023-08-03. Fixer-MySQL communications via ODBC appear to be more problematic than expected due to 32 bit/64 bit incompatibilities with the development server's version of MySQL (5.5 32-bit). Upgrading MySQL to 8.0 may be needed for it to work.
  • 2023-08-04. Fixer-Amazon API upgraded to be compatible with Python 3.
  • 2023-08-05. Started designing SQL tables for the MySQL-based version of Fixer.
  • 2023-08-05/09. Download of MARC21 archives made publicly available by major libraries like LOC, BL and DNB. We may be able to leverage them once Fixer is fully migrated.
    • Note that MARC21 files can be read and parsed using the Python library Pymarc -- see documentation. Versions 5+ of Pymarc require Python 3.7+.
  • 2023-08-06. Created a new MySQL database where new Fixer tables and copies of publicly available ISFDB tables will coexist.
  • 2023-08-07. Backed up, deleted and consolidated unused Fixer data stores. Moved remaining Queue 4 and 5 ISBNs/ASINs to Queues 1-3; deleted Queues 4 and 5.
  • 2023-08-09. Finished the first iteration of the MySQL table design.
  • 2023-08-09/10. Upgraded the process of exporting Fixer's data stores to XML.
  • 2023-08-10. Started work on importing Fixer-generated XML into MySQL.
  • 2023-08-12. Upgraded XML dumps made public as part of the 2023-08-12 backups.
  • 2023-11-26. (Hopefully) finished the software process of importing Fixer's core data into MySQL.
  • 2023-12-22. Copied Fixer's ISBN and ASIN data stores to MySQL. It took 22 hours using HDD. Moving the MySQL databases to SSD should cut it down to around 3 hours.

Outstanding tasks:

  • Reconciliation of weekly live backups with Fixer's data
  • Copy lists of "recognized", "low priority", "public domain", "clean" and "waiting" publishers to MySQL
  • Copy lists of "public domain", "clean" and "waiting" authors to MySQL
  • Core internal reports:
    • Show AddPub/NewPub breakdown
    • List ISBNs/ASINs alphabetized by title (primarily used for submitting light novels)
    • ISBN/ASIN counts by period
    • Format/binding breakdown
  • Submission options:
    • Submit a single ASIN
    • Submit high priority ISBNs for a period
    • Submit ASINs for a field
    • Submit ISBNs OR ASINs for one author
    • Submit ISBNs AND ASINs for one author
    • Unsubmit an ASIN
  • Automated priority assignment:
    • Assign priority to selected ASINs
    • Assign priority to high priority ASINs
  • Author maintenance:
    • Add an author to the list of Public Domain Authors
    • Add an author to the list of Clean Authors
    • Add an author to the list of Waiting Authors
  • Publisher maintenance:
    • Add a publisher to the list of Waiting Publishers
    • Add a publisher to the list of Recognized Publishers
    • Add a publisher to the list of Public Domain Publishers
    • Add a publisher to the list of Low Priority Publishers
  • Display lists of authors and publishers
  • Look up an author name to see which lists s/he is a part of
  • Look up a publisher name to see which lists it is a part of
  • Minor and/or rarely used reports:
    • Sort ASINs by publisher
    • Sort ASINs by month
    • Breakdown by binding
    • Author breakdown for a field
    • Year breakdown for a field
    • Display stats for an author list
    • Display Top 10 breakdown by browse node
    • List top ASINs for browse node
  • Clean/waiting authors and publishers:
    • Submit ASINs for clean authors
    • Submit ASINs for clean publishers
    • Submit ASINs for waiting authors
    • Display author breakdown for waiting publishers
    • Display queue breakdown for clean authors
    • Display queue breakdown for waiting authors
    • Display queue breakdown for waiting publishers
  • Monthly download:
    • Import ASINs from a flat file
    • Count of new ASINs/ISBNs that need to be reconciled with the Amazon API
    • Query the Amazon API for new ASINs/ISBNs
    • Count of previously unrecognized ASINs/ISBNs with browse node data
    • Query API for previously unrecognized ASINs/ISBNs with browse node data