Revision as of 21:33, 6 October 2019 by ErsatzCulture (talk | contribs) (Add link to discussion)

I think I've seen a few comments on this Wiki, or in the bugs/feature requests, about the low usage of tagging, and I agree. (This is the main one I'm thinking of, although that content will get archived and the link will break sooner or later.) To try to help push that area forward, I'm in the middle of hacking together some tooling to show me where I might be able to improve the tag data.

Report on books in my Goodreads collection that have no or few tags in ISFDB

(This will probably be of little use or interest to anyone who doesn't track their collection/reading on Goodreads.)

I have a script that goes through a CSV export from Goodreads, tries to match up the books against a local copy of the ISFDB database, and reports on any books that have no tags, have few tags, or don't have any of what I call "core" tags (the main/top-level genres covered on ISFDB, e.g. "science fiction", "fantasy", "horror" and a couple of others).
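As a rough illustration of the kind of check described above (this is a minimal sketch, not the actual script): the `Title` and `Author` column names, the `FEW_TAGS_THRESHOLD` value, and the `isfdb_tags` dict standing in for the local ISFDB database lookup are all assumptions made for the example.

```python
import csv
import io

# Core genres named in the text; the real list includes "a couple of others".
CORE_TAGS = {"science fiction", "fantasy", "horror"}
# Assumption: "few" means fewer than this many tags.
FEW_TAGS_THRESHOLD = 3


def tag_issues(tags):
    """Return the list of problems found in one book's ISFDB tags."""
    issues = []
    if not tags:
        issues.append("no tags")
    else:
        if len(tags) < FEW_TAGS_THRESHOLD:
            issues.append("few tags")
        if not CORE_TAGS.intersection(tag.lower() for tag in tags):
            issues.append("no core tag")
    return issues


def report(csv_file, isfdb_tags):
    """Yield (author, title, issues); issues is None for unmatched books.

    isfdb_tags is a hypothetical stand-in for the database lookup:
    a dict mapping (author, title) to a list of tag names.
    """
    for row in csv.DictReader(csv_file):
        author, title = row["Author"], row["Title"]
        tags = isfdb_tags.get((author, title))
        yield author, title, (None if tags is None else tag_issues(tags))


# Sample data taken from the example output on this page.
SAMPLE_CSV = (
    "Title,Author\n"
    "Starplex,Robert J. Sawyer\n"
    "The Handmaid's Tale,Margaret Atwood\n"
    "Emily Eternal,M.G. Wheaton\n"
)
SAMPLE_TAGS = {
    ("Robert J. Sawyer", "Starplex"): [],
    ("Margaret Atwood", "The Handmaid's Tale"): [
        "sexism", "dystopia", "into-tv", "into-movie",
        "list NPR Top100 (2011)", "misogyny", "near future",
    ],
}

for author, title, issues in report(io.StringIO(SAMPLE_CSV), SAMPLE_TAGS):
    if issues is None:
        print(f"Could not find {author}/{title} - skipping")
    elif issues:
        print(f"{title} by {author}: {', '.join(issues)}")
```

Returning a list of issues rather than a single label means a book can be flagged as both short on tags and missing a core tag at the same time.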

Some example output (the real output is colour coded, so this extract is less legible than it might otherwise be):

   ./ -f read -f science-fiction
   Diaspora by Greg Egan has 10 tags in ISFDB 
   ERROR:root:Could not find M.G. Wheaton/Emily Eternal in ISFDB - skipping
   Lock In by John Scalzi has 3 tags in ISFDB 
   Children of Ruin by Adrian Tchaikovsky has 5 tags in ISFDB 
   Europe at Midnight by Dave Hutchinson has 4 tags in ISFDB 
   Ubik by Philip K. Dick has 2 tags in ISFDB 
   A Memory Called Empire by Arkady Martine has 2 tags in ISFDB 
   Semiosis by Sue Burke has 3 tags in ISFDB 
   Apex by Ramez Naam has 2 tags in ISFDB 
   Starplex by Robert J. Sawyer has 0 tags in ISFDB 
   ERROR:root:Could not find Neal Asher/Gridlinked: An Agent Cormac Novel 1 in ISFDB - skipping
   The City in the Middle of the Night by Charlie Jane Anders has 2 tags in ISFDB 2486327 
   The Left Hand of Darkness by Ursula K. Le Guin has 20 tags in ISFDB 
   ERROR:root:Could not find C.L. Moore/Doomsday Morning in ISFDB - skipping
   Shadow Captain by Alastair Reynolds has 1 tags in ISFDB 
   The Handmaid's Tale by Margaret Atwood has 7 tags in ISFDB 
   The Handmaid's Tale lacks a core tag: sexism,dystopia,into-tv,into-movie,list NPR Top100 (2011),misogyny,near future 
   ERROR:root:Could not find Aliette de Bodard/The Tea Master and the Detective in ISFDB - skipping
   Embers of War by Gareth L. Powell has 2 tags in ISFDB 
   The Fountains of Paradise by Arthur C. Clarke has 4 tags in ISFDB 
   ... etc ...

The current version is pretty dumb/lazy when it comes to matching up author names (e.g. "C.L. Moore" vs "C. L. Moore") or handling title discrepancies (e.g. "Gridlinked" vs "Gridlinked: An Agent Cormac Novel 1"), hence the "ERROR"s above.
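One possible way to make the matching more forgiving would be to compare normalised keys instead of the raw strings. The following is only a sketch of that idea, not something the current script does; the function names `author_key` and `title_key` are made up for the example.

```python
import re


def author_key(name):
    """Drop periods and whitespace, then case-fold: 'C. L. Moore' -> 'clmoore'."""
    return re.sub(r"[.\s]+", "", name).casefold()


def title_key(title):
    """Keep only the part before the first colon, collapse spaces, case-fold."""
    main_title = title.split(":", 1)[0]
    return " ".join(main_title.split()).casefold()


# The two mismatches mentioned above now compare equal.
print(author_key("C.L. Moore") == author_key("C. L. Moore"))
print(title_key("Gridlinked") == title_key("Gridlinked: An Agent Cormac Novel 1"))
```

The obvious trade-off is false matches: cutting a title at the first colon will conflate different books that share the same leading words, so a key match would probably still need checking against the author as well.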

The idea is that I run this script and can then easily click on the ISFDB URL for any book that it thinks could do with improvement.

The code for this report is in my GitHub repos here and here, but requires a fair bit of technical knowledge to set up. If anyone wants me to run it against their own Goodreads data, just make the CSV export of your collection available and I'll happily run the script against it and give you the results back.

(There's no technical reason that something like this couldn't be built into ISFDB itself, so that you could upload the CSV to the site. However, this processing is relatively database-intensive, and I don't know that running that sort of high-load job on a public-facing site would be a good idea.)