User:ErsatzCulture/Tagging
I've seen a few comments about the low usage of tagging, either on this Wiki (this is the main one I'm thinking of, although that content will get archived and the link will break sooner or later) or in the bugs/feature requests, and I agree. To help push that area forward, I'm in the middle of hacking together some tooling to show me where I might be able to improve the tag data.
Report on books in my Goodreads collection that have no or few tags in ISFDB
(This will probably be of little use or interest to anyone who doesn't track their collection/reading on Goodreads.)
I have a script that goes through a CSV export from Goodreads, tries to match the books against a local copy of the ISFDB database, and reports on any books that have no tags, have few tags, or lack any of what I call "core" tags (the main/top-level genres covered on ISFDB, e.g. "science fiction", "fantasy", "horror" and a couple of others).
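As a rough illustration of those three checks, here is a minimal sketch. It is not the code from the real script: the function name, the threshold for "few" tags, and the exact contents of the core tag set are all assumptions made for the example (only the three genres named above are included).

```python
# Hypothetical sketch: names, threshold and tag set are assumptions,
# not taken from the real check_isfdb_tags.py.
CORE_TAGS = {"science fiction", "fantasy", "horror"}  # the real list has a couple more
FEW_TAGS_THRESHOLD = 3  # assumed cut-off for "few tags"


def tag_problems(tags):
    """Return a list of human-readable problems with one title's tags."""
    normalised = {t.strip().lower() for t in tags}
    problems = []
    if not normalised:
        problems.append("has no tags")
    elif len(normalised) < FEW_TAGS_THRESHOLD:
        problems.append(f"has only {len(normalised)} tag(s)")
    # Only complain about a missing core tag when there are tags at all;
    # an untagged title is already reported above.
    if normalised and not (normalised & CORE_TAGS):
        problems.append("lacks a core tag")
    return problems


if __name__ == "__main__":
    print(tag_problems(["dystopia", "sexism", "near future", "misogyny"]))
    print(tag_problems([]))
```

A title with plenty of tags can still be flagged if none of them is a top-level genre, which is the case the "lacks a core tag" check is meant to catch.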
Some example output from my script (the real output is colour coded, so this extract is less legible than it might otherwise be):
./check_isfdb_tags.py -f read -f science-fiction
Diaspora by Greg Egan has 10 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?1399
ERROR:root:Could not find M.G. Wheaton/Emily Eternal in ISFDB - skipping
Lock In by John Scalzi has 3 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?1658286
Children of Ruin by Adrian Tchaikovsky has 5 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?2528986
Europe at Midnight by Dave Hutchinson has 4 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?1916529
Ubik by Philip K. Dick has 2 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?948
A Memory Called Empire by Arkady Martine has 2 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?2500310
Semiosis by Sue Burke has 3 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?2300478
Apex by Ramez Naam has 2 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?1864582
Starplex by Robert J. Sawyer has 0 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?5748
ERROR:root:Could not find Neal Asher/Gridlinked: An Agent Cormac Novel 1 in ISFDB - skipping
The City in the Middle of the Night by Charlie Jane Anders has 2 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?2486327
The Left Hand of Darkness by Ursula K. Le Guin has 20 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?7662
ERROR:root:Could not find C.L. Moore/Doomsday Morning in ISFDB - skipping
Shadow Captain by Alastair Reynolds has 1 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?2469371
The Handmaid's Tale by Margaret Atwood has 7 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?1816
The Handmaid's Tale lacks a core tag: sexism,dystopia,into-tv,into-movie,list NPR Top100 (2011),misogyny,near future http://www.isfdb.org/cgi-bin/title.cgi?1816
ERROR:root:Could not find Aliette de Bodard/The Tea Master and the Detective in ISFDB - skipping
Embers of War by Gareth L. Powell has 2 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?2301133
The Fountains of Paradise by Arthur C. Clarke has 4 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?1903
... etc ...
The current version is pretty dumb/lazy when it comes to matching up author names (e.g. "C.L. Moore" vs "C. L. Moore") or title discrepancies (e.g. "Gridlinked" vs "Gridlinked: An Agent Cormac Novel 1"), hence the "ERROR"s above.
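One way those two kinds of mismatch might be reduced is to normalise names and titles before comparing them. The following is only a sketch of that idea, not something the current script does. Both function names are hypothetical, and the "text before the first colon" fallback is an assumption that would wrongly shorten titles where the colon is part of the real name, which is why the full title is always tried first.

```python
# Hypothetical helpers; neither exists in the current script.

def normalise_author(name):
    """Make initials compare equal regardless of spacing, e.g.
    "C.L. Moore" and "C. L. Moore" both become "c. l. moore"."""
    spaced = name.replace(".", ". ")          # force a space after every full stop
    return " ".join(spaced.split()).casefold()  # collapse runs of whitespace


def title_candidates(title):
    """Return titles to try in order: the full title, then the part
    before the first colon (an assumed way to drop series subtitles)."""
    full = " ".join(title.split()).casefold()
    candidates = [full]
    head = full.split(":", 1)[0].strip()
    if head and head != full:
        candidates.append(head)
    return candidates


if __name__ == "__main__":
    print(normalise_author("C.L. Moore") == normalise_author("C. L. Moore"))
    print(title_candidates("Gridlinked: An Agent Cormac Novel 1"))
```

The ISFDB side of the comparison would need the same normalisation applied for the match to work. This would not help with books that are missing from the local database copy or recorded under a different name.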
The idea is that I run this script and can then easily click on the ISFDB URL for any book that it thinks could do with improvement.
The code for this report is in my GitHub repos here and here, but requires a fair bit of technical knowledge to set up. If anyone wants me to run it against their own Goodreads data, just make the CSV export of your collection available and I'll happily run the script against it and give you the results back.
(There's no technical reason why something like this couldn't be built into ISFDB itself, so that you could upload the CSV to the site. However, this processing is relatively database intensive, and I don't know that having that sort of high-load job on a public-facing site would be a good idea.)