https://isfdb.org/wiki/index.php?title=User:ErsatzCulture/Tagging&feed=atom&action=history
User:ErsatzCulture/Tagging - Revision history
2024-03-28T14:26:40Z
Revision history for this page on the wiki
MediaWiki 1.35.6
https://isfdb.org/wiki/index.php?title=User:ErsatzCulture/Tagging&diff=561765&oldid=prev
ErsatzCulture: Add link to discussion
2019-10-07T01:33:18Z
<p>Add link to discussion</p>
<table class="diff diff-contentalign-left diff-editfont-monospace" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 01:33, 7 October 2019</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1" >Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class='diff-marker'>−</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>I think I've seen a few comments on this Wiki or in the bugs/feature requests about the low usage of tagging, and I agree. To try to help push that area forward, I'm in the middle of hacking together some tooling to help me know where I might be able to improve the tag data.</div></td><td class='diff-marker'>+</td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>I think I've seen a few comments on this Wiki <ins class="diffchange diffchange-inline">- [[ISFDB:Community_Portal#Handling_erroneous_tags|this]] is the main one I'm thinking of, although that content will get archived and the link will break sooner or later - </ins>or in the bugs/feature requests about the low usage of tagging, and I agree. To try to help push that area forward, I'm in the middle of hacking together some tooling to help me know where I might be able to improve the tag data.</div></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"></td></tr>
<tr><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Report on books in my Goodreads collection that have no or few tags in ISFDB =</div></td><td class='diff-marker'> </td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>= Report on books in my Goodreads collection that have no or few tags in ISFDB =</div></td></tr>
<!-- diff cache key isfdb-mw_:diff::1.12:old-561508:rev-561765 -->
</table>
ErsatzCulture
https://isfdb.org/wiki/index.php?title=User:ErsatzCulture/Tagging&diff=561508&oldid=prev
ErsatzCulture: Initial page
2019-10-04T11:09:31Z
<p>Initial page</p>
<p><b>New page</b></p><div>I think I've seen a few comments on this Wiki or in the bugs/feature requests about the low usage of tagging, and I agree. To try to help push that area forward, I'm in the middle of hacking together some tooling to help me know where I might be able to improve the tag data.<br />
<br />
= Report on books in my Goodreads collection that have no or few tags in ISFDB =<br />
<br />
(This will probably be of little use or interest to anyone who doesn't track their collection/reading on Goodreads.)<br />
<br />
I have a script that goes through a CSV export from Goodreads, tries to match the books against a local copy of the ISFDB database, and reports on any books that have no tags, have few tags, or lack any of what I call "core" tags (the main top-level genres covered on ISFDB, e.g. "science fiction", "fantasy", "horror" and a couple of others).<br />
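<br />
The three checks described above could be sketched roughly as follows. This is a minimal illustration, not the code from the actual script: the function name <code>assess_tags</code>, the cut-off for "few" tags, and the exact contents of the core tag set are all assumptions made for the example (the "couple of others" core tags are deliberately left out because they are not listed here).<br />

```python
# Hypothetical sketch of the three checks; names and thresholds are assumptions.

# Only the three core genres named on this page; the real list has a couple more.
CORE_TAGS = {"science fiction", "fantasy", "horror"}

# Assumed cut-off: fewer tags than this counts as "few".
FEW_TAGS_THRESHOLD = 3


def assess_tags(tags, core_tags=CORE_TAGS, few_threshold=FEW_TAGS_THRESHOLD):
    """Return a list of issue labels for the tags attached to one title."""
    # Compare case-insensitively and ignore stray whitespace.
    normalised = {tag.strip().lower() for tag in tags}
    issues = []
    if not normalised:
        issues.append("no tags")
    elif len(normalised) < few_threshold:
        issues.append("few tags")
    # A title with no tags at all is already flagged, so only check
    # for a missing core tag when there is at least one tag.
    if normalised and not (normalised & core_tags):
        issues.append("lacks a core tag")
    return issues


if __name__ == "__main__":
    print(assess_tags(["dystopia", "near future", "misogyny"]))
```

A title with a healthy number of tags can still be flagged if none of them is a core genre, which is the case the "lacks a core tag" label is meant to catch.<br />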
<br />
Some example output (the real output is colour coded, so this extract is less legible than it might otherwise be):<br />
<br />
./check_isfdb_tags.py -f read -f science-fiction<br />
Diaspora by Greg Egan has 10 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?1399 <br />
ERROR:root:Could not find M.G. Wheaton/Emily Eternal in ISFDB - skipping<br />
Lock In by John Scalzi has 3 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?1658286 <br />
Children of Ruin by Adrian Tchaikovsky has 5 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?2528986 <br />
Europe at Midnight by Dave Hutchinson has 4 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?1916529 <br />
Ubik by Philip K. Dick has 2 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?948 <br />
A Memory Called Empire by Arkady Martine has 2 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?2500310 <br />
Semiosis by Sue Burke has 3 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?2300478 <br />
Apex by Ramez Naam has 2 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?1864582 <br />
Starplex by Robert J. Sawyer has 0 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?5748 <br />
ERROR:root:Could not find Neal Asher/Gridlinked: An Agent Cormac Novel 1 in ISFDB - skipping<br />
 The City in the Middle of the Night by Charlie Jane Anders has 2 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?2486327 <br />
The Left Hand of Darkness by Ursula K. Le Guin has 20 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?7662 <br />
ERROR:root:Could not find C.L. Moore/Doomsday Morning in ISFDB - skipping<br />
Shadow Captain by Alastair Reynolds has 1 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?2469371 <br />
The Handmaid's Tale by Margaret Atwood has 7 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?1816 <br />
The Handmaid's Tale lacks a core tag: sexism,dystopia,into-tv,into-movie,list NPR Top100 (2011),misogyny,near future http://www.isfdb.org/cgi-bin/title.cgi?1816 <br />
ERROR:root:Could not find Aliette de Bodard/The Tea Master and the Detective in ISFDB - skipping<br />
Embers of War by Gareth L. Powell has 2 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?2301133 <br />
The Fountains of Paradise by Arthur C. Clarke has 4 tags in ISFDB http://www.isfdb.org/cgi-bin/title.cgi?1903 <br />
... etc ...<br />
<br />
The current version is pretty dumb/lazy when it comes to matching up author names (e.g. "C.L. Moore" vs "C. L. Moore") or title discrepancies (e.g. "Gridlinked" vs "Gridlinked: An Agent Cormac Novel 1"), hence the "ERROR"s above.<br />
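<br />
One possible way to reduce these mismatches is to normalise both sides before comparing them. The sketch below is only an illustration of that idea, assuming hypothetical helper names (<code>normalise_author</code>, <code>normalise_title</code>, <code>same_book</code>); it is not how the current script works.<br />

```python
import re


def normalise_author(name):
    """Collapse spacing differences around initials, e.g. 'C.L. Moore' vs 'C. L. Moore'."""
    # Put a space after any full stop that is directly followed by a letter or digit,
    # then squeeze runs of whitespace and ignore case.
    spaced = re.sub(r"\.(?=\w)", ". ", name)
    return " ".join(spaced.split()).casefold()


def normalise_title(title):
    """Keep only the part before the first colon, dropping any series subtitle."""
    main_title = title.split(":", 1)[0]
    return " ".join(main_title.split()).casefold()


def same_book(author_a, title_a, author_b, title_b):
    """Return True if the two author/title pairs match after normalisation."""
    return (normalise_author(author_a) == normalise_author(author_b)
            and normalise_title(title_a) == normalise_title(title_b))


if __name__ == "__main__":
    print(same_book("Neal Asher", "Gridlinked: An Agent Cormac Novel 1",
                    "Neal Asher", "Gridlinked"))
```

Cutting titles at the first colon is a blunt instrument: it would also shorten titles where the colon is part of the real title, so a more careful version might only fall back to it after an exact match fails.<br />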
<br />
The idea is that I run this script, and can then easily click on the ISFDB URL for any book whose tags it thinks could do with improvement.<br />
<br />
The code for this report is in my GitHub repos [https://github.com/JohnSmithDev/GRAnalysis here] and [https://github.com/JohnSmithDev/ISFDB-Tools here], but requires a fair bit of technical knowledge to set up. If anyone wants me to run it against their own Goodreads data, just make the CSV export of your collection available and I'll happily run the script against it and give you the results back.<br />
<br />
(There's no technical reason why something like this couldn't be built into ISFDB itself, so that you could upload the CSV to the site. However, this processing is relatively database-intensive, and I'm not sure that running that sort of high-load job on a public-facing site would be a good idea.)</div>
ErsatzCulture