User:ErsatzCulture/GoodreadsExportCheck

From ISFDB

Goodreads lets you export your book collection as a CSV file. I wrote a set of Python command-line tools to query and report on the data in these exports.

I have a couple of rough works-in-progress that also compare this CSV data against a local copy of the ISFDB database, with the intention of using them to flag up possible errors and omissions in ISFDB. (Obviously, in many cases, any inconsistency is at least as likely to be on the Goodreads side.) These use some more Python code I wrote to work with an ISFDB database.
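
One practical wrinkle in any ISBN-based comparison is that the same edition can be recorded as an ISBN-10 on one side and an ISBN-13 on the other. The following is a minimal sketch of converting between the two, not the actual tool code; the function name `isbn10_to_isbn13` is made up for illustration.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-character ISBN to its 13-digit "978" equivalent."""
    digits = isbn10.replace("-", "").strip()
    if len(digits) != 10:
        raise ValueError(f"not an ISBN-10: {isbn10!r}")
    # Drop the old check character (which may be 'X'), prefix with 978,
    # then recompute the check digit using alternating weights of 1 and 3.
    body = "978" + digits[:9]
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(body))
    return body + str((10 - total % 10) % 10)
```

For example, `isbn10_to_isbn13("1787331679")` returns `"9781787331679"`, which matches the pair of ISBNs Goodreads reports for 'Machines Like Me' in the sample output further down.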

None of this is yet ready to be made publicly available, and it is inherently database-intensive, so it is probably not something you would want to offer to everyone on the public ISFDB website. That said, if anyone cares to help me out by providing a copy of their Goodreads data for me to test with, that would be appreciated. (Warning: I absolutely will look through the data and make judgements on how shocking your taste in books is ;-)

Check Goodreads CSV for ISBNs not in ISFDB

Sample output:

   goodreads_analysis $ ./check_goodreads_isbns.py -f science-fiction
   ERROR:root:No ISBN known for 'Machines Like Me' by Ian McEwan [2019], read (1787331679 / 9781787331679)
   ERROR:root:No ISBN known for 'The Power' by Naomi Alderman [2016], to-read (0241983444 / None)
   ERROR:root:No ISBN known for 'The City in the Middle of the Night' by Charlie Jane Anders [2019], read (None / 9781789091618)
   ERROR:root:No ISBN known for 'Slow Bullets' by Alastair Reynolds [2015], read (9781473218 / None)
   ERROR:root:No ISBN known for 'The Forever War' by Joe Haldeman [1974], read (1407230085 / 9781407230085)
   ERROR:root:No ISBN known for 'Dying Inside' by Robert Silverberg [1972], read (1591760097 / 9781591760092)
   ERROR:root:No ISBN known for '84K' by Claire North [2018], to-read (0356507386 / 9780356507385)
   ERROR:root:No ISBN known for 'The Human Division (Old Man's War #5)' by John Scalzi [2013], to-read (144729047X / None)
   ERROR:root:No ISBN known for 'The Mighty One: My Life Inside the Nerve Centre' by Steve MacManus [2016], read (1781084750 / 9781781084755)
   ERROR:root:No ISBN known for 'The Dark Forest (Remembrance of Earth’s Past, #2)' by Liu Cixin [2008], read (None / 9781784971588)
   ERROR:root:No ISBN known for 'Tau Zero' by Poul Anderson [1970], read (1407239139 / 9781407239132)

This presumes your Goodreads collection is shelved in a way (e.g. a "science-fiction" shelf, as in the example above) that makes it easy to filter out books that are not relevant to ISFDB; otherwise the check will generate a lot of irrelevant results.
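
The shelf filter and ISBN check can be sketched roughly as below. This is an illustration under stated assumptions, not the real `check_goodreads_isbns.py`: the column names (`Title`, `Author`, `ISBN`, `ISBN13`, `Bookshelves`) are what I believe the Goodreads export uses, the `="..."` wrapping of ISBN cells is likewise an assumption, and `known_isbns` is a plain set standing in for a lookup against the ISFDB database. All function names are hypothetical.

```python
import csv


def clean_isbn(raw):
    """Return a bare ISBN string, or None if the cell is empty.

    Assumes ISBN cells may be wrapped as ="0356507386" (a spreadsheet-friendly form).
    """
    value = (raw or "").strip()
    if value.startswith("="):
        value = value[1:]
    value = value.strip('"').strip()
    return value or None


def on_shelf(row, shelf):
    """True if the row's comma-separated Bookshelves cell includes the shelf."""
    shelves = {s.strip() for s in (row.get("Bookshelves") or "").split(",")}
    return shelf in shelves


def isbns_not_known(csv_file, shelf, known_isbns):
    """Yield (title, author, isbn10, isbn13) for shelved books with no known ISBN."""
    for row in csv.DictReader(csv_file):
        if not on_shelf(row, shelf):
            continue
        isbn10 = clean_isbn(row.get("ISBN"))
        isbn13 = clean_isbn(row.get("ISBN13"))
        candidates = [i for i in (isbn10, isbn13) if i]
        if not candidates:
            continue  # no ISBN at all on the Goodreads side, so nothing to look up
        if not any(i in known_isbns for i in candidates):
            yield row.get("Title"), row.get("Author"), isbn10, isbn13
```

A book is reported only when neither of its ISBNs is in the known set, and rows with no ISBN at all are skipped; both are design choices of this sketch. To try it, open the export with `open(path, newline="", encoding="utf-8")` and pass the file object in along with the shelf name and a set of ISBN strings.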

Report on books that don't have many tags on ISFDB

... TODO ...