User talk:Fixer/Archive/2008

From ISFDB

Welcome!

Hello, Fixer/Archive/2008, and welcome to the ISFDB Wiki! I hope you like the place and decide to stay. Here are some pages that you might find helpful:

I hope you enjoy editing here! Please sign your name on talk pages using four tildes (~~~~); this will insert your name and the date. If you need help, check out the community portal, or ask me on my talk page. Again, welcome! MHHutchins 06:19, 27 November 2008 (UTC)

Fixer says "Thanks!" while adding record number 288,005 to the target list. As you can imagine, he is too busy munching on data to answer questions here, so he authorized me to speak on his behalf, at least for now :) Ahasuerus 18:26, 27 November 2008 (UTC)
Fixer is a "he"? I never would have guessed. I am curious about the name, though: while there are multiple meanings for "fix" and "fixer", none seem to match up with what your bot does, unless it's the source of product for the penguins. That reminds me, I have not heard any news about what Dextre has been up to recently: http://www.asc-csa.gc.ca/eng/missions/sts-123/dextre.asp --Marc Kupper|talk 21:57, 27 November 2008 (UTC)
Well, the original idea behind Fixer was to automate mundane housekeeping tasks like adding EDITOR Titles to MAGAZINE pubs, thus "Fixer". I even managed to teach it to find and merge duplicate titles some time during the summer. However, like so many other projects, this one mutated along the way and now most of Fixer's logic deals with finding new ISBNs at library catalogs and online stores like Amazon.
As far as Fixer's gender goes, let's be realistic. How many female contributors do we have here again? Besides, I am not even sure that Fixer belongs to a species with 2 sexes. The way he talks to 50 catalogs at the same time suggests almost Eddorian abilities... Ahasuerus 00:49, 28 November 2008 (UTC)

Using this page for comments

Just thought I'd go through the formalities of creating a talk page for your new incarnation. Should we use this page to bring up any issues or concerns when approving (or disapproving, as the case may be) these automated entries? MHHutchins 06:21, 27 November 2008 (UTC)

Sure, that will work. I check "Recent Changes" all the time and it's probably better to keep things separate so that verification requests and administrivia do not muddy the waters. Ahasuerus 18:28, 27 November 2008 (UTC)

Next of Kin by Russell

The steps taken to approve this submission:

  1. Removed the "large print" from the title (in the pub record and the title record) MHHutchins 06:27, 27 November 2008 (UTC)
    Unfortunately, there is no easy way to determine where the real title ends and where Amazon's "creative contribution" begins. Every time I think I have found a reliable way to separate the two, they come up with something different. My fear is that if I make the AI too complicated, it will eventually start dismembering legitimate titles, so I figure it's better to leave these issues for human moderators to handle. I'll probably add code to strip "known offenders" like "(Large Print)" later today, though. Ahasuerus 18:51, 27 November 2008 (UTC)
  2. Changed the author name from "Eric, Frank Russell" to "Eric Frank Russell" (in the pub record and the title record) MHHutchins 06:27, 27 November 2008 (UTC)
    This is another example of something that Amazon does that I have no control over. If they add random punctuation or misspell words, I have no way of telling what the intent is, so I can't do anything about it programmatically :( Ahasuerus 18:51, 27 November 2008 (UTC)
    After stripping the "(role)" which is in parentheses you could hunt through the authors table to see if the Amazon author string could be arranged to match a canonical author. If so, rearrange if needed and add a note that's one of
    • Amazon credits "abc" which matched "abc" in ISFDB and is assumed to be the same.
    • Amazon credits "abc" which is assumed to be "cba" in ISFDB. (a rearrangement matched)
    • Amazon credits "abc" which was not found in ISFDB. The author credit may be in error or this is a new author.
    The same logic would be used for processing titles so that the moderator knows if manual merges, etc. are needed. Titles are far more likely not to match, as people append series names, etc., and sometimes a series is named after its first book, meaning parts of the Amazon title will be found in a scan of titles for the selected author.
    My own author and title parsing logic looks for a "(" and whacks off the entire remainder of the string (trimming off the trailing space too). I've been using this method for years to look up the Fantastic Fiction and AbeBooks records and it's very rare that I end up pointing at no record or the wrong record as a result. --Marc Kupper|talk 21:31, 27 November 2008 (UTC)
    I am hesitant to delete everything after the first "(" since, as you pointed out, the parenthetical section often contains useful information, e.g. "(Vampire Lover #7)". I guess I could move it to Notes, but won't that increase the probability that the approving moderator will miss it entirely? As far as the Author field goes, Amazon abuses middle initials and punctuation in all kinds of creative ways -- see the Russell example in this section -- so I am afraid to touch it for fear of confusing the approving moderator. Hm, let me sleep on it... Ahasuerus 03:32, 28 November 2008 (UTC)
  3. Removed the Browse node, but kept the link to amazon. (I wasn't sure how to handle the pub notes. Let me know what should be kept and what should be removed.) MHHutchins 06:27, 27 November 2008 (UTC)
    Well, the original idea was that everything below the words "MODERATOR NOTES:" would be deleted after approval/massaging, but if you think that something should be preserved, let me know and I will move it above that line. In this case, the reason that I included a link back to Amazon was to make it easier for the approving moderator to see what the Amazon record says -- e.g. check user reviews to see whether the book is really SF -- without having to open a separate window and search Amazon. Once the pub has been approved, it is automatically linked to Amazon.com (and other online sources) via the navbar on the left, which works off the pub's ISBN, so I figured another link to Amazon would be redundant. But again, nothing is cast in stone and all ideas are welcome! Ahasuerus 18:51, 27 November 2008 (UTC)
    One suggestion is that the initial "Data from Amazon.com" be "Data uploaded from Amazon.com 2008-11-27 13:09" so that down the road someone can compare this record against Amazon to see if things have changed. For example, the price often changes. --Marc Kupper|talk 21:31, 27 November 2008 (UTC)
    I like the idea of adding the date too. BLongley 00:11, 28 November 2008 (UTC)
    Good point, added! Ahasuerus 01:29, 28 November 2008 (UTC)
    I personally like the browse nodes stuff. --Marc Kupper|talk 21:31, 27 November 2008 (UTC)
    Amazon is apparently in the process of making Browse Nodes much more dynamic and therefore less reliable, which is one reason why I decided to pounce sooner rather than later. Also, leaving the Browse Node information in the Notes field after approval would increase the size of our backup file significantly. Ideally, we would create tags based on that information before blowing the details away. Unfortunately, you can't trust Amazon's node and subject codes any more than you can trust the rest of their data :( Ahasuerus 01:29, 28 November 2008 (UTC)
    BTW, the "Detailed information available here" link should not include your Amazon subscription ID as that's somewhat of a "secret." I personally link by the ISBN/ASIN using http://www.amazon.com/exec/obidos/ASIN/#isbn# as it's a short URL format that's been used by external links to Amazon for so long it's likely they will never remove support for it. --Marc Kupper|talk 21:31, 27 November 2008 (UTC)
    Thanks, I will fix it shortly! I am using a throwaway account in case Amazon disapproves of my activities, but still. Their guidelines are somewhat ambiguous in this area. Ahasuerus 01:29, 28 November 2008 (UTC)
    Fixed, although I had to use the ASIN field and not the ISBN field since they occasionally differ. Supposedly, Amazon.com has been known to change ASINs on rare occasions, but I don't think we have to worry about that too much. Ahasuerus 03:32, 28 November 2008 (UTC)
  4. Merged new title record with the current title record. MHHutchins 06:27, 27 November 2008 (UTC)
    As this example makes abundantly clear, Amazon's data is so dirty that it's hard to be sure what the actual title is and who the author of the book is. If I were to try to create an "Add Pub" submission instead of a "New Pub" one, I estimate that I would be linking to a wrong title in a small number of cases. I can do it either way, but would you say that it's better to merge new titles all the time, which would ensure a low error rate, or to unmerge incorrectly merged titles when warranted? The latter would involve fewer cases and therefore would be faster, but it would also be likely more error-prone. Ahasuerus 18:51, 27 November 2008 (UTC)
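The author-matching suggestion from step 2 above can be sketched roughly as follows. This is only an illustration, not Fixer's actual code: the function names, the "(Author)" role suffix, and the in-memory set standing in for the authors table are all assumptions. It cuts the credit at the first "(", tries an exact match, then tries rearrangements of the name parts, and returns one of the three proposed notes for the moderator.

```python
import itertools


def strip_parenthetical(text: str) -> str:
    """Cut the string at the first "(" and trim trailing whitespace."""
    return text.split("(", 1)[0].rstrip()


def match_author(amazon_credit: str, canonical_authors: set[str]) -> tuple[str, str]:
    """Return (author name to submit, note for the approving moderator).

    `canonical_authors` is a hypothetical stand-in for a lookup against
    the authors table.
    """
    cleaned = strip_parenthetical(amazon_credit)
    if cleaned in canonical_authors:
        return cleaned, (
            f'Amazon credits "{cleaned}" which matched "{cleaned}" '
            "in ISFDB and is assumed to be the same."
        )
    # Drop stray commas, then try every ordering of the name parts.
    parts = cleaned.replace(",", " ").split()
    if len(parts) <= 5:  # permutations grow factorially; skip very long credits
        for ordering in itertools.permutations(parts):
            candidate = " ".join(ordering)
            if candidate in canonical_authors:
                return candidate, (
                    f'Amazon credits "{cleaned}" which is assumed to be '
                    f'"{candidate}" in ISFDB. (a rearrangement matched)'
                )
    return cleaned, (
        f'Amazon credits "{cleaned}" which was not found in ISFDB. '
        "The author credit may be in error or this is a new author."
    )


# The credit from this section, with an assumed "(Author)" role suffix.
name, note = match_author("Eric, Frank Russell (Author)", {"Eric Frank Russell"})
print(name)
print(note)
```

Because the note always quotes what Amazon actually said, the moderator can still see the original credit even when the sketch rearranges it. Cutting at the first "(" also discards series information such as "(Vampire Lover #7)", which is the trade-off raised in the replies above.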

Pretties

I'll leave the submissions in the queue. You are not checking to see if the title exists but maybe that's because the title includes "(Thorndike Press Large Print Literacy Bridge Series)" and that was assumed to be part of the title. --Marc Kupper|talk 08:51, 27 November 2008 (UTC)

Yup -- see my responses in the Russell section above. Ahasuerus 18:52, 27 November 2008 (UTC)

Capitalisation and special characters

These needed fixing: "Harry Potter and the Half-blood Prince" - capitalised "Blood", "ATLANTIS" - changed to Initial Capital only, "Alice's Adventures in Wonderland" - changed to real apostrophe. Could any of these have been easily done before submission? BLongley 12:06, 29 November 2008 (UTC)

Also "The Witch's Boy" and "Lucinda's Secret". BLongley 13:57, 29 November 2008 (UTC)
Apostrophes are a known issue since XML doesn't like unescaped single quotes and the Web API doesn't like it when I escape them. If I can't find a way around it today, I'll have to ask Al to look into it.
As far as capitalization goes, there are cases of weird-yet-legitimate capitalization that the software can't predict, e.g. earlier today Fixer submitted Screams BeNeath Pandora and there are always things like The Man from U.N.C.L.E. or SuperHero ABC. If I were to change the logic to enforce "regular" capitalization, I suspect that we would garble most of these cases. The only sub-case that I think is safe to always capitalize is the post-hyphen one. Ahasuerus 14:06, 29 November 2008 (UTC)
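A minimal sketch of the post-hyphen rule, assuming it only uppercases a lowercase letter that directly follows a letter and a hyphen, and touches nothing else. The function name is hypothetical and this is not Fixer's actual code.

```python
import re


def capitalize_after_hyphen(title: str) -> str:
    """Uppercase a lowercase letter that directly follows "<letter>-"."""
    return re.sub(
        r"(?<=[A-Za-z]-)[a-z]",
        lambda match: match.group(0).upper(),
        title,
    )


print(capitalize_after_hyphen("Harry Potter and the Half-blood Prince"))
# Unusual but legitimate capitalization is left alone:
print(capitalize_after_hyphen("Screams BeNeath Pandora"))
print(capitalize_after_hyphen("The Man from U.N.C.L.E."))
```

Everything outside the hyphen case is passed through unchanged, so titles like "ATLANTIS" would still need a moderator's attention.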

Animorphs / Submission order / Large print

It looks like no Mod wants to touch the Animorphs submissions. I'm not happy to add "Katherine Applegate" as a variant of "K. A. Applegate", which the covers seem to show; I think Amazon might be looking at Copyright pages, which do show the "Katherine" now. Unfortunately Look-inside isn't showing Title pages so there's no real way to check. I'd abandon those entries for some things we're happier to deal with. BLongley 01:01, 20 December 2008 (UTC)

OCLC reports that the "responsibility" -- which usually reflects what's printed on the title page -- is "K. A. Applegate"'s, so I have approved the submissions and changed the name to the "K. A." version. Ahasuerus 04:38, 21 December 2008 (UTC)

Which brings me on to a second topic: can you order the submissions by title and author? I've seen a group of submissions that I can recognise as being by the same author, and can go do all the merges at once after approving the entries rather than do them individually, and recently you've added 2 editions of the same title on the same day: but you've been adding several L. Frank Baum and Roger Zelazny titles over multiple days, whereas they could be more easily dealt with in one go. BLongley 01:01, 20 December 2008 (UTC)

I am submitting records in the order they were downloaded from Amazon.com. Since I was downloading by "browse node" and then by subject, we are handling large print reprints at the moment, which I thought was a decent start since most of them would be well known books and easy to adjudicate. However, I can order submissions in all kinds of ways, we just need to decide what we want to be processed first. Let me see how many, um, let's say Zelaznys we have in the queue...
I have been extremely busy making the world safe for genre bibliography over the last few weeks, so all I could do at night was create 10-20 submissions, reject the obviously non-genre stuff and perhaps approve 1 or 2 submissions. I am likely to remain equally busy through mid-January, but I will have a few hours on weekends and during the holidays, so I can make relatively simple changes to Fixer. Ahasuerus 04:38, 21 December 2008 (UTC)
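For illustration only, ordering the queue by author and then title, and counting how many records each author has waiting, might look like the sketch below. The `Submission` record and both function names are assumptions, and the sample entries are made up for the example rather than taken from the real queue.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """Hypothetical minimal record for one queued submission."""
    author: str
    title: str


def order_for_moderation(queue: list[Submission]) -> list[Submission]:
    """Sort case-insensitively by author, then by title."""
    return sorted(queue, key=lambda s: (s.author.casefold(), s.title.casefold()))


def count_by_author(queue: list[Submission]) -> Counter:
    """Count queued submissions per author string."""
    return Counter(s.author for s in queue)


# Illustrative entries in download order, not actual queue contents.
queue = [
    Submission("Roger Zelazny", "Lord of Light"),
    Submission("L. Frank Baum", "Ozma of Oz"),
    Submission("Roger Zelazny", "Damnation Alley"),
]
for submission in order_for_moderation(queue):
    print(submission.author, "/", submission.title)
print(count_by_author(queue)["Roger Zelazny"])
```

Sorting on the raw author string groups records only when Amazon spells the name consistently, so a credit like "Eric, Frank Russell" would still land away from its siblings.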

Oh, and while I'm moaning, can you find something more interesting than Large Print editions of stuff we've already got? It's always more interesting for me when I discover a new title or especially a new author during Moderating Activity. Otherwise, I seem quite capable of finding such by looking at obscure anthologies and searching a little harder. BLongley 01:01, 20 December 2008 (UTC)

Unfortunately, looking for new authors in the 290,000 records that we have sitting in the queue is likely to produce more cases similar to "Katherine Applegate", but I can give it a try :) Ahasuerus 04:38, 21 December 2008 (UTC)

Can Fixer take requests?

One of the things that comes up with the DAW list is I'll know the ISBN of a book and would like to have it uploaded to ISFDB. For example the title Moon In The Mirror does not include a publication record for Amazon.com 075640486X which is a paperback reprint. I've been uploading these by hand. --Marc Kupper|talk 00:53, 30 December 2008 (UTC)

If you happen to have a list of ISBNs that we are missing, Fixer can use it to upload any matching records that I have on file, i.e. the 290,000 ASINs that I have downloaded from Amazon.com. If any of the ISBNs are not among the 290,000 on file, Fixer can query Amazon.com directly and submit the results, but that will take a bit more time. Ahasuerus 04:49, 30 December 2008 (UTC)
Maybe that'll motivate me to write that auto-snagging of data from Amazon I've been meaning to do for a couple of years... In my personal book db I have a set of fields that capture selected Amazon data, and I keep running into the trade-off between putting time into keeping the db up to date manually and putting time into writing code that would automate some of that. --Marc Kupper|talk 23:23, 31 December 2008 (UTC)
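The request workflow described above (use records already on file where possible, query Amazon.com for the rest) could be sketched like this. The function names and the set of on-file identifiers are assumptions. The check-digit test is the standard ISBN-10 rule and is included only to weed out mistyped requests before they reach Amazon.

```python
def is_valid_isbn10(isbn: str) -> bool:
    """Standard ISBN-10 check: the weighted sum (weights 10..1) is divisible by 11."""
    isbn = isbn.replace("-", "").strip().upper()
    if len(isbn) != 10:
        return False
    total = 0
    for position, char in enumerate(isbn):
        if char.isdigit():
            value = int(char)
        elif char == "X" and position == 9:
            value = 10  # "X" is only legal as the final check character
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0


def partition_requests(requested: list[str], on_file: set[str]):
    """Split requested ISBNs into (on file, needs a direct query, rejected)."""
    found, to_query, rejected = [], [], []
    for raw in requested:
        isbn = raw.replace("-", "").strip().upper()
        if not is_valid_isbn10(isbn):
            rejected.append(raw)
        elif isbn in on_file:
            found.append(isbn)
        else:
            to_query.append(isbn)
    return found, to_query, rejected


# `on_file` stands in for the downloaded records; it is empty here, so the
# ISBN from the request above ends up in the "needs a direct query" list.
print(partition_requests(["075640486X"], on_file=set()))
```

Since ASINs and ISBNs occasionally differ, as noted in the Russell section, a lookup keyed on ISBN alone may miss a few records that are in fact on file.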

Related would be if I could give Fixer an ISBN where a record already exists and Fixer would report if things like the price have changed. It's pretty common that a book will be announced at one price and later released at a different (and nearly always higher) price or for an in-print title for there to be a price increase. --Marc Kupper|talk 00:53, 30 December 2008 (UTC)

Amazon.com allows you to retrieve different types of prices: "current prices", "offer prices", "list prices", etc. I thought that Fixer was capturing list prices, but I'll have to double check. I would be leery of asking Fixer to create automatic submissions for price changes since Amazon's price data is not the most reliable in the world, but we can give it a shot and see what happens. Ahasuerus 04:49, 30 December 2008 (UTC)
I've generally found Amazon's list prices to be accurate but that the publishers are indecisive, particularly with yet to be released titles. The only data field I'm entirely skeptical of is the page count and then the binding type. --Marc Kupper|talk 23:23, 31 December 2008 (UTC)
I've got lucky this week in that I've actually been paid to research web-service capabilities from our current (work) infrastructure. So I chose Amazon as an example (as the in-house capabilities aren't available yet), and therefore wasted two days dealing with Proxy Authentication and WSDL incompatibilities, etc. But I've seen list-prices from Amazon and even when they're not the generally available ones (they won't list them for things they can't sell) there's no way to match them to anything official for many years. There might be some years you can trust, but I'd not yet take an Amazon price of any kind over something we have here that might have come from Locus or a publisher website. Even I (as an "If you tell people where the data came from, they'll trust it as it's got a Source" sceptic) would add notes. Actually, I wouldn't add it unless it had a new confirmable ISBN of something we already have. Which is probably why I don't do that - another version of something we already have is not really interesting. BLongley 00:35, 10 January 2009 (UTC)
Just a note that I have confirmed that Fixer extracts "Listed Price" from the XML. How accurate Amazon's data is, well, that's a different question :) Ahasuerus 06:27, 11 January 2009 (UTC)
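Given the doubts above about Amazon's prices, one cautious way to handle the price-change request is to produce a dated note for a human to review rather than an automatic submission. The sketch below assumes prices are stored as strings like "$7.99"; the function names and the sample prices are hypothetical.

```python
from decimal import Decimal
from typing import Optional


def parse_price(text: str) -> Decimal:
    """Parse a price such as "$7.99" (a leading dollar sign is assumed)."""
    return Decimal(text.strip().lstrip("$"))


def price_change_note(isbn: str, recorded: str, amazon_list: str,
                      checked_on: str) -> Optional[str]:
    """Return a note for a moderator if the prices differ, else None."""
    if parse_price(recorded) == parse_price(amazon_list):
        return None
    return (
        f"ISBN {isbn}: recorded price {recorded}, "
        f"Amazon list price {amazon_list} as of {checked_on}."
    )


# Made-up prices for illustration; only the ISBN comes from this section.
print(price_change_note("075640486X", "$7.99", "$8.99", "2008-12-30"))
```

Including the check date in the note follows the earlier suggestion to timestamp data pulled from Amazon, so a later reader can tell how stale the comparison is.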