Difference between revisions of "Category talk:Publishers"

From ISFDB
 
(48 intermediate revisions by 5 users not shown)
Line 39: Line 39:
  
 
::: I personally feel that making every page a non-automatic redirect will add aggravation. At present the database has 8,363 publishers referenced by 127,250 publications. I suspect most people would be happier and less confused by a system that has 8000 automatic redirects to a core set of publisher articles than to have 8,363 landing pages with next to zero content other than a link to a publisher page. There is a project underway to pare the 8,363 publishers down by merging similar names but that's another can of worms. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 02:02, 5 September 2008 (UTC)
 
 +
::::I am all for making pages true (automatic) redirects when there is no significant difference. For example, of [[Publisher:Baen]] and [[Publisher:Baen Books]] one should surely be a redirect. I must admit that in the matter of regularization I would tend to favor the shorter names in many cases: "DAW" not "DAW Books, Inc", "Baen", not "Baen Books", "Tor", not "Tor Books". But that is really a discussion for a somewhat different place. Forcing selection from a list (or even making it possible) in the DB proper will not, I suspect, and should not come until publisher regularization has progressed a good deal farther than it has at present. Wiki redirects on publisher pages are IMO needed precisely because people will enter variants. Suppose, for example, we decide to standardize on "Baen" (or the other way round). Then [[Publisher:Baen Books]] will still exist, because people will continue to enter publications that way in some cases, but it would redirect to [[Publisher:Baen]], so that anyone following the link would know the standard name accepted here. Frankly, having two different wiki pages referring to the same publisher strikes me as only an invitation for things to get out of sync, for inconsistent or incompatible data to exist on different pages. People will be forced to do extra work to try to keep them in sync, and even so the sync will be, at best, imperfect. My view would be that, for any given actual publisher, there would ideally be a '''single''' wiki page, with '''all''' other variations being redirects. (Cases like DAW that need sub-pages are fine, as long as there is a single root of the collection.) Imprints that have a truly separate identity should probably have their own pages, particularly when the imprint has been part of more than one publisher over its lifetime. But each imprint should ideally have only a single page, as well. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 15:45, 5 September 2008 (UTC)
  
 
== Categories on redirect pages ==
 
Line 80: Line 81:
  
 
: Entry and verification of publications has generally been considered mechanical "non-thinking" work. It should be completely objective. When we move to titles and authors there is often quite a bit of subjective work. As we have over 8000 publisher names to deal with, though, it sounds like we should have a near "mechanical" method for determining which names should be canonical. We can't put the individual names up for votes. My "vote" would be the commonly known names, the most frequently used version (per counting pub records), and that the names be two words or more. Use "DAW Books" rather than "DAW" so that we catch Pocket, Inc. vs. Pocket Books, Inc. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 04:35, 5 September 2008 (UTC)
 
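The near-"mechanical" rule proposed above (most frequently used version per pub-record count, restricted to names of two words or more) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `pick_canonical`, the fallback to shorter names when no multi-word variant exists, and the alphabetical tie-break are hypothetical choices, not anything the ISFDB software does.

```python
from collections import Counter

def pick_canonical(pub_publisher_names, min_words=2):
    """Pick a candidate canonical name from the publisher strings of pub records.

    Assumption: each element is the publisher field of one publication record,
    so counting elements is the same as counting pub records.
    """
    counts = Counter(pub_publisher_names)
    # Keep only variants with at least `min_words` words ("DAW Books", not "DAW").
    candidates = {n: c for n, c in counts.items() if len(n.split()) >= min_words}
    # Hypothetical fallback: if no variant is long enough, consider all of them.
    pool = candidates or counts
    # Highest count wins; ties are broken alphabetically so the result is stable.
    return min(pool, key=lambda n: (-pool[n], n))

names = ["DAW"] * 5 + ["DAW Books"] * 3 + ["DAW Books, Inc."]
print(pick_canonical(names))  # → DAW Books
```

Note that "DAW" is the most frequent string in this made-up sample but is skipped by the two-word rule, which is the behaviour the proposal asks for.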
 +
::See [[Publisher:Methuen]].  I have collected 3 verified datapoints, with two variant names.  Eventually the datapoints themselves will tell the story, and the 'common elements' would form the canonical name. (At this point for Methuen, the data recommends 'Methuen & Co.', at least for early twentieth century works.) [[User:Kpulliam|Kevin]] 05:18, 5 September 2008 (UTC)
 +
 +
::: That's a good article.  In this case I'd say the canonical name is Methuen and that in the database we will see
 +
:::* Methuen & Co.
 +
:::* Methuen & Co. Ltd
 +
:::* [[Publisher:Eyre Methuen|Eyre Methuen]] which has its own page.
 +
:::ISFDB happens to have a handful of entries
 +
:::* [http://www.isfdb.org/cgi-bin/publisher.cgi?3024 Eyre Methuen]
 +
:::* [http://www.isfdb.org/cgi-bin/publisher.cgi?226 London: Methuen]
 +
:::* [http://www.isfdb.org/cgi-bin/publisher.cgi?7031 London: Methuen &amp; Co., Ltd.]
 +
:::* [http://www.isfdb.org/cgi-bin/publisher.cgi?994 Methuen]
 +
:::* [http://www.isfdb.org/cgi-bin/publisher.cgi?21529 Methuen (UK)]
 +
:::* [http://www.isfdb.org/cgi-bin/publisher.cgi?25565 Methuen Australia]
 +
:::* [http://www.isfdb.org/cgi-bin/publisher.cgi?2021 Methuen Children's Books]
 +
:::* [http://www.isfdb.org/cgi-bin/publisher.cgi?3571 Methuen Drama]
 +
:::* [http://www.isfdb.org/cgi-bin/publisher.cgi?24341 Methuen Young Books]
 +
:::* [http://www.isfdb.org/cgi-bin/publisher.cgi?2506 Methuen/Arkana]
 +
:::* [http://www.isfdb.org/cgi-bin/publisher.cgi?22581 Methuen/Magnum]
 +
:::I don't see any obvious safe merge candidates and so if it were me I'd make all the wiki pages for those names redirects to [[Publisher:Methuen|Methuen]] and as the [[Publisher:Eyre Methuen|Eyre Methuen]] article is so small I'd also move its content into [[Publisher:Eyre Methuen|Eyre Methuen#Eyre Methuen]] and redirect that too.  At some point the publications with "Methuen (UK)" for example will get verified and the publisher name changed to one of the preferred versions that also reflects accurately what's stated. At that point the wiki page for Methuen (UK) can be dropped. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 09:03, 5 September 2008 (UTC)
 +
 +
::::You say that you don't see any safe merge candidates. I would be inclined to merge at least "London: Methuen", "London: Methuen &amp; Co., Ltd.", and "Methuen (UK)", and to separately merge "Methuen & Co." and "Methuen & Co. Ltd". I really don't believe that which of these got entered in the ISFDB corresponds to what was in the publication (and still less to the actual identity of the publishers) in any meaningful way. What you have here is not data, it is noise. First of all, in the many cases where the initial entry was from a secondary source, it is known that different sources report the same publisher in different ways, according to the standard in effect (or the whim of the cataloger when there was no standard) when the book was cataloged -- indeed the same source may well report the same publisher differently when there are multiple entries for the same edition of the same book. Secondly, some editors will have entered fully whatever was in the book or secondary source, some will have automatically omitted city names, and some will have entered only the key word of the publisher's names, omitting things like "& Co." or "Ltd" or even "and sons". Trying to base any useful deductions on such entries is not merely onerous, it is fundamentally misguided. It is IMO far better to do separate research into publisher histories, determine the names actually in use at any particular period, confirm this with a few questions to verifiers about the actual form of the publisher's name on specific publications, and use all this data to determine a canonical name and known or plausible variations for a given publisher. Where the name actually changed at particular points in time, this should be noted. In some cases this may be an aid to dating undated pubs. In others it may help to enter pubs of known date with the proper name for the date. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 16:06, 5 September 2008 (UTC)
 +
::::It was said above that "Baen books" is an imprint, not a publisher. In a technical sense this is true, but "Baen Publishing Enterprises" has never had more than one imprint, nor has that imprint ever changed, either identity or name. Thus the distinction is not, in that case at least, useful. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 16:06, 5 September 2008 (UTC)
 +
 +
::::: The reason I did not see safe merge candidates is that from [[Publisher:Methuen]] I got the impression that the publisher uses several distinct, though similar names.  "Methuen & Co." and "Methuen & Co. Ltd" were used at different times and as publication records should accurately reflect the name we will see both in ISFDB.  As the ISFDB data gets more accurate it can get used for secondary functions such as estimating the date for an otherwise undated publication.  <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 19:13, 5 September 2008 (UTC)
 +
::::::If when you say "publication records should accurately reflect the name" you mean that they '''ought''' to reflect what is actually in the publication, you may have a point, although I tend to disagree. If you mean that you think that in a majority of cases they currently '''do''' reflect what is in the publication, then I think you are badly mistaken. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 19:48, 5 September 2008 (UTC)
 +
::::::: I'm a TTL based system and never got the upgrade that allows me to parse the distinction between should and ought. I believe nearly all of the ISFDB, and even Amazon, data has some basis in fact and the current data does reflect what's stated in the publications. Note that I use "reflects" and not "matches."  Even though the publisher field contents are much derided I've also found that it's very rare for it to be wrong. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 23:17, 5 September 2008 (UTC)
 +
::::::::In this case, as far as I am concerned "should"="ought", but neither is the same as "does in fact". I agree that the vast majority of current ISFDB data does indeed "have some basis in fact".  If the publisher field says "Methuen & Co." I'm pretty sure that the book wasn't published by another firm altogether, but by Methuen under some name. But the resolution (so to speak) of this accuracy is limited. What I '''don't''' feel confident of is that the book said "Methuen & Co." rather than "Methuen & Co. Ltd", or some other variant. If the use of these variants was in fact significant (which I would want evidence for, rather than merely assuming) I don't think we have been reliably or consistently capturing data at that level of precision -- I know that I haven't always done so, in my edits. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 07:16, 6 September 2008 (UTC)
 +
 +
::::::As to what '''ought''' to be in the publication records, I am inclined to think that we should move (slowly and carefully) towards a system in which each publisher or imprint has a single canonical name, or a single name for any given point in time. (For example, one entry might read "Use 'Harper' before 1817, use 'Harper & Bros' from 1817-1960, use 'Harper & Row' from 1961-1997, thereafter use 'HarperCollins'. If there is good reason to think there is a meaningful variation, list it in the notes and record it on the wiki page."). Editors would be encouraged to use the established canonical names, and periodic scans would be made for non-canonical entries, which would be checked and converted if there was no good reason not to. '''Note''', I do '''not''' think we are ready for this style yet, we aren't nearly ready to establish canonical names for publishers firmly enough for this in any but a very few cases. (Baen is one, there being no name changes or multiple imprints involved there.) But I hope and expect that we will eventually move to that model, just as we established canonical names for authors. And just as an author's canonical name need not be the author's legal name, or even the author's preferred nickname, so a publisher's canonical name need not and mostly will not be the same as that publisher's legal corporate name, nor need it shift if the publisher made a minor change in corporate name. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 19:48, 5 September 2008 (UTC)
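The "single name for any given point in time" idea in the comment above could be modelled as a small table of dated rules. The sketch below is hypothetical: `NameRule`, `HARPER_RULES` and `canonical_name` are invented names, and the year boundaries are copied from the illustrative Harper entry in the comment, not from researched publisher history.

```python
from typing import NamedTuple, Optional

class NameRule(NamedTuple):
    first_year: Optional[int]  # None means "no lower bound"
    last_year: Optional[int]   # None means "no upper bound"
    name: str

# Boundaries taken from the example entry quoted in the discussion;
# they are an illustration only and have not been verified.
HARPER_RULES = [
    NameRule(None, 1816, "Harper"),
    NameRule(1817, 1960, "Harper & Bros"),
    NameRule(1961, 1997, "Harper & Row"),
    NameRule(1998, None, "HarperCollins"),
]

def canonical_name(rules, year):
    """Return the canonical name in force for `year`, or None if no rule applies."""
    for rule in rules:
        after_start = rule.first_year is None or year >= rule.first_year
        before_end = rule.last_year is None or year <= rule.last_year
        if after_start and before_end:
            return rule.name
    return None

print(canonical_name(HARPER_RULES, 1950))  # → Harper & Bros
```

A periodic scan for non-canonical entries could then compare each publication's stored publisher name with `canonical_name(rules, publication_year)` and flag mismatches for a human to check rather than converting them automatically.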
 +
 +
::::::: Agreed - I sent an e-mail to Al this morning asking about cloning the ISFDB system into a vmware image. That'll allow us to start prototyping ideas as one thing I want to add before we do publisher merges is a history log for every record that'll have something like 20080905162123|Marc Kupper|pub-224321|Publisher changed from "Methuen (UK)" to "London: Methuen & Co., Ltd."|Reason...  I'd be less concerned about data loss. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 23:28, 5 September 2008 (UTC)
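The pipe-delimited history-log line described above (timestamp, editor, record id, change, reason) might be produced and read back along these lines. This is a sketch only: the function names are hypothetical, the field order simply follows the sample line in the comment, and it assumes that no field other than the trailing reason contains a "|" character.

```python
from datetime import datetime

STAMP_FORMAT = "%Y%m%d%H%M%S"  # e.g. 20080905162123

def format_history_entry(when, user, record_id, old, new, reason):
    """Build one log line in the shape shown in the discussion."""
    change = f'Publisher changed from "{old}" to "{new}"'
    return "|".join([when.strftime(STAMP_FORMAT), user, record_id, change, reason])

def parse_history_entry(line):
    """Split a log line back into its five fields.

    maxsplit=4 keeps any "|" inside the trailing reason text intact.
    """
    stamp, user, record_id, change, reason = line.split("|", 4)
    return {
        "when": datetime.strptime(stamp, STAMP_FORMAT),
        "user": user,
        "record_id": record_id,
        "change": change,
        "reason": reason,
    }

entry = format_history_entry(
    datetime(2008, 9, 5, 16, 21, 23),
    "Marc Kupper",
    "pub-224321",
    "Methuen (UK)",
    "London: Methuen & Co., Ltd.",
    "Reason...",
)
print(entry)
```

Keeping the old and new values in every entry is what would make a mistaken publisher merge reversible, which is the data-loss concern raised in the comment.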
 +
:::::::: That I could agree with. If such a system were expected soon, I would be willing to postpone any publisher merges until it was in place. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 07:16, 6 September 2008 (UTC)
 +
 +
::::: As for the assertion that the existing data is noise.  That's true - it's all noise, particularly as there are no defined/agreed standards for the publisher name field. This is why I made the "five people looking twice can arrive at ten answers" comment.  I'd like to see us moving in the direction of physically verifying the publisher names using an agreed standard so that we can get some "signal" in the noise.  The agreed standard may well be that we don't care about the details and that "Methuen & Co." and "Methuen & Co. Ltd" are the same for example and if that's the case then yes, merge away. I'm personally advocating that we do document the details but to really improve the results I believe we need to have new code that supports a publisher selector thing that'll pop up an image of what the bottom of the title page (or whatever we agree to) should look like. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 19:13, 5 September 2008 (UTC)
 +
::::::I'm glad you agree that much of the existing variation is noise. No doubt some of it is signal: some editors or verifiers have on some occasions been careful to record exactly what was in the actual book in a consistent way. But is there a reasonable method to determine which records are accurate in that way -- to filter the signal from the noise -- to make it worth trying? Or is it better to simply abandon the attempt to draw fine distinctions from our existing DB? (Do note "fine": I am sure that when a publication record says "Methuen & Co." that means it isn't "Macmillan", but I doubt that when one pub record says "Methuen & Co." while another says "Methuen & Co. Ltd", and yet another says "London:Methuen & Co." that the differences are worth even trying to sort out.) I would merge them all, discard the noise, and not worry about any signal I might be losing, because I don't think the signal could ever be reliably filtered out. Remember that making distinctions based on unreliable data is quite probably worse than making no distinctions at all. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 19:57, 5 September 2008 (UTC)
 +
 +
::::: The method you suggested of merge now and then sort out the details by asking verifiers may be less work. I'm not opposed to it though it means that we will lose the existing "signal" in the haystack. For example, in light of this discussion I just went back and re-verified my most recent publications to make sure the publisher field matches exactly what's stated.  My previously verified "Harcourt Brace" became "Harcourt Brace &amp; Company" for example. Any merges of Harcourt will lose that signal. I thought I've seen "Harcourt, Brace &amp; Company" (with a comma) and see a smattering of [http://www.isfdb.org/cgi-bin/publisher.cgi?2410 Harcourt, Brace] in the database. I have no idea if this is introduced noise or if these reflect what's stated. It's interesting that all of them are in a tight date range implying it's stated. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 19:13, 5 September 2008 (UTC)
 +
 +
::::::But note that even if those are as stated, even if Harcourt actually used different forms of its name on different publications, it was still the same entity, and should probably have only a single publisher record. Perhaps we should have a field for "canonical publisher name" and one for "publisher name as stated". Note also that, unless the change in form of name was regular enough to help determine publication dates, it is of no particular value. Granted, collecting it might help establish whether it was in fact regular enough to be useful for a given publisher during a given period. In any case I think we need to start with the researched facts. Knowing when Harcourt went through various mergers and made various changes in its official, documented name (info that is surely available to a little research) would give us a starting skeleton of known fact. Then we could check with verifiers about books that were published at known dates, and compare the results with that skeleton. Before long, we should have a dataset that is firm enough to actually help us. Doing this publisher by publisher would be tedious, but i think it is the only reliable way forward, and it is still better than going at it publication by publication, never knowing how meticulous an editor was, nor whether a non-verified entry took the publisher name from a secondary source, nor whether multiple forms that may actually have been used were meaningful distinctions. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 20:08, 5 September 2008 (UTC)
 +
 +
::::: The business about Baen Books being an imprint was more directed at Bill.  I'm still trying to understand what he wants us to do other than to stop fiddling and organizing.  I'm just wondering if he wants "Publisher" to be reserved for the actual publishers (which are sometimes difficult to determine these days) and that there be separate name spaces for publishing groups and imprints.  It would be a "project" though certainly possible and probably outside of the publication centric scope of ISFDB though having accurate source data in ISFDB will help greatly. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 19:13, 5 September 2008 (UTC)
 +
:::::: I want the "fiddling and organizing" that doesn't actually ADD any data to stop interfering with my work - I don't like being told that I "should" also do this or that to a wiki-page that I will not get any benefit from when I take an ISFDB backup and use it. If it's a database field, yes, we need pretty rigid guidance or it's useless. We're stuck with using the wiki-pages until we can resolve what should be done in the database - and although I do try and use the wiki pages to ADD data, and even ORGANISE data (linking between "publishers" is only possible on the Wiki side so far), I have to treat the Wiki as a temporary solution and a guide to what we want in the database. This conversation will be in the wiki indefinitely - do I want it in the '''database'''? No. Do I want to improve the publisher data in the database itself? Yes. Do I want to destroy the data that people have so carefully entered despite the fact that we've got no rules for entering it? No. Please, EVERYONE, add more data to support anything you want back from the database. I really want Al to allow publisher regularisation that does NOT mess with an intent of "I want to call the publisher of this a scumbag vanity-press ripoff-merchant" - and that means we have to have more fields available to mess with or NOT mess with. Add examples in the wiki to show Al we want regularisation for some reasons and "as stated" for others. We've got a "BFG" set of tools to use on the database, that we really shouldn't use without general consent. We've got a Wiki where we can dump all our thoughts. Yes, we have to use the wiki a bit more wisely so that people can FIND those thoughts, but basically I find that if we work around the database limitations TOO much we end up with something that only works on the web. In which case, go post it on Wikipedia or a specialist site, you've lost the "database" view. Organising the ISFDB Wiki so that we have a wonderful website looks good. 
Concentrating on that TOO much makes it web-only. [[User:BLongley|BLongley]] 22:32, 5 September 2008 (UTC)
 +
:::::::Frankly as far as I'm concerned, the ISFDB basically '''is''' web only. That is why its name begins with '''Internet'''. Yes it is possible to download a copy of the database, but that is only a tool for testing and doing queries hard to do on the main web interface -- and if they are wanted often, they should be added to that interface when possible. The Wiki is, IMO just as much a part of the ISFDB as the database is.
 +
:::::::Also frankly, I don't for an instant believe that most of the publisher names have been "so carefully entered" and I honestly doubt that there is any value whatsoever in the more minute variations now existing in the database. The majority of the publisher name fields, after all, were initially filled by import from Amazon or from various secondary sources. Even when a publication record has been primary verified, I am not at all convinced that in the majority of cases the editor corrects a basically accurate publisher name to precisely match the form stated in the publication. If the earlier, secondary-source-derived record said "Methuen" would most primary verifiers carefully correct this to "Methuen & Co Ltd"? I rather doubt it. Indeed would there be any value to their having done so? -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 22:43, 5 September 2008 (UTC)
 +
 +
===Methuen Merge - A test case discussion===
 +
 +
I see these obvious merge candidates in this publisher.
 +
*[http://www.isfdb.org/cgi-bin/publisher.cgi?226 London: Methuen]
 +
*[http://www.isfdb.org/cgi-bin/publisher.cgi?7031 London: Methuen &amp; Co., Ltd.]
 +
*[http://www.isfdb.org/cgi-bin/publisher.cgi?994 Methuen]
 +
*[http://www.isfdb.org/cgi-bin/publisher.cgi?21529 Methuen (UK)]
 +
 +
I personally recommend "Methuen & Co." as the 'common part' of both verified variants so far.  Anyone indexing a book with 'Methuen', 'Methuen & Co.', 'Methuen & Co. Ltd', and other generic 'Methuen Corp, Co, Comp, Company, etc etc' will likely gravitate to this publisher choice (if they look up possible publishers first). (But I also don't have a problem with :UK, or :London or something similar appended to the name because of the Australian Methuen variant.) [[User:Kpulliam|Kevin]] 22:28, 5 September 2008 (UTC)
 +
 +
We have documented in the Wiki that this publisher is in London, and we know that London is in the UK. If we further state "The following Publisher Variant names should be indexed here" then any later input under previously merged names can be remerged. We have also documented (with online viewable copies) both "& Co." and "& Co. Ltd."  If we document in the wiki that both existed (and continue to fill in the verified/confirmed data table to fill in missing years) then I don't have a problem merging these mild corporate variants.  We should ask that if anyone finds something named in such a way that it doesn't fit with previously verified data, they should post it to the Wiki with a note about that strangeness. [[User:Kpulliam|Kevin]] 22:28, 5 September 2008 (UTC)
 +
 +
We have not documented/verified anything about the multiple various Methuen 'imprints' and other subnames, and they should not be merged at this time. Thoughts?[[User:Kpulliam|Kevin]] 22:28, 5 September 2008 (UTC)
 +
:If I were working on the various "Methuen" publisher records, the '''first''' thing I would do is some '''outside''' research into the publisher. I would, for one thing, try to determine whether or not it had ever had a non-UK branch or existence. If it had not, then "Methuen" and "Methuen (UK)" are redundant and should be merged; if it had, then they aren't and shouldn't. I would also determine which name forms could be documented as having been used by the firm, and during which periods. If as a result of that research, I determined that all 4 of the forms you list above (well 3 of the 4) should remain distinct, I would in any case remove the "London:" from the names of the first two, putting it in the publisher notes (and probably also in a publisher wiki page). I would add (UK) whenever that was needed for disambiguation. Since this would leave no difference between the 1st and 3rd forms, those would be merged in any case. Note this is all in the subjunctive -- I am not planning to work on Methuen at this time. But that is how I would approach the matter. Imprints would be trickier, and I would incline to leave them unmerged, except for ones with names differing only in the presence of city names, or of terms such as "Ltd", where if outside research did not indicate a significant difference of actual organizational identity, I would probably merge. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 22:55, 5 September 2008 (UTC)
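The step of removing a prefixed "London:" and then merging whatever names become identical can be illustrated with a short sketch. It assumes, hypothetically, that any text before the first colon is a city prefix; real records would need a vetted list of cities, and the helper names below are invented for this example.

```python
import re
from collections import defaultdict

# Assumption: "<city>: <publisher>" is the only use of a colon in these names.
_CITY_PREFIX = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")

def split_city_prefix(name):
    """Return (city, bare_name); city is None when there is no prefix."""
    match = _CITY_PREFIX.match(name)
    if match:
        return match.group(1), match.group(2)
    return None, name.strip()

def group_after_stripping(names):
    """Group publisher records whose names are identical once the city is removed."""
    groups = defaultdict(list)
    for name in names:
        _, bare = split_city_prefix(name)
        groups[bare].append(name)
    return dict(groups)

methuen_records = [
    "London: Methuen",
    "London: Methuen & Co., Ltd.",
    "Methuen",
    "Methuen (UK)",
]
for bare, originals in group_after_stripping(methuen_records).items():
    print(bare, "<-", originals)
```

With these four names, the 1st and 3rd forms fall into the same group, matching the observation in the comment; the stripped city would go to the publisher notes rather than being discarded.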
 +
::I'll also point out that the need for a (UK) variant may be isolated in time.  If you could determine when the Australian office opened... then let Methuen be acceptable for pre-1980 data (as an example) and have Methuen (UK) be the canonical name after 1980.  I point out that the (UK) version of this publisher has no datapoints prior to that. [[User:Kpulliam|Kevin]] 04:16, 6 September 2008 (UTC)
 +
 +
: My current thoughts are that Wiki pages are a really LOUSY form of discussion, as my long-considered response to one issue is now in a totally wrong area.:-(  (Yes, I'm currently pissed off about Wiki in general - bear with me, I AM trying to add serious comments where needed.) [[User:BLongley|BLongley]] 23:32, 5 September 2008 (UTC)
 +
:: No problem Bill - Actually what we need for complex discussion is a threaded messaging system like Usenet (or Google Groups as a modern-day successor); unfortunately this is what we have. I started this subsection as a place to limit discussion to one example, so we could identify the issues we all have with this one example.  As far as I am concerned, you are welcome to copy or move your comments about this example to this area. [[User:Kpulliam|Kevin]] 01:20, 6 September 2008 (UTC)
 +
: I'd go for "Methuen" in this case. Yes, there's a need for disambiguation with some publishers - and the "London" list for Methuen is likely to be different from the Australian one. The "& Co", "& Co. Ltd." adds nothing useful. "and Co" or "and Co. Ltd." doesn't either. And "Limited" rather than "Ltd" adds nothing. I really want "shortest name that can be used while still being clear". Nobody's going to add an "Inc" or "GmbH" or "Ltd" or "PLC" or "LLC" if they think it's clear enough already. We should guide people to add a suffix IF we think it's useful disambiguation though - I have no idea what "Pty" means but I avoid messing with those, which is probably a good start. [[User:BLongley|BLongley]] 23:32, 5 September 2008 (UTC)
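One way to picture the "shortest name that can be used while still being clear" preference is a helper that peels corporate suffixes off the end of a name. This is a rough sketch under assumptions: the suffix list contains only the terms named in the comment above, "Pty" is deliberately left alone as the comment suggests, and `short_form` is a hypothetical name. Whether the shortened result is still unambiguous has to be judged case by case.

```python
import re

# Trailing corporate terms mentioned in the discussion; a separator (space or
# comma) is required first so that words such as "Unlimited" are not clipped.
_SUFFIX = re.compile(
    r"[\s,]+(?:(?:&|and)\s+Co\.?|Ltd\.?|Limited|Inc\.?|PLC|LLC|GmbH)\s*$",
    re.IGNORECASE,
)

def short_form(name):
    """Repeatedly strip trailing corporate suffixes from a publisher name."""
    current = name.strip()
    previous = None
    while current != previous:
        previous = current
        current = _SUFFIX.sub("", current).strip()
    # Never return an empty string; fall back to the trimmed input.
    return current or name.strip()

for variant in ["Methuen & Co.", "Methuen & Co. Ltd", "Methuen and Co. Ltd.", "Harper & Row"]:
    print(variant, "->", short_form(variant))
```

"Harper & Row" passes through unchanged because "& Row" is part of the name rather than a corporate suffix.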
 +
:: I'm wondering about "shortest name that can be used while still being clear". Why not the opposite "The longest name using the most common elements of all variant names"?  It's not like data is expensive to store... it's just a table lookup... and it's not like we can't cut-n-paste it into the form, and regular editors will have major publishers in the auto-fill list.[[User:Kpulliam|Kevin]] 01:26, 6 September 2008 (UTC)
 +
:::I rather agree with Bill here that "& Co" and "Ltd" and the like generally add nothing, and that in general I prefer a shorter form to a longer, provided that the shorter form is clear, and not ambiguous. When a firm has invariably used a longer form there is some argument for retaining it for tradition and clear identification. For example I would be inclined to retain "Harper & Row" even if there weren't the need to distinguish other variants on "Harper". That is why assigning a canonical name will ultimately need to be done case by case. But in general including things like "Ltd" or "Inc" simply adds more keystrokes to data entry and takes up more screen estate on display, to no particular good, unless the presence or absence of such is a consistent reliable indicator of date or firm. At least that is my view. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 07:16, 6 September 2008 (UTC)
 +
::: My only problem with longer names is that on the title display where we list all the publications the display looks better with short names.  Perhaps the publisher database needs to have both long and short forms. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 03:53, 6 September 2008 (UTC)
 +
::::Ummmm looks better in what resolution, with what system fonts installed, and what size fonts selected as 'normal', in what browser, with which operating system, with LCD or CRT monitor, what size monitor, what distance from the viewer to the screen, and with or without reading glasses? I admit I'm being an ass with some of those questions, but they are honestly all part of the discussion of 'how something looks'. I think the argument that it 'looks better' if there are 7 fewer ASCII characters on a screen is not relevant to the question of what we store in the database. Please see {{T|197177}} and {{P|MXCRRDSZFR1914}}. This title and pub is published by Methuen, with City, with '& Co.' and with 'ltd.'. Both pages appear to function properly and in the title bibliography if I make the browser window skinnier the line wraps properly.  What exactly about it doesn't look good to you?
 +
:::::I dislike any publisher name with a prefixed city. For one thing, that isn't the actual name of the firm. For another, when city names are included in actual publications, they generally aren't prefixed but postfixed, and generally on a subsequent line, as if to indicate that they are not part of the publisher name. For a third thing, they make publishers sort on the city name rather than the key element of the publisher name, which is IMO a '''Bad Thing'''. For yet another, they are often merely historical relics for modern publishers, which are apt to have many offices and many production sites that are part of the same entity. When I see a publisher name like "London:Methuen" what I first think is "Data entered from library catalog, details not reliable". -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 07:16, 6 September 2008 (UTC)
 +
::::::You have provided a valid argument against prefixing the city name and I agree. No more prefixing of city names for me. [[User:Kpulliam|Kevin]] 15:13, 6 September 2008 (UTC)
 +
 +
::::: I suspect that for the titles Methuen selects, the length of the name used does not matter, as you will only see one or two printings. When a title has 50 reprints by multiple publishers I've had trouble visually filtering through the results at times.  It's probably not a big deal, because where there are 50 pubs I often copy/paste the list into a spreadsheet and parse it into columns, as usually I'm doing this to document all of the Bantam printings, for example. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 07:14, 6 September 2008 (UTC)
 +
 +
::::::If you have a problem "visually filtering through the results at times", then that is a limitation of the database display software. Perhaps the 'concise' view could be modded to display only the first 2 or so words of the publisher name. But again, that's a software limitation, and you are arguing that the database accuracy should be reduced because (paraphrased) 'it's hard to read sometimes'.  That is equivalent to arguing that 640k is enough RAM for anyone, 2GB is the biggest harddrive we ever need to worry about; a decision made by someone else, in order to make it easier/simpler for management. [[User:Kpulliam|Kevin]] 15:13, 6 September 2008 (UTC)
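:::::::A minimal sketch of the 'concise' view idea above, assuming the shortening happens only at display time and the stored publisher name is never changed. The function name <code>concise_publisher</code>, the two-word default and the connector list are illustrative assumptions, not anything in the ISFDB software.

```python
# Hypothetical display helper: shortens a publisher name for a compact
# listing while leaving the stored database value untouched.
CONNECTORS = {"&", "and"}  # assumed list of words not worth ending on


def concise_publisher(stored_name: str, max_words: int = 2) -> str:
    """Return at most `max_words` leading words of `stored_name`."""
    words = stored_name.split()
    if len(words) <= max_words:
        return stored_name  # already short enough, show as stored
    kept = words[:max_words]
    # Do not end the shortened form on a dangling "&" or "and".
    while kept and kept[-1].lower() in CONNECTORS:
        kept.pop()
    # Trim trailing punctuation and mark the name as truncated.
    return " ".join(kept).rstrip(",;:") + "..."


if __name__ == "__main__":
    for name in ("Methuen & Co. Ltd.", "DAW Books, Inc.", "Baen"):
        print(concise_publisher(name))
```

:::::::Note that a blunt word limit like this would also cut "Harper & Row" down to "Harper...", which is one reason to keep any such shortening in the display layer rather than in the stored data.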
 +
::::::Personally I really like the idea of a (Future) upgrade to the web system allowing, 'Display as table' for easy web copying of data, and 'download as spreadsheet' for easy web extraction of large datasets. [[User:Kpulliam|Kevin]] 15:13, 6 September 2008 (UTC)
 +
 +
:I've actually verified a lot of Methuen pubs, and imprints of such like "Magnum" too. If we can agree on a target publisher and imprints and just go add all the data we can, we might get some idea of who wants what data. And yes, feel free to ask me why "Methuen Drama" should be separated, and other awkward questions - I'd rather we all talked a bit and discussed some general aims rather than annoy editors with Mod overrides and suchlike.  [[User:BLongley|BLongley]] 23:32, 5 September 2008 (UTC)
 +
::I'm willing enough to talk -- until this discussion started up no one seemed to be talking about this issue. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 07:16, 6 September 2008 (UTC)
 +
::: I'm glad people are talking - the publisher improvements in the database seemed to go mostly unremarked, increased activity in the Wiki went unnoticed, I'm not sure what sparked all this. Of course, it may lead nowhere but we should extract some agreements from this  and make them desired goals (I hate to make them "Rules", but "preferred entry styles" in the help would be a start - and removing City Name prefixes would be a very good start, IMO.) [[User:BLongley|BLongley]] 21:37, 6 September 2008 (UTC)
 +
::: Still, a test case would be good: Methuen may not be the best example as Kevin's examples don't match with anything I own, and I really want the most argumentative people here to work with similar books and present their different conclusions as to what was worth recording and what wasn't, and how it should be recorded, and linked, etc. We've got to the stage where I can FIND a publisher here and extract the books I own by that publisher, so I could work with Bantam and Corgi (there's a relationship there we're not clear about) or Ballantine (did have a UK version for a while) or we could go for the whole Robert Maxwell 1990s nightmare (not recommended). Something multi-national, several imprints, not too recent or they'll all be "Scholastic" or "Random House" or "Hachette Livre" in the end - suggestions welcome. [[User:BLongley|BLongley]] 21:37, 6 September 2008 (UTC)
  
 
== Publisher Selector ==
 
 
* Sphere Books Limited<br>30/32 Gray's Inn Road, WC1X 8JL
 
 
<span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 04:35, 5 September 2008 (UTC)
 
 +
 +
== Merging USA and Canada, and other countries ==
 +
 +
Bill wrote in ramble #1: "'I've merged the UK, US and Canadian versions of this as the price should be clear enough' (something I've been very tempted to do with recent titles)."  I'm not sure who Bill is quoting and so can't credit the first part.
 +
 +
:: The original comment wasn't a specific quote from anyone - I try and avoid naming people when I'm moaning, but obviously some complaints are going to be fairly obvious as to who I think is responsible (ISFDB Software Bug? Must be Al, although he might be able to blame Roglo a bit now. Design Fault from ISFDB1? Al or Ahasuerus are probably the only ones around to listen. Faulty Old Help? Probably Mike Christie. Who messed up the publishers with "(first printing)" suffixes and such? ... ;-) I don't moan to assign blame, I moan when it's messing with my ability to work here and whoever is active here NOW can help fix things. [[User:BLongley|BLongley]] 21:28, 5 September 2008 (UTC)
 +
 +
::: If you ever meet me in person you'll find I disregard moaning, howls, screams, noxious smells, etc. coming from the cellar. :-) <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 00:49, 6 September 2008 (UTC)
 +
 +
: I like seeing the (country) suffix as it raises awareness that people need to pay attention to where a book is printed and it makes it easier to search for these and to spot them in the list of publications for a title. For years when DAW printed in both the USA and Canada the books were identical other than the price was different and at the bottom of the copyright page it would say "Printed in Canada" or "Printed in the U.S.A." The Canadian printings often had something about New American Library on the title page either immediately over or underneath the DAW Books stuff. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 04:51, 5 September 2008 (UTC)
 +
 +
:: I appreciate that country of origin may be of interest, and useful for separating different real publications, but it's currently undefined and pretty much noise. Even if I assume that all publishers without a suffix are to be treated as "(US)", can I assume that that's where they were physically '''printed'''? I doubt it - and the more recent the publication, the bigger the doubt. My comment "(something I've been very tempted to do with recent titles)" is because I've seen Dissembler-added data from Amazon US for publications that are/will be sourced from the UK, with Amazon US prices just being currency-conversions rather than true printed prices. Those I'm always tempted to merge (if the correct original publication data is available too) and put the "canonical" price ("first listed", IMO) on them with the rest in notes. If you want to go down to Country of Printing, rather than Country of Originating Publisher, then we'll have an even bigger mess on our hands: I've seen the same book, same publisher, same cover, same printing number, same prices, printed by two different Scottish printers and one German one. I don't want to go down to that level of detail. When it truly distinguishes something useful ("has this edition had "color" corrected to "colour" for the UK market?") it might be worth recording. But such problems are mostly for current publications, with all the globalisation issues. (I'm avoiding such.) I'm reworking a lot of my original verifications now (adding cover images mostly) but I'm adding pricing data too. This is because I do NOT want eight editions of the same book entered just because it has a British price first and seven other prices listed - if I don't list them then an Australian editor might think he's got a different edition, a South African might enter another, a New Zealander another, etc. 
I guess I'm calling for better pricing support, but that's a minor point if the countries represented here by editors aren't as confused as I am - after all, I'm in the originating country and have no idea how confusing this all is elsewhere.... Oh well, I'm off-topic again. [[User:BLongley|BLongley]] 21:28, 5 September 2008 (UTC)
 +
:::On this, I tend to agree with Bill. I don't much want to see country suffixes on '''publications''' unless there is no other way to distinguish actually and usefully different pubs. On '''publishers''' I want to see them only when needed for disambiguation. "Harper & Row" was a US publisher; there was no English or other publisher by that name AFAIK. Thus there is no need for "Harper & Row (US)". "HarperCollins" does operate in multiple countries, and as I understand it, as basically separate entities in at least some of them. Thus "HarperCollins (UK)" and "HarperCollins (Australia)" are probably needed. I also agree that a single book, published with multiple prices for multiple countries, does not need multiple publication entries here. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 23:03, 5 September 2008 (UTC)
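::::The "suffix only when needed for disambiguation" rule above can be sketched roughly as follows. This is an illustration under stated assumptions: each publisher entity is represented as a hypothetical (name, country) pair, and <code>display_names</code> is an invented helper, not an ISFDB function.

```python
from collections import defaultdict


def display_names(publishers):
    """Add a "(country)" suffix only where the bare name is ambiguous.

    `publishers` is a list of (name, country) pairs, one per distinct
    publishing entity (an assumed representation for this sketch).
    """
    countries_by_name = defaultdict(set)
    for name, country in publishers:
        countries_by_name[name].add(country)

    result = []
    for name, country in publishers:
        if len(countries_by_name[name]) > 1:
            # Same name used by separate entities in several countries.
            result.append(f"{name} ({country})")
        else:
            # Name is unique, so a suffix would only be noise.
            result.append(name)
    return result


if __name__ == "__main__":
    sample = [
        ("Harper & Row", "US"),
        ("HarperCollins", "UK"),
        ("HarperCollins", "Australia"),
    ]
    print(display_names(sample))
```

::::With the sample data, "Harper & Row" stays unsuffixed while the two HarperCollins entities are shown as "HarperCollins (UK)" and "HarperCollins (Australia)".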
 +
 +
:::: Bill, if you see someone doing something that you don't understand, rather than dismissing it with "currently undefined and pretty much noise" why not ask the people involved with that practice? <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 00:49, 6 September 2008 (UTC)
 +
 +
::::: I will, and do, if I see something ''being'' done I don't understand, but I'm talking about something ALREADY done and it boils down to "Because I can't FIND the people involved with that practice". Even if you've been diligently following some rule about DAW, it hasn't been documented and I can't tell whether any particular publication was entered by you. (OK, you DO have a very distinctive style, but cloning means your style of notes are adopted by other editors.) The help hasn't covered this before (and still doesn't) so there's no real assurance that any particular publication is entered with "correct" publisher info, whenever/if ever we define "correct". If there's good cause to suspect some data is better in certain cases then that probably needs to be taken into consideration. For instance, "Verified by Marc Kupper" for a DAW book probably means good data, and pretty complete. Unverified, and with an exact day of publication, and an image URL with ZZZZZZZZ in, probably means Dissembler data from Amazon. I stand by my assertion that it's undefined. I'll also stand by the remark "pretty much noise" as we've had to do so much work cleaning it all up just from typo and copy and paste errors - Amazon import errors are a big pain too, and will be an ongoing one in the short-term until we get Dissembler improved. Library data too. At the moment I trust publisher data to almost always contain a nugget of truth, but it takes my own knowledge to find which bit that is, and find the "important" info elsewhere - and importance is relative. DAW is actually a simple case compared to the publishers with non-SF titles too, that have changed hands far more times. [[User:BLongley|BLongley]] 20:56, 6 September 2008 (UTC)
 +
 +
:::: With DAW a 1st U.S.A. printing will be $3.99 and a 1st Canadian printing will be $4.50. We have already had confusion several times when someone missed the "Printed in Canada" notice and we had a collision of two people with a 1st but at different prices. Thus I've been using "DAW Books (Canada)" as the Canadian editions have their own printing history and set of prices. The discussion did make me realize that I should set up a wiki page that explains why I use the publisher name [[Publisher:DAW Books (Canada)|DAW Books (Canada)]]. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 00:49, 6 September 2008 (UTC)
 +
 +
::::: Good work - it still won't fix past entries though, unless we reverify everything already done to make sure it's verified to the new standard. (And do we want a standard for each publisher, or just ONE we can put in help?) I believe Al was/is looking into edit histories though, in which case we might spot what was changed after a verification and make our own decision on data reliability. 20:56, 6 September 2008 (UTC)
 +
 +
:::: BTW - one reason I really like using the publisher name to designate something like DAW Books (Canada) is that it's [http://www.isfdb.org/cgi-bin/publisher.cgi?13301 easy] to get a list of when that name was used. Assuming that an advanced publisher search for DAW Books with a price of C did not crash it would still be much slower and have a much harder to view result set.
 +
 +
:::: Bill - when I have a publication with multiple prices I enter the one that seems to be first in the Price field and document the price block in the notes.  Hopefully someone in Australia, Malta, etc. will view the publication record, at least in anticipation of cloning it, and see that I've already documented it. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 07:03, 6 September 2008 (UTC)
 +
 +
::::: That's pretty much what I'm doing now, in the hope it helps, after seeing some Australian works entered that seemed to be the same as my own. But there's still the thorny issues of some books not just having different prices for different markets, but also listing different publishers and different ISBNs. I'm not sure where to go with those (two entries here, or one with LOTS of notes? I suspect that as ISBN is a common search we'll need separate entries for exactly the same book, which looks bad till we cross-reference better.) [[User:BLongley|BLongley]] 20:56, 6 September 2008 (UTC)
 +
::::: Still, having stirred things up I think I'll just take a back-seat for a bit and see what ensues. If anyone wants me to add data points for an example publisher feel free to ask if I have some - just don't make it Pocket please. I change my mind about them with every new example I acquire. :-/ [[User:BLongley|BLongley]] 20:56, 6 September 2008 (UTC)
 +
 +
== Publisher names- Merging(methods and thoughts) ==
 +
 +
Here's how I treated Gulliver Books when I merged the three different names (Gulliver, Gulliver Books & Gulliver Books Paperbacks) [http://www.isfdb.org/cgi-bin/publisher.cgi?3980]. All of the info comes from the copyright pages, covers, Amazon and sellers online. My method is to use the cover and copyright page, then look at Amazon, and then check on Abebooks or some of the other online book sites. If I can build a history of the publisher or imprint I'll add the data to notes with links to Wikipedia or their current web site. So far I've done over 235 merges (plus 765 updates), and over 90% of the time the parent name I've used is the one that has been verified by most of the current mods. Over 95% of the time the variant names have had no verifiers. For the 5% that are verified I either leave the name as is for later or ask the verifier on their talk page if they're still active or responding. From my point of view most of these variant names are from Amazon, which is inconsistent: I've found several cases where they've used the distributor's name for 5 different imprints out of the UK. Amazon UK lists the proper imprint/publisher while the US site lists the distributor. The other problem is that a lot of the early data in the database is from book sellers on the web: some cut and paste from Amazon and each other, and the rest are a mixed bag. Some are professional and have good data right from the book, and at the other end of the spectrum some dealers have minimal or crap data. For names that I can't resolve I generally leave them as I find them and only add any info I've found into notes. [[User:Kraang|Kraang]] 02:05, 6 September 2008 (UTC)
 +
:By "a lot of the early data in the database" do you mean data that has been in the database longer, or data that is from earlier publication dates (1700's, 1800's)?  If it's older publications that you've been having problems with... I HIGHLY recommend Google Books and The Internet Archive. [[User:Kpulliam|Kevin]] 04:10, 6 September 2008 (UTC)
 +
:::No, I meant data that had been in the ISFDB longer, that was entered when entry standards were less well defined than they are now. Almost none of this came from the IA, although a good deal of it did come from actual books. But a good deal more came from secondary sources such as Locus, Clute, Tuck, and Amazon, and a fair amount from OCLC and LOC records. Several of those sources are known to abbreviate publisher names, and in some cases assign misleading city designators. Data from such sources will be reliable as to whether it is Baen or Tor, but not as to the precise form of the publisher's name used in the actual book. Data entered from a physical book could, of course, be accurate on the precise form of publisher name used, but I strongly suspect that it often isn't. Note that the help now says "''The publisher has in the past not been a key entity in the ISFDB, but publisher and imprint support is in the process of being improved, and a process of determining canonical names for publishers and imprints is in progress. For the time being you are free to choose an imprint ("Ace Books"), a division ("Berkley") or the parent corporation ("Penguin Group USA") as you wish''". Until recently it said "''is not a key entity''", thus informing editors that precision here was not vital. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 07:27, 6 September 2008 (UTC)
 +
::The Internet Archive derives its book data from actual page scans and always lets you see the actual pages of the work. It also allows searching by publisher name (which has so far been accurate to the printed TP).  The one downside of TIA is that it is primarily limited to works published prior to 1923 for North American library sources, but there is a library in India that has been uploading scans of stuff at least through the 1950s. Even if the work you are researching isn't in the archive, you can often get an idea of what the publisher was calling itself in that decade or even the same year. TIA is scanning something on the order of 1000 books a day. [[User:Kpulliam|Kevin]] 04:10, 6 September 2008 (UTC)
 +
:::Of course the next thing I looked up abbreviated Methuen as "London: Methuen" and I clicked the link and found 1859 publications under that publisher name. So it's not as accurate as I thought. [[User:Kpulliam|Kevin]] 04:55, 6 September 2008 (UTC)
 +
::Google Books also derives its 'older' data from actual page scans. For items published in the last 15 years it appears to use data provided by the publisher. Older items have publisher data entered from page scans.  For items published prior to 1923, they 'usually' have the page scans available for viewing. They also have a lot of works entered from catalog data. For stuff they have scanned, but that is after 1923, you can search inside the book and get the text for a line above and below. (As an example, they have Bleiler's 1948 Checklist of Fantastic Fiction scanned, but you can't get page images because it's still in copyright.) [[User:Kpulliam|Kevin]] 04:10, 6 September 2008 (UTC)
 +
::I don't think many ISFDB records to date have been based on Google Books scans. No doubt more will be in future. -[[User:DESiegel60|DES]] <sup>[[User talk:DESiegel60|Talk]]</sup> 07:27, 6 September 2008 (UTC)
 +
:::I am now (but wasn't at first so there are some I need to still sweep up) '''tagging''' books I've entered from scanned pages as a primary source as [http://www.isfdb.org/cgi-bin/tag.cgi?3196 Google Books] and [http://www.isfdb.org/cgi-bin/tag.cgi?3195 Internet Archive]
 +
 +
== Publishing groups and imprints ==
 +
 +
I'm doing an experiment when verifying my next few publications of using them to identify and document the actual publishing group names, publisher names, imprints, street addresses, etc.  An area where I'm undecided is where to document the raw data that's stated in the publication.  For now I'm using the imprint's page though I'm thinking of shifting it into the publication notes.  Obviously, anyone's welcome to add data to the wiki pages/categories but I'd prefer that any additions to the new categories be based strictly on what's stated in publications and that everything gets sourced. The idea is to keep the signal level as high, and verifiable, as possible.
 +
 +
* [[:Category:Publishing Groups]] would just be companies identified as publishing groups.  I'm undecided if things such as Jim Baen's "Publishing group" that only owns one publisher that only uses one imprint qualifies.
 +
* [[:Category:Publishers]] This is the existing category and has a fair amount of noise.  I'm not too concerned about that, though I am thinking of a new category that would contain bona-fide publishers and would be "high signal", based on physically verifiable sources.
 +
* [[:Category:Imprints]] would be imprints. Deciding if something is an imprint is tricky at times and so ideally we document and source what's stated in a publication that seems to indicate a name is an imprint. I just looked at a DAW book and there's a rather vague "DAW Books is distributed by Penguin Group (USA)". <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 07:47, 6 September 2008 (UTC)
 +
 +
: Well, the next book I picked up contains [[Publisher:Berkley Sensation#Data Points|this]] beast of a copyright page. At the moment I have no idea how I want to file this in terms of publishing group pages though am leaning towards a single page titled [[Publisher:Penguin Books Ltd.]] to document the state of the empire. I'm off to a library book sale. :-) <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 15:59, 6 September 2008 (UTC)
 +
::Who says you can't have Publishing Groups owning Publishing Groups? (fake links added for effect)
 +
::*Imprint: Berkley Sensation; Wiki page indicates is an imprint of [[Publisher:Berkley Publishing Group]]; ISFDB entered as Berkley Sensation (If we are going to admit that we mix Publishers and Imprints in the Publisher datafield)
 +
::*Publishing Group: Berkley Publishing Group; Wiki page will indicate it is owned by [[Publisher:Penguin Group (USA)]]
 +
::*Publishing Group: Penguin Group (USA); Wikipage will indicate it is owned by [[Publisher:Penguin Books Ltd (Worldwide)]]
 +
::*Publishing Group: Penguin Books Ltd (Worldwide); Wiki page will indicate that this is a parent Publisher of numerous Publishing Groups (insert list), and that the top parent company is at 80 Strand, London WC2R 0RL, England.
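:::The chain of ownership in the list above can be modelled as a simple child-to-parent mapping. This is only a sketch: the names are the ones used in this discussion (including "Penguin Books Ltd (Worldwide)", which is a name made up here rather than one Penguin uses), and <code>ownership_chain</code> is a hypothetical helper.

```python
# Child -> parent links as described in the list above. These reflect
# this discussion only and are not verified corporate data.
PARENT = {
    "Berkley Sensation": "Berkley Publishing Group",
    "Berkley Publishing Group": "Penguin Group (USA)",
    "Penguin Group (USA)": "Penguin Books Ltd (Worldwide)",
}


def ownership_chain(name, parent=PARENT):
    """Return [name, its parent, grandparent, ...] up to the top level."""
    chain = [name]
    seen = {name}
    while chain[-1] in parent:
        next_up = parent[chain[-1]]
        if next_up in seen:
            # Guard against bad data such as A owning B owning A.
            raise ValueError(f"ownership cycle at {next_up!r}")
        chain.append(next_up)
        seen.add(next_up)
    return chain


if __name__ == "__main__":
    print(" -> ".join(ownership_chain("Berkley Sensation")))
```

:::Storing only the immediate parent for each name keeps every wiki page responsible for a single link, so groups owning groups need no special handling.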
 +
 +
::Have fun at the Library Sale!!  [[User:Kpulliam|Kevin]] 16:29, 6 September 2008 (UTC)
 +
 +
::: The library sale netted the usual full bag of books though this time I got a lot of hardcover anthologies and fewer paperbacks.
 +
 +
::: The hierarchy makes sense.  I see you manufactured a new name [[Publisher:Penguin Books Ltd (Worldwide)|Penguin Books Ltd (Worldwide)]] that does not seem to be used by Penguin.  I'm undecided on if that's a good thing as while it's plainly descriptive I had been thinking of sticking to names "as stated" by the publishers.  I'm going to try to separate the mix of publishers and imprints. Obviously if a name is both that we'd add it to both categories.  I'm thinking now that imprints should not be indiscriminately mixed in the publisher's category.  Obviously, some names such as Penguin Books will be an imprint, publisher, publishing group, and the owner of publishing groups meaning it'll end up in all the categories we could devise. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 18:59, 6 September 2008 (UTC)
 +
::::The only thing I really intended to 'manufacture' was the (Worldwide) addition.  It seemed clear to me that there were names and addresses of various sister corporate entities.. and below them all was "Penguin Books Ltd." with no parenthetical country notation, and the statement 'Registered offices'.  This seemed clear to me that this line (noting that it was separated) applied to everything above, and did not apply to a particular country.[[User:Kpulliam|Kevin]] 21:30, 6 September 2008 (UTC)
 +
 +
::::: Penguin Group (USA) just goes under "Penguin Group", see http://www.penguin.com/ for the top level - there appear to be nine geographical groups under that. I'm not sure "Penguin Books Ltd" exists anymore - "Penguin Group UK" seems to be the current name. And even Penguin Group (no suffix) is not really the top parent company as they're owned by [http://www.pearson.com/index.cfm?pageid=11 Pearson] - some of the Nonfiction we have goes under their Education section. Why do I keep wanting to draw diagrams? :-/ [[User:BLongley|BLongley]] 19:32, 8 September 2008 (UTC)
 +
::::: Still, the hierarchy makes it look simpler than some of my books where most of the copyright page is Penguin companies - the average seems to be five in my books, but there's been well over a dozen in some cases. [[User:BLongley|BLongley]] 19:32, 8 September 2008 (UTC)
 +
:::::: I'm still experimenting and am undecided how to handle conflicts such as a Penguin book that only lists a couple of the companies.  Does that mean that at the time the other companies did not exist or for whatever reason they did not get mentioned? I'm also mulling over how to handle references to the source documents. For now I'm copy/pasting it into every page but just realized one way to centralize this is to use the Publication namespace. <span style="border: 1px solid #f0f; border-bottom: none; padding: 0 2px">[[User:Marc Kupper|Marc&nbsp;Kupper]]&nbsp;([[User talk:Marc Kupper|talk]])</span> 05:46, 13 September 2008 (UTC)
 +
 +
 +
and oddities.  For example, I just verified an "Ace" book that stated "Ace Science Fiction Books."  Either no one ever used that as an ISFDB publisher name before or someone merged it back into Ace.  At the time, June 1983, it was an imprint of [[Publisher:Charter Communications, Inc.|Charter Communications, Inc.]].

Latest revision as of 01:46, 13 September 2008

General concepts

  • If you create a new publisher wiki page then please add it here by adding a category link or template. Do the same if you find one that's not listed here. See below for details on how to do this.
  • If you change the name of a publisher page using the Wiki "move" function then the change will be automatically reflected here as long as you do not delete the category link or template.
  • If you redirect a publisher page to the canonical version for a publisher then include it in this category if it's a well known imprint.

How to include a Publisher Wiki page in this list

The easiest way to get a publisher Wiki added to this list is to add {{Publisher Category}} to the page. Just copy/paste {{Publisher Category}} to the page and save it. As a convention, this should be added to the bottom of the page.

Note that if you are including a redirect page in the publisher category that the {{Publisher Category}} needs to be on the same line as the redirect. For example

#REDIRECT [[Publisher:Target Page]] {{Publisher Category}}

A more complete approach for including an article in the publisher category is to use Template:PublisherHeader at the top of the publisher's page. This will add a standard header to the page and will include the publisher in this category without any further effort.

With both of these the edits are done to the individual publisher's page. For example, to include the publisher Fontana in this category you would edit Publisher:Fontana and not this page.

Don't redirect publisher pages

In my opinion.. this line "If you redirect a publisher page to the canonical version for a publisher then include it in this category if it's a well known imprint. " is a bad idea. We should not encourage redirects of publisher wiki pages. You should leave the page with a link to the primary house name page. Imagine if you had automatic redirects for pseudonyms? I check the wiki page to make sure I'm picking the correct publisher. If both 'Smith and Jones' and 'S. & Jones' take me to the correct place.. some people will think S & Jones is correct and other think Smith is correct... further muddying up the database. IF however the editor found a note on S & Jones wiki page saying "Variant Imprint for Smith and Jones - Used between 1910 and 1923. For other dates please see Smith and Jones" then there is no confusion. Kevin 22:29, 4 September 2008 (UTC)

We use wiki redirects when there is little or zero information available about a name. If there is a little bit of information available then it makes sense to add that as a section of a longer article and to redirect that name directly to the section.
I agree with you completely that if there could be confusion about a name then the article should be a disambiguation/explanation page and not a flat redirect.
I'm also trying to keep the help worded as tightly as possible and this page is not an effort to document all of the possible exceptions and side-rules that could apply. I'm of that opinion as 1) This is a header for the category and the header will be included with each page of the category. 2) I believe this page will mostly be used by people interested in browsing the publishers and not people seeking the rules related to this category. If we need more detailed rules for managing the category I'd like to either have them on this talk page or a separate page.
My thinking at the time I did a number of redirects and added that line is that on the ISFDB database side we have many similar names for the same imprint or publisher. The redirects are intended to merge the variant names to the appropriate article. If an imprint is well known then it would get added to the category so that someone browsing for a name from the category can find it. As it is, usually pages that are silent redirects are very similar names to the well known ones.
Many of the "publishers" people have been using for ISFDB publication records are shorthand for an imprint. For example, DAW and DAW Books are commonly found in ISFDB and both are short for DAW Books, Inc., which is an imprint of Donald A. Wollheim, Publisher, which itself was a subsidiary of New American Library (NAL), which is now owned by Penguin Group (which I don't think has ever published any books). I suspect the formal article on DAW should be DAW Books, Inc. with all of the other names except for NAL being redirects to it.
The choice of when and if to make a name part of a category is subjective and my thinking of "Well used name" was that we would take a look at DAW for example and pick one of the names as the "Well used" one to be included in the category. I'd probably vote for DAW Books as that's more likely to be recognized by someone not familiar with the publisher than just "DAW". Marc Kupper (talk) 00:30, 5 September 2008 (UTC)
Thanks for explaining. I was just opposed to 'advertising' the solution of redirecting pages. I think it always invites confusion because separate publisher names in the database are linking to the same wiki page (implying that the separate database entries are equal). I think my concern was muddied by my inclusion of the rest of the sentence, so let me ask it again more plainly. Are you comfortable with separate database entries pointing to the same wiki page, when the database implies that each should be pointing to a separate and individual page? When you use a redirect 'when there is little or zero information available about a name' you are automatically inviting confusion as to which name any data you later collect refers to. I was proposing that a non-automatic redirect (a link) prevents this confusion from ever occurring. Kevin 00:56, 5 September 2008 (UTC)
I've moved the rules to this page so that we can feel free to expand/elaborate on them. I'm comfortable with the present system of multiple/similar names in the database. For example, I brought up Baen Books below. Someone looking at and verifying a Baen publication is likely to use either the name Baen or Baen Books. Neither is correct as a "publisher" name but both are widely recognized. We could fix up the ISFDB software to force people to use "Baen Books" and that would certainly help clean up some of the mess. A concern I have with this is that people like to take the path of least resistance (we are lazy). If someone happens to have a Baen Starline book and their only choice is "Baen Books" the odds are they will pick "Baen Books" even though it does not match their publication rather than doing whatever it takes to get Baen Starline added as an imprint. Ideally they add a publication note that the publication states "Baen Starline." Should we make it as easy to add "Baen Starline" to the system as it would be to select "Baen Books?" How do we deal with people who are not sure what the publisher is? For example, they may be adding a book based on a seller's description and thus may not know if this is a "Harcourt Brace & Company" or "Harcourt, Brace & World, Inc." book. If they pick one, that implies great precision in that it *is* a "Harcourt Brace & Company" publication.
One other downside to a system that encourages selecting names from a defined set is that the names in the defined set will tend to be the longer ones. We'll have "DAW Books, Inc." in the list as that's what's stated. This means though that the list of publications on a title record will get harder to read as it's no longer books by DAW, Ace, Baen, etc. Marc Kupper (talk) 02:02, 5 September 2008 (UTC)
I personally feel that making every page a non-automatic redirect will add aggravation. At present the database has 8,363 publishers referenced by 127,250 publications. I suspect most people would be happier and less confused by a system that has 8000 automatic redirects to a core set of publisher articles than to have 8,363 landing pages with next to zero content other than a link to a publisher page. There is a project underway to pare the 8,363 publishers down by merging similar names but that's another can of worms. Marc Kupper (talk) 02:02, 5 September 2008 (UTC)
I am all for making pages true (automatic) redirects when there is no significant difference. For example, of Publisher:Baen and Publisher:Baen Books one should surely be a redirect. I must admit that in the matter of regularization I would tend to favor the shorter names in many cases: "DAW" not "DAW Books, Inc", "Baen", not "Baen Books", "Tor", not "Tor Books". But that is really a discussion for a somewhat different place. Forcing selection from a list (or even making it possible) in the DB proper will not, I suspect, and should not come until publisher regularization has progressed a good deal farther than it has at present. Wiki redirects on publisher pages are IMO needed precisely because people will enter variants. Suppose, for example, we decide to standardize on "Baen" (or the other way round). Then Publisher:Baen Books will still exist, because people will continue to enter publications that way in some cases, but it would redirect to Publisher:Baen, so that anyone following the link would know the standard name accepted here. Frankly, having two different wiki pages referring to the same publisher strikes me as only an invitation for things to get out of sync, for inconsistent or incompatible data to exist on different pages. People will be forced to do extra work to try to keep them in sync, and even so the sync will be, at best, imperfect. My view would be that ideally, for any given actual publisher, there would be a single wiki page, with all other variations being redirects. (Cases like DAW that need sub-pages are fine, as long as there is a single root of the collection.) Imprints that have a truly separate identity should probably have their own pages, particularly when the imprint has been part of more than one publisher over its lifetime. But each imprint should ideally have only a single page, as well. -DES Talk 15:45, 5 September 2008 (UTC)

Categories on redirect pages

(moved from User_talk:Marc_Kupper#Category:Publishers) (and split to its own thread to try to prevent topic drift)

Part of the reason why I wanted redirects excluded from Category:Publishers was that I wanted to use it to help drive merges. If editors saw similar names on the list (presumably near each other), it might be worth looking to see if they could productively be merged, either just in the wiki, or also in the DB. But if such looks commonly result in finding that the similar publishers were already redirs, then I fear editors will stop checking and miss the remaining plausible merge targets. At the least I would like to suggest that unless the imprint is both quite well known, and significantly different from the parent, it should not be listed. For example, I don't see much gain to listing both "HarperVoyager" and "HarperVoyager/HarperCollins". But what I particularly didn't want was to see both "Baen" and "Baen Books" or similar variations, if the wiki pages have already been merged.

Also, it can be slightly tricky to include a redirect in a category. It can be done, but the category link must be on the same line as the redirect link (which must itself be on the first line of the page, or it isn't a redirect). If the category link is on a lower line, the wiki software will simply drop it when the page is saved -- all content after the first line is dropped from any redirect page. If you want to encourage listing redirects we should mention this or people will be frustrated. -DES Talk 21:51, 4 September 2008 (UTC)
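The one-line constraint described above can be illustrated with a small sketch. It only builds the page text; it assumes the wiki software behaves as described in this thread (redirect first, category links on the same line), and the function name `redirect_page_text` is hypothetical, not part of any ISFDB or MediaWiki tool.

```python
def redirect_page_text(target, categories=()):
    """Build the wikitext for a redirect page as a single line.

    Assumption (from the discussion above): anything after the first
    line of a redirect page is dropped on save, so category links
    must share the line with the #REDIRECT link.
    """
    parts = [f"#REDIRECT [[{target}]]"]
    parts.extend(f"[[Category:{name}]]" for name in categories)
    # Join with spaces, never newlines, so everything stays on line one.
    return " ".join(parts)


print(redirect_page_text("Publisher:Baen", ["Publishers"]))
```

Keeping the builder in one place would make it harder to accidentally put the category link on its own line.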

My thinking is that only the canonical name for a publisher or imprint would get the category designation and that it does not matter if that name is an article or redirect page. As it is, we rarely use the full names in the database, meaning we may well have a page such as DAW Books Inc. that redirects to the DAW article and whose only purpose is to define DAW Books Inc. as the canonical name for the imprint. I'll update the help to mention that redirects need to be one-line files. Thank you for the heads up on that. Marc Kupper (talk) 02:54, 5 September 2008 (UTC)
I started to update the help and realized there are two conflicting goals. 1) That we designate a name as canonical and include it in the publisher category. 2) That we will have articles about publishers or imprints but they may not always be filed under the canonical name. For example, I suspect DAW Books Inc. should be the canonical name and that the article is filed under DAW. One solution would be to make it a rule that the canonical name always be the name the article is filed under and that other names such as DAW and DAW Books would redirect to this. Marc Kupper (talk) 03:29, 5 September 2008 (UTC)

Rambling discussion 1

(moved from User_talk:Marc_Kupper#Category:Publishers)

Frankly, my initial enthusiasm at seeing more people finally working on publishers has been almost entirely lost. Most people seem to be trying to organise the little publisher data we have rather than improve it. Who has decided that we need Category:Publishers and why has nobody considered Category:Imprint? When a stub Publisher wiki-page mentions other Publishers that have wiki-pages why aren't they being linked? Why are there no discussions about "canonical publisher"? Or about what should be on a publisher page anyway? There's nobody being brave enough to do some BIG merges like all the Putnam or Harper pubs (thankfully - I don't know much about Putnam's but the Harper categories are a mess, and I don't think we SHOULD deal with those until we establish, for instance, when Voyager was a plain COLLINS imprint rather than a HarperCollins one.) There's big risks here - e.g. it's valid (from some viewpoints) to merge "Point SF", "Point Horror", and "Point Fantasy" into "Scholastic". But Amazon messed a lot of that up for us anyway and it should be our duty to SEPARATE some useful imprints, IMO. BLongley 22:47, 4 September 2008 (UTC)
I'd suggest people back off from all this wiki-fiddling: ensure there IS a wiki-page for any imprint or publisher that deserves recording, and on it ASK for the data you want. ADD the data you have. Or create/expand the wiki-page to say why it SHOULDN'T be there and point people to where the right place should be. For example, I suspect "Transworld" has never been an imprint - most pubs should be Bantam or Corgi. I definitely don't want to merge Bantam and Corgi. I think we could consolidate a lot of the publishers we have under those names though - but "Yearling" leads into other investigations. I'd really prefer us to actually work on publishers and imprints rather than on organising the wiki pages for all the temporary notes for such. SOME organisation and linking is needed for that research, but too much and you make it look like a temporary note is a canonical entry. The Wiki entries we need are central "what the f**k are we trying to do here?" discussions still, not the "I've linked to the 27 different variants we have of this name" (of which I'll destroy a few just by fixing my own entries) or the "I've merged the UK, US and Canadian versions of this as the price should be clear enough" (something I've been very tempted to do with recent titles). Put some aims in place and we can work toward those - just categorising for the sake of it isn't helping. BLongley 22:47, 4 September 2008 (UTC)
For the record - I'm opposed to merges. People enter what's stated in publications and a merge will result in publication records getting changed without direct inspection and verification that the new/merged name is the same as what's stated in the publication. In the past I've done merges within my own databases and I've come to regret each of them.
I also agree that we have a problem in that publisher names are all over the map and that too often, if five of us looked at a single publication and an hour later the same five of us looked again, we'd have ten different publisher names.
As for publishers vs. imprints: 99% of what we call "publishers" are imprints or even logos or trademarks and in fact I don't think we have any pages for publishers. For example Publisher:Baen Books is an imprint of Publisher:Baen Publishing Enterprises and the books often just have a Publisher:Baen logo on the title page. So what do you want to do? Historically we have called the "publisher" "Baen" as that's what's on the title page and less often people use "Baen Books" for publication records. So what's the correct canonical publisher name? Is it the most often used "Baen", the less used but more precise "Baen Books" or the correct "Baen Publishing Enterprises" that not a single person has ever used for a publication record? Marc Kupper (talk) 23:47, 4 September 2008 (UTC)
I agree on that for post-1950 works, and disagree for earlier works. Maybe we can work towards organizing the information enough to propose an upgrade of the database where we can add imprints as a subset of Publisher. Or maybe we can hack it for now by overloading the Publisher field in some way, as it is sometimes overloaded with city names now. Perhaps like so: PUBLISHERNAME:IMPRINT_NAME:EVEN SUBIMPRINT NAME, with two examples being "MACMILLAN: TOR: FORGE" and "MACMILLAN: TOR: TOR". Thoughts? Kevin 00:49, 5 September 2008 (UTC)
Bill. Please stop complaining about other people working on something that you don't see as productive at this moment. For your record, I'm the one who said 'Why don't we categorize them', and the reason there isn't a Category:Imprint is because the database doesn't have that field, and the database doesn't automatically make a link to a wiki page for working on that field. Complaining about other volunteers' donated efforts, which don't negatively impact your donated efforts, is classist and rude. (End of rant) Kevin 00:49, 5 September 2008 (UTC)
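As a rough sketch of how the overloaded field proposed above might be read back, assuming the colon-separated PUBLISHER: IMPRINT: SUBIMPRINT layout from the two examples. The names `PublisherParts` and `parse_publisher_field` are hypothetical; nothing like this exists in the ISFDB code as far as this discussion shows.

```python
from typing import NamedTuple, Optional


class PublisherParts(NamedTuple):
    publisher: str
    imprint: Optional[str]
    subimprint: Optional[str]


def parse_publisher_field(value):
    """Split 'PUBLISHER: IMPRINT: SUBIMPRINT' into up to three parts."""
    parts = [piece.strip() for piece in value.split(":")]
    parts = [piece for piece in parts if piece]
    if not parts or len(parts) > 3:
        raise ValueError(f"cannot parse publisher field: {value!r}")
    # Pad missing levels with None so callers always get three slots.
    parts += [None] * (3 - len(parts))
    return PublisherParts(*parts)


print(parse_publisher_field("MACMILLAN: TOR: FORGE"))
```

Note that a city-prefixed entry such as "London: Methuen" would be misread by this sketch as publisher "London" with imprint "Methuen", so the two kinds of overloading could not safely share one separator.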
And if you are still concerned about lost 'database effort' well then don't you worry about that. I reserved most of that cutting and pasting to time I was not going to be working on the database due to the party I was having with 'my good friends' Jack Daniels, and Jim Beam whilst watching TV. I promise you it was Cut and paste or nothing at that time. (Now where did my good friends go?) Kevin 00:49, 5 September 2008 (UTC)
Bill. I almost proposed a merge yesterday based on the work I have been doing on the wiki pages. But I didn't know the best place to discuss it, nor what level of documentation I would need to support the merge. Maybe this is a good place to discuss that. I would also love to participate in a discussion of how to determine Canonical Publisher names. Please start 'em up below. Kevin 00:49, 5 September 2008 (UTC)
As to merges in general, as long as they are researched, and documented, and there is a consensus that the changed entries were in fact incorrect, I'll happily support cleaning things up. Kevin 00:49, 5 September 2008 (UTC)
As to Redirects of Publisher Variant1 to Variant2, I'm opposed; see above. Kevin 00:49, 5 September 2008 (UTC)
As to putting redirected pages into the Category:Publishers, it depends on why the redirect exists in the first place, but probably opposed. Kevin 00:49, 5 September 2008 (UTC)

how to determine Canonical Publisher names

I would also love to participate in a discussion of how to determine Canonical Publisher names. Please start 'em up below. Kevin 00:49, 5 September 2008 (UTC)

Entry and verification of publications has generally been considered mechanical "non-thinking" work. It should be completely objective. When we move to titles and authors there is often quite a bit of subjective work. As we have over 8000 publisher names to deal with though it sounds like we should have a near "mechanical" method for determining which names should be canonical. We can't put the individual names up for votes. My "vote" would be the commonly known names, the most frequently used version (per counting pub records), and that the names be two words or more. Use "DAW Books" rather than "DAW" so that we catch Pocket, Inc. vs. Pocket Books, Inc. Marc Kupper (talk) 04:35, 5 September 2008 (UTC)
See Publisher:Methuen. I have collected 3 verified datapoints, with two variant names. Eventually the datapoints themselves will tell the story, and the 'common elements' would form the canonical name. (At this point for Methuen, the data recommends 'Methuen & Co.', at least for early twentieth century works.) Kevin 05:18, 5 September 2008 (UTC)
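The near-mechanical rule suggested above (most frequently used variant, but at least two words) might look something like the following. This is only a sketch: `pick_canonical` is a hypothetical name, and the counts in the example are made up for illustration, not taken from the database.

```python
def pick_canonical(pub_counts, min_words=2):
    """Pick a canonical candidate from {variant name: publication count}.

    Prefers the most-used variant that has at least `min_words` words,
    falling back to the most-used variant overall if none qualifies.
    Ties are broken alphabetically so the result is deterministic.
    """
    eligible = {
        name: count
        for name, count in pub_counts.items()
        if len(name.split()) >= min_words
    }
    pool = eligible or pub_counts
    return max(pool, key=lambda name: (pool[name], name))


# Illustrative counts only.
print(pick_canonical({"DAW": 900, "DAW Books": 400, "DAW Books, Inc.": 20}))
```

A rule like this says nothing about whether the variants really are the same imprint; it only ranks names that have already been judged to belong together.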
That's a good article. In this case I'd say the canonical name is Methuen and that in the database we will see
  • Methuen & Co.
  • Methuen & Co. Ltd
  • Eyre Methuen which has its own page.
ISFDB happens to have a handful of entries
I don't see any obvious safe merge candidates and so if it were me I'd make all the wiki pages for those names redirects to Methuen and as the Eyre Methuen article is so small I'd also move its content into Eyre Methuen#Eyre Methuen and redirect that too. At some point the publications with "Methuen (UK)" for example will get verified and the publisher name changed to one of the preferred versions that also reflects accurately what's stated. At that point the wiki page for Methuen (UK) can be dropped. Marc Kupper (talk) 09:03, 5 September 2008 (UTC)
You say that you don't see any safe merge candidates. I would be inclined to merge at least "London: Methuen", "London: Methuen & Co., Ltd.", and "Methuen (UK)", and to separately merge "Methuen & Co." and "Methuen & Co. Ltd". I really don't believe that which of these got entered in the ISFDB corresponds to what was in the publication (and still less to the actual identity of the publishers) in any meaningful way. What you have here is not data, it is noise. First of all, in the many cases where the initial entry was from a secondary source, it is known that different sources report the same publisher in different ways, according to the standard in effect (or the whim of the cataloger when there was no standard) when the book was cataloged -- indeed the same source may well report the same publisher differently when there are multiple entries for the same edition of the same book. Secondly, some editors will have entered fully whatever was in the book or secondary source, some will have automatically omitted city names, and some will have entered only the key word of the publisher's names, omitting things like "& Co." or "Ltd" or even "and sons". Trying to base any useful deductions on such entries is not merely onerous, it is fundamentally misguided. It is IMO far better to do separate research into publisher histories, determine the names actually in use at any particular period, confirm this with a few questions to verifiers about the actual form of the publisher's name on specific publications, and use all this data to determine a canonical name and known or plausible variations for a given publisher. Where the name actually changed at particular points in time, this should be noted. In some cases this may be an aid to dating undated pubs. In others it may help to enter pubs of known date with the proper name for the date. -DES Talk 16:06, 5 September 2008 (UTC)
It was said above that "Baen books" is an imprint, not a publisher. In a technical sense this is true, but "Baen Publishing Enterprises" has never had more than one imprint, nor has that imprint ever changed, either identity or name. Thus the distinction is not, in that case at least, useful. -DES Talk 16:06, 5 September 2008 (UTC)
The reason I did not see safe merge candidates is that from Publisher:Methuen I got the impression that the publisher uses several distinct, though similar names. "Methuen & Co." and "Methuen & Co. Ltd" were used at different times and as publication records should accurately reflect the name we will see both in ISFDB. As the ISFDB data gets more accurate it can get used for secondary functions such as estimating the date for an otherwise undated publication. Marc Kupper (talk) 19:13, 5 September 2008 (UTC)
If when you say "publication records should accurately reflect the name" you mean that they ought to reflect what is actually in the publication, you may have a point, although I tend to disagree. If you mean that you think that in a majority of cases they currently do reflect what is in the publication, then I think you are badly mistaken. -DES Talk 19:48, 5 September 2008 (UTC)
I'm a TTL based system and never got the upgrade that allows me to parse the distinction between should and ought. I believe nearly all of the ISFDB, and even Amazon, data has some basis in fact and the current data does reflect what's stated in the publications. Note that I use "reflects" and not "matches." Even though the publisher field contents are much derided I've also found that it's very rare for it to be wrong. Marc Kupper (talk) 23:17, 5 September 2008 (UTC)
In this case, as far as I am concerned "should"="ought", but neither is the same as "does in fact". I agree that the vast majority of current ISFDB data does indeed "have some basis in fact". If the publisher field says "Methuen & Co." I'm pretty sure that the book wasn't published by another firm altogether, but by Methuen under some name. But the resolution (so to speak) of this accuracy is limited. What I don't feel confident of is that the book said "Methuen & Co." rather than "Methuen & Co. Ltd", or some other variant. If the use of these variants was in fact significant (which I would want evidence for, rather than merely assuming) I don't think we have been reliably or consistently capturing data at that level of precision -- I know that I haven't always done so, in my edits. -DES Talk 07:16, 6 September 2008 (UTC)
As to what ought to be in the publication records, I am inclined to think that we should move (slowly and carefully) towards a system in which each publisher or imprint has a single canonical name, or a single name for any given point in time. (For example, one entry might read "Use 'Harper' before 1817, use 'Harper & Bros' from 1817-1960, use 'Harper & Row' from 1961-1997, thereafter use 'HarperCollins'. If there is good reason to think there is a meaningful variation, list it in the notes and record it on the wiki page."). Editors would be encouraged to use the established canonical names, and periodic scans would be made for non-canonical entries, which would be checked and converted if there was no good reason not to. Note, I do not think we are ready for this style yet; we aren't nearly ready to establish canonical names for publishers firmly enough for this in any but a very few cases. (Baen is one, there being no name changes or multiple imprints involved there.) But I hope and expect that we will eventually move to that model, just as we established canonical names for authors. And just as an author's canonical name need not be the author's legal name, or even the author's preferred nickname, so a publisher's canonical name need not and mostly will not be the same as that publisher's legal corporate name, nor need it shift if the publisher made a minor change in corporate name. -DES Talk 19:48, 5 September 2008 (UTC)
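The "single name for any given point in time" idea above could be sketched as a lookup by publication year. The table below simply copies the illustrative Harper entry from the comment; those date ranges are an example from this discussion, not researched facts, and `canonical_name_for_year` is a hypothetical helper.

```python
import bisect

# (first year the name applies, canonical name), sorted by year.
# Taken from the illustrative example above, not from verified research.
HARPER_ERAS = [
    (0, "Harper"),
    (1817, "Harper & Bros"),
    (1961, "Harper & Row"),
    (1998, "HarperCollins"),
]


def canonical_name_for_year(eras, year):
    """Return the canonical name in effect for the given year."""
    starts = [start for start, _ in eras]
    # Find the last era whose start year is <= the publication year.
    index = bisect.bisect_right(starts, year) - 1
    if index < 0:
        raise ValueError(f"no canonical name defined for {year}")
    return eras[index][1]


print(canonical_name_for_year(HARPER_ERAS, 1960))
```

A periodic scan for non-canonical entries could then compare each record's stated publisher against the name this lookup returns for its date.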
Agreed - I sent an e-mail to Al this morning asking about cloning the ISFDB system into a vmware image. That'll allow us to start prototyping ideas as one thing I want to add before we do publisher merges is a history log for every record that'll have something like 20080905162123|Marc Kupper|pub-224321|Publisher changed from "Methuen (UK)" to "London: Methuen & Co., Ltd."|Reason... I'd be less concerned about data loss. Marc Kupper (talk) 23:28, 5 September 2008 (UTC)
That I could agree with. If such a system were expected soon, I would be willing to postpone any publisher merges until it was in place. -DES Talk 07:16, 6 September 2008 (UTC)
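A history log entry in the pipe-delimited shape quoted above (timestamp, editor, record id, change description, reason) could be produced along these lines. This is a sketch of the format only; `format_history_entry` and its parameters are hypothetical, and how such a log would be stored is not specified anywhere in this discussion.

```python
from datetime import datetime


def format_history_entry(when, editor, record_id, field, old, new, reason):
    """Format one change as 'YYYYMMDDhhmmss|editor|record|change|reason'."""
    for text in (editor, record_id, field, old, new, reason):
        # The pipe is the field separator, so it cannot appear in a value.
        if "|" in text:
            raise ValueError(f"'|' is not allowed in log fields: {text!r}")
    stamp = when.strftime("%Y%m%d%H%M%S")
    change = f'{field} changed from "{old}" to "{new}"'
    return "|".join([stamp, editor, record_id, change, reason])


print(format_history_entry(
    datetime(2008, 9, 5, 16, 21, 23),
    "Marc Kupper",
    "pub-224321",
    "Publisher",
    "Methuen (UK)",
    "London: Methuen & Co., Ltd.",
    "Reason...",
))
```

Because each entry keeps the old value, a merge that later turns out to be wrong could in principle be undone record by record.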
As for the assertion that the existing data is noise. That's true - it's all noise, particularly as there are no defined/agreed standards for the publisher name field. This is why I made the five people looking twice can arrive at ten answers comment. I'd like to see us moving in the direction of physically verifying the publisher names using an agreed standard so that we can get some "signal" in the noise. The agreed standard may well be that we don't care about the details and that "Methuen & Co." and "Methuen & Co. Ltd" are the same for example and if that's the case then yes, merge away. I'm personally advocating that we do document the details but to really improve the results I believe we need to have new code that supports a publisher selector thing that'll pop up an image of what the bottom of the title page (or whatever we agree to) should look like. Marc Kupper (talk) 19:13, 5 September 2008 (UTC)
I'm glad you agree that much of the existing variation is noise. No doubt some of it is signal: some editors or verifiers have on some occasions been careful to record exactly what was in the actual book in a consistent way. But is there a reasonable method to determine which records are accurate in that way -- to filter the signal from the noise -- to make it worth trying? Or is it better to simply abandon the attempt to draw fine distinctions from our existing DB? (Do note "fine", I am sure that when a publication record says "Methuen & Co." that means it isn't "Macmillan", but I doubt that when one pub record says "Methuen & Co." while another says "Methuen & Co. Ltd", and yet another says "London:Methuen & Co." that the differences are worth even trying to sort out.) I would merge them all, discard the noise, and not worry about any signal I might be losing, because I don't think the signal could ever be reliably filtered out. Remember that making distinctions based on unreliable data is quite probably worse than making no distinctions at all. -DES Talk 19:57, 5 September 2008 (UTC)
The method you suggested of merge now and then sort out the details by asking verifiers may be less work. I'm not opposed to it though it means that we will lose the existing "signal" in the haystack. For example, in light of this discussion I just went back and re-verified my most recent publications to make sure the publisher field matches exactly what's stated. My previously verified "Harcourt Brace" became "Harcourt Brace & Company" for example. Any merges of Harcourt will lose that signal. I thought I've seen "Harcourt, Brace & Company" (with a comma) and see a smattering of Harcourt, Brace in the database. I have no idea if this is introduced noise or if these reflect what's stated. It's interesting that all of them are in a tight date range, implying it's stated. Marc Kupper (talk) 19:13, 5 September 2008 (UTC)
But note that even if those are as stated, even if Harcourt actually used different forms of its name on different publications, it was still the same entity, and should probably have only a single publisher record. Perhaps we should have a field for "canonical publisher name" and one for "publisher name as stated". Note also that, unless the change in form of name was regular enough to help determine publication dates, it is of no particular value. Granted, collecting it might help establish whether it was in fact regular enough to be useful for a given publisher during a given period. In any case I think we need to start with the researched facts. Knowing when Harcourt went through various mergers and made various changes in its official, documented name (info that is surely available with a little research) would give us a starting skeleton of known fact. Then we could check with verifiers about books that were published at known dates, and compare the results with that skeleton. Before long, we should have a dataset that is firm enough to actually help us. Doing this publisher by publisher would be tedious, but I think it is the only reliable way forward, and it is still better than going at it publication by publication, never knowing how meticulous an editor was, nor whether a non-verified entry took the publisher name from a secondary source, nor whether multiple forms that may actually have been used were meaningful distinctions. -DES Talk 20:08, 5 September 2008 (UTC)
The business about Baen Books being an imprint was more directed at Bill. I'm still trying to understand what he wants us to do other than to stop fiddling and organizing. I'm just wondering if he wants "Publisher" to be reserved for the actual publishers (which are sometimes difficult to determine these days) and that there be separate name spaces for publishing groups and imprints. It would be a "project" though certainly possible and probably outside of the publication centric scope of ISFDB though having accurate source data in ISFDB will help greatly. Marc Kupper (talk) 19:13, 5 September 2008 (UTC)
I want the "fiddling and organizing" that doesn't actually ADD any data to stop interfering with my work - I don't like being told that I "should" also do this or that to a wiki-page that I will not get any benefit from when I take an ISFDB backup and use it. If it's a database field, yes, we need pretty rigid guidance or it's useless. We're stuck with using the wiki-pages until we can resolve what should be done in the database - and although I do try and use the wiki pages to ADD data, and even ORGANISE data (linking between "publishers" is only possible on the Wiki side so far), I have to treat the Wiki as a temporary solution and a guide to what we want in the database. This conversation will be in the wiki indefinitely - do I want it in the database? No. Do I want to improve the publisher data in the database itself? Yes. Do I want to destroy the data that people have so carefully entered despite the fact that we've got no rules for entering it? No. Please, EVERYONE, add more data to support anything you want back from the database. I really want Al to allow publisher regularisation that does NOT mess with an intent of "I want to call the publisher of this a scumbag vanity-press ripoff-merchant" - and that means we have to have more fields available to mess with or NOT mess with. Add examples in the wiki to show Al we want regularisation for some reasons and "as stated" for others. We've got a "BFG" set of tools to use on the database, that we really shouldn't use without general consent. We've got a Wiki where we can dump all our thoughts. Yes, we have to use the wiki a bit more wisely so that people can FIND those thoughts, but basically I find that if we work around the database limitations TOO much we end up with something that only works on the web. In which case, go post it on Wikipedia or a specialist site, you've lost the "database" view. Organising the ISFDB Wiki so that we have a wonderful website looks good. 
Concentrating on that TOO much makes it web-only. BLongley 22:32, 5 September 2008 (UTC)
Frankly as far as I'm concerned, the ISFDB basically is web only. That is why its name begins with Internet. Yes it is possible to download a copy of the database, but that is only a tool for testing and doing queries hard to do on the main web interface -- and if they are wanted often, they should be added to that interface when possible. The Wiki is, IMO, just as much a part of the ISFDB as the database is.
Also frankly, I don't for an instant believe that most of the publisher names have been "so carefully entered" and I honestly doubt that there is any value whatsoever in the more minute variations now existing in the database. The majority of the publisher name fields, after all, were initially filled by import from Amazon or from various secondary sources. Even when a publication record has been primary verified, I am not at all convinced that in the majority of cases the editor corrects a basically accurate publisher name to precisely match the form stated in the publication. If the earlier, secondary-source-derived record said "Methuen" would most primary verifiers carefully correct this to "Methuen & Co Ltd"? I rather doubt it. Indeed would there be any value to their having done so? -DES Talk 22:43, 5 September 2008 (UTC)

Methuen Merge - A test case discussion

I see these obvious merge candidates in this publisher.

I personally recommend "Methuen & Co." as the 'common part' of both verified variants so far. Anyone indexing a book with 'Methuen', 'Methuen & Co.', 'Methuen & Co. Ltd', and other generic 'Methuen Corp, Co, Comp, Company, etc etc' will likely gravitate to this publisher choice (if they look up possible publishers first). (But I also don't have a problem with :UK, or :London or something similar appended to the name because of the Australian Methuen variant.) Kevin 22:28, 5 September 2008 (UTC)

We have documented in the Wiki that this publisher is in London, and we know that London is in the UK. If we further state "The following Publisher Variant names should be indexed here" then any later input under previously merged names can be remerged. We have also documented (With online viewable copies) Both "& Co." and "& Co. Ltd." If we document in the wiki that both existed (and continue to fill in the verified/confirmed data table to fill in missing years) then I don't have a problem merging these mild corporate variants. We should ask that if anyone finds something named in such a way that it doesn't fit with previously verified data, they should post it to the Wiki with a note about that strangeness.Kevin 22:28, 5 September 2008 (UTC)

We have not documented/verified anything about the multiple various Methuen 'imprints' and other subnames, and they should not be merged at this time. Thoughts?Kevin 22:28, 5 September 2008 (UTC)

If I were working on the various "Methuen" publisher records, the first thing I would do is some outside research into the publisher. I would, for one thing, try to determine whether or not it had ever had a non-UK branch or existence. If it had not, then "Methuen" and "Methuen (UK)" are redundant and should be merged; if it had, then they aren't and shouldn't. I would also determine which name forms could be documented as having been used by the firm, and during which periods. If as a result of that research, I determined that all 4 of the forms you list above (well 3 of the 4) should remain distinct, I would in any case remove the "London:" from the names of the first two, putting it in the publisher notes (and probably also in a publisher wiki page). I would add (UK) whenever that was needed for disambiguation. Since this would leave no difference between the 1st and 3rd forms, those would be merged in any case. Note this is all in the subjunctive -- I am not planning to work on Methuen at this time. But that is how I would approach the matter. Imprints would be trickier, and I would incline to leave them unmerged, except for ones with names differing only in the presence of city names, or of terms such as "Ltd", where if outside research did not indicate a significant difference of actual organizational identity, I would probably merge. -DES Talk 22:55, 5 September 2008 (UTC)
I'll also point out that the need for a (UK) variant may be isolated in time. If you could determine when the Australian office opened... then let Methuen be acceptable for pre-1980 data (as an example) and have Methuen (UK) be the canonical name after 1980. I point out that the (UK) version of this publisher has no datapoints prior to that. Kevin 04:16, 6 September 2008 (UTC)
My current thoughts are that Wiki pages are a really LOUSY form of discussion, as my long-considered response to one issue is now in a totally wrong area.:-( (Yes, I'm currently pissed off about Wiki in general - bear with me, I AM trying to add serious comments where needed.) BLongley 23:32, 5 September 2008 (UTC)
No Problem Bill - Actually what we need for complex discussion is a threaded messaging system like usenet (Or google groups as a modern day successor), unfortunately this is what we have. I started this subsection as a place to limit discussion to one example, so we could identify the issues we all have with this one example. As far as I am concerned, you are welcome to copy or move your comments about this example to this area. Kevin 01:20, 6 September 2008 (UTC)
I'd go for "Methuen" in this case. Yes, there's a need for disambiguation with some publishers - and the "London" list for Methuen is likely to be different from the Australian one. The "& Co", "& Co. Ltd." adds nothing useful. "and Co" or "and Co. Ltd." doesn't either. And "Limited" rather than "Ltd" adds nothing. I really want "shortest name that can be used while still being clear". Nobody's going to add an "Inc" or "Gmbh" or "Ltd" or "PLC" or "LLC" if they think it's clear enough already. We should guide people to add a suffix IF we think it's useful disambiguation though - I have no idea what "Pty" means but I avoid messing with those, which is probably a good start. BLongley 23:32, 5 September 2008 (UTC)
I'm wondering about "shortest name that can be used while still being clear". Why not the opposite "The longest name using the most common elements of all variant names"? It's not like data is expensive to store... it's just a table lookup... and it's not like we can't cut-n-paste it into the form, and regular editors will have major publishers in the auto-fill list.Kevin 01:26, 6 September 2008 (UTC)
I rather agree with Bill here that "& Co" and "Ltd" and the like generally add nothing, and that in general I prefer a shorter form to a longer, provided that the shorter form is clear and not ambiguous. When a firm has invariably used a longer form there is some argument for retaining it for tradition and clear identification. For example I would be inclined to retain "Harper & Row" even if there weren't the need to distinguish other variants on "Harper". That is why assigning a canonical name will ultimately need to be done case by case. But in general including things like "Ltd" or "Inc" simply adds more keystrokes to data entry and takes up more screen estate on display, to no particular good, unless the presence or absence of such is a consistent, reliable indicator of date or firm. At least that is my view. -DES Talk 07:16, 6 September 2008 (UTC)
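As a rough illustration of the "shortest name that is still clear" preference discussed above, here is a minimal Python sketch that strips trailing corporate designators from a publisher name. The suffix list and the function name `short_publisher_name` are assumptions made for this example; they are not an ISFDB rule or part of the ISFDB software, and, as noted above, any real canonical name would still need to be decided case by case.

```python
import re

# Hypothetical list of trailing corporate designators (an assumption for
# illustration only; not taken from ISFDB help or code).
_SUFFIX = re.compile(
    r"[\s,]+(?:(?:&|and)\s+Co(?:mpany)?\.?|Ltd\.?|Limited|Inc\.?|LLC|PLC|GmbH)\s*$",
    re.IGNORECASE,
)


def short_publisher_name(name: str) -> str:
    """Strip trailing designators such as '& Co.' or 'Ltd.' until none remain."""
    previous = None
    current = name.strip()
    while current != previous:
        previous = current
        current = _SUFFIX.sub("", current).strip()
    # Never return an empty string; fall back to the name as entered.
    return current or name.strip()
```

Because only the listed designators are removed, a name such as "Harper & Row" passes through unchanged, which matches the point that some longer forms are worth retaining.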
My only problem with longer names is that on the title display where we list all the publications the display looks better with short names. Perhaps the publisher database needs to have both long and short forms. Marc Kupper (talk) 03:53, 6 September 2008 (UTC)
Ummmm looks better in what resolution, with what system fonts installed, and what size fonts selected as 'normal', in what browser, with which operating system, with LCD or CRT monitor, what size monitor, what distance from the viewer to the screen, and with or without reading glasses? I admit I'm being an ass with some of those questions, but they are honestly all part of the discussion of 'how something looks'. I think the argument that it 'looks better' if there are 7 fewer ASCII characters on a screen is not relevant to the question of what we store in the database. Please see 197177 and MXCRRDSZFR1914. This title and pub is published by Methuen, with City, with '& Co.' and with 'ltd.'. Both pages appear to function properly and in the title bibliography if I make the browser window skinnier the line wraps properly. What exactly about it doesn't look good to you?
I dislike any publisher name with a prefixed city. For one thing, that isn't the actual name of the firm. For another, when city names are included in actual publications, they generally aren't prefixed but postfixed, and generally on a subsequent line, as if to indicate that they are not part of the publisher name. For a third thing, they make publishers sort on the city name rather than the key element of the publisher name, which is IMO a Bad Thing. For yet another, they are often merely historical relics for modern publishers, which are apt to have many offices and many production sites that are part of the same entity. When I see a publisher name like "London:Methuen" what I first think is "Data entered from library catalog, details not reliable". -DES Talk 07:16, 6 September 2008 (UTC)
You have provided a valid argument against prefixing the cityname and I agree. No more prefixing of citynames for me.Kevin 15:13, 6 September 2008 (UTC)
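To make the city-prefix point concrete, here is a small Python sketch that separates a prefixed city from the publisher name so the city can be moved to the publisher notes. The `KNOWN_CITIES` set and the function name `split_city_prefix` are hypothetical and exist only for this example; a real cleanup would need a researched list of cities and manual review.

```python
from typing import Optional, Tuple

# Hypothetical set of city prefixes seen in catalog-style data (assumption).
KNOWN_CITIES = {"London", "New York"}


def split_city_prefix(name: str) -> Tuple[str, Optional[str]]:
    """Return (publisher, city) for names like 'London: Methuen'.

    Names without a recognised city prefix are returned unchanged with
    city set to None, so colons that are part of a real name are left alone.
    """
    city, sep, rest = name.partition(":")
    if sep and city.strip() in KNOWN_CITIES and rest.strip():
        return rest.strip(), city.strip()
    return name.strip(), None
```

Returning the city separately, rather than discarding it, keeps the information available for the notes field or a wiki page.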
I suspect in the case of titles that Methuen selects, the length of the name used does not matter as you will only see one or two printings. When a title has 50 reprints by multiple publishers I've had trouble visually filtering through the results at times. It's probably not a big deal because where there are 50 pubs I often copy/paste the list into a spreadsheet and parse it into columns as usually I'm doing this to document all of the Bantam printings for example. Marc Kupper (talk) 07:14, 6 September 2008 (UTC)
If you have a problem "visually filtering through the results at times", then that is a limitation of the database display software. Perhaps the 'concise' view could be modded to display only the first 2 or so words of the publisher name. But again, that's a software limitation, and you are arguing that the database accuracy should be reduced because (paraphrased) 'it's hard to read sometimes'. That is equivalent to arguing that 640k is enough RAM for anyone, 2GB is the biggest harddrive we ever need to worry about; a decision made by someone else, in order to make it easier/simpler for management. Kevin 15:13, 6 September 2008 (UTC)
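The 'concise view' idea above can be sketched as a display-only helper: the full publisher name stays in the database and only the rendered text is shortened. This is a minimal Python sketch under that assumption; the function name `concise_publisher` and the two-word default are illustrative and are not part of the ISFDB display code.

```python
def concise_publisher(name: str, max_words: int = 2) -> str:
    """Shorten a publisher name for display; the stored name is not changed."""
    words = name.split()
    if len(words) <= max_words:
        return " ".join(words)
    kept = words[:max_words]
    # Drop a dangling connector so 'Methuen & Co.' does not render as 'Methuen &'.
    while kept and kept[-1].lower() in {"&", "and"}:
        kept.pop()
    return " ".join(kept).rstrip(",") + "..."
```

Keeping the truncation in the display layer addresses the readability complaint without reducing what is recorded.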
Personally I really like the idea of a (Future) upgrade to the web system allowing, 'Display as table' for easy web copying of data, and 'download as spreadsheet' for easy web extraction of large datasets. Kevin 15:13, 6 September 2008 (UTC)
I've actually verified a lot of Methuen pubs, and imprints of such like "Magnum" too. If we can agree on a target publisher and imprints and just go add all the data we can, we might get some idea of who wants what data. And yes, feel free to ask me why "Methuen Drama" should be separated, and other awkward questions - I'd rather we all talked a bit and discussed some general aims rather than annoy editors with Mod overrides and suchlike. BLongley 23:32, 5 September 2008 (UTC)
I'm willing enough to talk -- until this discussion started up no one seemed to be talking about this issue. -DES Talk 07:16, 6 September 2008 (UTC)
I'm glad people are talking - the publisher improvements in the database seemed to go mostly unremarked, increased activity in the Wiki went unnoticed, I'm not sure what sparked all this. Of course, it may lead nowhere but we should extract some agreements from this and make them desired goals (I hate to make them "Rules", but "preferred entry styles" in the help would be a start - and removing City Name prefixes would be a very good start, IMO.) BLongley 21:37, 6 September 2008 (UTC)
Still, a test case would be good: Methuen may not be the best example as Kevin's examples don't match with anything I own, and I really want the most argumentative people here to work with similar books and present their different conclusions as to what was worth recording and what wasn't, and how it should be recorded, and linked, etc. We've got to the stage where I can FIND a publisher here and extract the books I own by that publisher, so I could work with Bantam and Corgi (there's a relationship there we're not clear about) or Ballantine (did have a UK version for a while) or we could go for the whole Robert Maxwell 1990s nightmare (not recommended). Something multi-national, several imprints, not too recent or they'll all be "Scholastic" or "Random House" or "Hachette Livre" in the end - suggestions welcome. BLongley 21:37, 6 September 2008 (UTC)

Publisher Selector

I'd love to see the Publisher field for publisher records have some sort of select thing so that people could choose/enter exactly what's stated on the title page. Looking at the stack that's right at hand I found these

  • Ace Books, Inc.
    1120 Avenue of the Americas
    New York, N.Y. 10036
  • Ace Books
    A Division of Charter Communications Inc.
    1120 Avenue of the Americas
    New York, N.Y. 10036
  • Ace Books, a division of
    Charter Communications Inc.
    A Grosset & Dunlap Company
    51 Madison Ave, New York. N.Y. 10010.
  • Ballantine Books • New York
  • (Corgi logo)
    Corgi Books
  • Collier Books
    Macmillan Publishing Company
    New York
    Maxwell Macmillan International
    New York   Oxford   Singapore   Sydney
  • DAW Books, Inc.
    Donald A. Wollheim, Publisher
    1301 Avenue of the Americas
    New York, N.Y. 10019
  • (Berkley logo)
    A Berkley Medallion Book
    Published by
    Berkley Publishing Corporation
  • Paperback Library
    New York
  • Paperback Library, Inc.
    New York
  • Sphere Books Limited
    30/32 Gray's Inn Road, WC1X 8JL

Marc Kupper (talk) 04:35, 5 September 2008 (UTC)

Merging USA and Canada, and other countries

Bill wrote in ramble #1 ""I've merged the UK, US and Canadian versions of this as the price should be clear enough" (something I've been very tempted to do with recent titles)." I'm not sure who Bill is quoting and so can't credit the first part.

The original comment wasn't a specific quote from anyone - I try and avoid naming people when I'm moaning, but obviously some complaints are going to be fairly obvious as to who I think is responsible (ISFDB Software Bug? Must be Al, although he might be able to blame Roglo a bit now. Design Fault from ISFDB1? Al or Ahasuerus are probably the only ones around to listen. Faulty Old Help? Probably Mike Christie. Who messed up the publishers with "(first printing)" suffixes and such? ... ;-) I don't moan to assign blame, I moan when it's messing with my ability to work here and whoever is active here NOW can help fix things. BLongley 21:28, 5 September 2008 (UTC)
If you ever meet me in person you'll find I disregard moaning, howls, screams, noxious smells, etc. coming from the cellar. :-) Marc Kupper (talk) 00:49, 6 September 2008 (UTC)
I like seeing the (country) suffix as it raises awareness that people need to pay attention to where a book is printed and it makes it easier to search for these and to spot them in the list of publications for a title. For years when DAW printed in both the USA and Canada the books were identical other than the price was different and at the bottom of the copyright page it would say "Printed in Canada" or "Printed in the U.S.A." The Canadian printings often had something about New American Library on the title page either immediately over or underneath the DAW Books stuff. Marc Kupper (talk) 04:51, 5 September 2008 (UTC)
I appreciate that country of origin may be of interest, and useful for separating different real publications, but it's currently undefined and pretty much noise. Even if I assume that all publishers without a suffix are to be treated as "(US)", can I assume that that's where they were physically printed? I doubt it - and the more recent the publication, the bigger the doubt. My comment "(something I've been very tempted to do with recent titles)" is because I've seen Dissembler-added data from Amazon US for publications that are/will be sourced from the UK, with Amazon US prices just being currency-conversions rather than true printed prices. Those I'm always tempted to merge (if the correct original publication data is available too) and put the "canonical" price ("first listed", IMO) on them with the rest in notes. If you want to go down to Country of Printing, rather than Country of Originating Publisher, then we'll have an even bigger mess on our hands: I've seen the same book, same publisher, same cover, same printing number, same prices, printed by two different Scottish printers and one German one. I don't want to go down to that level of detail. When it truly distinguishes something useful ("has this edition had "color" corrected to "colour" for the UK market?") it might be worth recording. But such problems are mostly for current publications, with all the globalisation issues. (I'm avoiding such.) I'm reworking a lot of my original verifications now (adding cover images mostly) but I'm adding pricing data too. This is because I do NOT want eight editions of the same book entered just because it has a British price first and seven other prices listed - if I don't list them then an Australian editor might think he's got a different edition, a South African might enter another, a New Zealander another, etc. 
I guess I'm calling for better pricing support, but that's a minor point if the countries represented here by editors aren't as confused as I am - after all, I'm in the originating country and have no idea how confusing this all is elsewhere.... Oh well, I'm off-topic again. BLongley 21:28, 5 September 2008 (UTC)
On this, I tend to agree with Bill. I don't much want to see country suffixes on publications unless there is no other way to distinguish actually and usefully different pubs. On publishers i want to see them only when needed for disambiguation. "Harper & Row" was a US publisher, there was no English or other publisher by that name AFAIK. Thus there is no need for "Harper & Row (US)". "HarperCollins" does operate in multiple countries, and as I understand it, as basically separate entities in at least some of them. Thus "HarperCollins (UK)" and "HarperCollins (Australia)" are probably needed. I also agree that a single book, published with multiple prices for multiple countries, does not need multiple publication entries here. -DES Talk 23:03, 5 September 2008 (UTC)
Bill, if you see someone doing something that you don't understand, rather than dismissing it with "currently undefined and pretty much noise" why not ask the people involved with that practice? Marc Kupper (talk) 00:49, 6 September 2008 (UTC)
I will, and do, if I see something being done I don't understand, but I'm talking about something ALREADY done and it boils down to "Because I can't FIND the people involved with that practice". Even if you've been diligently following some rule about DAW, it hasn't been documented and I can't tell whether any particular publication was entered by you. (OK, you DO have a very distinctive style, but cloning means your style of notes are adopted by other editors.) The help hasn't covered this before (and still doesn't) so there's no real assurance that any particular publication is entered with "correct" publisher info, whenever/if ever we define "correct". If there's good cause to suspect some data is better in certain cases then that probably needs to be taken into consideration. For instance, "Verified by Marc Kupper" for a DAW book probably means good data, and pretty complete. Unverified, and with an exact day of publication, and an image URL with ZZZZZZZZ in, probably means Dissembler data from Amazon. I stand by my assertion that it's undefined. I'll also stand by the remark "pretty much noise" as we've had to do so much work cleaning it all up just from typo and copy and paste errors - Amazon import errors are a big pain too, and will be an ongoing one in the short-term until we get Dissembler improved. Library data too. At the moment I trust publisher data to almost always contain a nugget of truth, but it takes my own knowledge to find which bit that is, and find the "important" info elsewhere - and importance is relative. DAW is actually a simple case compared to the publishers with non-SF titles too, that have changed hands far more times. BLongley 20:56, 6 September 2008 (UTC)
With DAW a 1st U.S.A. printing will be $3.99 and a 1st Canadian printing will be $4.50. We already have had confusion several times when someone missed the "Printed in Canada" notice and we had a collision of two people with a 1st but at different prices. Thus I've been using "DAW Books (Canada)" as the Canadian editions have their own printing history and set of prices. The discussion did make me realize that I should set up a wiki page that explains why I use the publisher name DAW Books (Canada). Marc Kupper (talk) 00:49, 6 September 2008 (UTC)
Good work - it still won't fix past entries though, unless we reverify everything already done to make sure it's verified to the new standard. (And do we want a standard for each publisher, or just ONE we can put in help?) I believe Al was/is looking into edit histories though, in which case we might spot what was changed after a verification and make our own decision on data reliability. 20:56, 6 September 2008 (UTC)
BTW - one reason I really like using the publisher name to designate something like DAW Books (Canada) is that it's easy to get a list of when that name was used. Assuming that an advanced publisher search for DAW Books with a price of C did not crash it would still be much slower and have a much harder to view result set.
Bill - when I have a publication with multiple prices I enter the one that seems to be first in the Price field and document the price block in the notes. Hopefully someone in Australia, Malta, etc. will view the publication record, at least in anticipation of cloning it, and see that I've already documented it. Marc Kupper (talk) 07:03, 6 September 2008 (UTC)
That's pretty much what I'm doing now, in the hope it helps, after seeing some Australian works entered that seemed to be the same as my own. But there's still the thorny issues of some books not just having different prices for different markets, but also listing different publishers and different ISBNs. I'm not sure where to go with those (two entries here, or one with LOTS of notes? I suspect that as ISBN is a common search we'll need separate entries for exactly the same book, which looks bad till we cross-reference better.) BLongley 20:56, 6 September 2008 (UTC)
Still, having stirred things up I think I'll just take a back-seat for a bit and see what ensues. If anyone wants me to add data points for an example publisher feel free to ask if I have some - just don't make it Pocket please. I change my mind about them with every new example I acquire. :-/ BLongley 20:56, 6 September 2008 (UTC)

Publisher names- Merging(methods and thoughts)

Here's how I treated Gulliver Books when I merged the three different names (Gulliver, Gulliver Books & Gulliver Books Paperbacks)[1]. All of the info comes from the copyright pages, covers, Amazon and sellers online. My method is to use the cover and copyright page and then look at Amazon and then check on Abebooks or some of the other online book sites; if I can build a history of the publisher or imprint I'll add the data to notes with links to Wikipedia or their current web site. So far I've done over 235 merges (plus 765 updates) and over 90% of the time the parent name I've used is the one that has been verified by most of the current mods. The variant names by and large have had no verifiers over 95% of the time. For the 5% that are verified I either leave the name as is for later or I will ask the verifier on their talk page if they're still active or responding. From my point of view most of these variant names are from Amazon, which is inconsistent; I've found several cases where they've used the distributor's name for 5 different imprints out of the UK. Amazon UK lists the proper imprint/publisher while the US site lists the distributor. The other problem is a lot of the early data in the database is from book sellers on the web, some cut and paste from Amazon and each other, the rest are a mixed bag, some are professional and have good data right from the book and at the other end of the spectrum some dealers have minimal or crap data. For names that I can't resolve I generally leave them as I find them and only add any info I've found into notes. Kraang 02:05, 6 September 2008 (UTC)

By "a lot of the early data in the database" do you mean data that has been in the database longer, or data that is from earlier publication dates (1700's, 1800's)? If it's older publications that you've been having problems with... I HIGHLY recommend Google Books and The Internet Archive. Kevin 04:10, 6 September 2008 (UTC)
No, I meant data that had been in the ISFDB longer, that was entered when entry standards were less well defined than they are now. Almost none of this came from the IA, although a good deal of it did come from actual books. But a good deal more came from secondary sources such as Locus, Clute, Tuck, and Amazon, and a fair amount from OCLC and LOC records. Several of those sources are known to abbreviate publisher names, and in some cases assign misleading city designators. Data from such sources will be reliable as to whether it is Baen or Tor, but not as to the precise form of the publisher's name used in the actual book. Data entered from a physical book could, of course, be accurate on the precise form of publisher name used, but I strongly suspect that it often isn't. Note that the help now says "The publisher has in the past not been a key entity in the ISFDB, but publisher and imprint support is in the process of being improved, and a process of determining canonical names for publishers and imprints is in progress. For the time being you are free to choose an imprint ("Ace Books"), a division ("Berkley") or the parent corporation ("Penguin Group USA") as you wish". Until recently it said "is not a key entity", thus informing editors that precision here was not vital. -DES Talk 07:27, 6 September 2008 (UTC)
The Internet Archive derives its book data from actual page scans and always lets you see the actual pages of the work. It also allows searching by publisher name (which has so far been accurate to the printed TP). The one downside of TIA is that it is primarily limited to works published prior to 1923, for North American library sources, but there is a library in India that has been uploading scans of stuff at least through the 1950's. Even if the work you are researching isn't in the archive, you can often get an idea of what the publisher was calling itself in that decade or even the same year. TIA is scanning something on the order of 1000 books a day. Kevin 04:10, 6 September 2008 (UTC)
Of course the next thing I looked up abbreviated Methuen as "London: Methuen" and I clicked the link and found 1859 publications under that publisher name. So it's not as accurate as I thought. Kevin 04:55, 6 September 2008 (UTC)
Google Books also derives its 'older' data from actual page scans. For items published in the last 15 years it appears to use data provided by the publisher. Older items have publisher data entered from page scans. For items published prior to 1923, they 'usually' have the page scans available for viewing. They also have a lot of works entered from catalog data. For stuff they have scanned, but that is after 1923, you can search inside the book and get the text for a line above and below. (As an example, they have Bleiler's 1948 Checklist of Fantastic Fiction scanned, but you can't get page images because it's still in copyright.) Kevin 04:10, 6 September 2008 (UTC)
I don't think many ISFDB records to date have been based on Google Books scans. No doubt more will be in future. -DES Talk 07:27, 6 September 2008 (UTC)
I am now (but wasn't at first so there are some I need to still sweep up) tagging books I've entered from scanned pages as a primary source as Google Books and Internet Archive

Publishing groups and imprints

I'm doing an experiment when verifying my next few publications of using them to identify and document the actual publishing group names, publisher names, imprints, street addresses, etc. An area where I'm undecided is where to document the raw data that's stated in the publication. For now I'm using the imprint's page though I'm thinking of shifting it into the publication notes. Obviously, anyone's welcome to add data to the wiki pages/categories but I'd prefer that any additions to the new categories be based strictly on what's stated in publications and that everything gets sourced. The idea is to keep the signal level as high, and verifiable, as possible.

  • Category:Publishing Groups would just be companies identified as publishing groups. I'm undecided if things such as Jim Baen's "Publishing group" that only owns one publisher that only uses one imprint qualifies.
  • Category:Publishers This is the existing category and has a fair amount of noise. I'm not too concerned about that though am thinking of a new category that would contain bona-fide publishers that would be "high signal" as based on physically verifiable sources.
  • Category:Imprints would be imprints. Deciding if something is an imprint is tricky at times and so ideally we document and source what's stated in a publication that seems to indicate a name is an imprint. I just looked at a DAW book and there's a rather vague "DAW Books is distributed by Penguin Group (USA)". Marc Kupper (talk) 07:47, 6 September 2008 (UTC)
Well, the next book I picked up contains this beast of a copyright page. At the moment I have no idea how I want to file this in terms of publishing group pages though am leaning towards a single page titled Publisher:Penguin Books Ltd. to document the state of the empire. I'm off to a library book sale. :-) Marc Kupper (talk) 15:59, 6 September 2008 (UTC)
Who says you can't have Publishing Groups owning Publishing Groups? (fake links added for effect)
  • Imprint: Berkley Sensation; Wiki page indicates is an imprint of Publisher:Berkley Publishing Group; ISFDB entered as Berkley Sensation (If we are going to admit that we mix Publishers and Imprints in the Publisher datafield)
  • Publishing Group: Berkley Publishing Group; Wiki page will indicate it is owned by Publisher:Penguin Group (USA)
  • Publishing Group: Penguin Group (USA); Wikipage will indicate it is owned by Publisher:Penguin Books Ltd (Worldwide)
  • Publishing Group: Penguin Books Ltd (Worldwide); Wiki page will indicate that this is a parent Publisher of numerous Publishing Groups (insert list), and that the top parent company is at 80 Strand, London WC2R 0RL, England.
Have fun at the Library Sale!! Kevin 16:29, 6 September 2008 (UTC)
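The ownership chain in the list above can be modelled as a simple child-to-parent lookup. The Python sketch below uses only the four names from that list; the `PARENT` mapping and the function name `ownership_chain` are assumptions for illustration and do not describe how the ISFDB database or wiki actually stores these relationships.

```python
from typing import Dict, List, Optional

# Parent links taken from the example list above; None marks the top of the chain.
PARENT: Dict[str, Optional[str]] = {
    "Berkley Sensation": "Berkley Publishing Group",
    "Berkley Publishing Group": "Penguin Group (USA)",
    "Penguin Group (USA)": "Penguin Books Ltd (Worldwide)",
    "Penguin Books Ltd (Worldwide)": None,
}


def ownership_chain(name: str, parent: Dict[str, Optional[str]] = PARENT) -> List[str]:
    """Follow parent links upward from an imprint or publisher to the top owner."""
    chain: List[str] = []
    seen = set()
    current: Optional[str] = name
    # The 'seen' set guards against accidental cycles in hand-entered data.
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parent.get(current)
    return chain
```

A single parent per entry is the simplest model; it would not capture an imprint that belonged to different publishers at different times, which would need dated links.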
The library sale netted the usual full bag of books though this time I got a lot of hardcover anthologies and fewer paperbacks.
The hierarchy makes sense. I see you manufactured a new name Penguin Books Ltd (Worldwide) that does not seem to be used by Penguin. I'm undecided on if that's a good thing as while it's plainly descriptive I had been thinking of sticking to names "as stated" by the publishers. I'm going to try to separate the mix of publishers and imprints. Obviously if a name is both that we'd add it to both categories. I'm thinking now that imprints should not be indiscriminately mixed in the publisher's category. Obviously, some names such as Penguin Books will be an imprint, publisher, publishing group, and the owner of publishing groups meaning it'll end up in all the categories we could devise. Marc Kupper (talk) 18:59, 6 September 2008 (UTC)
The only thing I really intended to 'manufacture' was the (Worldwide) addition. It seemed clear to me that there were names and addresses of various sister corporate entities.. and below them all was "Penguin Books Ltd." with no parenthetical country notation, and the statement 'Registered offices'. This seemed clear to me that this line (noting that it was separated) applied to everything above, and did not apply to a particular country.Kevin 21:30, 6 September 2008 (UTC)
Penguin Group (USA) just goes under "Penguin Group", see http://www.penguin.com/ for the top level - there appear to be nine geographical groups under that. I'm not sure "Penguin Books Ltd" exists anymore - "Penguin Group UK" seems to be the current name. And even Penguin Group (no suffix) is not really the top parent company as they're owned by Pearson - some of the Nonfiction we have goes under their Education section. Why do I keep wanting to draw diagrams? :-/ BLongley 19:32, 8 September 2008 (UTC)
Still, the hierarchy makes it look simpler than some of my books where most of the copyright page is Penguin companies - the average seems to be five in my books, but there's been well over a dozen in some cases. BLongley 19:32, 8 September 2008 (UTC)
I'm still experimenting and am undecided how to handle conflicts such as a Penguin book that only lists a couple of the companies. Does that mean that at the time the other companies did not exist or for whatever reason they did not get mentioned? I'm also mulling over how to handle references to the source documents. For now I'm copy/pasting it into every page but just realized one way to centralize this is to use the Publication namespace. Marc Kupper (talk) 05:46, 13 September 2008 (UTC)


and oddities. For example, I just verified an "Ace" book that stated "Ace Science Fiction Books." Either no one ever used that as an ISFDB publisher name before or someone merged it back into Ace. At the time, June 1983, it was an imprint of Charter Communications, Inc..