Talk:Publisher Catalogs and Print Series

From ISFDB

Publisher Namespace

I noticed that the first two "example" publisher pages are using a "flat" namespace model.

A suggestion I'd like to make is to put all of these under a "Publisher" namespace so that the pages are

I actually would like to see namespaces used but believe that requires Al's input. For now we can pretend we have namespaces by prefixing the publisher names with "Publisher:". The advantage of namespaces is that from Special:Allpages you could select "Publishers", "Authors", etc. and it would pull up a list of those pages. I'm not sure what happens if we start pretending to have a namespace and later add a real namespace. Marc Kupper (talk) 15:06, 5 Feb 2008 (CST)

Publisher Names

Should there be a standard on publisher names? For example, should it be:

  • Doubleday
  • Doubleday and Co.
  • Doubleday & Co.
  • Doubleday & Company
  • Doubleday & Company, Inc.
  • Doubleday & Company, Inc., Garden City, New York

My personal vote is for "Doubleday", but then at the top of the page we'd say that the publications state on the title page

Doubleday & Company, Inc.
Garden City, New York

If there were minor changes to the formal name over the years then those could be documented too and they would all be under "Doubleday." Marc Kupper (talk) 15:06, 5 Feb 2008 (CST)

I actually laughed out loud at this suggestion. (Sorry Marc - if it's any consolation it HURT - I'm only up this late as I was at the dentist today and the medication is finally wearing off and it actually hurts to even smile.) I frankly think we CANNOT make it this simple without compromising any sort of bibliographical respect on publisher and imprint names we have. Only today I commented here about how some publishers are now just imprints - "Doubleday" now should probably be listed as "Doubleday, an imprint of Transworld Publishers" - maybe "part of the Random House Group", or "Owned by Bertelsmann". BLongley 18:38, 5 Feb 2008 (CST)
I WOULD, however, like us to record Imprints and Publishers, on the understanding that Imprints change hands, and so do Publishers, and there are now Giant Media companies behind both, and they have Groups as well... I have some hope that we MIGHT be able to sort SOME things out, but frankly we're going to at least need date ranges for when a publisher was a publisher, or an imprint: where and when an imprint was owned and/or used: and I'm sure some British conventions will screw up anything we try anyway even for the past times when things were so simple. Well, it hasn't been simple since the 1970's at least - Bantam and Corgi had a special relationship of some sort even back then. BLongley 18:38, 5 Feb 2008 (CST)
I wonder if Marc realizes that creating "canonical names" and/or a "publishers directory" for publishers is on Al's list of things to do? I am not sure how he is going to attack this can of worms (see Bill's comments above), but it would be probably best to ask him about his plans before starting anything :) Ahasuerus 22:56, 5 Feb 2008 (CST)
While I'm aware of the conglomerates and the various complicated relationships between publishers, the average publication has a fairly easy-to-read publisher name. You are going to look at the title or copyright page and see that it's a Bantam, Signet, DAW, etc. "Del Rey / Ballantine" can be a puzzle and a few of the imprints have taken on what look like independent lives. FWIW - many of the "Publisher" fields in ISFDB's publication records contain just a single name too.
What started this thread was that I e-mailed a book seller about a puzzling statement he had made about a Doubleday book club edition. He wrote back explaining how the printing date is encoded. I wanted to document this on ISFDB, found the Publishers page, and realized that before diving in and adding a page about Doubleday it would be better to establish a better foundation for the publisher-related things. Marc Kupper (talk) 23:09, 5 Feb 2008 (CST)
We can certainly do something with the Wiki (I've already started keeping notes on some British publishers as I find out more about them) and I don't object to a Doubleday page being started. It's how to link all the Imprints with Publishers - e.g. I could index "Corgi 1968-2005" and "Corgi 2005-" and link to a standalone Corgi page and a Transworld page respectively: or we could start with all the top-level companies and list the imprints they own and cross-reference to previous owners. For now I'm just gathering data and waiting to see what Al has in mind - I suspect he'll design in some links from ISFDB to the Wiki, so matching the two at the design stage could help. If people want to start dumping the data somewhere for now, that's fine: we can at least comment on what we should be gathering, e.g. I'm interested in ownership and imprints and ISBN ranges used, but these are all qualified by time. BLongley 13:15, 6 Feb 2008 (CST)
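Indexing "Corgi 1968-2005" and "Corgi 2005-" separately amounts to a lookup keyed on imprint plus year. A minimal Perl sketch, assuming a hypothetical in-memory table; the two Corgi rows only mirror the example above and are not a researched ownership history:

```perl
use strict;
use warnings;

# Hypothetical date-ranged index: which wiki page covers an imprint for a
# span of years. An undefined 'to' means "still current".
my @imprint_pages = (
    { imprint => 'Corgi', from => 1968, to => 2005,  page => 'Corgi' },
    { imprint => 'Corgi', from => 2005, to => undef, page => 'Transworld' },
);

# Return the page for an imprint in a given year, or undef if no row matches.
# Rows are checked from last to first, so when two ranges share a boundary
# year the later row wins.
sub page_for {
    my ($imprint, $year) = @_;
    for my $row (reverse @imprint_pages) {
        next unless $row->{imprint} eq $imprint;
        next if $year < $row->{from};
        next if defined $row->{to} && $year > $row->{to};
        return $row->{page};
    }
    return undef;
}

for my $year (1970, 2005, 2010) {
    printf "Corgi %d -> %s\n", $year, page_for('Corgi', $year) // 'unknown';
}
```

The same shape would carry ownership, ISBN ranges or parent publisher per date range, which is the "qualified by time" point above.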

Wiki Conversion Notes

This article is a Wiki conversion of printseries.html.

Most links point to non-existent wiki pages, but you can get the content that should be there from the above link.

As I started converting some of these pages, I encountered two big issues...

  1. Should these pages really be in the wiki, or should they be powered by the DB? (At the moment, search by publisher doesn't seem to work.)
  2. Some of these publisher lists are too big for the wiki. When I tried converting "Orbit", the wiki warned me that many browsers would have difficulty editing it and that I should break it up; when I tried converting DAW, after my browser had submitted all the data, the wiki churned for about 10 (more) minutes before my browser finally timed out.


That said...

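Assuming you use the html2wiki converter (http://diberri.dyndns.org/html2wiki.html, as noted in the script's header comment) to convert pages to wiki syntax, the following Perl script is handy for cleaning up the publisher listing pages. Gnome Press and Fantasy Press are good examples.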


 #!/usr/bin/perl
 #
 # use http://diberri.dyndns.org/html2wiki.html to convert pub pages,
 # then use this to clean them up
 # :TODO: should have one script that uses HTML::WikiConverter and does it all
 #
 use warnings;
 use strict;
 # slurp the whole input (file named on the command line, or STDIN)
 my $w = do { local $/; <> };
 
 # convert bold years to sub-headings
 $w =~ s/ '''(\d+)'''/\n\n== $1 ==\n/g;
 # get rid of all the horiz rules
 $w =~ s/^\s*?----//mg;
 # any pub link is a bullet
 $w =~ s{(\[http://www\.isfdb\.org/cgi-bin/pl\.cgi)}{\n* $1}g;
 # some pubs don't have links, just a dash
 $w =~ s{ ?- (\S)}{\n* $1}g;
 # trim excess newlines
 $w =~ s/\n{2,}/\n\n/g;
 # kill any remaining single newline (followed by optional whitespace);
 # lookarounds let a run of wrapped lines ("a\nb\nc") be joined in one pass,
 # and a space is kept so words on either side don't run together
 $w =~ s{(?<=[^\n])\n ?(?=[^\n])}{ }g;
 # get rid of all the excess whitespace
 $w =~ s/ +/ /g;
 # any line that still starts with whitespace is bad
 $w =~ s/^ +//mg;
 
 # spit it out
 print $w;

Publisher Naming Standards

OK, I admit I laughed when Marc suggested it, but now that we've had some time to play around with the new Publisher functionality, we're beginning to reduce the number of typos and consolidate some variations. I've mostly been sticking to obvious typos or publishers/imprints I'm familiar with, but there are some obvious rules we could introduce that would reduce the number of variant Publishers a lot further. Here are some questions and some thoughts.

  1. Regularization:
    1. Many publishers have people's names in them. If we apply the same rules to the publishing company as we do to people's names, we can merge, for instance, "E.P. Dutton" and "E P Dutton" with "E. P. Dutton".
    2. "Co" and "Inc" - should these abbreviations have a period after them or not? If yes, what about when there's both: "Co., Inc." or "Co, Inc."?
    3. Maybe we should just standardise "Company" or "Co.", and "Corp." or "Corporation", and "Limited" or "Ltd."? (Plus all the foreign variations indicating company status.)
    4. Pick one of "&" and "and" and stick with it?
    5. Expand "Pub." or "Pub" to "Publishing" or "Publishers" or whatever, when we know which it is? Or vice versa?
    6. Expand "Pubns" to "Publications"? Or vice versa?
    7. US State Abbreviations, when used to disambiguate publishers - both letters capitalized? With a following period or not? Should we expand them, or contract full state names?
    8. Apostrophes - the different types are causing multiple publisher entries, and they're a pain to fix. Can we decide on one and get Al to do a mass-update via SQL?
  2. Imprints:
    1. Keep the Imprint only and document the owner(s) in the Publisher notes? Or use "Publisher/Imprint" or "Imprint/Publisher"? If so, with spaces around the "/" or not? (I must admit I've been trying to keep imprint only for UK Publications - multiple imprints are going to be a pain whatever we choose though, e.g. the Panther/Grafton/Granada/Triad books, mercilessly hammered into one by Amazon.)
    2. Are overseas publications Imprints or separate Publishers? (I've avoided amalgamating Penguin US and Australia and NZ for instance.)
  3. Printing Number:
    1. I think this is a failed experiment and printing numbers should be removed from Publisher and go back into notes for now.
    2. But I really want Printing Number as a database field - even though it will be a nightmare for the UK publishers - so IF it's coming soon I can wait a bit so we can move Printing Number directly to that field rather than to Notes and back.
  4. City:
    1. I'm not keen on these being included in the Publisher name although I appreciate that's what Library Catalogues often give you. I've regularized/corrected a few "London:" prefixes already as it's obviously been used for "This is a British Publisher" - we should be able to derive that information from the price field anyway (does anyone else put in a currency symbol alone when price is unknown, to help with that?). However, "London:" was/is being used for Publishers nowhere near London, and sometimes not even in England. (Wait till some Scots get here and complain about "London: Collins" for a Glasgow publisher!) Maybe the "Official" library categorisations can be included in the Publisher notes?
    2. Leave them be for foreign - OK, Non-English-speaking - publishers for now.
  5. Publisher and Publication records themselves: What would you like to see?
    1. I'm still mostly in favour of Imprint AND Publisher being available on a Publication, if we can decide which is which - back-populate the Imprint from the Publisher for now, and suggest mass-updates for individual groups.
    2. For Publisher, I'd like to have Parent Publisher for given date ranges. Or if we only have one field on a publication, then that should be the Imprint and we can track ownership of the Imprint via Imprints recorded as Publishers. E.g. Questar was an Imprint of Popular Library owned by someone else, then skipped the Popular Library step to become Questar owned by Warner. And I think Questar was divided into Science Fiction and Fantasy sub-imprints as well?
    3. A definite Start date for a Publisher/Imprint would be good for spotting inaccurate Clones, etc.
    4. A Wikipedia link on the Publisher record could be useful, I find I'm constructing them manually for quite a few of the bigger Publishers.
  6. Trust No-One
    1. We have a lot of contaminated data already if we want to preserve imprints. Amazon are listing CURRENT publisher for a lot of publications that were really published by a prior company or under a different imprint.
    2. Library Data is often only to the Publisher level as well.
  7. ISBN
    1. I've been adding known ISBN prefixes to Publisher notes, and been using them to spot misclassified Pubs. However, it might be good to use them the other way round and start creating an ISBN range to possible Publisher table somewhere?
  8. Field Length
    1. "A Bantam Book; Transworld Publishers, a div. of Random House Aus" is a prime example of people trying to cram too much info into one field, IMO. I'd support a REDUCTION in field size at some point. But feel free to argue for increased size if more actual meaning can be placed into it. (I'm pretty sure Transworld aren't a subsidiary of an Australian publisher though... they're a bit more Global than that surely? (Not that I think we have any publishers publishing on multiple Globes/Planets...)) Or we could extend it for foreign language support - e.g. "Центрполи&" is being stored with each letter taking seven characters (ampersand, hash, 4 digits, semi-colon) and truncated publisher names seem to lead to duplicate entries. BLongley 13:27, 27 Mar 2008 (CDT)
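To make the field-length point concrete: a minimal Perl sketch, assuming names are stored as decimal numeric character references (as described in 8.1) and assuming a hypothetical 60-character column, shows both the seven-fold blow-up and how truncation can leave a broken entity that no longer matches other copies of the same name:

```perl
use strict;
use warnings;
use utf8;    # the string literal below contains Cyrillic letters

# Replace every non-ASCII character with a decimal numeric character
# reference such as "&#1062;" (the storage form described in 8.1).
sub entity_encode {
    my ($s) = @_;
    $s =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;
    return $s;
}

# Cut a string to a fixed width, as a fixed-size database column would.
sub truncate_to {
    my ($s, $width) = @_;
    return length($s) > $width ? substr($s, 0, $width) : $s;
}

my $name    = 'Центрполи';    # the fragment quoted in 8.1
my $encoded = entity_encode($name);
printf "%d letters stored as %d characters\n", length($name), length($encoded);

# 60 is a hypothetical column width chosen for illustration only.
my $stored = truncate_to($encoded, 60);
print "$stored\n";    # ends in a dangling, unterminated "&#10"
```

Every Cyrillic letter has a four-digit code point, so each one costs exactly seven stored characters; extending the field, or storing the characters themselves, would avoid the cut-off entity.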
Lots of interesting points, Bill. I have been thinking along similar lines, but you are well ahead of me. I'll have to sleep on it and will comment in a day or two. Ahasuerus 00:23, 27 Mar 2008 (CDT)
As Ahasuerus says, many interesting points. The structure of the list makes it hard to comment on individual items. Overall, it's fine, except that for 1.7 I would use city, state, as that's nearly always stated in the publication. Marc Kupper (talk) 03:29, 27 Mar 2008 (CDT)
Well, it's more of a brain-dump of my frustrations so far, I don't expect everybody to reply to all of them! Each can be spun off as a separate topic if somebody has a strong opinion. (And I probably will as some of my opinions become firmer.)
As to 1.7 - fine by me, I don't own any publications that this affects. However, people can usefully regularize without actually owning the Pubs, IF the rule is clear. E.g. some publishers are listed with a State of "N.Y." and some with "NY" - I could work on those but not add a definite City. One obvious confusion is "CO" - does it mean Company or Colorado? Does it matter - would people search by State? BLongley 13:27, 27 Mar 2008 (CDT)
Another day, a few more regularizations, a few more publisher websites added... not really satisfying though. Anyone else interested in this topic? I think we need to bring out the Big Guns on some regularizations that are just too much work for one creature: so, more tools or more interest or more activity. BLongley 17:43, 6 Apr 2008 (CDT)
I've been working on publishers as I find them and doing some cleanup. As more tools become available, such as merging names and a publishers directory, things can move a little faster. Nine thousand plus publishers are going to take a bit of time to clean up and regularize. Kraang 20:08, 6 Apr 2008 (CDT)
I agree we need more tools - e.g. I'm sure nobody will object to merging "Fitzhenry & Whiteside Limited", "Fitzhenry & Whiteside Ltd" and "Fitzhenry and Whiteside" - they might disagree on what the final result should be though. This is where we need some indication of what people are working towards, or who's "taking charge" of a particular publisher or imprint. Is it "Ace/Berkley" or "Berkley/Ace" for instance? Would posting the numbers for each variant so we can round them all up for easy regularization help, then Al can just rename the final publisher according to agreed standards? Anyone for a project page? BLongley 15:13, 7 Apr 2008 (CDT)
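Rounding up the variants could start from a loose comparison key. A minimal Perl sketch; the rules (ignore case, punctuation, spacing, "&" versus "and", and company-status words like Ltd/Co/Inc) are assumptions chosen for illustration, and non-ASCII letters are simply discarded, so it is only meant for English-language names. The key finds candidates to merge; deciding the final name is still up to people:

```perl
use strict;
use warnings;

# Build a loose key so spelling variants of one publisher collapse together.
# The key is for finding merge candidates only, never for display.
sub variant_key {
    my ($name) = @_;
    my $k = lc $name;
    $k =~ s/&/ and /g;         # treat "&" and "and" as the same word
    $k =~ s/[^a-z0-9\s]//g;    # drop punctuation (and any non-ASCII bytes)
    # ignore company-status words entirely
    $k =~ s/\b(?:limited|ltd|company|co|incorporated|inc|corporation|corp)\b//g;
    $k =~ s/\s+//g;            # ignore spacing, so "E P" matches "E.P."
    return $k;
}

# Variant spellings quoted on this page.
my @names = (
    'Fitzhenry & Whiteside Limited',
    'Fitzhenry & Whiteside Ltd',
    'Fitzhenry and Whiteside',
    'E.P. Dutton',
    'E P Dutton',
    'E. P. Dutton',
);

my %groups;
push @{ $groups{ variant_key($_) } }, $_ for @names;

for my $key (sort keys %groups) {
    my @variants = @{ $groups{$key} };
    next unless @variants > 1;    # only report keys with several spellings
    print scalar(@variants), " variants: ", join(' | ', @variants), "\n";
}
```

Fed with the real publisher list plus a count of publications per name, the same grouping would give the "numbers for each variant" to post.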
A note about the city: some publishers, at least in Poland, had subdivisions in multiple cities. Such subdivisions were using their parent company's name, so that 'Warszawa: KAW' and 'Wrocław: KAW' would be two subdivisions (perhaps active at the same time) rather than a single publisher moving between cities. And if you see only the books you have, it is hard to say if there were multiple divisions (i.e. the city is meaningful) or not (i.e. you could add publisher's address to Wiki and drop the city from database). Possibly you could see similar cases in other countries. --Roglo 03:07, 7 Apr 2008 (CDT)
Oh, I'm leaving "Non-English-Language" (and mostly "Non-British") publishers alone as much as possible! (Perhaps I ought to run a check on what proportion of our 9,000 are due to this, if I knew a way.) Current English Language publications are causing similar difficulties on a world-wide scale - e.g. Orbit US really IS a different company/division/organisation from Orbit UK now - but until recently an Orbit book probably was from Orbit UK, and if it had a dollar price it was either converted from pounds or was multiply-priced on the publication. The same book might be from "BBC Books" on Amazon.co.UK but "Random House UK" on Amazon.com. I've even had a trial at UN-merging imprints from the mass of "Scholastic" titles ("Point SF" and "Point Fantasy" looked useful sub-divisions (as does "Point Horror", but that's of little interest to me)) but I think our Publisher data is too corrupted in that case and we'll only be able to separate such from Primary copies or Cover-scans - IF we agree what an imprint actually IS. BLongley 14:58, 7 Apr 2008 (CDT)
Now, that is the question. You have one thing printed on spine, another on the title page, and yet another on the verso of t.p. --Roglo 15:30, 7 Apr 2008 (CDT)
If it was only THAT simple! I've dealt with publications with TWO imprints on the spine, a sub-imprint on the cover, a fuller explanation on title page, and the whole hierarchy of the imprint/company/division/owning company on the copyright page... I'm not going to do it all by myself, and I'm only offering to explain British stuff so far. When there's seven possible "publishers" from one book we have to establish MEANING. For instance, "Panther" means "paperback" to me - "Panther Science Fiction" might be on the cover, but we don't have many non-genre Panther books here and those we do have SHOULD be marked as NONGENRE. "Panther Books Limited" might indicate when it was an independent company, but that seems not to be the case as Granada owned them for a few years before what it said on the book changed... and there are books with "Granada" on the Spine and "A Panther Book" in the copyright page... it's a big mess, WE have to figure out what we want to record though. There is meaning in there somewhere - usually qualified by dates - we just have to find it, share it. I've updated some submissions tonight based on Publisher and Year where format wasn't specified, as I have a gut-feel for when Ace published paperbacks and Gollancz only did hard-covers - sharing THAT knowledge is a good start though. BLongley 16:31, 7 Apr 2008 (CDT)
BTW "Non-English-Language" is risky, as some Polish/Russian/German etc. publishers publish books in English, including SF :) --Roglo 15:30, 7 Apr 2008 (CDT)
ANY sign of it being Non-English, with someone obviously interested in it, makes me leave it alone. For instance, I've left a load of Andre Norton titles in the Moderator queue for someone else (despite approving most of the American/Canadian/British versions from the same source) - there's Danish and German and all sorts of things I've no expertise in. BLongley 16:31, 7 Apr 2008 (CDT)

(Unindent) One thing I started on today before I got distracted was Publishers being credited as Authors. "HarperCollins" don't actually WRITE books as far as I know, so that was an easy spot. "Star Trek" isn't a publisher either - most were "Pocket Books". But Publisher/Author mix-ups are an easy Data-Cleanup, and not TOO onerous so far. BLongley 20:43, 13 April 2008 (UTC)

Working through some familiar publishers/imprints yesterday and today has made me a bit more confident that there is some good value in the Publisher AND ISBN searches. In my case, for spotting lazy clonings which take US pubs into British ISBN ranges: hardcovers attributed to paperback imprints: SF imprints mixed with Non-SF (or at least, not original SF) imprints. I'm sure I've added ISBNs to a lot of pubs usefully, OCLC data at times, deleted a few dups. I've not yet interfered damagingly with anything by an active verifier (I hope) but I'm beginning to acquire a good idea about the people I want to challenge over the remaining oddities. So go on, take on a publisher yourself and fix it before I come to you with a comment like "you've verified THESE two remaining pubs as published by X whereas I've checked the other two dozen and it's pretty clear it's Y that should be recorded..." BLongley 23:41, 15 April 2008 (UTC)
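The "ISBN range to possible Publisher" table from 7.1 could be as simple as a prefix hash with a longest-prefix lookup, which is also what spotting a US pub cloned into a British ISBN range needs. A minimal Perl sketch: the four prefixes are illustrative (0-441 Ace, 0-552 Transworld/Corgi, 0-553 Bantam, 0-87997 DAW) and should be checked against the prefixes already recorded in Publisher notes; the sample ISBNs are made up and the lookup does not validate check digits:

```perl
use strict;
use warnings;

# Illustrative prefix table: registration group + publisher code, no hyphens.
my %publisher_for_prefix = (
    '0441'   => 'Ace',
    '0552'   => 'Transworld (Corgi)',
    '0553'   => 'Bantam',
    '087997' => 'DAW',
);

# Longest prefixes first, so a more specific range beats a shorter one.
my @prefixes = sort { length($b) <=> length($a) } keys %publisher_for_prefix;

# Map an ISBN-10 or 978-prefixed ISBN-13 (hyphens optional) to a publisher.
sub publisher_for_isbn {
    my ($isbn) = @_;
    (my $digits = uc $isbn) =~ s/[^0-9X]//g;
    # For 978 ISBN-13s the next nine digits repeat the ISBN-10 body,
    # so dropping "978" lets one table serve both forms.
    $digits = substr($digits, 3) if length($digits) == 13 && $digits =~ /^978/;
    for my $prefix (@prefixes) {
        return $publisher_for_prefix{$prefix} if index($digits, $prefix) == 0;
    }
    return undef;
}

for my $isbn ('0-552-12345-6', '978-0-441-01234-5', '0-575-00000-0') {
    printf "%-18s %s\n", $isbn, publisher_for_isbn($isbn) // 'no match';
}
```

A pub whose recorded publisher disagrees with the lookup is a candidate for a closer look, not an automatic correction, since prefixes move between owners over time.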
FWIW, here are my reactions to Bill's numbered suggestion above.
  • I fully agree with 1.1.
  • for 1.2 I would favor periods all round.
  • I agree with 1.3.
  • As to 1.4, if the company in question has a recognized official style as to the use of "and" vs "&", I would say follow that style. Otherwise, I would default all to "and".
  • As to 1.5 & 1.6, I would favor always expanding when the correct form is known.
  • As to 1.7, the modern two-letter postal state codes should always be all uppercase with no periods; this is standard style. The older abbreviations (like "Penna.", "Okla." or "Mass.") usually have an initial capital and a period. Exceptions include "N.Y." and "N.J.". However, the older forms should probably only be used for older works, and perhaps only if the publisher used them, IMO.

  • As to 1.8 I would favor the straight ASCII single quote ('), but unless we convert on entry, other sorts will keep getting into the db.
  • As to 2.1, I would favor "Imprint / Publisher", with spaces.
  • On 3.1 & 3.2 I agree, but I don't think a printing number field is high enough on the list to justify leaving it in the date -- I now convert these to notes on sight.
  • On 5.1 I tend to agree -- I want both publisher and imprint as publication fields.
  • We now have 5.4.
Hope that is of some help/interest. -DES Talk 20:33, 24 July 2008 (UTC)
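DES's preferences for 1.1, 1.2, 1.6 and 1.8 are mechanical enough to sketch. A minimal Perl example, assuming names arrive as Perl character strings; "&" versus "and" (1.4), "Pub." (1.5) and state codes (1.7) are deliberately left alone, because each needs knowledge of the particular publisher. The function names are hypothetical:

```perl
use strict;
use warnings;

# Rewrite a run of two or more initials ("E.P. ", "E P ") as "E. P. ".
sub format_initials {
    my ($run) = @_;
    return join(' ', map { "$_." } $run =~ /([A-Z])/g) . ' ';
}

sub normalize_publisher {
    my ($name) = @_;
    # 1.8: curly/modifier apostrophes and backticks become a straight quote.
    $name =~ s/[\x{2018}\x{2019}\x{02BC}`]/'/g;
    # 1.1: space out initials, but only runs of two or more single letters
    # followed by a capitalised word, so "A Bantam Book" and "DAW" are untouched.
    $name =~ s/\b((?:[A-Z]\b\.?\s*){2,})(?=[A-Z][a-z])/format_initials($1)/ge;
    # 1.2: periods after Co, Inc, Ltd and Corp (case-sensitive, so the
    # state code "CO" is not touched).
    $name =~ s/\b(Co|Inc|Ltd|Corp)\b(?!\.)/$1./g;
    # 1.6: "Pubns" is unambiguous, so expand it.
    $name =~ s/\bPubns\b\.?/Publications/g;
    return $name;
}

for my $raw ('E.P. Dutton', 'Fitzhenry & Whiteside Ltd', 'A Bantam Book') {
    print normalize_publisher($raw), "\n";
}
```

Such a pass would only ever propose a new name; whether "&" becomes "and" is the company's own style question and stays a human decision.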


Re 2.2 (same publisher name, different countries) - personally, I view them as different publishers rather than different imprints. As you may have noticed, I generally put the country, e.g. Penguin Australia. I have generally been adding Australia even if the Australian publisher doesn't have Australia in its name.

BTW: Is it really OK to put the currency by itself in the Price field if there's no printed price? (Not many Australian published books have printed prices these days.)

Here is some evidence for viewing "same publisher name, different countries" as different publishers, from the Penguin Australia publication of Magic Lessons ...
The copyright page states at the top:

PENGUIN BOOKS
Published by the Penguin Group
Penguin Group Australia

... then further down:

Published by arrangement with Razorbill Books, a member of Penguin Group (USA) Inc.


As to 2.1, if "Imprint, an imprint of Publisher" or "Imprint, imprint of Publisher" aren't favoured, I'd favour "Imprint / Publisher" (not the other way around).

re different versions of an imprint &/or publisher being on the spine, cover, title page, copyright page: I suggest going with

  • what's on the copyright page (title verso) as the basis (including whether there is a space between, say, Harper and Voyager), then
  • adding anything extra (e.g. imprint, if there isn't a version on the copyright page) from the title page next, then
  • from back or front cover, then
  • spine.
    --j_clark 00:09, 25 July 2008 (UTC)
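That order of precedence boils down to "take each field from the highest-ranked source that shows it". A minimal Perl sketch, assuming the two fields are imprint and publisher and that someone has already transcribed what each part of the book says; the sample input loosely follows the Granada-spine / Panther-copyright-page case described further up, and the " / " joiner is just one of the styles proposed here:

```perl
use strict;
use warnings;

# Sources in the suggested order of precedence.
my @precedence = ('copyright page', 'title page', 'cover', 'spine');

# Fill each field from the highest-ranked source that shows it.
sub resolve_names {
    my ($seen) = @_;    # { source => { imprint => ..., publisher => ... } }
    my %result;
    FIELD: for my $field ('imprint', 'publisher') {
        for my $source (@precedence) {
            next unless exists $seen->{$source};    # avoid autovivification
            my $value = $seen->{$source}{$field};
            next unless defined $value;
            $result{$field} = $value;
            next FIELD;
        }
    }
    return \%result;
}

# Illustrative input: "A Panther Book" on the copyright page, "Granada" on the spine.
my $found = resolve_names({
    'copyright page' => { imprint   => 'Panther' },
    'spine'          => { publisher => 'Granada' },
});
print join(' / ', grep { defined } @{$found}{qw(imprint publisher)}), "\n";
```

Swapping the first two entries of the precedence list would give the title-page-first variant instead.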
My preference has always been Imprint/Publisher (no spaces), and when it comes to determining the imprint or publisher's name I generally defer to the title page. Kraang 01:47, 25 July 2008 (UTC)

Tips for those venturing into Publisher clean-up

Amazon is not always so bad

Yes, they may mess up the publisher, but sometimes the Imprint is in the title. For instance: this pub is listed as being published by "Hodder & Stoughton Ltd". It isn't actually - the title page says "CORONET BOOKS" (newline) "Hodder Fawcett Ltd., London". You're not going to get the entire Hodder hierarchy sorted out via Amazon, but they do at least note the Title as "Report from Group 17 (Coronet Books) (Paperback)" - note that they got the imprint correct, and the format, even if they dropped the leading article from the title. The UK site seems fairly good at recording Imprint this way, for 1960s, 1970s and 1980s publications at least. Not a guarantee by any means, but I've also found it useful for finding Publisher Series like "Corgi SF collector's library": and sometimes an "unk" format pub's actual format is clearly identified as a paperback or hardcover in the title. It SEEMS to work for some US publishers too - e.g. search for "Five Star Science Fiction and Fantasy Series" and you seem to narrow down the possible "Five Star" publishers. When such searches work it's probably worth a note on Publisher Pages - I'll be adding a few myself, though that reminds me that Publication Series don't have much support yet. When it's all contained within a certain imprint it's probably fairly supportable on a publisher page, as I did here for instance. But a series of "World Book Day" titles is going to be harder, where several publishers come together in the cause of greater profits, er, improved literacy. BLongley 14:45, 27 Mar 2008 (CDT)
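Peeling the trailing parenthesised parts off such a title is easy to automate. A minimal Perl sketch, assuming the listing title ends in one or more "(...)" groups and that the format words below are the ones worth recognising (that list is a guess, not a documented Amazon convention); anything else is kept as an imprint or series hint for a human to check. A title that genuinely ends in parentheses would be misread, so treat the output as a hint only:

```perl
use strict;
use warnings;

# Format words that may appear as a trailing "(...)" in a listing title.
# This list is an assumption for illustration.
my %is_format = map { $_ => 1 } ('Paperback', 'Hardcover', 'Mass Market Paperback');

# Split a listing title into the bare title, any format, and any other
# trailing parenthesised hints (imprint or publisher series candidates).
sub parse_listing_title {
    my ($title) = @_;
    my (@hints, $format);
    while ($title =~ s/\s*\(([^()]*)\)\s*\z//) {
        my $part = $1;
        if ($is_format{$part}) { $format = $part }
        else                   { unshift @hints, $part }
    }
    return { title => $title, format => $format, hints => \@hints };
}

my $p = parse_listing_title('Report from Group 17 (Coronet Books) (Paperback)');
printf "title: %s\nformat: %s\nhints: %s\n",
    $p->{title}, $p->{format} // 'unk', join('; ', @{ $p->{hints} });
```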

Trade Marks

My most interesting find recently was The UK Intellectual Property Office search engine. I've been having great fun putting in Imprint names and seeing who owns them now. Try "Hodder" for a simple enquiry. Try "Panther" and it's a bit more awkward finding the right one - select "Nice Type" of 16 and that narrows it down to paper products like Books. Still not a unique search, but on Page 2 I can now see where "Harvill Panther" comes from. Something I can play with for ages - are there similar sites for other countries? BLongley 14:45, 27 Mar 2008 (CDT)

Publisher Listings

This UK list and this US list look worth further investigation. Not sure how up-to-date they are, though. Any better sites known? BLongley 14:45, 27 Mar 2008 (CDT)

Nice job Marc - Thanks!

I just wanted to leave a little note that the cleaned up page is very nice. Thanks Marc! Kevin 01:01, 5 September 2008 (UTC)