ISFDB talk:Invalid characters in Publication titles


It looks like nearly all of those are on the list because they use the angled single quote, character code 146 (’), rather than the apostrophe, which is character code 39 ('). The last two use a bullet, character code 149 (•), in the title. I've fixed all of the titles to use the standard single quote and replaced the bullets with hyphens (-). Both the quote and the bullet usually turn up when someone copies and pastes a title from Amazon. Marc Kupper (talk) 01:34, 20 Jan 2008 (CST)
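For illustration, the manual fix described above could be sketched as a small helper. This is only a sketch: `normalize_title` and `REPLACEMENTS` are hypothetical names, not part of the ISFDB code, and it assumes that codes 146 and 149 are Windows-1252 bytes, which decode to U+2019 and U+2022.

```python
# Hypothetical helper, not taken from the ISFDB code base.
# Assumption: codes 146 and 149 are Windows-1252 bytes, which decode
# to U+2019 (right single quotation mark) and U+2022 (bullet).
REPLACEMENTS = {
    "\u2019": "'",  # angled single quote (146) -> apostrophe (39)
    "\u2022": "-",  # bullet (149) -> hyphen
}

# Build the translation table once and reuse it for every title.
_TABLE = str.maketrans(REPLACEMENTS)


def normalize_title(title: str) -> str:
    """Return the title with the two pasted characters replaced."""
    return title.translate(_TABLE)


if __name__ == "__main__":
    print(normalize_title("Reaper\u2019s Gale"))
```

Any other characters that should be regularized would need their own entries in the mapping.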

Excellent, many thanks! :) Ahasuerus 01:49, 20 Jan 2008 (CST)
While editing the Reaper's Gale publication I discovered we have quite a few ’ (=’) characters (in the titles, too: the simple search returned 47 matches). Do we replace them in the title records only or in pubs as well? --Roglo 14:01, 26 April 2008 (UTC)
I think we want to replace them in pubs as well as in titles, but it also raises another issue. Al has some special logic behind the scenes that converts different kinds of apostrophes to the same character, but it was a little shaky in the past. If we fix all currently existing offenders and new ones appear in the next couple of months, then we may still have a software issue. Ahasuerus 22:06, 26 April 2008 (UTC)
I've replaced ’ with ' in most titles but put 2 submissions on hold because they show more changes than the title that was edited and I don't understand why (both sides look the same). Something worth checking later. --Roglo 16:58, 28 April 2008 (UTC)
I believe that the display logic in the approval screen doesn't process apostrophes and some other punctuation characters correctly, so it reports differences even when the "before" and the "after" fields are identical. Al may know more about it since I seem to recall that he struggled with this issue in the past. Ahasuerus 17:24, 28 April 2008 (UTC)
I think something happened during submission, because the submissions have an extra field. But I'm fighting the amazing twin brothers Mike O’Driscoll and Mike O'Driscoll (guess where the difference is). --Roglo 19:08, 29 April 2008 (UTC)
In edit/submittitle.cgi, the function EvalField uses XMLescape to escape the old value before comparing it with the new value; this is why a series or author that contains an apostrophe is resubmitted: the comparison is between ’ and '. It doesn't really break anything, it just looks suspicious (for XMLescape, see isfdblib.py). --Roglo 17:36, 30 April 2008 (UTC) (edited --Roglo 17:53, 30 April 2008 (UTC))
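A rough sketch of the effect described above, not the actual ISFDB code: `xml_escape` is a stand-in for XMLescape (whose real behaviour in isfdblib.py may differ), and the two comparison functions are hypothetical. It assumes the escape step rewrites apostrophes, so escaping only the old value makes an unchanged field look changed.

```python
from xml.sax.saxutils import escape


def xml_escape(value: str) -> str:
    # Stand-in for XMLescape. Assumption: apostrophes and double quotes
    # are turned into entities in addition to &, < and >.
    return escape(value, {"'": "&apos;", '"': "&quot;"})


def changed_one_sided(old: str, new: str) -> bool:
    # Escapes only the old value, as described for EvalField.
    return xml_escape(old) != new


def changed_symmetric(old: str, new: str) -> bool:
    # Escapes both values, so identical input compares as unchanged.
    return xml_escape(old) != xml_escape(new)


if __name__ == "__main__":
    name = "Mike O'Driscoll"
    print(changed_one_sided(name, name))
    print(changed_symmetric(name, name))
```

Under these assumptions, the one-sided comparison flags an untouched author name as edited, while the symmetric one does not.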
And we have 15 series with the right quote used as apostrophe. Anyway, this discussion should be moved somewhere else, as it is no longer about publications and titles (ISFDB:Data_Consistency or Rules_and_standards_discussions?) --Roglo 17:53, 30 April 2008 (UTC)
A search for "’" found 363 matches (limit 100) (the simple title search on our home page). That's the ’ entered directly, not as HTML (the code 146 as above, I believe). So you won't find Acorna's Rebels etc. --Roglo 16:35, 28 April 2008 (UTC)
And a search for "`" found 81 matches, often with it doubled up: e.g. "Introduction to ``Black Country’’". If we're regularizing, should these doubles change to """? BLongley 18:08, 28 April 2008 (UTC)
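As a sketch of how such titles could be listed for review without changing them: `find_suspect_titles` and `SUSPECT` are hypothetical names, and the character set (grave accent plus the four curly quote characters) is an assumption about what counts as an offender.

```python
import re

# Assumed set of characters to review: grave accent plus the
# left/right single and double curly quotes.
SUSPECT = re.compile("[`\u2018\u2019\u201c\u201d]")


def find_suspect_titles(titles):
    """Return the titles that contain at least one suspect character."""
    return [title for title in titles if SUSPECT.search(title)]


if __name__ == "__main__":
    sample = [
        "Introduction to ``Black Country\u2019\u2019",
        "Reaper's Gale",
    ]
    for title in find_suspect_titles(sample):
        print(title)
```

This only reports matches; whether and how to replace them is the policy question being discussed here.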
The "`" is not a quotation mark but I'm not sure how to replace it. Our help states: Symbols and punctuation. Strange symbols should be entered if appropriate typographical characters exist. [ . . . ] Other characters should be entered in Unicode if possible; this includes accented characters, and symbols such as em-dashes. [ . . . ] If you are using a Windows computer, you can use the Windows Character Map to enter unusual characters. (Screen:EditTitle) So I think left and right quotes should be entered as left and right quotes, apostrophes as apostrophes etc. (But not left quote as grave accent, and no HTML for latin1 characters) And searching should be improved by changes in the software. Or is the Help outdated and the regularized titles are preferable? --Roglo 18:44, 29 April 2008 (UTC)
I'd prefer some regularization - but I'd like some of it to be in the software. I'd rather separate punctuation from typography, and if I find I'm tempted to use the Windows character map I'm either entering foreign titles or would rather stick to the basics: e.g. ' and " have been good enough for me all my life and I hate Microsoft products "helpfully" creating opening and closing versions for me. If it wasn't available on my father's typewriter, I probably don't need it either: although I'd probably annoy people (and definitely computers) if I went back to using l and O for 1 and 0. BLongley 20:43, 29 April 2008 (UTC)