Difference between revisions of "Publisher Catalogs and Print Series"

From ISFDB
(→‎ISFDB - Publisher Catalogs / Print Series: Use a more general name for the page)
(Move conversion notes to talk)
Line 30: Line 30:
 
** [http://www.isfdb.org/tordoubles.html Tor Doubles]. (1988-1991)
 
* [http://www.isfdb.org/zebra.html Zebra]
 
== Wiki Conversion Notes ==
 
 
This is a wiki conversion of [http://www.isfdb.org/printseries.html printseries.html].
 
 
Most links point to nonexistent wiki pages, but the content that belongs on them is available from the link above.
 
 
As I started converting some of these pages, I encountered two big issues:
 
 
# Should these pages really be in the wiki, or should they be powered by the DB? (At the moment, search by publisher doesn't seem to work.)
 
# Some of these publisher lists are too big for the wiki. When I tried converting "Orbit", the wiki warned me that many browsers would have difficulty editing a page that size and that I should break it up. When I tried converting DAW, my browser submitted all the data, and the wiki then churned for about 10 more minutes before my browser finally timed out.
 
 
 
That said...
 
 
Assuming you use [http://diberri.dyndns.org/html2wiki.html this URL] to convert pages to wiki syntax, the following Perl script is handy for cleaning up the publisher listing pages. [[Gnome Press]] and [[Fantasy Press]] are good examples of the cleaned-up result.
 
 
<pre>
<nowiki>
#!/usr/bin/perl
#
# use http://diberri.dyndns.org/html2wiki.html to convert pub pages,
# then use this to clean them up
# :TODO: should have one script that uses HTML::WikiConverter and does it all
#
use warnings;
use strict;

# slurp the whole input (files named on the command line, or STDIN)
undef $/;
my $w = <>;

# convert bold years to sub-headings
$w =~ s/ '''(\d+)'''/\n\n== $1 ==\n/g;

# get rid of all the horiz rules
$w =~ s/^\s*?----//mg;

# any pub link is a bullet (dots escaped so they match literally)
$w =~ s{(\[http://www\.isfdb\.org/cgi-bin/pl\.cgi)}{\n* $1}g;

# some pubs don't have links, just a dash
$w =~ s{ ?- (\S)}{\n* $1}g;

# trim excess newlines
$w =~ s/\n{2,}/\n\n/g;

# kill any remaining single newline (followed by optional whitespace)
$w =~ s{([^\n])\n ?([^\n])}{$1$2}g;

# get rid of all the excess whitespace
$w =~ s/ +/ /g;

# any line that still starts with whitespace is bad
$w =~ s/^ +//mg;

# spit it out
print $w;
</nowiki>
</pre>
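The script's TODO suggests a single script built on HTML::WikiConverter. A minimal sketch of that idea might look like the following. It assumes the CPAN modules HTML::WikiConverter and HTML::WikiConverter::MediaWiki are installed. The sub names <code>html_to_wiki</code> and <code>cleanup_wikitext</code> and the file name <code>pub2wiki.pl</code> are hypothetical. The cleanup rules are copied from the script above and have not been tuned for the module's output, which may differ from what the web form produces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert an HTML string to MediaWiki markup.
# ASSUMPTION: HTML::WikiConverter and its MediaWiki dialect are installed
# from CPAN; the module is loaded lazily so the cleanup can be used alone.
sub html_to_wiki {
    my ($html) = @_;
    require HTML::WikiConverter;
    my $wc = HTML::WikiConverter->new(dialect => 'MediaWiki');
    return $wc->html2wiki(html => $html);
}

# Apply the same cleanup substitutions as the script above, in the same order.
sub cleanup_wikitext {
    my ($w) = @_;
    $w =~ s/ '''(\d+)'''/\n\n== $1 ==\n/g;      # bold years become sub-headings
    $w =~ s/^\s*?----//mg;                      # drop horizontal rules
    $w =~ s{(\[http://www\.isfdb\.org/cgi-bin/pl\.cgi)}{\n* $1}g;  # pub links become bullets
    $w =~ s{ ?- (\S)}{\n* $1}g;                 # dash-only entries become bullets
    $w =~ s/\n{2,}/\n\n/g;                      # collapse runs of newlines
    $w =~ s{([^\n])\n ?([^\n])}{$1$2}g;         # join lines split by a single newline
    $w =~ s/ +/ /g;                             # squeeze repeated spaces
    $w =~ s/^ +//mg;                            # strip leading spaces on each line
    return $w;
}

# Only run the full pipeline when a file name is given on the command line.
if (@ARGV) {
    local $/;          # slurp mode
    my $html = <>;
    print cleanup_wikitext(html_to_wiki($html));
}
```

Run as <code>perl pub2wiki.pl page.html > page.wiki</code>. As in the original script, the cleanup relies on each entry in the converted text already ending in a newline; entries that run together on one line would still need manual fixing.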
 

Revision as of 01:26, 3 March 2008

Publisher Catalogs / Print Series