What's New from 2003

From ISFDB
Revision as of 13:12, 16 April 2006 by Alvonruff

What's New - 31 December 2003

  • Well, as often happens, the ISFDB has received little attention over the last six months. This was partially due to distractions with other hobbies, as well as the database being in a messy state since the experiments from this summer. In particular, I worked on a number of tools for automated data extraction from other locations (like Amazon, Z39.50, and used-book sites) - this brought in a lot of new data, as well as a lot of new errors. I started working on some AI/Machine Learning tools to detect and fix the errors automatically, made some good progress (developing a name grammar linked with census data), went down some dead ends (Bayesian filtering cannot detect the difference between a first name and a last name), and then lost interest near the end of summer. As a result, I had a huge amount of new data in the SQL database, which I hadn't merged into the main line due to the large number of errors. If I keep waiting until all of the errors have been fixed, I'll never do another update, and it's not like the ISFDB has a reputation for irrefutable data perfection anyway. So, I've made a backup of the current data, and dumped all the new stuff in. Like it or lump it.
  • Exceeded 130,000 titles.
  • Updated forthcoming books.
  • Merged in latest awards update from David.
  • Soundtrack: Queens of the Stone Age - Songs for the Deaf.


What's New - 10 June 2003

  • Have merged in 10,385 sfnal records from Amazon.com since the last update, increasing the number of publication records by 20 percent. All of the new records have ISBNs not currently registered in the ISFDB. Amazon doesn't have a lot of pre-ISBN era data, but it does mean we've got copious amounts of new data trailing back through 1977. As with previous major database merges we've gone through, there's a downside: there are a fantastic number of author name variations, duplicate titles, and numerous miscategorizations of books now present. Okay, not that we haven't always had that problem - it will simply be more noticeable than usual for a while until it's cleaned up.
  • The last time I integrated such a large number of records was back in '97, when I merged in the NESFA database. That particular effort was done manually for a period of time, until the amount of effort required to complete the job compelled the creation of error analysis and merging tools. Things haven't changed much since then. The tools are run manually, and a series of potential problems are emitted - all of which have to be reconciled manually. Faced with a mountain of new records, it seems that now would be an excellent opportunity to take things to the next level. The two major problems in merging in new records have been detecting the author and title variations that lead to duplicate entries. It turns out that the problem of detecting similar strings in database records is an area of active research in the fields of Artificial Intelligence and Machine Learning.
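
As a rough illustration of the similar-string problem described above, here is a minimal sketch that flags author names likely to be variations of one another. It is not the ISFDB's actual tooling: the helper names, the sample names, and the 0.85 threshold are all assumptions, and it uses the generic similarity ratio from Python's standard difflib module rather than anything tuned for names.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similar_pairs(names, threshold=0.85):
    """Return (name_a, name_b, score) for each pair at or above the threshold.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    pairs = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical input: three spellings of one author plus an unrelated name.
authors = ["Ken MacLeod", "Ken Macleod", "Ken McLeod", "Isaac Asimov"]
for a, b, score in similar_pairs(authors):
    print(f"possible duplicate: {a!r} / {b!r} ({score})")
```

Comparing every pair is quadratic in the number of names, so a real run over tens of thousands of records would probably need to group candidates first (for example by surname initial), and every flagged pair would still need a human to reconcile it.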


What's New - 11 May 2003

  • Franz-Leo Chomse has been working on an ISFDB viewer for home use (on Windows) that doesn't require installing Cygwin, or a web server. It uses precompiled versions of the database and html files. This release supports the viewer in two ways: First, the precompiled html files are available as viewer.html.zip. Second, there is a new make target that creates precompiled html files, and also creates a local directory with everything but the viewer itself. We'll make the viewer available for download soon.
  • Integrated publication data from John DaMassa. John submitted the data by placing it in an Excel spreadsheet. This turns out to be an insanely easy way of integrating data into the ISFDB now that the publications are maintained in a MySQL database. If others would like to submit book information this way, I placed a spreadsheet template with embedded instructions on the home page at isfdb.xls.
  • Brought 81 issues of Science-Fantasy online, primarily from data supplied by Malcolm Farmer.
  • The current rev of the tools that I have been writing to create and run a MySQL database of ISFDB data is now available for download at sql.tar.gz. This includes the SQL statements to create the database, as well as the Python CGI scripts for accessing and modifying data. This stuff is a work in progress so the layouts and library APIs are not set in stone, but it's something to start playing with. If you want to populate the database, you'll need the new frame.tar.gz since the new make targets are in there.
  • In MacLeod's The Star Fraction, a famous bit of freeware called Dissembler serves as a magical data mining tool. Reading about this compelled me to finish my tool that automatically populates the ISFDB with data gathered from Amazon.com. This has added about 1,000 new publications into the database, which represents an increase of about 2 percent. Okay, it's no Dissembler, but it does do a huge chunk of my work for me.
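
The spreadsheet submissions and the Amazon tool described above both amount to loading tabular publication rows into a SQL table. The sketch below shows one way that could look, under loud assumptions: it uses SQLite from the Python standard library as a stand-in for MySQL, the table and column names are hypothetical (the real ISFDB schema and the isfdb.xls template layout are not described here), the sample rows are made up, and the spreadsheet is assumed to have been exported to CSV first.

```python
import csv
import io
import sqlite3

# Hypothetical schema for illustration only; not the real ISFDB layout.
SCHEMA = "CREATE TABLE pubs (isbn TEXT PRIMARY KEY, title TEXT, author TEXT, year INTEGER)"


def load_rows(conn, csv_text):
    """Insert CSV rows into pubs, skipping ISBNs already present. Returns the number added."""
    added = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Strip hyphens so differently formatted copies of an ISBN compare equal.
        isbn = row["isbn"].replace("-", "").strip()
        exists = conn.execute("SELECT 1 FROM pubs WHERE isbn = ?", (isbn,)).fetchone()
        if exists:
            continue  # already registered, so leave the existing record alone
        conn.execute(
            "INSERT INTO pubs (isbn, title, author, year) VALUES (?, ?, ?, ?)",
            (isbn, row["title"].strip(), row["author"].strip(), int(row["year"])),
        )
        added += 1
    conn.commit()
    return added


# Made-up sample data; the third row repeats the first ISBN.
sample = """isbn,title,author,year
0-00-000000-1,Example Title A,Example Author,1999
0-00-000000-2,Example Title B,Example Author,2001
0-00-000000-1,Example Title A (duplicate),Example Author,1999
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
added = load_rows(conn, sample)
print(f"added {added} new publications")
```

Keying on ISBN only catches exact repeats; books with no ISBN, or the same book under a variant author or title spelling, would still slip through and need separate cleanup.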


What's New - 26 April 2003

  • I have converted the BOOKS and AUTHORS files into tables in a MySQL database, and all work on those files is now taking place through a web interface to those tables (it's not available for public use). Support tools dump the contents into ISFDB format. This release is the first generated from that SQL database. The rest of the files will soon follow. I'll release the tools and SQL scripts needed to replicate this after I productize things a bit.
  • Integrated a major awards update from David G. Grubbs. Rebuilt the top 100 lists as a result.
  • Integrated Asimov's updates from Mike Cross.
  • Made a major update to forthcoming books.
  • My email address is back on the front page again.
  • The old ISFDB logo was starting to look a little frayed, so I made a new one last weekend. The old logo was the result of a 30 minute session with xpaint about seven years ago. The new one is the result of a 30 minute session with Maya.
  • Reading: Ken MacLeod - The Star Fraction.


What's New - 13 April 2003

  • Integrated magazine updates from Mike Cross, including F&SF, Interzone, and NYRSF. Caught up on Asimov's listings.
  • The ISFDB was verbed by James Patrick Kelly in the May 2003 issue of Asimov's Science Fiction: "I Googled and ISFDBed all the potential candidates..." Doesn't quite have the same ring as Googled.
  • Listings for Ballantine books are now cleaned up through 1973.
  • The data section of the ISFDB Open Data Project is now located here.


What's New - 5 April 2003

  • The new website setup requires building images remotely, and then uploading the results upon completion. The HTML files require post-processing, which used to be done at the website and now has to be done remotely. To facilitate this, the HTML system was modified to use Makefiles (which is something I've wanted to do for a while anyway). Tweaks following this change are ongoing.
  • Now that www.isfdb.org is pointed at the TAMU website, I've changed the HTML definitions to point there as well.
  • The tools section of the ISFDB Open Data Project is now located here.
  • Listings for Ballantine books are now complete through 1968.


What's New - 24 March 2003

  • Minor update to patch up some of the HTML pages after the move.
  • Spammers have somehow found my new email address, but the results are somewhat surreal - I'm targeted by fine art spam. Chagall, Magritte, Munch, Moore, van Gogh, Corot. I guess to be truly surreal I need some spam for Dali and Picasso.


What's New - 11 February 2003

  • Thanks to the efforts of Hal W. Hall and Jeff Bachtel at Texas A&M, the ISFDB is back online. Hal Hall is the curator for Texas A&M's Science Fiction and Fantasy collection, and Jeff Bachtel is the administrator for A&M's Digital Library project. Go Aggies!
  • In a truly ironic turn of events, the recent hubbub over our problems with your-site.com somehow compelled someone to try them as a hosting service. Since we were used as a reference, this resulted in a month's free service for the referral!


What's New - 18 January 2003

On Jan 17 2003, your-site.com pulled the plug on ISFDB cgi scripts. This means that database searches are no longer functional. Their rationale was that there were too many daily database queries (which exceeded your-site's limit of roughly 3000 per day), and that the ISFDB was generating a system load beyond their specified per-account limits.

I think the ISFDB has now reached an awkward point for a non-profit site: it's too large (in size, bandwidth, processes, and system resources) to run at a typical ISP. Renting an allocated server would cost in the neighborhood of $200 a month (a considerable step up from the current $5). Buying a server and colocating it at an ISP is cheaper, but would still run in the neighborhood of $100 a month. In general, sites with low resource needs are very cheap, and sites with high resource needs are very expensive. There isn't a lot of middle ground. Even SFSite is feeling the pinch. They're being required to pay for the bandwidth used, and the ISFDB share for that would have been in the neighborhood of $80 a month. Hence our original move away from SFSite.

The point isn't that we need donations - the point is that the concept of a large, heavily-used, non-commercial database doesn't fit in well with the way the Internet has evolved. We've tried running at numerous sites over the last 7 years, and all have imposed various restrictions that have eventually prevented the ISFDB from running there.

There are several ways we could address the issue:

  1. We could compile the database into traditional HTML files. There are a few downsides to this. First, it would generate several hundred megabytes of pages (things currently fit into about 20MB). Second, searches would not be possible. Third, it really wouldn't be significantly different from the Locus Index.
  2. We could move the database completely onto people's home computers. This is possible today, although someone really needs to bulletproof the installation. There are a few downsides to this choice as well: it will take a lot of space on the home system, and downloading database updates will take long periods of time for those without broadband (it used to take me multiple hours to upload database changes when I had dial-up).
  3. An alternative hybrid would be to put thin clients on home systems, which would contact a real database server. This would not require CGI scripts since that part of the effort is being done at home, and wouldn't require database updates, since that information would be located out on the Internet.

In the meantime, the source code and data for the ISFDB remain available to those who want to establish their own mirrors, or simply install them on their home systems.