Difference between revisions of "User:Alvonruff/ISFDB2 Notes"

Revision as of 06:43, 28 April 2022

The isfdb2 staging system is a minimal system, with few packages installed, which uses dnf instead of apt-get.

Prerequisites

The staging system is a minimal-configuration AlmaLinux system, a RHEL-compatible distribution in the Red Hat/Fedora family. It is really intended for tight cloud installations, so almost everything is missing, and packages are installed with yum/dnf.

dnf install gcc
dnf install make
dnf install tar
dnf install zip.x86_64
dnf install bzip2.x86_64
dnf install wget

Apache

dnf install httpd
firewall-cmd --add-service=http --add-service=https --permanent
firewall-cmd --reload
systemctl start httpd

MySQL

dnf update
dnf module enable mysql:8.0
dnf install @mysql
systemctl enable mysqld
systemctl start mysqld
Issue: mysql
While in mysql, issue the command: create database isfdb;
While in mysql, issue the command: use isfdb;
While in mysql, issue the command: alter database isfdb character set latin1 collate latin1_swedish_ci;
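While in mysql, issue the command: source <<backupfile>>;
While in mysql, issue the command: GRANT ALL PRIVILEGES ON isfdb.* TO 'isfdb1'@'localhost';
Note: in MySQL 8.0 the isfdb1 account must already exist, since GRANT no longer creates users implicitly.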
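
The alter database step above sets the database default to latin1, but the character set used on a given connection is governed by separate server and session variables. The following is a minimal sketch, not part of the ISFDB code, for listing those variables so that two hosts can be compared. It assumes a standard Python DB-API cursor that returns rows as tuples (for example one from the mysqlclient package installed in the next section); the function names charset_variables and differing are hypothetical.

```python
def charset_variables(cursor):
    """Return MySQL's character_set* variables as a dict.

    `cursor` is assumed to be a DB-API cursor on an open MySQL connection
    that returns rows as (Variable_name, Value) tuples.
    """
    cursor.execute("SHOW VARIABLES LIKE 'character_set%'")
    return {name: value for name, value in cursor.fetchall()}


def differing(a, b):
    """Return {variable: (value_in_a, value_in_b)} for values that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}
```

Running charset_variables on each host and passing the two results to differing would show at a glance whether the client, connection, or results character sets are configured differently.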

Python 2.7.18

dnf install python2.x86_64
dnf install python2-devel.x86_64
dnf install mysql-devel.x86_64
pip2 install mysqlclient

Versions

Linux: 4.18.0-240.15.1.el8_3.x86_64 x86_64
Apache: Apache/2.4.37 (AlmaLinux)
MySQL: 8.0.26
Python: 2.7.18

Charset Experiments

I have a Python script for generating Wikipedia article stubs from the ISFDB tables in MySQL. It was run on both isfdb.org and isfdb2.org. Running diff on the two outputs shows:

4c4
< | name        = Philip José Farmer
---
> | name        = Philip JosÃ© Farmer

If, however, the files are brought up in the vim text editor, they both appear to be correct. If I pull the name string out of each file and run od -X --endian=big STRING_FILE, the results are (with hand annotation):

0000000 5068696c 6970204a 6f73e920 4661726d         Phil   ip J   osé   Farm
0000020 65720a00

0000000 5068696c 6970204a 6f73c3a9 20466172         Phil   ip J   osÃ©   Far
0000020 6d65720a
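
The two dumps differ only in how the é is stored: a single byte e9 in the first, and the two bytes c3 a9 in the second. As a small illustration (Python 3, not taken from the ISFDB code), encoding the same name as Latin-1 and as UTF-8 reproduces both byte sequences, and reading the UTF-8 bytes as Latin-1 reproduces the garbled form seen in the diff:

```python
name = "Philip Jos\u00e9 Farmer"  # \u00e9 is e with an acute accent

latin1_bytes = name.encode("latin-1")  # one byte for the accented letter
utf8_bytes = name.encode("utf-8")      # two bytes for the accented letter

print(latin1_bytes.hex())  # 5068696c6970204a6f73e9204661726d6572
print(utf8_bytes.hex())    # 5068696c6970204a6f73c3a9204661726d6572

# The UTF-8 bytes misread as Latin-1: each of the two bytes becomes its own character.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake)
```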

This raises two questions:

Why is the output different on the two systems, and
Why does the file appear correct when viewed inside a vim session?

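The answer to the first question is suggested by the answer to the second: vim uses UTF-8 as its default charset, so it displays the UTF-8 file correctly. (When its encoding is UTF-8, vim's default fileencodings list tries UTF-8 first and falls back to Latin-1, which would explain why the Latin-1 file also looks correct.) I made the following changes to common/isfdb.py: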

[1] Altered the html content type output from:

print 'Content-type: text/html; charset=%s\n' % UNICODE

to

print 'Content-type: text/html; charset=%s\n' % "UTF-8"

[2] Altered the meta tag string from:

print '<meta http-equiv="content-type" content="text/html; charset=%s" >' % UNICODE

to:

print '<meta charset="UTF-8"/>'

The output now appears normal. I need to run with this for a while to see whether there are any untoward side effects. This also does not explain why the unmodified code runs fine on the original isfdb.org.
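
While running with this change, one way to watch for side effects is to confirm that the bytes a page emits really are valid in the charset its header declares. The sketch below is a hypothetical helper (Python 3, not part of common/isfdb.py; the name body_matches_charset is an assumption for illustration). The check is only informative for UTF-8: every byte sequence decodes as Latin-1, so a Latin-1 declaration can never fail it.

```python
def body_matches_charset(body, charset):
    """Return True if the raw bytes `body` decode cleanly as `charset`."""
    try:
        body.decode(charset)  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


latin1_page = "Philip Jos\u00e9 Farmer".encode("latin-1")
utf8_page = "Philip Jos\u00e9 Farmer".encode("utf-8")

print(body_matches_charset(utf8_page, "utf-8"))    # True
print(body_matches_charset(latin1_page, "utf-8"))  # False: a lone e9 byte is not valid UTF-8
print(body_matches_charset(utf8_page, "latin-1"))  # True, although it would display garbled
```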