Difference between revisions of "User:Alvonruff/ISFDB2 Notes"
Revision as of 06:48, 28 April 2022
The isfdb2 staging system is a minimal system, with few packages installed, which uses dnf instead of apt-get.
Prerequisites
The staging system is a minimum-configuration AlmaLinux system, a RHEL-compatible distribution from the Red Hat/Fedora family. It's really intended for tight cloud installations, so almost everything is missing, and installation of packages is done with yum/dnf.
- dnf install gcc
- dnf install make
- dnf install tar
- dnf install zip.x86_64
- dnf install bzip2.x86_64
- dnf install wget
Apache
- dnf install httpd
- firewall-cmd --add-service=http --add-service=https --permanent
- firewall-cmd --reload (so the permanent rules also apply to the running firewall)
- service httpd start
MySQL
- dnf update
- dnf module enable mysql:8.0
- dnf install @mysql
- systemctl enable mysqld
- systemctl start mysqld
- Issue: mysql
- While in mysql, issue the command: create database isfdb;
- While in mysql, issue the command: use isfdb;
- While in mysql, issue the command: alter database isfdb character set latin1 collate latin1_swedish_ci;
- While in mysql, issue the command: source <<backupfile>>;
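- While in mysql, issue the command: GRANT ALL PRIVILEGES ON isfdb.* TO 'isfdb1'@'localhost'; (the 'isfdb1' account must already exist, since MySQL 8.0 no longer creates accounts through GRANT)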
Python 2.7.18
- dnf install python2.x86_64
- dnf install python2-devel.x86_64
- dnf install mysql-devel.x86_64
- pip2 install mysqlclient
Versions
- Linux: 4.18.0-240.15.1.el8_3.x86_64 x86_64
- Apache: Apache/2.4.37 (AlmaLinux)
- MySQL: 8.0.26
- Python: 2.7.18
Charset Experiments
I have a python script for generating Wikipedia article stubs from the ISFDB tables in MySQL. It was run on both isfdb.org and isfdb2.org. Running diff on the outputs shows:
    4c4
    < | name = Philip José Farmer
    ---
    > | name = Philip José Farmer
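The `José` form is the pattern that typically appears when UTF-8 bytes are displayed as if they were Latin-1. A minimal sketch of that effect follows, written for Python 3 (the notes themselves use Python 2.7); the function name `as_latin1_view` is hypothetical and is not part of the ISFDB code.

```python
# -*- coding: utf-8 -*-
# Illustrative only: shows how UTF-8 bytes look when misread as Latin-1.

def as_latin1_view(text):
    """Encode text as UTF-8, then decode those bytes as Latin-1.

    This mimics a terminal or tool that assumes Latin-1 while the
    data is actually UTF-8.
    """
    utf8_bytes = text.encode("utf-8")      # e-acute becomes two bytes: c3 a9
    return utf8_bytes.decode("latin-1")    # each byte is read as one character


name = u"Philip Jos\u00e9 Farmer"
garbled = as_latin1_view(name)
```

Here `garbled` holds `Philip José Farmer`: the two UTF-8 bytes c3 and a9 are read as the separate Latin-1 characters `Ã` and `©`.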
If, however, the files are brought up in the vim text editor, they both appear to be correct. If I pull the name string out of each file and run od -X --endian=big STRING_FILE, the results are (with hand annotation):
    0000000 5068696c 6970204a 6f73e920 4661726d
            Phil     ip J     osé      Farm
    0000020 65720a00

    0000000 5068696c 6970204a 6f73c3a9 20466172
            Phil     ip J     osé       Far
    0000020 6d65720a
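The two dumps match what Python produces when the same string is encoded as Latin-1 and as UTF-8. A minimal sketch, written for Python 3 with illustrative variable names (the trailing 00 in the first od dump is only padding to fill the last 4-byte word):

```python
import binascii

# The name line as it appears in the file, including the trailing newline.
name = u"Philip Jos\u00e9 Farmer\n"

# Latin-1 stores e-acute as the single byte e9 (19 bytes in total).
latin1_hex = binascii.hexlify(name.encode("latin-1")).decode("ascii")

# UTF-8 stores e-acute as the two bytes c3 a9 (20 bytes in total).
utf8_hex = binascii.hexlify(name.encode("utf-8")).decode("ascii")

print(latin1_hex)
print(utf8_hex)
```

The first hex string corresponds to the first dump and the second to the second dump, which suggests that one system writes Latin-1 and the other writes UTF-8.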
This generates 2 questions:
- Why is the output different on the two systems (python is clearly generating a longer and different byte sequence on one of them), and
- Why does the file appear correct when viewed inside a vim session?
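The answer to the first question is suggested by the answer to the second: vim uses UTF-8 as its default charset and apparently converts each file to it on load, so both files look right on screen even though they contain different bytes (the single Latin-1 byte e9 for é in one file, the two-byte UTF-8 sequence c3 a9 in the other). I made the following changes to common/isfdb.py: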
[1] Altered the html content type output from:
print 'Content-type: text/html; charset=%s\n' % UNICODE
to
print 'Content-type: text/html; charset=%s\n' % "UTF-8"
[2] Altered the meta tag string from:
print '<meta http-equiv="content-type" content="text/html; charset=%s" >' % UNICODE
to:
print '<meta charset="UTF-8"/>'
The output now appears normal. I need to run with this for a while to see if there are any untoward side effects. This also does not answer the question of why the unmodified code runs fine on the original isfdb.org.
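While watching for side effects, one way to spot-check generated output is to test whether its bytes are valid UTF-8. The helper below is a hypothetical sketch in Python 3, not part of the ISFDB code. It assumes the page or file has already been read as raw bytes.

```python
def classify_bytes(data):
    """Return 'ascii', 'utf-8', or 'not-utf-8' for a byte string."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Every byte sequence decodes as Latin-1, so this result only says
        # the data is not UTF-8; it does not prove it is Latin-1.
        return "not-utf-8"


# The two byte strings from the od dumps above (without the newline).
latin1_sample = b"Philip Jos\xe9 Farmer"
utf8_sample = b"Philip Jos\xc3\xa9 Farmer"

latin1_kind = classify_bytes(latin1_sample)
utf8_kind = classify_bytes(utf8_sample)
```

A page served with `charset=UTF-8` whose body classifies as `not-utf-8` would indicate a mismatch between the declared and the actual encoding.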