Difference between revisions of "User:Alvonruff/ISFDB2 Notes"


Revision as of 06:43, 28 April 2022

The isfdb2 staging system is a minimal system, with few packages installed, which uses dnf instead of apt-get.

Prerequisites

The staging system is a minimum-configuration AlmaLinux system, which is a RHEL-compatible distribution. It's really intended for tight cloud installations, so almost everything is missing, and installation of packages is done with yum/dnf.

  • dnf install gcc
  • dnf install make
  • dnf install tar
  • dnf install zip.x86_64
  • dnf install bzip2.x86_64
  • dnf install wget

Apache

  • dnf install httpd
  • firewall-cmd --add-service=http --add-service=https --permanent
  • service httpd start

MySQL

  • dnf update
  • dnf module enable mysql:8.0
  • dnf install @mysql
  • systemctl enable mysqld
  • systemctl start mysqld
  • Issue: mysql
  • While in mysql, issue the command: create database isfdb;
  • While in mysql, issue the command: use isfdb;
  • While in mysql, issue the command: alter database isfdb character set latin1 collate latin1_swedish_ci;
  • While in mysql, issue the command: source <<backupfile>>;
  • While in mysql, issue the command: GRANT ALL PRIVILEGES ON isfdb.* TO 'isfdb1'@'localhost';
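
Since the database is explicitly set to latin1 above, it can help to see what character sets the server and the client connection actually report. The following is a minimal sketch, not part of the ISFDB code: it is written for Python 3, the names charset_report and main are hypothetical, and the password is a placeholder. It assumes the mysqlclient package (imported as MySQLdb) and the isfdb1 account from the GRANT step.

```python
def charset_report(cursor):
    """Return MySQL's character_set_* variables as a dict.

    `cursor` is any DB-API cursor connected to the server.
    """
    cursor.execute("SHOW VARIABLES LIKE 'character_set%'")
    # Each row is a (variable_name, value) pair.
    return dict(cursor.fetchall())


def main():
    # Imported here so the helper above can be used without the driver installed.
    import MySQLdb  # provided by the mysqlclient package

    # Hypothetical credentials: substitute the real password for isfdb1.
    conn = MySQLdb.connect(host="localhost", user="isfdb1",
                           passwd="CHANGE_ME", db="isfdb")
    try:
        report = charset_report(conn.cursor())
        for name in sorted(report):
            print("%-28s %s" % (name, report[name]))
    finally:
        conn.close()
```

Calling main() on both hosts and comparing character_set_client, character_set_connection, and character_set_results is one way to check whether the two systems hand strings to Python in different encodings.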

Python 2.7.18

  • dnf install python2.x86_64
  • dnf install python2-devel.x86_64
  • dnf install mysql-devel.x86_64
  • pip2 install mysqlclient

Versions

  • Linux: 4.18.0-240.15.1.el8_3.x86_64 x86_64
  • Apache: Apache/2.4.37 (AlmaLinux)
  • MySQL: 8.0.26
  • Python: 2.7.18

Charset Experiments

I have a python script for generating Wikipedia article stubs from the ISFDB tables in MySQL. It was run on both isfdb.org and isfdb2.org. Running diff on the outputs shows:

4c4
< | name        = Philip José Farmer
---
> | name        = Philip José Farmer

If, however, the files are brought up in the vim text editor, they both appear to be correct. If I pull the name string out of each file and run od -X --endian=big STRING_FILE, the results are (with hand annotation):

0000000 5068696c 6970204a 6f73e920 4661726d         Phil   ip J   osé   Farm
0000020 65720a00

0000000 5068696c 6970204a 6f73c3a9 20466172         Phil   ip J   osé   Far
0000020 6d65720a
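
The two dumps differ only in how the é is stored: a single byte e9 in the first and the two bytes c3 a9 in the second. A short sketch (Python 3, with a hypothetical helper hex_words) reproduces both byte sequences by encoding the same string as Latin-1 and as UTF-8. Note that od pads the final word of the first dump with a zero byte, which this helper does not do.

```python
import binascii

# The name line as a Unicode string; \u00e9 is the letter e with acute accent.
NAME = "Philip Jos\u00e9 Farmer\n"


def hex_words(data):
    """Format bytes as space-separated 4-byte hex words, similar to od -X --endian=big."""
    hexed = binascii.hexlify(data).decode("ascii")
    return " ".join(hexed[i:i + 8] for i in range(0, len(hexed), 8))


latin1_bytes = NAME.encode("latin-1")  # one byte for the accented letter
utf8_bytes = NAME.encode("utf-8")      # two bytes for the accented letter

print(hex_words(latin1_bytes))
print(hex_words(utf8_bytes))

# Reading the UTF-8 bytes as if they were Latin-1 yields two characters
# (U+00C3, U+00A9) where the single accented letter should be.
misread = utf8_bytes.decode("latin-1")
print(ascii(misread))
```

The Latin-1 form is 19 bytes and the UTF-8 form is 20, which is why the second dump ends on a full word while the first needs padding.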

This raises two questions:

  1. Why is the output different on the two systems, and
  2. Why does the file appear correct when viewed inside a vim session?

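The dumps show that one file stores the é as the single Latin-1 byte e9 and the other as the UTF-8 byte pair c3 a9. The answer to the first question is highlighted by the answer to the second question: vim uses utf-8 as its default charset. I made the following changes to common/isfdb.py: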

[1] Altered the html content type output from:

print 'Content-type: text/html; charset=%s\n' % UNICODE

to

print 'Content-type: text/html; charset=%s\n' % "UTF-8"

[2] Altered the meta tag string from:

print '<meta http-equiv="content-type" content="text/html; charset=%s" >' % UNICODE

to:

print '<meta charset="UTF-8"/>'

The output now appears normal. I need to run with this for a while to see if there are any untoward side effects. This also does not answer the question of why the code runs fine on the original isfdb.org.
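
The two changes above only alter the charset label sent to the browser, so the page displays correctly only when the bytes in the body really are in that charset. The sketch below (Python 3) illustrates this. The names CHARSET, http_header, meta_tag, and render are hypothetical, and iso-8859-1 is used purely as an assumed example of a Latin-1 label, since the notes do not state the value of UNICODE.

```python
# Hypothetical single source of truth for the declared charset.
CHARSET = "UTF-8"


def http_header(charset=CHARSET):
    """Build the Content-type header line with the given charset label."""
    return "Content-type: text/html; charset=%s\n" % charset


def meta_tag(charset=CHARSET):
    """Build the matching HTML meta tag."""
    return '<meta charset="%s"/>' % charset


def render(body_bytes, declared_charset):
    """Decode body bytes the way a client trusting the declared label would."""
    return body_bytes.decode(declared_charset, "replace")


name = "Jos\u00e9"

agree = render(name.encode("utf-8"), "utf-8")              # label matches bytes
utf8_as_latin1 = render(name.encode("utf-8"), "iso-8859-1")  # UTF-8 bytes, Latin-1 label
latin1_as_utf8 = render(name.encode("latin-1"), "utf-8")     # Latin-1 bytes, UTF-8 label

print(http_header(), end="")
print(meta_tag())
for text in (agree, utf8_as_latin1, latin1_as_utf8):
    print(ascii(text))
```

Only the first case round-trips. The second shows two stray characters in place of the é, and the third shows the Unicode replacement character, which is the kind of side effect to watch for if any page still emits Latin-1 bytes under the new UTF-8 label.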