User:Alvonruff

From ISFDB
* [[User:Alvonruff/ISFDB2_Notes]]
* [[User:Alvonruff/The_Charset_Problem]]
* [[User:Alvonruff/Notes on MySQLdb]]
* [[User:Alvonruff/Test_Pages]]

Revision as of 16:28, 10 May 2022

Founder of the ISFDB.


ASCII vs UTF-8 vs Latin1

MySQL

Clearly the most expressive character set is utf8mb4, which covers all of Unicode. The current ISFDB database uses the character set latin1 (ISO-8859-1, which MySQL actually implements as Windows cp1252).
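To make "most expressive" concrete, here is a small sketch that uses Python 3's own codecs (not MySQL) to show which characters each character set can hold and how many bytes each takes. It assumes Python's cp1252 codec is a fair stand-in for MySQL's latin1, and that utf8mb3 behaves like UTF-8 limited to 3 bytes per character, which is what the name means.

```python
# Compare what latin1, utf8mb3 and utf8mb4 can store, using Python codecs as stand-ins.
# Assumptions: MySQL latin1 ~ Python "cp1252"; utf8mb3 ~ UTF-8 limited to 3 bytes/char.

SAMPLES = [
    "A",           # plain ASCII
    "\u00e9",      # e with acute accent
    "\u20ac",      # euro sign (in cp1252, not in strict ISO-8859-1)
    "\u042f",      # Cyrillic capital letter ya
    "\u6f22",      # CJK ideograph
    "\U0001F600",  # emoji outside the Basic Multilingual Plane
]

def latin1_bytes(ch):
    """Bytes needed in MySQL-style latin1, or None if the character cannot be stored."""
    try:
        return len(ch.encode("cp1252"))
    except UnicodeEncodeError:
        return None

def utf8mb4_bytes(ch):
    """Bytes needed in utf8mb4 (full UTF-8, 1 to 4 bytes per character)."""
    return len(ch.encode("utf-8"))

def utf8mb3_bytes(ch):
    """Bytes needed in utf8mb3, or None for characters that need a 4-byte sequence."""
    n = utf8mb4_bytes(ch)
    return n if n <= 3 else None

if __name__ == "__main__":
    for ch in SAMPLES:
        # Print code points rather than the characters, so any terminal can show the output.
        print(f"U+{ord(ch):04X}", latin1_bytes(ch), utf8mb3_bytes(ch), utf8mb4_bytes(ch))
```

Only utf8mb4 returns a byte count for every sample, which is why it is the safe target for a database that has to hold any title or author name.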

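  • My database (a fresh MySQL 8.0 install defaults to UTF-8, i.e. utf8mb4) has been set to the same character set as the online ISFDB with: alter database isfdb character set latin1 collate latin1_swedish_ci;
  • An attempt was made to switch the character set to one that works with the newer MySQLdb with: alter database isfdb character set utf8mb4 collate utf8mb4_unicode_ci;. This had no effect on the output. (ALTER DATABASE only changes the default character set for tables created afterwards. Existing tables and columns keep their own character set, which may explain why nothing changed.)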
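Converting the data already stored, rather than just the database default, takes a per-table ALTER TABLE ... CONVERT TO CHARACTER SET. The helper below is a hypothetical sketch that only builds the SQL strings and does not connect to MySQL. The table names in the usage example are placeholders; the real list could be read with the information_schema query it also generates. Conversion is only safe if the stored bytes really are latin1. Columns that already hold UTF-8 bytes labeled as latin1 would end up double-encoded.

```python
# Hypothetical helper: build the SQL needed to move an existing schema to utf8mb4.
# It only returns strings; running them (e.g. with the mysql client) is up to the caller.

def list_tables_sql(schema):
    """Query showing each table's current collation (and therefore its character set)."""
    return (
        "SELECT TABLE_NAME, TABLE_COLLATION FROM information_schema.TABLES "
        f"WHERE TABLE_SCHEMA = '{schema}';"
    )

def conversion_sql(schema, tables, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """ALTER DATABASE only sets the default for new tables, so every
    existing table also gets its own CONVERT TO CHARACTER SET statement."""
    statements = [f"ALTER DATABASE `{schema}` CHARACTER SET {charset} COLLATE {collation};"]
    for table in tables:
        statements.append(
            f"ALTER TABLE `{schema}`.`{table}` "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements

if __name__ == "__main__":
    print(list_tables_sql("isfdb"))
    # Placeholder table names, for illustration only.
    for stmt in conversion_sql("isfdb", ["authors", "titles"]):
        print(stmt)
```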

MySQL Python APIs

It turns out MySQLdb has moved on. There are several revisions and forks of MySQLdb1, the legacy version intended for Python 2, and there is MySQLdb2, the newer version for Python 3. However, MySQLdb1 is outdated and unsupported, and MySQLdb2 is not ready for production use.

  • MySQLdb1 - This can be found at https://github.com/farcepest/MySQLdb1. The latest version is 1.2.4, but it is ten years old and intended for machines still using Python 2. There is an ancient, ten-year-old TODO to support Python 2.7-3.3, which my installation is already beyond. Installation method:
  • MySQLdb2 - This is a fork of MySQLdb1. It is also obsolete, and it has the same character set issues as MySQLdb1. Installation: sudo pip2 install mysqlclient. This version of MySQLdb has two relevant optional arguments when connecting, use_unicode and charset:
    • use_unicode. The proper format is use_unicode=True. This is supposed to make MySQLdb return unicode strings instead of byte strings. It doesn't seem to change any visible behavior, and the code defaults to False on Python 2 and True on Python 3.
    • charset. The proper format is charset='target_charset' (according to the MySQLdb documentation, setting charset also implies use_unicode=True). Results with the charsets tried:
      • utf8mb4 - generates an ascii encoding exception
      • utf8mb3 - generates an ascii encoding exception
      • latin1 - generates an ascii encoding exception
      • koi8r - generates ? in place of the target character.
      • koi8u - generates ? in place of the target character.
  • moist - This is a fork of MySQLdb2, refactored so that it requires at least Python 2.7 and is compatible with Python 3.x (whatever that means). It also warns that it "is not yet ready for prime-time". I haven't tried it yet.
  • MySQL Connector - MySQL has released its own module, MySQL Connector. This is clearly the correct API to use, as it is the only officially supported Python API right now. However, the current version only works on Python 3 and would require rewriting all of the ISFDB database access code. According to the compatibility chart, Connector 8.0 works with MySQL 8.0 and Python 2.7, but only in releases before 8.0.24. Initial attempts at installing 8.0.23 were unsuccessful, failing the dependency checks. The latest release can be installed with:
    • sudo pip3 install mysql-connector-python
  • Remarks on this layer: getting a Python MySQL API to work with a modern LAMP stack will likely take three steps:
    • STEP 1: Since we have the MySQLdb1 source, configure or modify it until we get a working system. This would be a Python 2.7 implementation; everything else in the LAMP stack would be up to date.
    • STEP 2: Get MySQL Connector 8.0.23 to work, and build a shim layer on top of it with the same API as MySQLdb1, which may eliminate the need to rewrite ISFDB code.
    • STEP 3: Move the ISFDB to Python 3, rewriting all required code and updating the database layer at the same time. This is a major task, but necessary to bring everything into the modern era.
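The STEP 2 shim can be sketched in a few lines. This sketch assumes, hypothetically, that the calling code uses the low-level MySQLdb (_mysql) style of db.query(sql), result = db.store_result() and result.fetch_row(), where fetch_row() returns a tuple of row tuples and an empty tuple once the rows run out. The class names ShimConnection and ShimResult are made up for illustration. The shim wraps any DB-API 2.0 connection, so in practice it would sit on top of a mysql.connector connection. The usage example uses Python's built-in sqlite3 only so the sketch runs without a MySQL server.

```python
import sqlite3


class ShimResult:
    """Mimics the fetch_row()/num_rows() behavior of a MySQLdb result object."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def fetch_row(self, maxrows=1):
        # Return up to maxrows rows as a tuple of tuples; () once exhausted.
        chunk = self._rows[self._pos:self._pos + maxrows]
        self._pos += len(chunk)
        return tuple(tuple(row) for row in chunk)

    def num_rows(self):
        return len(self._rows)


class ShimConnection:
    """Hypothetical MySQLdb-style wrapper around any DB-API 2.0 connection."""

    def __init__(self, dbapi_connection):
        self._conn = dbapi_connection
        self._last_rows = []

    def query(self, sql):
        cursor = self._conn.cursor()
        cursor.execute(sql)
        # Statements without a result set (CREATE, INSERT, ...) leave description as None.
        self._last_rows = cursor.fetchall() if cursor.description is not None else []
        cursor.close()

    def store_result(self):
        return ShimResult(self._last_rows)

    def close(self):
        self._conn.close()


if __name__ == "__main__":
    db = ShimConnection(sqlite3.connect(":memory:"))
    db.query("CREATE TABLE authors (author_id INTEGER, author_name TEXT)")
    db.query("INSERT INTO authors VALUES (1, 'Jules Verne'), (2, 'Isaac Asimov')")
    db.query("SELECT author_id, author_name FROM authors ORDER BY author_id")
    result = db.store_result()
    record = result.fetch_row()
    while record:
        print(record[0])
        record = result.fetch_row()
    db.close()
```

Buffering all rows in query() keeps the sketch simple; a production shim would probably stream rows from the cursor instead.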

UPDATE (4/6/22): I took a copy of the MySQLdb source from the server and rebuilt it on my machine. There were problems with the egg cache, but once those were resolved, running MySQLdb 1.2.2 had no effect on the problem. So the issue is related either to Python 2.7 or to MySQL 8.0.28.

Python 2.7

The default encoding in Python 2.7 is ASCII (sys.getdefaultencoding() returns 'ascii').
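This is consistent with the "ascii encoding exception" seen above for the utf8mb4, utf8mb3 and latin1 charsets. Once MySQLdb returns unicode strings, Python 2 converts them implicitly to bytes with the ASCII codec (for example, when printing to a stdout that is not a terminal, as in a CGI script), and any non-ASCII character fails. The sketch below reproduces both symptoms with explicit codec calls in Python 3, which has no implicit ASCII conversion. Treating the koi8r '?' results as equivalent to errors="replace" is an analogy: most likely MySQL itself substitutes '?' for characters the connection charset cannot represent. It is not a claim about MySQLdb internals.

```python
# Reproduce the two symptoms with explicit encoding (Python 3).
# In Python 2, implicitly converting a unicode value to bytes behaves like .encode("ascii").

AUTHOR = "Stanis\u0142aw Lem"  # "Stanislaw Lem" with an l-with-stroke, which is not ASCII


def to_ascii_strict(text):
    """What Python 2 does implicitly with its default 'ascii' encoding."""
    return text.encode("ascii")


def to_charset_with_replacement(text, codec):
    """Characters the codec cannot represent become '?', like the koi8r results."""
    return text.encode(codec, errors="replace")


if __name__ == "__main__":
    try:
        to_ascii_strict(AUTHOR)
    except UnicodeEncodeError as exc:
        print("ascii:", exc.reason)  # ordinal not in range(128)
    print("koi8_r:", to_charset_with_replacement(AUTHOR, "koi8_r"))
```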

ISFDB

Web Browser

Reading List

2021

2020

2019

2018

2017

2016

2015

2014

2012

2011

2010

2009

2008

2007

2006