User:Alvonruff/The Charset Problem
XXX
- Browser - The browser simply follows the HTML content-type indicator and the <meta> tag. This definitely affects how the text is displayed, and changing it was one of the first hacks attempted on isfdb2.
- Apache - Apache now has a configurable charset. This defaults to utf-8, based on this entry in the config file: AddDefaultCharset UTF-8
- ISFDB Scripts - Whatever is stored in the UNICODE variable in localdefs.py, which is currently ISO-8859-1 (latin1)
- Python 2.7 - The default string encoding is ASCII (sys.getdefaultencoding() returns 'ascii'), not UTF-8
- MySQLdb - ??
- MySQL - Set to latin1
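To see why it matters whether these layers agree, here is a minimal sketch of what happens when text is written under one charset and read back under another. The sample string and the helper name roundtrip are illustrative assumptions and are not taken from the ISFDB code.

```python
# Illustrative only: shows how a latin1/UTF-8 mismatch between two layers
# garbles text. Runs on Python 2.7 and Python 3.

def roundtrip(text, write_charset, read_charset):
    """Encode text with one charset, then decode the bytes with another."""
    raw = text.encode(write_charset)
    # 'replace' keeps the sketch from raising when the bytes are not valid
    # in read_charset; undecodable bytes become U+FFFD instead.
    return raw.decode(read_charset, 'replace')

sample = u'Ren\xe9'  # "Rene" with an acute accent on the final e

# Both layers agree: the text survives unchanged.
matched = roundtrip(sample, 'latin-1', 'latin-1')

# Written as UTF-8 but read as latin1: the accented letter turns into
# two unrelated characters (classic mojibake).
mojibake = roundtrip(sample, 'utf-8', 'latin-1')

# Written as latin1 but read as UTF-8: the accented letter is not valid
# UTF-8 and is replaced.
broken = roundtrip(sample, 'latin-1', 'utf-8')

for label, value in (('matched', matched), ('mojibake', mojibake), ('broken', broken)):
    print(label, repr(value))
```

If the symptoms on ISFDB2 resemble the second or third case, that may hint at which pair of layers disagrees.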
MySQLdb
The Connection() function takes two optional arguments, use_unicode and charset (these only work on MySQL 4.1 and newer).
conn = mysql.connect(host='127.0.0.1', user='user', passwd='passwd', db='db', charset='utf8', use_unicode=True)
These arguments are not used on ISFDB1, so we will leave them out on ISFDB2 as well for now.
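One way to keep both options open is to build the connection arguments in a single place, so the charset arguments can be switched on later without touching every caller. This is only a sketch: the helper name connect_args is hypothetical and the credentials are the placeholders from the call above. Only the charset and use_unicode keyword names come from MySQLdb itself.

```python
def connect_args(host, user, passwd, db, charset=None):
    """Build keyword arguments for a MySQLdb connect call.

    With charset=None the charset/use_unicode arguments are left out,
    which matches what ISFDB1 does today.
    """
    args = {'host': host, 'user': user, 'passwd': passwd, 'db': db}
    if charset is not None:
        args['charset'] = charset
        args['use_unicode'] = True
    return args

# Current behaviour: no charset arguments are passed.
legacy = connect_args('127.0.0.1', 'user', 'passwd', 'db')

# Experimental behaviour: request UTF-8 and unicode results.
experimental = connect_args('127.0.0.1', 'user', 'passwd', 'db', charset='utf8')

# With MySQLdb installed, either dict could then be passed on as:
#   import MySQLdb
#   conn = MySQLdb.connect(**legacy)
```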
MySQL
The current character set of the ISFDB MySQL database is latin1 (ISO-8859-1):
mysql> select default_character_set_name, default_collation_name from information_schema.schemata where schema_name='isfdb';
+----------------------------+------------------------+
| DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+----------------------------+------------------------+
| latin1                     | latin1_swedish_ci      |
+----------------------------+------------------------+
That said, there are other MySQL charset variables to look at. On ISFDB1, we have:
mysql> show variables like '%character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
While on ISFDB2, MySQL defaulted these variables to:
mysql> show variables like '%character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8mb3                    |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
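The differences between the two servers are easier to spot when the listings are compared mechanically. The sketch below copies the values from the two listings above into dictionaries and reports only the variables that differ. The names ISFDB1, ISFDB2 and differing are made up for this example.

```python
# Values copied from the two SHOW VARIABLES listings above.
ISFDB1 = {
    'character_set_client': 'latin1',
    'character_set_connection': 'latin1',
    'character_set_database': 'latin1',
    'character_set_filesystem': 'binary',
    'character_set_results': 'latin1',
    'character_set_server': 'latin1',
    'character_set_system': 'utf8',
    'character_sets_dir': '/usr/share/mysql/charsets/',
}

ISFDB2 = {
    'character_set_client': 'utf8mb4',
    'character_set_connection': 'utf8mb4',
    'character_set_database': 'latin1',
    'character_set_filesystem': 'binary',
    'character_set_results': 'utf8mb4',
    'character_set_server': 'utf8mb4',
    'character_set_system': 'utf8mb3',
    'character_sets_dir': '/usr/share/mysql/charsets/',
}

def differing(a, b):
    """Return {name: (a_value, b_value)} for variables whose values differ."""
    return {name: (a[name], b.get(name)) for name in a if a[name] != b.get(name)}

diff = differing(ISFDB1, ISFDB2)
for name in sorted(diff):
    print('%-26s ISFDB1=%-8s ISFDB2=%s' % (name, diff[name][0], diff[name][1]))
```

The database charset is the same on both servers; the client, connection, results, server and system variables are the ones that changed.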
These variables can be set from the mysql command-line client by issuing the following commands:
- set character_set_results = 'latin1';
- set character_set_server = 'latin1';
- set character_set_client = 'latin1';
- set character_set_connection = 'latin1';
character_set_system is a read-only variable and cannot be changed at runtime. Changing the four variables above had no observable effect on the issue.