Difference between revisions of "User/Alvonruff/2to3"

From ISFDB
Jump to navigation Jump to search
Line 44: Line 44:
 
                 records.append(list(record[0]))
 
                 records.append(list(record[0]))
 
                 record = cursor.fetchmany()
 
                 record = cursor.fetchmany()
 +
 +
This issue is the most time-consuming part of the port to Python3. It's unclear what the best procedure is for moving forward:
 +
 +
* On the one hand, our python scripts currently treat dates as though they are strings, so why not just make them strings in MySQL?
 +
* On the other hand, we have SQL statements that select on YEAR(), MONTH(), DAYOFMONTH(), DAY(), etc.. which all require the column to be of type DATE.
 +
* On the third hand, there are about 1800 unique select statements, with an unknown number that would require modification to stringify the DATE field. Some of these are not straightforward to change (looking at you SQLloadAllTitleReviews() with your four variant clauses with field renaming like 'review.title_copyright as review_date')
 +
* On the fourth hand, I haven't gotten to the editing portion of the program, so I'm not sure how well it's going to work to store an invalid date through the connector into a DATE field.
  
 
== MD5 is Not a Distinct Module Under Python3 ==
 
== MD5 is Not a Distinct Module Under Python3 ==

Revision as of 16:34, 28 April 2023

Python3 Conversion

  • Use 2to3 in a manner the same as futurize. This produces results suitable for python3.
  • Update to the official MySQL mysql.connector. There is no python3 support for MySQLdb, which is our current connector. Moving to a new connector is a prerequisite.
  • Fix the following issues:

DATE Field With Month or Day of Zero

Probably the most significant issue found so far is that the mysql.connector does not support datetime values where either the month or day are zero. As stated in the MySQL documentation:

   “Zero” date or time values used through Connector/ODBC are converted automatically to NULL because ODBC cannot handle such values.

This is a fundamental change to Python3 (See https://docs.python.org/3/library/datetime.html):

   class datetime.date(year, month, day)
       All arguments are required. Arguments must be integers, in the following ranges:
       MINYEAR <= year <= MAXYEAR
       1 <= month <= 12
       1 <= day <= number of days in the given month and year
       datetime.MINYEAR - The smallest year number allowed in a date or datetime object. MINYEAR is 1.
       datetime.MAXYEAR - The largest year number allowed in a date or datetime object. MAXYEAR is 9999.

So the following date constructions are no longer allowed in a datetime variable:

  • 0000-00-00
  • 1984-00-00
  • 1984-10-00
  • 8888-00-00
  • 9999-00-00

This means that many dates in the ISFDB are returned as NULL by mysql.connector, which is converted to None in Python, which are then displayed as 'unknown' by the ISFDB. The least invasive workaround would be to use DATE_FORMAT to convert the datetime field before it ever leaves MySQL:

    select *, DATE_FORMAT(title_copyright, '%Y-%m-%d') as title_date  from titles where title_id=92392;

This has the side effect of adding the stringified date as a new field appended to the end of the record. There would then need to be code to patch up the NULL title_copyright field:

   for record in records:
       index = len(record)-1
       record[TITLE_YEAR] = record[index]

This will make the TITLE_YEAR field a string instead of a datetime. Note that this requires the records array to be constructed with the list() operator:

       while record:
               records.append(list(record[0]))
               record = cursor.fetchmany()

This issue is the most time-consuming part of the port to Python3. It's unclear what the best procedure is for moving forward:

  • On the one hand, our python scripts currently treat dates as though they are strings, so why not just make them strings in MySQL?
  • On the other hand, we have SQL statements that select on YEAR(), MONTH(), DAYOFMONTH(), DAY(), etc.. which all require the column to be of type DATE.
  • On the third hand, there are about 1800 unique select statements, with an unknown number that would require modification to stringify the DATE field. Some of these are not straightforward to change (looking at you SQLloadAllTitleReviews() with your four variant clauses with field renaming like 'review.title_copyright as review_date')
  • On the fourth hand, I haven't gotten to the editing portion of the program, so I'm not sure how well it's going to work to store an invalid date through the connector into a DATE field.

MD5 is Not a Distinct Module Under Python3

Use of md5 needs to be replaced with hashlib:

   import hashlib
   result = hashlib.md5(something)

DateTime Variables Cannot be Subindexed

We have a lot of date manipulation code that looks like this:

   year = title[TITLE_YEAR][:4]

This generates an error under python3. The subfields need to be extracted with the appropriate method:

   year = title[TITLE_YEAR].year
   month = title[TITLE_YEAR].month
   day = title[TITLE_YEAR].day

This situation is complicated, since some fields are in datetime, and some are strings (due to the mysql.connector). Most of the effort in converting in a file to Python3 tends to be datetime issues. We will likely need to create a custom datetime class that supports existing year, month, and day values that a now invalid in Python3.

String Methods Cannot Be Accessed via 'string'

Python3 doesn't allow the following syntax:

   string.replace()

It instead should be:

   str.replace()
   str.strip()
   str.rstrip()

mysql.connector Doesn't Have a db.escape_string() Method

Since mysql.connector is DB API v2.0 compliant interface, the API automatically performs the escape without asking.

Surprise - Division Works Differently on Python3

The following code generates different results on Python2 vs Python3:

   checksum = 112
   remainder = (checksum/10)*10

On python2 it returns the integer 110, while on python3 it returns the float 112.0. Under python3 the division operator always returns a float. This obviously breaks the ISBN checksum calculator. The corrected code would be:

   checksum = 112
   remainder = int(checksum/10)*10