Difference between revisions of "User talk:Alvonruff"

Latest revision as of 22:51, 7 October 2023

MediaWiki

Hi. I am responsible for a MediaWiki-based wiki for work, and in early 2020 I did a major upgrade from 1.16 to 1.34, which included having to upgrade MySQL, PHP, and a bunch of extensions. My environment is Windows, but from a MediaWiki point of view, I don't think underlying OS makes much difference. I realize ISFDB's MediaWiki is even older than what I started with, but if I can help, answer questions, or experiment, let me know. I'm happy to try to lend a hand. --MartyD 08:46, 1 January 2021 (EST)

So I might as well start with a current accounting of everything. My notes (from May 2014) from a previous attempt to move to MediaWiki 1.22 (that I never finished) showed that we needed to do the following:

Move to SQL 5.0.2 or later. We were on 5.0.45 at that time (and we still are)
Move to PHP 5.3.2 or later. We were on 5.2.4 at that time (and we still are)

I have the following packages/add-ons laying around in my home directory at the ISFDB:

mediawiki-1.12.0rc1.tar (original Mediawiki version)
ImageMagick-6.4.1-8.i386.rpm
highlight-2.6.10
geshi-1.0.7.21.tar
SyntaxHighlight_GeSHi.class.php
SyntaxHighlight_GeSHi.i18n.php
SyntaxHighlight_GeSHi.php

The following extensions are installed in wiki/extensions:

ConfirmEdit
SyntaxHighlight_GeSHi
SVGtag.php

Moving PHP is easy, because nothing else on our system relies on it. Looks like MySQL 5.5.8 is required for the latest version of MediaWiki. I'm running 8.0.22 on my fresh install at home, and I am seeing errors on all pages with incorrect date values. I'm not exactly done with my installs, so some of these might be artifacts of some other issues, but there are some notes/queries elsewhere in our wiki about date format issues while using later versions of MySQL.

If a MySQL move is required to get to the new MediaWiki, then obviously we need to move the ISFDB along as well. So that seems like the first step to me. Alvonruff 14:15, 1 January 2021 (EST)

I have been running MySQL 5.5.17 on my development server for the last 6 years or so. I have encountered only one problem so far. A particularly tricky SQL query ran fine under 5.5.17, but it hung on the production server. I had to break up the query into two separate queries for it to work. Based on my experience with it, we should be able to upgrade the production version to 5.5.17 without running into any issues. Ahasuerus 17:03, 1 January 2021 (EST)

I'm running MySQL 5.7.29 with MediaWiki 1.34 and PHP 7.4.4 at the moment. (I have not done the 1.34 -> 1.35 update yet). I did not have any MySQL problems but did have some PHP-related issues where behaviors, packages/functions, and defaults have changed. And then MediaWiki itself needs different syntax and different packages/settings in LocalSettings.php. I recall having to do the upgrade in multiple steps (I thought I had to do something like get to 1.27 first, then from there go to 1.34), but the UPGRADE information seems to suggest being able to convert directly from 1.1x to 1.35. If you'd like, I can get my environment up to snuff for 1.35 and then try upgrading a recent dump and see what happens. DB upgrade will probably take hours. I could also see what's up with those three extensions. --MartyD 08:02, 2 January 2021 (EST)

That would be awesome, given you've already done this before. Do you need a current listing of LocalSettings.php ? Alvonruff 16:40, 2 January 2021 (EST)

I'm happy to help with the extensions and LocalSettings.php, as I have experience with those where I work (I maintain several different wikis, and have updated them multiple times now). I'd need to have admin access via command line, though, as that's how I know how to do things. :) ···日本穣 · 投稿 · Talk to Nihonjoe 12:23, 4 January 2021 (EST)

@Al: Yes, if you could send me the LocalSettings.php, that would be great. You can XXXX out the database credentials -- I will use my own -- and I don't think there is anything else sensitive in there. You might need to zip it or rename it to ".txt" to get it through any mail filtering. --MartyD 14:38, 6 January 2021 (EST)

Ongoing work on HTTPS-Support of ISFDB

I'm currently in contact with Ahasuerus who is working on conversion ISFDB the python code to configure/enable HTTPS (#1298 Support HTTPS). Which is somewhat a bigger task, I thought at the beginning, doing it right.

I'm currently running my own HTTPS-implementation of HTTPS on a local server, using newest MariaDB, MediaWiki 1.36, Apache 2.4. Upgrading ISFDB to a similar setting requires OS-Updates, including MySQL and other upgrades (as mentioned above).

I won't go into detail here, but [[User_talk:Ahasuerus|Ahasuerus] asked to jump into the current discussion. There is currently a need for the system administrator, who is responsible for the server (upgrading...) --elsbernd 06:25, 28 November 2021 (EST)

I have moved the discussion from my Talk page to Development/HTTPS. I plan to add a list of identified dependencies next. Ahasuerus 16:28, 28 November 2021 (EST)

I have created a list of dependencies and sent Al an e-mail. Ahasuerus 18:15, 28 November 2021 (EST)

2022-03-05 performance issues -- a DDOS attack?

Our current performance issues may be due to a DDOS attack -- see these findings for details. Would you happen to have any ideas? Ahasuerus 13:16, 5 March 2022 (EST)

SQLloadNextSubmission error

I am trying to recreate the SQLloadNextSubmission error that you ran into on my development server. A couple of questions to make sure that we are on the same page:

Did you create your test user using "scripts/create_user.py" as per Enable ISFDB Editing?
Did you use the latest SVN version of scripts/create_user.py?

Ahasuerus 16:16, 5 March 2022 (EST)

Yes to both. I do only have exactly 1 user on the system. The observed error is:

   Traceback (most recent call last):
     File "/usr/lib/cgi-bin/mod/submission_review.cgi", line 44, in <module>
       ApproveOrReject('%s.cgi' % submission_filer, submission_id)
     File "/usr/lib/cgi-bin/mod/common.py", line 110, in ApproveOrReject
       PrintSubmissionLinks(submission_id, reviewer_id)
     File "/usr/lib/cgi-bin/mod/common.py", line 127, in PrintSubmissionLinks
       next_sub = SQLloadNextSubmission(submission_id, reviewer_id)
     File "/usr/lib/cgi-bin/mod/SQLparsing.py", line 2139, in SQLloadNextSubmission
       db.query(query)
   ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'groups\n

I changed PrintSubmissionLinks to:

       try:
               next_sub = SQLloadNextSubmission(submission_id, reviewer_id)
       except:
               next_sub = 0

But there is another failure case when SQLloadNextSubmission actually succeeds, and there is a next_sub, but I need to figure out the steps to get there. —The preceding unsigned comment was added by Alvonruff (talk • contribs) .

According to the error message above, it's a syntax error in the following SQL query:

       query = """select * from submissions s
               where s.sub_state = 'N'
               and s.sub_holdid = 0
               and s.sub_id > %d
               and not exists (
                       select 1 from mw_user u, mw_user_groups groups
                       where s.sub_submitter != %d
                       and s.sub_submitter = u.user_id
                       and u.user_id = groups.ug_user
                       and groups.ug_group = 'sysop'
                       )
               order by s.sub_reviewed
               limit 1""" % (int(sub_id), int(reviewer_id))

Would it be possible to use Python's "print" statements to display the values of "sub_id" and "reviewer_id" before the query is executed?

submission_id = 5243522 (same as the argument to the cgi script), reviewer_id = 2

Also, when you say that you have only 1 user in the database, do you mean that you are not using the publicly available backups? Or did you truncate mw_user after installing them? Ahasuerus 18:28, 5 March 2022 (EST)

No, the mw_user table has 1,977,439 entries in it. The mw_user_groups table, which control the editing permissions was empty in the backup, so it now has two entries for me. Since I was already present in mw_user, I modified the create_user.py script to not insert me again into that table, and let it do all the password stuff, and then add the two entries into mw_user_groups (sysop and bureaucrat).

What MySQL version are you using? "GROUPS" is a reserved word as of 8.0.2 per https://dev.mysql.com/doc/refman/8.0/en/keywords.html#keywords-8-0-detailed-G ErsatzCulture 19:24, 5 March 2022 (EST)

That will do it, thanks! Let me change it to something else... Ahasuerus 19:51, 5 March 2022 (EST)

Done, although the list of recently added MySQL reserved words is so long that I wouldn't be surprised if we ran into something else during testing. Ahasuerus 20:20, 5 March 2022 (EST)

Python 2.7

I am not sure how much it may help with Linux, but here is what I have been running on my Windows 10 development server for the last few days:

Python 2.7.18 -- https://www.python.org/ftp/python/2.7.18/python-2.7.18.msi -- found here. Note that it's the 32-bit version of Python. The 64 version apparently doesn't work with MySQLdb.
MySQLdb 1.2.4 -- https://sourceforge.net/projects/mysql-python/files/mysql-python/1.2.3/MySQL-python-1.2.3.win32-py2.7.msi/download -- which is explicitly compatible with Python 2.7.18.
MySQL 5.5.17. I have been running this version of MySQL for a number of years and have come across only one backward compatibility issue. A SQL query which successfully ran under 5.5.17 hung on the live server. I had to break it up into 2 queries, which was not a big deal.

I am in the process of testing every Web page and every ISFDB process on my development server. Hopefully everything will check out. The current Python code does a number of funky things like occasionally redefining "id" and "type" and it's possible that Python 2.7 may be less tolerant of that level of abuse. We should find out soon enough. Ahasuerus 18:46, 6 April 2022 (EDT)

Python2.7 does seem to be the likely culprit (more so than MySQL itself). I'll be installing 2.5, and that will tell me where the problem lies. Alvonruff 06:04, 7 April 2022 (EDT)

I'm curious why Python 2.7 is being used for the upgraded site. Wouldn't it be better to switch to a newer version, like 3.8, or even 3.10, especially since 2.7 reached EOL in 2020? ···日本穣 · 投稿 · Talk to Nihonjoe 13:07, 9 May 2022 (EDT)

Unfortunately, Python 3.x, unlike Python 2.7.x, is not backward compatible with Python 2.5.x. Ahasuerus 13:30, 9 May 2022 (EDT)

That's true, but since so much of the site is being rewritten and redone, wouldn't it be better for the future to redo any Python used on the site to use Python 3.x? ···日本穣 · 投稿 · Talk to Nihonjoe 13:35, 9 May 2022 (EDT)

I am sure we would all like to move to Python3, however, that would entail many changes (which could be done and probably should be however these are fraught with issues and thus will take significant time to untangle). For example, the website is serving pages in Latin1 (not UTF-8) and I believe the database is storing strings that way too. We get Unicode support by allowing HTML entity coding in said strings. It is quite ugly but it works. Strings in Python2 are basically binary strings (although in late versions there is also a "unicode" string type). In Python3 all strings are Unicode (though not UTF-8; there is a PEP for the encoding someplace) but there is also a "bytes" type which is basically a binary string (as well as a mutable "bytearray" type). We would likely want to update the database to use UTF-8 strings and get the website to serve UTF-8 and get rid of all the HTML entity encodings for for non-Latin1 content but updating all the Python, JavaScript and SQL code to handle such is a nontrivial undertaking. Could we move to Python3 keeping our currently encoding mess? Maybe but I am not sure it is worthwhile. —Uzume (talk) 01:54, 7 September 2022 (EDT)

There are three pretty big issues moving to Python 3:

The primary function of the isfdb is to gather information from MySQL, and then organize, format, and print that data. Since the fundamental data type for strings is different in Python 3, and the methods for printing are different (including the syntax of those methods), all of the formatting/output code of the isfdb code would require rewrites - and that's the vast majority of the isfdb. There are some automated tools to help in such a conversion, but I haven't tried those as yet.
The current connector between MySQL and the ISFDB only works on Python 2.X. Moving to Python 3 requires moving to a new connector (most likely the connector produced by MySQL). That connector uses a different paradigm for the storage of queries, so all of SQLparsing.py would require a rewrite. I have done experiments with Python 3 and the new connector, and have written up how to convert our current SQL design patterns into the new connector requirements, but our SQL code isn't isolated to just SQLparsing.py (we have a ton of support scripts that would also need conversion).
As Uzume discussed above, the character set problem is probably the biggest issue of all, and was the single biggest issue in the modernization project this year (see User:Alvonruff/The Charset Problem). Since python and the connector both have charset changes, we would almost certainly attempt to simplify the charset problem by moving to the unicode standard (which we absolutely should do at some point), which would require converting all of the data resident in MySQL (as in we would need to write conversion tools to pull every line of data out of MySQL, convert it from the current mish-mash of Latin1+Other Stuff into unicode), and then write it back into MySQL. This part of the project comes with a super-crazy amount of risk.

On a scale of 1 to 10, I would put the difficulty of this year's modernization project at a 2 or 3, and we started that at the beginning of the year, and still haven't quite completed that project. That said most of the changes we did for the modernization project were configuration changes, and a smattering of single-line code additions, not a rewrite. I estimate the difficulty of a python 3 conversion to be more like a 7 or 8, and may take more than a year to complete. I've done some experimentation for a python 3 move, and will likely use the isfdb2 staging server for that project next year. But we were out of time due to the hosting change, which put python 3 out of scope for this year.

It's really on the Python team for breaking compatibility between 2.X and 3.X. I'm sure there were fabulous reasons for doing so, but the barrier to entry for 3.X is causing widespread delays in its adoption. For instance, in the 3D printer community, the most modern firmware available is Klipper, which is still on Python 2.X, as moving to 3.X is a massive undertaking. Most teams are making decisions on whether they want to burn their limited volunteer time on doing new features, or on trying to move to the latest version of Python. In my mind, there have been two major versions of the ISFDB: the first was the non-database version written in C, with the indices compiled for online use, and the second was the move to python/MySQL. Moving to python 3 is such a large project, that I consider that to be the next major phase of the ISFDB, as it would be mostly a rewrite of everything. Alvonruff (talk) 07:10, 7 September 2022 (EDT)

We are far from the only ones so affected (e.g., Ren'Py used to use pygame mostly for its SDL binding but they needed to move from SDL1 to SDL2; pygame moved but also moved to Python3 so Ren'Py created its own SDL2 binding for Python2 until it can be migrated, however, migrating to Python3 is complex since Python2 is exposed to users of Ren'Py necessitating a major breaking change). That said, this has been coming for a very long time and we could have been more proactive to not let ourselves get put into the position we are not in, e.g., Python has had a Unicode string type since 2.0 (introduced with PEP 100; 2.0 was released 2000-10-16). Regardless, we are here now and it is a major issue that will take significant work to fix. As Al said, we will likely have to do a major database conversion (effectively making a new Unicode database based upon the current pseudo-Unicode one). That will necessitate significant changes to the codebase. I might make more sense to consider rewriting the code. For example, make a script to convert the database and then a Python3 access code and keep the current Python2 for database changes until we can get a new submission and moderator interfaces developed, etc. (i.e., the new Python3 would likely start as just a database viewer and the database would have to get say period updates converted from the current one, etc. until a full switch over could be made.). —Uzume (talk) 15:00, 12 September 2022 (EDT)

Moving from HTML-encoded characters to UTF-8 is clearly beneficial because it will allow proper searching and other functional improvements. However, it can be done under Python 2.7. What are the benefits of moving to Python 3? Ahasuerus (talk) 18:05, 12 September 2022 (EDT)

All of my own Python code for the past 5+ (?) years has been Python 3. However, there's very little that I use that isn't available in 2.7. From the discourse I see on Twitter, the main things of note that have been added in recent years are type hints, async and pattern matching (basically like a more powerful switch statement AIUI). None of these things are particularly interesting to me - I'm sure they have their uses, but I don't see any major wins for the stuff I personally do.

AFAIK (and I don't pay much attention to this) all distros still ship with both Python 2 and 3, none has moved to just Python 3. (And generally people use virtualenvs, Docker etc for their own preferred version/libs/etc.) In the context of ISFDB moving to Python 3, I think any benefits would be (a) not getting caught out if there are security issues discovered in Python 2; (b) longer term, new Python devs are likely to start on Python 3, so might struggle with Python 2 codebases; (c) picking up any new tooling/enhancements/whatever that almost certainly won't be available for Python 2. None of these strike me as convincing arguments to justify the work necessary for a 2->3 migration, at least at this moment in time.

IMHO there are plenty of other areas that might deserve attention over a 2->3 migration, but that's a different discussion... (Which is why I'd kept out of this talk item prior to this time) ErsatzCulture (talk) 18:29, 12 September 2022 (EDT)

My concerns with staying on Python2.7 are: 1) There are issues with the Python2 unicode model, and those issues were a driving factor in the creation of Python3 (see: https://python-notes.curiousefficiency.org/en/latest/python3/questions_and_answers.html). If we undertake a large charset project, we will encounter those issues. 2) The unicode support in MySQLdb is sketchy, and the newer connectors don't work with Python2. 3) Staying with Python2 leaves us marooned on MySQLdb, which was abandoned a decade ago. If we stay on Python2, then we'll either have to take ownership of MySQLdb to fix the unicode issues we encounter, or we'll need to fork mysql.connector to support Python2 (which may not be possible).

That said, I don't think we yet have a concrete (and reasonable) proposal for how to proceed. It can't be: we rewrite everything, including all data in the database - as we'll be debugging that for years, as we don't have a formalized, automated test system. I suspect the MySQL conversion and the MySQLdb/Python3 work can be done separately, but it's currently unknown how well one would work without the other, or how that work would be verified. --Alvonruff (talk) 07:52, 13 September 2022 (EDT)

What do other Python 2.7-based projects do when they need to store Unicode data in a database? Do they use MySQLdb in spite of its "sketchy Unicode support"? Ahasuerus (talk) 10:40, 13 September 2022 (EDT)

P.S. I see that there are ways to write Python code that should work under both Python 2.7 and Python 3.4+ -- see this page. Some parts require importing from "future" and related "futurization" modules, but it looks doable. If true, it should be possible to update our code to be Python 3-compatible while we are still on Python 2.7. It wouldn't immediately address some parts of the Python 2 libraries that have been deprecated in later versions of Python 3, e.g. cgi, but it's a start. Ahasuerus (talk) 08:07, 14 September 2022 (EDT)

This is a pretty attractive approach, which would create the following substeps to Python3:

* Use 2to3 (https://docs.python.org/3/library/2to3.html) to do a quick side conversion of files from python2 to python3. We would not directly use these files, but would instead use them to diff the delta between python2.7 and python3 to create a roadmap of changes we would need to implement via futurization.

* Do a futurization of the files, ultimately minimizing the list of outliers to those that cannot be addressed with this methodology. This can be done file by file to minimize overall risk.

* Do a full conversion to python3 and move to mysql.connector, but do not modify the database. It's a TBD to determine if this step is feasible, but tests show we can use python3 and mysql.connector and extract the same strings from the database that we currently see in python2.7. It's a question of whether they can be formatted for output by python3 to a browser correctly (which I don't know the answer as yet).

* Technically, the python3 conversion would be done at that point, leaving unicode as a different next big project. --Alvonruff (talk) 08:33, 14 September 2022 (EDT)

(unindent) I just did a quick run of both 2to3 (which does an automated attempt at porting python2.7 to python 3) and futurize on all files in our common directory (22 files). futurize was probably built on top of 2to3, as they take exactly the same arguments. Diffing the output of the two files generates a pretty small diff, mostly around the handling of iterators, which requires the futurized version to import object from builtins, which is then fed into every class method. Observations:

The output from 2to3 fails to execute under python3 due to "inconsistent use of tabs and spaces in indentation", so we'll need to post-process the files to meet the isfdb indentation standard (using something like PyCharm). The output from futurize seems to run fine under Python2.7. So an additional step for moving to Python3 is to fix the indentation issues.
new imports from futurize that would need to be removed to run under python3 were:
- from builtins import chr
- from builtins import map
- from builtins import object
- from builtins import range
- from builtins import str
- from __future__ import print_function
- from future import standard_library
We would need to remove all object references before a final move to Python3.

This seems pretty promising. We still have a need to run the current code base on isfdb2.org for now, so I'll explore the possibilities of having both futurized and non-futurized versions running at the same time (with different paths).

I futurized everything in biblio and common, and have it online at: https://www.isfdb2.org/cgi-bin/p3/index.cgi. Observations:

Futurize adds the __future__ imports above the pound bang line (#!_PYTHONLOC), which causes a runtime error. So some hand editing is required.
In general, the apps run, so long as there are no specific unicode encoding problem areas present (Hungarian, British Pound pricing, TradeMark symbol, etc). Examples:

—The preceding unsigned comment was added by Alvonruff (talk • contribs) .

I agree that it's an attractive approach. Some of the changes outlined above, e.g. spaces vs. tabs, are just cleanup which will be beneficial even if we don't move to Python 3 in the foreseeable future.

It's also good that we identified the issue with non-Latin-1 characters early in the process. If it turns out to be a show-stoppers, we'll know about it sooner rather than later. Ahasuerus (talk) 20:21, 14 September 2022 (EDT)

I agree. In everything I have read python-future is a bit dated but perhaps the best and most up-to-date method to create a Python2 and Python3 compatible code base and its futurize is a good method for doing a brute force conversion (which should always have manual oversight as it is not perfect) from Python2 to Python3. Since we are discussing major Python code changes, I wanted to bring up the code structure too. Currently we host this cold with a large number of assorted CGI scripts. I am not suggesting we totally abandon CGI hosting but currently this is particularly messy and forces people to continue hosting it such. I have proposed in the past moving all the code to WSGI and hosting under a single CGI interface allowing it to also be hosted in other ways (e.g., FastCGI, etc.). A single entry point means having to part CGI PATH_INFO to obtain path information past the script component but this is very commons in most server-side web apps today. Perhaps the bigger issue in moving to such is the permanence of inbound URLs, where we might want to setup some redirection or the like (but many do not need to be sustained like /edit/ and /mod/, etc.). On the up side this would also mean the installation would be simpler as the code could be installed completely as Python libraries with only the single CGI endpoint (and any redirecting compatible endpoints) needing to be installed in CGI space (so in most situations we could be rid of #!_PYTHONLOC). —Uzume (talk) 17:15, 17 September 2022 (EDT)

One thing to note is that most CGI-specific processing -- paths, script names, query values, validation, unescaping, etc -- has been moved to one central location over the last year. One temperamental part of CGI that hasn't been centralized yet is cgi.FieldStorage. And then there is the fact that the cgi module was deprecated in Python 3.11. Ahasuerus (talk) 17:56, 17 September 2022 (EDT)

CC license: 2.0 vs. 4.0

Earlier today User:Nihonjoe mentioned that Creative Commons 4.0 is the latest -- and almost identical -- version of CC 2.0, which we currently use. I know little about licenses, but it seems to be a correct assessment. Would you happen to be aware of any reasons why we shouldn't upgrade to 4.0? Ahasuerus 20:17, 18 April 2022 (EDT)

I have no issue with Attribution 4.0 International (CC BY 4.0). Alvonruff 06:37, 19 April 2022 (EDT)

Thanks! Ahasuerus 10:42, 19 April 2022 (EDT)

Searching

I don't know whose lap this would fall into (I will point Ahasuerus to this), but with regard to this Wiki search behavior, I think that may mean that maintenance/rebuildtextindex.php ought to be (re)run. IIRC, the normal background search index rebuilding only works off of the recent changes list. The move might have had some other oddball effect on the existing index that is causing it to not align correctly somehow. --MartyD (talk) 14:39, 14 September 2022 (EDT)

I'll leave it in Al's capable hands :-) Ahasuerus (talk) 15:07, 14 September 2022 (EDT)

Well before I embark on an update that "may take several hours" (according to the docs), I guess I'd like to know exactly what issue we're trying to solve, and how we know it the update had any effect. I'm not seeing any search issues, and I'm not using quotes. For example, if I search for the words normal background search (without quotes) from MartyD's post above, I see this page in the results. So is there a test search that fails, that we know should not fail? --Alvonruff (talk) 16:21, 14 September 2022 (EDT)

So just to be clear on what I'm seeing, let's take the Lord of the Rings example cited above:

If I type the words Lord of the Rings into the search bar, without quotes, I see a single hit for the page "Rules and standards discussion", which does not actually contain the phrase "Lord of the Rings".
If I then click on "Everything" (just below the search bar), I then see 5 hits, the two most important being "User talk:Animebill" and "User talk:Mavmaramis", both of which do contain the phrase "Lord of the Rings".
If I click on "Advanced" it reveals why this is the case. The default search namespace is "(Main)" and none of the other namespaces are checked, so searches will not look on the "User talk" namespace pages, which is why the phrase on both the "User talk:Animebill" and "User talk:Mavmaramis" pages fail to show up as hits.
If I select the "User talk" namespace, and then click on "Remember selection for future searches", and then search again, I see 4 hits. It doesn't display all 5 hits, because the fifth hit is "ISFDB:Moderator noticeboard", where the issue was reported, and that page is in the "ISFDB" namespace - which I did not enable.
If I go to "Advanced" and click the "All" button on the right (under "Check") as well as the box for "Remember selection for future searches", then a search shows all five hits. --Alvonruff (talk) 17:34, 14 September 2022 (EDT)

P.S. It now shows 6 hits, since I said the words on this page :) --Alvonruff (talk) 17:38, 14 September 2022 (EDT)

If I type "lord of the rings" with no quotes into the top search box, I get 6 hits, none showing "lord of the rings". The search results page shows context for me is Advanced, with Main + User + User talk + ISFDB + ISFDB talk + File + Help + Help talk + Magazine. Re-pushing the Search button there produces the same results. If I type "lord of the ring" with quotes, in either location, and all other things the same, it produces 5 hits, all of which show that phrase. I do notice the five pages found are a subset of the 6 pages found in the original hit, so perhaps what's displayed is just an artifact of how the snippet is selected; I did not notice that before. BUT.... Taking the original complaint that there should be many hits at face value, I also notice that all of the hits are on recently modified pages, and none is on a page not recently modified. Which makes me think (Male Answer Syndrome, I admit) that the background index rebuild is working on the recently changed pages, and so searching is finding those, but older pages need to be re-indexed. --MartyD (talk) 12:46, 16 September 2022 (EDT)

I think that Marty is right - the problem is not new pages - these get indexed just fine. The problem are old pages after the DB change. One page I always look for via search is Entering non-genre periodicals. The current search cannot find it in any way or form (it does find a few pages where it was referenced and which had been updated lately but not the main page). I just went via google search to pull it up but I suspect that the lack of an index may have made all our help pages not findable and this is not optimal... Annie (talk) 13:44, 16 September 2022 (EDT)

Ok. maintenance/rebuildtextindex.php has completed. Let's see if it rectifies the situation. --Alvonruff (talk) 14:51, 16 September 2022 (EDT)

Search for "Entering non-genre periodicals" (with or without quotes) shows the Help page (and its Talk page) now, including populating the "Page title matches" area which was empty before so we seem to be back to normal. A few random other help pages also pop up as soon as you look for them (which was not happening earlier). "lord of the rings" with no quotes pulls 174 results now. Marty will probably want to recheck but I think we are all sorted now. Thanks! Annie (talk) 15:00, 16 September 2022 (EDT)

That did it! Now "lord of the rings" finds 164 hits without quotes and 147 hits with quotes, many pages last edited years ago. Thanks! --MartyD (talk) 17:06, 16 September 2022 (EDT)

Adding Mediawiki extension TemplateStyles

Will you install the TemplateStyles extension? It will allow for individual templates to have specified mini stylesheets. Thanks! ···日本穣 · 投稿 · Talk to Nihonjoe 15:24, 27 December 2022 (EST)

Including the TemplateStyles extension causes a fatal wiki error across all pages. I'll need to debug on the other server. Alvonruff (talk) 08:48, 28 December 2022 (EST)

What's the specific error? I've had good luck using the Support desk over on the Mediawiki site. ···日本穣 · 投稿 · Talk to Nihonjoe 12:20, 28 December 2022 (EST)

Saturn's Children

https://www.isfdb.org/cgi-bin/pl.cgi?279672; https://archive.org/details/isbn_9780739499344; Archive.org copy uploaded way back in 2012, info fits your PV but the wrong cover (with the right artist) is shown on the ISFDB page, somebody named Aardvark or something like that entered the same edition recently but left out/botched some stuff and didn't PV, so if you want to add the Archive link to yours, replace the cover, and then maybe ask Mr. Aardvark if they think theirs is still needed or could be deleted. --Username (talk) 09:57, 30 December 2022 (EST)

This is a book I had at the time, but was sent away during one of the many book purges, so I can't go to the source any longer. That said, the image is correct, and is consistent with the other Ace editions, which list the artist as Joe Williamsen. So I suspect it requires an artist change, not an image change. Alvonruff (talk) 20:52, 30 December 2022 (EST)

Taff

https://www.isfdb.org/cgi-bin/ea.cgi?15027; Re: your recent entry of contents for Orphans of Bliss, either Taff's name has the periods or it doesn't, in which case a variant would need creating. --Username (talk) 19:08, 2 March 2023 (EST)

Python3 and Unicode thoughts

A couple of thoughts about User:Alvonruff#Python3_Notes, specifically "2. Update all character sets":

A number of Python modules currently reference "iso-8859-1" explicitly. In some cases it's apparently redundant and can be removed at any time, but a few modules like common/isfdb.py will need a manual review.
The software uses a number of HTML-encoded Unicode strings for certain UI elements and currency signs. They were recently consolidated in "class _Ui" and "class _Currency" in common/isfdb.py. There are a few other functions and classes that also use HTML-encoded Unicode string, e.g. "def ISFDBUnicodeTranslation" in common/library.py and a bunch of functions in the cleanup reports directory. Some can be consolidated ahead of time.

Also, my next big project is migrating Fixer to Python/MySQL and re-hosting it on a server accessible to moderators (which may or may not be the live ISFDB server depending on the likely impact on performance.) I have already migrated the Amazon API and some other supporting modules, but most of the heavy lifting still needs to be done. I should presumably go back and make sure that the previously migrated modules work under Python 3. None of them use MySQL, so they can't be a test case for the MySQL conversion. Ahasuerus (talk) 12:35, 27 April 2023 (EDT)

I haven't thought strongly about the character set problem as yet. I just happened to notice a number of display errors when I had index.cgi, ea.cgi, and pl.cgi all running under python3.

I think we can discuss hosting fixer on isfdb2, especially since all the login/moderator data is already there. It's just not stable right now, with python3 experiments, but there's no reason it couldn't have its own set of unaffected web pages.

I'm currently converting existing files for print(), MySQL, and tabs, and ensuring that they still run on python2. If you want access to the new SQLparsing.py let me know, and I'll drop a copy someplace on isfdb2. It runs fine under python3 (after installing mysql.connector, and making a 1-line change to the file) Alvonruff (talk) 15:21, 27 April 2023 (EDT)

Chances are that I won't be able to start working on migrating Fixer's data to MySQL until some time in June at the earliest, so there is no rush. I need to fix Bug 491 first and then I need to merge Fixer's "978 ISBNs" data store with the "B0 ASINs" data store. Designing a new workflow for Fixer won't start until after I am done with that.

As an aside, making Fixer's data available to moderators will change a lot of things since it will mean going from a single user (me) to multiple concurrent users. For starters, the Amazon API can only handle so many multi-ASIN requests per minute, so we will need a single-threaded background process with a built-in throttle mechanism.

Also, the current Fixer design is based on the idea that an ISBN/ASIN can be assigned to one (and only one) "queue". Fixer's other tables and algorithms -- high/low priority publishers, high/low priority authors, etc -- support this design. When multiple moderators start working on Fixer's data, they will concentrate on different areas as we have seen happen in the past. Using multiple servers (since creating lists of new ISBNs/ASINs will be done on a third server) will add more wrinkles. Etc.

Not all of this functionality needs to be implemented right off the bat, but we will want to make sure that the new design doesn't paint us into a corner. Ahasuerus (talk) 16:34, 27 April 2023 (EDT)

mysql.connector

A couple of questions/thoughts re: User:Alvonruff/mysql.connector:

mysql.connector 8.0.24 doesn't play nice with Python 2.7, but versions 8.0.23 and lower do, at least according to MySLQ.com. I wonder if 8.0.23 may be worth a try since it would presumably let us upgrade the ISFDB software in small increments while remaining on Python 2.7.
The MySQLConnection.sql_mode Property section of the "MySQL Connector/Python Developer Guide" has a list of connector properties that can be set. The list includes 'NO_ZERO_IN_DATE' and 'NO_ZERO_DATE', which is what MySQL uses to support "00" month/day values and "0000-00-00" values respectively. Have you tried experimenting with them?

Ahasuerus (talk) 16:16, 30 April 2023 (EDT)

I experimented with the Python3 datetime class (not related to MySQL at all), and it no longer supports the concept of a month=0 or day=0, and throws an exception. Suggestions on the Internet are to use year-01-01, which doesn't work for us. So I currently believe we need to extract the dates as a string (possibly into a custom datetime class). Nonetheless, it feels like something is being missed, as this is a pretty big fundamental issue. There are ways of creating a custom converter for the connector, which I'm also looking at.

Using 8.0.23 may be interesting. It may also pull is into the charset issues (which I haven't started looking at yet). Alvonruff (talk) 06:19, 1 May 2023 (EDT)

Okay, I am onto a different trail. If you initiate the connector with raw=True, you get back a bytearray for each field, so fields of type DATE are returned, since you are bypassing the converter in the connector. You then cast with str() using the target charset (I am oversimplifying, as you need to worry about NULL and integer fields). I have something hobbling, but need to unwind the other changes. Alvonruff (talk) 22:25, 1 May 2023 (EDT)

What will the "raw mode" do to time, float and enum values? Also, if I understand the idea correctly, titles like "1984", authors like 171 and series names like "1999" will be converted to int, right? Ahasuerus (talk) 23:08, 2 May 2023 (EDT)

[Edit:] Another thought. As you said in your notes, ideally we would address this issue at the query level. Something like "SELECT CAST(title_copyright AS CHAR)" or "DATE_FORMAT(title_copyright, '%Y-%m-%d')" instead of "SELECT title_copyright". The most time-consuming problem with this approach is that we have a lot of "SELECT *" queries, where "*" would need to be changed to a full list of fields for this approach to work.

One possible workaround would be to create globally scoped variables or attributes (in common/isfdb.py) with lists of field names for each ISFDB table. Date field names would be defined to include "CAST" or "DATE_FORMAT". Once that's done, we should be able to modify all queries (mostly in common/SQLParsing.py) which start with "select * from table_name" to use "select %s from table_name" % LIST_OF_FIELD_NAMES_FOR_THIS_TABLE instead. Ahasuerus (talk) 23:36, 2 May 2023 (EDT)

[Edit #2] Another advantage of this workaround would be the ability to update all queries to use the newly defined globally scoped variables/attributes ahead of time, before migrating to the new connector. The connector migration patch would only need to change the lists of table-specific field names in common.isfdb.py to include "CAST" or "DATE_FORMAT" for date fields. Ahasuerus (talk) 07:39, 3 May 2023 (EDT)

The problem with both CAST and DATE_FORMAT is that they only work well if you are selecting just the date field. IF you issue the following SQL command:

   select *, cast(title_copyright as char) from titles where title_id=10604;

MySQL returns:

  +----------+-------- [...]
   | title_id | title_title                    | title_translator | title_synopsis | note_id | series_id | title_seriesnum | title_copyright | title_storylen | title_ttype | title_wikipedia | title_views | title_parent | title_rating | title_annualviews | title_ctl | title_language | title_seriesnum_2 | title_non_genre | title_graphic | title_nvz | title_jvn | title_content | cast(title_copyright as char) |
   +----------+------- [...]
   |    10604 | Doctor Who and the Giant Robot | NULL             |           NULL |  151121 |      5277 |              28 | 1975-03-13      | NULL           | NOVEL       | NULL            |        1316 |            0 |         NULL |              1223 |         0 |             17 | NULL              | No              | No            | Yes       | Yes       | NULL          | 1975-03-13                    |
   +----------+------- [...]

The date field is not modified in place, the string version is appended to the end of the record. DATE_FORMAT does the same. So there must be some fixup code as well, that takes the appended field at (titles[len(record)-1]) and copies it into the original field at (titles[TITLE_YEAR]), which will be a datetime from Python's perspective. This was the first strategy that I took, but examining each query to figure out where the FORMAT statement goes, as well as the placement of the fixup code was taking a considerable amount of time, and there are roughly 1400 unique query statements to go modify. So I was looking for something a little less invasive and error prone, which was raw mode.

The attribute field idea is interesting, however the fetch_row/fetchmany routines don't have any idea what record type they are reading - they just extract the next blob of available data. And not all queries extract whole records. For instance:

   query = "select metadata_counter from metadata"

So we would need to store a custom conversion mapping before making the query call. For instance:

   CNX.FORMAT("INT")
   query = "select metadata_counter from metadata"
   CNX.QUERY(query)

Or the more complicated:

   CNX.FORMAT("INT, STR, STR, INT, INT, INT, INT, CHAR, CHAR, INT, CHAR, INT, INT, FLOAT, INT, INT, INT, CHAR, INT, INT, INT, INT, CHAR")
   query = "select * from titles where title_id=10604;
   CNX.QUERY(query)

That can certainly be done, and it would be preferable to the current brute force method, but it is encroaching on custom converter territory. At least a custom converter has direct access to the MySQL field types, so we wouldn't have to specify them. I will look into that for round #3 on the problem. Alvonruff (talk) 10:00, 3 May 2023 (EDT)

Rereading my last response, I see that I may not have been clear. Let me try to expand a bit. I'll use SQLloadAwards in common/SQLparsing.py as an example. Currently the query statement reads:

   query = "select * from awards where award_id='%d'" % (award_id)

I was considering adding the following globally scoped variable (or property if we put these new values in a class) to common/isfdb.py:

   QUERY_AWARD = "award_id, award_title, award_author, award_year, award_ttype, award_atype, award_level, award_movie, award_type_id, award_cat_id, award_note_id"

We could then change the query statement in SQLloadAwards to:

   query = "select %s from awards where award_id='%d'" % (QUERY_AWARD, award_id)

Functionally, there should be no change to how the function works. Once we are ready to upgrade to the new connector, we will change the value of QUERY_AWARD to:

   QUERY_AWARD = "award_id, award_title, award_author, DATE_FORMAT(award_year, '%Y-%m-%d'), award_ttype, award_atype, award_level, award_movie, award_type_id, award_cat_id, award_note_id"

Similar global variables could be created for "titles", "authors", "pubs" and any other tables that include "date" columns. Switching from the current connector to the new one will then require a simple change to the QUERY_* variables/properties in common/isfdb.py.

Granted, it will only handle "*" queries. The rest will have to be checked and updated manually, but at least they can be found via grep since they are not hidden behind asterisks.

P.S. Another thought. The Python/MySQL converter documentation page mentions two possibly relevant argument names: "converter_class" and "converter_str_fallback". The first is described as "Converter class to use" and the second one as a False/true flag to "Enable the conversion to str of value types not supported by the Connector/Python converter class or by a custom converter class." Apparently the second one was added in 8.0.27, so it requires Python 3, but it could be the solution that we need. Ahasuerus (talk) 16:58, 3 May 2023 (EDT)

OK. Calling out each field works:

   mysql> select award_id, award_title, award_author, DATE_FORMAT(award_year, '%Y-%m-%d'), award_ttype, award_atype, award_level, award_movie, award_type_id, award_cat_id, award_note_id from awards where award_id=1234;
   +----------+---------------------+--------------+-------------------------------------+-------------+-------------+-------------+-------------+---------------+--------------+---------------+
   | award_id | award_title         | award_author | DATE_FORMAT(award_year, '%Y-%m-%d') | award_ttype | award_atype | award_level | award_movie | award_type_id | award_cat_id | award_note_id |
   +----------+---------------------+--------------+-------------------------------------+-------------+-------------+-------------+-------------+---------------+--------------+---------------+
   |     1234 | Another Short Story | Bruce Boston | 2000-00-00                          | Ar          | NULL        | 7           | NULL        |             3 |           21 |          NULL |
   +----------+---------------------+--------------+-------------------------------------+-------------+-------------+-------------+-------------+---------------+--------------+---------------+

I am good with that. Alvonruff (talk) 21:12, 3 May 2023 (EDT)

Update: I have backed out the previous raw mode code, and have a chunk of biblio up and running at isfdb2.org with the new proposal. The only addition is that we need a different set of variable names for when we have multi-table queries, and start using labels like t.* instead of *. Fortunately, most of the queries use the same format, so I have the following definitions so far:

   CNX_AUTHORS_STAR = "author_id, author_canonical, author_legalname, author_birthplace, DATE_FORMAT(author_birthdate, '%Y-%m-%d'), DATE_FORMAT(author_deathdate, '%Y-%m-%d'), note_id, author_wikipedia, author_views, author_imdb, author_marque, author_image, author_annualviews, author_lastname, author_language, author_note"
   CNX_ADOT_AUTHORS_STAR = "a.author_id, a.author_canonical, a.author_legalname, a.author_birthplace, DATE_FORMAT(a.author_birthdate, '%Y-%m-%d'), DATE_FORMAT(a.author_deathdate, '%Y-%m-%d'), a.note_id, a.author_wikipedia, a.author_views, a.author_imdb, a.author_marque, a.author_image, a.author_annualviews, a.author_lastname, a.author_language, a.author_note"
   CNX_TITLES_STAR = "title_id, title_title, title_translator, title_synopsis, note_id, series_id, title_seriesnum, DATE_FORMAT(title_copyright, '%Y-%m-%d'), title_storylen, title_ttype, title_wikipedia, title_views, title_parent, title_rating, title_annualviews, title_ctl, title_language, title_seriesnum_2, title_non_genre, title_graphic, title_nvz, title_jvn, title_content"
   CNX_TDOT_TITLES_STAR = "t.title_id, t.title_title, t.title_translator, t.title_synopsis, t.note_id, t.series_id, t.title_seriesnum, DATE_FORMAT(t.title_copyright, '%Y-%m-%d'), t.title_storylen, t.title_ttype, t.title_wikipedia, t.title_views, t.title_parent, t.title_rating, t.title_annualviews, t.title_ctl, t.title_language, t.title_seriesnum_2, t.title_non_genre, t.title_graphic, t.title_nvz, t.title_jvn, t.title_content"
   CNX_PUBS_STAR = "pub_id, pub_title, pub_tag, DATE_FORMAT(pub_year, '%Y-%m-%d'), publisher_id, pub_pages, pub_ptype, pub_ctype, pub_isbn, pub_frontimage, pub_price, note_id, pub_series_id, pub_series_num, pub_catalog"
   CNX_PDOT_PUBS_STAR = "p.pub_id, p.pub_title, p.pub_tag, DATE_FORMAT(p.pub_year, '%Y-%m-%d'), p.publisher_id, p.pub_pages, p.pub_ptype, p.pub_ctype, p.pub_isbn, p.pub_frontimage, p.pub_price, p.note_id, p.pub_series_id, p.pub_series_num, p.pub_catalog"

Alvonruff (talk) 14:19, 5 May 2023 (EDT)

(unindent) Looks promising! (Also, after sleeping on it I realized that leveraging converter_str_fallback would be a bad idea because it would return some date values as "dates" and some as "strings".)

One related consideration that comes to mind is that I have been moving globally scoped constants to the SESSION object/class and its sub-classes, which are defined in common/isfdb.py. For example, all supported currency signs are currently accessed via SESSION.currency.[currency], all supported publication types are found in SESSION.db.pub_types, all non-trivial UI elements are found in SESSION.ui, etc. It prevents namespace pollution as we add more globally scoped constants.

If we use the same approach for CNX constants, they would be accessed as SESSION.cnx.authors_star, SESSION.cnx.titles_star, etc. Ahasuerus (talk) 17:20, 5 May 2023 (EDT)

Amazon UK vs. Amazon.com

A quick note re: Amazon.com as the source of information about UK publications. I am looking at this submission which you created on 2023-04-10 and which used 2023-05-23 as the publication date. As it turns out, 2023-05-23 comes from Amazon.com while Amazon UK states that the publication date was 2023-02-16. Unfortunately, this happens quite often: some UK publishers make their books available in the US weeks or months after their appearance in the UK and vice versa. It's even worse with Australian books, which commonly appear on Amazon.com and Amazon UK months or even years after they were made available in Australia. And, of course, the Amazon store of the country of publication is much more likely to have the right price value for recent edition, although occasionally Amazon lists an Amazon-specific discounted price without mentioning the list price.

The good news is that things are getting better. Some publishers are meticulous about making their books appear on Amazon.com and Amazon UK at the same time. The bad news is that it's nowhere close to 100% (yet.) The Amazon store of the country of publication is still a much more reliable source than other Amazon stores (publisher sites tend to be even better, but they are not as convenient.) It's also the main reason why Fixer and human editors state the source of their data in the Note field -- it helps determine where our data originally came from if we find discrepancies at a later point.

Hope this makes sense! Ahasuerus (talk) 16:07, 23 May 2023 (EDT)

A few days ago, I went to retroactively add the ebook for this title, and was surprised to see an entry already in the database, even though my tools had highlighted that the ISBN and ASIN were not known. I also noticed the date issue on the physical, but frankly there were several other issues with these that would probably have caused a moderator comment if they'd come from a regular editor:

* (Both pubs): price should ideally be in £ given that the publisher is UK based. I see the hc pub has just been fixed, but the ebook still has this field empty.

* (Ebook): Page count should not be entered for an ebook, unless it's in a paginated format such as PDF

* (Ebook): No cover image, even though the entry was sourced from Amazon post publication, so would 99.9999% certainly have had an image

* (Ebook): No ISBN or ASIN. Whilst these can be a PITA - e.g. no easy way AFAIK to get an ISBN from Amazon, ASINs may differ between different Amazons - they're obviously super useful for any number of reason

* (Ebook): No date on the note to indicate when the data came from Amazon.com. Less of an issue when something is being entered post-publication, but helpful for trying to understand nightmares like this

I'll fix the ebook shortly, but I dunno if there are likely to be any other similar pubs? ErsatzCulture (talk) 16:38, 23 May 2023 (EDT)

Sorry guys. I was restricting myself to award data, but during the entry of the Aurealis 2022 found a lot of missing data, and did wind up putting in a fair amount of Australian titles so I could get the awards in. This particular title was not nominated, so I'm not sure what attracted me to that particular one. That said, the editing tools absolutely would not accept the Amazon image URL for that particular title, and it still has no ISBN or ASIN, so I'll be interested to see how its resolved. The only other publications entered were related to the Aurealis awards. I'm going to be in Python3 land for the foreseeable future, so you don't have to worry about any more entries. Alvonruff (talk) 20:43, 24 May 2023 (EDT)

ErsatzCulture seems to have updated the book itself so now it has its ASIN and ISBN (and I just added the US-side ASIN as well). Not unusual for the Australian records to be missing - Fixer cannot get them until they show up on the US side and most of them don't or show up a lot later. So unless someone makes it a point to track them down, we usually get them when they get nominated or reissued US or UK side.

About the covers - I wonder if you were trying to grab the wrong link from Amazon. In the last year or so, Amazon changed their UI a bit (again) - the main picture in some books (more and more of them) is linked under a very weird address and not necessarily stable address and there is a smaller thumbprint just under the image which actually links cleanly to the proper address that keeps the picture - the main picture is a link to that one. As a rule, if the thumbprints are under the main cover, the main image is usually the weird address these days. Plus it needs to be cleaned from the formatting. If that is not it, if you remember what you were trying, we may be able to see what may have gone wrong.

PS: Glad to see you back around in any capacity. :) Annie (talk) 12:04, 25 May 2023 (EDT)

Banners

I created a banner a couple of years ago for the Japanese SF Magazine, but was told my attempt at writing Data Base in Japanese was wrong. They offered to do better but never got back to me. I have the banner .jpg and constituent elements if you want to take a crack.

I've also collected a few covers for Fleuve Noir, and cut out some elements from a half-dozen or so (.xcf format). It's a back burner project, but yet another attempt at internationalization. ../Doug H (talk) 12:11, 28 May 2023 (EDT)

Fixer, Windows 10 and isfdb2.org

A quick update re: Fixer, Python 3, MySQL, etc.

Microsoft expects to stop security patches for Windows 10 in 2025 and the current version of Fixer requires Windows 10. With all of the recently requested ISFDB features implemented, I plan to concentrate on migrating Fixer to Linux, Python and MySQL as the next big project. I hope to have it done long before the cut-off date, but you never know what you may run into.

Since you are working on upgrading the core ISFDB software to Python 3, I plan to use Python 3 for the Fixer rewrite. Could you please provide a copy of the new, Python 3-compatible, code that you are using on isfdb2.org so that I could use it for reference purpose as I design the new Fixer software and/or reset my isfdb2.or password? I tried to log onto isfdb2.org, but my password doesn't appear to be working any more. TIA! Ahasuerus (talk) 10:05, 2 August 2023 (EDT)

I have recovered my isfdb2.org password; looking at the Python 3 code now :-) Ahasuerus (talk) 17:07, 2 August 2023 (EDT)

I got Python 3 reading data from the core ISFDB tables under Cygwin. Next I'll create a separate MySQL database for Fixer and see how well it performs with Unicode tables. Ahasuerus (talk) 18:31, 2 August 2023 (EDT)

Since you have it working you probably don't need any pointers, but I'm working out of /home/alvonruff/Python3. The build system needs to be primed with a 'make python3' on a new system (which touches a .pythonver file). There's also a relatively recent copy on isfdb.org in /home/avonruff/Python3 (just in case). Alvonruff (talk) 22:04, 2 August 2023 (EDT)

Thanks for the tip! Ahasuerus (talk) 22:14, 2 August 2023 (EDT)

By the way, there is a file in the mod directory call fixup.py. To do a first pass at porting a file you can:

* Turn on the do_tabs variable, and set do_str and do_mysqldb to zero. Then run against the target file. It will produce a file called OUT, with all tabs converted to spaces.

* Run futurize -f print <file>. This will convert all the calls to print. Only.

* Turn off the do_tabs variable, and turn on the others. This will port all the SQL calls (mostly). You'll need to remove all instances of REMOVE_THIS_LINE, and then check if the emitted DB_FETCHMANY's should be DB_FETCHONE's. But otherwise it saves a lot of time. Alvonruff (talk) 22:49, 2 August 2023 (EDT)

Grabbed it, thanks!

So my current (still tentative) plan is to convert Fixer in stages:

Move Fixer's data from "data stores" to MySQL tables. At that point the Fixer software will be talking to MySQL using ODBC.
Rewrite the current version of the Fixer software in Python 3, which will eliminate Windows dependencies and make it possible for other developers to maintain/upgrade Fixer's code if something happens to me. The UI will still be "roll-and-scroll" at that point.
Upgrade Fixer's Amazon interface to support multiple concurrent users, which will require a single-threaded request queue due to Amazon throttling.
Upgrade the UI from "roll-and-scroll" to regular HTML.
Integrate the new HTML interface with the ISFDB code; leverage ISFDB/Wiki account authentication.
Deploy the new Fixer on isfdb2.org and make it available to moderators.

Based on the above, I plan to use the wrapper class (MYSQL_CONNECTOR) that you developed for ISFDB during stages 2-5. The difference is that:

I will be using Python 3 from the get-go, and
Fixer is already using Unicode

so the connection settings will be a bit different, but it shouldn't be a big deal.

Re: fixup.py and the Python 3 upgrade project, how far along would you say it is? It sounds like the heavy lifting has been done and it's just a matter of testing everything on isfdb2.org, right? Ahasuerus (talk) 08:27, 3 August 2023 (EDT)

The per-script status can be found here. /biblio is complete and tested on both python3 and python2. /edit is complete and tested on python3 and about 75% tested on python2. /mod is about 75% complete. /rest will go rapidly. I haven't cracked open /nightly or /scripts yet. Progress is slow lately, as I've been in physical therapy to regain the full use of my hands, so my typing time is somewhat limited. Nonetheless, after completing the rough port, we can begin checking in changes that do not require python3 (tabs/spaces, print(), the db interface changes), which is why we need to ensure that the changes work on both python2 and python3. After that, the total number of changes are small, so we should be able to integrate, then switch over to python3. We'll have the ability to switch back to python2 if we encounter some issue that requires some time to fix.

That said, I don't plan on checking in the whole mess all at once. But there's nothing stopping us from starting to integrate some of the changes. For instance the, tabs/spaces project can start now (which is a pretty good project itself, given that some of the indentation is very mixed and ambiguous to an automated tool, and it would be good to make that change in isolation). Alvonruff (talk) 07:04, 4 August 2023 (EDT)

Thanks for the update! Sorry to hear about the issues with your hands. On the plus side physical therapy can be very effective these days. Due to all the typing that I do, I have had some issues with my hands and shoulders over the last 10 years, but have been able to recover every time. Knock on wood.

Re: /nightly , most of it is pretty straightforward:

There are 330+ cleanup reports, but most of them use "standardReport" to talk to MySQL, so a single change should take care of the vast majority of reports.
Report 288 uses slightly different SQL syntax depending on whether you are running MySQL 5.x or MySQL 8.x; we can check the former on my development server, which is running 5.5 on Windows/Cygwin.
The "monthly" job, which looks for potential author duplicates, is currently disabled on the live server for performance reasons. The algorithm that I used doesn't scale well: what took hours to do when we had 50,000 authors takes days now that we have 250,000+ authors. There are ways to test it under Python 3, of course, e.g. by deleting all but 10,000 author records, but the algorithm really needs to be replaced before we re-enable the monthly job.
The code that runs nightly reconciliation with SFE uses urllib2, so it will need to be tweaked.
The code that rebuilds the list of front page pubs uses SQLparsing.py.
The code that updates database statistics uses 3 dozen custom SQL queries, which will need to be updated.

Re: "starting to integrate some of the changes" and making the tab/spaces change a separate sub-project, I'll be happy to assist any way I can. Just let me know how you want to structure it. Ahasuerus (talk) 12:08, 4 August 2023 (EDT)

Fixer's Amazon API and Python 3

Fixer's implementation of the Amazon API (built on top of Amazon's reference implementation) has been upgraded to work with both 2.7 and 3.9. Lessons learned:

At least under Cygwin, Python 3 uses a different default encoding for roll-and-scroll interfaces. I needed to run:
- sys.stdout.reconfigure(encoding='iso-8859-1')
when Fixer invoked Python 3 via a system call and redirected output to a file. Note that "reconfigure" was implemented in Python 3.7 and may not work in earlier versions. Everything may have worked out of the box if Fixer was using "file write" commands instead of simple "print" commands, but I don't plan to change it because the whole thing will be rewritten in native Python soon-ish.
Python 3 changed the way Python compares strings and integers. In Python 2, "1" > 1 was "True" because any "string" value was greater than any "int" value. In Python 3 it generates a TypeError. It's a good thing, but it threw me for a loop for a bit.

Ahasuerus (talk) 14:24, 7 August 2023 (EDT)

"mw_objectcache" in the ISFDB Wiki - abnormal growth

We have been struggling with an odd disk space issue for the last few weeks. Here is what I have found so far.

The ISFDB Wiki uses the MySQL table mw_objectcache as its caching mechanism. Apparently the choice of MediaWiki's caching mechanism is controlled by the value of $wgMainCacheType in /var/www/html/wiki/LocalSettings.php . We currently have it set to "CACHE_ANYTHING", which, according to MediaWiki documentation, translates to "CACHE_DB", i.e. "mw_objectcache", on our system:

CACHE_ANYTHING – Use $wgMessageCacheType or $wgParserCacheType if they are set to something other than CACHE_NONE or CACHE_ANYTHING. Otherwise use CACHE_DB.
CACHE_DB – Use the database table objectcache.

A few weeks ago I noticed that we were running low on disk space. Running:

sudo ls -lah /var/lib/mysql/isfdb|grep G

identified the culprit:

-rw-r----- 1 mysql mysql  2.2G Aug 28 13:47 mw_objectcache.MYD

Since I truncated mw_objectcache just a couple of days ago, it means that it is abnormally large and growing rapidly.

Running:

select DATE(exptime),count(*) from mw_objectcache group by DATE(exptime)

produced the following results [last run a couple of hours ago]:

+---------------+----------+
| DATE(exptime) | count(*) |
+---------------+----------+
| 2023-08-28	|	47160 |
| 2023-08-29	|	79942 |
| 2023-08-30	|  	115 |
| 2023-08-31	|  	110 |
| 2023-09-01	|  	102 |
| 2023-09-02	| 	2434 |
| 2023-09-03	|   	167777 |
| 2023-09-04	|	73630 |
| 2023-09-23	|   	10 |
| 2023-09-24	|   	49 |
| 2023-09-25	|  	670 |
| 2023-09-26	|	17734 |
| 2023-09-27	|	12892 |
| 2024-08-25	|   	87 |
| 2024-08-26	|   	97 |
| 2024-08-27	|   	99 |
| 2038-01-19	|   	58 |
+---------------+----------+

It looks like some Wiki processes which update mw_objectcache set the expiration time to 24 hours, while other processes set it to 1 week, 1 month or even 1 year. ("2038-01-19" is a known issue with JavaScript and doesn't happen often.)

I have checked the date/time stamp on LocalSettings.php and it hasn't been changed since December 2022, so it shouldn't be causing this issue. Would you happen to be aware of any other Wiki-related changes which may have happened over the last month?

If we can't identify and fix the problem then I guess we'll need to modify the daily backup script to TRUNCATE mw_objectcache right before the backups kick in. Ahasuerus (talk) 14:00, 28 August 2023 (EDT)

I don't think here has been any Wiki-related changes since the move. There have been a handful of similar complaints about mw_objectcache over the years elsewhere, but didn't find anything recent or very useful in terms of the root cause. Running the query on isfdb2 shows similar results:

mysql> select DATE(exptime),count(*) from mw_objectcache group by DATE(exptime);
+---------------+----------+
| DATE(exptime) | count(*) |
+---------------+----------+
| 2023-08-30    |     2465 |
| 2023-08-31    |     3821 |
| 2023-09-01    |       84 |
| 2023-09-02    |     2468 |
| 2023-09-03    |   107963 |
| 2023-09-04    |     1966 |
| 2023-09-05    |     4590 |
| 2023-09-06    |     2207 |
| 2023-09-23    |       14 |
| 2023-09-24    |       32 |
| 2023-09-25    |      637 |
| 2023-09-26    |    11788 |
| 2023-09-27    |      376 |
| 2023-09-28    |      837 |
| 2023-09-29    |      419 |
| 2024-08-25    |       89 |
| 2024-08-26    |       54 |
| 2024-08-27    |        1 |
| 2038-01-19    |       34 |
+---------------+----------+

The cache is also large there:

    -rw-r----- 1 mysql mysql 1139418628 Aug 30 10:11 mw_objectcache.MYD

Alvonruff (talk) 10:13, 30 August 2023 (EDT)

Thanks for checking isfdb2.org; I didn't think of that.

For now, I have modified the backup script to use "--ignore-table=mw_objectcache", which should take care of the immediate issue with the daily backup files getting large and not fitting on a single DVD disc. (I also back them up online, on a USB stick, on an SSD drive and on an external hard drive, but non-magnetic media that can't be easily erased is another line of defense.)

Curiously, checking the disk space, I see that mw_objectcache is only 77MB as of 5 minutes ago. I truncated it yesterday night and it should be much larger 18 hours later based on what we have seen over the last few weeks. I wonder if the changing growth pattern may be due to various Web spiders accessing different subsets of the MediaWiki Wiki pages which get cached with different expiration dates based on whatever algorithms the PHP code is using.

Oh well, something to keep an eye on going forward. Thanks for checking! Ahasuerus (talk) 13:12, 30 August 2023 (EDT)

DDOS protection

Yesterday (2023-09-11) the live site was targeted by malicious robots. They hammered the site for about an hour, then stopped. They returned this afternoon (2023-09-12) at 12:30pm.

Here are the specifics as far as I could tell:

A three-digit number of concurrent processes. It wasn't enough to make the site unavailable, but it slowed everything down to a crawl.
Originating IPs from multiple countries, including multiple Chinese provinces.
Random access to random Web pages, both on the Wiki side and on the database side.
All types of pages were accessed on the database side -- display, add, delete, clone, vote, unmerge, etc.

Coming on the heels of the LibraryThing and related DDOS attacks over the Labor Day weekend -- which included demands for ransom money -- this is disconcerting. We are not as big as LibraryThing, but the number of estimated monthly accesses to our site is within an order of magnitude. If a hobbyist site like LibraryThing can be attacked, other hobbyist sites may be in danger as well.

It's possible that the attackers were just poorly configured Web crawlers, but I would bet that they were fishing for vulnerabilities to DDOS and/or SQL injection attacks. Apparently you can buy a DDOS attack cheaply these days, but it still costs money, so the attackers needed to have a business plan as opposed to doing it on a whim. I suppose it's also conceivable that the attackers were state-sponsored actors who were looking for vulnerable sites to add to the list of sites to take down if and when the sponsoring state decides to pull the trigger, but the fact that they returned today makes it less likely. Not that it really makes a difference from our perspective.

Based on the above, I think it makes sense to look into possible countermeasures. The fact that we are small fish in a big ocean may no longer provide us with the same kind of protection that it did in the past. Cloudfare is apparently the leading provider in this area and their introductory plans are not too expensive, but the devil is in the details. Perhaps other companies may be a better match for us. I know little about this area, so I am just mentioning what I have seen so far. Ahasuerus (talk) 13:27, 12 September 2023 (EDT)

Death's Head

You are PV1 for this title: https://www.isfdb.org/cgi-bin/pl.cgi?97321 I just submitted the following edit: Submitting two external IDs (LCCN & Goodreads), three webpage reviews, & added material to the pub note. Cheers. Mike (talk) 22:51, 7 October 2023 (EDT)