logo

PanLex: Maintenance

Periodic maintenance

The “ubuntu” user performs periodic maintenance on the database with the following “crontab” file. The “cron” utility is used for this purpose because PostgreSQL does not provide a method for periodically repeated execution of functions.

MAILTO = ''

#
# Jobs are spaced apart and executed mostly at night in Pacific Time Zone
#
# DATABASE PERFORMANCE TUNING
#
# Vacuum and analyze all databases every Sunday, supplementing programmatic operations.
4 7 * * 0 sudo -u postgres vacuumdb -az
#
# DATABASE CONTENT CORRECTION
#
# Delete all PanLex orphan expressions every day.
34 7 * * * psql -U pool -c 'select * from exrm ()' plx
# Repopulate tables pl0 and pl1 for PanLem twice a day.
54 7,19 * * * psql -U pool -c 'select * from pl ()' plx
# Repopulate table exap every 4 hours if it has recently become obsolete.
4 */4 * * * psql -U pool -c 'select * from exapw (243, 3)' plx
# Repopulate table lvap every 4 hours if it has recently become obsolete.
34 */4 * * * psql -U pool -c 'select * from lvapw (243, 3)' plx
# Add data to the PanLex history table at the beginning of every month.
4 9 1 * * psql -U pool -c 'select * from molvs ()' plx
#
# PANLINX CONTENT CORRECTION
#
# Regenerate the group and subgroup tables of PanLinx every Sunday.
4 9 * * 0 psql -U pool -c 'select * from plxl ()' plx
#
# ARCHIVE CREATION
#
# Create an archive of large PanLex tables every day for on-site cacheing, replacing the previous archive. Takes 11 minutes.
34 9 * * * sudo -u postgres pg_dump -a -t dn -t ex -t mn -t lv -t ap -Z 9 -f /home/ubuntu/pgsql/temp/plxcachedump.gz plx
# Cache large PanLex indices every day on-site.
54 9 * * * psql -U pool -c 'select * from ixck ()' plx
# Create an archive of the entire database cluster every day for on-site security,
# replacing the archive created a week earlier. Takes 13 minutes.
04 11 * * * sudo -u postgres pg_dumpall | bzip2 > /home/ubuntu/pgsql/temp/dumpall`date +\%w`.bz2
# Archive the nonregeneratable PanLex tables at the beginning of every month for compact
# off-site security. Takes 13 minutes.
24 11 1 * * sudo -u postgres pg_dump -n dev -n hist -n public -n sec -n util -T dfid -T dmid -T dnid -T exap -T exid -T log -T mdid -T mnid -T pl0 -T pl1 -T td -T tdg -T wcid -Z 9 -f /home/ubuntu/pgsql/export/plxmain-`date +\%F`.sql.gz plx
# Archive and truncate the log of table changes every Saturday for compact off-site security.
44 11 * * 6 sudo -u postgres psql -c 'select * from util.logrot ()' plx

Performance tuning

The weekly vacuum-and-analyze operation assures that the query planner’s statistics catch up to changes that have taken place in tables.

Content correction

It is possible for expressions in the “ex” table to become orphaned, namely to become unused in the “dn”, “lv”, and “dm” tables. Such expressions are considered useless. Once a day, they are discovered and deleted by the “exrm ()” function.

The PanLem interface makes use of tables “pl0” and “pl1” in order to accelerate localization. These tables are regenerated once a day by the “pl ()” function, rather than upon each change in a table on which they depend, for the sake of database performance.

The security archive “plxsecdump.gz” is deleted every week rather than rotated, on the assumption that another project host each week will download a copy of it soon after it is created and preserve it according to the project’s security policy.

The command that repopulates the PanLinx tables does not perform all of the periodic maintenance that PanLinx requires. After the tables are repopulated, a daemon, “plxl.pl”, on the webserver that runs the PanLinx interface uses the “tdg” table to recreate the PanLinx start page.

Valid XHTML 1.1!