The PanLex database begin in 2005 as the TransGraph database at the University of Washington Turing Center, managed as a Microsoft SQL Server database. In 2006 Utilika Foundation began the PanLex branch, converting the SQL Server database to a PostgreSQL database. The host for the PostgreSQL server was a Red Hat Enterprise Linux (RHEL) server housed in the office of the foundation. Utilika Foundation contributed it to The Long Now Foundation in 2012, and the server remained in operation, running under RHEL 5 by 02012.
This page documents the installation, configuration, and maintenance of PostgreSQL and of the PanLex database on a local server.
The version of PostgreSQL that is currently hosting the database is 9.0.1. This version is more recent than the one provided with Red Hat Enterprise Linux 5.
We built and installed it from source files. Two other installation modes, using a pre-built binary package in the PostgreSQL Yum repository and using Postgres Plus published by Enterprise DB, were deemed too poorly documented.
We installed PostgreSQL in the standard directory, /usr/local/pgsql. We used the following commands, which elect optional Perl support and specify a particular version of Perl (likewise more recent than the RHEL 5 version):
cd /var/local/packages/distributions/postgresql-9.0.1 pg_dumpall -U postgres -f dumpall20101213 gmake distclean configure --with-perl PERL='/opt/perl/bin/perl' gmake world (Output ends with “PostgreSQL, contrib, and documentation successfully made. Ready to install.) sudo -u postgres gmake check (Output ends with “All 122 tests passed.”) gmake install-world (Output ends with “PostgreSQL, contrib, and documentation installation complete.”) gmake clean
The documentation in section 15.6 of the PostgreSQL documentation on shared-library optimization advises executing the command “/sbin/ldconfig /usr/local/pgsql/lib”. The alternative that we have implemented is to put “/usr/local/pgsql/lib” into a file named /etc/ld.so.conf.d/postgresql-9.0.1.conf, and then to execute “ldconfig”.
Environment variables are defined in /etc/profile:
PGDATA=/usr/local/pgsql/data PGPORT=5432 PATH=/usr/local/pgsql/bin:$PATH:/var/local/utils MANPATH=/opt/perl/man:/opt/python2.6/share/man:/usr/local/pgsql/share/man:/usr/kerberos/man:/usr/local/share/man:/usr/share/man/en:/usr/share/man:/usr/share/locale/man
The server’s main purpose is to host the PanLex database, which is not locale-specific. Locales other than the generic POSIX locale have a “performance impact” according to the PostgreSQL documentation on cluster creation. Therefore, the database cluster’s locale is set to POSIX. Several locale categories can be overridden, but we do not override any of them. For example, we leave the LC_COLLATE category as POSIX, partly because any other string-sort order imposes “a performance penalty” according to the PostgreSQL documentation on “initdb”.
The database cluster’s default encoding, if not set, is derived automatically from the cluster’s locale by the “initdb” program. The encoding derived from the POSIX locale is SQL_ASCII. However, PostgreSQL documentation advises (in section 22.2) against using SQL_ASCII as the encoding of a database containing non-ASCII data. It doesn’t assert that SQL_ASCII offers any performance advantage. Accordingly, starting with PostgreSQL version 9.0.1 we specify UTF8 as the default encoding of the database cluster when running “initdb”, making UTF8 the default encoding of any new database. The cluster-creation procedure was:
su postgres /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data -E UTF8 --locale=POSIX
The path to “initdb” was specified because an older version of PostgreSQL still existed, with its version of initdb, in a different directory.
The default encoding before PostgreSQL 9.0.1 was SQL_ASCII, and this was also the encoding of all databases in the cluster. Our upgrade to PostgreSQL 9.0.1 required the conversion of all of the databases from SQL_ASCII to UTF8 encoding. We performed this conversion by executing pg_dumpall (in its latest version) to create a database-recreation script for all databases in the (previous) cluster, and then editing the script to change the encoding and the client encoding of each database to UTF8.
As documented in section 17.4 ("Linux Memory Overcommit") of the PostgreSQL documentation, we encountered out-of-memory crashes in PostgreSQL until we modified the /etc/sysctl.conf file to set the vm.overcommit_memory parameter's value to 2.
In 2010 we discovered that both the old and the new versions of PostgreSQL crashed with out-of memory errors when executing some index-creation commands, if both versions were running simultaneously. Investigation by PostgreSQL Experts produced the advice to increase the swap space from 2 GB to 8 GB. To accomplish this, we created a 6GB swap file. Nonetheless, we found both versions crashing when we next tried to install a later version (9.1.5) in parallel with 9.0.1.
Per section 17.3 of the PostgreSQL documentation, we (as root) copied contrib/start-scripts/linux from the source files into /etc/rc.d/init.d/postgresql-901, changed its permissions with “chmod 755 /etc/rc.d/init.d/postgresql-901”, and executed “chkconfig --add postgresql-901”, in order to cause the host to launch PostgreSQL automatically on each reboot. To stop the previous version of PostgreSQL from launching automatically, we removed its script from the init.d directory and the symbolic links to it from the runlevel directories where they existed.
To launch a new version of PostgreSQL while the old version is still running, we executed the shell command “PGPORT=nnnn pppp/bin/pg_ctl start -D dddd/data”, where “nnnn” was the number of the new version’s port, “pppp” was the path to the new version’s “bin” directory, and “dddd” was the path to the new version’s “data” directory.
We have attempted to upgrade to a new version of PostgreSQL by installing it in parallel with the current version and running them both until we have adequately tested the new version. This strategy has caused installation failures and post-installation crashes, as well as administrative difficulties. A reasonable conclusion is that this strategy should be avoided.
The parallel strategy is illustrated by the following description of our attempted upgrade from version 9.0.1 to version 9.1.5 in August 02012. The installation strategy is use of the PostgreSQL Yum Repository. It is documented by PGDG. If it fails, we resort to installation from source files.
sudo yum install pgdg-redhat91-9.1-5.noarch.rpm
.sudo yum install postgresql91-server postgresql91-contrib postgresql91-docs postgresql91-plperl
. The files are installed in /var/lib/pgsql/9.1, in /usr/pgsql-9.1, and (with a version-specific name) in /etc/rc.d/init.d. Thus, they don’t overwrite the currently running version./usr/pgsql-9.1/bin/initdb -D /var/lib/pgsql/9.1/data -E UTF8 --locale=C > /var/lib/pgsql/9.1/pgstartup.log
. (The documentation advises using the command service postgresql-9.1 initdb -E UTF8 --no-locale
for this, but it ignores the “no-locale” switch.)cp -p postgresql.conf postgresql-std.conf
. Then replace the original’s content with the cluster’s custom dual-operation main configuration, shown below.cp -p pg_hba.conf pg_hba-std.conf
. Then replace the original’s content with the cluster’s custom access configuration, shown below.cp -p pg_ident.conf pg_ident-std.conf
. Then replace the original’s content with the cluster’s custom name-mapping configuration, shown below.chkconfig postgresql-9.1 on
.mv /etc/ld.so.conf.d/postgresql-9.0.1.conf /etc/ld.so.conf.d/postgresql-temp-9.0.1.conf
; (b) ldconfig
; (c) mv /etc/ld.so.conf.d/postgresql-temp-9.0.1.conf /etc/ld.so.conf.d/postgresql-9.0.1.conf
. Without these, the upgrade fails 2 steps below, when the attempt to launch PostgreSQL 9.1 succeeds in starting the PostgrSQL processes, but also elicits the error message “/usr/pgsql-9.1/bin/pg_ctl: symbol lookup error: /usr/pgsql-9.1/bin/pg_ctl: undefined symbol: PQping”.sudo -u postgres /usr/pgsql-9.1/bin/pg_ctl start -w -D /var/lib/pgsql/9.1/data -l /var/lib/pgsql/9.1/start.log
. (That log file is created, showing the time, but has no content.)makewhatis -w
to index the man pages on the MANPATH. Edit /etc/httpd/conf/httpd.conf to include Alias /pg91doc "/usr/share/doc/postgresql91-docs-9.1.5/html"
. (If you have a client browser bookmark for this documentation, edit it to point to http://panlex.org/pg91doc
.)cp -p opt/www/main/cgi-bin/plxu.cgi opt/www/main/cgi-bin/plxu29.cgi
. Edit its definitions of $plvb to '9' and $act to '/cgi-bin/plxu29.cgi'. Edit the SRImp subroutine in that script to define the PostgreSQL listening port as 54321. Create the new version’s library of input and output routines with the command cp -Rp opt/www/main/local/panlex/u/28 opt/www/main/local/panlex/u/29
. Change the ownership of the directory and its files with the command chown -R apache:wheel opt/www/main/local/panlex/u/29
.su postgres
and /usr/pgsql-9.1/bin/pg_dumpall -p 5432 | psql -d postgres -p 54321
.If the new version of PostgreSQL has different directories and a different port from the old version, other files must be edited. These may include:
The custom configuration of the database cluster in which PanLex resides is contained in three files. The configuration shown below is that for PostgreSQL version 9.0.1. The configuration for version 9.1.5 is identical, except that the value of the “port” parameter is 54321. The release notes for version 9.1 apparently require or recommend no configuration changes.
There are 2 versions of “postgresql.conf”. One is for solo operation (only 1 version of PostgreSQL running, and the other is for dual operation (2 versions running in parallel).
# See postgresql-std.conf for embedded documentation and other options. # This version is tuned for 32 GB of RAM. authentication_timeout = 15s # Per J Berkus, 2009-10-13 lecture. autovacuum = off # Per J Berkus, 2010-12-02 message. # When autovacuum was on, other autovacuum settings were: # autovacuum_analyze_scale_factor = 0.05 # autovacuum_naptime = 10800 # autovacuum_vacuum_scale_factor = 0.2 checkpoint_segments = 32 # 16 per J Berkus, 2009-10-13 lecture. # Increased because of occasional log warnings of checkpoint intervals about 15 seconds. constraint_exclusion = off # Per J Berkus, 2009-10-13 lecture. cpu_index_tuple_cost = 0.001 # Per J Berkus, 2009-10-13 lecture. cpu_operator_cost = 0.0005 # Per J Berkus, 2009-10-13 lecture. datestyle = 'iso, ymd' effective_cache_size = 24GB # effective_cache_size to be 0.667 of RAM # J Berkus, 2009-10-13 lecture, recommends 0.75. escape_string_warning = 0 lc_messages = 'POSIX' lc_monetary = 'POSIX' lc_numeric = 'POSIX' lc_time = 'POSIX' listen_addresses = 'localhost,70.36.157.216' log_destination = 'syslog' log_error_verbosity = default log_min_duration_statement = 10000 log_min_error_statement = warning log_connections = off log_line_prefix = '%h ' max_connections = 30 max_prepared_transactions = 30 max_stack_depth = 8192 port = 5432 # statement_timeout = 30min # Per J Berkus, 2009-10-13 lecture. # Apparently caused function execution to abort. synchronous_commit = off # Per J Berkus, 2009-10-13 lecture. syslog_facility = 'LOCAL0' syslog_ident = 'pg' track_functions = pl # Per J Berkus, 2009-10-13 lecture. vacuum_cost_delay = 0 # VCD = 1000ms slowed vacuuming enormously. # J Berkus, 2009-10-13 lecture, recommends VCD less than 100ms. wal_buffers = 8MB # Per J Berkus, 2009-10-13 lecture. ############################################################### # SOLO OPERATION ############################################################### maintenance_work_mem = 2GB # maintenance_work_mem to be 0.0625 of RAM # temporarily doubled to diagnose out-of-memory failures shared_buffers = 8GB # shared_buffers to be 0.25 of RAM work_mem = 512MB
Same as above, except replace the section starting “SOLO OPERATION” with this section:
############################################################### # DUAL OPERATION ############################################################### maintenance_work_mem = 1GB # maintenance_work_mem to be 0.0625 of RAM # normally 2GB; divided by 2 for dual-PostgreSQL operation # can be temporarily doubled to diagnose out-of-memory failures shared_buffers = 4GB # shared_buffers to be 0.25 of RAM # normally 8GB; divided by 2 for dual-PostgreSQL operation work_mem = 256MB # normally 512MB; divided by 2 for dual-PostgreSQL operation
# See pg_hba-std.conf for documentation. # TYPE DATABASE USER CIDR-ADDRESS METHOD # "local" is for Unix domain socket connections only # <CUSTOM> local all all trust local all postgres ident map=postgres local all apache ident map=apache local all pool ident map=pool # </CUSTOM> local all all md5 # IPv4 local connections: # <CUSTOM> host all all 127.0.0.1/32 trust host all postgres 127.0.0.1/32 ident map=postgres # </CUSTOM> host all all 127.0.0.1/32 md5 # IPv6 local connections: host all all ::1/128 md5 # <CUSTOM> # IPv4 remote connections: host all all 0.0.0.0/0 md5 # </CUSTOM>
# See pg_ident-std.conf for documentation. # MAPNAME SYSTEM-USERNAME PG-USERNAME # <CUSTOM> apache apache apache apache pool apache apache root apache postgres postgres postgres postgres pool postgres postgres root postgres pool apache pool pool pool pool # </CUSTOM>
Our automatic periodic duplication process makes copies of our active databases weekly. They are saved in “/var/local/archives/dbms”, then combined and compressed into a file in “/var/local/archives/general”. We copy the compressed archives to our workstation and periodically onto removal storage drives for off-site deposit.
We can specially archive any particular database named “dbname” to a file named “fname” by executing the command “pg_dump -f fname dbname” as the user “postgres”.
To restore any database from its archived file, we must (at least during the restoration) make the database’s owner a superuser in the database cluster. That user makes the “/var/local/archives/dbms” directory current, then deletes the database with the “dropdb” command. The user then executes the “createdb -T template0” command, appending the database name as the last argument. The user then executes the “psql” command, giving the name of that database as the command argument, e.g. “psql ac”. If we are restoring the database to a fresh database cluster, we then assure that any users with privileges on any objects in the database already exist in the cluster; for this purpose, the user issues the applicable statements. For all databases, they are “create role pool login createdb” and “create role smc login”. For “panlex” and “ac”, they also include “create role apache login”. For “cal”, they also include “create role webcalng login”. Finally, the user, still in psql, issues the “\i” command, giving the file name as the argument, e.g. “\i ac_dump”. (The dump file contains a create-language command that only a superuser is permitted to execute.) The confirmation messages after this command should contain no warnings and no errors. This procedure is the only one that seems to work, among those described in the PostgreSQL documentation.
The current version’s PostgreSQL documentation is accessible via an HTML client at "http://utilika.org/pgdoc". We have provided for this access in "/etc/httpd/conf/httpd.conf". If the directory containing the documentation changes, and we amend httpd.conf accordingly (in 2 places where the string "postgresql" appears), after which we execute "apachectl restart".