This document describes a procedure for creating an AWS instance on which a copy of the PanLex database (the official copy or a testing copy) can reside.
The following actions must precede the creation of the instance. If you have already created another database instance, you don’t need to perform these again. If you have created only non-database instances, perform only the actions that you haven’t yet performed.
An instance can have local storage and one or several EBS volumes. We experimented with two-volume instances, with a standard root EBS volume and a separate database EBS volume. Managing these was somewhat complex and led to configuration errors. They potentially offer the benefit of easy database migration among instance types, but we have been unable to demonstrate that ease, so we have chosen to create a single-volume instance instead.
Using the AWS EC2 Management Console, navigate to “Instances” and create, configure, and launch a new EC2 instance, using the Classic Wizard interface. Choose the selected Ubuntu server AMI, an instance type, “us-west-2a” Availability Zone, Termination Protection, root volume, a volume size sufficient for the operating system, the database, and temporary archives (in early 02013 we found 70 GiB sufficient), nondeletion on termination, “DB Server” as the value of the “Name” tag, any other desired tag, the desired key pair, and the “db” security group.
Navigate to “Elastic IPs” and associate a (public) IP address with the instance that you have just launched. If you are reusing an IP address that is already allocated to you, this step is complete. If, however, you are getting a new IP address allocated, this step requires further actions, as follows:
Host host.domain.tld n.n.n.n domain.tld n.n.n.n
IdentityFile ~/.ssh/keypairname.pem
Replace the items in the “Host” line with the host names and public IP addresses of the AWS server instances that you want to use the selected key pair to connect to. For example, if the database server will be “db.panlex.net” and the other servers will be accessed via “panlex.net”, use those host names and their public IP addresses. Replace “keypairname” with the name of the selected key pair, without the “.pem” extension.
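As an illustration, a Host block of this kind could be appended to the SSH client configuration from the shell. This is a minimal sketch, assuming that the configuration lives in ~/.ssh/config (the usual per-user location for OpenSSH). The function name add_host_block, the addresses 203.0.113.10 and 203.0.113.20 (reserved documentation addresses), and the key pair name are placeholders, not values from the PanLex setup.

```shell
# Append a Host block to an SSH client configuration file.
# All names and values here are placeholders; substitute your own.
#   $1  configuration file (assumed: ~/.ssh/config)
#   $2  space-separated host names and public IP addresses
#   $3  key pair name, without the ".pem" extension
add_host_block() {
    config_file="$1"
    hosts="$2"
    keypair="$3"
    mkdir -p "$(dirname "$config_file")" || return 1
    # The tilde is written literally; ssh expands it when it reads the file.
    printf 'Host %s\n    IdentityFile ~/.ssh/%s.pem\n' "$hosts" "$keypair" >> "$config_file" || return 1
    # Keep the configuration private to the current user.
    chmod 600 "$config_file"
}

# Example call (hypothetical addresses):
# add_host_block ~/.ssh/config "db.panlex.net 203.0.113.10 panlex.net 203.0.113.20" keypairname
```

Because the function appends, calling it twice with the same arguments produces a duplicate block, so it is best run once per key pair.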
Connect to the server with SSH from a terminal (command-line) client on a local workstation, as the default administrative user of Ubuntu EC2 instances, namely “ubuntu”, with the command ssh ubuntu@host.domain.tld (e.g., ssh ubuntu@db.panlex.org).
On the first connection, note and follow any advice to update the server’s packages. Even if no such advice appears, or all packages seem up-to-date, execute the commands sudo apt-get update and sudo apt-get upgrade.
Populate the file-location database, which the locate command searches, with the command sudo updatedb.
Change the default locale to POSIX by editing, as the root user, the /etc/default/locale file to make the value of LANG POSIX. If you don’t, the database cluster will permanently have en_US.UTF-8 as its locale (LC_CTYPE and LC_COLLATE), and collation will be partly more intuitive but partly bizarre.
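The edit can also be scripted. The following is a minimal sketch, assuming that the file holds one NAME=value assignment per line; the function name set_default_lang and its two parameters are hypothetical, introduced only for illustration.

```shell
# Set LANG in a locale file, replacing an existing LANG line or adding one.
#   $1  locale file (the procedure edits /etc/default/locale)
#   $2  new value (the procedure uses POSIX)
set_default_lang() {
    locale_file="$1"
    lang_value="$2"
    if grep -q '^LANG=' "$locale_file" 2>/dev/null; then
        tmp_file=$(mktemp) || return 1
        sed "s/^LANG=.*/LANG=$lang_value/" "$locale_file" > "$tmp_file" || return 1
        # Copy the contents back so that the file keeps its owner and mode.
        cat "$tmp_file" > "$locale_file" || return 1
        rm -f "$tmp_file"
    else
        printf 'LANG=%s\n' "$lang_value" >> "$locale_file"
    fi
}

# Example call, to be made as the root user:
# set_default_lang /etc/default/locale POSIX
```

Other assignments in the file, such as LANGUAGE, are left untouched, because the pattern matches only lines that begin with “LANG=”.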
You now have a running EC2 GNU/Linux server that is web-accessible.
There are 2 main PostgreSQL installation strategies:
We have experimented with both strategies, and they have different advantages. Our current strategy is to conform to the Debian standard.
If the PostgreSQL version that is automatically available with the current version of the OS is not the latest, it is possible to expend extra effort and install the latest version, either from an alternative (PPA) repository or from the source. Currently we choose to install PostgreSQL with the automatically provided repositories. In early 02013, this implied installing PostgreSQL 9.1 rather than 9.2.
While still connected to the instance with SSH as the “ubuntu” user, install the required PostgreSQL packages with the command sudo apt-get --fix-missing install postgresql postgresql-plperl. Check the output for any error messages and, if any appear, for advice on how to cure the errors.
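One way to make that check easier is to save the output and filter it. The sketch below assumes that apt-get marks its own errors and warnings with the line prefixes “E:” and “W:”. Problems reported by other tools during installation would not be caught, so it complements reading the output rather than replacing it. The function name apt_problems and the log file name install.log are hypothetical.

```shell
# Print the lines of a saved apt-get log that apt marks as errors (E:) or warnings (W:).
# Returns non-zero when no such line is found.
apt_problems() {
    grep -E '^(E|W): ' "$1"
}

# Example use on the server:
# sudo apt-get --fix-missing install postgresql postgresql-plperl 2>&1 | tee install.log
# apt_problems install.log || echo "No E: or W: lines found"
```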
The “apt-get” output reports that these packages are being installed:
This command:
ps aux | egrep postg
While still connected to the instance as the “ubuntu” user, make that user’s home directory the current directory with the command cd and create directories for files that you will create and edit, related to the database, as follows.
For the main directory, use the commands mkdir pgsql, sudo chgrp postgres pgsql, and chmod 770 pgsql.
For temporary files, use the commands mkdir pgsql/temp, sudo chgrp postgres pgsql/temp, and chmod 770 pgsql/temp.
Within the pgsql/temp directory, create a file to collect error reports, with the commands touch pgsql/temp/errors.txt, sudo chgrp postgres pgsql/temp/errors.txt, and chmod 660 pgsql/temp/errors.txt.
For durable custom files, use the commands mkdir pgsql/custom, sudo chgrp postgres pgsql/custom, and chmod 770 pgsql/custom.
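The directory steps above can be collected into one function. This is a sketch rather than part of the original procedure: the function name make_pgsql_dirs and its two parameters are hypothetical, it uses mkdir -p so that it can be rerun safely, and it falls back to sudo only when a plain chgrp is refused (as it would be for a user who is not a member of the target group).

```shell
# Create the working directories and the error-report file described above.
#   $1  base directory (the procedure uses pgsql in the home directory)
#   $2  group that should own the files (the procedure uses postgres)
make_pgsql_dirs() {
    base="$1"
    group="$2"
    mkdir -p "$base/temp" "$base/custom" || return 1
    touch "$base/temp/errors.txt" || return 1
    # Try without privileges first; use sudo if the change is not permitted.
    chgrp "$group" "$base" "$base/temp" "$base/custom" "$base/temp/errors.txt" 2>/dev/null ||
        sudo chgrp "$group" "$base" "$base/temp" "$base/custom" "$base/temp/errors.txt" ||
        return 1
    # Directories: full access for owner and group, none for others.
    chmod 770 "$base" "$base/temp" "$base/custom" || return 1
    # Error-report file: read and write for owner and group.
    chmod 660 "$base/temp/errors.txt"
}

# Example call, matching the commands above:
# make_pgsql_dirs "$HOME/pgsql" postgres
```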