PanLex: User interfaces

Introduction

The development of user interfaces is a secondary purpose of the PanLex project, which mainly seeks to enlarge, enrich, and correct the PanLex database for various user publics, including developers of user interfaces that will make use of PanLex. However, PanLex participants have developed some prototype UIs for research, administrative, and demonstration purposes. Interfaces available for use by the public are described at, and linked to from, the “Try it” page on this site. The purposes and features of these interfaces are described below.

PanImages

The first UI for the original versions of PanLex was PanImages, developed by the Turing Center at the University of Washington. It was operational until 02010. It used the database to expand queries from monolingual to multilingual and submit them to Google and Flickr for image searching. Although it can no longer be demonstrated, there is a description of its functionality.

PanLem

One of the internally developed UIs, PanLem, has two main purposes. One is to provide access to most functionalities of the database required by those who are developing content in it. The other is to test PanLex as a platform for automated panlingual linguistic localization.

Numerous queries to discover facts in the database and to make permitted modifications to the data can be performed in PanLem. These include requests for attested and inferred lexical translations. A PanLem user can find an expression to translate by entering it exactly or approximately. The translation inference and fuzzy matching offered by PanLem are basic features based on simple algorithms. Fuzzy matching, for example, performs especially poorly for languages with complex scripts, such as Indic languages. Some reasonably desirable operations are beyond PanLem and require access to the database through a PostgreSQL client (generally psql).

A user starting a session with PanLem may (in fact, must) choose any language variety registered in PanLex as the language variety of the interface. PanLem then attempts to translate all labels displayed during the user’s session into that variety.

Labels in PanLem are coded as expressions in the artificial language variety art-000 (PanLex). PanLem uses the database to translate these labels into the user’s variety. Translations between all the art-000 expressions and several other varieties are present in the database. PanLem uses one of those if the user’s variety is one of those varieties. Otherwise, PanLem seeks indirect translations into the user’s variety. If PanLem fails to find a direct or indirect translation, it displays the art-000 expression in an italic face.
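The fallback order described above (direct translation, then indirect translation, then the untranslated art-000 expression in italics) can be sketched as follows. This is a minimal illustration, not PanLem's code: the function names, the one-hop definition of an indirect translation, and the toy data (including the placeholder label names LABEL_1 and LABEL_2) are all assumptions made for the example.

```python
from typing import Optional

# Toy translation pairs, invented for illustration; this is not PanLex data.
# Each key is (variety, text); each value lists attested translations.
PAIRS = {
    ("art-000", "LABEL_1"): [("eng-000", "search")],
    ("art-000", "LABEL_2"): [("eng-000", "language")],
    ("eng-000", "language"): [("tur-000", "dil")],
}


def direct(variety: str, text: str, target: str) -> Optional[str]:
    """Return an attested translation into the target variety, if any."""
    for other_variety, other_text in PAIRS.get((variety, text), []):
        if other_variety == target:
            return other_text
    return None


def indirect(variety: str, text: str, target: str) -> Optional[str]:
    """Return a translation reached through one intermediate expression.

    Assumption: "indirect" means exactly one intermediate hop.
    """
    for other_variety, other_text in PAIRS.get((variety, text), []):
        found = direct(other_variety, other_text, target)
        if found is not None:
            return found
    return None


def render_label(art_text: str, user_variety: str) -> tuple[str, bool]:
    """Return (text to display, whether to display it in italics)."""
    text = direct("art-000", art_text, user_variety)
    if text is None:
        text = indirect("art-000", art_text, user_variety)
    if text is None:
        # No translation found: show the art-000 expression in italics.
        return art_text, True
    return text, False
```

With the toy data, a label with no path into the user's variety comes back unchanged with the italic flag set, which is the visible cue the text describes.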

This constraint on PanLem labels provides an opportunity for PanLem developers to test the limits of lemmatic communication, described in a 02010 article by Everitt et al.

PanLinx

Another UI, PanLinx, seeks to provide access to much of the content in the database through hyperlinks alone. PanLem relies almost entirely on forms with button elements to let the user choose what to see or modify. PanLinx, by contrast, uses no forms and no buttons. The intent is to make PanLex data accessible to robots that crawl the web by following hyperlinks. If this method works, a person submitting an expression’s lemma to a search engine may be directed to a PanLinx page documenting that expression. The PanLinx page that documents an expression lists all of that expression’s direct translations in PanLex.

While PanLem is potentially panlingual, PanLinx is (at least arguably) nonlingual, so it doesn’t require localization.

As of 02013, PanLinx provides routes to about 19 million pages, each describing an expression. One could imagine a version of PanLinx with 2 levels: (1) an initial level, where the client receives a page with links pointing directly to expressions, and (2) a terminal level, where the client receives a page describing one of those expressions. Instead, PanLinx is implemented with 4 levels:

initial
group
subgroup
terminal

In this implementation, the expressions are ordered according to the Unicode codepoints of their “labels”. A label is the expression’s degraded and then truncated (to 15 characters) text. For example, the expression “legsötétebb nyomorban él” has the label “legsotetebbnyom” and is ordered accordingly. The ordered expressions are partitioned into groups, which in turn are partitioned into subgroups. This dual partitioning is implemented so that each subgroup contains about the cube root of the total count of expressions and each group contains about the same count of subgroups. With the 02013 total of about 19 million expressions, each group and each subgroup contains about 265 elements.
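The exact degradation rules are not given here, so the following is only a sketch of one plausible reading: decompose characters, drop everything that is not a letter or digit (diacritics, spaces, punctuation), lowercase the result, and truncate to 15 characters. The function names are hypothetical. Under that assumption it reproduces the labels quoted on this page.

```python
import unicodedata

LABEL_LENGTH = 15  # truncation length stated in the text


def degrade(text: str) -> str:
    """Strip diacritics, spaces, and punctuation, then lowercase.

    Assumption: this approximates PanLex's degradation; the real
    rules may differ, particularly for non-Latin scripts.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    # Combining marks, spaces, and hyphens are not alphanumeric,
    # so this filter removes them.
    kept = [char for char in decomposed if char.isalnum()]
    return "".join(kept).lower()


def label(text: str) -> str:
    """Return the degraded text truncated to the label length."""
    return degrade(text)[:LABEL_LENGTH]


def ordered(texts: list[str]) -> list[str]:
    """Order expression texts by label; Python compares strings by code point."""
    return sorted(texts, key=label)
```

For instance, label("legsötétebb nyomorban él") yields "legsotetebbnyom", matching the example above.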

At the initial level, PanLinx displays a static page, on which the user chooses one group. At the group level, PanLinx displays a generated page, on which the user chooses a subgroup. At the subgroup level, PanLinx displays a generated page, on which the user chooses an expression. At the terminal level, PanLinx displays a generated page describing the expression, and in particular the expression’s text (lemma), its language variety, and the language varieties and texts of all direct translations of the expression that are found in PanLex.

The texts of the links at the initial and group levels are descriptions of the ranges of expressions of the groups and subgroups, respectively. For example, at the initial level, one of the group links may have the text “digitalnoracuna … distributedpara”. This link refers to a group containing (via its subgroups) all expressions with labels within this range. If the user selects this group, a group-level page is displayed, showing subgroups with ranges of expressions within this range, such as “dilouedin … diluentmaterial”. When the user chooses one such subgroup, a page at the subgroup level is displayed, showing a list of all the expressions in the subgroup with their true (non-degraded, non-truncated) texts, such as “Türkçe | tur-000 | dil topluluğu”. When the user chooses one such expression, a page at the terminal level is displayed, showing that expression and its translations, such as “한국어 | kor-000 | 언어 모음”. The text of each translation is, in turn, a link to the terminal-level page for that expression, so the user can follow translations to their own translations indefinitely.
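The range links above amount to a lookup over sorted, non-overlapping label ranges. The sketch below shows how a label could be matched to its range and how a link text could be formed. The function names are hypothetical, and the only ranges used are the two quoted in the paragraph above.

```python
from bisect import bisect_right
from typing import Optional

Range = tuple[str, str]  # (first label, last label), both inclusive


def link_text(label_range: Range) -> str:
    """Format a range the way the group and subgroup links are shown."""
    return f"{label_range[0]} … {label_range[1]}"


def find_range(ranges: list[Range], label: str) -> Optional[Range]:
    """Return the range containing the label, or None.

    Assumes the ranges are sorted and non-overlapping. Python compares
    strings by code point, matching the ordering described in the text.
    """
    starts = [first for first, _ in ranges]
    index = bisect_right(starts, label) - 1
    if index >= 0 and label <= ranges[index][1]:
        return ranges[index]
    return None


# The two ranges quoted in the text, one per level.
groups = [("digitalnoracuna", "distributedpara")]
subgroups = [("dilouedin", "diluentmaterial")]
```

The label "diltoplulugu" (from “dil topluluğu”) falls inside both quoted ranges, which is why that expression is reached through those two links.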

Twice a week, a daemon refreshes the partition of the expressions into groups and subgroups, records the results in the “td” table, and regenerates a file that constitutes the body of the PanLinx initial-level page. The “td” table is defined as:

 Schema | Name | Type  | Owner  |  Size   |    Description    
--------+------+-------+--------+---------+-------------------
 public | td   | table | apache | 4560 kB | PanLinx subgroups

                                                  Table "public.td"
 Column |   Type   |                    Modifiers                    | Storage  |            Description             
--------+----------+-------------------------------------------------+----------+------------------------------------
 id     | integer  | not null default nextval('td_id_seq'::regclass) | plain    | ID of the subgroup
 gp     | smallint | not null                                        | plain    | ID of the subgroup’s PanLinx group
 tdbeg  | text     | not null                                        | extended | initial label
 tdend  | text     | not null                                        | extended | final label
Indexes:
    "td_pkey" PRIMARY KEY, btree (id)
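As a rough illustration of the refresh, the sketch below partitions a sorted list of labels into subgroups of about cube-root size and assigns each run of that many subgroups to a group, producing rows shaped like those of the “td” table. It is not the daemon's code. The function name and the rounding rule are assumptions, and only the field names come from the table definition above.

```python
from typing import NamedTuple


class TdRow(NamedTuple):
    id: int      # ID of the subgroup
    gp: int      # ID of the subgroup's group
    tdbeg: str   # initial label
    tdend: str   # final label


def partition(labels: list[str]) -> list[TdRow]:
    """Split labels into subgroups and groups of about cube-root size."""
    ordered = sorted(labels)  # code-point order
    # Assumption: the size is the rounded cube root of the total count.
    size = max(1, round(len(ordered) ** (1 / 3)))
    rows: list[TdRow] = []
    for start in range(0, len(ordered), size):
        chunk = ordered[start:start + size]
        subgroup_id = len(rows) + 1
        group_id = (subgroup_id - 1) // size + 1
        rows.append(TdRow(subgroup_id, group_id, chunk[0], chunk[-1]))
    return rows


# Synthetic labels "000" to "026": 27 labels give 3 groups of 3 subgroups of 3.
rows = partition([f"{n:03d}" for n in range(27)])
```

For a total of 19 million, the rounded cube root is 267, in line with the figure of about 265 elements per group and subgroup.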

The purpose of using degraded and truncated labels on the initial-level and group-level pages is to simplify the task of finding expressions when the user is a human rather than a search engine. Many complex distinctions in orthography are eliminated. For example, “Co-op”, “co-op”, “coöp”, and “coop” are all ordered and labeled as “coop”.

Selecting any link on any PanLinx page invokes the script “plxl.cgi”, which computes and generates the next page.

Unilingual interfaces

Three interfaces, TeraDict, TümSöz, and InterVorto, are prototypes that make use of the prototype PanLex API. By developing three versions that differ only in source language variety, we can explore how to make the localization of an API-based UI scalable. These interfaces require the translation of only a few sentences.