PanLex: Source analysis

Process summary

PanLex editors consult sources to discover lexical translations, then add those translations to the PanLex database, crediting the sources from which they came.

This task entails understanding the structures of the consulted works, which are often complex, inconsistent, or both. If an editor can analyze a work’s structure and thereby convert its PanLex-relevant data into a format that permits programmatic insertion into the database, the task can be accomplished efficiently. It may also be possible to discover generalizations about the structures of lexicographic works and thereby partly automate the analysis of their structures, for even greater efficiency.

Specifications

Routines have been developed for the structural validation and ingestion of data in three formats. A file in any of these formats can be uploaded into the database through the PanLem interface. Typically, an editor carries out the structural-analysis task by extracting PanLex-relevant data from a consulted work, modifying the data as the editor judges appropriate, formatting the modified data as a file in one of these formats, and then uploading that file.

The specifications of the three file formats are:

Simple text
Full text
PanLex XML
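The extract, modify, and format steps described above can be sketched in Python. This is a minimal illustration under stated assumptions: the "headword: gloss; gloss" entry layout, the function names `extract`, `modify`, and `format_tsv`, and the tab-separated output are all hypothetical and are not taken from the Simple text, Full text, or PanLex XML specifications. The final step, uploading the file, happens through the PanLem interface and is not shown.

```python
import re

# Hypothetical source entries, as they might appear in a bilingual glossary.
RAW_ENTRIES = [
    "casa: house; home",
    "perro: dog",
    "  gato :  cat ",
]

# Hypothetical entry layout: a headword, a colon, then semicolon-separated glosses.
ENTRY_PATTERN = re.compile(r"^\s*(?P<headword>[^:]+?)\s*:\s*(?P<glosses>.+?)\s*$")


def extract(lines):
    """Step 1: pull (headword, [glosses]) pairs out of raw entry lines."""
    for line in lines:
        match = ENTRY_PATTERN.match(line)
        if match is None:
            continue  # unrecognized line; in practice, flag it for manual review
        glosses = [g.strip() for g in match.group("glosses").split(";")]
        yield match.group("headword").strip(), glosses


def modify(pairs):
    """Step 2: apply editorial judgments, here lowercasing and dropping empty glosses."""
    for headword, glosses in pairs:
        kept = [g.lower() for g in glosses if g]
        if kept:
            yield headword.lower(), kept


def format_tsv(pairs):
    """Step 3: emit one tab-separated translation pair per line (hypothetical layout)."""
    rows = [f"{headword}\t{gloss}" for headword, glosses in pairs for gloss in glosses]
    return "\n".join(rows) + "\n"


output = format_tsv(modify(extract(RAW_ENTRIES)))
print(output, end="")
```

Keeping each step as a separate generator lets an editor change the editorial judgments in `modify` without touching the parsing or output code.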

Reports

Problems and methods associated with the structural-analysis task have been discussed in these reports:

Timothy Baldwin, Jonathan Pool, and Susan M. Colowick, “PanLex and LEXTRACT: Translating all Words of all Languages of the World”
Jonathan Pool, “Sourcing in PanLex”
Andréa Davis, “Formatting Data for PanLex”
