Apertium
A free/open-source machine translation platform
What is Apertium?
Apertium is a free/open-source machine translation platform, initially aimed at related-language pairs but recently expanded to deal with more divergent language pairs (such as English–Catalan). The platform provides:
- a language-independent machine translation engine,
- tools to manage the linguistic data necessary to build a machine translation system for a given language pair, and
- linguistic data for a growing number of language pairs.
Apertium uses a shallow-transfer machine translation engine which processes the input text in stages, as in an assembly line: de-formatting, morphological analysis, part-of-speech disambiguation, shallow structural transfer, lexical transfer, morphological generation, and re-formatting.
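The assembly-line idea can be sketched as a chain of functions, each consuming the output of the previous one. The Python below is only a minimal illustration: the stage names are taken from the list above, but the placeholder stages, the `make_stage` and `run_pipeline` helpers, and the plain-string data passed between stages are hypothetical and do not reflect Apertium's actual modules or stream format.

```python
from functools import reduce
from typing import Callable, List

# A stage is any function that takes text and returns text.
Stage = Callable[[str], str]

# Stage names, in the order given in the description above.
STAGE_NAMES: List[str] = [
    "de-formatting",
    "morphological analysis",
    "part-of-speech disambiguation",
    "shallow structural transfer",
    "lexical transfer",
    "morphological generation",
    "re-formatting",
]


def make_stage(name: str, trace: List[str]) -> Stage:
    """Build a placeholder stage (hypothetical): it records that it ran
    and passes the text through unchanged."""
    def stage(text: str) -> str:
        trace.append(name)
        return text
    return stage


def run_pipeline(text: str, stages: List[Stage]) -> str:
    """Feed the text through every stage in order, like an assembly line."""
    return reduce(lambda acc, stage: stage(acc), stages, text)


if __name__ == "__main__":
    trace: List[str] = []
    stages = [make_stage(name, trace) for name in STAGE_NAMES]
    run_pipeline("some input text", stages)
    print(" -> ".join(trace))
```

In a real system each placeholder would be replaced by a module that actually transforms its input; the point of the sketch is only that the stages are independent and run in a fixed order.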
Apertium uses finite-state transducers for all lexical processing operations (morphological analysis and generation, lexical transfer), hidden Markov models for part-of-speech tagging, and multi-stage finite-state based chunking for structural transfer.
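To give a feel for how a finite-state transducer can perform morphological analysis, here is a small sketch in Python. It is a toy, hand-written deterministic transducer covering only the English forms "cat" and "cats"; the transition table, the tag strings `<n>`, `<sg>` and `<pl>`, and the `analyse` function are assumptions made for illustration and are not Apertium's compiled format or API.

```python
from typing import Dict, Optional, Tuple

# (current state, input symbol) -> (output string, next state)
# The arc for "s" emits nothing: the plural is expressed by the final tag.
TRANSITIONS: Dict[Tuple[int, str], Tuple[str, int]] = {
    (0, "c"): ("c", 1),
    (1, "a"): ("a", 2),
    (2, "t"): ("t", 3),
    (3, "s"): ("", 4),
}

# Final states and the output appended when the input ends there.
FINAL_OUTPUT: Dict[int, str] = {
    3: "<n><sg>",  # "cat"  -> noun, singular
    4: "<n><pl>",  # "cats" -> noun, plural
}


def analyse(surface: str) -> Optional[str]:
    """Map a surface form to a lexical form, or return None if the
    transducer does not accept the input."""
    state = 0
    output = []
    for symbol in surface:
        step = TRANSITIONS.get((state, symbol))
        if step is None:
            return None  # no arc for this symbol: input rejected
        emitted, state = step
        output.append(emitted)
    if state not in FINAL_OUTPUT:
        return None  # input ended in a non-final state
    return "".join(output) + FINAL_OUTPUT[state]


if __name__ == "__main__":
    for word in ("cat", "cats", "ca", "dog"):
        print(word, "->", analyse(word))
```

Here `analyse("cats")` yields `cat<n><pl>`, while unknown or incomplete inputs such as "dog" or "ca" yield `None`. Generation can be thought of as running the same kind of machine in the opposite direction, from lexical form to surface form.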
The initial design was largely based upon that of systems already developed by the Transducens group at the Universitat d'Alacant, such as interNOSTRUM (Spanish–Catalan) and Traductor Universia (Spanish–Portuguese).
Apertium can be used to build machine translation systems for a variety of language pairs. To that end, the linguistic data needed are encoded in simple XML-based standard formats, either written by hand or converted from existing data, and then compiled with the provided tools into the high-speed formats used by the engine.
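The step from XML-encoded data to a fast lookup structure can be illustrated with a short Python sketch. The XML fragment below is a simplified, hypothetical bilingual dictionary written for this example (it is not the authoritative Apertium schema), and `compile_dictionary` is an assumed helper that stands in for the real compilers by turning the entries into an in-memory table.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical, simplified dictionary: each <e> entry holds a pair <p>
# with a left-side word <l> and a right-side word <r>.
TOY_DICTIONARY = """
<dictionary>
  <section id="main">
    <e><p><l>casa</l><r>house</r></p></e>
    <e><p><l>gat</l><r>cat</r></p></e>
  </section>
</dictionary>
"""


def compile_dictionary(xml_text: str) -> Dict[str, str]:
    """Parse the XML and build a left-to-right lookup table."""
    root = ET.fromstring(xml_text)
    table: Dict[str, str] = {}
    for pair in root.iter("p"):
        left = pair.findtext("l")
        right = pair.findtext("r")
        # Skip malformed entries that lack either side.
        if left is not None and right is not None:
            table[left] = right
    return table


if __name__ == "__main__":
    table = compile_dictionary(TOY_DICTIONARY)
    print(table.get("gat"))
```

The design point is the separation of concerns: linguists edit a readable XML source, and a compilation step produces whatever representation the engine can query quickly. A plain Python dictionary plays that role here only for the sake of the example.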
Who is developing it?
The Apertium architecture is being developed by the Transducens research group at the Departament de Llenguatges i Sistemes Informàtics of the Universitat d'Alacant in collaboration with Prompsit Language Engineering.
Linguistic data for Apertium are being developed by Transducens and Prompsit for the following language pairs:
- Spanish–Catalan
- Spanish–Portuguese
- Catalan–French
- Occitan–Catalan
- English–Catalan
- Spanish–Galician
- French–Catalan
Many other developers are also generating new pairs. You can browse the SVN repository to see a snapshot of current development.
Apertium welcomes new developers: if you think you can improve the engine or the tools, or develop linguistic data for us, do not hesitate to contact us.
Funding
Apertium is one of two open-source machine translation engines whose development began within the OpenTrad project (2004–2005), and it is designed to translate between related languages. The OpenTrad consortium was led by Eleka Ingeniaritza Linguistikoa.
The OpenTrad project ("Open Source Machine Translation for the languages of Spain") was funded by the Ministry of Industry, Tourism and Commerce of Spain through PROFIT grant numbers FIT340101-2004-0003 and FIT340001-2005-0002.
More recently, Apertium has also received funding from the Secretariat for Telecommunications and Information Society of the Generalitat de Catalunya (the government of the autonomous community of Catalonia in Spain) to develop new language pairs and an improved architecture (Apertium 2.0) that can handle more difficult pairs such as English–Catalan. It has also received further funding from the Ministry of Industry, Tourism and Commerce of Spain through PROFIT grant number FIT350401-2006-05 (EurOpenTrad).
Project Trautorom ("Romanian-Spanish machine translation", package apertium-es-ro) has been funded by the Universitat d'Alacant, through its Department of Computer Languages and Systems and the Office of the Vice-President for Extracurricular Activities (Vicerrectorado de Extensión Universitaria), and by the Romanian Ministry of Foreign Affairs.
Parts of Apertium have also been funded by the Universitat d'Alacant.