The Open Internet Lexicon (OIL) is an initiative to build a dictionary of Web terms (words and short phrases) in many languages. Our goal is to reflect current Internet and Web usage in many countries. The dictionary will be open to anyone building multilingual or single-language web sites. Look at what we have done so far.

The combination of a language and a country is known as a "locale," e.g., Portuguese-Brazil or French-Canada. The Open Internet Lexicon initiative is looking for skilled translators in each locale who would like to work as "localizers." We hope to have at least one person in each locale translating a large collection of perhaps a few thousand words and phrases.

Are you a native speaker of a language not represented here? Are you a resident of a new locale, or do you regularly surf the web there? Would you like us to add that locale and make you its localizer? Register with Open Internet Lexicon today. Apply to join us and help us internationalize the web.

Localizers should be fluent in English, because usage comments for each phrase are written in English. They must also know the idiomatic language of the locale, of course. The final qualification is familiarity with current Internet and Web terminology in that locale. Preferably localizers reside in the locale and have a good Internet connection there.

The Open Internet Lexicon runs on a database-backed web site using skyBuilders.com timeLines technology. skyBuilders webWare has an interface in which every text element - button labels, column headings, row names, hyperlink text, captions, etc. - is stored in the database in the languages of multiple locales. The multilingual text tables are part of the industry-standard Open DataBase Model (ODBM). For further information, go to www.opendatabasemodel.com.
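
As a rough illustration of how a multilingual text table of this kind might work, here is a minimal sketch in Python using SQLite. The table name, column names, element keys, fallback rule, and sample phrases are all hypothetical; they are not taken from the ODBM or from skyBuilders webWare.

```python
import sqlite3

# Hypothetical schema: one row per (text element, locale).
# These names are illustrative only, not the actual ODBM definitions.
SCHEMA = """
CREATE TABLE ui_text (
    element_key TEXT NOT NULL,
    locale      TEXT NOT NULL,
    phrase      TEXT NOT NULL,
    PRIMARY KEY (element_key, locale)
);
"""

# Illustrative sample data, not entries from the real lexicon.
SAMPLE_ROWS = [
    ("button.search", "en-US", "Search"),
    ("button.search", "fr-CA", "Rechercher"),
    ("button.search", "pt-BR", "Pesquisar"),
    ("link.home", "en-US", "Home"),
    ("link.home", "fr-CA", "Accueil"),
]


def open_lexicon():
    """Create an in-memory database and load the sample phrases."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO ui_text VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def lookup(conn, element_key, locale, fallback="en-US"):
    """Return the phrase for a locale, falling back to a default locale."""
    for candidate in (locale, fallback):
        row = conn.execute(
            "SELECT phrase FROM ui_text WHERE element_key = ? AND locale = ?",
            (element_key, candidate),
        ).fetchone()
        if row is not None:
            return row[0]
    return None  # the element is unknown in both locales


conn = open_lexicon()
print(lookup(conn, "button.search", "pt-BR"))  # phrase stored for pt-BR
print(lookup(conn, "link.home", "pt-BR"))      # no pt-BR row, so en-US is used
```

Keeping every label in one table keyed by element and locale means a new locale can be added by inserting rows, without touching page templates.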

Comments, criticisms, and suggestions are welcome, especially for new words and phrases that web developers may need when they want their software to operate in many countries. Web developers and software developers are free to use the resulting text in their work. They may download the Open Internet Lexicon terminology database in a spreadsheet or text file format. Various database formats are also available, as well as a set of SQL commands that create and populate the database tables.

Open Internet Lexicon is no substitute for good dictionaries, machine translation programs, and human translators. It is simply a database repository for the short phrases that are needed on a web page to label buttons and indicate actions. The shorter the phrase, the better. Particular words are likely to be highly idiomatic, even with the intense pressure to standardize web terminology around the world.

Although many icons are by their nature independent of language, our Icon Lexicon will collect those that need to be different in some locales. Do you have examples of good or bad icons for some locale? Register with Open Internet Lexicon today. Apply to join us and help us internationalize the web.

Dictionaries

There are several multilingual dictionaries on the web. Many have some web terminology. None include today's rapidly changing set of web page terms and phrases. The EuroDicAutom is a massive project of the European Commission. It has 5.5 million entries in twelve languages (Danish, Dutch, English, Finnish, French, German, Greek, Italian, Latin, Portuguese, Spanish, and Swedish) grouped by field (Agriculture, Aviation, etc.). The Logos dictionary is being compiled in over 30 languages by localizers around the world. It now has 7.5 million entries, but there are very few basic Web terms. Some excellent Internet/Web specific efforts are the French-English Terminologie d'Internet, and NetGlos, which stopped development in 1997.

Glossaries (subject-area specific dictionaries)

The Human-Languages Page, by Tyler Chambers, has links to nearly 2000 language-related web sites, including a few hundred glossaries. The Translator's Home Companion also has hundreds of glossaries. Babylon.com translates single words into 16 languages and has glossaries of many subjects in multiple languages. GlossPost is a searchable list of glossary web addresses (URLs) maintained by Maria Eugênia Farré.

A trilingual Internet-specific glossary is available at the Canadian Bureau of Translation. There have been many great software localization projects in the past, usually specific to a particular computer language or operating system. Their glossaries are also valuable references for Internet and Web terminology. Check out Apple, Microsoft, SGI, and Sun. There are language-specific localization tools for Perl, for C, and for Java.

Translators

The most comprehensive online source for information about translators is Gabe Bokor's Translation Journal. It has hotlinks to translators' organizations, databases of translators, online glossaries and dictionaries, discussion groups, and much more. The American Translators Association and the Northern California Translators Association list thousands of translators, searchable by language pair and subject area. The pioneer in translation memory tools, TRADOS, sponsors www.translationzone.com. It lists hundreds of freelance translators and localizers. In Europe, Aquarius is a portal to translators. Glenn's Guide to Translation Agencies lists a few hundred translation agencies. Resources for Translators is a database of translation rates, with testimonials and complaints about 5000 different translation agencies worldwide and reviews of translation tools.

Localizers

There are many companies that specialize in localizing web pages. Some have proprietary tools that produce multiple-language draft ("gist") machine translations (MT). All of them have many human translators (some in-house, some freelance) who finish the localizations. Among the world leaders are Berlitz GlobalNET, Lionbridge, Lernout & Hauspie, and Bowne. But the biggest of them has less than a one percent share of the $10 billion worldwide business of web localization and translation. Yahoo lists some 70 web translation services and tools. Multilingual magazine has a similar number of language translation vendors. The smaller companies generally do not have proprietary tools. They use industry-standard tools like Trados with translation memory (TM) that can be retained by their clients (in TMX format) so the clients are not captive to any one localization vendor.
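
To make the portability point concrete: TMX is an XML format in which each translation unit (tu) holds one variant (tuv) per language, with the text in a seg element. The following is a minimal sketch, assuming a tiny hand-written TMX document; the sample segments and the function name load_memory are illustrative and do not come from any vendor's tool.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# A tiny hand-written TMX document; the segments are illustrative sample data.
SAMPLE_TMX = """<?xml version="1.0"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="phrase" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Search</seg></tuv>
      <tuv xml:lang="fr"><seg>Rechercher</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Home</seg></tuv>
      <tuv xml:lang="fr"><seg>Accueil</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def load_memory(tmx_text, source, target):
    """Build a source-phrase -> target-phrase dict from TMX translation units."""
    memory = {}
    root = ET.fromstring(tmx_text)
    for unit in root.iter("tu"):
        # Map each language code in this unit to its segment text.
        segments = {
            tuv.get(XML_LANG): tuv.findtext("seg", default="")
            for tuv in unit.findall("tuv")
        }
        # Keep only units that contain both requested languages.
        if source in segments and target in segments:
            memory[segments[source]] = segments[target]
    return memory


memory = load_memory(SAMPLE_TMX, "en", "fr")
print(memory)
```

Because the file is plain XML, a client can carry the same memory to a different vendor or tool that reads TMX.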

Globalizers

Beyond localization (abbreviated L10N), there is internationalization (I18N). Internationalization implies one system that can respond to requests in a large number of locales. This capability is often marketed as Globalization (G11N). Globalization companies promise multilingual web site management that will some day handle every locale. Leading globalizers, like eTranslate.com, Idiom Technologies, and GlobalSight, use content management tools to distribute the translation workflow to their partner localizers around the world.

Surprisingly, only one of the large companies boasting globalization technology has a web site that responds to a browser request in a language other than English, and that one only in French. To try it, you must set your browser's preferred language to French.
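
A browser states its language preference in the HTTP Accept-Language request header, for example "fr-CA,fr;q=0.8,en;q=0.5". The following is a minimal sketch of how a site might choose a response language from that header. The function names, the fallback to the bare language code, and the default of English are assumptions made for illustration; the sketch ignores the "*" wildcard and is not how any particular company's site works.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag with no q parameter has full weight
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed weight as "not wanted"
        if quality > 0:
            # Sort by descending quality, then by original order.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def choose_locale(header, supported, default="en"):
    """Pick the first requested language that the site supports.

    Tries an exact match first, then the bare language ('fr-ca' -> 'fr').
    """
    available = {s.lower(): s for s in supported}
    for tag in parse_accept_language(header):
        if tag in available:
            return available[tag]
        primary = tag.split("-")[0]
        if primary in available:
            return available[primary]
    return default


print(choose_locale("fr-CA,fr;q=0.8,en;q=0.5", ["en", "fr"]))
```

A site that supports only "en" and "fr" would answer this French-Canadian request in French rather than falling back to English.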

There are 139 two-letter language codes in ISO 639 and 239 country codes in ISO 3166, so in principle there could be as many as 139 × 239 = 33,221 language-country combinations, although most of these pairings never occur in practice. An expanded list of three-letter language codes is available at GlossPost. Probably 50 locales account for 98% of web activity today. Will web globalization technology be able to handle them all?

The World Wide Web has broken down the barriers of space and time. The only remaining barrier is language.

Babel

There are two directions we can go to break down the language barrier. One is to realize the panglossian dream of an ideal universal language understandable by all. The other is to translate everything of importance - at least the gist in time and maybe someday le mot juste in time - into every language that has interested readers. In Europe today some are lobbying for English as that universal language, in computers and communications especially, and in the economics and business of globalized world markets. Others look to simultaneous over-the-Internet machine-assisted translations into many languages.

Machine Translation

Babel, a joint I18N project between The Internet Society and Alis Technologies, is an ambitious effort to allow the browser to work in any language with just-in-time translations and character code conversions (to the 16-bit Unicode needed by non-Western languages). Alis "Gist-in-time®" is incorporated in the Netscape 6 browser, and other Alis components make the Windows OS multilingual.
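
As a small illustration of what a character code conversion involves, the sketch below decodes bytes from a legacy character set and re-encodes them as 16-bit Unicode code units (UTF-16, big-endian). It uses only Python's built-in codecs; the function name convert and the sample strings are illustrative, and this is not a description of how the Babel project or the Alis components work.

```python
def convert(raw, source_encoding, target_encoding="utf-16-be"):
    """Re-encode bytes from a legacy character set into a Unicode encoding."""
    text = raw.decode(source_encoding)   # legacy bytes -> Unicode string
    return text.encode(target_encoding)  # Unicode string -> target bytes


# In ISO-8859-1 the letter 'ç' is the single byte 0xE7.
latin1_bytes = b"Fran\xe7ais"
utf16_bytes = convert(latin1_bytes, "iso-8859-1")
print(utf16_bytes.decode("utf-16-be"))

# A Japanese string stored in the legacy Shift_JIS encoding.
sjis_bytes = "日本語".encode("shift_jis")
print(convert(sjis_bytes, "shift_jis").decode("utf-16-be"))
```

The source encoding must be known or declared: the same bytes decode to different characters under different legacy code pages.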

Babelfish (now babelfish.altavista.com) is the best known machine translation service on the web. It provides immediate online translations from French or English into 5 European languages. You can test its translations of some basic web phrases here. The underlying translation technology is by Systran. It can be purchased for use on your web site. Transparent Language also offers immediate online translations. Another online translation service is InterTran from TranslationExperts, Ltd. It offers a few dozen language pairs, and an interesting sentence diagram, with optional translations for key words and word rearrangement. GPLTrans is an open-source translation engine.

The best portal site with access to 22 machine translation sites working in more than 50 languages is foreignword.com. They also offer links to 178 online dictionaries, 1001 glossaries, and hundreds of translators.

Most of the free web translation sites also let you enter a URL and they will translate a whole web page for you. Many millions of web pages have been translated by all these free services. They will play a big role in the future of the multilingual Web.

At the other end of the spectrum from free web machine translations, large corporations are buying huge "enterprise translation servers." These will be centrally located and provide over-the-Internet translations to their users from any browser. Large translation companies hope to sell real-time web-based translation services, either by translating web pages on the fly as users request them, or by returning translated web pages to a smaller client company's web server. Typical rates are pennies per word for raw machine translation and $0.25 per word for human-corrected text. These companies have armies (thousands) of translators, who will be able to receive the gist machine translation and return a human translation to the server (or to a client) with very fast turnaround times. Note, however, that many translators proficient in both the source and target languages find the bland computer-speak gist a waste of their time. The entire translation business - quotations, scheduling, translations, approvals, and billing - will be conducted over the web. Translation suppliers may never meet their customers. Translation will be a universal web application, working on web application servers, at application service providers (ASPs).

Systran Enterprise costs about $5000 for a single language pair and five users, and $32,000 for eight language pairs and 20 users. Lernout & Hauspie says its iTranslator Enterprise is coming soon. The Transparent Language MT engine is called TranscendRT. Its Enterprise Translation Server is priced starting at about $17,000. The IBM WebSphere Translation Server offers translations at 500 words per second from English to FIGS (French, Italian, German, Spanish) and CJK (Chinese, Japanese, Korean) languages for $10,000 per language pair per CPU. Wordstream is developing a multilingual translator for the Internet called ClearText. It can translate a stream of text (like a news feed) into multiple languages in real time. No prices have been announced.

All these efforts at machine translation (MT) have received mixed reviews. Leading companies that invested heavily in MT have done badly as the Internet frenzy has cooled and many venture capitalists have put dot-com investments on hold. Over a billion dollars has been invested in machine translation, mostly by the U.S. Defense Department and CIA. Much work has been done in universities. Carnegie Mellon University is a recognized leader in MT. The European Commission Systran MT (a distant relative of U.S. Systran) machine translates about a million pages a year into the Commission's 11 official languages. Apart from those with large government contracts, and science-fiction fans looking for Douglas Adams' Babelfish or Star Trek fans after a Universal Translator, knowledgeable observers doubt that machine translation will ever capture the subtle nuances of everyday language. Machines are now seen as aids to humans for Computer-Aided Translations (CAT). Machines can provide the gist of a document for a localizer who does not know the source language.

Desktop Translation Programs

Even the most inexpensive desktop translation products can usually provide the gist. They also can be installed in your web browser. Lernout & Hauspie Power Translator Pro (formerly from Globalink) translates into 7 languages. LanguageForce Universal Translator 2000 translates into 40 languages, with software keyboard support for each one. At Open Internet Lexicon we used these low-cost ($150) desktop tools to make the first draft translation in each language. We also consulted Babelfish and our dictionaries frequently.

To help us break the language barrier, you need a lot of knowledge. Try reading some of the great books on localization, internationalization, and multilingual software.

By comparison with all the above, Open Internet Lexicon is much lower-level technology. It is just a simple and immediately useful web dictionary. All the words and concise phrases are Internet terminology suitable for web pages. A second difference is that Open Internet Lexicon is being developed and supported over the web, by web developers, and for web developers. This makes it possible for native speakers familiar with the evolving web in their cultures to keep the dictionary fresh and relevant in Internet time. Third, our database-backed system can easily handle hundreds of locales. Finally, it's open and free for all to use.

To join us as a localizer and get your locale added to our efforts, you must register with Open Internet Lexicon. Then fill out an application indicating your skills and interest. If you are approved, you will be given editing privileges in the terminology database for your locale. You will also have a web page on our site which you can edit to describe your work. And you will have a listing in our searchable database of OIL localizers.

If you want to join a team of localizers for a popular locale, your application will be considered by the existing localizers. In any case, as an Open Internet Lexicon team member you will have privileges to comment on and criticize the work in any locale.

Important Localization/Translation References
W3C Translations
W3C Translators Mailing List
W3C Translators Mailing List Archives
Localization Industry Standards Association
LISA Web Localization SIGs (dated but some good references)
Books and Magazines

Sponsored by skyBuilders.com

Would your company like to sponsor a locale? Sponsor links appear on the locale pages and can direct visitors to you for custom localization. Inquire about sponsoring a locale and help us break the language barrier and internationalize the web.

 
Criticisms? Suggestions? Do you want us to add a reference or hyperlink to this page? Send us an email or add a comment directly to this page below.