Character set. The computer codes that represent the characters of a language. 8-bit (byte or octet) codes have dominated from the early days of computing, when memory and storage were at a premium. 16-bit codes can represent the characters of most languages in current use, but not every character in the world's writing systems. See double-byte character set (DBCS) and Unicode. The mapping of characters to codes is called encoding. The jargon term "charset" refers to the set of rules for mapping a sequence of bytes/octets to a sequence of characters.
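As a small illustration of the difference between a character and its encoded bytes, the following Python sketch encodes one string under three charsets. The sample string is an arbitrary choice made for this example.

```python
# The same characters map to different byte sequences under different charsets.
text = "café"

latin1_bytes = text.encode("iso-8859-1")  # one byte per character
utf8_bytes = text.encode("utf-8")         # "é" takes two bytes in UTF-8
utf16_bytes = text.encode("utf-16-be")    # two bytes per character for this string

print(len(latin1_bytes), len(utf8_bytes), len(utf16_bytes))  # 4 5 8

# Decoding bytes with the wrong charset silently produces the wrong characters.
mojibake = utf8_bytes.decode("iso-8859-1")
print(mojibake)  # cafÃ©
```

The last two lines show why a charset label has to travel with the bytes: the same byte sequence is valid under both charsets but yields different text.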
Content negotiation. Requests by a browser client for a preferred language (or locale) and a character set to be sent from a web server. The server may respond with the closest combination available and inform the browser through HTTP headers such as Content-Type: text/html; charset=iso-8859-1 and Content-Language: fr-CA.
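A minimal sketch of the server side of language negotiation, assuming the browser sends its preferences in an Accept-Language header. The function names parse_accept_language and choose_language are hypothetical helpers written for this example, and the matching rule (exact tag first, then primary language) is a simplification of what real servers do.

```python
def parse_accept_language(header):
    """Return (tag, q) pairs from an Accept-Language header, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so equal weights keep their order in the header
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_language(header, available, default="en"):
    """Pick the closest available language for the given header."""
    by_lower = {a.lower(): a for a in available}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in by_lower:              # exact match, e.g. "fr-ca"
            return by_lower[tag]
        primary = tag.split("-")[0]      # fall back from "fr-ca" to "fr"
        if primary in by_lower:
            return by_lower[primary]
    return default


print(choose_language("fr-CA, fr;q=0.8, en;q=0.5", ["en", "fr", "de"]))  # fr
```

Here the site has no Canadian French version, so the closest available combination is plain French; an unmatched request falls back to the default language.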
Default language. The use of English or English with localized key terms (especially in error messages) that must be understandable by computer users worldwide, since English is the most common language of computers, the Internet, and the Web. (RFC 2277)
Dynamic Web Site. (Dynamic Server Pages) A web site in which part of the content is generated by server-side code working with a backend database server. Unlike a static HTML page, a dynamic page does not exist as a file on the server until a request for it arrives. The request contains parameters, user identity, date and time, context, etc. that are used to create a custom HTML page for those specific parameters. Server-side code tools include Microsoft Active Server Pages, Sun JavaServer Pages, PHP, and Perl.
Early Normalization. The capture of all text to the Unicode universal character set as soon as possible - by translators/localizers equipped with Unicode input method editors, or by conversion (transcoding) from the local character set to Unicode before major use of the text.
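The conversion (transcoding) route can be sketched in Python as follows. The helper name to_unicode and the choice of Shift_JIS as the local character set are assumptions made for this example only.

```python
def to_unicode(raw, source_charset):
    """Decode bytes from a local charset into a Unicode string.

    errors="strict" makes bad input fail loudly instead of being
    silently replaced, so problems surface early.
    """
    return raw.decode(source_charset, errors="strict")


# Simulated legacy input: Japanese text stored as Shift_JIS bytes.
legacy = "日本語".encode("shift_jis")

text = to_unicode(legacy, "shift_jis")  # normalize as early as possible
utf8 = text.encode("utf-8")             # store or transmit as UTF-8 afterwards

print(text == "日本語", len(legacy), len(utf8))  # True 6 9
```

Once the text is held as Unicode, later processing steps do not need to know which local character set it arrived in.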
Entities. In HTML, characters that are specified by special sequences between an ampersand and a semicolon, such as &quot; = quotation mark and &copy; = copyright symbol. Numeric character references add a hash mark/pound sign after the ampersand, such as &#nnnn;, where nnnn is the decimal Unicode code point of the character.
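For illustration, Python's standard html module converts between entities and the characters they stand for. The sample strings are arbitrary.

```python
import html

# Named entities are replaced by the characters they represent.
named = html.unescape("&quot;Example&quot; &copy;")
print(named)  # "Example" ©

# Numeric references work in decimal (&#169;) and hexadecimal (&#xA9;).
numeric = html.unescape("&#169; &#xA9;")
print(numeric)  # © ©

# Going the other way, escape() protects characters that are special in HTML.
escaped = html.escape('"A & B"')
print(escaped)  # &quot;A &amp; B&quot;
```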
Gantt chart. A task management chart found in project management software like Microsoft Project. It can display the critical path elements that determine the minimum time or budget for project completion.
Functional Testing. Quality assurance that a web site performs properly. All aspects of the user interface, navigation between pages and off-site, multilingual navigation, etc. are tested. Testing is required in all the current browsers and on the major operating systems and platforms.
G11N. Globalization. For the 11 letters in English between the end letters. See I18N, L10N.
Gist. A machine translation that is used to get the essential information from a document. A gist is useful even if it contains serious errors in vocabulary, syntax, and grammar.
Globalization. In the translation/localization business marketplace, it refers to the whole problem of making any product or service global, with simultaneous release in all markets. Web site globalization means more than just making one web site respond to the different language and regional requirements of the browser. Globalization includes the process by which site development, update processes, and workflow are engineered to provide a comprehensive framework for cost-effective multilingual site development and maintenance, incorporating overseas offices, consultants, translators, etc. Globalization is sometimes achieved by neutralizing the cultural elements, but superior global sites are those that enrich the cultural elements appropriately in each locale.
I18N. Internationalization. For the 18 letters in English between the end letters.
Internationalization. Architecting a web site so that it works in multiple languages and the cultural contexts of different locales, without having to redesign the basic elements for each locale. An internationalized web site is built on a multilingual engine. Sites must be internationalized before being localized. Today, Globalization is a more popular and broader term for the same fundamental problem and its solutions, with emphasis on the impact on the whole business.
L10N. Localization. For the 10 letters in English between the end letters.
Locale. A concept that originated in the POSIX standard to identify a complex of cultural conventions (including a language) like number and date formatting, currencies, and
Localized Web Site. A version of a web site that runs in a distinct locale, probably from an Internet Service Provider (ISP) in that locale, but minimally with a domain name registered in the locale, e.g., openinternetlexicon.fr. Particular requirements besides the language are formats for date/time strings, currencies, and numbers, and an eye to acceptable graphics in the local culture. A more expensive and powerful approach than a single globalized web site, but it still may require an underlying multilingual architecture in many countries.
Multilingual Web Site. A site that has most if not all of its pages translated into all the languages used on the site. It must have a means of serving the correct localized version based on a selection in the user interface or by automatic detection of browser language preferences and regional settings.
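Persistent Web Site. A web site committed to preserving hyperlinks to web pages to prevent bookmarks from breaking. This goal is apparently in conflict with both static sites, where edits replace earlier versions, and dynamic sites, where pages are generated only on request.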
Portal. A web site that serves as an entry point to many services, products, or information sources. Usually rich with hyperlinked references to resources on the web.
RDBMS. Relational Database Management System. Standard RDBMS tools range from desktop databases like FileMaker Pro and Microsoft Access to industrial-strength databases like IBM DB2, Informix, Microsoft SQL Server, Oracle, and Sybase SQL Server. Globalization tools that use databases may use proprietary databases that lock a user in to that vendor's tools. Those that use a commercial standard RDBMS allow clients to port their intellectual property to a superior competing tool.
Script. A script or writing system is a distinctive set of characters that may be used to write many languages. The alphabetic Latin or Roman script is common in Europe and the Americas. The Arabic script is used throughout much of the Muslim world. Most alphabets in use today are rooted in an ancient Northwest Semitic script. Characters in the Chinese Han script are ideographs, as were many ancient Egyptian hieroglyphs. Korean Hangul is an alphabet whose letters are grouped into syllable blocks. Syllabaries like Japanese Katakana have single characters that represent combinations of consonants and vowels; Indic Devanagari works similarly, with each consonant character carrying an inherent vowel.
Static Web Site. Static web pages cannot change unless the web site designers manually edit the pages. The changes then appear on the web as soon as the edited pages are added to the site. Typically any changes in a document remove the earlier versions from the web.
Unicode. The Unicode standard, originally designed around 16-bit codes, is capable of encoding the characters of the world's major language scripts. It is designed to be a universal character set. Version 3.0 contains 49,194 characters and 8,515 code points for private uses and future expansions. Surrogate pairs, which combine two 16-bit codes into 32 bits, extend the range to about a million additional characters. Unicode is supported on all the major computer operating systems, as well as by HTML 4.0, XML, and XHTML. About half the Unicode characters are Chinese Han ideographs. About half the remainder are Korean Hangul. The most common Unicode charset is UTF-8.
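The 32-bit combinations mentioned above are known as surrogate pairs in the UTF-16 encoding. The sketch below shows the arithmetic; the function name to_surrogates and the sample code point U+1F600 are choices made for this example.

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000      # 20 bits remain after subtracting
    high = 0xD800 + (offset >> 10)     # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits go in the low surrogate
    return high, low


high, low = to_surrogates(0x1F600)
print(hex(high), hex(low))  # 0xd83d 0xde00

# Cross-check against Python's own UTF-16 encoder.
encoded = chr(0x1F600).encode("utf-16-be")
print(encoded.hex())  # d83dde00
```

Two 10-bit halves give 2^20 = 1,048,576 extra code points, which is where the figure of about a million comes from.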
Workflow. In the context of the web, the visibility on the web of tasks, schedules, budgeting, materials needing translation, glossaries, style manuals, translation memories, etc. Globalization team members with passwords can access all the elements of a multilingual web site, so that work can be accomplished in "Internet time."
World-Wide-Web Consortium. The W3C is the source of many important standards that affect globalization, which it publishes as Recommendations. Related Internet standards appear as Request for Comments (RFC) documents, which are published through the Internet Engineering Task Force (IETF) rather than the W3C. RFC 2070 describes Internationalization of the Hypertext Markup Language. RFC 2277 describes the IETF Policy on Character Sets and Languages.