LaTeX/Internationalization
When you write documents in languages other than English, there are four areas where LaTeX has to be configured appropriately:
- All automatically generated text strings have to be adapted to the new language.
- Language-specific typographic rules have to be applied. In French, for example, there is a mandatory space before each colon character (:).
- LaTeX needs to know the hyphenation rules for the new language.
- You want to be able to insert all the language-specific special characters directly, without using any strange coding.
For the first two points and part of the third, if your system is already configured appropriately (and it is, unless your LaTeX distribution has a bug), the babel package by Johannes Braams will take care of everything. To use it, load it in your preamble, giving the language you want to use as an option:
\usepackage[language]{babel}
It is best to place it soon after the \documentclass command, so that all the other packages you load know which language you are using. A list of the languages built into your LaTeX system is displayed every time the compiler is started. Babel automatically activates the appropriate hyphenation rules for the language you choose. If your LaTeX format does not support hyphenation in the language of your choice, babel will still work but will disable hyphenation, which has quite a negative effect on the appearance of the typeset document. Babel also defines new commands for some languages, which simplify the input of special characters. See the sections about individual languages for more information.
If you call babel with multiple languages:
\usepackage[languageA,languageB]{babel}
then the last language in the option list (i.e. languageB) will be active. You can use the command
\selectlanguage{languageA}
to change the active language.
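The switching behaviour described above can be sketched in a small document. The choice of german and english is only illustrative, and the example assumes that babel's support for both languages is installed; \foreignlanguage is a babel command for marking up a short phrase without changing the active language.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
% english is listed last, so it is the active language at \begin{document}
\usepackage[german,english]{babel}
\begin{document}
\today\ is printed in English.

% Switch the active language: automatic text and hyphenation follow
\selectlanguage{german}
\today\ is now printed in German.

\selectlanguage{english}
% Mark up a single phrase without switching the whole document
A word such as \foreignlanguage{german}{Donaudampfschiff} is hyphenated
with the German patterns.
\end{document}
```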
Most modern computer systems allow you to input letters of national alphabets directly from the keyboard. To handle the variety of input encodings used for different groups of languages and/or on different computer platforms, LaTeX employs the inputenc package:
\usepackage[encoding]{inputenc}
When using this package, you should consider that other people might not be able to display your input files on their computer, because they use a different encoding. For example, the German umlaut ä on OS/2 is encoded as 132, on Unix systems using ISO-LATIN 1 it is encoded as 228, while in Cyrillic encoding cp1251 for Windows this letter does not exist at all; therefore you should use this feature with care. The following encodings may come in handy, depending on the type of system you are working on:
| Operating system | Western Latin encoding | Cyrillic encoding |
|---|---|---|
| Mac | applemac | macukr |
| Unix | latin1 | koi8-ru |
| Windows | ansinew | cp1251 |
| DOS, OS/2 | cp850 | cp866nav |
If you have a multilingual document with conflicting input encodings, you might want to switch to Unicode, using the ucs package.
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
will enable you to create LaTeX input files in UTF-8, a multi-byte encoding in which each character is encoded in as few as one and as many as four bytes. (utf8x is the name of the inputenc option provided by ucs, not of the encoding itself.)
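As a minimal sketch of the Unicode setup just described, the following document assumes that the ucs package is installed and that the file itself is saved as UTF-8. The sample words are illustrative only; T1 is loaded so that the accented letters are real glyphs.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
\begin{document}
% This source file must be saved in UTF-8 by the editor.
Grüße aus Zürich, café, señor, čeština.
\end{document}
```

If ucs is not available, the plain utf8 option of inputenc (the one used in the Portuguese section) is a possible substitute for Latin-script text.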
Font encoding is a different matter. It defines at which position inside a TeX font each letter is stored. Multiple input encodings can be mapped onto one font encoding, which reduces the number of required font sets. Font encodings are handled through the fontenc package:
\usepackage[encoding]{fontenc}
where encoding is the font encoding. It is possible to load several encodings simultaneously.
The default LaTeX font encoding is OT1, the encoding of the original Computer Modern TeX font. It contains only the 128 characters of the 7-bit ASCII character set. When accented characters are required, TeX creates them by combining a normal character with an accent. While the resulting output looks perfect, this approach stops the automatic hyphenation from working inside words containing accented characters. Besides, some Latin letters cannot be created by combining a normal character with an accent, to say nothing of letters of non-Latin alphabets, such as Greek or Cyrillic.
To overcome these shortcomings, several 8-bit CM-like font sets were created. Extended Cork (EC) fonts in T1 encoding contain letters and punctuation characters for most of the European languages based on Latin script. The LH font set contains the letters necessary to typeset documents in languages using Cyrillic script. Because of the large number of Cyrillic glyphs, they are arranged into four font encodings: T2A, T2B, T2C, and X2. The CB bundle contains fonts in LGR encoding for the composition of Greek text. By using these fonts you can improve or enable hyphenation in non-English documents. Another advantage of using the new CM-like fonts is that they provide fonts of the CM families in all weights, shapes, and optically scaled font sizes.
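Loading several font encodings at once might look like the sketch below. It assumes that Cyrillic fonts in T2A encoding and babel's Russian support are installed, and that the file is saved as UTF-8; the choice of Russian is illustrative. The last encoding given to fontenc becomes the document default.

```latex
\documentclass{article}
% T2A for Cyrillic, T1 for Latin script; T1 (listed last) is the default
\usepackage[T2A,T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[russian,english]{babel}
\begin{document}
% With T1, words containing accented letters can still be hyphenated
The word na\"ivet\'e is set with a single accented glyph per letter.

% babel switches to the Cyrillic font encoding for the Russian phrase
\foreignlanguage{russian}{Привет, мир!}
\end{document}
```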
Here is a collection of suggestions about writing a LaTeX document in a language other than English. If you have experience in a language not listed below, please add some notes about it.
Czech
Czech works fine with
\usepackage[czech]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8x]{inputenc}
You may use a different encoding, but UTF-8 is becoming standard and it allows you to have „Czech quotation marks“ directly in your text. Otherwise, there are the macros \glqq and \grqq to produce the left and right quotes.
French
Some hints for those creating French documents with LaTeX: you can load French language support with the following command:
\usepackage[frenchb]{babel}
Note that, for historical reasons, older versions of babel require the option name frenchb or francais rather than french; recent versions of babel's French support accept french and treat frenchb as deprecated. This enables French hyphenation, if you have configured your LaTeX system accordingly. It also changes all automatic text into French: \chapter prints Chapitre, \today prints the current date in French and so on. A set of new commands also becomes available, which allows you to write French input files more easily. Check out the following table for inspiration:
| input code | rendered output |
|---|---|
| \og guillemets \fg{} | « guillemets » |
| M\up{me}, D\up{r} | Mme, Dr |
| 1\ier{}, 1\iere{}, 1\ieres{} | 1er, 1re, 1res |
| 2\ieme{} 4\iemes{} | 2e 4es |
| \No 1, \no 2 | N° 1, n° 2 |
| 20~\degres C, 45\degres | 20 °C, 45° |
| M. \bsc{Durand} | M. Durand |
| \nombre{1234,56789} | 1 234,567 89 |
You will also notice that the layout of lists changes when switching to the French language. For more information on what the frenchb option of babel does and how you can customize its behavior, run LaTeX on file frenchb.dtx and read the produced file frenchb.dvi.
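A few of the commands from the table can be tried in a short document such as the sketch below. It uses the frenchb option shown above and assumes that babel's French support is installed; the sentences are illustrative and kept free of accented letters so that no input encoding is needed.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}      % T1 provides real guillemet glyphs
\usepackage[frenchb]{babel}   % option name as used in this section
\begin{document}
% Guillemets, superscript abbreviation and small-caps surname
\og Bonjour \fg{}, dit M\up{me} \bsc{Durand}.

% Ordinals and the numero abbreviation
Le 1\ier{} chapitre et la 2\ieme{} section portent le \no 3.
\end{document}
```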
German
Some hints for those creating German documents with LaTeX: you can load German language support with the following command:
\usepackage[german]{babel}
This enables German hyphenation, if you have configured your LaTeX system accordingly. It also changes all automatic text into German, e.g. “Chapter” becomes “Kapitel”. A set of new commands also becomes available, which allows you to write German input files more quickly even when you don’t use the inputenc package. Check out the following table for inspiration. With inputenc, all this becomes moot, but your text is also locked into a particular encoding world.
| input code | rendered output |
|---|---|
| "a | ä |
| "s | ß |
| "` or \glqq | „ |
| "' or \grqq | “ |
| "< or \flqq | « |
| "> or \frqq | » |
| \flq | ‹ |
| \frq | › |
| \dq | " |
In German books you often find French quotation marks («guillemets»). German typesetters, however, use them differently. A quote in a German book would look like »this«. In the German-speaking part of Switzerland, typesetters use «guillemets» the same way the French do. A major problem arises from the use of commands like \flq: if you use the OT1 font encoding (which is the default), the guillemets will look like the math symbol “≪”, which turns a typesetter’s stomach. T1 encoded fonts, on the other hand, do contain the required symbols. So if you are using this type of quote, make sure you use the T1 encoding (\usepackage[T1]{fontenc}).
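The shorthands from the table and the two guillemet conventions can be combined in a small test document. This is only a sketch; it assumes that babel's German support is installed, and the sample text is illustrative. The source is pure ASCII, so no inputenc line is needed.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}     % T1 contains proper guillemet glyphs
\usepackage[german]{babel}
\begin{document}
% "` and "' give German quotes, "o "u give umlauts, "s gives sharp s
"`Sch"one Gr"u"se"'

% Guillemets pointing inwards (German books) and outwards (Swiss usage)
\frqq so\flqq{} und \flqq so\frqq
\end{document}
```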
Greek
This is the preamble you need to write in the Greek language.
\usepackage[english,greek]{babel}
\usepackage[iso-8859-7]{inputenc}
This preamble enables hyphenation and changes all automatic text to Greek. A set of new commands also becomes available, which allows you to write Greek input files more easily. In order to temporarily switch to English and vice versa, one can use the commands \textlatin{english text} and \textgreek{greek text} that both take one argument which is then typeset using the requested font encoding. Otherwise you can use the command \selectlanguage{...} described in a previous section. Use \euro for the Euro symbol.
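The language switching described above might be tried with the following sketch. It deliberately omits the inputenc line and types the Greek word in Latin letters, relying on the assumption that the Greek font encoding used by babel maps ASCII letters to Greek ones; it also assumes that Greek fonts (such as the CB bundle) are installed. The word logos is illustrative.

```latex
\documentclass{article}
% greek is listed last, so it is the main language of the document
\usepackage[english,greek]{babel}
\begin{document}
% Latin letters typed here are typeset with the Greek font encoding
logos \textlatin{(English text inside a Greek paragraph)}

% Switch the whole document to English and embed one Greek word
\selectlanguage{english}
An English paragraph with one Greek word: \textgreek{logos}.
\end{document}
```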
Hungarian
Similar to Italian, but use the following lines:
\usepackage[magyar]{babel}
\usepackage[latin2]{inputenc}
\usepackage[T1]{fontenc}
- More information in Hungarian.
Italian
Italian is well supported by LaTeX. Just add \usepackage[italian]{babel} at the beginning of your document and the output of all the commands will be translated properly. You can produce accented letters without any particular setting: just write \`a \`e \'e \`i \`o \`u and you will get à è é ì ò ù (NB: the symbol changes if the inclination of the accent changes). However, typing accents this way is tedious and time-consuming. Moreover, if you are using a spell-checking program, "città" is correct, but "citt\`a" will be flagged as a mistake. If you add \usepackage[latin1]{inputenc} at the beginning of your document, LaTeX will correctly handle accented letters typed directly. To sum up, just add
\usepackage[italian]{babel}
\usepackage[latin1]{inputenc}
at the beginning of your document and you can write in Italian without worrying about translations and fonts. If your document compiles without errors, you do not need to worry about anything else. If you start getting unknown errors whenever you use an accented Italian letter, you have to check the encoding of your files. Any LaTeX source is just plain text, so accented letters have to be stored correctly in the text file. If you always write your document with the same program on the same computer, you should not have any problem. If you edit it with different programs, you could start getting strange errors from the compiler. The likely reason is that an editor saved your document in an encoding different from the one used when it was created, so LaTeX cannot recognize the accented letters. Most operating systems use UTF-8 by default, but this can cause problems if you use programs based on different libraries or on different operating systems. With the preamble above, the best way to solve this problem is to save the file in ISO-8859-1 (the encoding selected by the latin1 option), which includes all the letters you need. Some text editors let you change the encoding in the settings.
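When encoding problems are suspected, a pure ASCII test file can help to separate them from other errors. The following sketch uses only the accent commands mentioned earlier, so it should compile whatever encoding the editor saves it in; the sentence is illustrative.

```latex
\documentclass{article}
\usepackage[italian]{babel}
\begin{document}
% Accents are written as commands, so the file contains only ASCII
La citt\`a \`e bella perch\'e \`e cos\`i.
\end{document}
```

If this file compiles but the same text with directly typed accented letters does not, the cause is most likely a mismatch between the file's encoding and the inputenc option.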
Korean
To use LaTeX for typesetting Korean, we need to solve three problems:
- We must be able to edit Korean input files. Korean input files must be in plain text format, but because Korean uses its own character set outside the repertoire of US-ASCII, they will look rather strange with a normal ASCII editor. The two most widely used encodings for Korean text files are EUC-KR and its upward compatible extension used in Korean MS-Windows, CP949/Windows-949/UHC. In these encodings each US-ASCII character represents its normal ASCII character, as in other ASCII compatible encodings such as ISO-8859-x, EUC-JP, Big5, or Shift_JIS. On the other hand, Hangul syllables, Hanjas (Chinese characters as used in Korea), Hangul Jamos, Hiraganas, Katakanas, Greek and Cyrillic characters and other symbols and letters drawn from KS X 1001 are represented by two consecutive octets. The first has its MSB set. Until the mid-1990s, it took a considerable amount of time and effort to set up a Korean-capable environment under a non-localized (non-Korean) operating system. You can skim through the now much-outdated http://jshin.net/faq to get a glimpse of what it was like to use Korean under a non-Korean OS in the mid-1990s. These days all three major operating systems (Mac OS, Unix, Windows) come equipped with pretty decent multilingual support and internationalization features, so that editing Korean text files is not so much of a problem anymore, even on non-Korean operating systems.
- TeX and LaTeX were originally written for scripts with no more than 256 characters in their alphabet. To make them work for languages with considerably more characters such as Korean or Chinese, a subfont mechanism was developed. It divides a single CJK font with thousands or tens of thousands of glyphs into a set of subfonts with 256 glyphs each. For Korean, there are three widely used packages: HLaTeX by UN Koaunghi, hLaTeXp by CHA Jaechoon and the CJK package by Werner Lemberg. HLaTeX and hLaTeXp are specific to Korean and provide Korean localization on top of the font support. They both can process Korean input text files encoded in EUC-KR. HLaTeX can even process input files encoded in CP949/Windows-949/UHC and UTF-8 when used along with Lambda (Λ) and Omega (Ω). The CJK package is not specific to Korean. It can process input files in UTF-8 as well as in various CJK encodings including EUC-KR and CP949/Windows-949/UHC, and it can be used to typeset documents with multilingual content (especially Chinese, Japanese and Korean). The CJK package has no Korean localization such as the one offered by HLaTeX and it does not come with as many special Korean fonts as HLaTeX.
- The ultimate purpose of using typesetting programs like TeX and LaTeX is to get documents typeset in an ‘aesthetically’ satisfying way. Arguably the most important element in typesetting is a set of well-designed fonts. The HLaTeX distribution includes UHC PostScript fonts of 10 different families and Munhwabu fonts (TrueType) of 5 different families. The CJK package works with a set of fonts used by earlier versions of HLaTeX and it can use Bitstream’s Cyberbit TrueType font.
To use the HLaTeX package for typesetting your Korean text, put the following declaration into the preamble of your document:
\usepackage{hangul}
This command turns the Korean localization on. The headings of chapters, sections, subsections, table of contents and table of figures are all translated into Korean and the formatting of the document is changed to follow Korean conventions. The package also provides automatic “particle selection.” In Korean, there are pairs of postfix particles that are grammatically equivalent but different in form. Which of any given pair is correct depends on whether the preceding syllable ends with a vowel or a consonant. (It is a bit more complex than this, but this should give you a good picture.) Native Korean speakers have no problem picking the right particle, but it cannot be determined in advance which particle to use for references and other automatic text that will change while you edit the document. It takes a painstaking effort to place appropriate particles manually every time you add or remove references or simply shuffle parts of your document around. HLaTeX relieves its users of this boring and error-prone process.
In case you don’t need the Korean localization features but just want to typeset Korean text, you can put the following line in the preamble instead:
\usepackage{hfont}
For more details on typesetting Korean with HLaTeX, refer to the HLaTeX Guide. Check out the web site of the Korean TeX User Group (KTUG) at http://www.ktug.or.kr/.
Cyrillic script
For typesetting Cyrillic, see the section "Writing in Cyrillic" in http://www.ctan.org/tex-archive/info/lshort/english/lshort.pdf .
See also the Bulgarian translation of the "Not so Short Introduction to LaTeX 2e" at http://www.ctan.org/tex-archive/info/lshort/bulgarian/lshort-bg.pdf
Portuguese
Add the following code to your preamble:
\usepackage[portuguese]{babel}
\usepackage[latin1]{inputenc}
\usepackage[T1]{fontenc}
If you are in Brazil, you can replace the language with brazilian. The first line gets everything translated properly, the second lets you input text correctly, and the third gets hyphenation right. Note that we are using the latin1 input encoding here, so this will not work on a Mac or on DOS. Just use the appropriate encoding for your system. If you are using Linux, use
\usepackage[utf8]{inputenc}
Arabic script
For languages which use the Arabic script, including Arabic, Persian, Urdu, Pashto, Kurdish, Uyghur, etc., add the following code to your preamble:
\usepackage{arabtex}
You can input text in either romanized characters or native Arabic script encodings. Use any of the following commands or environments to enter text:
\< … >
\RL{ … }
\begin{arabtext} … \end{arabtext}
See the ArabTeX Wikipedia article for further details.
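A minimal sketch of romanized input is given below. It assumes that the arabtex package and its fonts are installed. The \setarab declaration (selecting Arabic conventions) and the romanization of the sample word are taken from ArabTeX usage as an assumption, so check them against the ArabTeX documentation before relying on them.

```latex
\documentclass{article}
\usepackage{arabtex}
\begin{document}
\setarab % assumed: selects the Arabic language conventions
% A short romanized insertion inside left-to-right text;
% "kitAb" is an illustrative romanization (capital A for a long vowel)
The word \RL{kitAb} is typeset from right to left.
\end{document}
```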