Non-Latin-based languages
Capture text in non-Latin-based languages (e.g. Greek, Hebrew, Russian, Chinese, Arabic) using an xml:lang attribute whose value is the ISO 639-2 language code.
For phrases, sentences, or single characters embedded within a Latin-based text, wrap the non-Latin text in a span element that carries the xml:lang attribute.
For entire paragraphs or sections, the xml:lang attribute may instead be placed on the p, div, or, in the case of legal extracts, extract element (see section Extracts of legal documents).
Take care when capturing Greek capital letters. Several Greek capital letters look identical to Latin letters.
For example, Latin A (U+0041) looks like Greek Α (U+0391, capital letter alpha).
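One way to catch such look-alike characters is to scan text that is tagged as Greek for any Latin letters. The following is a minimal sketch only, not part of these guidelines: the function name latin_letters_in_greek is hypothetical, and it assumes the Greek text has already been extracted from the XML as a plain string.

```python
import unicodedata

def latin_letters_in_greek(text):
    """Return (index, character, code point) for every Latin letter in text.

    Hypothetical helper: `text` is assumed to be the content of an element
    tagged as Greek, so any Latin letter in it is suspect.
    """
    hits = []
    for i, ch in enumerate(text):
        # Unicode character names start with the script, e.g.
        # "LATIN CAPITAL LETTER A" versus "GREEK CAPITAL LETTER ALPHA".
        if ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN"):
            hits.append((i, ch, f"U+{ord(ch):04X}"))
    return hits

# Alpha-beta-gamma keyed with a Latin A (U+0041) in first position.
suspect = "A\u0392\u0393"
print(latin_letters_in_greek(suspect))   # [(0, 'A', 'U+0041')]

# The same string keyed correctly with Greek alpha (U+0391).
correct = "\u0391\u0392\u0393"
print(latin_letters_in_greek(correct))   # []
```

A check of this kind only reports Latin letters. Deciding whether each one is a capture error or a legitimate Latin character remains a manual step.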
For scripts that are read right to left (e.g. Hebrew), capture the characters in the XML file in the order in which they are meant to be read. (A text editor such as Oxygen will render the characters in the correct right-to-left order.)
<p>The most obvious interpretation of Genesis 1.1 from Hebrew (<span xml:lang="heb">בְּרֶאשּית
בָּרָא אֱלֹהים</span>) to Greek (<span xml:lang="ell">βιβλος γενεσεως ιησου χριστου υιου
δαβιδ υιου αβρααμ</span>) makes the case perfectly clear.</p>
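To confirm that right-to-left text has been stored in reading order, the stored sequence of characters can be listed by name. This is an illustrative sketch only: the function name stored_order is hypothetical, and the sample is the first Hebrew word of the example above with the vowel points left out for brevity.

```python
import unicodedata

def stored_order(text):
    """List the Unicode names of the characters in the order they are stored."""
    return [unicodedata.name(ch, "UNKNOWN") for ch in text]

# Consonants of the first Hebrew word in the sample, in reading order:
# bet, resh, alef, shin, yod, tav.
word = "\u05d1\u05e8\u05d0\u05e9\u05d9\u05ea"

names = stored_order(word)
# The first stored character should be the first one read (bet).
print(names[0])                            # HEBREW LETTER BET
# Bidirectional class "R" marks a right-to-left character; the editor
# uses this to display the stored sequence from right to left.
print(unicodedata.bidirectional(word[0]))  # R
```

If the first name printed were the last letter of the word, the text would have been keyed in visual order rather than reading order and would need to be recaptured.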