University of Utah > Marriott Library > Special Collections > Middle East Library

Language Issues

 

Arabic script support

With the advent of Unicode support in many web browsers, displaying Arabic script in web pages is no longer an issue. Arabic script support in word processing software, however, is still problematic, but it need not be. The good news for users of recent versions of MS Office is that Arabic support comes as part of the package; it is simply not enabled automatically.

We strongly recommend that you follow these guidelines and enable Arabic script input on your computer. This added functionality is necessary in order to make full use of the electronic resources described on these pages.


Diacritics and transliteration

How does one produce Arabic transliteration in a document? It seems that most people resort to a special transliteration font, either freeware like McGill's Arabic transliteration font (http://www.mcgill.ca/islamicstudies/students/arabic/) or commercial products like Semitic Transliterator (http://www.linguistsoftware.com/st.htm). One disadvantage of all these products is that a special keyboard layout has to be learnt. More importantly, text produced with these fonts may not be legible in other applications, or to other users who do not have the same font installed. One possible solution is to use Unicode.

"Using Unicode" may sound technical and complicated, but it is actually very easy if you are using Office 2000 or a later version, because these packages support Unicode and include suitable fonts, such as Arial Unicode MS.

If you wish to insert a character with diacritics for the transliteration of Arabic [or Persian, or Urdu] into a document, all you need to do is go to:

  • INSERT > SYMBOL
    This will call up a table of all available glyphs in the current font. Arial Unicode MS has the widest coverage of the fonts supplied with Office, and it is recommended for transliterating Arabic. If it is not selected automatically, you can choose it manually from the Font dropdown menu at the top of the box.
  • You will note that the table offers a range of upper- and lowercase letters with macrons; these are so-called composite glyphs (because they consist of a letter + macron). For the best display, you should choose these composite glyphs for all characters with macrons.
  • The symbols for the glottal stop (hamza, ʾ) and ʿayn (ʿ) can be found at positions 02BE and 02BF respectively, with alternative shapes at 02C0 and 02C1.
  • Finally, you will find the combining diacritical marks, such as combining dot below (0323), combining diaeresis below (0324) and combining breve below (032E). Each of these combines with the letter immediately preceding the cursor, so type the base letter first and then insert the mark.
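
To check which character a code point such as 02BF or 0323 actually is, or to confirm that a letter followed by a combining mark is equivalent to the corresponding composite glyph, Python's standard unicodedata module can be used outside Word. The following is a minimal sketch for illustration only; the variable names are hypothetical and have nothing to do with Word's symbol table.

```python
import unicodedata

# Code points mentioned above (hexadecimal, as shown in Word's symbol table)
HAMZA = "\u02BE"      # glottal stop (hamza)
AYN = "\u02BF"        # ʿayn
DOT_BELOW = "\u0323"  # combining dot below
MACRON = "\u0304"     # combining macron

for ch in (HAMZA, AYN, DOT_BELOW, MACRON):
    # Code point, official Unicode name, and whether the character is a combining mark
    kind = "combining" if unicodedata.combining(ch) else "spacing"
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), kind)

# A base letter followed by a combining mark ...
decomposed = "h" + DOT_BELOW
# ... is canonically equivalent to the single composite letter;
# NFC normalization replaces the pair with that letter where one exists.
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed), f"U+{ord(composed):04X}")
```

The two spacing characters (ʾ and ʿ) stand on their own, whereas the combining marks attach to the preceding letter, which is why the order of typing matters.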

Please note the following:

  • It is, of course, very tiresome to insert each diacritic character in the way described above. To make things easier, you can define a shortcut key for each of the diacritic characters you use. To do that:
      • open the table of symbols as described above;
      • highlight the character of your choice, then click the button marked Shortcut Key at the bottom of the window;
      • a small window will appear, in which the character is displayed together with its code. The cursor will already rest in the field marked Press new shortcut key;
      • press a combination of modifier keys (CTRL, ALT, Shift) and a normal key to define your shortcut [e.g. CTRL + a for ā (a+macron)], then click ASSIGN and CLOSE.
  • Repeat these steps for each diacritic character. It is advisable to assign shortcut keys that are easy to remember [e.g. CTRL + a for ā (a+macron), CTRL + Shift + a for Ā (A+macron), CTRL + c for ʿayn, and so on]. Bear in mind that such combinations replace Word's default commands (CTRL + a is Select All and CTRL + c is Copy by default), so check whether the dialog reports an existing assignment before clicking ASSIGN. If all this sounds like too much work, remember the benefits, and consider that downloading and installing a font also takes time. Depending on the transliteration scheme you use for Arabic, you may need to define as few as nine characters (e.g. for the Library of Congress (http://www.loc.gov/catdir/cpso/roman.html) scheme).
  • You may also notice that with certain fonts the subscript dot is not neatly centred underneath the letter: while it appears in the correct position with s, z, d and h, it moves too far to the left with t. A simple fix is to put all transliterated text into italics.
  • The same procedure can be used to insert non-standard characters into Arabic-script text (e.g. dotless nūn, fāʾ with a dot below, etc.).
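
The shortcut keys described above amount to a table that maps a mnemonic to a Unicode character. The sketch below expresses such a table in Python and applies it to plain text, for instance to convert notes typed on a computer where no shortcuts have been defined. The mnemonics (a_ for ā, .h for ḥ, ` for ʿayn, and so on) are a hypothetical convention invented for this example; they are not part of Word or of the Library of Congress scheme. The final NFC step replaces letter + combining dot with the composite glyph where one exists, in line with the advice above to prefer composite glyphs; whether this also cures a misplaced dot depends on the font.

```python
import re
import unicodedata

# Hypothetical mnemonic -> character table, mirroring the idea of one
# shortcut key per diacritic character (not a Word or LoC convention)
MNEMONICS = {
    "a_": "\u0101",   # ā
    "i_": "\u012B",   # ī
    "u_": "\u016B",   # ū
    "A_": "\u0100",   # Ā
    ".h": "h\u0323",  # ḥ (base letter + combining dot below)
    ".s": "s\u0323",  # ṣ
    ".d": "d\u0323",  # ḍ
    ".t": "t\u0323",  # ṭ
    ".z": "z\u0323",  # ẓ
    "'": "\u02BE",    # ʾ (hamza)
    "`": "\u02BF",    # ʿ (ʿayn)
}

# Try longer mnemonics first so that a shorter key never pre-empts a longer one
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(MNEMONICS, key=len, reverse=True))
)

def transliterate(text: str) -> str:
    """Replace every mnemonic with its character, then apply NFC composition."""
    raw = _PATTERN.sub(lambda m: MNEMONICS[m.group(0)], text)
    # NFC merges e.g. "t" + U+0323 into the composite letter U+1E6D (ṭ)
    return unicodedata.normalize("NFC", raw)

print(transliterate("al-.ta`a_m"))
print(transliterate(".hadi_th"))
```

A real table would follow whichever transliteration scheme you use; the point is only that each mnemonic maps to exactly one Unicode character, just as each shortcut key does.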

 


Online Dictionaries (Arabic, Hebrew, Persian, Turkish, Urdu)

  • Sakhr's Translation and Dictionary site (http://translate.sakhr.com) (EN > AR; AR > EN). For subscribers only, but subscription is free, and the dictionary tool is excellent.

  • Morfix (http://www.morfix.com/) (EN > HE; HE > EN). An excellent online dictionary for modern Hebrew; includes a parser and can analyze complex compounds.
  • Aryanpour Persian Dictionary (http://www.aryanpour.com/) (EN > FA; FA > EN)
  • FarsiDic (http://www.farsidic.com/) (EN > FA; FA > EN)
  • Türk Dil Kurumu (http://www.turkish-media.com/sozluk/)
  • Turkish 1 (http://www.hazar.com/) (EN > TR; TR > EN; includes separate glossary for IT terminology)
  • English > Urdu (http://biphost.spray.se/tracker/dict/) (transliteration)
  • Urdu > English (http://urdudict.cjb.net/) (in transliteration only)

 

Fonts

Several websites offer free Arabic fonts for download. One of the more comprehensive lists (http://cgm.cs.mcgill.ca/~luc/arab.html) is maintained by Luc Devroye at McGill. The index (http://jeff.cs.mcgill.ca/~luc/fonts.html) of that site also lists fonts for other languages.

If you would like to modify an existing font (e.g. to add special characters not found elsewhere) or create a new font from scratch, you can do so with a simple font editor such as Softy (http://fonts.katgyrl.com/core/mgrs.htm), which can be downloaded for free. Modifying a font is not as difficult as it may seem, and may be worthwhile if you routinely need to represent archaic characters or unusual letter forms. However, text that relies on your modified font will not display correctly on other computers unless the modified font is installed there as well.


Text processing in non-Roman scripts

The steps that enable users to produce Word documents in Arabic have been outlined above. This section addresses special problems that one may encounter in processing Arabic text.

 

Directionality

The use of scripts with a different directionality (left-to-right or right-to-left) creates special problems, particularly when words or phrases are inserted into a text with a different directionality, and when punctuation or numbers are used. In some cases, both text and punctuation marks appear in the wrong order. The underlying processes (http://www.unet.univie.ac.at/aix/aixprggd/genprogc/layout_over.htm) that govern the arrangement of characters on the screen may explain why text behaves so oddly at times.

To avoid some of the more common problems with bidirectional texts, try not to paste Arabic text into an English one (or vice versa). It helps to type all text in one go (including insertions in a different script), and to set large chunks of Arabic text apart in a separate paragraph (to avoid problems with line wrapping).
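
Today the rules that arrange mixed-direction text on screen are standardized as the Unicode Bidirectional Algorithm, which gives every character a direction class. Letters have a strong direction, while punctuation and spaces are weak or neutral and take their direction from neighbouring text; this is why a full stop at the end of an English phrase embedded in Arabic can end up on the "wrong" side. The minimal sketch below only prints these classes with Python's unicodedata module; it does not reorder any text, and the sample labels are illustrative.

```python
import unicodedata

samples = {
    "Latin letter": "a",
    "Arabic letter": "\u0628",       # ب
    "European digit": "1",
    "Arabic-Indic digit": "\u0661",  # ١
    "full stop": ".",
    "exclamation mark": "!",
    "space": " ",
}

for label, ch in samples.items():
    # bidirectional() returns the Bidi_Class used by the Unicode Bidirectional Algorithm:
    # L = left-to-right, AL = Arabic letter (right-to-left), EN/AN = European/Arabic number,
    # CS = common separator, ON = other neutral, WS = whitespace
    print(f"{label:20} U+{ord(ch):04X} {unicodedata.bidirectional(ch)}")
```

Only the letters carry a strong direction (L or AL); everything else is resolved from context, which is why typing a mixed sentence in one go, rather than pasting pieces together, tends to give more predictable results.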

Layout

One of the peculiarities of classical Arabic (and Persian, Ottoman Turkish, Urdu...) poetry is that it is usually presented in lines of regular length, each divided into two hemistichs. When typing poetry of this kind, you may want to reproduce the same layout (http://students.washington.edu/irina/persianword/poetry.htm).

Keyboards

To type in Arabic with some speed, one has to memorize the positions of the characters on the keyboard. Initially, it may help to have an Arabic keyboard, which can be bought via the web. Alternatively, one can use transparent stickers with Arabic letters printed on them, which attach to the keys of a normal Latin keyboard. The cheapest (though visually not the most appealing) solution is to buy a sheet of small sticky labels and write out the Arabic letters oneself.


Unicode

Unicode is an encoding standard that aims to assign a unique number (a code point) to every character in all known writing systems. Thanks to this standard, text in Arabic script (or Japanese, Thai, Devanagari, etc.) can be displayed correctly by different programmes and on different systems, provided that the system supports Unicode and has a font containing the characters.
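
To illustrate what "a unique number for each character" means in practice, the sketch below uses Python to report the code point and official Unicode name of each letter in the Arabic word سلام, then shows its UTF-8 byte sequence, which is what a Unicode-aware program typically stores and exchanges. The last line imitates a program that wrongly assumes a legacy Western encoding (Latin-1) and misreads the same bytes. Python is used only for illustration; the same principle applies to Word, browsers and other programmes.

```python
import unicodedata

word = "\u0633\u0644\u0627\u0645"  # سلام (salām)

for ch in word:
    # Each character has one code point and one official name, independent of font or program
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

utf8 = word.encode("utf-8")  # the bytes as stored in a UTF-8 file or web page
print(utf8, len(utf8))       # 4 Arabic letters take 2 bytes each in UTF-8, i.e. 8 bytes

# Decoding with the same encoding restores the text exactly ...
print(utf8.decode("utf-8") == word)
# ... whereas a program expecting Latin-1 shows meaningless characters instead
print(utf8.decode("latin-1"))
```

The text itself is unchanged in both cases; only the program's assumption about the encoding differs, which is why the proviso "provided that the system supports Unicode" matters.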

More information is available directly from the Unicode Consortium (http://www.unicode.org/). Their site also offers lists of Unicode-enabled programmes and information on adjusting settings for correct display in a Macintosh or UNIX environment. A detailed discussion of resources for non-Roman scripts in Unicode (with particular reference to browsers and HTML editors) is available from Alan Wood (http://www.alanwood.net/unicode/).


April 21, 2006
