Romanization is the act of converting a non-Roman script, such as the Arabic one used by Persian, to Roman letters. There are a variety of ways to do this, which is why the name of Iran's former president, for example, is variously written as Ahmadinejad, Ahmadīnežād, or Ahmadinedschad. And because no authority has managed to establish a particular style as the official one, we will continue to deal with what some might call bike-shedding.
Not the nerds here at farsi.school though - we love to talk transliteration systems. Which is why, when it was time to pick one of those systems to use on the site, we instead wrote a bunch of code that lets us output any romanization style, and lets you pick the one you prefer - maybe the one you are already familiar with.
This article catalogs as many romanization schemes as we could find, and notes which of them we currently support. Be warned: There are a lot.
There are many minor differences between different romanizations - a different Latin character used here or there - but in the main, we can identify a handful of fundamentally different approaches:
The academic approach is to give each Persian letter a unique Latin equivalent. A scholar looking at a transcribed Persian word can tell exactly how it is spelled in the original. The four Persian letters pronounced /z/ will all need their own unique Latin letter - usually by adding diacritic marks, such as ẕ, ż or ẓ. This is a very strict approach to transliteration, and kind of annoying to non-scholars who are mostly interested in knowing what the word is, or maybe how it is pronounced.
The other side of this is what is sometimes called transcription - where we focus on representing the sound of a word. Because the Latin letters are pronounced differently in each language, such schemes differ from language to language. And there is no need to have four different /z/ characters - they are all pronounced the same.
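The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration, not the code that runs on the site: the table names and helper functions are hypothetical, and the Latin values chosen for the four /z/ letters are assumptions made for the example.

```python
# Hypothetical mini-tables covering only the four Persian letters pronounced /z/.
# The Latin values are assumptions chosen for illustration.
TRANSLITERATION = {"ز": "z", "ذ": "ẕ", "ض": "ż", "ظ": "ẓ"}  # one symbol per letter
TRANSCRIPTION = {letter: "z" for letter in TRANSLITERATION}  # one symbol per sound


def romanize(text, table):
    """Replace every character found in the table, leave the rest untouched."""
    return "".join(table.get(ch, ch) for ch in text)


def invert(table):
    """Return the Latin-to-Persian mapping, or None if two letters share a symbol."""
    inverse = {}
    for persian, latin in table.items():
        if latin in inverse:
            return None  # ambiguous: the original spelling cannot be recovered
        inverse[latin] = persian
    return inverse


print(invert(TRANSLITERATION) is not None)  # the strict table can be reversed
print(invert(TRANSCRIPTION) is None)        # the sound-based table cannot
```

Both checks print `True`: the strict table keeps enough information to get back to the Persian spelling, while the sound-based table deliberately throws it away.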
Unfortunately, not all Persian sounds map cleanly to the orthography of your language. For example, English doesn't have a single letter that could be used to represent the sound of the Persian letter ش. At this point, you have two options: First, you could use a two-letter combination - say sh - it's a good choice for English speakers, as they are familiar with that sound through words such as shoe. But that leaves you with a problem: you already used s and h individually to represent two other Persian letters. So now you've introduced a possible source of confusion: sh could be the letter ش, or it could be the two letters represented by s and h. And this is compounded by the fact that the Persian alphabet has slightly more letters than the English one, so you'll run out of letters at some point.
The alternative is using diacritic marks, for example š. Now you are not relying on character pairs, but you have introduced a learning curve because you need to explain to people how š is pronounced.
In short, it depends on the trade-offs you want to make, and your goal in transliterating.
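To make the digraph problem concrete, here is a minimal sketch that counts how many Persian letter sequences a romanized string could stand for. The two tables are hypothetical fragments containing only the three letters discussed above, and the function names are made up for this example.

```python
# Hypothetical fragments of two schemes: only the letters needed for the example.
DIGRAPH = {"س": "s", "ه": "h", "ش": "sh"}    # two-letter combination for ش
DIACRITIC = {"س": "s", "ه": "h", "ش": "š"}   # diacritic mark for ش


def romanize(text, table):
    """Romanize a string made up only of letters present in the table."""
    return "".join(table[ch] for ch in text)


def readings(latin, table):
    """Return every Persian letter sequence that romanizes to `latin`."""
    if latin == "":
        return [""]
    found = []
    for persian, token in table.items():
        if latin.startswith(token):
            rest = readings(latin[len(token):], table)
            found += [persian + tail for tail in rest]
    return found


print(len(readings(romanize("ش", DIGRAPH), DIGRAPH)))      # 2
print(len(readings(romanize("ش", DIACRITIC), DIACRITIC)))  # 1
```

With the digraph table, the romanized form of ش has two possible readings (the single letter, or the two letters written s and h). With the diacritic table there is only one.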
The systems listed here are sanctioned by some kind of governmental or international organization, standards-body, or an organization of some renown.
ث | ذ | ص | ض | ط | ظ | ع | ء | long u | vowel ی | short o | short e | silent و | silent alef |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
s̱ | ẕ | ṣ | z̤ | ṭ | ẓ | ' | ' | ū | ī | u | i | v | á |
This is a romanization standard defined by the US Library of Congress. The American Library Association seems to also have signed on, which is why this scheme is also known as ALA-LC romanization. It is not just for Persian - they define romanization tables for many languages.
The thing is - it's kind of annoying. They chose to represent the Persian vowel /e/, pronounced just like the English e in lesson, using the character i. In the same vein, the vowel /o/, pronounced like the English o in Oscar, must be written u. This is just very unintuitive. The book Languages of the World: Cataloging Issues and Problems offers an answer as to why: "The Library of Congress has chosen to romanize Persian as if it were Arabic. [...] LC's romanization of Persian does genuine damage in its transliteration of the source text in that it requires incorrect and inaccurate vocalizations. This romanization system is a perennial source of complaint among librarians of Iranian origin, Iranian readers, and scholars of Persian."
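The effect of that vowel choice is easy to see in a small sketch. The scheme names, the "phonemic" baseline, and the `render` helper are hypothetical. The input is assumed to be a plain phonemic spelling in which e and o stand for the short vowels, and only the /e/ to i and /o/ to u substitutions described above are modeled.

```python
# Vowel choices per scheme. The scheme names and the "phonemic" baseline are
# hypothetical; the ALA-LC entries model only the /e/ -> i and /o/ -> u choice.
VOWELS = {
    "phonemic": {"e": "e", "o": "o"},
    "ala_lc": {"e": "i", "o": "u"},
}


def render(phonemic_word, scheme):
    """Rewrite the short vowels of a phonemic spelling for the given scheme."""
    table = VOWELS[scheme]
    return "".join(table.get(ch, ch) for ch in phonemic_word)


for word in ("del", "gol"):
    print(word, "->", render(word, "ala_lc"))
# del -> dil
# gol -> gul
```

A reader who knows the words as del and gol has to mentally undo the substitution, which is the complaint quoted above.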
ث | ح | ذ | ص | ض | ط | ظ | ع | ء | long u | vowel ی |
---|---|---|---|---|---|---|---|---|---|---|
s̄ | ḩ | z̄ | ş | ẕ | ţ | z̧ | ‘ | ‘ | ū | ī |
This system is used by the United States Board on Geographic Names and the lovingly named Permanent Committee on Geographical Names for British Official Use. It is apparently used on the gov.uk website. And while it was first adopted in 1958, this isn't the stuff of spiderwebs. At the time of this writing, their reference document was last updated in 2019.
This is the system Iran submitted to the United Nations Group of Experts on Geographical Names as the proposed transliteration system for Persian-origin names. It was adopted in 1976, and there is a revised edition from 1998. However, that same UN commission approved a new system in 2012, replacing this older recommendation.
This is a newer system adopted by the UNGEGN, replacing the older system from 1976. The new system is based on what is being pushed within Iran itself these days. This report summarizes the situation. Also a fun read are the resolutions from the UN conference adopting the change, where in between "Discouraging the commercialization of geographical names" and a "Web-based course in toponymy", the subject of the Persian romanization system is addressed: "Recognizing that the romanization system for geographical names adopted by the Conference in its resolution I/13 is no longer used in the Islamic Republic of Iran", the conference decides to recommend this new system.
This scheme was developed by a German Oriental Studies organization called Deutsche Morgenländische Gesellschaft. Accordingly, their approach is scholarly. For example, each of the 4 Persian letters pronounced /z/ is given its own Latin equivalent: z, ẕ, ż, ẓ. It is a transliteration in the strict sense - by looking at the Roman characters, you should be able to tell how the word was written in Persian.
It has been standardized as DIN 31635.
These are romanization schemes that have been adopted by publishers for use in dictionaries, books or journals. We picked some of the notable ones, but this is always going to be a selection - almost every Persian book uses its own method, designed according to the author's preferences.
I think a lot of people are not aware that the English Wiktionary collects entries for all the languages in the world, and has a large number of entries for Persian. It is arguably one of the best English/Persian dictionaries available online. They give you the pronunciation of each word via transliteration, and you can have a look at the table they use.
Wikipedia mentions this as an early system used. I could not find this further documented.
ث | چ | خ | ذ | ژ | ش | ص | ض | ط | ظ | غ | ع | ء |
---|---|---|---|---|---|---|---|---|---|---|---|---|
ṯ | č | ḵ | ḏ | ž | š | ṣ | ż | ṭ | ẓ | ḡ | ʻ | ʻ |
This is the transliteration system used by the Encyclopædia Iranica today. It is a scholarly approach, giving each Persian letter a unique Latin equivalent.
ث | ذ | ص | ض | ط | ظ | ع | ء | long u | vowel ی | short o | short e |
---|---|---|---|---|---|---|---|---|---|---|---|
s̱ | ẕ | ṣ | ż | ṭ | ẓ | ʿ | ʾ | ū | ī | u | i |
If you want to publish an article in the International Journal of Middle East Studies, they require you to use their transliteration rules for Persian. (They also have their own system for Arabic and Turkish, and fall back to the Library of Congress transliteration charts otherwise.)
ج | چ | ح | خ | ذ | ش | غ | ع | ء | ق | و | consonant ی |
---|---|---|---|---|---|---|---|---|---|---|---|
ğ | č | h | x | z | š | ġ | ' | ' | ġ | w | j |
Used in the Langenscheidt German/Persian dictionary. Approximates the Persian phonology using the way the letters are pronounced in German.
Romanization systems make some attempt to choose Latin characters whose pronunciation matches the pronunciation of the Persian letters. Take the Persian vowel /u/. In English, u tends to be pronounced /a/, as in "unhappy". But the vowel in good sounds like a Persian /u/. This type of Pinglish therefore uses oo to represent that vowel, and you might see something like meedooneem - which as an English speaker you'll likely get right immediately.
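As a rough sketch of this idea, the substitution can be written as a tiny lookup table. The table and function name are hypothetical: oo for /u/ comes from the paragraph above, while ee for /i/ is only inferred from the example word, and the input is assumed to be a plain phonemic spelling.

```python
# Hypothetical Pinglish table. "oo" for /u/ comes from the text above;
# "ee" for /i/ is inferred from the example word and is an assumption.
PINGLISH = {"u": "oo", "i": "ee"}


def to_pinglish(phonemic_word):
    """Swap long vowels for the English-style letter pairs, keep everything else."""
    return "".join(PINGLISH.get(ch, ch) for ch in phonemic_word)


print(to_pinglish("midunim"))  # meedooneem
```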
These are attempts to define an alternative Latin-based script for Persian. The idea in all cases is to define a common standard for actually writing Persian text (not merely to transliterate). In some cases, the authors even envision the gradual replacement of the Arabic alphabet. I am not aware that any of them have gained a notable amount of adoption, though.
چ | ح | خ | ذ | ژ | ش | غ | ع | ء | consonant ی |
---|---|---|---|---|---|---|---|---|---|
c | h | x | z | ž | š | q | ∅ | ∅ | j |
This is the most recent of the alternative script projects. It is based on the UN 2020 scheme, and formalizes it further. On its website, you can find a handful of books that use the script, including the Divan of Hafez.
This is another paper by Jalal Maleki, the author of eFarsi - his second attempt to define a Latin Alphabet for Farsi. It's more succinct and changed its approach in certain ways. For example, it now rejects the (disputed) existence of diphthongs in Persian.
This is often referenced by other papers such as eFarsi as a predecessor, but not a lot of information is available. Its author seems to be Ali Moslehi Moslehabadi. The IPA seems to stand for International Persian Alphabet. Only the Internet Archive still has details available. That page is fairly explicit that Pársik is seen by the authors not as a transliteration scheme, but as an alternative alphabet for Persian in its own right.
Persik is a proposal by Esmail Nooriala, who seems like an interesting guy in general. Struggling to teach his children the Persian alphabet, he found a solution in writing in Latin letters and found lacking the way Persian was romanized differently all over the world.
UniPers (short for Universal Persian) is probably the most activist of the alternative scripts. Its authors desire to see the Arabic alphabet gradually phased out and replaced by a romanized system. It is also one of only two systems I know of that have their own logo.
The project website now seems to be down, but you can read an extensive Q&A on their Facebook Page. There is very little information about who is behind it (some websites claim the author's name is Mohamed Keyvan, but I could not source it).
چ | خ | ذ | ژ | ش | غ | ع | ء | ا |
---|---|---|---|---|---|---|---|---|
c | x | z | ẑ | ŝ | q | ’ | ’ | â |
eFarsi is described in a paper by Jalal Maleki from Linköping University. Like UniPers, there is an activist element to it, though its goals are much more modest - it simply believes Persian would benefit from a standard way to romanize the script. Its approach is also much more rigorous. Most romanization systems are little more than a table and some notes. The eFarsi paper, on the other hand, discusses in detail all kinds of corner cases, gives reasons for the choices made, and gives a great deal of thought to orthographic issues such as when to use dashes to separate compound words.
Jalal defined another scheme some years later called Dabire.
An even earlier attempt at a standard. The now offline website says that the project dates back to 1993. Its goal is stated as "to promote the romanization of the Persian Script & Alphabet". A Facebook Group also survives.