Bulgaria is directly affected by the current Fast Track rule that automatically disqualifies a Top-Level Domain (TLD) string when it is "confusingly similar" to another TLD, in this case an Internationalized Domain Name (IDN) country code Top-Level Domain (ccTLD). Bulgaria's application to register the *.бг Cyrillic IDN has already been declined twice (in late 2009 and in May 2010) on the premise that it looks confusingly similar to Brazil's *.br ASCII TLD.
Being a native Bulgarian, I did not see how these two strings are similar, or confusing for that matter, so some research into how ICANN determines that a string is confusingly similar was due. While reviewing the ICANN rules, it struck me that a very important part of the comparison had been left out, namely how these strings will be used.
Before going into this, let me start with a few words on where the problem lies, i.e. why ICANN finds these strings to be confusingly similar.
[Figure: Similarities and differences between Cyrillic and Latin characters]
The world population that speaks Cyrillic languages
Although a user who reads only the Latin alphabet may well find these strings quite similar, a person who reads Cyrillic will know which one is which. The Cyrillic letter б does not look like a b (see my comparison of the Latin and Cyrillic alphabets); it actually looks much more like the number 6. However, every person who speaks a Cyrillic language will recognize the difference between the two letters and the number, especially when they are put into context (again, more on this later).
The difference between the second letters of the two top-level domains, г and r, is not as noticeable in regular fonts, but it is very noticeable in hand-written and italic fonts. Either way, a person who knows a Cyrillic language will know the difference:
.бг vs .br
It therefore seems that it is the population speaking Latin languages that finds these strings confusingly similar. The ICANN rules for string similarity reflect that view and do not take into consideration the population that speaks Cyrillic languages.
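To make the letter comparison concrete, here is a minimal Python sketch that lists the Unicode code point and official character name of each letter in the two strings. It uses only the standard `unicodedata` module; the helper name `describe` is my own and not part of any ICANN tool. It only shows that the characters are distinct to a computer, which is a separate question from how similar they look to a human reader.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' entry per character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# The proposed Bulgarian IDN ccTLD and the Brazilian ASCII ccTLD.
for label in ("\u0431\u0433", "br"):  # "бг" and "br"
    print(label)
    for entry in describe(label):
        print("   ", entry)
```

The Cyrillic letters are reported as CYRILLIC SMALL LETTER BE and CYRILLIC SMALL LETTER GHE, while the Latin ones are LATIN SMALL LETTER B and LATIN SMALL LETTER R.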
The population that speaks Latin languages
A major point that ICANN is missing in its current evaluation criteria for confusingly similar strings is that TLDs, especially IDNs, are not reviewed in the context in which they will be used. When an IDN is reviewed in context, the evaluation of the string (and its alphabetical differentiation) becomes much clearer and easier. As an example, let's look at what a company's domain would look like in Latin and in Cyrillic:
company.br
компания.бг (компания (BG) = company (ENG))
I doubt that someone will mistakenly take one for the other. Still, let's analyze this in more detail and review some extreme similarity cases.
Brazil's ccTLD vs. Bulgaria's IDN
The main points that differentiate Brazil's ccTLD from the Bulgarian IDN are:
- A domain name consists of at least a top-level domain and a second-level domain. Since .бг and .br are just top-level domains, they are meaningless without a second-level domain. When comparing full domain names, the difference between the two is exceptionally obvious.
Example: company.br and компания.бг
- Brazil uses three-tier domains (host+gTLD+ccTLD), whereas Bulgaria uses two-tier domains (host+ccTLD), which makes the visual gap between the two even larger. As a result, a Brazilian user looking at a Bulgarian URL will know right away that this is not a Brazilian domain, even if the host uses the same letters.
Example 1: Vivo is one of Brazil's mobile network operators. Their site is vivo.com.br which in Bulgarian would be виво.бг. There is no resemblance between the two.
Example 2: An imaginary company called American Electric has registered ae.com as its main domain. Its Bulgarian domain would be ае.бг, which does not resemble its Brazilian counterpart ae.com.br, even though the host looks exactly the same. Even if Bulgaria starts using three-tier domain names (host+gTLD+ccTLD), this URL would look like ае.ком.бг, which is also decidedly not the same as the Brazilian domain.
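The American Electric example can also be checked at the DNS level. The sketch below converts both names to their ASCII-compatible (Punycode) form using Python's built-in `idna` codec, which follows the older IDNA 2003 rules; the helper name `to_ace` is my own. It is an illustration of how the names are stored in the DNS, not a description of how ICANN's similarity panel works.

```python
def to_ace(domain: str) -> str:
    """Convert a domain name to its ASCII-compatible (Punycode) form."""
    return domain.encode("idna").decode("ascii")


latin = "ae.com.br"                      # Latin letters only
cyrillic = "\u0430\u0435.\u0431\u0433"   # "ае.бг", Cyrillic letters only

print(to_ace(latin))     # stays as it is, because it is already ASCII
print(to_ace(cyrillic))  # every label becomes an "xn--" string; бг becomes xn--90ae

# The two hosts look alike on screen but are different strings.
print(latin.split(".")[0] == cyrillic.split(".")[0])  # False
```

In other words, the resolver never sees two similar names; the similarity exists only for the person reading the address bar.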
The Extreme Case of string similarity
IMPORTANT NOTE: The analysis below is deliberately excessive, because such a case could happen. It underlines the importance of having regulations in place in case such situations arise in the future. This analysis presumes that Brazil uses two-tier domain names (host+ccTLD), and that there is a company whose domain string looks exactly the same in Cyrillic and Latin letters.
- If a non-native English speaker (such as a Frenchman or a Spaniard) sees ae.бг, but knows the context where the URL is used/mentioned, s/he will most probably know that this is a Cyrillic/Bulgarian domain. No action here.
- If a non-native English speaker (such as a Frenchman or a Spaniard) or a native English speaker sees ae.бг without knowing the context where the domain is used/mentioned, s/he may think that it is in Latin.
In such cases, regulation (which is ICANN's strength) should be in place to control the use of these strings and to ensure that a single registrant owns visually similar domains. In addition, browser vendors need to update the error message shown when ae.бг is entered in Latin letters in the browser and no such domain exists. The error message should reflect that the domain may be in Cyrillic. Here is an example of a possible error message:
Server not found
Firefox can't find the server at www.ae.bg.
- Check the address for typing errors such as ww.example.com instead of www.example.com
- Check whether the address should be in Cyrillic, such as ае.бг instead of ae.br
- If you are unable to load any pages, check your computer's network connection.
- If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the Web.
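As a rough illustration of how a browser could produce the Cyrillic hint in the message above, here is a minimal Python sketch. Everything in it is an assumption of mine: the tiny lookalike table, the TLD pair and the function names are hypothetical, and no browser exposes such an API. A real implementation would need a far more complete confusables table.

```python
from typing import Optional

# Hypothetical, deliberately tiny table of Latin letters and their
# Cyrillic lookalikes. Written with escapes so the scripts cannot be mixed up.
LOOKALIKES = {
    "a": "\u0430",  # а
    "e": "\u0435",  # е
    "o": "\u043e",  # о
    "p": "\u0440",  # р
    "c": "\u0441",  # с
    "x": "\u0445",  # х
}

# Hypothetical TLD pairs that a reader might confuse visually.
TLD_LOOKALIKES = {"br": "\u0431\u0433"}  # бг


def cyrillic_candidate(hostname: str) -> Optional[str]:
    """Return a Cyrillic lookalike of an ASCII hostname, or None if there is none."""
    *labels, tld = hostname.lower().split(".")
    if tld not in TLD_LOOKALIKES or not labels:
        return None
    converted = []
    for label in labels:
        # Give up as soon as one letter has no Cyrillic lookalike.
        if any(ch not in LOOKALIKES for ch in label):
            return None
        converted.append("".join(LOOKALIKES[ch] for ch in label))
    return ".".join(converted + [TLD_LOOKALIKES[tld]])


def hint(hostname: str) -> Optional[str]:
    """Build the extra error-page line, or None when no hint applies."""
    candidate = cyrillic_candidate(hostname)
    if candidate is None:
        return None
    return (
        "Check whether the address should be in Cyrillic, "
        f"such as {candidate} instead of {hostname}"
    )


print(hint("ae.br"))
print(hint("vivo.com.br"))  # None: "v" has no entry in the table
```

The hint is only offered when every letter of the typed address has a lookalike, so ordinary Brazilian addresses such as vivo.com.br would not trigger it.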
ICANN Staff's reasoning on *.бг
ICANN staff's reasoning for declining Bulgaria's application is that "internet is a world resource and uniqueness is most important." However, this decision will have an impact on at least 7 million Bulgarians, not to mention their relatives and the Bulgarian-speaking population around the world. In addition, with the IDN ccTLD Fast Track Process ICANN wants to open the Internet to languages based on scripts other than Latin in order to make it more accessible, but at the same time it imposes limitations on that openness, thus effectively contradicting itself.
The good news is that ICANN is open to feedback (I have already submitted these comments to ICANN), so hopefully these findings will make it into the ccTLD application and Fast Track review later this year. I would nevertheless appreciate your thoughts on this, so please leave a comment.
The history of the Cyrillic alphabet
To finish off, I would like to give you a little background on the Cyrillic alphabet.
The Cyrillic script is an alphabet developed in the 9th century and traditionally attributed to two brothers, Cyril and Methodius, who were later venerated as saints in the Eastern Orthodox Church. The Cyrillic alphabet was first adopted by Bulgaria, my home country, and because of that Cyrillic is believed to be a Bulgarian alphabet, although this is debatable. The Cyrillic script is used in the Slavic nations of Belarus, Bosnia, Bulgaria, Russia, Serbia, Macedonia, Montenegro, and Ukraine, and in the non-Slavic nations of Moldova, Kazakhstan, Uzbekistan, Kyrgyzstan, Tajikistan, Tuva, and Mongolia. With the accession of Bulgaria to the European Union on 1 January 2007, Cyrillic became the third official alphabet of the European Union, following the Latin and Greek alphabets. It is also one of the few alphabets that have their own holiday (May 24th), which is celebrated internationally.
Written by Vassil Petev, Unit Manager