The way the world’s languages are displayed digitally can be a topic of raging, if somewhat arcane, debate. Coders and designers may disagree over whether a particular script has differentiated upper and lower cases, or which set of accents it needs. But the latest discussion, about emoji (the icons used in electronic communications to convey meaning or emotion, such as smiling yellow faces), has been stickier than most.

It is all to do with Unicode. This is a standard that assigns a number and a corresponding description to each character of the world’s writing systems, as well as to many more things such as mathematical symbols. It allows different operating systems and applications to show the same characters across thousands of languages. So a WhatsApp message written in, say, Sanskrit on an iPhone in California can be read by a recipient using a Windows laptop in Kathmandu. The standard is managed by a non-profit, the Unicode Consortium, which began operations in the early 1990s. It regularly adds more characters to the list, whether for ancient languages that have letters academics want to use, or for modern ones with relatively few speakers or with so many characters that some do not yet have an entry on the list. The Script Encoding Initiative, which was established by the University of California, Berkeley, has a list of 100 scripts from South and South-East Asia, Africa and the Middle East that have yet to be incorporated into Unicode.
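To make the idea of "a number and a description" concrete, here is a minimal sketch in Python, using the character database that ships with the language. The helper name `describe` and the three sample characters are chosen purely for illustration; the Devanagari letter stands in for a script commonly used to write Sanskrit.

```python
import unicodedata


def describe(char: str) -> str:
    """Return the code point and official Unicode name of one character."""
    # ord() gives the number Unicode assigns to the character;
    # unicodedata.name() gives the description that goes with it.
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


# A Latin letter, a Devanagari letter and a mathematical symbol
# each have their own number and description.
for char in ["A", "\u0905", "\u221E"]:
    print(describe(char))
# prints:
# U+0041 LATIN CAPITAL LETTER A
# U+0905 DEVANAGARI LETTER A
# U+221E INFINITY
```

Any device that follows the standard maps these numbers to the same characters, which is why the iPhone and the Windows laptop agree on what was written.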

The standard started listing codes for emoji in 2010. After emerging in Japan in 1999, emoji spread worldwide in the 2000s, but operating systems and messaging apps shared no common numbering or representation scheme. So Windows, Android and iOS not only use different graphical renditions of those smiling yellow faces (and rice bowls, etc), but at one time also coded them with different numbers. An emoji texted from one system might appear as a completely different emoji, or even as a blank rectangular box, on arrival. It took the Unicode Consortium to standardise the numbers used, even though the specific appearance still depends on the receiving platform or application, a group that now includes Slack, Facebook and Twitter. The difficulty for Unicode is that demand for more emoji is strengthening. This is driven by the likes of Apple and Google, as well as by businesses, industries, individuals and interest groups keen to see a particular symbol represented. The American state of Maine recently supported a lobster proposal. Proposals for emoji put to the Unicode Consortium must be discussed and voted upon.
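The split between a standardised number and a platform-specific picture can be illustrated with a short Python sketch. It assumes the message travels as UTF-8, a common encoding of Unicode text; the names `PILE_OF_POO` and `wire_form` are invented for this example and do not come from any messaging app.

```python
import unicodedata

# The smiling pile of poo, identified by its Unicode number alone.
PILE_OF_POO = "\U0001F4A9"


def wire_form(char: str) -> bytes:
    """Encode a character as the UTF-8 bytes that would be transmitted."""
    return char.encode("utf-8")


# The sender transmits bytes derived from the number, not a picture.
sent = wire_form(PILE_OF_POO)

# The receiver recovers the same number and looks up its own artwork for it.
received = sent.decode("utf-8")
print(f"U+{ord(received):X}", unicodedata.name(received), sent.hex())
# prints: U+1F4A9 PILE OF POO f09f92a9
```

Nothing in those bytes says what the icon should look like. That choice is left to the receiving platform, which is why the same emoji is drawn differently on different devices.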

Some of the consortium’s members worry that the focus on emoji is distracting from more scholarly matters and delaying the addition of new characters from scripts both ancient and modern. Proposals for frowning piles of poo (the smiling version already exists) drew particular ire, and were described as “damaging…to the Unicode standard” by Michael Everson, a typographer. The concerns are exaggerated, however, says Mark Davis, co-founder of the Unicode Consortium. While emoji attract a disproportionate share of media attention, the consortium has set up a separate committee to handle most of the work on them. Mr Davis also notes that the focus on emoji has had beneficial side-effects. Many software products previously lacked Unicode support. But designers keen to incorporate emoji installed upgrades that, along the way, brought support for hundreds of languages that would otherwise have been ignored.

Correction (December 18th, 2017): A previous version of this explainer used the word emoticon as a synonym for emoji. The two are different things. The reference to emoticons has been removed. Sorry :(