Just How do you Type in Chinese? Introduction to Chinese Input Methods

Did you know that one has to learn specifically how to type in Chinese?

Chinese keyboard with various root-based method keys drawn in

Ah, the ultimate question for the computer age. I’ve been blogging for almost six years now. As my non-Chinese-speaking friends start to learn of my ventures (spoiler alert: there isn’t much), people seem particularly interested to know how Chinese characters can magically appear on-screen from an English keyboard. Having done some research, I can tell you it’s more varied and complicated than I had imagined!

As all words in English, outside of Internet slang, can be composed of a ‘mere’ 26 letters, the designers of the Latin typewriter keyboard had a relatively straightforward mission: the job is done so long as all 26 letters are represented in some logical order. Unfortunately, the printers and computer scientists of the Chinese-speaking world were dealt a far more daunting task. The language contains no fewer than 50,000 distinct characters, of which at least a couple of thousand are needed for the most basic of communications. Therefore, unless you wished for a keyboard running the entire length of the Great Wall of China, quite clearly the QWERTY approach won’t cut it.

Instead, Chinese speakers rely on so-called ‘input methods’, which break down Chinese characters so that they can be entered through a Latin keyboard. At least a dozen methods are in everyday use, each with its own merits and shortcomings. While some methods are popular, or even ubiquitous, in a particular region, there is no one standard for all.

Thus ‘learning to type’ has a further special and academic meaning for Chinese speakers. Input methods are frequently taught at schools, and mastering at least one way of inputting Chinese forms a part of everyday professional life. In the same vein as a driving license, it is something that an employer would ask for in job advertisements.

Most of the input methods fall into one of two categories, based on ‘phonetics’ or ‘root shapes’:

Phonetics

Do you get it? I don’t get it.

Phonetics-based methods are the most popular in mainland China and Taiwan. Their basic premise is transliteration. That is to say, you spell out the pronunciation of individual Chinese characters using the Latin alphabet. For example, for 媽媽 (“mum”), you’d type “ma ma”. In turn, the input method will conjure up the character you require. Simple.

The advantages of phonetics-based methods stem from their ease of learning. The transliteration you require has been standardised. All children in mainland China will have been taught the Pinyin system of romanisation that forms the basis of the most popular input method there. Most Western learners of Chinese would also be taught Pinyin, due to its widespread popularity in mainland China and, well, because it’s still much easier to deal with the Xs and Zs common in Pinyin than the Chinese characters themselves.

A similar-in-purpose but utterly different system called Zhuyin exists across the Taiwan Strait, in another typical episode of contrast between China and Taiwan.*

Assuming you are already familiar with Pinyin or a similar system, a phonetics-based method requires no further learning. Indeed, you do not even need to know exactly how to write the character. All you need is to know what the character sounds like and to recognise it in a list when prompted. Therefore, it is no wonder that most English-speaking learners of Chinese rely mostly on phonetics.

However, a method like Pinyin does suffer from two inherent shortcomings. Firstly, many Chinese characters are homophones, in that they are pronounced the same way despite being written differently and meaning very different things. While this isn’t a phenomenon limited to Chinese (an English example would be ‘new’ and ‘knew’), the connection (or lack thereof) between how a Chinese character is written and how it is pronounced leads to some very drastic and confusing examples, along with a plethora of pun opportunities.

For example, 悲劇 (“tragedy”) and 杯具 (“cups”) sound the same in Mandarin; in fact, the latter has become euphemistic slang for the former. Alternatively, some words are pronounced the same way and written ever so slightly differently while carrying an entirely different meaning: just look at 過度 (“excess”) and 過渡 (“transition”).

As a result, when inputting in Pinyin you’d often have to select the exact character you require from a sea of options. Even with the emergence of predictive text, which has somewhat alleviated the matter, typing phonetically remains the much slower option of the two.
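For the programmers among you, here is a rough sketch of that "sea of options" problem. It is a toy, not how any real input method is built: the table only holds the examples from this post, and the names CANDIDATES, lookup and choose are my own inventions for illustration.

```python
# Toy candidate table: Pinyin typed without tones -> words sharing that sound.
# Only the examples from this post are included; a real input method would
# ship a dictionary with many thousands of entries (an assumption on my part).
CANDIDATES = {
    "mama": ["媽媽"],
    "beiju": ["悲劇", "杯具"],
    "guodu": ["過度", "過渡"],
}


def lookup(typed):
    """Return every word matching the typed Pinyin, ignoring spaces and case."""
    key = typed.replace(" ", "").lower()
    return CANDIDATES.get(key, [])


def choose(typed, index):
    """Pick one candidate by its position in the list, as the typist would."""
    return lookup(typed)[index]


print(lookup("ma ma"))     # one candidate, nothing to choose
print(lookup("beiju"))     # two homophones, the typist has to pick
print(choose("guodu", 1))  # picking the second option from the list
```

The extra choose step is the whole point: every homophone costs the typist a glance at the screen and another keystroke.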

In addition, speakers of other Chinese languages/dialects complicate everything. Pinyin refers to the Mandarin pronunciation and thus only makes sense if you speak Mandarin. For Cantonese speakers, many of whom were taught neither Pinyin nor the Cantonese Jyutping,** this means you’d have to learn both Mandarin and the Pinyin system to type phonetically, negating its biggest advantage.

This is partly why Pinyin has never been as popular in Cantonese-speaking Hong Kong as the following…

Root-based (Cangjie)

Instead of a phonetics-based approach, the other major method focuses on how a Chinese character is constructed. The idea is simple enough: break characters down into parts, then assign each part a keyboard key (in a seemingly random order). You then type these keys in a particular order (usually top left to bottom right) to arrive at your exact character.

This is best explained with an example. In Cangjie, the most common of all root-based input methods, the 24 most basic parts are each represented by a keyboard key (X and Z are the sad omissions). The easiest of the lot, a horizontal line (一, meaning “one”), is assigned M.

Let’s say you want to type 明 (“brightness”). You can see that the character is a combination of 日 (“sun”, assigned key: A) and 月 (“moon”, assigned key: B). Following the order of top-left corner first, you would input 明 by typing… AB!
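If it helps to see it as code, the 明 example boils down to two lookups. This is only a sketch under my own assumptions: the three key assignments are the ones mentioned in this post, while the DECOMPOSITION table and the cangjie_code function are hypothetical names I made up, and the breakdown of 明 is written in by hand rather than worked out by the program.

```python
# Root -> key assignments mentioned in this post (Cangjie has 24 in total).
ROOT_KEYS = {"日": "A", "月": "B", "一": "M"}

# Hand-written breakdown of a character into roots, ordered top left to
# bottom right. A real input method stores the finished codes instead.
DECOMPOSITION = {"明": ["日", "月"]}


def cangjie_code(character):
    """Return the key sequence for a character in the toy tables above."""
    # A character that is itself a basic root is typed with its own key.
    parts = DECOMPOSITION.get(character, [character])
    return "".join(ROOT_KEYS[part] for part in parts)


print(cangjie_code("明"))
print(cangjie_code("一"))
```

Anything outside these tiny tables raises a KeyError, which is a fair summary of how Cangjie feels before you have memorised the shapes.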

Unfortunately, Chinese characters do not fit perfectly in this mould. There are at least another seventy-odd shapes that routinely form popular characters. The problem is resolved by ‘auxiliary shapes’: parts that ‘sort-of-look-like’ a basic shape are grouped under the same key. Another example to demonstrate: the top-left bit of the character 祭 (“sacrifice”) is similar enough to 月 (“moon”, as above) and is likewise grouped under B. Complicated, right?

You’ve probably gathered that root-based methods have a steep learning curve. Indeed, their difficulty to master is their major weakness. Just to be able to begin typing, you’d need to learn ALL basic shapes (often via keyboard covers with them inscribed), plus ALL auxiliary shapes (many of which require leaps of imagination to see the ‘similarity’), AND the general order in which a character is written by hand. These are very high thresholds even for fluent Chinese speakers; months of training may be required just to perform a basic computing function.

In addition, any root-based method relies on the exact shape of a character, which can be radically different in Traditional and Simplified Chinese (there’s a whole other piece on this; suffice it to say they are two separate scripts). Indeed, Cangjie was a Taiwanese invention originally intended for the Traditional script only, and it was not until decades later that a Simplified version was created.

You may ask, why do people bother with such a challenging and cumbersome method? The answer lies in its speed. Once you’ve mastered Cangjie, you can type the specific character without selecting it from a list, or paying any attention to the screen at all; as a result, it is hands-down the quickest method available.

Root-based methods can be helpful to Chinese learners too. Since the method breaks down characters based on their appearance, if (again, big if) you are familiar with all the shapes, then, in theory, you can type any Chinese character that you ever see in your life.

As a compromise between speed and difficulty, Quick is the little cousin of Cangjie. Instead of typing in every single shape found in the character, you’d only type the first and last shapes, whereupon a list of characters fitting the criteria pops up. It’s much easier to learn but consequently also slower.

It’s how I type.
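Here is a toy sketch of the difference between the two, again under my own assumptions. The table is a handful of Cangjie codes built from repeated roots, so they are easy to check by eye; 木 ("wood") sitting on D is not mentioned above and is my addition, and the function names are made up for illustration.

```python
# A few full Cangjie codes. 明 is the example from this post; the others are
# simple stacks of 日 (A) and 木 (D), added by me for illustration.
CANGJIE_CODES = {
    "明": "AB",
    "昌": "AA",
    "晶": "AAA",
    "林": "DD",
    "森": "DDD",
}


def quick_code(full_code):
    """Keep only the first and last key of a full Cangjie code."""
    if len(full_code) == 1:
        return full_code
    return full_code[0] + full_code[-1]


def quick_candidates(typed):
    """List every character whose first and last keys match what was typed."""
    typed = typed.upper()
    return [ch for ch, code in CANGJIE_CODES.items() if quick_code(code) == typed]


print(quick_candidates("DD"))  # 林 and 森 both start and end with D
print(quick_candidates("AB"))  # only 明 in this tiny table
```

In full Cangjie, DD and DDD are different characters and no list is needed; in Quick, both collapse to the same two keys and you pick from the list, which is exactly the trade of speed for ease described above.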

If you are unfamiliar with or uninterested in either method, there is always the third, best option. GOOGLE TRANSLATE.

* I have absolutely no idea how Zhuyin functions.

** I had absolutely no idea what Jyutping was until researching for this piece.