How to get CAP (Cantonese IME w/ swearwords) to coexist with mozc (Japanese IME) in Linux

I’ve been using CPIME for ages and I’m comfortable with Sidney Lau’s phonetic scheme. Jyutping is unnatural to Hongkongers because we do not consider ‘j’ a ‘y’ sound like Germans do.

However, since Windows 10, there aren’t many choices when it comes to a Cantonese IME that defaults to Sidney Lau’s scheme and still accommodates common swearwords (including the most common ones that are technically incorrect). The only reasonable choice is Andrew Choi’s CAP. I will write about how to get it working on Windows 10 in another blog post.

There aren’t many choices for Linux either. There’s RIME, but it’s super hard to install, and Sidney Lau’s phonetic scheme is buried so deep that it can only be selected with shortcut keys during IME composition mode. The deal breaker is the lack of swearword support. Being able to type 林鄭我𨳒你老母 is essential for every self-respecting Cantonese speaker. 「屌」你老母 just won’t cut it. Lol.

CAP takes quite a bit of wrestling to install in Windows (covered in another post) and quite a bit of wrestling to get working in Linux. Once you get it working, it’s a very powerful Cantonese IME that allows superfast typing unless you plan to play with words (玩食字). I just can’t praise the IME design enough, and I was willing to deal with the quirks that curb its wider adoption.

Andrew Choi made a few releases on his blog page in 2012 (ibus), 2015 (ibus), 2018 (fcitx), 2019 (fcitx4) and 2021 (fcitx4). For the Linux version, this blog post is only concerned with the 2021 release (the latest at the time of writing).

For Linux CAP, installing the debian package is the easiest part:

sudo dpkg -i fcitx-cap_1.0.0_amd64.deb

The thorns are

  1. get CAP to show up on the list of valid input methods
  2. fend off fcitx5, which is trying to kill CAP
  3. deal with IME settings state corruption (especially when working with other IME)
  4. live with being unable to select characters from subsequent pages of selection candidates

CAP is not immediately available as an IME out of the box (even after installing the .deb package).

You will need to add Chinese to “Install / Remove Language” under “Language Support” to get anything to show up there!

Fend off fcitx5

fcitx5 was recently released and Ubuntu is aggressively trying to push it onto every user. However, its very existence kills the currently available CAP which is written for fcitx4 as of the 2021 release. This means you will have to give up fcitx5 if you want to use CAP!

Fcitx5 is considered a replacement for fcitx4, so whenever Ubuntu sees that you have fcitx installed (which is likely fcitx4), it’ll tempt you into installing fcitx5. DO NOT ACCEPT THE INVITATION! fcitx5 does not coexist with fcitx4: your fcitx4 will be removed the moment you install fcitx5. To go back to fcitx4, you first have to remove fcitx5 completely and then reinstall fcitx.

What makes things more complicated is that Ubuntu’s GNOME Language Support GUI keeps prompting you to install fcitx5 whenever you start it or do something with it, such as installing new languages (which is required as the first non-obvious step). It’ll typically try to deceive you into installing fcitx5 with a harmless-looking dialog box, but if you open up the details it’s fcitx5, which will cockblock CAP.

However, if you already have fcitx (fcitx4) installed when you perform the first step, adding new languages (required to get CAP to work) will come bundled with an upgrade to fcitx5! It’d be super frustrating. So you can choose between two paths:

1) Concede to fcitx5 and downgrade to fcitx4

  • install the languages first (with fcitx5 IMEs),
  • remove fcitx5
  • install fcitx4

2) Prevent fcitx5 in the first place

  • remove fcitx4,
  • install the languages (no fcitx IMEs)
  • install fcitx4

Remove fcitx4

sudo apt purge fcitx

Remove fcitx5

sudo apt purge 'fcitx5*'
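Once fcitx5 is purged, one possible way to keep apt from pulling it back in is an apt pin with a negative priority. This is only a sketch and not something I have tested against the Language Support GUI: the file name no-fcitx5.pref and the helper name write_fcitx5_pin are made up, and it is an assumption that Language Support copes gracefully with the pinned packages. The snippet only writes the pin to a local file so you can inspect it before copying it into place.

```shell
#!/bin/sh
# Sketch only: write an apt pin that refuses to install any fcitx5 package.
# The file name "no-fcitx5.pref" is a made-up example, not from the CAP docs.
write_fcitx5_pin() {
    out="${1:-./no-fcitx5.pref}"
    cat > "$out" <<'EOF'
Package: fcitx5*
Pin: release a=*
Pin-Priority: -1
EOF
    # Print the path so the caller knows where the file went.
    echo "$out"
}

write_fcitx5_pin ./no-fcitx5.pref

# After checking the file, copy it into apt's preferences directory:
#   sudo cp ./no-fcitx5.pref /etc/apt/preferences.d/no-fcitx5.pref
# and see what apt now thinks of the package:
#   apt-cache policy fcitx5
```

A negative pin priority tells apt never to install the matching packages, so any prompt that wants fcitx5 should fail instead of silently replacing fcitx4. Delete the file from /etc/apt/preferences.d/ to undo it.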

Install languages

If you already have fcitx installed (path #1), you’ll have to click yes and live with fcitx4 being upgraded to fcitx5, which you’ll have to destroy later before reinstalling fcitx4.

If you have already removed fcitx (path #2), Language Support will only install the IMEs for other systems, such as ibus, that are associated with the installed languages.

Install fcitx4 AND activate it

sudo apt install fcitx
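With all the purging and reinstalling, it is easy to lose track of which fcitx actually ended up on the system. Here is a small sketch that asks dpkg and prints a verdict. The helper names are made up, and the CAP package name "fcitx-cap" is an assumption inferred from the .deb file name.

```shell
#!/bin/sh
# Sketch: report whether fcitx4, fcitx5 and CAP are installed.
# Assumption: the CAP package is named "fcitx-cap" (inferred from the .deb file name).

# Succeeds only if dpkg reports the package as fully installed.
pkg_installed() {
    command -v dpkg-query >/dev/null 2>&1 || return 1
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'install ok installed'
}

# Print "yes" or "no" for a package name.
yes_no() {
    if pkg_installed "$1"; then echo yes; else echo no; fi
}

# Turn two yes/no answers (fcitx4, fcitx5) into a one-line verdict.
fcitx_verdict() {
    case "$1:$2" in
        yes:no) echo "ok: fcitx4 only" ;;
        *:yes)  echo "problem: fcitx5 is installed, purge it and reinstall fcitx" ;;
        *)      echo "problem: fcitx4 is missing, run: sudo apt install fcitx" ;;
    esac
}

fcitx_verdict "$(yes_no fcitx)" "$(yes_no fcitx5)"
echo "CAP (fcitx-cap) installed: $(yes_no fcitx-cap)"
```

If the verdict is "ok", the remaining step is selecting Fcitx 4 as the IME system. On Debian/Ubuntu, im-config -n fcitx should be the command-line counterpart of that selection, though I have only used the GUI.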

Remember to select the installed Fcitx 4 as your IME system (not ibus, etc, or none):


Now CAP is on the list of available IMEs in fcitx-configuration


Learn the new shortcut keys that are different from Windows

One default that’s hard to guess out of the box is how to move from page to page. It used to be PageUp/PageDown (on Windows), but CAP follows fcitx’s global configuration for paging, which uses the unshifted ‘+’/‘-’ keys. In practice that means ‘=’/‘-’, because ‘+’ is the shifted character on that key while the unshifted key is what’s intended. I know, this is confusing!
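If you’d rather read the paging keys than guess them, fcitx4 keeps its global settings in a plain-text file, which as far as I know is ~/.config/fcitx/config. The sketch below just lists whatever page-related entries that file contains. The helper name show_paging_keys is made up, and I have not verified the exact key names used inside the file, so it matches loosely on the word "page".

```shell
#!/bin/sh
# Sketch: list page-related hotkey entries from fcitx4's global config.
# Assumption: the config lives at ~/.config/fcitx/config (fcitx4 default).
show_paging_keys() {
    cfg="${1:-$HOME/.config/fcitx/config}"
    if [ ! -r "$cfg" ]; then
        echo "no readable config at $cfg" >&2
        return 1
    fi
    # Case-insensitive match, so commented-out defaults show up too.
    grep -i 'page' "$cfg"
}

# Harmless if the file is missing or has no matching lines.
show_paging_keys || true
```

Pass a different path as the first argument if your config lives elsewhere. The same values are editable in fcitx-configuration under Global Config, which is the safer place to change them.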

IME switching follows the OLD Windows shortcut keys (like in the Windows 98 and XP days):

  • Ctrl+Space means turning IME on/off (global sense),
  • Ctrl+Shift changes IME languages (newer Windows use Alt+Shift by default)
  • Shift to temporarily disable/enable the IME (i.e. English mode) but stay within the language state

More customizations to get it closer to Windows IME behavior

CAP follows the global config settings in fcitx, unlike mozc (Japanese IME), which sometimes plays by its own rules and behaves like its Windows counterpart.

If you are used to CPIME’s vertical lists, you can change it in the ‘Appearance’ tab.


CAP candidate selection quirks when used with mozc (bug?)

For some reason, when both CAP and mozc are freshly installed, the first time you bring up the candidate list in mozc by pressing space/tab, the candidate list will disappear!

I installed and uninstalled fcitx, mozc and CAP and narrowed the bug down to this reproducible path. My suspicion is that there’s a parameter for the candidate selection shortcut (usually ‘1’~‘9’+‘0’) that’s not exposed in fcitx-configuration and was being changed by mozc. This guess got me closer, as I was able to play around with mozc’s config and found a candidate selection shortcut option.

Note that mozc only has a maximum of 9 shortcut items (instead of 10, which means the ‘0’ key is not available as a shortcut key), even though fcitx-configuration’s Global Config has a different idea (CAP can use the 10th key, aka ‘0’, as a candidate selection shortcut).

I noticed that after switching mozc from ‘1’-‘9’ to ‘a’-‘l’ (or no shortcut) mode and activating the change by using the space key to expand the selection (this is necessary or the change won’t happen), I get the ‘1’-‘9’+‘0’ candidate selection shortcuts when I go back to CAP. I also noticed that if I mess with the maximum number of suggestions in mozc a few times, I can get CAP into an undefined state where it shows the candidate selection shortcuts for the first few candidates but not the rest.

I also noticed CAP has one consistent bug: candidate selection (not just the keys) ONLY WORKS FOR THE FIRST PAGE! When I try to use the candidate selection shortcuts or click on a character with the mouse on subsequent pages, it only commits the current word choice, disregarding the selection!


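Advanced classical Taiwanese swears: 100年前台灣人都罵什麼髒話?絕對不只「X你娘」,一口氣譙完 (What swearwords did Taiwanese people use 100 years ago? Definitely not just 「X你娘」; cursing them all out in one go)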

https://www.storm.mg/lifestyle/225114

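Compared with 「幹你開基祖」, which has a long history and goes straight for the first-generation ancestor who settled in Taiwan, 「幹你娘」 really pales. The swearwords that arose among the Han immigrants to Taiwan in the Qing era were even more colorful. Because people from Quanzhou couldn’t stand people from Zhangzhou, there was 「幹你大聖王」 (大聖王 refers to 開漳聖王), which goes straight for a deity. Impressive. (Today it would be roughly like being annoyed with Christians and angrily yelling 「幹你耶和華」.)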

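Because medical and sanitary conditions were poor back then and mortality was high, cursing someone to die also became a customary form of insult, and a devastating one. Examples are 「死無半個點香點蠋」 (nobody to see you off when you die), 「汝著死的十字路頭被狗哺」 (die a violent death at a crossroads and then get gnawed by dogs) and 「拾骨頭尋無墓」 (your descendants come to collect your bones and cannot find the grave). Every one of them is out for blood.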


Input Methods (IME) in Linux: Fcitx

IBus is considered to be on its way out, but it’s still the default in MX Linux. Because the only Cantonese IME in Linux that allows me to swear is Andrew Choi’s CAP, which runs on fcitx, I settled on fcitx as my default IME engine.

Languages

  • Cantonese: Download the debian package for CAP
  • Japanese: Mozc is already installed
  • Simplified Chinese: Pinyin is already installed

Shortcuts (Very much like Windows):

  • Ctrl + Space: turn it on/off
  • Ctrl + Shift: switch between languages
  • Shift: in and out of temporary English mode (IME inactive) within the language

I’ve moved the contents on setting up Fcitx CAP to this page, as the release of fcitx5 turned this already tricky process into a maze.


Cantonese IME for Windows 10

There are not many decent Cantonese IMEs around. The best option for Windows 7 and earlier is CPIME. It borderline worked on Windows 8/10 (desktop mode only), but I heard that Windows 10 recently broke it in its 1903 update.

Dr. Choi kindly wrote another Cantonese IME called CAP, which I came across while looking for Cantonese IME for Linux. This is the only option that works with Windows 10 natively (apps and desktop).


Getting CAP 2018 to install
[Deprecated, please use CAP 2021 instead, see below]

Unfortunately the installer failed on a fresh Windows 10, saying that “CAP.dll” cannot be registered. I looked at the error code, and it usually suggests a missing dependency for the DLL. I used Dependency Walker to look at what’s broken and noticed the missing pieces are Visual C++ 2015 DEBUG runtime DLLs. Since debug builds aren’t supposed to have a redistributable runtime (it’s actually called NonRedist), the only solution is to install the community edition of Visual C++ 2015 to obtain these DLLs.

Note that “Common Tools for Visual C++ 2015” must be included (installed) so the IME won’t be broken (grayed out):

The cause is the missing UCRTBASED.DLL. The files are located at:

C:\Program Files (x86)\Windows Kits\10\bin

It’s under the (x86) variant of Program Files regardless of whether it’s 32-bit or 64-bit.

The missing link to API-MS-WIN-CORE-PATH-L1-1-0.DLL is not important.

Once the IME is installed (after installing Visual C++ 2015; any flavor, minimal is OK), you can remove Visual C++ 2015 without breaking the IME, EXCEPT that you need to back up UCRTBASED.DLL first and put it next to the IME’s core CAP.DLL file:

C:\Program Files\Sixth Happiness\CAP\x64

Getting CAP 2021 to install

CAP 2021 still won’t install on a fresh installation of Windows 10. I ran it through Dependency Walker and noticed it’s missing VCRUNTIME140_1.dll. Based on this post, this is part of the Microsoft Visual C++ 2019 Redistributable.

Microsoft rolled the runtimes for 2015, 2017, 2019 and 2022 into one package, so if you are missing the 2019 runtime DLLs, you might as well install that. This time the package doesn’t use any debug version of the runtime like the 2018 release did, which makes life much easier.


It’s not 懶趴 but 卵脬, and it’s not 雞 but 膣

屄/閪 膣 (misread as 雞)
屌/𨳒 姦 (misread as 幹) 肏 (misread as 操)
𡳞/𨶙,㞗/𨳊,杘/𨳍,
脧脧/JerJer
𡳞/卵鳥 (misread as 懶鳥) チンポ (珍宝)
春袋 蛋蛋 卵脬 きんたま(金玉)
儸柚 屁股 尻(kha)川(tshng) おしり (お尻)

真趣味的台灣俗諺(尻川) (Really fun Taiwanese proverbs: 尻川): http://www.tma.tw/ltk/107611213.pdf


Obscure differences between Kanji and Chinese characters

People who already know Chinese characters are often said to have the advantage of being able to pick up Japanese quickly. However, learning it properly comes with quite a bit of baggage beyond the difference between infix (English, Chinese) and reverse Polish (Japanese) notation. It’s the differences that require work to notice, such as:

  • some are made-up ‘Chinese’ words (和製漢語),
  • some are written slightly differently, including artistic variations,
  • some have a completely different meaning,
  • some have the opposite preference for which character of a pair to use when simplifying,
  • some have drastically different overtones even though they technically mean the same thing,
  • simplified and traditional characters are mixed, and occasionally a character written like simplified Chinese means something totally different from the traditional one, such as 机(つくえ), which means desk, vs 機(キ), which means machine or chance depending on the context,
  • the roles of historical and modern writings are randomly reversed.

学習 is a good example. Modern Chinese considers 学 to be more colloquial (e.g. 学武功) and 習 to be more formal (e.g. 習武). Japanese is the other way round for 学ぶ and 習う: 学ぶ has the more serious tone.


Actually, the kinds of variations mentioned above also apply to regional differences among Chinese languages (such as Taiwanese, Cantonese and Mandarin). Most places agree to write Chinese in a way that can be read directly in Mandarin so that we can at least communicate on paper. So as time went by, we lost the ability to write in Taiwanese and Cantonese. I hope that will change, as both dialects are very colorful. Re-expressing them in Mandarin takes away all their flavor.

It’s evident that humans can pick up more than one language, so there is no reason to compromise dialects in the process of standardization. People advocating to kill other languages are simpletons who believe in the kind of logic supporting a competitive system: you find ways to make your peers do worse to stay ahead, instead of improving yourself.

Different regions occasionally have different preferences for character order in phrases. Basically we have to watch out for all kinds of combinations. For example, 介紹 is used in the same order in Taiwanese/Cantonese/Mandarin to mean introduction, but it’s reversed as 紹介(しょうかい) in Japanese. To make it a total mindfuck, Mandarin sticks with 客人 for guests, which matches Japanese’s 客人(きゃくじん), Taiwanese mostly says 人客, while Cantonese uses both with slightly different overtones: 客人 is usually used as a particular noun (e.g. 呢位客人) while 人客 is often used as a collective noun (e.g. 人客嚟齊未?), most likely because 客人 sounds more formal than 人客.


Putting traditional and simplified Chinese aside, different regions have different preferences for Chinese characters. I couldn’t tell the difference between traditional Chinese characters used in Hongkong/Macau (港澳繁體) and Taiwan (台灣正體) on Wikipedia, and later learned that it was because I’ve been randomly mixing both all along and nobody ever pointed it out.

裏/着 (Hongkong) vs 裡/著 (Taiwan) are good examples. For these two, modern Japanese sides with Hongkong in its character choices, 裏(うら) and 着(ちゃく). On the other hand, Japanese 峰(みね) sides with Taiwan’s preferred writing 峰, while 峯 is the ‘officially’ preferred writing in Hongkong.

I remember writing 峰 most of the time even when I was a kid and only using 峯 for names that specifically call for it. We respect the original writing for names. The situation is similar in Japanese: 沢(さわ/たく) is used in most cases and 澤(サワ) is reserved for names that specifically request to be written in this form. The only difference is that I used the official character 峯 exclusively for names, while using the off-label 峰 for the rest.

Speaking of names, there are some similar-looking characters that have the same Japanese sound (かな) but are actually different in both writing and meaning. 斉藤 and 斎藤 are different, but they are easily confused by native Japanese speakers who don’t have any Chinese language background. Here’s the table for comparison:

Character | 齊/齐・斉 | 齋/齋・
Meaning | Gathered, organized | Plain, house, recitations
Cantonese | chai (cai4) | jaai (zaai1)
Taiwanese | tsè | tsai
Mandarin | qi2 | zhai1
Japanese (音読み:さい) | 斉しい・等しく | いつき・(潔斎)物忌み

The bottom line is: as languages evolve, different regions have different preferences about what they can be sloppy about and what they must be meticulous about. They also reorder/tweak things to make them flow smoothly in their dialect. This means traps for those learning a new language that is close to one they’ve already mastered.

I came across a document called 常用漢字表, released by the Agency for Cultural Affairs (文化庁), that explains all the quirks of Kanji that I had been carefully collecting on my own while taking classes. Wish I had it back in the day. Here’s the link, but I also saved a local copy of 常用漢字表 in case their website moves around in the future.
