Why CJK OCR is its own problem
OCR for English is a solved problem in 2026. The interesting questions are about everything else. Chinese, Japanese, and Korean each bring a different mix of difficulties: thousands of glyphs instead of a few dozen letters, mixed scripts within a single line, vertical typesetting, decorative typography that bleeds into images, and writing systems that have been evolving for two millennia. A model that gets 99 percent on English newspaper text might land at 92 percent on a scanned Japanese paperback, and that 7-point gap is where most of the user pain lives.
The good news is that Apple has invested heavily in CJK OCR through the Vision framework, and as of macOS 14 Sonoma all three languages are first-class citizens. The bad news is that "first-class" doesn't mean "perfect." There are real edge cases worth knowing about before you trust an OCR pipeline with a research paper, a manga panel, or a screenshot of your boss's KakaoTalk message.
This guide walks through what Live Text gets right for each language, where it falls short, and how to fill the gaps without uploading anything to a cloud service.
What macOS Live Text actually supports
Apple's official Live Text language list as of macOS 14 includes English, Spanish, French, Italian, German, Portuguese, Chinese (Simplified and Traditional), Japanese, Korean, Ukrainian, and Thai, with Vietnamese added more recently. Chinese (both variants) has been supported since Live Text debuted in macOS 12 Monterey; macOS 13 Ventura added Japanese, Korean, and Ukrainian. If you're on anything older than Ventura, Japanese and Korean aren't covered at all, and you should plan to upgrade before fighting the OCR battle.
One of the most useful properties of Apple Vision is that you don't have to declare the language. The framework runs language detection internally and can mix scripts inside a single recognition pass. A screenshot of a Tokyo train station sign with Japanese, English, and Korean labels comes out as one block of correctly segmented text. A WeChat message that mixes Chinese sentences with English brand names lands the way you'd expect. This is a real advantage over OCR tools that ask you to pick a language before each scan.
Where Live Text falls short for CJK
Live Text is the right tool when your source is already inside an Apple-native surface — Photos, Preview, Safari, Notes, Quick Look. Hover the pointer, the text becomes selectable, you copy. Done.
It gets less convenient as soon as you step outside that bubble. A few patterns recur across all three CJK languages:
- Third-party app windows. Slack, Notion, Zoom, WeChat, LINE, KakaoTalk, Discord, and most browsers' video players don't expose Live Text inside the window itself. You have to screenshot first, then open the screenshot in Preview, then use Live Text. That's three extra steps for what should be one.
- Multi-column layouts. Academic PDFs in any CJK language often have two or three columns plus footnotes. Live Text reads them in image order, not column order, so you get the first line of column one followed by the first line of column two — useless for downstream copy-paste.
- Vertical text. Modern Japanese and traditional Chinese can both be set vertically. Live Text handles cleanly typeset modern vertical text reasonably well, but loses accuracy quickly on older typesetting, decorative covers, and anything where line spacing is irregular.
- Decorative or stylized typography. Manga sound effects, K-pop album cover lettering, Chinese calligraphy, and modern brand logos that lean on art-direction more than legibility all degrade accuracy.
- Classical script. Classical Chinese (文言文, 古籍), pre-modern Japanese, and older Korean texts use vocabulary, glyph variants, and layouts the model wasn't trained on. Live Text is competent on modern printed editions but unreliable on old prints and manuscripts.
None of these are reasons to dismiss Live Text. They're reasons to know its shape, so you reach for a different tool when the shape doesn't fit.
How Cheese! OCR handles CJK
Cheese! OCR is a menu bar app that calls the same Apple Vision framework Live Text uses. That has two consequences worth being honest about. First, on the same image, accuracy will match Live Text — we're not running a different OCR model and there's no secret sauce. Second, the difference comes from workflow, not recognition quality.
The workflow is: press a global hotkey (default ⇧⌘E), drag-select any rectangle on screen, and the recognized text is on your clipboard. It works the same whether the text is in a Photos window, a Slack thread, a YouTube video paused on a foreign-language slide, a WeChat chat, a Korean PDF in Preview, or a manga page in a comics reader. There is no "open in Preview first" detour.
Two other things make a meaningful difference for CJK use. We ship with all four supported languages enabled by default — English, Simplified Chinese, Japanese, Korean — so the first time you OCR a Japanese train timetable or a Chinese receipt, it just works without preference clicks. And the menu bar history keeps a searchable log of every capture, so when you OCR a Chinese term you didn't recognize and look it up later, you can re-find the original in two seconds.
Privacy is worth flagging because CJK users are often scanning sensitive material — work documents in Japanese, medical paperwork in Korean, financial statements in Chinese. Cheese! OCR uses Apple Vision entirely on-device. The app ships without a network entitlement, which is verifiable in the Mac App Store sandbox metadata. Nothing is uploaded, nothing is logged on a server, no model training happens on your captures. It's the same privacy property Live Text has, with the same engine doing the same work.
Chinese: simplified, traditional, mixed, classical
Modern simplified and traditional
Apple Vision treats Simplified and Traditional Chinese as two distinct trained variants and auto-detects which one is on screen. You can OCR a Hong Kong newspaper, a Taiwanese product label, and a Mainland WeChat post in the same session without changing settings. The output is verbatim — Cheese! OCR doesn't auto-convert between simplified and traditional. If you need conversion, run the captured text through a tool like OpenCC or a converter built into your editor.
Chinese mixed with English
Modern Chinese writing borrows English heavily — brand names, technical terms, Latin abbreviations like CEO and PDF, sometimes whole quoted phrases. Apple Vision handles these well in a single pass. A Zhihu post that drops "ChatGPT" into a Chinese sentence comes out clean. The most common failure mode is when Latin letters are styled to look like Chinese characters or vice versa, in which case you may see one or two glyph confusions.
Scanned academic and foreign-journal PDFs
Chinese-language academic PDFs and Chinese translations of foreign journals are a frequent OCR target. Native PDFs (where text is already a layer in the file) don't need OCR — copy works directly. Scanned PDFs (where every page is an image) do, and this is where Live Text's column-order weakness shows up. For a multi-column scanned paper, OCR each column separately rather than the whole page at once. Yes, it's two captures instead of one. The output is much cleaner.
Classical Chinese and old prints
Apple Vision was not trained primarily on 古籍 (classical Chinese prints), and accuracy reflects that. Modern reprints of classical works in standard typography come out fine. Photographs of actual block-printed editions, handwritten manuscripts, or stone rubbings often don't. For research-grade work on classical texts, dedicated tools like the Tripitaka Koreana digitization toolchain or specialized humanities OCR projects are still ahead of consumer-grade frameworks. Cheese! OCR is a fine first pass for getting most of the text into your editor, where you can hand-correct.
Japanese: kanji, kana, vertical, manga
Modern printed text
Standard horizontal Japanese — newspapers, magazines, websites, business documents — is one of Apple Vision's strengths. Mixed kanji, hiragana, and katakana within a single sentence are recognized as one stream. Loanwords in katakana that often trip lesser OCR (パソコン, スクリーンショット, ホットキー) come out clean.
Vertical typesetting
Vertical Japanese (縦書き) in modern printed novels and magazines is handled well when the typesetting is clean and line spacing is regular. Older books with mixed line widths, ruby annotations (furigana), and irregular margins lose some accuracy. If a vertical scan comes out scrambled, try cropping a single column at a time.
Manga and comics
Manga is the genre where OCR struggles most, and it's worth being upfront about it. Speech bubbles with clean typography are usually fine. Sound effects (擬音, ぎおん) painted into the artwork in stylized lettering, hand-drawn shouts, and dialogue that bleeds into background detail all degrade accuracy. For straight reading and translation purposes, Cheese! OCR will pull most of the speech-bubble dialogue accurately. For complete sound-effect transcription, expect to hand-correct.
Business documents and email screenshots
Japanese business documents, internal Slack screenshots from a colleague, and quoted email captures are the day-to-day OCR use case for many users. These come out reliably. The most common workflow is: paste the screenshot into a chat window or a Notion page, drag-OCR the relevant section, paste the recognized text into a translation tool or a meeting note.
Korean: PDFs, webtoons, KakaoTalk, hanja
Modern Hangul
Korean Hangul is the most uniformly typeset of the three CJK scripts, and accuracy on clean printed Korean is excellent. The 2,350-character KS X 1001 set covers virtually all modern Korean, and Apple Vision is well-trained on it. Korean PDFs (native or scanned), Naver search results, and government documents all OCR cleanly.
Korean PDFs and academic material
The same multi-column caveat applies: a two-column scanned Korean paper is best OCR'd column by column. For native Korean PDFs with a text layer, OCR isn't needed at all — copy works directly. The mistake we see most often is users running OCR on a native PDF and getting a slightly worse copy than they would have with plain copy-paste.
Webtoons and KakaoTalk screenshots
Webtoons (웹툰) on Naver and Kakao Page render dialogue in stylized speech bubbles. Standard typography reads fine. Sound effects and decorative onomatopoeia drawn into the panel degrade the same way manga sound effects do. KakaoTalk message screenshots are a frequent OCR target — a friend forwards a screenshot of a long block of Korean text, and you want it as plain text to paste into a translator. This case is reliable.
Hanja in Korean text
Older Korean documents and academic writing sometimes embed Hanja (한자, Chinese characters used in Korean writing). Apple Vision recognizes these as Chinese characters within the Korean text stream. The output text is correct, though the characters are treated as Chinese script rather than Korean, which is fine for most purposes.
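If you post-process captures — say, to pull just the Hanja out of a Korean capture for dictionary lookup — a small script classifier is enough. A stdlib-only sketch (the two Unicode ranges cover the common cases, not every historic block):

```python
# Classify each character of an OCR capture by script, using Unicode ranges.
# Precomposed Hangul syllables live at U+AC00..U+D7A3; the CJK Unified
# Ideographs block (U+4E00..U+9FFF) covers the Hanja seen in Korean text.
def script_of(ch: str) -> str:
    if "\uac00" <= ch <= "\ud7a3":
        return "hangul"
    if "\u4e00" <= ch <= "\u9fff":
        return "hanja"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"

hanja_only = [c for c in "국어는 한자 國語" if script_of(c) == "hanja"]
print(hanja_only)  # -> ['國', '語']
```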
The decision: which tool, when
For CJK on Mac in 2026, our honest read:
- Use Live Text when your text is already inside Photos, Preview, Safari, Quick Look, or Notes. It's free, it's built in, and it produces the same Apple Vision output any third-party tool would.
- Use a global-hotkey OCR tool like Cheese! OCR when the text lives in WeChat, KakaoTalk, LINE, Slack, Notion, Zoom, a YouTube video frame, a paused movie subtitle, or any other surface where Live Text doesn't reach. The win is workflow, not recognition.
- Use a specialist tool when you're working on classical Chinese block prints, vertical pre-modern Japanese, handwriting, or any source where general consumer OCR is going to give you a 70-percent first draft. Apple Vision is not the right tool there, and neither Live Text nor Cheese! OCR will rescue you.
For everything that lives between those poles — Chinese receipts, Japanese train signs, Korean PDFs, mixed-script manga dialogue, business documents in any of the three — Apple Vision is genuinely good, and the only question is whether you want to type ⇧⌘E or open Preview first.