If you've used a Mac in the last few years you've probably seen Live Text in action even if you didn't know its name. Hover over a photo of a sign, or a screenshot open in Preview, and the cursor turns into a text caret. You highlight, you copy, you move on. It's the kind of feature that quietly disappears into the workflow, which is the highest compliment a system feature can earn.
So why write a comparison piece? Because once you start needing OCR for work — pulling a paragraph out of a Slack screenshot, grabbing a Zoom slide, copying a code snippet from a YouTube tutorial paused in Chrome — you bump into the edges of Live Text faster than you'd expect. The edges aren't bugs. They're the natural consequences of how Live Text was integrated into macOS. Understanding where they are tells you whether you can stop at the built-in tool or whether you need a second one in your toolbox.
This piece is the team's honest take on both tools. We make Cheese! OCR, so the bias is real. We've also tried to call out where Live Text is genuinely the better choice, because pretending otherwise would waste your time.
What macOS Live Text is, and what it does well
Apple introduced Live Text in macOS 12 Monterey (2021) and refined it in subsequent releases. The core idea is that any image rendered by an Apple-controlled view — say, an image in Photos, a page in Preview, a frame in QuickTime, an image inside Safari — can be analyzed in place. The text becomes selectable as if it were native text. You don't trigger anything; the recognition runs on demand when you start interacting with the image.
The places where it shines are predictable and pleasant:
- Photos.app: any photo you've ever taken with text in it — a receipt, a whiteboard, a business card, a street sign — becomes searchable and copyable.
- Preview.app: open an image (PNG, JPEG, HEIC) and you can select text directly. For PDFs, Live Text handles native PDFs trivially because the text is already there; it also recognizes scanned PDFs in many cases.
- Safari: paused video frames, images inside articles, and even text inside SVG-rendered content are usually selectable.
- Notes and Mail: drop in an image, and the text inside it is selectable.
- Quick Look: the floating preview from the Finder spacebar shortcut also exposes Live Text.
For the casual user, this is plenty. If your daily OCR need is "occasionally pull a phone number off a photo," you don't need anything more.
The four scenarios where Live Text falls short
Once you start working with text-heavy material across multiple apps, four gaps tend to show up.
1. Third-party apps that render their own image viewers
Live Text is wired into Apple's own image views, but third-party apps that render images with custom components don't always pick it up. Slack's image viewer, Discord's lightbox, Notion's embedded images, Telegram's media preview, and many enterprise apps render images in ways that bypass the system text-selection layer. You can see the text on screen and you cannot select it.
The workaround inside macOS — open the image in Preview — is fine for one image but tedious as a habit. You have to right-click, save, find the file, open it, then OCR. By that point you could have screenshotted and OCR'd twice.
2. Video conferencing
Zoom, Microsoft Teams, Google Meet (in Chrome), and Webex frequently mark their meeting windows with screen-protection or DRM-style flags. Apple's screen capture and Live Text both respect those flags in many configurations. The result: you see a slide, you try to drag-select the bullet points, nothing selects. Sometimes a screenshot of the meeting saves as a black rectangle.
This isn't malice on the conferencing apps' part — it's a privacy-by-default choice meant for sensitive presentations. But for the perfectly mundane case of "I want to copy a URL from this slide," it's a wall. A dedicated screenshot OCR tool that uses the system's screen-recording APIs can usually capture and recognize the visible pixels, depending on how the source app draws its frames.
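To make the "screen-recording APIs" point concrete, here is a minimal sketch of how a capture-then-OCR tool might grab the visible pixels. `CGDisplayCreateImage` is the classic CoreGraphics call (newer apps use ScreenCaptureKit); the key behavior for this section is that windows flagged as capture-protected simply don't appear in the returned image. This is an illustrative sketch, not Cheese! OCR's actual implementation.

```swift
import CoreGraphics

// Capture the main display's current pixels as a CGImage.
// Requires the Screen Recording permission in System Settings;
// capture-protected windows (some conferencing apps) are omitted
// from the result rather than causing an error.
guard let capture = CGDisplayCreateImage(CGMainDisplayID()) else {
    fatalError("Capture failed — check Screen Recording permission")
}
print("Captured \(capture.width) x \(capture.height) pixels")
```

A real tool would crop this image to the user's drag-selected rectangle before handing it to the recognition step.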
3. Protected PDFs and certain corporate documents
Native PDFs with text layers are easy: any tool can copy from them, Live Text included. Scanned PDFs are harder, and Preview's Live Text usually handles them. The trouble starts with the in-between cases:
- Password-protected PDFs that allow viewing but disable copy and Live Text.
- DRM-protected PDFs distributed by some publishers and corporate document systems.
- PDFs rendered inside specific reader apps that disable text selection at the app level.
If you're allowed to read the document but not to select its text, you have a legitimate use case for OCR: you're not bypassing access control, you're working around UI-level copy restrictions on a document already in your hands. Cheese! OCR captures the rendered pixels and returns the text, the same way you might transcribe a paragraph by hand.
4. Video frames outside Safari
Live Text in paused video works great in Safari. It does not work in Chrome, Firefox, or Edge, because those browsers use their own rendering engines and don't expose frames to the system's Live Text machinery. It also doesn't reach IINA, VLC, or most media players. If you're watching a coding tutorial in Chrome and the instructor types out a command, you're left transcribing it.
Same problem, same solution: a screenshot OCR tool reads the pixels regardless of which app put them there.
How Cheese! OCR fills the gaps
Cheese! OCR is a menu bar app that does one thing. You press a global hotkey (default ⇧⌘E, configurable). Your screen dims, the cursor becomes a crosshair, and you drag-select a rectangle. The recognized text lands on your clipboard. Paste it wherever you need it. The whole interaction takes about two seconds once muscle memory kicks in.
A few details that matter for the comparison:
- Same Apple Vision engine. Cheese! OCR is built on the Vision framework — the exact pipeline that powers Live Text. Accuracy on printed text is comparable. We are not claiming a custom model that beats Apple's; we are using Apple's, then wrapping it in a different workflow.
- Cross-app by definition. Because the input is "whatever pixels are on your screen," it doesn't matter whether the source is Slack, Zoom, IINA, a Citrix session, or a remote desktop window. If you can see it, you can OCR it.
- 100% on-device. The app's sandbox declares no network entitlements at all. You can verify this in the Mac App Store privacy report. No screenshot ever leaves the machine.
- Searchable history. Every recognition is timestamped and stored locally. If you pasted a paragraph into a chat an hour ago and lost it, the history pane finds it.
- Multi-language by default. English, Simplified Chinese, Japanese, and Korean are recognized out of the box; you don't toggle a language for each capture.
None of these are revolutionary on their own. Together they cover the four gaps above without forcing a context switch.
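For the curious, the shared engine the bullets above describe is a public API: Apple's Vision framework exposes text recognition directly. The sketch below shows the core call, assuming an image file on disk; the language list mirrors the four languages mentioned above but the exact values here are illustrative, not a claim about Cheese! OCR's internals.

```swift
import Foundation
import Vision

// Minimal sketch: recognize text in an image using the same Vision
// pipeline that powers Live Text.
func recognizeText(in url: URL) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate          // favor accuracy over speed
    request.recognitionLanguages = ["en-US", "zh-Hans", "ja-JP", "ko-KR"]

    let handler = VNImageRequestHandler(url: url, options: [:])
    try handler.perform([request])

    // Each observation carries ranked candidate strings; keep the best one.
    let observations = request.results ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

A wrapper app's job is everything around this call: the hotkey, the crosshair selection, the clipboard write, the history store. The recognition itself is Apple's.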
When Live Text is actually enough
It's worth being explicit about the cases where you don't need a second tool:
- You only OCR occasionally and only inside Apple apps. Photos, Preview, Safari, Notes. If that describes your week, Live Text is free, fast, and right there.
- You want word-level selection inside an image. Live Text lets you drag through the text to highlight specific words or phrases; Cheese! OCR captures whatever rectangle you box-select and returns the entire result.
- You don't want any third-party app at all. A reasonable preference. Live Text is built in, signed by Apple, and updated through the OS.
- You need accessibility-level live recognition. Live Text integrates with VoiceOver and other accessibility services; a screenshot OCR tool is a different category of workflow.
If any of these describe you, stop reading and use the tool you already have. We mean it.
Side-by-side comparison
| Aspect | macOS Live Text | Cheese! OCR |
|---|---|---|
| Price | Free, built into macOS 12+ | $5.99 one-time, no subscription |
| Coverage | Apple-native apps and WebKit views | Anything visible on screen |
| Trigger | Hover and select text inside an image | Global hotkey + drag-select |
| Languages | Apple Vision (multiple, on-demand) | EN / ZH-Hans / JA / KO recognized automatically |
| History | None | Local, searchable |
| Engine | Apple Vision, on-device | Apple Vision, on-device |
| Network | None | Zero entitlements, verifiable |
| Best for | Casual, in-app text grabs | Cross-app, repeatable, high-volume |
A simple decision framework
Three rules cover most situations.
- If you can hover and the cursor turns into a text caret, use Live Text. It's already loaded.
- If the source is a third-party app, a video conference, a non-Safari browser, or a protected document, reach for the hotkey.
- If you do this more than a few times a week, configure the hotkey and let muscle memory take over. The choice between tools should be subconscious within a day or two.
In practice we keep both tools enabled all the time. Live Text handles the polite, native cases automatically; Cheese! OCR handles the awkward ones. If you'd like to try the second half of that setup, the app is on the Mac App Store. Below the FAQ we've also linked to a few related articles you might want to read after this one — particularly the PDF guide and the screenshot guide, which build on the same ideas in more depth.