
Reading a handwritten chalkboard with a vision model

Most "menu OCR" you've seen runs on printed text. That works about 95% of the time. What's harder, and what we get every week from cafés, breweries, and chef's-counter restaurants, is the handwritten chalkboard menu — the one with daily specials in someone's loopy cursive next to a stenciled "TONIGHT'S BEER" header.

Here's what we learned shipping the parser.

Why a vision model instead of classic OCR

We use Claude's vision capabilities for the parse step, not a classic OCR pipeline like Tesseract or Google Cloud Vision OCR. The reason isn't that classic OCR can't read text — it can, even handwritten text, with enough preprocessing. The reason is that menus are structured, and modern vision models can extract the structure in the same pass as the text.

A vision model can look at a chalkboard photo and return something like:

{
  "sections": [
    {
      "name": "Specials",
      "items": [
        { "name": "Halibut crudo", "price": "$18", "description": "yuzu, pickled fennel" },
        { "name": "Lamb tagliatelle", "price": "$28", "description": null }
      ]
    }
  ]
}

…directly, without us having to OCR the text, then separately parse the layout, then guess which words are headers vs. prices vs. descriptions. That layout-guessing step was historically the brittle part of menu OCR. With a vision model, it's no longer a separate step.
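If you're consuming output shaped like the JSON above, one option is to load it into typed records so a missing key fails at load time instead of at render time. This is a minimal sketch, assuming the model returns exactly that shape; the `MenuItem`, `MenuSection`, and `load_menu` names are illustrative and not taken from our production code:

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class MenuItem:
    name: str
    price: Optional[str]        # kept as text, e.g. "$18"
    description: Optional[str]  # null in the JSON becomes None


@dataclass
class MenuSection:
    name: str
    items: list[MenuItem]


def load_menu(raw: str) -> list[MenuSection]:
    """Turn the model's JSON text into typed sections.

    Raises KeyError if a section or item is missing its name.
    """
    data = json.loads(raw)
    sections = []
    for sec in data["sections"]:
        items = [
            MenuItem(
                name=it["name"],
                price=it.get("price"),
                description=it.get("description"),
            )
            for it in sec["items"]
        ]
        sections.append(MenuSection(name=sec["name"], items=items))
    return sections


SAMPLE = """
{
  "sections": [
    {
      "name": "Specials",
      "items": [
        {"name": "Halibut crudo", "price": "$18", "description": "yuzu, pickled fennel"},
        {"name": "Lamb tagliatelle", "price": "$28", "description": null}
      ]
    }
  ]
}
"""

menu = load_menu(SAMPLE)
print(menu[0].name, len(menu[0].items))
```

Prices stay as strings here on purpose: a chalkboard might say "$18", "18", or "MP", and converting to a number is a separate decision.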

Where it works well

  • Printed menus, even ones photographed at an angle, with shadows, with glare. The model handles geometry. We routinely get a 95%+ accurate parse from a phone photo taken at the host stand.
  • Mixed printed and handwritten, like a chalkboard specials board photographed next to a printed dinner menu in one frame. The model treats it as a combined menu with two sections.
  • Multilingual menus — Spanish, French, Italian, Japanese, Chinese. We translate dish names to English in the output (with the original preserved in an originalName field) so the website is searchable.

Where it fails

The patterns that break us, roughly in order of how often we see them:

Heavily stylized cursive. Not regular cursive — that's fine. But the specific look you'd find on a hand-painted sign meant to be art more than menu: extreme letter ligatures, ornamental flourishes, words that loop into each other. The model reads maybe 60% of the dish names; 40% come back garbled or missing letters.

Tiny photos of huge menus. A 1000×750 phone photo of a 24"×36" laminated menu. The model technically reads it, but confidence drops to 0.3–0.5 per item (on the 0.0 to 1.0 scale described in the review step below). We send those back for review rather than auto-publishing. Customers quickly learn to take their menu photos from closer up.
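The arithmetic behind that example is worth seeing. Here is a rough sketch, assuming the menu fills the whole frame (the best case) and using a quarter-inch letter height as a hypothetical size for small menu text; `pixels_per_inch` is an illustrative helper, not something from our pipeline:

```python
def pixels_per_inch(photo_px: tuple[int, int], menu_in: tuple[float, float]) -> float:
    """Best-case pixels per inch of menu, assuming the menu fills the frame.

    Long side is matched to long side; the tighter of the two axes wins.
    """
    long_px, short_px = max(photo_px), min(photo_px)
    long_in, short_in = max(menu_in), min(menu_in)
    return min(long_px / long_in, short_px / short_in)


ppi = pixels_per_inch((1000, 750), (24, 36))

# Hypothetical letter height for small print on a laminated menu.
LETTER_HEIGHT_IN = 0.25
letter_px = ppi * LETTER_HEIGHT_IN

print(f"{ppi:.1f} px/inch, {letter_px:.1f} px per letter")
# prints "27.8 px/inch, 6.9 px per letter"
```

Roughly seven pixels of height per letter is not much to work with, which fits the low per-item confidence we see on these photos.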

Menus where the price column wraps differently than the item column. Two columns of dishes, prices way over on the right, with some items wrapping to two lines on the left. The model usually pairs them right, but on dense menus we sometimes see prices shift up or down a row, especially around items with long descriptions.

Heavy printer streaks or fades on a photocopied menu. If we can't see it, neither can the model. We've started suggesting these go through the email path with the original PDF attached if the customer has one.

The review step

We don't trust any individual parse blindly. Every parse comes with a confidence score from 0.0 to 1.0. Above 0.92, we auto-publish. Below that, the menu lands in your review queue with the model's best guess pre-filled — you edit any wrong dishes, click publish, and it's live.
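The routing rule itself is small. A sketch of it follows, with hypothetical names (`ParseResult`, `route`) and one assumption: a score of exactly 0.92 goes to review, since the rule above only says that scores above 0.92 auto-publish.

```python
from dataclasses import dataclass

AUTO_PUBLISH_THRESHOLD = 0.92  # from the rule above


@dataclass
class ParseResult:
    menu_id: str       # hypothetical identifier
    confidence: float  # 0.0 to 1.0


def route(parse: ParseResult) -> str:
    """Return "auto_publish" or "review_queue" for a parse."""
    if not 0.0 <= parse.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {parse.confidence}")
    # Strictly above the threshold publishes; exactly 0.92 is treated
    # as review here, which is an assumption.
    if parse.confidence > AUTO_PUBLISH_THRESHOLD:
        return "auto_publish"
    return "review_queue"


print(route(ParseResult("chalkboard-1", 0.95)))  # prints "auto_publish"
print(route(ParseResult("laminated-2", 0.40)))   # prints "review_queue"
```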

In practice about 70% of parses we see come back above 0.92. Another 25% need a few edits before publishing. The remaining 5% are usually one of the failure modes above, where the photo just wasn't legible enough to begin with.

What's coming

We're working on a multi-page handoff for menus that don't fit in one photo: text us three or four photos in sequence, and we stitch them into one parsed menu version. That's shipping soon. After that: video walkthrough support, for the case where the menu is a six-foot chalkboard wall that nobody can frame in a single shot.
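To give a feel for what stitching could mean at the data level, here is one possible approach, not a description of the feature we're building. It assumes each photo has already been parsed into the sections/items JSON shape shown earlier, merges sections that share a name, and drops items whose name already appeared in that section (for photos that overlap). The `stitch` function is hypothetical.

```python
def stitch(pages: list[dict]) -> dict:
    """Merge per-photo parses into one menu, in the order photos arrived."""
    merged: dict[str, list[dict]] = {}  # section name -> items, insertion-ordered
    for page in pages:
        for section in page["sections"]:
            items = merged.setdefault(section["name"], [])
            seen = {item["name"] for item in items}
            for item in section["items"]:
                # Overlapping photos repeat items; keep the first copy only.
                if item["name"] not in seen:
                    items.append(item)
                    seen.add(item["name"])
    return {"sections": [{"name": n, "items": its} for n, its in merged.items()]}


page_1 = {"sections": [{"name": "Specials", "items": [
    {"name": "Halibut crudo", "price": "$18", "description": None},
]}]}
page_2 = {"sections": [
    {"name": "Specials", "items": [
        {"name": "Halibut crudo", "price": "$18", "description": None},
        {"name": "Lamb tagliatelle", "price": "$28", "description": None},
    ]},
    {"name": "Beer", "items": [
        {"name": "House lager", "price": "$7", "description": None},
    ]},
]}

menu = stitch([page_1, page_2])
print([s["name"] for s in menu["sections"]])  # prints "['Specials', 'Beer']"
```

Matching on exact names is the simplest rule and will miss cases where the same dish is read slightly differently in two photos, so a real version would likely need fuzzier matching plus the review step.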

If you want to see what the parser does with your menu, demo's here. It's free, no signup, takes about thirty seconds.