Receipt parsing - A comparison between a small and a medium-sized model

vision
ml
ai
jpg
png
image recognition
granite
smol
docling
Author

Yasu Flores

Published

September 20, 2025

In this post, I’ll compare two models on the task of extracting the total from a purchase receipt.

I downloaded an open dataset from here: link

The experiments

The experiments I ran were with these two models: Granite Docling and Moondream2.

The full output of the experiment can be found here

The full repository with the details of the experiment is here: Github Repository

The results

Here are a few examples of the receipts I parsed:

Example receipt 1

Example receipt 2

The results were fascinating. Here are the accuracies:

  • Granite Docling - 60.95% accuracy

  • Moondream2 - 92.38% accuracy
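For reference, here is a minimal sketch of how an accuracy figure like the ones above could be computed. It assumes exact string matching between the predicted and labeled totals, which may differ from what the repository does; the `accuracy` helper and the toy data are made up for illustration and are not taken from the experiment.

```python
def accuracy(predictions, ground_truth):
    """Percentage of receipts whose predicted total equals the labeled total."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("need at least one receipt")
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return 100.0 * correct / len(ground_truth)


# Toy data for illustration only, not results from the experiment.
predicted = ["12.50", "8.99", "40.00", "3.15"]
labeled = ["12.50", "8.99", "41.00", "3.15"]
print(f"{accuracy(predicted, labeled):.2f}% accuracy")  # prints "75.00% accuracy"
```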

One caveat is that you’ll have to clean the Granite Docling output, either with regexes or by removing an extra dot at the end. You can see how that works in the repository.
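As a rough idea of what that cleanup might look like, here is a small sketch. It is not the code from the repository: the `clean_total` helper and the amount pattern are assumptions about how totals tend to be printed (digits, optional thousands separators, two decimals).

```python
import re

# Assumed amount format: "23.45" or "1,234.50". This pattern is illustrative,
# not the one used in the repository.
AMOUNT_RE = re.compile(r"\d+(?:,\d{3})*\.\d{2}")


def clean_total(raw_output):
    """Return the first amount found in the raw model output, or None."""
    text = raw_output.strip().rstrip(".")  # drop a stray trailing dot
    match = AMOUNT_RE.search(text)
    if match is None:
        return None
    return match.group(0).replace(",", "")  # "1,234.50" -> "1234.50"


print(clean_total("23.45."))            # prints "23.45"
print(clean_total("Total: 1,234.50."))  # prints "1234.50"
```

Returning None when nothing matches makes it easy to count those receipts as misses instead of crashing the evaluation.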

The Moondream2 model gave me pretty clean output. The only inconsistency was whether it included a “$” sign, which may just be due to how the receipt image is structured: sometimes the dollar sign is next to the number and sometimes it isn’t.
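One way to keep the “$” inconsistency from counting as an error is to compare amounts numerically instead of as strings. This is a sketch under that assumption; `to_amount` and `totals_match` are hypothetical helpers, not functions from the repository.

```python
from decimal import Decimal, InvalidOperation


def to_amount(text):
    """Parse '$12.50', '12.50' or ' $ 12.50 ' into a Decimal, or None if unparsable."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def totals_match(predicted, expected):
    """True when both strings parse to the same amount, with or without a '$'."""
    a, b = to_amount(predicted), to_amount(expected)
    return a is not None and a == b


print(totals_match("$12.50", "12.50"))  # prints "True"
print(totals_match("12.50", "12.05"))   # prints "False"
```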

There’s still room for improvement, in particular by experimenting with the prompt you pass to each model to get higher accuracy.

With these results, I feel very comfortable recommending Moondream2 as a model for extracting totals from receipts in an app running in production.

There will need to be some user tolerance for error with receipt scanning, but product-wise I’d guess users will accept it: scanning a photo is roughly an order of magnitude faster than typing (typing out the details of a receipt takes about a minute, scanning it takes about 5-10 seconds), so the roughly 8% of the time that the receipt isn’t parsed correctly should NOT impact the experience negatively.
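To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. It assumes that every failed scan is followed by typing the receipt in by hand, which is my simplification and not something measured in the experiment; the timings are the estimates from the paragraph above.

```python
def expected_seconds(scan_seconds, manual_seconds, error_rate):
    """Average time per receipt if every failed scan is re-entered by hand."""
    return scan_seconds + error_rate * manual_seconds


error_rate = 1 - 0.9238  # Moondream2 missed about 8% of receipts
for scan in (5, 10):
    avg = expected_seconds(scan, 60, error_rate)
    print(f"scan={scan}s -> {avg:.1f}s on average vs 60s typing")
```

Even with the manual fallback included, the average lands at roughly 10 to 15 seconds per receipt, still several times faster than typing everything.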

Granite Docling is still decent. Since it’s a tiny model (256M params), you’ll need much less memory to load it on a device and inference will run much faster.

Let me know what you think!

If you need any help with any image data extraction tasks let me know!

Go here if you’d like to chat

Shoutouts

Thanks ExpressExpense for the open dataset!

https://expressexpense.com/blog/free-receipt-images-ocr-machine-learning-dataset/