In this post, I’ll compare two models on the task of extracting the total from a purchase receipt.
I downloaded an open dataset from here: link
The experiments
I ran the experiments with these two models: Granite Docling and Moondream2.
The full output of the experiment can be found here
The full repository with the details of the experiment is here: Github Repository
The results
Here are a few examples of the receipts I parsed:
The results were fascinating. Here are the accuracies:
Granite Docling - 60.95% accuracy
Moondream2 - 92.38% accuracy
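For reference, here's roughly how an accuracy figure like this can be computed. This is only a sketch: it assumes a prediction counts as correct when it exactly matches the labeled total, and the function name and sample values are made up for illustration. The actual scoring used for the numbers above is in the repository.

```python
from decimal import Decimal
from typing import Optional, Sequence


def exact_match_accuracy(
    predicted: Sequence[Optional[Decimal]],
    expected: Sequence[Decimal],
) -> float:
    """Fraction of receipts whose predicted total equals the labeled total.

    A prediction of None (nothing could be parsed) counts as a miss.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(
        1 for p, e in zip(predicted, expected) if p is not None and p == e
    )
    return correct / len(expected)


# Hypothetical values, not taken from the dataset.
sample_predicted = [Decimal("12.50"), Decimal("3.00"), None, Decimal("7.10")]
sample_expected = [Decimal("12.50"), Decimal("3.99"), Decimal("5.00"), Decimal("7.10")]
sample_accuracy = exact_match_accuracy(sample_predicted, sample_expected)
```

Using Decimal instead of float avoids surprises when comparing money amounts such as 0.10 and 0.1.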
One caveat is that, for the Granite Docling model, you’ll have to clean the output with regexes or by removing an extra dot at the end. You can see how that works in the repository.
The Moondream2 model gave me pretty clean output. The only inconsistency was whether it included a “$” sign, which may just be due to how the receipt image is laid out: sometimes the dollar sign is next to the number and sometimes it isn’t.
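Both quirks (the trailing dot and the on-and-off “$”) can be handled with one small normalization step before comparing against the labels. Here's a minimal sketch of the idea, not the exact code from the repository. The function name is hypothetical, and it assumes US-style amounts where commas are thousands separators and a dot is the decimal point.

```python
import re
from decimal import Decimal
from typing import Optional

# First run of digits, optionally followed by a dot and one or two decimals.
_AMOUNT = re.compile(r"\d+(?:\.\d{1,2})?")


def normalize_total(raw: str) -> Optional[Decimal]:
    """Turn raw model output such as "$12.50" or "12.50." into a Decimal.

    Returns None when no number can be found in the text.
    """
    # Drop currency signs and thousands separators (assumed US-style).
    text = raw.strip().replace("$", "").replace(",", "")
    # Remove any stray dots at the end, e.g. "12.50." becomes "12.50".
    text = text.rstrip(".")
    match = _AMOUNT.search(text)
    if match is None:
        return None
    return Decimal(match.group(0))
```

With this in place, "$12.50", "12.50" and "12.50." all compare equal, so a missing dollar sign or an extra dot no longer counts as a wrong answer.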
There’s still room for improvement, in particular by experimenting with the prompt you pass to each model to get a higher accuracy.
With these results, I feel very comfortable recommending Moondream2 as a model for extracting totals from receipts in an app running in production.
Receipt scanning won’t be perfect, but product-wise I’d guess that users will tolerate some error. Scanning a photo is faster by roughly an order of magnitude (typing out the details of a receipt takes about a minute, scanning it takes about 5-10 seconds), so the roughly 8% of the time that the receipt isn’t parsed correctly does NOT impact the experience negatively.
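To put rough numbers on that trade-off, here's a back-of-the-envelope sketch. It assumes that a failed scan means the user falls back to typing the receipt by hand, and it uses 7.5 seconds as the midpoint of the 5-10 second scan time. Both are assumptions made for illustration, as is the function name.

```python
def expected_seconds_per_receipt(
    scan_seconds: float,
    manual_seconds: float,
    accuracy: float,
) -> float:
    """Average time per receipt when every failed scan is retyped by hand."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    failure_rate = 1.0 - accuracy
    # Every receipt is scanned once; only the failures also cost manual entry.
    return scan_seconds + failure_rate * manual_seconds


# 7.5 s scan (assumed midpoint), 60 s manual entry, Moondream2's 92.38% accuracy.
expected = expected_seconds_per_receipt(7.5, 60.0, 0.9238)
print(f"{expected:.1f} seconds per receipt on average")
```

Under these assumptions the average comes out at about 12 seconds per receipt, still far below the minute it takes to type everything in.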
Granite Docling is still decent. Since it’s a tiny model (256M params), you’ll need much less memory to load it on a device, and inference will run much faster.
Let me know what you think!
If you need help with any image data extraction tasks, let me know!
Shoutouts
Thanks ExpressExpense for the open dataset!
https://expressexpense.com/blog/free-receipt-images-ocr-machine-learning-dataset/