Inspired by Brian Roemmele, I set up DeepSeek-OCR on Colab. Even on a T4 GPU with 4-bit quantization, it scans a page in about 45 seconds. In this video, you can see it compress 527 text tokens into 249 image tokens.
According to the paper, DeepSeek-OCR is trained on nearly 100 languages, so I'm going to try it soon on some old manuscripts written in Indic languages.
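For anyone who wants to check the compression figure from the video, here is a quick sketch of the arithmetic. The function name `compression_ratio` is my own and is not part of DeepSeek-OCR. The only inputs are the two token counts quoted above (527 text tokens, 249 image tokens).

```python
def compression_ratio(text_tokens: int, image_tokens: int) -> float:
    """Return how many text tokens each image (vision) token stands in for.

    Hypothetical helper for illustration only; not a DeepSeek-OCR API.
    """
    if text_tokens < 0 or image_tokens <= 0:
        raise ValueError("text_tokens must be >= 0 and image_tokens must be > 0")
    return text_tokens / image_tokens


# Token counts taken from the run shown in the video.
text_tokens = 527
image_tokens = 249

ratio = compression_ratio(text_tokens, image_tokens)
saved = 1 - image_tokens / text_tokens  # fraction of tokens saved

print(f"Compression ratio: {ratio:.2f}x")
print(f"Tokens saved: {saved:.1%}")
```

So for this page, the image representation uses a bit less than half as many tokens as the plain text would.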
IT FREAKING WORKED!
At 4am today I just proved DeepSeek-OCR AI can scan an entire microfiche sheet and not just cells and retain 100% of the data in seconds…
AND
Have a full understanding of the text/complex drawings and their context.
I just changed offline data curation!