Imagine turning any document in any language into perfectly readable text, all on a single GPU, no cloud required.
That’s exactly what @deepseek_ai’s DeepSeek-OCR lets you do. This 3B-parameter vision model hits ~97% decoding precision while using roughly 10× fewer vision tokens than the text tokens it reconstructs, handling tables, papers, and handwriting without killing your GPU or budget.
Most vision models choke on long documents because they treat them as massive token sequences. DeepSeek-OCR solves this with context optical compression, turning 2D layouts into compact vision tokens for fast, efficient processing.
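Running it locally takes just a few lines. Here’s a minimal sketch based on the Hugging Face model card (the `infer` helper ships with the repo’s remote code; file paths are placeholders, and exact arguments may shift, so check the card):

```python
# pip install transformers torch
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-OCR"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,   # the OCR `infer` helper lives in the repo code
    use_safetensors=True,     # (the card also enables flash-attention 2)
)
model = model.eval().cuda().to(torch.bfloat16)

# <|grounding|> asks for layout-aware markdown instead of plain text.
prompt = "<image>\n<|grounding|>Convert the document to markdown."
result = model.infer(
    tokenizer,
    prompt=prompt,
    image_file="page.png",    # placeholder: your scanned page
    output_path="out/",
    base_size=1024,           # resolution mode controls how many vision
    image_size=640,           #   tokens each page gets compressed into
    crop_mode=True,           # tiled "Gundam" mode for dense pages
)
```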
The coolest part? You can fine-tune it locally for your language, dataset, or domain.
I tested it on Persian text with Unsloth, and the results blew me away:
→ Base model: 149% character error rate (CER; yes, it can top 100%, see the sketch below)
→ Fine-tuned model: 60% CER (a ~60% relative error reduction)
→ Training time: 60 steps on a single GPU
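If that 149% looks impossible: CER divides edits (substitutions, deletions, insertions) by reference length, so a model that hallucinates extra text can score above 100%. You can check the math yourself with the jiwer library (the strings here are made-up examples):

```python
# pip install jiwer
from jiwer import cer

reference  = "سلام دنیا"              # ground-truth transcription (made-up example)
hypothesis = "سللام به دنیا و جهان"   # model output: one typo plus hallucinated words

# CER = (substitutions + deletions + insertions) / reference characters,
# so hallucinated insertions alone can push it past 100%.
print(f"CER: {cer(reference, hypothesis):.0%}")
```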
Persian was just a demo; you can plug in any language or document type.
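The recipe itself is short. Here’s a rough sketch assuming Unsloth’s standard vision fine-tuning API (FastVisionModel); the model id is an assumption and `pages` is a hypothetical list of labeled scans, so grab the exact config from the notebook linked next:

```python
# pip install unsloth
from unsloth import FastVisionModel
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

# Model id is an assumption; check Unsloth's notebook for the exact checkpoint.
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/DeepSeek-OCR",
    load_in_4bit=True,                    # QLoRA keeps it on one GPU
    use_gradient_checkpointing="unsloth",
)

# LoRA adapters on both the vision encoder and the language layers.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    r=16,
    lora_alpha=16,
)

# Hypothetical data: `pages` is a list of {"image": PIL.Image, "text": str}.
def to_chat(sample):
    return {"messages": [
        {"role": "user", "content": [
            {"type": "image", "image": sample["image"]},
            {"type": "text",  "text": "Transcribe this page."},
        ]},
        {"role": "assistant", "content": [
            {"type": "text", "text": sample["text"]},
        ]},
    ]}

train_dataset = [to_chat(s) for s in pages]  # `pages` = your labeled scans

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=train_dataset,
    args=SFTConfig(
        max_steps=60,                  # the 60 steps from the results above
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        output_dir="outputs",
        remove_unused_columns=False,   # vision collator needs the raw columns
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
    ),
)
trainer.train()
```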
I’ve shared a full guide in the next tweet with all the code, notebooks, and setup ready to go.
Everything is 100% open source and runs fully locally, with no usage limits.
#deepseek #ocr #OpenSource

