Simplifying LLMs, AI Agents, RAG, and Machine Learning for you! • Co-founder @dailydoseofds_ • BITS Pilani • 3 Patents • ex-AI Engineer @ LightningAI

Learn AI Engineering 👉
Joined July 2012
Fine-tune DeepSeek-OCR on your own language! (100% local)

DeepSeek-OCR is a 3B-parameter vision model that achieves 97% precision while using 10× fewer vision tokens than text-based LLMs. It handles tables, papers, and handwriting without killing your GPU or budget.

Why it matters:

Most vision models treat documents as massive sequences of tokens, making long-context processing expensive and slow. DeepSeek-OCR uses context optical compression to convert 2D layouts into vision tokens, enabling efficient processing of complex documents.

The best part? You can easily fine-tune it for your specific use case on a single GPU.

I used Unsloth to run this experiment on Persian text and saw an 88.26% improvement in character error rate.

↳ Base model: 149% character error rate (CER)
↳ Fine-tuned model: 60% CER (57% more accurate)
↳ Training time: 60 steps on a single GPU

Persian was just the test case. You can swap in your own dataset for any language, document type, or specific domain you're working with.

I've shared the complete guide in the next tweet - all the code, notebooks, and environment setup ready to run with a single click.

Everything is 100% open-source!
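To give a feel for the setup, here's a rough Python sketch of an Unsloth-style vision fine-tune. The model id, dataset id, and hyperparameters are illustrative assumptions, not the notebook's exact code - use the linked guide for the tested version.

```python
# Rough sketch of the fine-tuning setup, NOT the exact notebook code.
# Model id, dataset id, and hyperparameters are illustrative placeholders.
from unsloth import FastVisionModel
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/DeepSeek-OCR",   # assumed hub id; check Unsloth's model page
    load_in_4bit=True,        # 4-bit QLoRA so it fits on a single GPU
)
model = FastVisionModel.get_peft_model(model, r=16, lora_alpha=16)  # LoRA adapters

# Hypothetical dataset id: swap in your own (image, transcription) pairs.
persian_ocr_dataset = load_dataset("your-username/persian-ocr", split="train")

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=persian_ocr_dataset,
    args=SFTConfig(
        max_steps=60,                                   # matches the 60-step run above
        per_device_train_batch_size=2,
        learning_rate=2e-4,
        output_dir="outputs",
        remove_unused_columns=False,                    # keep the image columns
        dataset_kwargs={"skip_prepare_dataset": True},  # the collator handles prep
    ),
)
trainer.train()
```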
Here are the benchmarks for token efficiency and retrieval accuracy as provided by the TOON team. You can find the same information in their GitHub repo: github.com/toon-format/toon
A simple trick cuts your LLM costs by 50%!

Just stop using JSON and use this instead:

TOON (Token-Oriented Object Notation) slashes your LLM token usage in half while keeping data perfectly readable. Here's why it works:

TOON's sweet spot: uniform arrays with consistent fields per row. It merges YAML's indentation and CSV's tabular structure, optimized for minimal tokens.

Look at the example below.

JSON:

{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}

TOON:

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

It's obvious how few tokens are being used to represent the same information!

To summarise, here are the key features:

💸 30–60% fewer tokens than JSON
🔄 Borrows the best from YAML & CSV
🤿 Built-in validation with explicit lengths & fields
🍱 Minimal syntax (no redundant braces, brackets, etc.)

IMPORTANT!! That said, for deeply nested or non-uniform data, JSON might be more efficient.

In the next tweet, I've shared some benchmark results demonstrating the effectiveness of this technique in reducing the number of tokens and improving retrieval accuracy with popular LLM providers.

Where do you think this could be effective in your existing workflows?

Find the relevant links in the next tweet!
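The official encoders live in the GitHub repo above; purely to make the format concrete, here's a hand-rolled Python sketch that covers TOON's sweet spot (a uniform array of flat objects), not the full spec - no nesting, quoting, or escaping.

```python
def to_toon(key, rows):
    """Encode a uniform list of flat dicts in TOON-style tabular form.
    Minimal sketch only: assumes every row has the same keys and
    skips the real spec's quoting/escaping rules."""
    fields = list(rows[0])
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"  # e.g. users[2]{id,name,role}:
    lines = [",".join(str(r[f]) for f in fields) for r in rows]
    return "\n".join([header] + ["  " + line for line in lines])

users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
print(to_toon("users", users))
# users[2]{id,name,role}:
#   1,Alice,admin
#   2,Bob,user
```

Note how the field names appear once in the header instead of being repeated per row - that repetition is exactly where JSON burns tokens on uniform data.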
Multi-head attention in LLMs, visually explained:
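The visual is the main explainer; as a code companion, here's a compact PyTorch sketch of the core computation with toy dimensions (no masking, dropout, or KV caching):

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # project to Q, K, V at once
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                           # x: (batch, seq, d_model)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split d_model into n_heads independent subspaces
        shape = (b, t, self.n_heads, self.d_head)
        q, k, v = (m.view(shape).transpose(1, 2) for m in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.d_head**0.5
        attn = scores.softmax(dim=-1)               # each head attends separately
        ctx = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out(ctx)                        # mix the heads back together

x = torch.randn(2, 10, 64)
print(MultiHeadAttention()(x).shape)  # torch.Size([2, 10, 64])
```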
ReAct is really popular. Here's an example from CrewAI, which uses ReAct to let the LLM think through problems and use tools to act on the world.
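CrewAI handles this under the hood; to make the pattern concrete, here's a bare-bones sketch of the ReAct loop itself (not CrewAI's implementation), with a stubbed LLM and a toy calculator tool so it runs end to end:

```python
# Bare-bones ReAct loop: the model alternates Thought -> Action -> Observation
# until it emits a final answer. llm() is a stub standing in for a real model.

def calculator(expr):                      # one toy tool (eval is fine for a demo)
    return str(eval(expr))

def llm(prompt):                           # stub: a real LLM would generate these lines
    if "Observation" not in prompt:
        return "Thought: I need math.\nAction: calculator[17 * 23]"
    return "Thought: I have the result.\nFinal Answer: 391"

def react(question, max_steps=5):
    prompt = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(prompt)
        prompt += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        if "Action:" in step:              # parse the tool call, run it, feed it back
            tool_input = step.split("Action: calculator[")[1].rstrip("]")
            prompt += f"\nObservation: {calculator(tool_input)}"
    return None

print(react("What is 17 * 23?"))           # -> 391
```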
Turn any GitHub repository into rich, navigable docs. Simply replace "github" with "deepwiki" in the repo URL.
As usual, Anthropic just published another banger. This one is on building efficient agents that handle more tools while using fewer tokens. Agents scale better by writing code to call tools, and the article explains how to use MCP to execute this code. A must-read for AI devs!
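Not from the article itself, but here's a minimal Python sketch of the core idea, with hypothetical stand-ins for MCP tool wrappers: instead of emitting one tool call per step (and dragging every intermediate result through the context window), the agent writes a small script against thin wrappers, and only the final printout re-enters the model's context.

```python
# Hypothetical stand-ins for MCP tool wrappers; in the article's setup these
# would proxy real MCP servers. Stubbed here so the sketch actually runs.
def get_sheet(name):
    return [{"id": 1, "owner": "a@x.com", "status": "overdue"},
            {"id": 2, "owner": "b@x.com", "status": "won"}]

def send_email(to, body):
    print(f"email -> {to}: {body}")

# The agent writes code like this instead of one tool call per step:
# the full sheet stays inside the sandbox, and only the short summary
# printed at the end flows back into the model's context window.
rows = get_sheet("q3-pipeline")
overdue = [r for r in rows if r["status"] == "overdue"]
for r in overdue:
    send_email(r["owner"], f"Deal {r['id']} is overdue")
print(f"Notified {len(overdue)} owner(s)")
```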
XBOW raised $117M to build AI hacking agents. Now someone just open-sourced it for FREE.

Strix deploys autonomous AI agents that act like real hackers - they run your code dynamically, find vulnerabilities, and validate them through actual proof-of-concepts.

Why it matters:

The biggest problem with traditional security testing is that it doesn't keep up with development speed. Strix solves this by integrating directly into your workflow:

↳ Run it in CI/CD to catch vulnerabilities before production
↳ Get real proof-of-concepts, not false positives from static analysis
↳ Test everything: injection attacks, access control, business logic flaws

The best part? You don't need to be a security expert. Strix includes a complete hacker toolkit - HTTP proxy, browser automation, and Python runtime for exploit development. It's like having a security team that works at the speed of your CI/CD pipeline.

And since the tool runs locally in Docker containers, your code never leaves your environment.

Getting started is simple:

- pipx install strix-agent
- Point it at your codebase (app, repo, or directory)

Everything is 100% open-source! I've shared a link to the GitHub repo in the replies!
RAG vs. CAG, clearly explained!

RAG is great, but it has a major problem:

Every query hits the vector database. Even for static information that hasn't changed in months. This is expensive, slow, and unnecessary.

Cache-Augmented Generation (CAG) addresses this issue by enabling the model to "remember" static information directly in its key-value (KV) memory.

Even better? You can combine RAG and CAG for the best of both worlds. Here's how it works:

RAG + CAG splits your knowledge into two layers:

↳ Static data (policies, documentation) gets cached once in the model's KV memory
↳ Dynamic data (recent updates, live documents) gets fetched via retrieval

The result? Faster inference, lower costs, less redundancy.

The trick is being selective about what you cache. Only cache static, high-value knowledge that rarely changes. If you cache everything, you'll hit context limits. Separating "cold" (cacheable) and "hot" (retrievable) data keeps this system reliable.

You can start today. OpenAI and Anthropic already support prompt caching in their APIs. I have shared a link to OpenAI's prompt caching guide in the replies.

Have you tried CAG in production yet?
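As a concrete starting point, here's a minimal sketch of the two-layer split using Anthropic's explicit cache_control prompt caching; the model id and file name are illustrative. OpenAI's caching works similarly but kicks in automatically when requests share a long identical prompt prefix, so the same "static first, dynamic last" structure applies there too.

```python
# Minimal sketch: static "cold" knowledge goes through the provider's prompt
# cache via cache_control; fresh "hot" chunks are retrieved per query as usual.
import anthropic

client = anthropic.Anthropic()

static_policies = open("policies.md").read()   # cold layer: rarely changes, cache it
retrieved_chunks = "...chunks fetched from your vector DB for this query..."  # hot layer
question = "What is our refund window for annual plans?"

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",        # illustrative model id
    max_tokens=512,
    system=[{
        "type": "text",
        "text": static_policies,
        "cache_control": {"type": "ephemeral"},  # reused across requests
    }],
    messages=[{
        "role": "user",
        "content": f"{retrieved_chunks}\n\nQuestion: {question}",
    }],
)
print(response.content[0].text)
```

Keeping the cached static block first and the retrieved chunks last is the key design choice: the prefix stays byte-identical across requests, which is what makes the cache hit.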