Dream big, and build beyond limits! Learning AI at Carnegie Mellon Univ.

Pittsburgh, US
Joined December 2021
Abhishek Reddy retweeted
Fine-tune DeepSeek-OCR on your own language! (100% local)
DeepSeek-OCR is a 3B-parameter vision model that achieves 97% precision while using 10× fewer vision tokens than text-based LLMs. It handles tables, papers, and handwriting without killing your GPU or budget.
Why it matters: Most vision models treat documents as massive sequences of tokens, making long-context processing expensive and slow. DeepSeek-OCR uses context optical compression to convert 2D layouts into vision tokens, enabling efficient processing of complex documents.
The best part? You can easily fine-tune it for your specific use case on a single GPU. I used Unsloth to run this experiment on Persian text and saw an 88.26% improvement in character error rate.
↳ Base model: 149% character error rate (CER)
↳ Fine-tuned model: 60% CER (57% more accurate)
↳ Training time: 60 steps on a single GPU
Persian was just the test case. You can swap in your own dataset for any language, document type, or specific domain you're working with.
I've shared the complete guide in the next tweet - all the code, notebooks, and environment setup ready to run with a single click. Everything is 100% open-source!
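The post reports character error rate (CER) without saying how it was computed. A common definition, assumed here, is the Levenshtein edit distance between the model output and the reference text, divided by the reference length. The sketch below uses that definition; the function names are illustrative and not taken from the post or from Unsloth. It also shows why a CER above 100% (like the 149% quoted for the base model) is possible: a model that emits far more characters than the reference contains accumulates more edits than there are reference characters.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # delete a reference character
                current[j - 1] + 1,      # insert a hypothesis character
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length (assumed definition)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(cer("abc", "abd"))     # one substitution over three characters
    print(cer("ab", "abcdef"))   # four insertions over two characters: above 100%
```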
Abhishek Reddy retweeted
System design is the art of making scale look boring. Here are 3 system-design project ideas you can actually build and reason about 👇
Project idea 1: “Instagram-style Feed Service”
Your goal: design a timeline that scales reads.
Key challenges to solve:
- fan-out on write vs fan-out on read
- caching the feed (Redis? CDN?)
- handling the “celebrity problem” (1M followers)
Deliverable: write a design doc that defends why you picked your fan-out strategy and how you avoid thundering herds.
Project idea 2: “URL Shortener at 5k RPS”
Your goal: tiny API that forces huge decisions.
Key challenges to solve:
- ID generation strategy (Snowflake IDs? base62?)
- consistent hashing across shards
- hot key protection
Deliverable: build a prototype, hammer it with a load generator, and tune your write path until you get predictable low-latency writes.
Project idea 3: “E-Commerce Checkout as a SAGA”
Your goal: durability + correctness over everything.
Key challenges to solve:
→ Payment, Inventory, Order microservices coordination
→ Orchestrator vs Choreography
→ idempotency and retries
Deliverable: show how you avoid double-charging customers through idempotent event handling + a durable orchestrator.
Just picking a “cool” tool won’t save you. Good system design comes from defending your trade-offs.
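For the URL shortener idea, the post mentions base62 as one ID strategy. A minimal sketch of that encoding follows, assuming the short code is derived from a numeric ID (for example a database sequence or a Snowflake-style ID). The alphabet order and function names are illustrative choices, not something the post specifies.

```python
import string

# 0-9, a-z, A-Z: 62 URL-safe symbols. The ordering is an arbitrary choice.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_base62(number: int) -> str:
    """Turn a non-negative integer ID into a short base62 code."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Recover the integer ID from a base62 code (raises ValueError on bad characters)."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number


if __name__ == "__main__":
    short = encode_base62(123456789)
    print(short, decode_base62(short))
```

Sequential IDs produce guessable codes, so whether to encode a counter, a random value, or a hash is one of the trade-offs the design doc would need to defend.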
Study Von Neumann (Game Theory), Study McCulloch (Neural Networks), Study Jung (Consciousness).
Abhishek Reddy retweeted
How do we design effective and safe APIs? APIs have increasingly become the backbone of modern software. To understand some of the key principles and best practices of API design, let's analyze a social media platform example:
🔹 Resource naming
↳ Clarity is key when creating APIs. Adopting simple resource names, like /users for accessing user profiles and /posts for retrieving user posts, streamlines the development process and reduces mental strain.
🔹 Use of plurals
↳ It's important to maintain a standard of consistency in API design. For consistency and readability, use plural resource names (GET /users/{userId}/friends rather than /friend) to avoid ambiguity in API requests.
🔹 Cross-referencing resources
↳ Interlinking resources, like fetching the comments on a post using GET /posts/{postId}/comments, simplifies the retrieval of related data. It provides a more streamlined and well-organized user experience.
🔹 Idempotency
↳ Maintaining API reliability is crucial. Idempotency ensures that operations like profile updates (PUT /users/{userId}/profile) produce the same result no matter how many times they are executed. Learn more about idempotency here: lucode.co/idempotency-in-api…
🔹 Security
↳ It goes without saying, security is a must-have. To secure the API endpoints, employ authentication methods like X-AUTH-TOKEN and X-SIGNATURE, and use authorization headers for verifying user permissions.
🔹 Versioning
↳ Communicating version updates is another important practice. Endpoints like GET /v2/users/{userId}/posts allow API versioning to maintain functionality regardless of updates. This approach ensures backward compatibility and a smooth transition for users and us.
🔹 Pagination
↳ Important for performance. Paginate large datasets, like feeds or comment lists, with GET /posts?page=5&pageSize=20 to enhance data delivery and UX.
Great APIs come from good practices. Clear docs, strong monitoring, consistent error handling, and more. Adopting these practices helps us build secure, performant APIs that deliver great user experiences.
What else would you add?
--
👋 PS: If you like this post, then you'll love our newsletter. Join 25,000+ software engineers: lucode.co/luc-newsletter-lm1…
PPS: You get our Architecture Patterns Playbook for free when you join. It's packed with visuals, tradeoffs, & real-world examples.
--
🔖 Save for later • ♻️ Repost to help others
🙋🏻‍♀️ Follow Nikki Siapno • Turn on notifications 🔔
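The pagination example, GET /posts?page=5&pageSize=20, can be sketched as an offset and limit computation. This is only an illustration under stated assumptions: pages are taken to be 1-indexed, and the MAX_PAGE_SIZE cap and helper names are hypothetical additions, since the post does not say how the server interprets the parameters.

```python
from dataclasses import dataclass

MAX_PAGE_SIZE = 100  # hypothetical cap so a client cannot request an unbounded page


@dataclass(frozen=True)
class PageWindow:
    offset: int  # number of items to skip
    limit: int   # maximum number of items to return


def page_window(page: int, page_size: int) -> PageWindow:
    """Translate 1-indexed page/pageSize query parameters into offset/limit."""
    if page < 1:
        raise ValueError("page must be >= 1")
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"pageSize must be between 1 and {MAX_PAGE_SIZE}")
    return PageWindow(offset=(page - 1) * page_size, limit=page_size)


def paginate(items: list, page: int, page_size: int) -> list:
    """Return the slice of items for the requested page (empty past the end)."""
    window = page_window(page, page_size)
    return items[window.offset:window.offset + window.limit]


if __name__ == "__main__":
    posts = list(range(100))
    print(page_window(5, 20))
    print(paginate(posts, 5, 20))
```

Offset pagination is simple but gets slower and less stable on large, frequently changing feeds; cursor-based pagination is a common alternative in that case.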
Abhishek Reddy retweeted
Maryland's "Mathematical Logic" notes by David W. Kueker
PDF 1: math.umd.edu/~dkueker/712.pd…
PDF 2: math.umd.edu/~dkueker/713.pd…
Abhishek Reddy retweeted
Matrix Cheat Sheet
Abhishek Reddy retweeted
On the subject of Laplace Transforms, our Differential Equations course is well underway. I'm excited about this one. Here's a sneak peek.
I love @3blue1brown. However, the biggest beneficiaries are those who already have solid fundamentals. Take the recent excellent video on Laplace Transforms. I wouldn't recommend starting there if you've no idea what a Laplace Transform is! Build the fundamental skills first.
Abhishek Reddy retweeted
Abhishek Reddy retweeted
Multi-head attention in LLMs, visually explained:
Abhishek Reddy retweeted
Linear Regression. Image credit: Data Interview
Abhishek Reddy retweeted
There is a paper from 2017 that introduced a trick that I love but have never seen used. Consider two linear layers f and g that you initialize with the same parameters, and then use h(x) = f(relu(x)) + g(-relu(-x)). Then, at initialization, h is linear! 1/2
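The claim follows from the identity relu(x) - relu(-x) = x. If f and g share a weight matrix W and bias b, then h(x) = W·relu(x) + b + W·(-relu(-x)) + b = W·x + 2b. The check below is a small sketch in plain Python, with made-up layer sizes and helper names. It assumes each layer carries its own bias, in which case h is affine at initialization, and strictly linear when the biases are zero.

```python
import math
import random


def relu(vector):
    return [max(value, 0.0) for value in vector]


def negate(vector):
    return [-value for value in vector]


def make_linear(weights, bias):
    """Return a function computing W @ v + b for a list-of-rows weight matrix."""
    def layer(vector):
        return [sum(w * v for w, v in zip(row, vector)) + b
                for row, b in zip(weights, bias)]
    return layer


def make_h(weights, bias):
    f = make_linear(weights, bias)
    g = make_linear(weights, bias)  # g starts with the same parameters as f

    def h(x):
        positive_branch = f(relu(x))                    # f sees the positive part of x
        negative_branch = g(negate(relu(negate(x))))    # g sees -relu(-x), the negative part
        return [p + n for p, n in zip(positive_branch, negative_branch)]
    return h


rng = random.Random(0)
weights = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
bias = [rng.gauss(0, 1) for _ in range(3)]
h = make_h(weights, bias)

x = [rng.gauss(0, 1) for _ in range(4)]
# Expected value at initialization: W @ x + 2b
affine = [sum(w * v for w, v in zip(row, x)) + 2 * b for row, b in zip(weights, bias)]
matches = all(math.isclose(a, c, abs_tol=1e-9) for a, c in zip(h(x), affine))
print(matches)
```

Once training updates f and g separately, the two branches diverge and h becomes a genuinely nonlinear function of x.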
Abhishek Reddy retweeted
Our API handled 500 requests per second with zero issues. Marketing ran a campaign. Traffic hit 5,000 rps.
What broke wasn't what we expected:
- Application servers? Fine.
- Database? Fine.
- Load balancer? Fine.
- Our rate limiter crashed because we stored rate limit counters in Redis with no memory limits.
Redis ran out of memory. Rate limiter failed open. Actual traffic hit the API unthrottled. Then everything crashed.
We optimized for scale but not for the failure modes of our safety mechanisms.
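Two of the failure modes in that story can be sketched in a few lines: a counter store that grows without bound, and a limiter that lets everything through when the store fails. The example below is a hypothetical in-memory stand-in, not the author's system and not a Redis client. All class names are invented for illustration, and the fixed-window algorithm is an assumption, since the post does not say which one was used.

```python
import time
from collections import OrderedDict


class CounterStoreError(Exception):
    """Raised when the counter store is unavailable."""


class BoundedCounterStore:
    """In-memory counters capped at max_keys; the least recently used key is evicted."""

    def __init__(self, max_keys: int):
        self.max_keys = max_keys
        self._counts = OrderedDict()

    def __len__(self) -> int:
        return len(self._counts)

    def incr(self, key: str) -> int:
        count = self._counts.pop(key, 0) + 1
        self._counts[key] = count  # re-insert so this key is the most recently used
        while len(self._counts) > self.max_keys:
            self._counts.popitem(last=False)  # evict instead of growing without bound
        return count


class BrokenStore:
    """Simulates a store that has run out of memory or is unreachable."""

    def incr(self, key: str) -> int:
        raise CounterStoreError("store unavailable")


class RateLimiter:
    """Fixed-window limiter with an explicit policy for store failures."""

    def __init__(self, store, limit, window_seconds, fail_open=False, clock=time.time):
        self.store = store
        self.limit = limit
        self.window_seconds = window_seconds
        self.fail_open = fail_open
        self.clock = clock

    def allow(self, client_id: str) -> bool:
        window = int(self.clock() // self.window_seconds)
        try:
            count = self.store.incr(f"{client_id}:{window}")
        except CounterStoreError:
            # The decision the post is about: admit or reject traffic when the limiter is blind.
            return self.fail_open
        return count <= self.limit


if __name__ == "__main__":
    limiter = RateLimiter(BoundedCounterStore(max_keys=10_000), limit=2, window_seconds=60)
    print([limiter.allow("client-a") for _ in range(3)])
    print(RateLimiter(BrokenStore(), limit=2, window_seconds=60).allow("client-a"))
```

Evicting counters resets the affected clients' limits early, so the cap trades accuracy for survival. With Redis itself, the analogous controls are the `maxmemory` and `maxmemory-policy` settings together with expiry times on the counter keys. Whether failing closed is acceptable depends on the product, which is the kind of failure-mode decision the post argues should be made in advance.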
Abhishek Reddy retweeted
One of the most common mix-ups in statistics is between standard deviation (SD) and standard error (SE). They sound similar, but they describe two completely different things, and using the wrong one can lead to misleading conclusions. Here's how to tell them apart.
🔹 Standard Deviation (SD): SD measures how spread out individual values are in your sample. It tells you about the variability within the data set. Example: How much do individual incomes vary in a sample of 1,000 people?
🔹 Standard Error (SE): SE measures how much an estimate (like a mean or proportion) would vary across repeated samples. It tells you how precise your estimate is. Example: How much would the sample mean income change if you ran the survey again?
As your sample gets larger, SE gets smaller because you're more confident in your estimate. But SD often stays about the same since it reflects the natural spread in the data, not how many observations you have.
Use SD to describe the data, and SE to describe the reliability of the estimate.
For more on statistics, data science, R, and Python, subscribe to my email newsletter. Click this link for detailed information: eepurl.com/gH6myT
#datastructure #Python #RStats #Python3 #Data #programmer
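For a sample mean, the standard error is usually estimated as SE = SD / sqrt(n), which is why it shrinks as the sample grows while SD does not. The sketch below illustrates this with simulated incomes. The mean of 50,000 and the spread of 15,000 are made-up values for illustration, not figures from the post.

```python
import math
import random
import statistics


def standard_error(sample) -> float:
    """Estimated standard error of the sample mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


random.seed(0)
# Simulated incomes from the same hypothetical population, at two sample sizes.
small_sample = [random.gauss(50_000, 15_000) for _ in range(100)]
large_sample = [random.gauss(50_000, 15_000) for _ in range(10_000)]

for name, sample in (("n=100", small_sample), ("n=10,000", large_sample)):
    print(f"{name}: SD = {statistics.stdev(sample):,.0f}, SE = {standard_error(sample):,.0f}")
```

Both samples should show a similar SD, because the spread of individual incomes is a property of the population, while the SE of the larger sample should be roughly ten times smaller, since sqrt(10,000 / 100) = 10.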
Abhishek Reddy retweeted
I feel the Gemma 3 models are quite underrated.
- They are multimodal (each image → 256 tokens).
- They have a robust tokenizer (~260k unique tokens, one of the most efficient tokenizers I've seen across multiple languages).
- They support a 128k input context.
- They offer relatively fast inference, and training is faster than with qwen2.5/3vl.
- They are very easy to fine-tune, and they generalize well to many downstream tasks using both SFT and RL (GRPO).
- They are widely supported across frameworks.
I've been working with them for a while. They're not groundbreaking models, but they make an excellent starting point for fine-tuning. In particular, the 4B-parameter model can be fine-tuned on a T4 GPU.
If anyone is interested, I can share a sample fine-tuning script.
Abhishek Reddy retweeted
Everything about Derivative and Integral.
Abhishek Reddy retweeted
Extracts clean data from documents using vision-language models
Abhishek Reddy retweeted
We reproduce DeepSeek-OCR training from scratch. The code, model, and results can be found on our website. #DeepSeek pkulium.github.io/DeepOCR_we…
Abhishek Reddy retweeted
I'm looking for a student researcher to work with me at Google DeepMind in London, preferably starting early next year. Topics will be around novel video model architectures / learning from a single video stream / representation learning.
Abhishek Reddy retweeted
It is viva day, my dudes.