Shylesh Kumar retweeted
Turn any GitHub repository into rich, navigable docs. Simply replace "github" with "deepwiki" in the repo URL.
Shylesh Kumar retweeted
She blocked me from everywhere. Not because I cheated. Not because I was broke. Not even because I said her sister was more beautiful. But because, in her words: "No matter what you do, you never win in life."

At first, I thought she was just calling me a loser. Then my inner math brain clicked... She was literally describing the martingale principle.

In martingale theory, fair games are designed so your expected fortune never goes up or down, no matter how many times you've won or lost in the past. The math says: all the history you drag with you? It doesn't change what happens next.

A martingale is a stochastic process modeling a fair game, where the conditional expected value of the next observation, given all past observations, equals the current value.

Formulas

Discrete time:
- E[X(n+1) | X(1), X(2), ..., X(n)] = X(n)

General (with filtrations):
- E[X(t) | F(s)] = X(s) for all s ≤ t

Where
- X(n) = value of the process at time n
- E[·|·] = conditional expectation
- F(s) = information (filtration) available up to time s

Let's take an example and solve it step by step.

A gambler plays a fair coin flip game:
- Heads: win $1
- Tails: lose $1
- Probability: P(H) = P(T) = 0.5
- Starting fortune: X(0) = $50

We want to verify this is a martingale.

Step 1: Current fortune
After 3 flips, suppose the fortune is $52: X(3) = 52

Step 2: Possible outcomes for the next flip
- Heads → X(4) = 52 + 1 = 53
- Tails → X(4) = 52 - 1 = 51

Step 3: Compute the conditional expectation
E[X(4) | X(1), X(2), X(3)] = P(H) × 53 + P(T) × 51 = 0.5 × 53 + 0.5 × 51 = 26.5 + 25.5 = 52

Therefore, E[X(4) | past] = X(3) = 52

Conclusion: the expected fortune after the next flip equals the current fortune → this is a martingale.

Congratulations 🎉 , you've just learned Martingale Theory!

Bonus: Applications in AI/ML

Reinforcement Learning
Value functions in Markov Decision Processes (MDPs) rely on martingale properties to ensure convergence of temporal-difference (TD) learning algorithms.

Stochastic Gradient Descent (SGD)
The noise in mini-batch gradient estimates forms a martingale difference sequence, enabling rigorous convergence analysis and generalization bounds in neural network training.

Sequential Hypothesis Testing
Likelihood ratio tests under the null hypothesis form martingales, supporting efficient stopping rules in A/B testing and statistical quality control (e.g., Wald's SPRT).
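For anyone who wants to poke at this numerically: a tiny Python sketch (my own illustration, using only the numbers from the example above) that checks the conditional-expectation property at X(3) = 52 by averaging many simulated next flips.

```python
# Minimal sketch: check E[X(4) | X(3) = 52] = 52 for the fair coin game above.
# Illustration only; the $1 stake and 0.5/0.5 probabilities come from the example.
import random

def next_fortune(current: float) -> float:
    """One fair coin flip: +$1 on heads, -$1 on tails."""
    return current + 1 if random.random() < 0.5 else current - 1

current = 52.0
trials = 1_000_000
avg_next = sum(next_fortune(current) for _ in range(trials)) / trials

print("Exact conditional expectation:", 0.5 * (current + 1) + 0.5 * (current - 1))
print(f"Simulated average of X(4):     {avg_next:.3f}")   # ≈ 52, matching X(3)
```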
Shylesh Kumar retweeted
starting next week, and to make everything easier to find, i'm moving all my LLM/AI educational content, past and future, to Substack. if you've been learning from my threads, consider subscribing
Shylesh Kumar retweeted
I've worked with @kylancodes to put together a little dedicated Replicate model that gives you:
- all the face images
- a sprite/grid of all the faces
- vanilla HTML, JS and CSS to render the effect
- a preview video

replicate.com/kylan02/face-l…
I can’t be the first one to do this… right? Just built a python script using one of @fofrAI’s models on @replicate to map out a grid of images of my face looking in every direction on my portfolio website, tracking the mouse on screen. Should I drop the code on GitHub?
Shylesh Kumar retweeted
got reminded of one of my most fav tweets on the internet
Shylesh Kumar retweeted
Share your GitHub profile, I’ll review it and drop feedback!
Shylesh Kumar retweeted
rainy day in edinburgh
Shylesh Kumar retweeted
Drop your website and let's see how creative you are in frontend engineering. Here is mine: ruturajbayad.netlify.app
Shylesh Kumar retweeted
1 Year into Machine Learning

One year ago, I took my first step into Machine Learning. Here is what one year of consistency looks like:

➤ Stanford CS229 - Machine Learning
➤ Stanford CS109 - Probability & Data Science
➤ Mathematics for ML and Data Science - Coursera
➤ Hands-On ML with Scikit-Learn, Keras & TensorFlow
➤ Deep Learning Specialization - Andrew Ng
➤ Machine Learning Cookbook
➤ 100 Days of Machine Learning
➤ Python & Scikit-Learn - freeCodeCamp
➤ Data Visualization - freeCodeCamp
➤ TensorFlow - freeCodeCamp
➤ 3 small projects
➤ 2 major projects (one ongoing!)
➤ 3 hackathons - learned a ton each time

Year 2 loading…
Shylesh Kumar retweeted
My scrambled eggs shader is really simple - the most important things are the heavy SSS and high roughness, combined with a low-roughness clearcoat component. The displacement modifier with cloud noise adds slightly chunkier detail. And the sprinkled chives help sell the illusion.
Shylesh Kumar retweeted
For Day 2 of #30DayMapChallenge (Lines): did you know that mapgl has a built-in measurement tool in its draw control? Use `show_measurements = TRUE` to interactively measure line distances (and polygon areas) on your map. #rstats
Shylesh Kumar retweeted
A 155-foot Hindu statue is rising in North Carolina, and suddenly everyone is arguing about American culture. Funny how we celebrate European cathedrals and Greek/Roman architecture here without a blink… but a Tamil monument triggers panic. Is the real debate about a statue or about who gets to shape the future face of American culture?
Shylesh Kumar retweeted
The modern way to read a research paper 👇

Surprisingly, reading a paper 4 times is faster than reading it once. Adapted from @eugeneyan's 3-pass approach. As someone who was not in the habit of reading papers, this has made it way easier for me to understand and retain information from technical papers.

My version:

1 / SKIM the abstract, intro, and conclusion, highlighting the key ideas and main points
2 / RE-READ the intro and conclusion, plus the headers/top of each section
3 / DEEP READ the entire paper, taking notes and annotating
4 / (BONUS) If there's code, drop the repo into Windsurf and walk through the implementation with CodeMaps

CodeMaps by @windsurf is one of my favorite under-rated features: you can step through the paper's implementation line by line inside the code.

Finished a great morning read on pruning experts to compress SMoE (sparse MoE) models. Congrats to @Vithu and the @Cerebras team on the paper.
Shylesh Kumar retweeted
if you pause this at any moment the tldraw disappears
Shylesh Kumar retweeted
Best OCR ever, huh?
No, it's not the best OCR ever. Here is the result from olmoOCR2 on the same input, and it does have a frightening degree of accuracy.
Shylesh Kumar retweeted
Does anyone have more of this genre of image of old and new cultures colliding
Shylesh Kumar retweeted
Transforming OpenStreetMap data into a stunning 3D city! Can't wait to share the tutorial with everyone this weekend. Who's ready to dive in?
I find these mfers funny but sometimes it's hard to bear
Shylesh Kumar retweeted
- MSE you use when you don't have outliers
- RMSE you use when you want to interpret the above better
- MAE you use when you have positive/zero/negative values and outliers
- MAPE you use when you only have positive values and emphasize interpretability
- RMSLE you use for positive values with a non-normal distribution
- wMAPE you use when you want MAPE but have large vs small values
- sMAPE you use when you want MAPE but have zero/negative values
- R2 you use because your boss only knows this one
TOP 8 Machine Learning Regression Metrics
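A quick Python sketch of the eight metrics above on a toy regression (my own illustration, not from the original post; wMAPE and sMAPE are written out by hand since scikit-learn has no built-ins for them):

```python
# Toy sketch of the eight regression metrics using NumPy + scikit-learn.
import numpy as np
from sklearn.metrics import (
    mean_squared_error,
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_log_error,
    r2_score,
)

y_true = np.array([3.0, 5.0, 12.0, 20.0, 150.0])   # positive values so MAPE/RMSLE are defined
y_pred = np.array([2.5, 6.0, 10.0, 21.0, 170.0])

mse   = mean_squared_error(y_true, y_pred)
rmse  = np.sqrt(mse)                                # same units as the target, easier to interpret
mae   = mean_absolute_error(y_true, y_pred)
mape  = mean_absolute_percentage_error(y_true, y_pred)
rmsle = np.sqrt(mean_squared_log_error(y_true, y_pred))
wmape = np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))   # weights errors by actual size
smape = np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred)))
r2    = r2_score(y_true, y_pred)

for name, value in [("MSE", mse), ("RMSE", rmse), ("MAE", mae), ("MAPE", mape),
                    ("RMSLE", rmsle), ("wMAPE", wmape), ("sMAPE", smape), ("R2", r2)]:
    print(f"{name:>6}: {value:.4f}")
```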
Shylesh Kumar retweeted
Converts PDF documents to Markdown format using DeepSeek-OCR with a FastAPI backend. This guy gave it 10,000 PDFs to convert to Markdown, averaging less than 1 second per page.

Hardware: 1x A6000 Ada on a Ryzen 1700 w/ 32 GB RAM. Dockerized model with FastAPI in a WSL environment.
👨‍🔧 Inside the smart design of DeepSeek OCR

DeepSeek-OCR looks like just another OCR model at first glance, something that reads text from images. But it's not just that. What they really built is a new way for AI models to store and handle information.

Normally, when AI reads text, it uses text tokens (the units that LLMs process). Each word or part of a word becomes a token. When text gets long, the number of tokens explodes, and everything gets slower and more expensive because the model's computation cost grows roughly with the square of the number of tokens. That's why even the most advanced models struggle with very long documents.

💡 DeepSeek's core idea was simple but revolutionary: instead of feeding an LLM thousands of text tokens, it turns long text into an image, encodes that image into a small set of vision tokens, then lets a decoder reconstruct the text.

The team asked a simple question, how many vision tokens are minimally needed to decode N text tokens, and they measured it end to end. The paper reports about 97% OCR precision when compressing text by 9-10x, and about 60% precision even at 20x. This shows that dense visual representations can carry the same information far more efficiently than plain text tokens.

The engineering that makes this practical is a new encoder called DeepEncoder. It processes high-resolution pages without blowing up memory by doing local window attention first, then a 16x convolutional downsampler, then global attention. That serial design keeps activations small while aggressively cutting token count.

So why is this a big deal? Context is the currency of LLMs, and it is expensive. If visual tokens can represent past dialogue, documents, or code at 10x smaller size with high fidelity, you can keep far more context active, cut costs, and speed up inference.

The paper also sketches a practical "forgetting" mechanism: you can progressively downscale older context images so recent information stays sharp while older context becomes cheaper over time, which matches how human memory fades. This makes long-running assistants, RAG replacements, and whole-codebase-in-context workflows much more realistic.
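A rough PyTorch sketch of that serial encoder idea, purely to show the shape of the computation (this is not DeepSeek's actual DeepEncoder; the layer sizes, the toy window partitioning, and the two stride-2 convs standing in for the 16x downsampler are my own assumptions): local attention runs while there are many patch tokens, the conv stage cuts the token count 16x, and global attention only ever sees the reduced set.

```python
# Toy stand-in (not DeepSeek's code) for: window attention -> 16x conv downsample -> global attention.
import torch
import torch.nn as nn

class ToyDeepEncoder(nn.Module):
    def __init__(self, dim=256, heads=4, window=16):
        super().__init__()
        self.window = window
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Two stride-2 convs over the 2D token grid: 4x per axis = 16x fewer tokens overall.
        self.down = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=2, stride=2),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=2, stride=2),
        )
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, grid_hw):
        b, n, d = x.shape          # x: (batch, num_patch_tokens, dim) for one high-res page
        h, w = grid_hw
        # 1) Local attention: consecutive tokens stand in for 2D windows in this toy version,
        #    so attention cost stays proportional to the number of windows.
        xw = x.reshape(b * (n // self.window), self.window, d)
        xw, _ = self.local_attn(xw, xw, xw)
        x = xw.reshape(b, n, d)
        # 2) 16x token reduction on the 2D grid keeps activations small before global attention.
        x = x.transpose(1, 2).reshape(b, d, h, w)
        x = self.down(x)                       # (b, dim, h/4, w/4)
        x = x.flatten(2).transpose(1, 2)       # (b, n/16, dim)
        # 3) Global attention is now affordable on ~n/16 "vision tokens".
        x, _ = self.global_attn(x, x, x)
        return x

enc = ToyDeepEncoder()
page_tokens = torch.randn(1, 64 * 64, 256)            # 4096 patch tokens from one page
vision_tokens = enc(page_tokens, grid_hw=(64, 64))
print(page_tokens.shape[1], "->", vision_tokens.shape[1])   # 4096 -> 256 tokens (16x fewer)
```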