Building @kodomamo_JP. Presently focusing on LLM fine-tuning and scaling.

Tokyo
Joined September 2016
This tweet is about how I studied ML and made it my profession. I'll share the resources I've used and the sequence of my study. Straight to the point.

ML prerequisites (maths): Linear Algebra, Probability Theory, Calculus, Optimization Theory (optional), Information Theory (optional)
Linear Algebra: lecture course by Gilbert Strang
Probability theory: MIT 6.041 (it contains parts of Bayesian inference as well)
Calculus: your high school and college classes are enough

Once the basic maths is done, we move to ML.

Classical ML: CS229, either by Andrew Ng or someone else. Follow the lecture notes and solve the problem sets.
Reference books for classical ML that I followed: PRML by Christopher Bishop, Pattern Classification by Duda, Hart and Stork

After getting comfortable with classical ML, we move to Deep Learning and everything else.

Deep Learning and Computer Vision: CS231n. Very good lectures and assignments.
Reference book: Deep Learning by Ian Goodfellow. This is the best book on deep learning. I've read some chapters of it many, many times. Beautiful maths and intuitions.
MLOps: DVC, WandB, MLflow
NLP: I just read Hugging Face blogs. I haven't spent much time with classical NLP though.
Alignment/AI safety/AI explainability: Anthropic blogs (I'm a noob in this, just started learning a couple of months ago)

Additionally:
Blogs: Lilian Weng (OpenAI)'s blog, colah's blog
arXiv: I read many papers from arXiv
Keras and TensorFlow blogs: for introductory code about modern deep learning frameworks
Competitions: Kaggle
Cloud compute: GCP/Colab/Kaggle notebooks

PS: this is not a roadmap. It's just what I have followed till now, and I find it quite structured. Even after 5 years I still find myself learning new stuff every day.
I have been exploring the idea of loading pre-cached data for the last 4-5 years. The issue of data inconsistency always haunts me when I want to implement this.
> data inconsistency: local data and live data are often out of sync. Should we show "incorrect" data to the user for 2-3 seconds for the sake of speed?
Loading screens were invented for this.
Maybe there's a reason Firestore doesn't provide cache as the first option for data fetching.
Replying to @jacksharkey11
How? We implement a sync engine to store all your data in a local SQLite db. Upon app launch, we start observing the database to render your whop sidebar. Simultaneously, a background task fetches a snapshot of the data needed to render each inner whop. When you click the whop, we render it with the snapshotted data, then observe the local DB to stream in live data. The background task loads the data in a prioritized queue, starting from your last selected whop, then in order of your most used ones.
This was all built by our incredible iOS team:
- @mahir_oberai (led this effort // his phone in the demo)
- @zebra_dev
- @ohwhen
- @jantschulev
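The prioritized loading described in this reply can be sketched roughly as below. This is only an illustration under stated assumptions, not the team's actual implementation: the function name `build_prefetch_order`, the `usage_counts` mapping, and the `last_selected` argument are all hypothetical, and the real app is an iOS client backed by SQLite rather than Python.

```python
import heapq


def build_prefetch_order(usage_counts, last_selected):
    """Return whop ids in the order their snapshots would be fetched.

    Assumption (hypothetical data model): usage_counts maps a whop id to
    how often the user has opened it; last_selected is the id that was
    open when the app was last used.
    """
    heap = []
    for whop_id, count in usage_counts.items():
        # Rank 0 puts the last selected whop first; everything else is
        # rank 1, ordered by descending usage (hence the negated count).
        rank = 0 if whop_id == last_selected else 1
        heapq.heappush(heap, (rank, -count, whop_id))

    order = []
    while heap:
        _, _, whop_id = heapq.heappop(heap)
        order.append(whop_id)
    return order


if __name__ == "__main__":
    counts = {"trading": 5, "fitness": 12, "design": 1}
    print(build_prefetch_order(counts, last_selected="design"))
```

A background task would pop ids in this order and fetch one snapshot per id, so the screen the user is most likely to open next is ready first.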
I'm sorry what?
Crazy things are happening in Gujarat
> our defense export has a net surplus
> semiconductor fab is getting started and going at a good pace
> automobile export is increasing at a good pace in non-US markets
> our pharma export is irreplaceable
> electronics export is catching up as well
Don't worry, India will be fine.
holy shit AGI will be an existential threat to India
Python everyone
There's static type checking
Then there's no static type checking
Then there's Python, where you incorrectly annotate types and just live with it
We suffer because we take seriously what god created for fun.
Saurabh Kumar retweeted
And the award for shameless racist of the day goes to…
1 Trillion parameter MoE beast btw
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here.
🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%)
🔹 Executes up to 200-300 sequential tool calls without human interference
🔹 Excels in reasoning, agentic search, and coding
🔹 256K context window
Built as a thinking agent, K2 Thinking marks our latest efforts in test-time scaling: scaling both thinking tokens and tool-calling turns.
K2 Thinking is now live on kimi.com in chat mode, with full agentic mode coming soon. It is also accessible via API.
🔌 API is live: platform.moonshot.ai
🔗 Tech blog: moonshotai.github.io/Kimi-K2…
🔗 Weights & code: huggingface.co/moonshotai
This was a turning point in NVIDIA's history
“Engineers wanted K8s experience” No engineer wants that, I can assure you.
Kubernetes migration almost killed our startup.

Where we were:
- 8 EC2 instances
- Ansible for deploys
- Boring but working
- $1200/month AWS bill

Why we migrated:
- New investor wanted 'cloud-native'
- Engineers wanted K8s experience
- Competitors were using it
- Seemed like the future

6 months later:
- 3 engineers spending full-time on K8s
- AWS bill at $4500/month
- Deploys took longer than before
- More outages, not fewer
- Product development stalled

We rolled back:
- Moved to ECS Fargate
- 2 week migration
- Back to $1800/month
- Engineers back on features

K8s is amazing for scale. We weren't at scale. Technology should solve problems you actually have.
This!!
Replying to @drummatick
The Indian IT industry is basically an extension of the American economy into Indian territory, which thrives (for employees) on USD-INR arbitrage. You won't find a single country in the developed world where doctors get lower pay than engineers.
I’m convinced the way to increase sales of any product is to write “Japanese style” on it. Welcome back, 1980s Japan product supremacy
s-s-ss-sugoi
The wage disparity between engineers and other professions in India is one of a kind. Haven’t seen anything like that anywhere
Software engineers in India make more money than Airline pilots?
What next? A “use cpu” directive?
This marvel is built in TypeGPU, a TypeScript WebGPU library [1/2] Among other coolness, it features a "𝚞𝚜𝚎 𝚐𝚙𝚞" directive that compiles JS to WGSL, to run on the GPU: 𝚌𝚘𝚗𝚜𝚝 𝚊𝚍𝚍 = (𝚊, 𝚋) => { "𝚞𝚜𝚎 𝚐𝚙𝚞"; 𝚛𝚎𝚝𝚞𝚛𝚗 𝚊 + 𝚋; }
The dumbest person you know is currently being told “wow you’re so smart” by ChatGPT.
Saurabh Kumar retweeted
The staff member who is filming should just give it some water
Laughing so hard, but yes he's the GOAT
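Put in Japanese terms, this is like airing footage of the atomic bombing because someone you don't like got elected governor of Hiroshima Prefecture, right? It's so brutal I could cry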
WAKE UP NEW YORK!
> what kind of skills are you testing me for
Ideally, it's good to ask this question before the interview is done, for example in an email to HR or whoever is hiring.
A friend of mine did this to crack every single interview he ever took. Whenever an interview started, even before the interviewer would speak he would start by asking them a question as a part of conversation. Like "Oh, hi! We've not spoken before. Who are you, what do you do?" and this one line changes the entire power dynamic. Now the interviewer is answering the questions. My friend would then go on to ask a few more questions like "what kind of skills are you testing me for" or "how do you want me to answer". And this sets him up as a charismatic leader who can ask questions and answer them as well. And people love this. Small talk is a life hack if done properly istg.
Was there some update to Claude Sonnet 4.5 on Cursor? It's acting weirdly dumb