Multimodal research | Past - UMD, MetaAI, Amazon, IIT Madras | Rants, Memes my own.

Mountain View, CA
Joined April 2015
In SF this afternoon, ping me if you want to meet up. Know a thing or two about visual generative models. :)
Someone told me to go back to where I came from - maybe I should go back to grad school!
Accidentally used DPO instead of GRPO and they kicked me out of SF.
Accidentally used supervised learning instead of RL and they kicked me out of SF
Oh, forgot to mention - you are more likely to get internship interviews or NeurIPS party invites if you call yourself final year. 😉
Life hack: Just call yourself final year PhD! Apparently you can renew it every year. 😜 #phdlife
Gowthami retweeted
Accidentally used supervised learning instead of RL and they kicked me out of SF
Accidentally said "hard" instead of "non-trivial" and they kicked me out of SF
So much talk about @kscalelabs closing. Why isn't anyone organizing small-time investors to populate this round? I'm happy to invest!
Life hack: Just call yourself final year PhD! Apparently you can renew it every year. 😜 #phdlife
X’s algo sometimes reminds me of Amazon’s algo. You bought a TV last week, here are a lot more TVs we’re recommending for you…
Accidentally used google search instead of ChatGPT and they kicked me out of SF.
Accidentally said "hard" instead of "non-trivial" and they kicked me out of SF
> be me
> come across a paper with interesting premise.
> excitedly start reading
> claims are on mnist/cifar
> all excitement gone, reduced to atoms
I realized you can learn a lot by teaching. We handwave some things to ourselves, but when teaching a concept to someone (especially a larger class), you need a much clearer mental picture! I learned a lot by giving an intro lecture on diffusion models at @Cohere_Labs summer school a couple of months back! So yeah, I'm looking for opportunities - guest lectures, summer schools, or just reading groups. I would love to talk about foundational stuff rather than just my research. I've been working on multimodal vision research for close to 5 years at this point, and I've dabbled in a lot of things. Any pointers would be greatly appreciated! :)
Need to do clustering on long image generation prompts - what's the best embedding model for this purpose? Any pointers? Or should I just go by the MTEB leaderboard?
Where is Claude when I need him the most! 🤨 (On that note: is @claudeai down? I'm getting a rate limit exceeded error, but I didn't use it much today.)
With all the 1x Neo discourse here… didn’t see this meme yet, so, here you go. 😜
A few thoughts on the new 1x consumer robot:
- Honestly, it's a pretty bold move, and definitely exciting. Getting a general-purpose robot into people's homes is a massive milestone. Using remote operators (tele-op) to handle the really tricky, last-minute tasks is actually a very practical solution at the moment.
- But the way I see it, the biggest challenge isn't just technical. It's all about public trust. I've been casually asking friends about this exact concept for a while (long before this robot was announced), and the privacy and security issue always comes up first, especially with my women friends. The idea of some remote person having eyes inside their private home was a much bigger concern for them than the robot just being buggy or not working right. It's a totally different and far more personal trust barrier they have to clear.
- This means how they handle their first pilot programs is absolutely critical. I'm really, really, really rooting for them to get this right, as it can totally impact the consumer robotics timelines not just for 1x but for everyone.
Why is there a weird distinction in post-training, like SFT first and then RL? Why can't they be done together? IMO, I see merit in interleaving these paradigms. Is there any research pointing to the contrary or supporting this sequential process?
Why is everyone obsessed with “hopping on call”? Don’t you know - texting >>>> talking for us nerds. 🫣
People who’ve faced failure and recovered fight harder than those who’ve never faced it. Unfortunately, U.S. ecosystems never appreciate such candidates until they’re already successful.
Hallucination is still a major issue in video understanding models! Go check out our paper, where we propose a new technique for quantifying hallucinations and omissions of video LLMs on the dense captioning task! #ICCV2025
In Hawaii this week to present our work on hallucinations in VideoLLMs! Swing by our poster on Thursday at 11:15 am to talk about the bizarre failure modes even today’s best models face, and feel free to reach out to chat about all things multimodal understanding and generation!