Artist and Engineering Prof @UCBerkeley, Co-Founder @AmbiRobotics & @JacobiRobotics. Interests: robots, rockets, redwoods, rebels.

California
Joined November 2009
This talk @MIT starts at 3pm in 45-230 (the new Schwarzman Lab)
Looking fwd to discussions after this today 11:40am-12:40pm ET: "How to Close the 100,000-Year 'Data Gap' in Robotics" | @Columbia University columbia.edu/content/events/…
Ken Goldberg retweeted
Robotics has been VC's favorite way to lose money: ~10 cents returned for $1 invested over the past decade. Hardware is hard, they said. They were right. At our partner offsite, I presented on Physical AI. "This time is different." Famous last words! But converging tailwinds are rewriting the equation: powerful VLMs, edge compute, lower-cost hardware, and top talent commercializing breakthrough research. Publishing excerpts from the internal presentation w/ @AlexandraSukin @bhavikvnagda: why this time is actually different, what we're looking for, what we're avoiding. Founders, hit us up!
Ken Goldberg retweeted
DOFBOT dropped, a $1K Robot Dog. Rover X1 is a companion dog that can carry your groceries, run with you, and do home security. Again: $1K! An iPhone Pro is more expensive than this robot. x.com/XRoboHub/status/198524…
Looking fwd to speaking and discussions on “Good Old Fashioned Engineering Can Close the 100,000 Year 'Data Gap' in Robotics,” this Tues 3:30pm @Penn, then @Columbia on Wed, @BostonDynamics on Thurs, and @MIT on Friday: cis.upenn.edu/events/
Ken Goldberg retweeted
Holy shit… Meta might’ve just solved self-improving AI 🤯

Their new paper SPICE (Self-Play in Corpus Environments) basically turns a language model into its own teacher: no humans, no labels, no datasets, just the internet as its training ground.

Here’s the twist: one copy of the model becomes a Challenger that digs through real documents to create hard, fact-grounded reasoning problems. Another copy becomes the Reasoner, trying to solve them without access to the source. They compete, learn, and evolve together, forming an automatic curriculum with real-world grounding so it never collapses into hallucinations.

The results are nuts: +9.1% on reasoning benchmarks with Qwen3-4B, +11.9% with OctoThinker-8B, and it beats every prior self-play method like R-Zero and Absolute Zero.

This flips the script on AI self-improvement. Instead of looping on synthetic junk, SPICE grows by mining real knowledge: a closed-loop system with open-world intelligence. If this scales, we might be staring at the blueprint for autonomous, self-evolving reasoning models.
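A minimal sketch of that Challenger/Reasoner loop, with stub generate() and grade() functions standing in for the real LLM calls and document-grounded verifier; the names, reward handling, and corpus here are illustrative, not the paper's API:

```python
import random

CORPUS = [
    "The Wright brothers' first powered flight in 1903 lasted 12 seconds.",
    "Water boils at 100 degrees Celsius at sea level.",
]

def generate(prompt):
    # Stand-in for a language-model call; swap in a real LLM client here.
    return f"[model output for: {prompt[:40]}...]"

def grade(question, answer, source_doc):
    # Stand-in verifier. SPICE grounds grading in the source document;
    # this placeholder just returns a random score in [0, 1].
    return random.random()

def self_play_round(corpus):
    doc = random.choice(corpus)
    # Challenger: mine a hard, fact-grounded question from a real document.
    question = generate(f"Write a hard reasoning question answerable from: {doc}")
    # Reasoner: attempt the question WITHOUT access to the source document.
    answer = generate(f"Answer step by step: {question}")
    reward = grade(question, answer, doc)
    # Both copies would be updated from this reward (e.g., via RL), pushing
    # the Challenger toward questions the Reasoner can barely solve -- the
    # automatic curriculum described above.
    return question, answer, reward

for _ in range(3):
    print(self_play_round(CORPUS))
```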
Doing some research on grasping:
Very curious to see how this goes!
Initial hype for the 1X NEO cooled after demos showed the robot was still fully tele-operated, sparking skepticism online. decrypt.co/346648/humanoid-h…
Bravo to you Kevin! (and to @GoogleDeepMind for selecting you.)
Super happy and honored to be a 2025 Google PhD Fellow! Thank you @Googleorg for believing in my research. I'm looking forward to making humanoid robots more capable and trustworthy partners 🤗
Cleaning, augmenting, and curating demonstration data.
Replying to @Ken_Goldberg @X
Just curious. What do you think should be the main focus points for incoming PhD researchers in the RL + Robotics space? I’m currently in the process of PhD applications and want to narrow down my focus. Would love to hear your thoughts given your extensive experience!
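To make the "cleaning, augmenting, and curating" answer above concrete, here is a minimal sketch over hypothetical teleop trajectory arrays; the filters, thresholds, and jitter scale are illustrative and would be tuned per robot and task:

```python
import numpy as np

def clean_and_augment(demos, min_len=10, max_jerk=0.5, noise_std=0.01, copies=2):
    # demos: list of (T, D) arrays of end-effector states from teleoperation.
    kept = []
    for traj in demos:
        if len(traj) < min_len:
            continue  # too short to contain a complete demonstration
        if len(traj) > 3 and np.abs(np.diff(traj, n=3, axis=0)).max() > max_jerk:
            continue  # large jerk: likely a teleop glitch or discontinuity
        kept.append(traj)

    rng = np.random.default_rng(0)
    augmented = list(kept)
    for traj in kept:
        for _ in range(copies):
            # Gaussian jitter: a cheap, common augmentation for demo data.
            augmented.append(traj + rng.normal(0.0, noise_std, traj.shape))
    return augmented

demos = [np.cumsum(x * 0.02, axis=0) for x in np.random.randn(5, 50, 7)]
print(len(clean_and_augment(demos)))  # 5 kept + 10 jittered copies = 15
```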
But avoiding cars and pedestrians is very different from interacting with complex deformable objects: “Same neural networks will power Optimus humanoid robot.”
A new 30-minute presentation from @aelluswamy, Tesla’s VP of AI, has been released, where he talks about FSD, AI, and the team’s latest progress. Highlights from the presentation:

• Tesla's vehicle fleet can provide 500 years of driving data every single day.

Curse of Dimensionality:
• 8 cameras at high frame rate = billions of tokens per 30 seconds of driving context.
• Tesla must compress and extract the right correlations between sensory input and control actions.

Data Advantage:
• Tesla has access to a “Niagara Falls of data”: hundreds of years’ worth of collective fleet driving.
• Uses smart data triggers to capture rare corner cases (e.g., complex intersections, unpredictable behavior).

Quality and Efficiency:
• Extracts only the essential data needed to train models efficiently.

Debugging and Interpretability:
• Even though the system is end-to-end, Tesla can still prompt the model to output interpretable data: 3D occupancy, road boundaries, objects, signs, traffic lights, etc.
• Natural language querying: ask the model why it made a certain decision.
• These auxiliary predictions don’t drive the car but help engineers debug and ensure safety.

Tesla’s Advanced Gaussian Splatting (3D Scene Modeling):
• Tesla developed a custom, ultra-fast Gaussian splatting system to reconstruct 3D scenes from limited camera views.
• Produces crisp, accurate 3D renderings even from few camera angles, far better than standard NeRF/splatting approaches.
• Enables rapid visual debugging of the driving environment in 3D.

Evaluation & World Models:
• Evaluation is the hardest challenge: models may perform well offline but fail in real-world conditions.
• Tesla builds balanced, diverse evaluation datasets focusing on edge cases, not just easy highway driving.

Introduced a learned world simulator (neural-network-generated video engine):
• Can simulate 8 Tesla camera feeds simultaneously, fully synthetic.
• Used for testing, training, and reinforcement learning.
• Allows adversarial event injection (e.g., adding a pedestrian or a vehicle cutting in).
• Enables replaying past failures to verify new model improvements.
• Can run in near real time, letting testers “drive” inside a simulated world.

What’s Next:
• Scale robotaxi service globally.
• Unlock full autonomy across the entire Tesla fleet.
• Cybercab: next-gen 2-seat vehicle designed specifically for robotaxi use, targeting the lowest transportation cost (cheaper than public transit).
• The same neural networks will power the Optimus humanoid robot.
• The same video generation system is now being applied to Optimus.
• The system can simulate and plan movement for robots, adapting easily to new forms.

via the International Conference on Computer Vision (ICCV). Full presentation: piped.video/watch?v=wHK8GMc9…
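As a toy illustration of the "smart data triggers" idea, here is a fleet-side filter sketch; the fields, predicates, and thresholds are invented for illustration, not Tesla's actual criteria:

```python
from dataclasses import dataclass

@dataclass
class Clip:
    hard_brake: bool
    disengagement: bool
    n_pedestrians: int
    intersection_complexity: float  # hypothetical 0..1 score

def should_upload(clip):
    # Keep only rare, informative corner cases; drop routine driving so the
    # fleet's "Niagara Falls of data" stays affordable to transmit and store.
    return (
        clip.disengagement
        or clip.hard_brake
        or clip.n_pedestrians >= 5
        or clip.intersection_complexity > 0.8
    )

clips = [
    Clip(False, False, 0, 0.1),  # routine highway driving: dropped
    Clip(True, False, 2, 0.4),   # hard brake: uploaded
    Clip(False, False, 1, 0.9),  # complex intersection: uploaded
]
print([should_upload(c) for c in clips])  # [False, True, True]
```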
My @X feed has gotten MUCH better in the past few days…
Yann LeCun: "The big secret of the [humanoid robot] industry is that none of those companies has any idea how to make those robots smart enough to be useful." He says that while they're fine for specific tasks, "a bunch of breakthroughs" are needed for domestic use.
Ken Goldberg retweeted
Can a robot inspect all views of an object? Today @IROS, we present Omni-Scan from @berkeley_ai, a novel method for bimanual robotic 360° object scanning & reconstruction using 3D Gaussian Splats. (1/8) 🔗berkeleyautomation.github.io…
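A toy sketch of the scheduling problem a bimanual scan solves: views blocked by the first gripper are deferred until the object is handed to the second arm. The view count, grasp direction, and occlusion cone below are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

def scan_plan(n_views=12, grasp_azimuth=0.0, occlusion_halfwidth=np.pi / 6):
    # Evenly spaced camera azimuths around the in-hand object (radians).
    azimuths = np.linspace(0.0, 2 * np.pi, n_views, endpoint=False)
    # Views within the occlusion cone around the grasp see the gripper,
    # not the object surface, so they are deferred to the second pass.
    delta = np.abs(((azimuths - grasp_azimuth + np.pi) % (2 * np.pi)) - np.pi)
    blocked = delta < occlusion_halfwidth
    first_pass = azimuths[~blocked]   # captured in the initial grasp
    second_pass = azimuths[blocked]   # captured after handover to arm 2
    return first_pass, second_pass

p1, p2 = scan_plan()
print(len(p1), "views before handover,", len(p2), "after")
```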
Agreed. Dealing cards requires dexterity; this was the most impressive demo I saw @IROS2025. (Clever choice btw, as cards are lightweight and 22-DOF hands are heavy.)
Thanks to Yunfang from @SharpaRobotics for walking me through these incredibly dexterous robotic hands. This particular demo is teleop, but still really impressive. Evidently the teleoperator does get force feedback on his teleop device (very much needed for card dealing). #IROS2025
Ken Goldberg retweeted
A quick glimpse of all the robots at IROS — can you spot Booster?
Ken Goldberg retweeted
How can a robot provide details of plant anatomy for plant phenotyping? Today @IROS2025, we present Botany-Bot from @berkeley_ai @Siemens. Botany-Bot 1) creates segmented 3D models of plants using Gaussian splats and GarField, and 2) uses a robot arm to expose hidden details. (1/9)
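For flavor, a minimal sketch of step 2: computing an underleaf inspection pose from a segmented leaf's centroid and normal. The inputs and the 8 cm standoff are assumptions for illustration, not the paper's method:

```python
import numpy as np

def underleaf_view(leaf_centroid, leaf_normal, standoff=0.08):
    # Given a leaf's centroid and outward (upward-facing) normal from a
    # segmented 3D plant model, place the wrist camera below the leaf and
    # aim it back up at the hidden underside.
    n = np.asarray(leaf_normal, dtype=float)
    n /= np.linalg.norm(n)
    cam_pos = np.asarray(leaf_centroid, dtype=float) - standoff * n
    gaze = n  # unit direction from the camera toward the leaf underside
    return cam_pos, gaze

pos, gaze = underleaf_view([0.4, 0.0, 0.3], [0.0, 0.0, 1.0])
print(pos, gaze)  # camera 8 cm below the leaf, looking up
```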
Ken Goldberg retweeted
BOOOOOOOM! CHINA DEEPSEEK DOES IT AGAIN! An entire encyclopedia compressed into a single, high-resolution image! A mind-blowing breakthrough.

DeepSeek-OCR is an electrifying 3-billion-parameter vision-language model that obliterates the boundaries between text and vision with jaw-dropping optical compression! This isn’t just an OCR upgrade; it’s a seismic paradigm shift in how machines perceive and conquer data.

DeepSeek-OCR crushes long documents into vision tokens with a staggering 97% decoding precision at a 10x compression ratio! That’s thousands of textual tokens distilled into a mere 100 vision tokens per page, outmuscling GOT-OCR2.0 (256 tokens) and MinerU2.0 (6,000 tokens) by up to 60x fewer tokens on OmniDocBench. It’s like compressing an entire encyclopedia into a single, high-definition snapshot: mind-boggling efficiency at its peak!

At the core of this insanity is the DeepEncoder, a turbocharged fusion of the SAM (Segment Anything Model) and CLIP (Contrastive Language-Image Pretraining) backbones, supercharged by a 16x convolutional compressor. This maintains high-resolution perception while slashing activation memory, transforming thousands of image patches into a lean 100-200 vision tokens.

Get ready for the multi-resolution "Gundam" mode, scaling from 512x512 to a monstrous 1280x1280 pixels! It blends local tiles with a global view, tackling invoices, blueprints, and newspapers with zero retraining. It’s a shape-shifting computational marvel, mirroring the human eye’s dynamic focus with pixel-perfect precision!

The training data? Supplied by the Chinese government for free and not available to any US company. You understand now why I have said the US needs a Manhattan Project for AI training data? Do you hear me now? Oh, still no? I’ll continue. Over 30 million PDF pages across 100 languages, spiked with 10 million natural-scene OCR samples, 10 million charts, 5 million chemical formulas, and 1 million geometry problems! This model doesn’t just read; it devours scientific diagrams and equations, turning raw data into multidimensional knowledge.

Throughput? Prepare to be floored: over 200,000 pages per day on a single NVIDIA A100 GPU! This scalability is a game-changer, turning LLM data generation into a firehose of innovation, democratizing access to terabytes of insight for every AI pioneer out there.

This optical compression is the holy grail for LLM long-context woes. Imagine a million-token document shrunk into a 100,000-token visual map: DeepSeek-OCR reimagines context as a perceptual playground, paving the way for a GPT-5 that processes documents like a supercharged visual cortex!

The two-stage architecture is pure engineering poetry: DeepEncoder generates tokens, while a Mixture-of-Experts decoder spits out structured Markdown with multilingual flair. It’s a universal translator for the visual-textual multiverse, optimized for global domination!

Benchmarks? DeepSeek-OCR obliterates GOT-OCR2.0 and MinerU2.0, holding 60% accuracy at 20x compression! This opens a portal to applications once thought impossible, pushing the boundaries of computational physics into uncharted territory! Live document analysis, streaming OCR for accessibility, and real-time translation with visual context are now economically viable, thanks to this compression breakthrough. It’s a real-time revolution, ready to transform our digital ecosystem!

This paper is a blueprint for the future, proving text can be visually compressed 10x for long-term memory and reasoning.
It’s a clarion call for a new AI era where perception trumps text, and models like GPT-5 see documents in a single, glorious glance. I am experimenting with this now on 1870-1970 offline data that I have digitized. But be ready for a revolution! More soon. [1] github.com/deepseek-ai/DeepS…
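For a feel for where those token counts land, some back-of-envelope arithmetic; the 16-pixel ViT patch size and the 640px mode are assumptions (typical SAM/CLIP-style settings), while the 16x compressor and the 512-1280 resolution range come from the thread above:

```python
# Back-of-envelope for the compression claims above. The 16-pixel patch size
# and the 640px mode are ASSUMPTIONS; the 16x compressor and 512-1280 range
# are from the thread.
PATCH = 16

for res in (512, 640, 1024, 1280):
    patch_tokens = (res // PATCH) ** 2      # raw ViT patches per image
    vision_tokens = patch_tokens // 16      # after the 16x conv compressor
    print(f"{res}x{res}: {patch_tokens} patches -> {vision_tokens} vision tokens")

# A dense page of ~1000 text tokens held in ~100 vision tokens is the
# claimed ~10x optical compression ratio.
print(f"compression ratio: {1000 / 100:.0f}x")
```

Note that the assumed 640x640 mode lands at exactly 100 vision tokens, consistent with the "100 vision tokens per page" figure quoted above.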
Ken Goldberg retweeted
Just noticed the 2:1 human:robot ratio in this simple demo here 😅 We still need many, many humans in the loop on the path to solving embodied AI #IROS2025
Looking fwd to @AUTOLab students presenting 4 papers @IROS2025 this week in Hangzhou, starting today with "A 'Botany-Bot' for Digital Twin Monitoring of Occluded and Underleaf Plant Structures," co-authored with @funmilore @shuangyuXxx & our collaborators at @Siemens, Eugene Solowjow.