Member of Technical Staff @xAI | prev. @ubisoft

London, UK
Joined May 2015
Leonardo Mariscal retweeted
man I am so glad that people are saying "cracked" less these days. that word lost all meaning a long time ago haha anyway, xAI is hiring EXCEPTIONAL engineers worldwide. London, San Francisco, DC, Tokyo, Singapore, and more! x.ai/careers/open-roles
Leonardo Mariscal retweeted
Tomorrow my Grok will appear at the morning press conference as one of the journalists from the "mafia of power"…
Replying to @RealArturoH
Andrés Manuel López Obrador.
Each update is an engineering marvel; I invite everyone to read a couple of their technical blog posts.
Friday Facts #439 - Factorio and Space Age on Nintendo Switch 2™ factorio.com/blog/post/fff-4… #factorio #gamedev
Leonardo Mariscal retweeted
Is that world-class AI infrastructure in the room with us?
The EU is expanding its AI network! New AI Factories Antennas are launching in 🇧🇪🇨🇾🇭🇺🇮🇪🇱🇻🇲🇹🇸🇰 and partner countries 🇮🇸🇲🇩🇨🇭🇬🇧🇲🇰🇷🇸 — giving innovators wider access to Europe’s world-class AI infrastructure. #DigitalEU
Leonardo Mariscal retweeted
Just wrapped an incredibly productive week at @xai's London office! The energy here is unreal—brilliant minds, endless ideas, and some serious momentum. Let's just say the MACROHARD project is simmering nicely, and the team is absolutely cooking. 🚀🚀🚀
The @xAI MACROHARD project will be profoundly impactful at an immense scale 😉 Our goal is to create a company that can do anything short of manufacturing physical objects directly, but will be able to do so indirectly, much like Apple has other companies manufacture their phones.
Leonardo Mariscal retweeted
Exceptional products need exceptional models that need exceptional compute resources. @xai has built a compute advantage that will grow exponentially with the delivery of Colossus 2, unlocking next-gen models. It's why I work here and it's why you should join, too. Links below.
Leonardo Mariscal retweeted
There are more deaths from the violence in Mexico 🇲🇽 than from the war in Palestine 🇵🇸. But Mexico's deaths don't get enough likes on Instagram 😉
Leonardo Mariscal retweeted
Introducing Grok Code Fast 1, a speedy and economical reasoning model that excels at agentic coding. Now available for free on GitHub Copilot, Cursor, Cline, Kilo Code, Roo Code, opencode, and Windsurf. x.ai/news/grok-code-fast-1
Leonardo Mariscal retweeted
Liftoff of Starship!
Leonardo Mariscal retweeted
The @xAI Grok 2.5 model, which was our best model last year, is now open source. Grok 3 will be made open source in about 6 months. huggingface.co/xai-org/grok-…
Leonardo Mariscal retweeted
"They don't understand because they're a game developer" is total nonsense. Games are a superset domain. We make editors, servers, databases, build systems, large asset management, etc. If anything, if you're going to dismiss someone, dismiss people who haven't worked on games.
Replying to @ThePrimeagen
I would love your nuanced take! I asked an engineer I trust and here was theirs
Leonardo Mariscal retweeted
Finally a high score we can be proud of.
LLMs acing math olympiads? Cute. But BALROG is where agents fight dragons (and actual Balrogs)🐉😈 And today, Grok-4 (@grok) takes the gold 🥇 Welcome to the podium, champion!
Pretty interesting: research.google/blog/android… Sad that Mexico is not part of the program though, literally one of the most seismically active countries. @GoogleResearch @marcsto
TIL the term "agent" in today's LLM world comes from the book "Artificial Intelligence: A Modern Approach", where each agent aims to understand, reason about, and act upon the world it lives in.
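That textbook framing can be sketched as a minimal perceive–reason–act loop. This is a generic illustration of the idea, not any particular library's API; the `GridWorld` environment and `reflex_agent` policy are hypothetical stand-ins:

```python
# Minimal sketch of the agent loop from "AI: A Modern Approach":
# the agent perceives its environment, reasons, and acts on it.
# GridWorld and reflex_agent are hypothetical examples.

def reflex_agent(percept):
    """Map the current percept directly to an action."""
    position, goal = percept
    return "right" if position < goal else "stay"

class GridWorld:
    """A 1-D world: the agent walks from cell 0 toward a goal cell."""
    def __init__(self, goal=3):
        self.position = 0
        self.goal = goal

    def percept(self):
        return (self.position, self.goal)

    def apply(self, action):
        if action == "right":
            self.position += 1

def run(agent, env, max_steps=10):
    """Perceive -> reason -> act until the agent chooses to stop."""
    for step in range(max_steps):
        action = agent(env.percept())   # reason over the percept
        if action == "stay":            # goal reached
            return step
        env.apply(action)               # act upon the world
    return max_steps

steps = run(reflex_agent, GridWorld(goal=3))
print(steps)  # the agent needs 3 moves to reach the goal
```

An LLM agent follows the same loop, except the "reason" step is a model call and the "act" step is a tool invocation.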
Leonardo Mariscal retweeted
You think you can't announce waifu as a feature and a government defense deal on the same day... But this is xAI 😎
Leonardo Mariscal retweeted
We are creating a multi-agent AI software company @xAI, where @Grok spawns hundreds of specialized coding and image/video generation/understanding agents all working together and then emulates humans interacting with the software in virtual machines until the result is excellent. This is a macro challenge and a hard problem with stiff competition! Can you guess the name of this company? 🤭
I took Grok 4 for a spin this weekend to build this game prototype. I used SuperGrok Chat to generate the initial game prototype and then brought it over to Cursor to continue coding with Grok 4 MAX. Grok 4 in Cursor is like a no-nonsense agent. Doesn't speak much, but delivers the goods. There were moments where I was rate-limited or stuck on a bug or two that I had to get other models to help, but otherwise it's a fast, reliable model to work with. This makes me incredibly excited for Grok Code to launch in August!
Leonardo Mariscal retweeted
Grok 4 feels like Artificial General Intelligence to me. It is clearly not just constructing statistically likely connections, but is drawing fairly deep insights on problems it hasn’t seen before, in ways I haven’t seen elsewhere. Here’s an example: grok.com/share/bGVnYWN5_b97c…
Leonardo Mariscal retweeted
Grok 4 (Thinking) achieves a new SOTA on ARC-AGI-2 with 15.9%. This nearly doubles the previous commercial SOTA and tops the current Kaggle competition SOTA.
Leonardo Mariscal retweeted
xAI gave us early access to Grok 4 - and the results are in. Grok 4 is now the leading AI model.

We have run our full suite of benchmarks and Grok 4 achieves an Artificial Analysis Intelligence Index of 73, ahead of OpenAI o3 at 70, Google Gemini 2.5 Pro at 70, Anthropic Claude 4 Opus at 64, and DeepSeek R1 0528 at 68. Full results breakdown below.

This is the first time that @elonmusk's @xai has taken the lead at the AI frontier. Grok 3 scored competitively with the latest models from OpenAI, Anthropic and Google - but Grok 4 is the first time our Intelligence Index has shown xAI in first place.

We tested Grok 4 via the xAI API. The version of Grok 4 deployed on X/Twitter may differ from the model available via the API: consumer application versions of LLMs typically wrap the models with instructions and logic that can change style and behavior. Grok 4 is a reasoning model, meaning it 'thinks' before answering; the xAI API does not share the reasoning tokens generated by the model.

Grok 4's pricing is equivalent to Grok 3 at $3/$15 per 1M input/output tokens ($0.75 per 1M cached input tokens). The per-token pricing is identical to Claude 4 Sonnet, but more expensive than Gemini 2.5 Pro ($1.25/$10 for <200K input tokens) and o3 ($2/$8, after a recent price decrease). We expect Grok 4 to be available via the xAI API, via the Grok chatbot on X, and potentially via Microsoft Azure AI Foundry (Grok 3 and Grok 3 mini are currently available on Azure).

Key benchmarking results:
➤ Grok 4 leads not only our Artificial Analysis Intelligence Index but also our Coding Index (LiveCodeBench & SciCode) and Math Index (AIME24 & MATH-500)
➤ All-time high score in GPQA Diamond of 88%, a leap from Gemini 2.5 Pro's previous record of 84%
➤ All-time high score in Humanity's Last Exam of 24%, beating Gemini 2.5 Pro's previous all-time high of 21%. Note that our benchmark suite uses the original HLE dataset (Jan '25) and runs the text-only subset with no tools
➤ Joint highest scores on MMLU-Pro and AIME 2024, at 87% and 94% respectively
➤ Speed: 75 output tokens/s, slower than o3 (188 tokens/s), Gemini 2.5 Pro (142 tokens/s), and Claude 4 Sonnet Thinking (85 tokens/s), but faster than Claude 4 Opus Thinking (66 tokens/s)

Other key information:
➤ 256k token context window. This is below Gemini 2.5 Pro's 1 million tokens, but ahead of Claude 4 Sonnet and Claude 4 Opus (200k), o3 (200k), and R1 0528 (128k)
➤ Supports text and image input
➤ Supports function calling and structured outputs

See below for further analysis 👇
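At the quoted rates, per-request cost is simple arithmetic. A quick sketch using the pricing stated above (the token counts in the example are made up for illustration):

```python
# Grok 4 API pricing as quoted: $3 per 1M input tokens,
# $15 per 1M output tokens, $0.75 per 1M cached input tokens.
INPUT_RATE = 3.00 / 1_000_000
OUTPUT_RATE = 15.00 / 1_000_000
CACHED_RATE = 0.75 / 1_000_000

def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Dollar cost of one API call at the quoted per-token rates."""
    fresh = input_tokens - cached_tokens  # input billed at the full rate
    return (fresh * INPUT_RATE
            + cached_tokens * CACHED_RATE
            + output_tokens * OUTPUT_RATE)

# Example: 10k input tokens (none cached) and 2k output tokens.
print(f"${request_cost(10_000, 2_000):.4f}")  # $0.0600
```

Note how output tokens dominate: at a 5x higher rate, 2k output tokens cost as much as 10k input tokens.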