More AI breakthroughs at @essential_ai from the incredible team @ashVaswani has been building. More to come...
Check out our latest research on data. We're releasing 24T tokens of richly labelled web data. We found it very useful for our internal data curation efforts. Excited to see what you build using Essential-Web v1.0!
Data is the new fabric. The richer the dataset, the more vibrant the creation. This unlocks new dimensions in ME+AI.
Absolute gold! 💥💥
Brilliant! @ashVaswani goes legend again by releasing a massive, fully labeled dataset that makes training easier. Excited to see people train SOTA models with Essential-Web v1 and @AMD Instinct GPUs. Explore the dataset here: huggingface.co/datasets/Esse…… 👏 Congrats to @andrewhok, Michael Pust, @timr1126 & the brilliant Data team at @essential_ai for your commitment to open data curation. Thank you for including @AIatAMD in your open science journey.
Check out the most meticulous open-sourced foundation model data. Amazing work by the team!
@ashVaswani and his team @EssentialAI are cooking 👨‍🍳
data team @essential_ai has been cooking! we just shipped a 24T-token corpus of richly-labeled CC-based data, significantly lowering the bar to entry for curating web-scale targeted datasets. s/o @AndrewHojel Michael Pust @RitvikKapila @ashVaswani
datasets unlock the future! I am reminded of this phenomenal read from @RenPhil21 renaissancephilanthropy.org/…
Ashish is active… big deal
Check out our recent work at @essential_ai: Essential-Web v1.0, a web-scale corpus of 24T tokens which we find useful for curating high-performing domain-specific datasets for LLM pre-training. Paper link: arxiv.org/abs/2506.14111 cc @AndrewHojel @timr1126 @YashVanjani @ashVaswani
@ashVaswani doing God's work over here and making the basics much better. Bigger and better-quality data means better models. I wonder what he should use to embed to deal with all those tokens.
Very strong result @ashVaswani !