Pre-training an LLM in 9 days
A recipe for pre-training models in 9 days that perform comparably, as AI assistants, to the likes of Apple OpenELM and Microsoft Phi.
This repo contains the model architecture, training scripts, and utilities for 1.5-Pints and 0.12-Pint.
🔑 Architecture: Transformer-based, 1.5-Pints (2B params), 0.12-Pint (120M params)
🔬 Data: Expository Prose V1, a curated corpus that emphasizes quality over quantity
🚀 Training: 9 days, using Flash Attention and xFormers
📊 Performance: Comparable to larger models such as Apple OpenELM and Microsoft Phi
🛠️ Implementation:
- PyTorch Lightning for distributed training (see the sketch after this list)
- Fine-tuning and DPO scripts
- Conversion to the HuggingFace format (illustrated below)
- Efficiency techniques: Flash Attention and xFormers
- Configurable model architecture
- Comprehensive test suite
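
To make the distributed-training point concrete, here is a minimal, hypothetical sketch of a PyTorch Lightning pre-training loop. It is not the repo's actual training script: the toy model, random-token dataset, hyperparameters, and the FSDP/bf16 trainer settings are placeholders chosen only to illustrate the shape of such a run.

```python
# Minimal sketch (not the repo's training script) of a PyTorch Lightning
# pre-training loop. Model, dataset, and hyperparameters are placeholders.
import torch
import torch.nn as nn
import lightning as L
from torch.utils.data import DataLoader, Dataset


class ToyCausalLM(nn.Module):
    """Stand-in for the actual Pints transformer architecture."""

    def __init__(self, vocab_size=32000, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids):
        x = self.embed(input_ids)
        # Causal mask so each token only attends to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(input_ids.size(1))
        x = self.layer(x, src_mask=mask.to(input_ids.device))
        return self.lm_head(x)


class PretrainModule(L.LightningModule):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def training_step(self, batch, batch_idx):
        input_ids, labels = batch
        logits = self.model(input_ids)
        # Next-token prediction loss over the whole sequence.
        loss = nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), labels.view(-1)
        )
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=4e-4)


class RandomTokens(Dataset):
    """Placeholder dataset; a real run would stream pre-tokenized corpus shards."""

    def __len__(self):
        return 1024

    def __getitem__(self, idx):
        ids = torch.randint(0, 32000, (128,))
        return ids[:-1], ids[1:]


if __name__ == "__main__":
    trainer = L.Trainer(
        accelerator="gpu",
        devices=-1,              # use all visible GPUs
        strategy="fsdp",         # shard model and optimizer state across GPUs
        precision="bf16-mixed",  # mixed precision, typical for LLM pre-training
        max_steps=1000,
    )
    trainer.fit(PretrainModule(ToyCausalLM()), DataLoader(RandomTokens(), batch_size=8))
```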
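
Likewise, a hedged illustration of what the HuggingFace conversion enables: once a checkpoint has been exported to the HuggingFace format, it can be loaded with the standard `transformers` API. The directory path below is a placeholder, not a file shipped by this repo.

```python
# Illustrative only: loading a converted checkpoint with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "path/to/converted-1.5-pints"  # placeholder output of the conversion step
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

prompt = "Explain what a transformer language model is."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```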