The Smol Training Playbook 📚 Space • Running on CPU Upgrade • The secrets to building world-class LLMs • 2.87k
DeepSeek OCR Demo 🆘 Space • Running on Zero • An interactive demo for the DeepSeek-OCR model • 424
Datasets for Pretrained Thai LLM Collection • A list of datasets for pretraining Thai LLMs, by PyThaiNLP • 25 items • Updated Aug 5, 2025 • 14
Thai instruction dataset list Collection • High-quality Thai instruction datasets, excluding datasets machine-translated with Google Translate (low quality) • 14 items • Updated Oct 9, 2025 • 2
mlfoundations/refinedweb_banned_domains_curated Viewer • Updated Jul 21, 2024 • 4.57M • 89 • 1
HuggingFaceFW/fineweb-edu-classifier Text Classification • 0.1B • Updated Nov 17, 2024 • 26.3k • 205
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the U.S. Executive Order Paper • 2404.00399 • Published Mar 30, 2024 • 42