This is the 1B-A90M-Base model. Check out the 3B-A400M-Base model and the post-trained version of this model.
The neo-3-1B-A90M and 3B-A400M models are the successors to neo-2-345M-C1 and C2.
The post-trained models will be released on January 3rd, 2026; the pre-trained models were released on December 31st, 2025. A technical report is in the works, but the training data is already fully available for replication.
This model is released under the MIT license.
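
For quick experimentation, here is a minimal generation sketch using Hugging Face Transformers. The repo id below is hypothetical; substitute the actual Hub path for this checkpoint.

```python
# Minimal generation sketch. The repo id is a placeholder, not the
# confirmed Hub path for this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "neo/neo-3-1B-A90M-Base"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```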
| Model | MMLU | HellaSwag | PIQA | ARC avg | GSM8K | Avg. |
|---|---|---|---|---|---|---|
| neo-3-1B-A90M | 32.7 | 52.3 | 63.4 | 42.1 | 2.2 | 38.54 |
| neo-3-3B-A400M | 42.1 | 59.9 | 67.5 | 50.6 | 5.7 | 45.16 |
| SmolLM2-360M | 32.7 | 52.3 | 63.4 | 42.1 | 2.2 | 38.54 |
| Gemma 3 270M | 26.5 | 40.9 | 67.7 | 43.4 | 1.1 | 35.92 |
| Qwen3-0.6B-Base | 44.0 | 55.3 | 60.9 | 52.4 | 49.7 | 52.46 |
| Model | Active Parameters | Training Tokens | Avg. per B Active Params | Avg. per T Training Tokens |
|---|---|---|---|---|
| neo-3-1B-A90M | 120M | 1.2T | 321.17 | 32.12 |
| neo-3-3B-A400M | 380M | 1.2T | 118.84 | 37.63 |
| SmolLM2-360M | 362M | 4T | 106.46 | 9.64 |
| Gemma 3 270M | 268M | 6T | 134.03 | 5.99 |
| Qwen3-0.6B-Base | 642M | 36T | 81.72 | 1.46 |
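
The derived columns above can be reproduced directly: Avg. is the unweighted mean of the five benchmark scores, and the efficiency figures divide that mean by active parameters (in billions) and by training tokens (in trillions). A short sketch, using the two neo-3 rows as examples:

```python
# Reproduce the derived columns: Avg. is the mean of the five benchmark
# scores; the efficiency columns divide it by active parameters (B) and
# training tokens (T).
scores = {  # MMLU, HellaSwag, PIQA, ARC avg, GSM8K
    "neo-3-1B-A90M": [32.7, 52.3, 63.4, 42.1, 2.2],
    "neo-3-3B-A400M": [42.1, 59.9, 67.5, 50.6, 5.7],
}
budget = {  # (active params in B, training tokens in T)
    "neo-3-1B-A90M": (0.120, 1.2),
    "neo-3-3B-A400M": (0.380, 1.2),
}
for name, s in scores.items():
    avg = sum(s) / len(s)
    b_active, t_tokens = budget[name]
    print(f"{name}: avg={avg:.2f}, "
          f"per B active={avg / b_active:.2f}, "
          f"per T tokens={avg / t_tokens:.2f}")
# neo-3-1B-A90M: avg=38.54, per B active=321.17, per T tokens=32.12
# neo-3-3B-A400M: avg=45.16, per B active=118.84, per T tokens=37.63
```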