This is the 1B-A90M-Base model. Check out the 3B-A400M-Base model and the post-trained version of this model.
The neo-3-1B-A90M and 3B-A400M models are the successors to neo-2-345M-C1 and C2.
The post-trained models will be released on January 3rd, 2026; the pre-trained models were released on December 31st, 2025. A technical report is in the works, but the training data is already fully available for replication.
This model is released under the MIT license.
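
For quick experimentation, here is a minimal generation sketch using Hugging Face Transformers. The repo id below is hypothetical; substitute the actual Hub path for this checkpoint.

```python
# Minimal generation sketch. The repo id is a placeholder, not the
# confirmed Hub path for this model.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "neo/neo-3-1B-A90M-Base"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```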
| Model | MMLU | HellaSwag | PIQA | ARC avg | GSM8K | Avg. |
|---|---|---|---|---|---|---|
| neo-3-1B-A90M | 32.7 | 52.3 | 63.4 | 42.1 | 2.2 | 38.54 |
| neo-3-3B-A400M | 42.1 | 59.9 | 67.5 | 50.6 | 5.7 | 45.16 |
| SmolLM2-360M | 32.7 | 52.3 | 63.4 | 42.1 | 2.2 | 38.54 |
| Gemma 3 270M | 26.5 | 40.9 | 67.7 | 43.4 | 1.1 | 35.92 |
| Qwen3-0.6B-Base | 44.0 | 55.3 | 60.9 | 52.4 | 49.7 | 52.46 |
| Model | Active Parameters | Training Tokens | Avg. per B Active Params | Avg. per T Training Tokens |
|---|---|---|---|---|
| neo-3-1B-A90M | 120M | 1.2T | 321.17 | 32.12 |
| neo-3-3B-A400M | 380M | 1.2T | 118.84 | 37.63 |
| SmolLM2-360M | 362M | 4T | 106.46 | 9.64 |
| Gemma 3 270M | 268M | 6T | 134.03 | 5.99 |
| Qwen3-0.6B-Base | 642M | 36T | 81.72 | 1.46 |
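
The derived columns above can be reproduced directly: Avg. is the unweighted mean of the five benchmark scores, and the efficiency figures divide that mean by active parameters (in billions) and by training tokens (in trillions). A short sketch, using the two neo-3 rows as examples:

```python
# Reproduce the derived columns: Avg. is the mean of the five benchmark
# scores; the efficiency columns divide it by active parameters (B) and
# training tokens (T).
scores = {  # MMLU, HellaSwag, PIQA, ARC avg, GSM8K
    "neo-3-1B-A90M": [32.7, 52.3, 63.4, 42.1, 2.2],
    "neo-3-3B-A400M": [42.1, 59.9, 67.5, 50.6, 5.7],
}
budget = {  # (active params in B, training tokens in T)
    "neo-3-1B-A90M": (0.120, 1.2),
    "neo-3-3B-A400M": (0.380, 1.2),
}
for name, s in scores.items():
    avg = sum(s) / len(s)
    b_active, t_tokens = budget[name]
    print(f"{name}: avg={avg:.2f}, "
          f"per B active={avg / b_active:.2f}, "
          f"per T tokens={avg / t_tokens:.2f}")
# neo-3-1B-A90M: avg=38.54, per B active=321.17, per T tokens=32.12
# neo-3-3B-A400M: avg=45.16, per B active=118.84, per T tokens=37.63
```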