OctoThinker-Llama-1B Family - an OctoThinker Collection

updated Jul 6, 2025

What makes a base language model suitable for RL? Through controlled experiments, we identify the key factors and then leverage them to scale up mid-training.

OctoThinker/OctoThinker-1B-Long-Base
Text Generation • 1B • Updated Jul 6, 2025 • 7

OctoThinker/OctoThinker-1B-Hybrid-Base
Text Generation • 1B • Updated Jul 6, 2025 • 4

OctoThinker/OctoThinker-1B-Short-Base
Text Generation • 1B • Updated Jul 6, 2025 • 8

OctoThinker/OctoThinker-1B-Long-Zero
Text Generation • 1B • Updated Jul 6, 2025 • 6

OctoThinker/OctoThinker-1B-Hybrid-Zero
Text Generation • 1B • Updated Jul 6, 2025 • 3

OctoThinker/OctoThinker-1B-Short-Zero
Text Generation • 1B • Updated Jul 6, 2025 • 2
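
The six repository names above follow a regular pattern (Long, Hybrid, or Short, combined with Base or Zero), so they can be generated rather than typed out. The snippet below is a minimal sketch, not an official usage guide: the helper names `repo_ids` and `load` are hypothetical, and it assumes the checkpoints load through the standard `transformers` Auto classes, which this page does not confirm.

```python
from itertools import product

ORG = "OctoThinker"
VARIANTS = ("Long", "Hybrid", "Short")  # variant names as listed in the collection
STAGES = ("Base", "Zero")               # suffixes as listed in the collection


def repo_ids(size="1B"):
    """Build the repository IDs of the collection, in the order listed above."""
    return [
        f"{ORG}/OctoThinker-{size}-{variant}-{stage}"
        for stage, variant in product(STAGES, VARIANTS)
    ]


def load(repo_id):
    """Load tokenizer and model for one checkpoint.

    Assumption: the checkpoints work with the generic Auto classes.
    The import is kept local so that listing IDs does not require transformers.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    return tokenizer, model


if __name__ == "__main__":
    for rid in repo_ids():
        print(rid)
```

Calling `load(repo_ids()[0])` would download the first checkpoint, so it needs network access and the `transformers` package installed.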
