Hugging Face
Avery Yen
yen-av
0 followers · 2 following
AI & ML interests: None yet
Recent Activity
published a dataset about 4 hours ago: yen-av/gemma3-reasoning-sft
updated a dataset about 5 hours ago: yen-av/gemma3-reasoning-sft
reacted to codelion's post with 🔥 2 days ago:
Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!

Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperforms 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning

We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.

Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m
Organizations
yen-av's models (3)
yen-av/modernbert-trump-tweet-voo • 0.1B • Updated Oct 19 • 7
yen-av/distilgpt2-trump-tweet-voo • 81.9M • Updated Oct 15 • 3
yen-av/bert-trump-tweet-voo • Text Classification • 67M • Updated Oct 14 • 16