Avery Yen

yen-av

AI & ML interests

None yet

Recent Activity

published a dataset about 4 hours ago
yen-av/gemma3-reasoning-sft
updated a dataset about 5 hours ago
yen-av/gemma3-reasoning-sft
reacted to codelion's post with 🔥 2 days ago
Introducing Dhara-70M: A diffusion language model that achieves 3.8x higher throughput than autoregressive models!
Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperforms 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning
We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m

Organizations

Northeastern University

yen-av's models (3)

yen-av/modernbert-trump-tweet-voo

0.1B • Updated Oct 19 • 7

yen-av/distilgpt2-trump-tweet-voo

81.9M • Updated Oct 15 • 3

yen-av/bert-trump-tweet-voo

Text Classification • 67M • Updated Oct 14 • 16