ttseval (TTS Eval (OLD))

sanchit-gandhi

authored a paper about 21 hours ago

Voxtral Realtime

Paper • 2602.11298 • Published 13 days ago • 16

pcuenq

posted an update about 2 months ago

👉 What happened in AI in 2025? 👈

We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!

Play with it here:
2025-ai-timeline/2025-ai-timeline

Here's my personal quarterly TL;DR:

1️⃣ Q1 — Learning to Reason
DeepSeek not only releases a top-notch reasoning model that competes with closed frontier models, but also shows how to train one. OpenAI debuts Deep Research.

Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)

2️⃣ Q2 — Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.

Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4

3️⃣ Q3 — "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models achieve gold-medal performance at math olympiads and excel on hard benchmarks. OpenAI releases strong open-source models, and Google releases the much-anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.

Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5

4️⃣ Q4 — Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!

Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🤯

Credits
🙏 NHLOCAL for the source data https://github.com/NHLOCAL/AiTimeline

🫡 @reach-vb for the original idea, design and recipe

🙌 @ariG23498 and yours truly for compiling and verifying the 2025 edition

🥳 Here's to 2026, wishing it becomes the best year ever for open releases and on-device-first use-cases! 🥂

mrfakename

posted an update 3 months ago

Excited to share that I've joined the Hugging Face Fellows program! 🤗

Looking forward to contributing to, and working more closely with, the open-source ecosystem. Huge thanks to everyone who's supported me on this journey! 🚀

mrfakename

posted an update 4 months ago

Trained an emotion-controllable TTS model, based on MiMo-Audio, on LAION's dataset.

It's still very early in the training run and the model still has an issue with hallucinations, but the results look pretty good so far.

Will probably kick off a new run later with some settings tweaked.

Put up a demo here: https://huggingface.co/spaces/mrfakename/EmoAct-MiMo

(Turn 🔊 on to hear audio samples)

multimodalart

posted an update 4 months ago

Want to iterate on a Hugging Face Space with an LLM?

Now you can easily convert any entire HF repo (Model, Dataset or Space) to a text file and feed it to a language model!

multimodalart/repo2txt
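The Space's own implementation isn't shown in the post. As a rough illustration of the idea, here is a minimal sketch, assuming the repo has already been downloaded to a local folder, that flattens every UTF-8 text file into one prompt-ready string. The function name `repo_to_text`, the `===== path =====` separator, and the 200 KB per-file cap are hypothetical choices, not what repo2txt necessarily does.

```python
from pathlib import Path


def repo_to_text(repo_dir: str, max_file_bytes: int = 200_000) -> str:
    """Hypothetical sketch: concatenate every UTF-8 text file under repo_dir into one string."""
    root = Path(repo_dir)
    chunks = []
    # Sort for a deterministic file order in the output
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        # Skip git internals and large files (e.g. model weights); the size cap is an assumption
        if rel.startswith(".git/") or path.stat().st_size > max_file_bytes:
            continue
        try:
            content = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # binary file: not useful as LLM context
        # Prefix each file with its relative path so the model knows where the code lives
        chunks.append(f"===== {rel} =====\n{content}")
    return "\n\n".join(chunks)
```

To get a local copy of a Hub repo first, `huggingface_hub.snapshot_download(repo_id="user/space-name", repo_type="space")` (placeholder repo id) returns the path of the downloaded snapshot, which can be passed straight to `repo_to_text`.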

thomwolf

authored a paper 4 months ago

Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14, 2025 • 123

sanchit-gandhi

authored 2 papers 7 months ago

Magistral

Paper • 2506.10910 • Published Jun 12, 2025 • 66

Voxtral

Paper • 2507.13264 • Published Jul 17, 2025 • 32

thomwolf

authored a paper 8 months ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26, 2025 • 77

multimodalart

posted an update 8 months ago

Self-Forcing, a real-time video model distilled from Wan 2.1 by @adobe, is out, and they open-sourced it 🐐

I've built a live real-time demo on Spaces 📹💨

multimodalart/self-forcing

cbensimon

posted an update 9 months ago

🚀 ZeroGPU now supports PyTorch native quantization via torchao

While it hasn’t been battle-tested yet, Int8WeightOnlyConfig is already working flawlessly in our tests.

Let us know if you run into any issues — and we’re excited to see what the community will build!

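import spaces
from diffusers import FluxPipeline
from torchao.quantization.quant_api import Int8WeightOnlyConfig, quantize_

# Load the pipeline once at startup (arguments elided in the original post) and move it to CUDA
pipeline = FluxPipeline.from_pretrained(...).to('cuda')
# Quantize the transformer's weights to int8 in place; activations keep their original dtype
quantize_(pipeline.transformer, Int8WeightOnlyConfig()) # Or any other component(s)

# ZeroGPU attaches a GPU only while this decorated function runs
@spaces.GPU
def generate(prompt: str):
    return pipeline(prompt).images[0]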
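For intuition on what weight-only int8 quantization does, here is a minimal pure-Python sketch, assuming symmetric per-row (per-output-channel) scales: each weight row is stored as int8 values plus one float scale, and the matrix-vector product rescales on the fly while the activations stay in floating point. This illustrates the general technique only, not torchao's kernels; the function names are hypothetical.

```python
INT8_MAX = 127  # symmetric range [-127, 127]; -128 is left unused


def quantize_int8_per_row(weight):
    """Hypothetical sketch: quantize each row of a float matrix to int8 values plus one float scale."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        # One scale per output row; an all-zero row gets scale 1.0 to avoid dividing by zero
        scale = max_abs / INT8_MAX if max_abs > 0 else 1.0
        q_rows.append([max(-INT8_MAX, min(INT8_MAX, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def int8_weight_only_matvec(q_rows, scales, x):
    """y_i = scale_i * sum_j q_ij * x_j: int8 weights, float activations, one rescale per row."""
    return [scale * sum(q * xi for q, xi in zip(row, x)) for row, scale in zip(q_rows, scales)]


W = [[0.4, -1.0, 0.25], [2.0, 0.0, -2.0]]
x = [1.0, 2.0, 3.0]
q, s = quantize_int8_per_row(W)
# Close to the exact float result W @ x = [-0.85, -4.0], with a small rounding error
print(int8_weight_only_matvec(q, s, x))
```

Storing one byte per weight (instead of two in bf16) roughly halves weight memory, at the cost of the small rounding error visible in the first output row.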

thomwolf

authored a paper 9 months ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2, 2025 • 150

cbensimon

posted an update 9 months ago

🚀 ZeroGPU medium size is now available as a power-user feature

Nothing too fancy for now—ZeroGPU Spaces still default to large (70GB VRAM)—but this paves the way for:
- 💰 size-based quotas / pricing (medium will offer significantly more usage than large)
- 🦣 the upcoming xlarge size (141GB VRAM)

As of now, you can control the GPU size via a Space variable. Accepted values:
- auto (future default)
- medium
- large (current default)

The auto mode checks total CUDA tensor size during startup:
- More than 30GB → large
- Otherwise → medium
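The auto rule above can be sketched as a tiny decision function. The function names are hypothetical, and so is the unit: the post says "30GB" without specifying 10^9 or 2^30 bytes, so decimal gigabytes are assumed here, and a total of exactly 30GB falls to medium because the rule says "more than". With real PyTorch tensors, each `(numel, bytes_per_element)` pair would come from `t.numel()` and `t.element_size()`.

```python
# Hypothetical sketch of the ZeroGPU "auto" size rule; 1 GB = 10**9 bytes is an assumption
GB = 10**9
AUTO_THRESHOLD_BYTES = 30 * GB


def pick_zerogpu_size(total_cuda_tensor_bytes: int) -> str:
    """Map the total CUDA tensor size measured at startup to a GPU size."""
    # "More than 30GB" -> large; everything else (including exactly 30GB) -> medium
    return "large" if total_cuda_tensor_bytes > AUTO_THRESHOLD_BYTES else "medium"


def total_tensor_bytes(tensors: list[tuple[int, int]]) -> int:
    """Sum (num_elements, bytes_per_element) pairs into a total size in bytes."""
    return sum(numel * elem_size for numel, elem_size in tensors)


# A ~12B-parameter model in bf16 (2 bytes/param) is ~24 GB
print(pick_zerogpu_size(total_tensor_bytes([(12 * 10**9, 2)])))  # medium
# A ~24B-parameter model in bf16 is ~48 GB
print(pick_zerogpu_size(total_tensor_bytes([(24 * 10**9, 2)])))  # large
```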

thomwolf

posted an update 11 months ago

If you've followed the progress of robotics in the past 18 months, you've likely noticed how robotics is increasingly becoming the next frontier that AI will unlock.

At Hugging Face—in robotics and across all AI fields—we believe in a future where AI and robots are open-source, transparent, and affordable; community-built and safe; hackable and fun. We've had so much mutual understanding and passion working with the Pollen Robotics team over the past year that we decided to join forces!

You can already find our open-source humanoid robot platform, Reachy 2, on the Pollen website, and the Pollen community and people here on the Hub at

pollen-robotics

We're so excited to build and share more open-source robots with the world in the coming months!

thomwolf

authored a paper 11 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 205

pcuenq

authored a paper 11 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7, 2025 • 205

thomwolf

authored a paper 11 months ago

YourBench: Easy Custom Evaluation Sets for Everyone

Paper • 2504.01833 • Published Apr 2, 2025 • 22

mrfakename

posted an update 11 months ago

Papla P1 from Papla Media is now available on the TTS Arena!

Try out Papla's new ultra-realistic TTS model and compare it with other leading models on the TTS Arena: TTS-AGI/TTS-Arena

thomwolf

posted an update 11 months ago

The new DeepSite space is really insane for vibe-coders
enzostvs/deepsite

With the wave of vibe-coding-optimized LLMs like the latest open-source DeepSeek model (version V3-0324), you can basically prompt out of the box and create any app or game in one shot.

It feels so powerful to me: no more complex frameworks or under-the-hood prompt engineering needed to get a working text-to-app tool.

AI is eating the world and *open-source* AI is eating AI itself!

PS: even more meta, the DeepSite app and the DeepSeek model are both fully open source => time to start recursively improving?

PPS: you still need some inference hosting unless you're running the 600B param model at home, so check the very nice list of HF Inference Providers for this model: deepseek-ai/DeepSeek-V3-0324

mrfakename

posted an update 11 months ago

GGUF quants (text-only) for the new Mistral Small 3.1 24B are now live:

mrfakename/mistral-small-3.1-24b-instruct-2503-gguf

Team members 8
