---
title: Tiny Audio Demo
emoji: 🎤
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
short_description: Efficient ASR with Whisper encoder and SmolLM3 decoder
models:
  - mazesmazes/tiny-audio
tags:
  - audio
  - automatic-speech-recognition
  - whisper
  - smollm
  - mlp
suggested_hardware: cpu-upgrade
preload_from_hub:
  - mazesmazes/tiny-audio
---

## Demo Overview

This Space demonstrates an Automatic Speech Recognition (ASR) model that combines:

- **Whisper encoder** for audio feature extraction
- **SmolLM3 decoder** for efficient text generation

## Features

- 🎙️ **Record from microphone** or upload audio files
- ⚡ **Fast inference** with a small number of trainable parameters
- 🎯 **English transcription** optimized for speech-to-text
- 📊 **Lightweight model** suitable for edge deployment

## Model Architecture

The model uses an architecture that bridges the audio and text modalities:

1. **Audio Encoder**: Frozen Whisper encoder
2. **Projection Layer**: Custom mapping from audio features into the text embedding space
3. **Text Decoder**: SmolLM3 (frozen)

## Usage

1. **Upload an audio file** (WAV, MP3, etc.) or **record directly** using your microphone
2. Click **"Transcribe"** to convert speech to text
3. The transcription will appear in the output box

For programmatic use of the underlying model outside this Space, see the sketch at the end of this README.

## Limitations

- Maximum audio length: 30 seconds
- Optimized for English language
- Best performance with clear speech and minimal background noise

## Links

- 📦 [Model on Hugging Face](https://huggingface.co/mazesmazes/tiny-audio)
- 💻 [GitHub Repository](https://github.com/alexkroman/tiny-audio)
- 📄 [Technical Details](https://github.com/alexkroman/tiny-audio/blob/main/MODEL_CARD.md)

## Citation

If you use this model in your research, please cite:

```bibtex
@software{kroman2024tinyaudio,
  author = {Kroman, Alex},
  title = {Tiny Audio: Train your own speech recognition model in 24 hours},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/alexkroman/tiny-audio}
}
```
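
## Programmatic Usage (sketch)

The demo UI above is the supported interface. If you want to try the model directly from the Hub, the sketch below shows one plausible way to do so with the `transformers` ASR pipeline. It is an assumption, not a documented API for this checkpoint: the custom Whisper-to-SmolLM3 bridge would need to ship as remote code (hence `trust_remote_code=True`), and `sample.wav` is a placeholder file name. Check the model card for the exact, supported loading instructions.

```python
# Hypothetical sketch: transcribing audio with mazesmazes/tiny-audio.
# Assumes the checkpoint is compatible with the transformers ASR pipeline
# and ships its custom modeling code in the repo; verify against the model card.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="mazesmazes/tiny-audio",
    trust_remote_code=True,  # assumption: custom encoder/decoder bridge lives in the repo
)

# Transcribe a local audio file (keep clips under the 30-second limit noted above).
result = asr("sample.wav")  # placeholder path to a WAV/MP3 clip
print(result["text"])
```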