---
title: Tiny Audio Demo
emoji: 🎤
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
short_description: Efficient ASR with Whisper encoder and SmolLM3 decoder
models:
- mazesmazes/tiny-audio
tags:
- audio
- automatic-speech-recognition
- whisper
- smollm
- mlp
suggested_hardware: cpu-upgrade
preload_from_hub:
- mazesmazes/tiny-audio
---
## Demo Overview
This Space demonstrates an Automatic Speech Recognition (ASR) model that combines:
- **Whisper encoder** for audio feature extraction
- **SmolLM3 decoder** for efficient text generation
## Features
- 🎙️ **Record from microphone** or upload audio files
- ⚡ **Fast inference** with a small number of trainable parameters
- 🎯 **English transcription** optimized for speech-to-text
- **Lightweight model** suitable for edge deployment
## Model Architecture
The model bridges the audio and text modalities in three stages:
1. **Audio Encoder**: Frozen Whisper encoder
2. **Projection Layer**: Custom layer that maps audio features into the decoder's text embedding space
3. **Text Decoder**: SmolLM3 (frozen)
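
The three stages above can be illustrated with a toy, dependency-free sketch of the middle step. It assumes the projection is a small two-layer MLP (the `mlp` tag hints at this, but the README does not specify the design), and every dimension and name below is hypothetical. Random numbers stand in for the frozen encoder's output.

```python
import random

def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias

def linear(x, layer):
    """Apply a dense layer to one feature vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def project(audio_frames, hidden_layer, output_layer):
    """Map each encoder frame to a vector of the decoder's embedding width."""
    return [linear(relu(linear(frame, hidden_layer)), output_layer)
            for frame in audio_frames]

rng = random.Random(0)
AUDIO_DIM, HIDDEN_DIM, TEXT_DIM = 8, 16, 12  # hypothetical toy sizes, not the real model's

# Only these two layers would be trainable; encoder and decoder stay frozen.
hidden_layer = make_linear(AUDIO_DIM, HIDDEN_DIM, rng)
output_layer = make_linear(HIDDEN_DIM, TEXT_DIM, rng)

# Stand-in for frozen encoder output: 5 frames of AUDIO_DIM features each.
audio_frames = [[rng.uniform(-1, 1) for _ in range(AUDIO_DIM)] for _ in range(5)]
audio_embeddings = project(audio_frames, hidden_layer, output_layer)
print(len(audio_embeddings), len(audio_embeddings[0]))  # prints "5 12"
```

The projected vectors have the decoder's embedding width, so they could be fed to the text decoder alongside ordinary token embeddings.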
## Usage
1. **Upload an audio file** (WAV, MP3, etc.) or **record directly** using your microphone
2. Click **"Transcribe"** to convert speech to text
3. The transcription will appear in the output box
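
For readers curious how such an interface might be wired up, here is a minimal Gradio sketch of the upload/record, button, and output-box flow. It is not the Space's actual `app.py`: `transcribe` is a hypothetical placeholder that returns a fixed message instead of running the model, and the layout is an assumption.

```python
def transcribe(audio_path):
    """Hypothetical placeholder; a real app would run the ASR model here."""
    if audio_path is None:
        return "Please upload or record audio first."
    return f"(transcription of {audio_path} would appear here)"

def build_demo():
    """Build the UI: audio input, a Transcribe button, and an output box."""
    import gradio as gr  # imported lazily so this module loads without Gradio

    with gr.Blocks(title="Tiny Audio Demo") as demo:
        audio = gr.Audio(sources=["upload", "microphone"], type="filepath", label="Audio")
        button = gr.Button("Transcribe")
        output = gr.Textbox(label="Transcription")
        # Clicking the button passes the audio file path to transcribe().
        button.click(fn=transcribe, inputs=audio, outputs=output)
    return demo
```

Calling `build_demo().launch()` would start the interface locally, assuming Gradio 4.x is installed.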
## Limitations
- Maximum audio length: 30 seconds
- Optimized for English speech
- Best performance with clear speech and minimal background noise
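
To stay within the 30-second limit, a client could trim audio before submitting it. The sketch below assumes a mono signal held as a list of samples and a 16 kHz sample rate (the rate Whisper models expect); whether the demo itself truncates or rejects longer clips is not stated here, and `clip_to_limit` is a hypothetical helper.

```python
MAX_SECONDS = 30  # limit stated in the list above

def clip_to_limit(samples, sample_rate, max_seconds=MAX_SECONDS):
    """Return at most max_seconds worth of samples from a mono signal."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    max_samples = int(sample_rate * max_seconds)
    return samples[:max_samples]

# Example: 45 seconds of silence at 16 kHz is cut down to 30 seconds.
sample_rate = 16_000  # assumed input rate
long_clip = [0.0] * (45 * sample_rate)
clipped = clip_to_limit(long_clip, sample_rate)
print(len(clipped) / sample_rate)  # prints 30.0
```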
## Links
- 📦 [Model on Hugging Face](https://huggingface.co/mazesmazes/tiny-audio)
- 💻 [GitHub Repository](https://github.com/alexkroman/tiny-audio)
- [Technical Details](https://github.com/alexkroman/tiny-audio/blob/main/MODEL_CARD.md)
## Citation
If you use this model in your research, please cite:
```bibtex
@software{kroman2024tinyaudio,
author = {Kroman, Alex},
title = {Tiny Audio: Train your own speech recognition model in 24 hours},
year = {2024},
publisher = {GitHub},
url = {https://github.com/alexkroman/tiny-audio}
}
```