File size: 2,068 Bytes
d411ac6
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
---
title: Tiny Audio Demo
emoji: 🎀
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: "4.44.0"
python_version: "3.11"
app_file: app.py
pinned: false
license: mit
short_description: Efficient ASR with Whisper encoder and SmolLM3 decoder
models:
  - mazesmazes/tiny-audio
tags:
  - audio
  - automatic-speech-recognition
  - whisper
  - smollm
  - mlp
suggested_hardware: cpu-upgrade
preload_from_hub:
  - mazesmazes/tiny-audio
---

## Demo Overview

This Space demonstrates an Automatic Speech Recognition (ASR) model that combines:

- **Whisper encoder** for audio feature extraction
- **SmolLM3 decoder** for efficient text generation

## Features

- πŸŽ™οΈ **Record from microphone** or upload audio files
- ⚑ **Fast inference** with a small number of trainable parameters
- 🎯 **English transcription** optimized for speech-to-text
- πŸ“Š **Lightweight model** suitable for edge deployment

## Model Architecture

The model uses a novel architecture that bridges audio and text modalities:

1. **Audio Encoder**: Frozen Whisper encoder
2. **Projection Layer**: Custom audio-to-text space mapping
3. **Text Decoder**: SmolLM3 (frozen)

## Usage

1. **Upload an audio file** (WAV, MP3, etc.) or **record directly** using your microphone
2. Click **"Transcribe"** to convert speech to text
3. The transcription will appear in the output box

## Limitations

- Maximum audio length: 30 seconds
- Optimized for English language
- Best performance with clear speech and minimal background noise

## Links

- πŸ“¦ [Model on Hugging Face](https://huggingface.co/mazesmazes/tiny-audio)
- πŸ’» [GitHub Repository](https://github.com/alexkroman/tiny-audio)
- πŸ“„ [Technical Details](https://github.com/alexkroman/tiny-audio/blob/main/MODEL_CARD.md)

## Citation

If you use this model in your research, please cite:

```bibtex
@software{kroman2024tinyaudio,
  author = {Kroman, Alex},
  title = {Tiny Audio: Train your own speech recognition model in 24 hours},
  year = {2024},
  publisher = {GitHub},
  url = {https://github.com/alexkroman/tiny-audio}
}
```