Parakeet-TDT-0.6B-v2 ONNX
ONNX-exported version of NVIDIA NeMo's Parakeet-TDT-0.6B-v2 ASR model.
Model Information
- Original Model: nvidia/parakeet-tdt-0.6b-v2
- Model Type: Transducer (RNN-T) with BPE tokenization
- Sample Rate: 16000 Hz
- Vocabulary Size: 1024
Files Included
- encoder.onnx / encoder.int8.onnx: Encoder model (full precision / quantized)
- encoder.data / encoder.int8.data: External data for encoder models (required)
- decoder.onnx / decoder.int8.onnx: Combined decoder+joint model (full precision / quantized)
- tokens.txt: Vocabulary tokens
- metadata.json: Model metadata
Note: The encoder models store their weights in external data files to stay under the 2 GB protobuf size limit. Each encoder .onnx file and its matching .data file must be present in the same directory.
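Because a missing .data file typically only surfaces as a load error, it can help to check the directory before starting inference. The following is a minimal sketch, not part of the model repository: the helper name `missing_external_data` is hypothetical, and the file pairs are taken from the list above.

```python
from pathlib import Path

# Encoder graph -> external data file, as named in the "Files Included" list.
EXTERNAL_DATA = {
    "encoder.onnx": "encoder.data",
    "encoder.int8.onnx": "encoder.int8.data",
}


def missing_external_data(model_dir):
    """Return the .data files that are absent although their encoder graph is present."""
    model_dir = Path(model_dir)
    missing = []
    for graph, data in EXTERNAL_DATA.items():
        # Only complain about a .data file if the matching .onnx file was downloaded.
        if (model_dir / graph).exists() and not (model_dir / data).exists():
            missing.append(data)
    return missing
```

Calling `missing_external_data(".")` in the download directory returns an empty list when every encoder file present has its companion data file.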
Usage
Inference requires the NeMo framework, both for audio preprocessing (feature extraction) and for its ONNX greedy decoder implementation (ONNXGreedyBatchedRNNTInfer).
import tempfile

import torch
import torchaudio
from nemo.collections.asr.models import ASRModel
from nemo.collections.asr.parts.submodules.rnnt_greedy_decoding import ONNXGreedyBatchedRNNTInfer

# Load the original PyTorch model for preprocessing only
nemo_model = ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2")
nemo_model.freeze()
device = "cuda" if torch.cuda.is_available() else "cpu"
nemo_model = nemo_model.to(device)

# Initialize ONNX decoder (use the .int8.onnx files for the quantized versions)
encoder_path = "encoder.onnx"
decoder_path = "decoder.onnx"  # This is the combined decoder+joint model
max_symbols_per_step = 5
onnx_decoder = ONNXGreedyBatchedRNNTInfer(
    encoder_path, decoder_path, max_symbols_per_step
)

# Prepare audio files (must be mono)
audio_files = ["path/to/audio.wav"]

# Configure preprocessor: disable dither and padding
nemo_model.preprocessor.featurizer.dither = 0.0
nemo_model.preprocessor.featurizer.pad_to = 0

all_transcripts = []

# Keep inference inside the with block so tmpdir still exists while the dataloader is in use
with tempfile.TemporaryDirectory() as tmpdir:
    # Setup dataloader
    config = {
        "paths2audio_files": audio_files,
        "batch_size": 4,
        "temp_dir": tmpdir,
    }
    dataloader = nemo_model._setup_transcribe_dataloader(config)

    # Run inference
    for batch in dataloader:
        input_signal, input_signal_length = batch[0], batch[1]
        input_signal = input_signal.to(device)
        input_signal_length = input_signal_length.to(device)

        # Extract features
        features, features_len = nemo_model.preprocessor(
            input_signal=input_signal, length=input_signal_length
        )

        # ONNX inference
        hypotheses = onnx_decoder(audio_signal=features, length=features_len)

        # Decode to text
        hypotheses = nemo_model.decoding.decode_hypothesis(hypotheses)
        texts = [h.text for h in hypotheses]
        all_transcripts.extend(texts)

print("\nTranscripts:")
for i, text in enumerate(all_transcripts):
    print(f"  {i + 1}. {text}")
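The usage code expects mono audio, and the model information lists a 16000 Hz sample rate. As a rough pre-flight check, the sketch below reads a WAV header with Python's standard `wave` module and reports mismatches. The helper name `check_wav` is hypothetical, the check only covers PCM WAV files, and whether NeMo's dataloader resamples other rates for you is not stated here.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # from the Model Information section


def check_wav(path, expected_rate=EXPECTED_SAMPLE_RATE):
    """Return a list of header problems for a PCM WAV file (empty if none are found)."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        if channels != 1:
            problems.append(f"expected mono, found {channels} channels")
        if rate != expected_rate:
            problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems
```

Files that fail the check can be downmixed and resampled with any audio tool before being added to `audio_files`.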
Performance
The quantized models (.int8.onnx) provide faster inference with minimal accuracy loss compared to the full-precision versions.
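No benchmark figures accompany this claim, so the speed difference is best measured on your own hardware. The sketch below is a generic timing helper, not part of NeMo or this repository: `time_calls` is a hypothetical name, and the zero-argument callable you pass in is assumed to wrap one full pass of the inference loop from the Usage section.

```python
import statistics
import time


def time_calls(fn, repeats=5, warmup=1):
    """Return the median wall-clock seconds of fn() over `repeats` runs.

    `warmup` untimed calls run first so that one-time setup costs
    (session initialization, caches) do not skew the measurement.
    """
    for _ in range(warmup):
        fn()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)
```

Running it once with a decoder built from encoder.onnx / decoder.onnx and once with the .int8.onnx files gives a like-for-like latency comparison. Accuracy would need a separate check, for example by comparing transcripts on audio with known reference text.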
Citation
@misc{parakeet-tdt-0.6b-v2,
title={Parakeet-TDT-0.6B-v2},
author={NVIDIA NeMo Team},
year={2024},
url={https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2}
}
License
This model follows the same license as the original NeMo Parakeet model.