
Parakeet-TDT-0.6B-v2 ONNX

ONNX-exported version of NVIDIA NeMo's Parakeet-TDT-0.6B-v2 ASR model.

Model Information

Original Model: nvidia/parakeet-tdt-0.6b-v2
Model Type: Transducer (RNN-T) with BPE tokenization
Sample Rate: 16000 Hz
Vocabulary Size: 1024
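
Because the model expects 16000 Hz input and the usage example requires mono audio, it can help to check a file before transcribing it. The following is a minimal sketch using only the Python standard library; the helper name check_wav is hypothetical, and it assumes the input is an uncompressed PCM WAV file that the wave module can open.

```python
import wave
from typing import List

EXPECTED_SAMPLE_RATE = 16000  # taken from the model information above
EXPECTED_CHANNELS = 1         # the usage example requires mono audio


def check_wav(path: str) -> List[str]:
    """Return a list of problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_SAMPLE_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected mono")
    return problems
```

An empty result from check_wav("path/to/audio.wav") means the file matches both properties; otherwise the returned messages say what to convert.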

Files Included

encoder.onnx / encoder.int8.onnx - Encoder model (full precision / quantized)
encoder.data / encoder.int8.data - External data for encoder models (required)
decoder.onnx / decoder.int8.onnx - Combined decoder+joint model (full precision / quantized)
tokens.txt - Vocabulary tokens
metadata.json - Model metadata

Note: The encoder models use external data files to avoid the 2GB protobuf limit. Both the .onnx and .data files must be present in the same directory.
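
A missing .data file is easy to overlook after a partial download, so a quick presence check can save a confusing load error. The sketch below builds the expected file names from the list above and reports any that are absent; the function names expected_files and missing_files are hypothetical, and the assumption that every listed file sits in one flat directory comes from the note above.

```python
from pathlib import Path
from typing import List


def expected_files(quantized: bool = False) -> List[str]:
    """File names from the list above for the chosen precision."""
    suffix = ".int8" if quantized else ""
    return [
        f"encoder{suffix}.onnx",
        f"encoder{suffix}.data",   # external weights, must sit next to the .onnx file
        f"decoder{suffix}.onnx",
        "tokens.txt",
        "metadata.json",
    ]


def missing_files(model_dir: str, quantized: bool = False) -> List[str]:
    """Return the expected files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in expected_files(quantized) if not (root / name).is_file()]
```

Calling missing_files(".", quantized=True) before creating the decoder returns an empty list when the quantized set is complete.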

Usage

Inference requires the NeMo framework, which supplies both the audio preprocessor and the ONNX greedy decoder (ONNXGreedyBatchedRNNTInfer) used in the example below.

import torch
from nemo.collections.asr.models import ASRModel
from nemo.collections.asr.parts.submodules.rnnt_greedy_decoding import ONNXGreedyBatchedRNNTInfer

# Load the original PyTorch model for preprocessing only
nemo_model = ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v2")
nemo_model.freeze()

device = "cuda" if torch.cuda.is_available() else "cpu"
nemo_model = nemo_model.to(device)

# Initialize ONNX decoder (use .int8.onnx for quantized versions)
encoder_path = "encoder.onnx"
decoder_path = "decoder.onnx"  # This is the combined decoder+joint model
max_symbols_per_step = 5

onnx_decoder = ONNXGreedyBatchedRNNTInfer(
    encoder_path, decoder_path, max_symbols_per_step
)

# Prepare audio files (must be mono)
audio_files = ["path/to/audio.wav"]

# Configure preprocessor
nemo_model.preprocessor.featurizer.dither = 0.0
nemo_model.preprocessor.featurizer.pad_to = 0

# Setup dataloader
import tempfile
with tempfile.TemporaryDirectory() as tmpdir:
    config = {
        "paths2audio_files": audio_files,
        "batch_size": 4,
        "temp_dir": tmpdir,
    }
    dataloader = nemo_model._setup_transcribe_dataloader(config)
    
    # Run inference
    all_transcripts = []
    for batch in dataloader:
        input_signal, input_signal_length = batch[0], batch[1]
        input_signal = input_signal.to(device)
        input_signal_length = input_signal_length.to(device)
        
        # Extract features
        features, features_len = nemo_model.preprocessor(
            input_signal=input_signal, length=input_signal_length
        )
        
        # ONNX inference
        hypotheses = onnx_decoder(audio_signal=features, length=features_len)
        
        # Decode to text
        hypotheses = nemo_model.decoding.decode_hypothesis(hypotheses)
        texts = [h.text for h in hypotheses]
        all_transcripts.extend(texts)

print("\nTranscripts:")
for i, text in enumerate(all_transcripts):
    print(f"  {i+1}. {text}")
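
The max_symbols_per_step value in the example above caps how many tokens greedy transducer decoding may emit for one encoder frame before it is forced on to the next frame. The toy loop below illustrates that idea only. It is not NeMo's implementation: the step callable stands in for the decoder+joint network, and the blank id of 0 is a hypothetical choice for this sketch.

```python
from typing import Callable, List, Sequence

BLANK = 0  # hypothetical blank id, chosen only for this toy example


def greedy_transducer_decode(
    frames: Sequence[int],
    step: Callable[[int, List[int]], int],
    max_symbols_per_step: int = 5,
) -> List[int]:
    """Toy greedy loop: step(frame, emitted) returns the next token id or BLANK."""
    emitted: List[int] = []
    for frame in frames:
        symbols_this_frame = 0
        # Keep emitting on the same frame until a blank or the cap is reached.
        while symbols_this_frame < max_symbols_per_step:
            token = step(frame, emitted)
            if token == BLANK:
                break  # blank means: move on to the next encoder frame
            emitted.append(token)
            symbols_this_frame += 1
    return emitted
```

Without the cap, a step function that never predicts blank would loop forever on the first frame; with it, the output length is at most max_symbols_per_step times the number of frames.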

Performance

The quantized models (.int8.onnx) trade a small amount of accuracy for faster inference and smaller files compared to the full-precision versions.
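
No benchmark figures are given here, so the speed difference is best measured on your own hardware and audio. The helper below is a minimal timing sketch using only the standard library; the name median_latency and its warmup and runs defaults are hypothetical choices, not part of NeMo or ONNX Runtime.

```python
import time
from statistics import median
from typing import Callable


def median_latency(fn: Callable[[], object], warmup: int = 2, runs: int = 10) -> float:
    """Call fn repeatedly and return the median wall-clock time in seconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return median(timings)
```

To compare precisions, build one decoder from the .onnx files and one from the .int8.onnx files, then pass each as a closure, for example median_latency(lambda: onnx_decoder(audio_signal=features, length=features_len)), using the same features for both.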

Citation

@misc{parakeet-tdt-0.6b-v2,
  title={Parakeet-TDT-0.6B-v2},
  author={NVIDIA NeMo Team},
  year={2024},
  url={https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2}
}

License

This model follows the same license as the original NeMo Parakeet model.
