Pumatic English-French Translation Model
A neural machine translation model for English-to-French translation, built on the MarianMT architecture.
Model Description
- Model type: Encoder-Decoder (MarianMT architecture)
- Language pair: English → French
- Parameters: ~74.7M
- Training GPU: NVIDIA H100
- Trained by: pumad
Training Details
Training Data
The model was trained on high-quality parallel corpora:
- OPUS-100 - Multilingual parallel corpus
- Europarl - European Parliament proceedings
- UN Parallel Corpus (UNPC) - United Nations documents
Training Procedure
- Hardware: NVIDIA H100 GPU
- Framework: Hugging Face Transformers
- Batch size: 128
- Learning rate: 2e-5
- Epochs: 3
- Max sequence length: 128 tokens
Data Preprocessing
- Quality filtering: Removed pairs with fewer than 5 words or more than 200 words
- Length ratio filtering: Excluded pairs with extreme length ratios (< 0.5 or > 2.0)
- Deduplication: Removed duplicate source sentences
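The three filtering steps above can be sketched in plain Python. This is a minimal illustration, not the actual training pipeline: the function name `filter_pairs`, whitespace-based word counting, applying the word limits to both sides of a pair, computing the ratio as target words over source words, and exact-match deduplication are all assumptions.

```python
from typing import Iterable, Iterator, Tuple

MIN_WORDS = 5     # pairs with fewer words are dropped
MAX_WORDS = 200   # pairs with more words are dropped
MIN_RATIO = 0.5   # lower bound on the length ratio
MAX_RATIO = 2.0   # upper bound on the length ratio


def filter_pairs(pairs: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs that pass all three filters."""
    seen_sources = set()
    for src, tgt in pairs:
        # Assumption: words are counted by splitting on whitespace.
        src_len = len(src.split())
        tgt_len = len(tgt.split())

        # Quality filter (assumed to apply to both sides of the pair).
        if not (MIN_WORDS <= src_len <= MAX_WORDS):
            continue
        if not (MIN_WORDS <= tgt_len <= MAX_WORDS):
            continue

        # Length ratio filter (assumed to be target words / source words).
        ratio = tgt_len / src_len
        if ratio < MIN_RATIO or ratio > MAX_RATIO:
            continue

        # Deduplication on the source sentence (assumed exact match).
        key = src.strip()
        if key in seen_sources:
            continue
        seen_sources.add(key)

        yield src, tgt
```

Writing the filter as a generator keeps memory use bounded by the set of unique source sentences, which matters for corpora of this size.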
Usage
Using the Transformers library
from transformers import MarianMTModel, MarianTokenizer

# Load the tokenizer and model from the Hugging Face Hub
model_name = "pumad/pumadic-en-fr"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the English input as PyTorch tensors
text = "Hello, how are you today?"
inputs = tokenizer(text, return_tensors="pt", padding=True)

# Generate the French translation and decode it back to text
translated = model.generate(**inputs)
output = tokenizer.decode(translated[0], skip_special_tokens=True)
print(output)
Using the Pipeline API
from transformers import pipeline

# The pipeline wraps tokenization, generation, and decoding in one call
translator = pipeline("translation", model="pumad/pumadic-en-fr")
result = translator("The quick brown fox jumps over the lazy dog.")
print(result[0]["translation_text"])
Demo
Try this model live at pumatic.eu
API documentation available at pumatic.eu/docs
Limitations
- Optimized for general-purpose translation; domain-specific terminology may vary in quality
- Works best when input is split into chunks of about 400 characters or fewer; quality may drop on longer chunks
- Best performance on formal/written text; colloquial expressions may be less accurate
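One way to stay within the roughly 400-character guideline is to split longer text at sentence boundaries before translating. The sketch below is an assumption about how such chunking could be done, not the method used by the demo or API; the helper name `chunk_text` and the regex-based sentence split are hypothetical.

```python
import re
from typing import List

MAX_CHARS = 400  # approximate per-chunk guideline from the list above


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Group whole sentences into chunks of at most max_chars characters."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than max_chars is kept whole
            # rather than being cut mid-word.
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed to the tokenizer or the pipeline shown in the Usage section, and the translated chunks joined with spaces. The regex split does not handle abbreviations such as "e.g."; a dedicated sentence segmenter would be more robust.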
License
Apache 2.0
Citation
If you use this model, please cite:
@misc{pumatic-en-fr,
author = {pumad},
title = {Pumatic English-French Translation Model},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/pumad/pumadic-en-fr}
}