utter-project/TowerVideo-9B
UTTER - Unified Transcription and Translation for Extended Reality
Video-Text-to-Text
Transformers
Safetensors
18 languages
llava_onevision
image-to-text
multimodal
multilingual
vlm
translation
arxiv: 2510.21849
License: cc-by-nc-sa-4.0
main
Commit History
Update README.md
b5cc963
verified
GuilhermeNunes
committed on
Oct 28
Update README.md
36f0ca3
verified
GuilhermeNunes
committed on
Oct 23
Upload Tower.png
fcad27f
verified
GuilhermeNunes
committed on
Oct 19
Upload folder using huggingface_hub
2261404
verified
SaulSantos
committed on
Oct 15
Update README.md
7904306
verified
GuilhermeNunes
committed on
Oct 15
Upload Tower.png
40473fb
verified
GuilhermeNunes
committed on
Oct 15
Delete mc-eval2.png
0734932
verified
GuilhermeNunes
committed on
Oct 15
Delete mc-eval1.png
01d1b56
verified
GuilhermeNunes
committed on
Oct 15
Delete Tower.png
e706e63
verified
GuilhermeNunes
committed on
Oct 15
Update README.md
6207e88
verified
GuilhermeNunes
committed on
Oct 15
Upload 3 files
447bd8f
verified
SaulSantos
committed on
Oct 14
Update README.md
006a7b1
verified
SaulSantos
committed on
Oct 14
initial commit
3d270ab
verified
SaulSantos
committed on
Oct 14