T3-Video
Project | Paper | Pretrained T3-Video Weights (4K)
Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10$\times$
Click to watch the 4K World Vision demo generated by the native 4K video generation model T3-Video-Wan2.1-T2V-1.3B.

TODO
- Release T2V weights: T3-Video-Wan2.1-T2V-1.3B and T3-Video-Wan2.2-T2V-5B.
- Release 4K-VBench.
- Release 4K-World-Vision for the presented demo.
- Release inference code.
- Release training code.
Quickstart
1. Refer to DiffSynth-Studio/examples/wanvideo for environment preparation.
2. Download the Wan2.1-T2V-1.3B and Wan2.2-TI2V-5B base models using huggingface-cli:
pip install "huggingface_hub[cli]"
huggingface-cli download --repo-type model Wan-AI/Wan2.1-T2V-1.3B --local-dir weights/Wan2.1-T2V-1.3B --resume-download
huggingface-cli download --repo-type model Wan-AI/Wan2.2-TI2V-5B --local-dir weights/Wan2.2-TI2V-5B --resume-download
3. Download the T3-Video models using huggingface-cli:
huggingface-cli download --repo-type model APRIL-AIGC/T3-Video --local-dir weights/T3-Video --resume-download
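If you prefer downloading from Python instead of the CLI, the same snapshots can be fetched with huggingface_hub's snapshot_download; a minimal sketch (the repo IDs and target directories mirror the commands above):
from huggingface_hub import snapshot_download
# Fetch the same three model snapshots as the CLI commands above;
# re-running skips files that are already fully downloaded.
for repo_id, local_dir in [
    ("Wan-AI/Wan2.1-T2V-1.3B", "weights/Wan2.1-T2V-1.3B"),
    ("Wan-AI/Wan2.2-TI2V-5B", "weights/Wan2.2-TI2V-5B"),
    ("APRIL-AIGC/T3-Video", "weights/T3-Video"),
]:
    snapshot_download(repo_id=repo_id, repo_type="model", local_dir=local_dir)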
4. Infer native 4K videos with T3-Video-Wan2.1-T2V-1.3B.
python infer_multi_gpu.py --model_id Wan-AI/Wan2.1-T2V-1.3B --text_path weights/Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth --dit_path "weights/Wan2.1-T2V-1.3B/diffusion_pytorch_model*.safetensors" --vae_path weights/Wan2.1-T2V-1.3B/Wan2.1_VAE.pth --dit_path_full_pretrained weights/T3-Video/T3-Video-Wan2.1-T2V-1.3B.safetensors --height 2176 --width 3840 --num_frames 81 --out_dir output/T3-Video-Wan2.1-T2V-1.3B-4K-World-Vision-seed0-step50 --json_file 4K-World-Vision/4K-World-Vision.json --data_root 4K-World-Vision --start_id=0 --end_id=-1 --tiled true --seed 0 --num_groups 8 --num_inference_steps 50
5. Infer native 4K videos with T3-Video-Wan2.2-T2V-5B.
Change the mode to wan2.2_2176_3840 at line 180 of diffsynth/models/wan_video_dit.py; a sketch of this edit is shown below.
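The exact code at that line depends on your DiffSynth-Studio checkout; as a hypothetical sketch of what the edit looks like (the variable name mode and its old value are assumptions, not the actual source):
# diffsynth/models/wan_video_dit.py, around line 180 (illustrative only).
# Before (assumed): mode = "wan2.1_2176_3840"
# After: select the Wan2.2 native-4K configuration.
mode = "wan2.2_2176_3840"
Then run: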
python infer_multi_gpu.py --model_id Wan-AI/Wan2.2-TI2V-5B --text_path weights/Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth --dit_path "weights/Wan2.2-TI2V-5B/diffusion_pytorch_model*.safetensors" --vae_path weights/Wan2.2-TI2V-5B/Wan2.2_VAE.pth --dit_path_full_pretrained weights/T3-Video/T3-Video-Wan2.2-T2V-5B.safetensors --height 2176 --width 3840 --num_frames 81 --out_dir output/T3-Video-Wan2.2-T2V-5B-4K-World-Vision-seed0-step50 --json_file 4K-World-Vision/4K-World-Vision.json --data_root 4K-World-Vision --start_id=0 --end_id=-1 --tiled true --seed 0 --num_groups 8 --num_inference_steps 50
License Agreement
Copyright (c) 2025 T3-Video
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
In addition, users must comply with Wan-Video/Wan2.1/LICENSE.txt when using Wan-related models.
Acknowledgements
We would like to thank the contributors to the Wan2.1, Wan2.2, Qwen, umt5-xxl, diffusers, and HuggingFace repositories for their open research.
Citation
If you find our work helpful, please cite us.
@misc{t3video,
  title={Transform Trained Transformer: Accelerating Naive 4K Video Generation Over 10$\times$},
  author={Jiangning Zhang and Junwei Zhu and Teng Hu and Yabiao Wang and Donghao Luo and Weijian Cao and Zhenye Gan and Xiaobin Hu and Zhucun Xue and Chengjie Wang},
  year={2025},
  eprint={2512.13492},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2512.13492},
}