Submitted by Jingfeng Yao
Towards Scalable Pre-training of Visual Tokenizers for Generation
MiniMax