Introduction
We introduce LongCat-Image, a pioneering open-source, bilingual (Chinese-English) foundation model for image generation. It is designed to address core challenges prevalent in current leading models: multilingual text rendering, photorealism, deployment efficiency, and developer accessibility.
Key Features
- Exceptional Efficiency and Performance: With only 6B parameters, LongCat-Image surpasses numerous open-source models that are several times larger across multiple benchmarks, demonstrating the immense potential of efficient model design.
- Powerful Chinese Text Rendering: LongCat-Image demonstrates superior accuracy and stability in rendering common Chinese characters compared to existing SOTA open-source models, and achieves industry-leading coverage of the Chinese dictionary.
- Remarkable Photorealism: Through an innovative data strategy and training framework, LongCat-Image achieves remarkable photorealism in generated images.
Showcase
Quick Start
Installation
pip install git+https://github.com/huggingface/diffusers
Run Text-to-Image Generation
Leveraging a stronger LLM for prompt refinement can further enhance image generation quality. Please refer to inference_t2i.py for detailed usage instructions.
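As a rough illustration only (the repository's actual refinement logic lives in inference_t2i.py and may differ), prompt refinement can be thought of as a pre-processing step that asks an external LLM to expand a terse user prompt before it reaches the pipeline. The template and helper below are hypothetical stand-ins:

```python
# Hypothetical sketch of LLM-based prompt refinement; not the repository's
# implementation. The instruction text and function name are illustrative.
REFINE_TEMPLATE = (
    "Rewrite the following image prompt with concrete details about subject, "
    "lighting, composition, and style, preserving any quoted text verbatim:\n"
    "{prompt}"
)

def build_refine_request(user_prompt: str) -> str:
    """Builds the instruction that would be sent to an external LLM."""
    return REFINE_TEMPLATE.format(prompt=user_prompt)
```

Whatever the LLM returns would then be passed to the pipeline in place of the raw user prompt.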
Special Handling for Text Rendering
For both Text-to-Image and Image Editing tasks involving text generation, you must enclose the target text within single or double quotation marks (both English '...' / "..." and Chinese ‘...’ / “...” styles are supported).
Reasoning: The model applies a specialized character-level encoding strategy to quoted content only. Without explicit quotation marks this mechanism is never triggered, which severely degrades text rendering quality.
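As a quick sanity check on your own prompts, you can verify that the text you want rendered actually sits inside quotation marks. This helper is not part of the library, just a minimal illustration of the quoting rule:

```python
import re

# Text intended to appear in the image must sit inside quotation marks
# (English '...' / "..." or Chinese '...' / "..." styles), so that the
# model's character-level encoding path is triggered for that span.
QUOTED = re.compile(r'["\'“‘](.+?)["\'”’]')

def quoted_spans(prompt: str) -> list[str]:
    """Returns the segments the model would treat as literal text to render."""
    return QUOTED.findall(prompt)

good = 'A neon sign on a brick wall that reads "OPEN 24 HOURS"'
bad = 'A neon sign on a brick wall that reads OPEN 24 HOURS'
# quoted_spans(good) finds ['OPEN 24 HOURS']; quoted_spans(bad) finds nothing,
# so the sign text in `bad` would likely render poorly.
```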
import torch
from diffusers import LongCatImagePipeline

if __name__ == '__main__':
    device = torch.device('cuda')

    pipe = LongCatImagePipeline.from_pretrained("meituan-longcat/LongCat-Image", torch_dtype=torch.bfloat16)
    # pipe.to(device, torch.bfloat16)  # Uncomment for high-VRAM devices (faster inference)
    pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (~17 GB required); slower but prevents OOM

    prompt = '一个年轻的亚裔女性，身穿黑色针织衫，搭配白色项链。她的双手放在膝盖上，表情恬静。背景是一堵粗糙的砖墙，午后的阳光温柔地洒在她身上，营造出一种宁静而温馨的氛围。镜头采用中距离视角，突出她的神态和服饰的细节。光线柔和地打在她的脸上，强调她的五官和饰品的质感，增加画面的层次感与亲和力。整个画面构图简洁，砖墙的纹理与阳光的光影效果相得益彰，突显出人物的优雅与从容。'
    image = pipe(
        prompt,
        height=768,
        width=1344,
        guidance_scale=4.0,
        num_inference_steps=50,
        num_images_per_prompt=1,
        generator=torch.Generator("cpu").manual_seed(43),
        enable_cfg_renorm=True,
        enable_prompt_rewrite=True
    ).images[0]
    image.save('./t2i_example.png')