Visual Informatics Group @ University of Texas at Austin


https://vita-group.github.io/

VITAGroupUT

VITA-Group


AI & ML interests

Machine learning


Papers

LLMs Can Get "Brain Rot"!


Organization Card


VITA-Group@UT Austin (https://vita-group.github.io/)

We revisit classical sparse and low-rank optimization through the lens of modern AI, developing theory-driven algorithms that accelerate training and inference in large-scale models. We also investigate how algebraic and logical structures emerge during learning, uncovering the interplay between neural and symbolic computation across streamlined architectures, reasoning pipelines, and agentic systems. See https://www.vita-group.space/research for our latest research efforts.

Compressed LLM Model Zone

NOTE: All compressed LLMs have been moved to a new repository, compressed-llm.

The models were prepared by VITA-Group. Credit to Ajay Jaiswal, Zhenyu Zhang, Zhangheng Li, Lu Yin, Shiwei Liu, and Junyuan Hong.

License: MIT License

Setup environment

pip install torch==2.0.0+cu117 torchvision==0.15.1+cu117 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu117
pip install transformers==4.31.0
pip install accelerate
pip install auto-gptq  # for gptq
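Because the snippets below were written against these pinned versions, a quick stdlib-only check can catch a mismatch before a multi-gigabyte checkpoint is downloaded. This is a minimal sketch, not part of the released tooling; the helper name `check_pins` is hypothetical, and it treats a local build tag such as `+cu117` as irrelevant to the comparison.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip commands above (hypothetical check, not official tooling).
PINNED = {
    "torch": "2.0.0",
    "torchvision": "0.15.1",
    "torchaudio": "2.0.1",
    "transformers": "4.31.0",
}


def check_pins(pins):
    """Return {package: problem} for every pin that is not satisfied."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # Drop a local build suffix like "+cu117" before comparing versions.
        base = installed.split("+", 1)[0]
        if base != wanted:
            problems[name] = f"found {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    for pkg, msg in issues.items():
        print(f"{pkg}: {msg}")
    print("environment OK" if not issues else f"{len(issues)} pin(s) not satisfied")
```

An empty result means every pinned package is present at the expected version.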

How to use pruned models

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = 'llama-2-7b'
comp_method = 'magnitude_unstructured'  # see the table below for available methods
comp_degree = 0.2                       # sparsity ratio; selects revision 's0.2'
model_path = f'vita-group/{base_model}_{comp_method}'
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    revision=f's{comp_degree}',  # each compression degree is stored as a separate revision
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
)
# The pruned checkpoints reuse the original Llama-2 tokenizer (a gated repo that requires access approval).
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.to(model.device)
outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
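Because each sparsity level lives in its own revision, a small helper can turn the three variables above into a (repository, revision) pair. This is a hypothetical convenience function, not part of any released package: it only reproduces the f-strings from the snippet, and the `org` parameter is exposed because the checkpoints have moved to compressed-llm, whose exact repository naming is not verified here.

```python
from typing import Tuple


def pruned_checkpoint(base_model: str, comp_method: str, comp_degree: float,
                      org: str = "vita-group") -> Tuple[str, str]:
    """Return (repo_id, revision) following the naming used in the snippet above."""
    if not 0.0 < comp_degree < 1.0:
        raise ValueError("comp_degree is a sparsity ratio and must lie in (0, 1)")
    repo_id = f"{org}/{base_model}_{comp_method}"
    revision = f"s{comp_degree}"  # e.g. 0.2 -> "s0.2", matching the table below
    return repo_id, revision


if __name__ == "__main__":
    # Enumerate the Wanda-pruned Llama-2-7b variants listed in the table.
    for degree in (0.1, 0.2, 0.3, 0.5, 0.6):
        print(pruned_checkpoint("llama-2-7b", "wanda_unstructured", degree))
```

Each pair can be passed as `model_path` and `revision` to `AutoModelForCausalLM.from_pretrained`.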

How to use wanda+gptq models

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_path = 'vita-group/llama-2-7b_wanda_2_4_gptq_4bit_128g'
tokenizer_path = 'meta-llama/Llama-2-7b-hf'
model = AutoGPTQForCausalLM.from_quantized(
    model_path,
    # inject_fused_attention=False,  # optional
    disable_exllama=True,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)
input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.to('cuda')
outputs = model.generate(input_ids=input_ids, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
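We read suffixes such as `4bit_128g` as 4-bit weights quantized in groups of 128, each group carrying its own scale. The toy below illustrates that group-wise layout with plain round-to-nearest (RTN) quantization on a Python list. It is not GPTQ, which additionally uses second-order (Hessian) information to compensate rounding error, and the helper names are hypothetical.

```python
import random
from typing import List, Tuple


def quantize_groupwise(weights: List[float], bits: int = 4, group_size: int = 128
                       ) -> Tuple[List[int], List[Tuple[float, float]]]:
    """Asymmetric round-to-nearest quantization with one (scale, offset) pair per group."""
    qmax = (1 << bits) - 1  # largest integer code, e.g. 15 for 4-bit
    codes: List[int] = []
    params: List[Tuple[float, float]] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        # Spread the group's [lo, hi] range over the 2**bits codes; a constant
        # group gets scale 1.0 to avoid dividing by zero.
        scale = (hi - lo) / qmax if hi > lo else 1.0
        params.append((scale, lo))
        for w in group:
            q = round((w - lo) / scale)
            codes.append(min(max(q, 0), qmax))  # clamp defensively to [0, qmax]
    return codes, params


def dequantize_groupwise(codes: List[int], params: List[Tuple[float, float]],
                         group_size: int = 128) -> List[float]:
    """Map integer codes back to floats using each group's scale and offset."""
    return [q * params[i // group_size][0] + params[i // group_size][1]
            for i, q in enumerate(codes)]


if __name__ == "__main__":
    random.seed(0)
    w = [random.gauss(0.0, 0.02) for _ in range(512)]
    codes, params = quantize_groupwise(w, bits=4, group_size=128)
    w_hat = dequantize_groupwise(codes, params, group_size=128)
    err = max(abs(a - b) for a, b in zip(w, w_hat))
    print(f"{len(params)} groups, max abs error {err:.5f}")
```

Smaller groups track local weight ranges more closely, at the cost of storing more scales.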

How to use gptq models

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_path = 'vita-group/vicuna-7b-v1.3_gptq'
tokenizer_path = 'lmsys/vicuna-7b-v1.3'
model = AutoGPTQForCausalLM.from_quantized(
    model_path,
    # inject_fused_attention=False,  # optional
    disable_exllama=True,
    device_map='auto',
    revision='2bit_128g',  # bit width and group size; see the table below for available revisions
)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)
input_ids = tokenizer('Hello! I am a VITA-compressed-LLM chatbot!', return_tensors='pt').input_ids.to('cuda')
outputs = model.generate(input_ids=input_ids, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
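To help pick a revision, the arithmetic below estimates weight storage for a given bit width and group size. It is a back-of-the-envelope sketch that assumes every quantized weight takes `bits` bits and each group of weights also stores one fp16 scale and one `bits`-wide zero point. It ignores layers that stay unquantized (such as embeddings), index tensors, activations, and the KV cache, so real memory use will be higher. The function names are hypothetical.

```python
def effective_bits(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Average storage bits per weight under the stated assumptions."""
    if bits <= 0 or group_size <= 0:
        raise ValueError("bits and group_size must be positive")
    # Each group of `group_size` weights shares one scale (scale_bits wide)
    # and one zero point (bits wide), amortized over the group.
    return bits + (scale_bits + bits) / group_size


def weight_gib(n_params: float, bits: int, group_size: int) -> float:
    """Approximate size of the quantized weights in GiB (2**30 bytes)."""
    return n_params * effective_bits(bits, group_size) / 8 / 2**30


if __name__ == "__main__":
    n = 7e9  # nominal "7b" parameter count; the true count differs slightly
    print(f"fp16 baseline: {n * 2 / 2**30:.2f} GiB")
    for bits in (2, 3, 4, 8):
        print(f"{bits}bit_128g: ~{weight_gib(n, bits, 128):.2f} GiB")
```

For example, under these assumptions `4bit_128g` costs 4 + 20/128 = 4.15625 bits per weight, roughly a quarter of fp16.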

Base Model    Model Size  Compression Method      Compression Degree
Llama-2       13b         magnitude_semistruct    0.5_2to4
Llama-2       13b         sparsegpt_semistruct    0.5_2to4
Llama-2       7b          magnitude_unstructured  s0.1
Llama-2       7b          magnitude_unstructured  s0.2
Llama-2       7b          magnitude_unstructured  s0.3
Llama-2       7b          magnitude_unstructured  s0.5
Llama-2       7b          magnitude_unstructured  s0.6
Llama-2       7b          sparsegpt_unstructured  s0.1
Llama-2       7b          sparsegpt_unstructured  s0.2
Llama-2       7b          sparsegpt_unstructured  s0.3
Llama-2       7b          sparsegpt_unstructured  s0.5
Llama-2       7b          sparsegpt_unstructured  s0.6
Llama-2       7b          wanda_gptq              4bit_128g
Llama-2       7b          wanda_unstructured      s0.1
Llama-2       7b          wanda_unstructured      s0.2
Llama-2       7b          wanda_unstructured      s0.3
Llama-2       7b          wanda_unstructured      s0.5
Llama-2       7b          wanda_unstructured      s0.6
Llama-2-chat  13b         magnitude_semistruct    0.5_2to4
Llama-2-chat  13b         sparsegpt_semistruct    0.5_2to4
vicuna        13b         magnitude_semistruct    0.5_2to4
vicuna        13b         sparsegpt_semistruct    0.5_2to4
vicuna-v1.3   13b         gptq                    2bit_128g
vicuna-v1.3   13b         gptq                    3bit_128g
vicuna-v1.3   13b         gptq                    4bit_128g
vicuna-v1.3   13b         gptq                    8bit_128g
vicuna-v1.3   13b         gptq                    10bit_128g
vicuna-v1.3   13b         gptq                    12bit_128g
vicuna-v1.3   13b         gptq                    14bit_128g
vicuna-v1.3   7b          gptq                    2bit_128g
vicuna-v1.3   7b          gptq                    3bit_128g
vicuna-v1.3   7b          gptq                    4bit_128g
vicuna-v1.3   7b          gptq                    8bit_128g
vicuna-v1.3   7b          gptq                    10bit_128g
vicuna-v1.3   7b          gptq                    12bit_128g
vicuna-v1.3   7b          gptq                    14bit_128g
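The Compression Method and Compression Degree columns can be illustrated with a toy example, assuming an `s` value is the fraction of weights set to zero and `0.5_2to4` means 50% sparsity in the 2:4 pattern (at most two nonzeros in every block of four consecutive weights). The sketch below applies plain magnitude pruning to a flat Python list; SparseGPT and Wanda use different importance criteria (and SparseGPT also updates the remaining weights), and the function names are hypothetical.

```python
from typing import List


def magnitude_prune_unstructured(weights: List[float], sparsity: float) -> List[float]:
    """Zero the `sparsity` fraction of entries with the smallest |w| (ties broken by index)."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = round(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


def magnitude_prune_2to4(weights: List[float]) -> List[float]:
    """In each consecutive block of 4 weights, keep only the 2 with the largest |w|."""
    if len(weights) % 4:
        raise ValueError("length must be a multiple of 4 for 2:4 sparsity")
    out = list(weights)
    for start in range(0, len(weights), 4):
        block = range(start, start + 4)
        keep = sorted(block, key=lambda i: abs(weights[i]), reverse=True)[:2]
        for i in block:
            if i not in keep:
                out[i] = 0.0
    return out


if __name__ == "__main__":
    w = [0.9, 0.8, 0.05, -0.7, 0.3, 0.02, -0.4, 0.1]
    print(magnitude_prune_unstructured(w, 0.5))  # [0.9, 0.8, 0.0, -0.7, 0.0, 0.0, -0.4, 0.0]
    print(magnitude_prune_2to4(w))               # [0.9, 0.8, 0.0, 0.0, 0.3, 0.0, -0.4, 0.0]
```

Both results are 50% sparse, but the 2:4 rule must drop -0.7 because its block already holds two larger weights, and it keeps 0.3 from the second block. That fixed per-block pattern is what hardware such as NVIDIA Ampere sparse tensor cores can accelerate, whereas unstructured sparsity is harder to exploit for speed.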

Citations

If you use models from this hub, please consider citing our papers.

@article{jaiswal2023emergence,
  title={The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter},
  author={Jaiswal, Ajay and Liu, Shiwei and Chen, Tianlong and Wang, Zhangyang},
  journal={arXiv},
  year={2023}
}
@article{jaiswal2023compressing,
  title={Compressing LLMs: The Truth is Rarely Pure and Never Simple},
  author={Jaiswal, Ajay and Gan, Zhe and Du, Xianzhi and Zhang, Bowen and Wang, Zhangyang and Yang, Yinfei},
  journal={arXiv},
  year={2023}
}

For any questions, please contact Junyuan Hong.
