---
license: apache-2.0
language:
  - en
base_model:
  - apple/MobileCLIP2-S4
  - apple/MobileCLIP2-S2
pipeline_tag: image-text-to-text
tags:
  - MobileCLIP
  - MobileCLIP2
  - CLIP
  - Classification
---

# MobileCLIP2

The following versions of MobileCLIP2 have been converted to run on the Axera NPU using w8a16 quantization. They are compatible with Pulsar2 version 4.2:

- MobileCLIP2-S2
- MobileCLIP2-S4

If you want to know how to convert the MobileCLIP2 model into an axmodel that runs on the Axera NPU board, please refer to this link for details.
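
Before Pulsar2 can build an axmodel, the image and text encoders are typically exported to ONNX first. The snippet below is only a minimal sketch of that export step under assumptions of our own (loading the model through `open_clip`, a 256×256 input resolution, and the output file names); it is not the official conversion recipe from the link above.

```python
# Hypothetical sketch: export the MobileCLIP2 image/text encoders to ONNX
# so they can later be converted to axmodel with Pulsar2.
# The open_clip model name, input shapes, and file names are assumptions.
import torch
import open_clip  # assumption: MobileCLIP2 weights are loadable via open_clip

model, _, _ = open_clip.create_model_and_transforms("MobileCLIP2-S4")
model.eval()

class ImageEncoder(torch.nn.Module):
    """Wraps encode_image so the exported graph has a single input/output."""
    def __init__(self, clip_model):
        super().__init__()
        self.clip_model = clip_model
    def forward(self, image):
        return self.clip_model.encode_image(image)

class TextEncoder(torch.nn.Module):
    """Wraps encode_text so the exported graph has a single input/output."""
    def __init__(self, clip_model):
        super().__init__()
        self.clip_model = clip_model
    def forward(self, tokens):
        return self.clip_model.encode_text(tokens)

# Batch size 1 and a 256x256 input resolution are assumptions; adjust them
# to match the resolution the checkpoint was trained with.
dummy_image = torch.randn(1, 3, 256, 256)
dummy_tokens = torch.zeros(1, 77, dtype=torch.long)

torch.onnx.export(ImageEncoder(model), dummy_image,
                  "mobileclip2_s4_image_encoder.onnx", opset_version=17)
torch.onnx.export(TextEncoder(model), dummy_tokens,
                  "mobileclip2_s4_text_encoder.onnx", opset_version=17)
```

The resulting ONNX files would then be fed to the Pulsar2 build flow with w8a16 quantization, as described in the conversion guide.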

## Support Platform

- AX650

## On-board inference time

- MobileCLIP2-S2

  | Stage         | Time      |
  |---------------|-----------|
  | image encoder | 19.146 ms |
  | text encoder  | 5.675 ms  |

- MobileCLIP2-S4

  | Stage         | Time      |
  |---------------|-----------|
  | image encoder | 65.328 ms |
  | text encoder  | 12.663 ms |

## How to use

Download all files from this repository to the device.

Run the following command:

```bash
python3 run_axmodel.py -ie ./mobileclip2_s4_image_encoder.axmodel -te ./mobileclip2_s4_text_encoder.axmodel -i ./zebra.jpg -t "a zebra" "a dog" "two zebras"
```

Model input and output examples are as follows:

1. The input image: `zebra.jpg`
2. The text descriptions to classify against: `["a zebra", "a dog", "two zebras"]`
3. The model's output class confidence scores: `Label probs: [[6.095444e-02 5.628616e-14 9.390456e-01]]`
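
The probabilities above come from the standard CLIP zero-shot classification step: the image and text embeddings are L2-normalized, their scaled cosine similarities are computed, and a softmax over the text prompts yields the per-class confidences. The snippet below is a minimal sketch of that post-processing; the variable names and the logit scale of 100 are assumptions, not values read from `run_axmodel.py`.

```python
# Minimal sketch of CLIP-style zero-shot scoring from encoder outputs.
# `image_features` (1, D) and `text_features` (N, D) stand in for the
# axmodel encoder outputs; the logit scale of 100 is an assumption.
import numpy as np

def label_probs(image_features: np.ndarray, text_features: np.ndarray,
                logit_scale: float = 100.0) -> np.ndarray:
    # L2-normalize both embedding sets.
    image_features = image_features / np.linalg.norm(image_features, axis=-1, keepdims=True)
    text_features = text_features / np.linalg.norm(text_features, axis=-1, keepdims=True)
    # Scaled cosine similarity between the image and each text prompt.
    logits = logit_scale * image_features @ text_features.T
    # Softmax over the text prompts gives the class confidences.
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

# Example with random embeddings standing in for real encoder outputs:
# probs = label_probs(np.random.randn(1, 512), np.random.randn(3, 512))
```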