---
title: MobileCLIP Image Classifier
emoji: 📸
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# 📸 MobileCLIP-B Image Classifier

Zero-shot image classification powered by Apple's MobileCLIP-B model, served through an interactive Gradio web interface. This application enables real-time image classification against a dynamic set of text labels, with support for admin-managed label updates and optional Hugging Face Hub persistence.

## 🎯 Key Features

### Core Capabilities
- **🖼️ Zero-Shot Classification**: Upload any image for instant classification without model retraining
- **🏷️ Dynamic Label Management**: Add, remove, and update classification labels on-the-fly
- **📊 Interactive Results**: Visual confidence scores with sortable data tables
- **⚡ Optimized Performance**: Sub-30ms inference on GPU with re-parameterized MobileOne blocks
- **🔒 Secure Admin Panel**: Token-protected label management interface
- **☁️ Hub Persistence**: Optional versioned label storage on Hugging Face Hub

### API Access
- **REST API**: Fully accessible via Gradio's automatic API endpoints
- **Base64 Support**: Direct base64 image input for backend integration
- **Batch Processing**: Efficient handling of multiple classification requests

## 🏗️ Architecture

### Components
- **`app.py`**: Main Gradio interface with public/admin tabs and API endpoints
- **`handler.py`**: Core model management, inference logic, and label operations
- **`reparam.py`**: MobileOne re-parameterization for optimized inference
- **`items.json`**: Default label catalog with metadata

### Model Details
- **Architecture**: MobileCLIP-B with re-parameterized MobileOne image encoder
- **Text Encoder**: Optimized CLIP text transformer
- **Embedding Cache**: Pre-computed text embeddings for fast inference
- **Device Support**: Automatic GPU/CPU detection with float16 optimization
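
The embedding cache is what keeps per-image inference cheap: label prompts are encoded once, so each request needs only one image embedding plus a similarity computation. A minimal sketch of that scoring step in plain Python, assuming cosine similarity on unit-normalized vectors followed by a softmax (the usual CLIP recipe). The function names, the toy 3-dimensional vectors, and the logit scale of 100 are illustrative assumptions, not code taken from `handler.py`:

```python
import math

def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def classify(image_emb, label_cache, top_k=3, logit_scale=100.0):
    """Rank cached labels against one image embedding.

    label_cache maps label name -> pre-computed, unit-normalized text embedding.
    """
    image_emb = normalize(image_emb)
    names = list(label_cache)
    logits = [
        logit_scale * sum(a * b for a, b in zip(image_emb, label_cache[name]))
        for name in names
    ]
    # Numerically stable softmax over all labels.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    scored = sorted(
        zip(names, (e / total for e in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored[:top_k]

# Toy cache: in a real app these vectors would come from the text encoder.
cache = {
    "cat": normalize([1.0, 0.1, 0.0]),
    "dog": normalize([0.0, 1.0, 0.1]),
    "car": normalize([0.1, 0.0, 1.0]),
}
results = classify([0.9, 0.2, 0.0], cache, top_k=2)
```

Because the text side is cached, changing the label set only requires re-encoding the prompts that changed; the image path stays the same.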

## 🚀 Quick Start

### Environment Variables

Configure in your Space Settings → Variables and secrets:

| Variable | Description | Required |
|----------|-------------|----------|
| `ADMIN_TOKEN` | Secret token for admin operations | Yes (for admin) |
| `HF_LABEL_REPO` | Hub dataset for label storage (e.g., `user/labels`) | No |
| `HF_WRITE_TOKEN` | Token with write permissions to dataset repo | No |
| `HF_READ_TOKEN` | Token with read permissions (defaults to write token) | No |
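
As a rough illustration of how these variables might be consumed, the sketch below reads them from the environment and applies the read-token fallback described in the table. `load_config` and the dictionary keys are hypothetical names chosen for this example; the actual loading code in `app.py` or `handler.py` may be organized differently:

```python
import os

def load_config(env=None):
    """Read the Space's settings from environment variables."""
    env = os.environ if env is None else env
    write_token = env.get("HF_WRITE_TOKEN") or None
    return {
        "admin_token": env.get("ADMIN_TOKEN") or None,
        "label_repo": env.get("HF_LABEL_REPO") or None,
        "write_token": write_token,
        # Per the table, the read token defaults to the write token.
        "read_token": env.get("HF_READ_TOKEN") or write_token,
    }

# Example with an explicit mapping instead of the real environment.
config = load_config({"ADMIN_TOKEN": "s3cret", "HF_WRITE_TOKEN": "hf_write"})
admin_enabled = config["admin_token"] is not None
hub_enabled = config["label_repo"] is not None and config["read_token"] is not None
```

In this example the admin panel would be usable, while Hub persistence stays off because `HF_LABEL_REPO` is not set.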

### Usage Examples

#### Web Interface
1. Navigate to the Space URL
2. Upload an image in the Classification tab
3. Adjust top-k results (default: 10)
4. View ranked predictions with confidence scores

#### API Usage

**Standard Classification:**
```python
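import requests

# Open the image in a context manager so the file handle is closed after upload.
with open("photo.jpg", "rb") as image_file:
    response = requests.post(
        "YOUR_SPACE_URL/api/classify_image",
        files={"image": image_file},
        data={"top_k": 5},
    )
response.raise_for_status()  # fail loudly on HTTP errors
results = response.json()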
```

**Base64 Input:**
```python
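import base64
import requests

# Encode the raw image bytes as a base64 string for the JSON payload.
with open("photo.jpg", "rb") as f:
    img_base64 = base64.b64encode(f.read()).decode("ascii")

response = requests.post(
    "YOUR_SPACE_URL/api/classify_base64",
    json={
        "image": img_base64,
        "top_k": 10,
    },
)
response.raise_for_status()  # fail loudly on HTTP errors
results = response.json()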
```

## 🔧 Admin Operations

### Label Management

Authenticated admins can perform the following operations:

#### Add Labels
```json
{
  "op": "upsert_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "items": [
    {"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"},
    {"id": 101, "name": "airplane", "prompt": "a photo of an airplane"}
  ]
}
```

#### Reload Specific Version
```json
{
  "op": "reload_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "version": 5
}
```

#### Remove Labels
```json
{
  "op": "remove_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "ids": [100, 101]
}
```
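
The payloads above share an `op` field, so a handler can route on it. The following is a small sketch of such routing against an in-memory catalog keyed by label ID. `apply_admin_op` is a hypothetical helper written for this example; token validation and `reload_labels` (which depends on the Hub) are deliberately left out:

```python
def apply_admin_op(catalog, payload):
    """Apply one admin payload to an in-memory catalog keyed by label ID."""
    op = payload.get("op")
    if op == "upsert_labels":
        for item in payload["items"]:
            # Insert new IDs and overwrite existing ones.
            catalog[item["id"]] = {"name": item["name"], "prompt": item["prompt"]}
    elif op == "remove_labels":
        for label_id in payload["ids"]:
            catalog.pop(label_id, None)  # ignore IDs that are not present
    else:
        raise ValueError(f"unsupported op: {op!r}")
    return catalog

catalog = {}
apply_admin_op(catalog, {
    "op": "upsert_labels",
    "items": [
        {"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"},
        {"id": 101, "name": "airplane", "prompt": "a photo of an airplane"},
    ],
})
apply_admin_op(catalog, {"op": "remove_labels", "ids": [100]})
```

After both calls, only label `101` remains in the catalog.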

### Label Deduplication
- Label names are deduplicated automatically, ignoring case
- Variants such as "cat", "Cat", and "CAT" are treated as the same label
- Labels are also deduplicated by ID for consistent label management
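
One way to implement both rules is a single pass that tracks seen IDs and case-folded names. This is a sketch under stated assumptions: `dedupe_labels` is a hypothetical name, and keeping the first occurrence (rather than the last) is a choice made for the example, since the README does not say which entry wins:

```python
def dedupe_labels(items):
    """Drop items whose ID or case-folded name has already been seen."""
    seen_ids = set()
    seen_names = set()
    unique = []
    for item in items:
        key = item["name"].strip().casefold()
        if item["id"] in seen_ids or key in seen_names:
            continue  # first occurrence wins (an assumption of this sketch)
        seen_ids.add(item["id"])
        seen_names.add(key)
        unique.append(item)
    return unique

labels = dedupe_labels([
    {"id": 1, "name": "cat"},
    {"id": 2, "name": "Cat"},
    {"id": 3, "name": "CAT"},
    {"id": 4, "name": "dog"},
    {"id": 4, "name": "puppy"},  # duplicate ID, dropped
])
```

`str.casefold()` is used instead of `lower()` because it handles more non-English case pairs.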

## 📦 Hub Integration

When configured with `HF_LABEL_REPO` and tokens, the system automatically:

1. **Saves Snapshots**: Each label update creates versioned snapshots
   - `snapshots/v{N}/embeddings.safetensors`: Pre-computed text embeddings
   - `snapshots/v{N}/meta.json`: Label metadata and model info
   - `snapshots/latest.json`: Points to current version

2. **Loads on Startup**: Fetches the latest snapshot or a specified version
3. **Fallback**: Uses the local `items.json` if the Hub is unavailable
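
The snapshot layout above can be expressed as a pair of small helpers. The paths follow the list; the schema of `latest.json` is not documented here, so the `{"version": N}` shape below is purely an assumption, as are the helper names:

```python
import json

LATEST_POINTER = "snapshots/latest.json"

def snapshot_paths(version):
    """Return the repo-relative file paths for one snapshot version."""
    base = f"snapshots/v{version}"
    return {
        "embeddings": f"{base}/embeddings.safetensors",
        "meta": f"{base}/meta.json",
    }

def resolve_version(latest_json_text, requested=None):
    """Pick the requested version, else the one named by the latest pointer.

    Assumes latest.json looks like {"version": N}; the real schema may differ.
    """
    if requested is not None:
        return requested
    return json.loads(latest_json_text)["version"]

version = resolve_version('{"version": 5}')
paths = snapshot_paths(version)
```

A loader built on these helpers would download the two files for the resolved version and fall back to `items.json` if either fetch fails.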

## 🎨 Default Label Catalog

The bundled `items.json` includes 50+ kid-friendly objects with:
- Unique IDs and display names
- CLIP-optimized prompts
- Category metadata
- Fun facts and rarity ratings

Categories include animals, toys, food, vehicles, nature, and everyday objects.

## ⚑ Performance Optimization

- **GPU Acceleration**: Automatic CUDA detection with float16 inference
- **CPU Fallback**: Graceful degradation with float32 precision
- **Embedding Cache**: Pre-computed text embeddings updated on label changes
- **Re-parameterization**: MobileOne blocks optimized for inference speed
- **Batch Processing**: Efficient matrix operations for multi-label scoring

## 🔐 Security Considerations

- **Token Protection**: Admin operations require `ADMIN_TOKEN`
- **Private Datasets**: Keep label repos private for sensitive applications
- **Input Validation**: Automatic sanitization of uploaded images
- **Memory Management**: Images processed and discarded after inference
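
For the token check itself, a constant-time comparison is a common precaution. The sketch below uses the standard library's `hmac.compare_digest`; `is_admin` is a hypothetical helper, and whether the app compares tokens this way is not stated in this README:

```python
import hmac

def is_admin(supplied_token, admin_token):
    """Return True only when an admin token is configured and matches."""
    if not admin_token or not supplied_token:
        return False  # admin operations stay disabled without a configured token
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied_token.encode(), admin_token.encode())

allowed = is_admin("s3cret", "s3cret")
denied = is_admin("guess", "s3cret")
unconfigured = is_admin("anything", None)
```

Rejecting every request when `ADMIN_TOKEN` is unset keeps a misconfigured Space from exposing label management by accident.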

## 📄 License

- **Model Weights**: Apple Sample Code License (ASCL)
- **Interface Code**: MIT License

## 🀝 Contributing

Contributions welcome! Areas for improvement:
- Additional label management features
- Performance optimizations
- Extended API capabilities
- Multi-language support

## 📚 Resources

- [MobileCLIP Paper](https://arxiv.org/abs/2311.17049)
- [OpenCLIP Library](https://github.com/mlfoundations/open_clip)
- [Gradio Documentation](https://gradio.app/docs)
- [Hugging Face Spaces](https://huggingface.co/spaces)