---
title: MobileCLIP Image Classifier
emoji: 📸
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---
# MobileCLIP-B Image Classifier
Zero-shot image classification powered by Apple's MobileCLIP-B model and served through an interactive Gradio web interface. The app classifies images in real time against a set of text labels that admins can update at runtime, with optional label persistence on the Hugging Face Hub.
## Key Features
### Core Capabilities
- **Zero-Shot Classification**: Upload any image for instant classification; no model retraining needed
- **Dynamic Label Management**: Add, remove, and update classification labels on the fly
- **Interactive Results**: Visual confidence scores in sortable data tables
- **Optimized Performance**: Sub-30 ms inference on GPU with re-parameterized MobileOne blocks
- **Secure Admin Panel**: Token-protected label management interface
- **Hub Persistence**: Optional versioned label storage on the Hugging Face Hub
### API Access
- **REST API**: Fully accessible via Gradio's automatic API endpoints
- **Base64 Support**: Direct base64 image input for backend integration
- **Batch Processing**: Efficient handling of multiple classification requests
## Architecture
### Components
- **`app.py`**: Main Gradio interface with public/admin tabs and API endpoints
- **`handler.py`**: Core model management, inference logic, and label operations
- **`reparam.py`**: MobileOne re-parameterization for optimized inference
- **`items.json`**: Default label catalog with metadata
### Model Details
- **Architecture**: MobileCLIP-B with re-parameterized MobileOne image encoder
- **Text Encoder**: Optimized CLIP text transformer
- **Embedding Cache**: Pre-computed text embeddings for fast inference
- **Device Support**: Automatic GPU/CPU detection with float16 optimization
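
To make the embedding cache concrete, here is a minimal NumPy sketch of zero-shot scoring, assuming the image and text encoders have already produced embeddings. The function names and the `logit_scale=100.0` default (a typical CLIP-style temperature) are illustrative assumptions, not taken from `handler.py`.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def classify(image_emb, text_cache, labels, top_k=10, logit_scale=100.0):
    """Rank labels for one image against cached text embeddings.

    image_emb:   (D,) embedding from the image encoder
    text_cache:  (N, D) label embeddings, recomputed only when labels change
    logit_scale: assumed CLIP-style temperature (hypothetical default)
    """
    img = l2_normalize(np.asarray(image_emb, dtype=np.float32)[None, :])  # (1, D)
    txt = l2_normalize(np.asarray(text_cache, dtype=np.float32))          # (N, D)
    logits = logit_scale * (img @ txt.T)[0]  # one matrix product scores every label
    probs = np.exp(logits - logits.max())    # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(-probs)[:top_k]       # highest confidence first
    return [(labels[i], float(probs[i])) for i in order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = ["cat", "dog", "bicycle"]
    cache = rng.normal(size=(len(labels), 512))
    query = cache[1] + 0.01 * rng.normal(size=512)  # an "image" embedding close to "dog"
    print(classify(query, cache, labels, top_k=2))
```

Because the text side is cached, each request costs one image-encoder pass plus a single `(1, D) @ (D, N)` product, regardless of how the label set was built.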
## Quick Start
### Environment Variables
Configure these in your Space under Settings → Variables and secrets:
| Variable | Description | Required |
|----------|-------------|----------|
| `ADMIN_TOKEN` | Secret token for admin operations | Yes, for admin operations |
| `HF_LABEL_REPO` | Hub dataset for label storage (e.g., `user/labels`) | No |
| `HF_WRITE_TOKEN` | Token with write access to the dataset repo | No |
| `HF_READ_TOKEN` | Token with read access (defaults to `HF_WRITE_TOKEN`) | No |
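
A minimal sketch of how these variables might be read at startup, including the documented fallback of `HF_READ_TOKEN` to the write token. `SpaceConfig`, `load_config`, and the `can_load`/`can_save` rules are hypothetical illustrations, not the app's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class SpaceConfig:
    admin_token: Optional[str]
    label_repo: Optional[str]
    write_token: Optional[str]
    read_token: Optional[str]

    @property
    def can_load(self) -> bool:
        # Loading snapshots needs a repo and a token that can read it (assumed rule)
        return bool(self.label_repo and self.read_token)

    @property
    def can_save(self) -> bool:
        # Saving snapshots needs a repo and a token that can write to it (assumed rule)
        return bool(self.label_repo and self.write_token)

def load_config(env: Mapping[str, str] = os.environ) -> SpaceConfig:
    def get(name: str) -> Optional[str]:
        value = env.get(name, "").strip()
        return value or None  # treat unset and empty variables the same way

    write_token = get("HF_WRITE_TOKEN")
    return SpaceConfig(
        admin_token=get("ADMIN_TOKEN"),
        label_repo=get("HF_LABEL_REPO"),
        write_token=write_token,
        read_token=get("HF_READ_TOKEN") or write_token,  # documented fallback
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test.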
### Usage Examples
#### Web Interface
1. Navigate to the Space URL
2. Upload an image in the Classification tab
3. Set how many top results to show (top-k, default: 10)
4. View ranked predictions with confidence scores
#### API Usage
**Standard Classification:**
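```python
import requests

# Replace YOUR_SPACE_URL with your Space's base URL
with open("photo.jpg", "rb") as f:
    response = requests.post(
        "YOUR_SPACE_URL/api/classify_image",
        files={"image": f},
        data={"top_k": 5},
        timeout=30,
    )
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
results = response.json()
```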
**Base64 Input:**
```python
import base64
import requests

with open("photo.jpg", "rb") as f:
    # Encode the raw file bytes as an ASCII base64 string
    img_base64 = base64.b64encode(f.read()).decode("ascii")

response = requests.post(
    "YOUR_SPACE_URL/api/classify_base64",
    json={"image": img_base64, "top_k": 10},
    timeout=30,
)
response.raise_for_status()
results = response.json()
```
## Admin Operations
### Label Management
Authenticated admins can perform the following operations:
#### Add or Update Labels
```json
{
  "op": "upsert_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "items": [
    {"id": 100, "name": "bicycle", "prompt": "a photo of a bicycle"},
    {"id": 101, "name": "airplane", "prompt": "a photo of an airplane"}
  ]
}
```
#### Reload Specific Version
```json
{
  "op": "reload_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "version": 5
}
```
#### Remove Labels
```json
{
  "op": "remove_labels",
  "token": "YOUR_ADMIN_TOKEN",
  "ids": [100, 101]
}
```
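
All three payloads share an `op`/`token` envelope. The sketch below shows one way a server could route them, assuming the token is checked against `ADMIN_TOKEN` with a constant-time comparison (`hmac.compare_digest`). `dispatch`, `AdminError`, and the in-memory handlers are hypothetical; `reload_labels` would be registered the same way once Hub loading is wired in.

```python
import hmac
from typing import Any, Callable, Dict, List, Optional

Payload = Dict[str, Any]
Handler = Callable[[Payload], Any]

class AdminError(Exception):
    """Raised for rejected admin requests (bad token or unknown op)."""

def check_token(supplied: Any, expected: Optional[str]) -> None:
    if not expected:
        # No ADMIN_TOKEN configured: keep admin operations disabled
        raise AdminError("admin operations are disabled")
    # Constant-time comparison so response timing does not leak the token
    if not isinstance(supplied, str) or not hmac.compare_digest(
        supplied.encode("utf-8"), expected.encode("utf-8")
    ):
        raise AdminError("invalid token")

def dispatch(payload: Payload, expected_token: Optional[str], handlers: Dict[str, Handler]) -> Any:
    check_token(payload.get("token"), expected_token)  # authenticate before anything else
    op = payload.get("op")
    if not isinstance(op, str) or op not in handlers:
        raise AdminError(f"unknown op: {op!r}")
    return handlers[op](payload)

def make_label_handlers(store: Dict[int, Payload]) -> Dict[str, Handler]:
    """Hypothetical in-memory handlers for two of the ops above."""
    def upsert(p: Payload) -> List[int]:
        for item in p["items"]:
            store[item["id"]] = item  # insert new ids, overwrite existing ones
        return sorted(store)

    def remove(p: Payload) -> List[int]:
        for label_id in p["ids"]:
            store.pop(label_id, None)  # ignore ids that are already gone
        return sorted(store)

    return {"upsert_labels": upsert, "remove_labels": remove}
```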
### Label Deduplication
- Automatic case-insensitive name deduplication
- Prevents duplicate entries (e.g., "cat", "Cat", "CAT" treated as same)
- ID-based deduplication for consistent label management
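
The README does not say what happens when an incoming label collides with an existing one. One plausible policy, sketched below under that assumption: an item with an existing ID replaces it (upsert), while an item whose normalized name is already owned by a different ID is skipped and reported. `merge_labels` and `normalize_name` are hypothetical names.

```python
from typing import Any, Dict, Iterable, List, Tuple

Label = Dict[str, Any]

def normalize_name(name: str) -> str:
    # "cat", "Cat", " CAT " -> "cat"; casefold() also covers non-ASCII case rules
    return " ".join(name.split()).casefold()

def merge_labels(existing: Iterable[Label], incoming: Iterable[Label]) -> Tuple[List[Label], List[Label]]:
    """Merge incoming labels into an existing, already-deduplicated list.

    Hypothetical policy:
      - same id -> the incoming item replaces the stored one (upsert)
      - same normalized name under a different id -> the incoming item is skipped
    Returns (merged labels, skipped items).
    """
    by_id: Dict[Any, Label] = {}
    name_owner: Dict[str, Any] = {}
    for item in existing:
        by_id[item["id"]] = item
        name_owner[normalize_name(item["name"])] = item["id"]

    skipped: List[Label] = []
    for item in incoming:
        key = normalize_name(item["name"])
        owner = name_owner.get(key)
        if owner is not None and owner != item["id"]:
            skipped.append(item)  # name already used by another id
            continue
        previous = by_id.get(item["id"])
        if previous is not None:
            # Renaming an id frees its old name for other labels
            name_owner.pop(normalize_name(previous["name"]), None)
        by_id[item["id"]] = item
        name_owner[key] = item["id"]
    return list(by_id.values()), skipped
```

Incoming items are processed in order, so a rename earlier in a batch can free a name for a later item.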
## Hub Integration
When configured with `HF_LABEL_REPO` and tokens, the system automatically:
1. **Saves Snapshots**: Each label update creates a versioned snapshot:
   - `snapshots/v{N}/embeddings.safetensors`: Pre-computed text embeddings
   - `snapshots/v{N}/meta.json`: Label metadata and model info
   - `snapshots/latest.json`: Points to the current version
2. **Loads on Startup**: Fetches the latest snapshot or a specified version
3. **Falls Back**: Uses the local `items.json` if the Hub is unavailable
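
The versioning scheme can be sketched against a local directory standing in for the dataset repo. This sketch assumes `latest.json` holds `{"version": N}` and skips writing `embeddings.safetensors`; on the Hub, the same files could be pushed with `huggingface_hub`'s `HfApi.upload_folder` and fetched with `hf_hub_download`. All function names here are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

LATEST = "snapshots/latest.json"

def snapshot_paths(version: int) -> Dict[str, str]:
    """Repo-relative paths for one snapshot, following the documented layout."""
    base = f"snapshots/v{version}"
    return {"embeddings": f"{base}/embeddings.safetensors", "meta": f"{base}/meta.json"}

def resolve_version(root: Path, requested: Optional[int] = None) -> Optional[int]:
    """Pick the snapshot to load: an explicit version, else the one latest.json names."""
    if requested is not None:
        return requested
    latest = root / LATEST
    if not latest.exists():
        return None  # no snapshot yet: the caller falls back to items.json
    return int(json.loads(latest.read_text())["version"])  # assumed schema {"version": N}

def save_snapshot(root: Path, meta: Dict[str, Any]) -> int:
    """Write the next version's meta.json, then repoint latest.json (embeddings omitted)."""
    current = resolve_version(root)
    version = 1 if current is None else current + 1
    meta_path = root / snapshot_paths(version)["meta"]
    meta_path.parent.mkdir(parents=True, exist_ok=True)
    meta_path.write_text(json.dumps({**meta, "version": version}, indent=2))
    # Update the pointer last, so a crash mid-save never points at a missing snapshot
    (root / LATEST).write_text(json.dumps({"version": version}))
    return version
```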
## Default Label Catalog
The bundled `items.json` includes 50+ kid-friendly objects with:
- Unique IDs and display names
- CLIP-optimized prompts
- Category metadata
- Fun facts and rarity ratings
Categories include animals, toys, food, vehicles, nature, and everyday objects.
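
A sketch of one catalog record as a dataclass, assuming `items.json` is a top-level JSON array. `id`, `name`, and `prompt` mirror the admin payloads above; `category`, `fun_fact`, and `rarity` are guessed key names for the metadata listed here and should be checked against the bundled file.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass(frozen=True)
class CatalogItem:
    id: int
    name: str
    prompt: str  # text fed to the CLIP text encoder
    # Assumed key names and types for the extra metadata; check them against items.json
    category: Optional[str] = None
    fun_fact: Optional[str] = None
    rarity: Optional[Any] = None

def load_catalog(text: str) -> List[CatalogItem]:
    """Parse a catalog assumed to be a top-level JSON array of objects."""
    items = [CatalogItem(**entry) for entry in json.loads(text)]  # unknown keys raise TypeError
    ids = [item.id for item in items]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate ids in catalog")
    return items
```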
## Performance Optimization
- **GPU Acceleration**: Automatic CUDA detection with float16 inference
- **CPU Fallback**: Graceful degradation with float32 precision
- **Embedding Cache**: Pre-computed text embeddings updated on label changes
- **Re-parameterization**: MobileOne blocks optimized for inference speed
- **Batch Processing**: Efficient matrix operations for multi-label scoring
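
Re-parameterization works because BatchNorm is an affine map at inference time: each train-time branch `BN(W x)` folds into a single weight and bias, and parallel branches then sum into one layer. The NumPy sketch below shows that algebra on dense layers; MobileOne applies the same idea to convolutions (1x1 and identity branches are expressed as kxk kernels before summing). It illustrates the technique and is not the code in `reparam.py`.

```python
import numpy as np

def fold_bn(weight, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm(weight @ x) into weight' @ x + bias'.

    BN(z) = gamma * (z - mean) / sqrt(var + eps) + beta, per output channel.
    """
    scale = gamma / np.sqrt(var + eps)  # per-output-channel multiplier
    return weight * scale[:, None], beta - mean * scale

def reparameterize(branches):
    """Merge parallel (weight, gamma, beta, mean, var) branches into one layer.

    sum_i BN_i(W_i @ x) == (sum_i W_i') @ x + sum_i b_i'
    """
    folded = [fold_bn(*branch) for branch in branches]
    weight = sum(w for w, _ in folded)
    bias = sum(b for _, b in folded)
    return weight, bias

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c = 4

    def random_bn():
        # gamma, beta, running mean, running variance (variance must be positive)
        return rng.normal(size=c), rng.normal(size=c), rng.normal(size=c), rng.uniform(0.5, 2.0, size=c)

    # A dense branch plus a BN-only skip branch (identity weight)
    branches = [(rng.normal(size=(c, c)), *random_bn()), (np.eye(c), *random_bn())]
    x = rng.normal(size=c)
    train_time = sum(g * (w @ x - m) / np.sqrt(v + 1e-5) + b for w, g, b, m, v in branches)
    w, b = reparameterize(branches)
    print(np.allclose(w @ x + b, train_time))  # True
```

After merging, inference runs one layer instead of several branches, which is where the speedup comes from.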
## Security Considerations
- **Token Protection**: Admin operations require `ADMIN_TOKEN`
- **Private Datasets**: Keep label repos private for sensitive applications
- **Input Validation**: Automatic sanitization of uploaded images
- **Memory Management**: Images processed and discarded after inference
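
As an illustration of what input validation can mean for the base64 endpoint, here is a sketch that accepts raw base64 or a data URL, rejects non-base64 and oversized input, and checks the file signature. The 10 MB cap, the PNG/JPEG allow-list, and `decode_image_b64` are assumptions; a real server would typically also open the bytes with Pillow (for example `Image.open(...).verify()`) before inference.

```python
import base64
import binascii
from typing import Tuple

MAX_BYTES = 10 * 1024 * 1024  # hypothetical 10 MB limit

# File signatures ("magic bytes") of the formats this sketch accepts
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}

def decode_image_b64(payload: str, max_bytes: int = MAX_BYTES) -> Tuple[bytes, str]:
    """Decode a base64 image and return (raw bytes, detected format)."""
    if payload.startswith("data:"):
        # Strip a data-URL header such as "data:image/png;base64,"
        payload = payload.partition(",")[2]
    # Every 4 base64 characters encode 3 bytes: reject oversize input before decoding
    if len(payload) > 4 * ((max_bytes + 2) // 3):
        raise ValueError("image too large")
    try:
        raw = base64.b64decode(payload, validate=True)  # reject non-alphabet characters
    except binascii.Error as exc:
        raise ValueError("invalid base64") from exc
    if len(raw) > max_bytes:
        raise ValueError("image too large")
    for signature, fmt in SIGNATURES.items():
        if raw.startswith(signature):
            return raw, fmt
    raise ValueError("unsupported image format")
```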
## License
- **Model Weights**: Apple Sample Code License (ASCL)
- **Interface Code**: MIT License
## Contributing
Contributions welcome! Areas for improvement:
- Additional label management features
- Performance optimizations
- Extended API capabilities
- Multi-language support
## Resources
- [MobileCLIP Paper](https://arxiv.org/abs/2311.17049)
- [OpenCLIP Library](https://github.com/mlfoundations/open_clip)
- [Gradio Documentation](https://gradio.app/docs)
- [Hugging Face Spaces](https://huggingface.co/spaces)