# Phi-3.5-MultiCap-ref

This model is a fine-tuned version of microsoft/Phi-3.5-mini-instruct on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.6048 (final validation loss at step 420; see the table below)
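The framework versions below include PEFT, so this checkpoint is presumably a parameter-efficient adapter on top of the base model rather than a full set of weights. A minimal loading sketch under that assumption; the adapter repo id `your-org/Phi-3.5-MultiCap-ref` and the prompt are placeholders, not confirmed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "microsoft/Phi-3.5-mini-instruct"
adapter_id = "your-org/Phi-3.5-MultiCap-ref"  # placeholder: replace with the actual adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

# Phi-3.5-mini-instruct is a chat model, so format the prompt with its chat template.
messages = [{"role": "user", "content": "Summarize what this model was fine-tuned for."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```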
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 2
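
A sketch of how the values above map onto `transformers.TrainingArguments`, assuming a single training device; `output_dir` and anything not listed are assumptions, and the PEFT/LoRA-specific settings (rank, target modules, etc.) are not recorded in this card:

```python
from transformers import TrainingArguments

# Sketch only: output_dir is a placeholder and unlisted options keep their defaults.
training_args = TrainingArguments(
    output_dir="Phi-3.5-MultiCap-ref",  # assumption, not stated in the card
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=8,      # 16 per device x 8 steps = 128 total train batch size
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    num_train_epochs=2,
    adam_beta1=0.9,                     # Adam with betas=(0.9, 0.999) and epsilon=1e-08
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```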
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.2416        | 0.1354 | 30   | 1.2242          |
| 0.8312        | 0.2707 | 60   | 0.8171          |
| 0.7014        | 0.4061 | 90   | 0.7067          |
| 0.71          | 0.5415 | 120  | 0.6667          |
| 0.6607        | 0.6768 | 150  | 0.6454          |
| 0.6485        | 0.8122 | 180  | 0.6327          |
| 0.6682        | 0.9475 | 210  | 0.6245          |
| 0.6021        | 1.0829 | 240  | 0.6188          |
| 0.6385        | 1.2183 | 270  | 0.6147          |
| 0.595         | 1.3536 | 300  | 0.6110          |
| 0.6039        | 1.4890 | 330  | 0.6087          |
| 0.6286        | 1.6244 | 360  | 0.6068          |
| 0.6249        | 1.7597 | 390  | 0.6055          |
| 0.5812        | 1.8951 | 420  | 0.6048          |
### Framework versions
- PEFT 0.12.0
- Transformers 4.44.2
- Pytorch 2.4.0+cu124
- Datasets 2.21.0
- Tokenizers 0.19.1