# SLlamica_PT4SFT_v2
This model is a fine-tuned version of zidsi/SLlamica_PT4SFT_v1 on a mix of SFT datasets. It achieves the following results on the evaluation set:
- Loss: 0.6350
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 8
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0
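The batch-size values above follow the usual data-parallel arithmetic (per-device batch × number of devices × gradient-accumulation steps; accumulation of 1 is inferred here, since 1 × 8 = 8), and the learning-rate settings describe a linear warmup followed by cosine decay. The sketch below illustrates both relationships in plain Python; it is an illustration of the schedule these settings imply, not the actual trainer code.

```python
import math

# Hyperparameters copied from the list above.
hp = {
    "learning_rate": 1e-06,
    "train_batch_size": 1,              # per device
    "num_devices": 8,
    "gradient_accumulation_steps": 1,   # inferred: total_train_batch_size = 1 * 8 * 1
    "warmup_ratio": 0.05,
}

# Effective (total) train batch size.
total_train_batch_size = (
    hp["train_batch_size"] * hp["num_devices"] * hp["gradient_accumulation_steps"]
)

def cosine_lr_with_warmup(step, total_steps, peak_lr, warmup_ratio):
    """Linear warmup to peak_lr, then cosine decay toward 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```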
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.6724 | 0.0434 | 1024 | 0.6314 |
| 0.4483 | 0.0868 | 2048 | 0.6381 |
| 0.5805 | 0.1302 | 3072 | 0.6420 |
| 0.5451 | 0.1736 | 4096 | 0.6455 |
| 0.5763 | 0.2171 | 5120 | 0.6472 |
| 0.2692 | 0.2605 | 6144 | 0.6485 |
| 0.7263 | 0.3039 | 7168 | 0.6454 |
| 0.583 | 0.3473 | 8192 | 0.6455 |
| 0.675 | 0.3907 | 9216 | 0.6441 |
| 0.4858 | 0.4341 | 10240 | 0.6441 |
| 0.4009 | 0.4775 | 11264 | 0.6434 |
| 0.3445 | 0.5209 | 12288 | 0.6409 |
| 0.4589 | 0.5644 | 13312 | 0.6390 |
| 0.4756 | 0.6078 | 14336 | 0.6369 |
| 0.7467 | 0.6512 | 15360 | 0.6335 |
| 0.3388 | 0.6946 | 16384 | 0.6305 |
| 0.5442 | 0.7380 | 17408 | 0.6297 |
| 0.4466 | 0.7814 | 18432 | 0.6263 |
| 0.8979 | 0.8248 | 19456 | 0.6243 |
| 0.5545 | 0.8682 | 20480 | 0.6229 |
| 0.4956 | 0.9116 | 21504 | 0.6211 |
| 0.4685 | 0.9551 | 22528 | 0.6193 |
| 0.5568 | 0.9985 | 23552 | 0.6179 |
| 0.5842 | 1.0419 | 24576 | 0.6303 |
| 0.5023 | 1.0853 | 25600 | 0.6318 |
| 0.75 | 1.1287 | 26624 | 0.6315 |
| 0.4927 | 1.1721 | 27648 | 0.6309 |
| 0.6002 | 1.2155 | 28672 | 0.6312 |
| 0.6534 | 1.2589 | 29696 | 0.6311 |
| 0.4356 | 1.3024 | 30720 | 0.6301 |
| 0.2869 | 1.3458 | 31744 | 0.6301 |
| 0.5718 | 1.3892 | 32768 | 0.6304 |
| 0.7933 | 1.4326 | 33792 | 0.6282 |
| 0.5082 | 1.4760 | 34816 | 0.6285 |
| 0.4851 | 1.5194 | 35840 | 0.6280 |
| 0.4756 | 1.5628 | 36864 | 0.6282 |
| 0.437 | 1.6062 | 37888 | 0.6266 |
| 0.4588 | 1.6497 | 38912 | 0.6266 |
| 0.3814 | 1.6931 | 39936 | 0.6259 |
| 0.581 | 1.7365 | 40960 | 0.6257 |
| 0.6018 | 1.7799 | 41984 | 0.6252 |
| 0.5189 | 1.8233 | 43008 | 0.6252 |
| 0.3318 | 1.8667 | 44032 | 0.6246 |
| 0.6071 | 1.9101 | 45056 | 0.6245 |
| 0.5319 | 1.9535 | 46080 | 0.6232 |
| 0.7135 | 1.9969 | 47104 | 0.6230 |
| 0.3897 | 2.0404 | 48128 | 0.6333 |
| 0.6878 | 2.0838 | 49152 | 0.6359 |
| 0.3358 | 2.1272 | 50176 | 0.6354 |
| 0.6161 | 2.1706 | 51200 | 0.6355 |
| 0.3781 | 2.2140 | 52224 | 0.6362 |
| 0.2301 | 2.2574 | 53248 | 0.6354 |
| 0.2913 | 2.3008 | 54272 | 0.6355 |
| 0.3434 | 2.3442 | 55296 | 0.6351 |
| 0.3801 | 2.3877 | 56320 | 0.6351 |
| 0.4559 | 2.4311 | 57344 | 0.6352 |
| 0.3845 | 2.4745 | 58368 | 0.6350 |
| 0.3882 | 2.5179 | 59392 | 0.6352 |
| 0.5683 | 2.5613 | 60416 | 0.6353 |
| 0.7223 | 2.6047 | 61440 | 0.6348 |
| 0.6721 | 2.6481 | 62464 | 0.6346 |
| 0.3656 | 2.6915 | 63488 | 0.6348 |
| 0.441 | 2.7349 | 64512 | 0.6350 |
| 0.4434 | 2.7784 | 65536 | 0.6352 |
| 0.3688 | 2.8218 | 66560 | 0.6350 |
| 0.3774 | 2.8652 | 67584 | 0.6350 |
| 0.4117 | 2.9086 | 68608 | 0.6350 |
| 0.2889 | 2.9520 | 69632 | 0.6350 |
| 0.7988 | 2.9954 | 70656 | 0.6350 |
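Note that the reported final loss (0.6350) comes from the last checkpoint, while the lowest validation loss in the table is 0.6179, reached at step 23552 near the end of the first epoch. A minimal sketch for locating the best checkpoint from rows like the ones above (only a few rows are reproduced here):

```python
# A few (step, epoch, validation_loss) rows copied from the table above.
rows = [
    (1024, 0.0434, 0.6314),
    (23552, 0.9985, 0.6179),   # lowest validation loss, end of epoch 1
    (47104, 1.9969, 0.6230),   # end of epoch 2
    (70656, 2.9954, 0.6350),   # final checkpoint
]

# Pick the checkpoint with the lowest validation loss.
best_step, best_epoch, best_loss = min(rows, key=lambda r: r[2])
```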
### Eval results
| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| sl_arc_challenge | 1 | none | 0 | acc ↑ | 0.2705 | ± 0.0130 |
| | | none | 0 | acc_norm ↑ | 0.2944 | ± 0.0133 |
| sl_arc_easy | 1 | none | 0 | acc ↑ | 0.4785 | ± 0.0103 |
| | | none | 0 | acc_norm ↑ | 0.4465 | ± 0.0102 |
| sl_boolq | 2 | none | 0 | acc ↑ | 0.5508 | ± 0.0087 |
| sl_hellaswag | 1 | none | 0 | acc ↑ | 0.3575 | ± 0.0048 |
| | | none | 0 | acc_norm ↑ | 0.4460 | ± 0.0050 |
| sl_nq_open | 4 | remove_whitespace | 0 | exact_match ↑ | 0.0033 | ± 0.0010 |
| sl_openbookqa | 1 | none | 0 | acc ↑ | 0.2240 | ± 0.0187 |
| | | none | 0 | acc_norm ↑ | 0.3760 | ± 0.0217 |
| sl_piqa | 1 | none | 0 | acc ↑ | 0.6279 | ± 0.0113 |
| | | none | 0 | acc_norm ↑ | 0.6257 | ± 0.0113 |
| sl_triviaqa | 3 | remove_whitespace | 0 | exact_match ↑ | 0.0176 | ± 0.0010 |
| sl_winogrande | 1 | none | 0 | acc ↑ | 0.5572 | ± 0.0140 |
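The Stderr column is the standard error of each estimate; under a normal approximation, a 95% confidence interval is value ± 1.96 · stderr. A quick sketch, using the sl_boolq row from the table above as the example:

```python
def confint95(value, stderr):
    """Normal-approximation 95% confidence interval for an accuracy estimate."""
    half = 1.96 * stderr
    return (value - half, value + half)

# sl_boolq: acc = 0.5508 with stderr 0.0087 (from the table above)
lo, hi = confint95(0.5508, 0.0087)
```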
### Framework versions
- Transformers 4.45.0
- PyTorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.20.0