
SLlamica_PT4SFT_v2

This model is a fine-tuned version of zidsi/SLlamica_PT4SFT_v1, trained on a mix of SFT datasets. It achieves the following results on the evaluation set:

  • Loss: 0.6350
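
Since the card does not yet document usage, here is a minimal loading sketch. It assumes the repository is a standard Transformers causal-LM checkpoint; the prompt or chat formatting used during SFT is not documented here, so plain-text prompting is shown, and the Slovenian example prompt is illustrative only:

```python
# Minimal sketch: load the checkpoint as a standard causal LM.
# The repo is gated, so you may need to authenticate first
# (e.g. `huggingface-cli login`) before downloading.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zidsi/SLlamica_PT4SFT_v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Illustrative Slovenian prompt; the SFT prompt template is not published.
inputs = tokenizer("Glavno mesto Slovenije je", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```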

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding TrainingArguments follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 1
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 8
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 3.0
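
For reference, here is a hedged sketch of how these values map onto transformers.TrainingArguments. The actual training script, dataset mix, and multi-GPU launch configuration are not published, so this is an approximation rather than the authors' setup:

```python
# Approximate TrainingArguments matching the listed hyperparameters.
# Per-device batch sizes are the listed train/eval values; the totals
# (8 train / 16 eval) follow from running on 8 GPUs with no gradient
# accumulation. The output_dir name is hypothetical.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="SLlamica_PT4SFT_v2",
    learning_rate=1e-6,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=2,
    seed=42,
    num_train_epochs=3.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # the published weights are BF16
)
```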

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|---------------|--------|-------|-----------------|
| 0.6724        | 0.0434 | 1024  | 0.6314          |
| 0.4483        | 0.0868 | 2048  | 0.6381          |
| 0.5805        | 0.1302 | 3072  | 0.6420          |
| 0.5451        | 0.1736 | 4096  | 0.6455          |
| 0.5763        | 0.2171 | 5120  | 0.6472          |
| 0.2692        | 0.2605 | 6144  | 0.6485          |
| 0.7263        | 0.3039 | 7168  | 0.6454          |
| 0.583         | 0.3473 | 8192  | 0.6455          |
| 0.675         | 0.3907 | 9216  | 0.6441          |
| 0.4858        | 0.4341 | 10240 | 0.6441          |
| 0.4009        | 0.4775 | 11264 | 0.6434          |
| 0.3445        | 0.5209 | 12288 | 0.6409          |
| 0.4589        | 0.5644 | 13312 | 0.6390          |
| 0.4756        | 0.6078 | 14336 | 0.6369          |
| 0.7467        | 0.6512 | 15360 | 0.6335          |
| 0.3388        | 0.6946 | 16384 | 0.6305          |
| 0.5442        | 0.7380 | 17408 | 0.6297          |
| 0.4466        | 0.7814 | 18432 | 0.6263          |
| 0.8979        | 0.8248 | 19456 | 0.6243          |
| 0.5545        | 0.8682 | 20480 | 0.6229          |
| 0.4956        | 0.9116 | 21504 | 0.6211          |
| 0.4685        | 0.9551 | 22528 | 0.6193          |
| 0.5568        | 0.9985 | 23552 | 0.6179          |
| 0.5842        | 1.0419 | 24576 | 0.6303          |
| 0.5023        | 1.0853 | 25600 | 0.6318          |
| 0.75          | 1.1287 | 26624 | 0.6315          |
| 0.4927        | 1.1721 | 27648 | 0.6309          |
| 0.6002        | 1.2155 | 28672 | 0.6312          |
| 0.6534        | 1.2589 | 29696 | 0.6311          |
| 0.4356        | 1.3024 | 30720 | 0.6301          |
| 0.2869        | 1.3458 | 31744 | 0.6301          |
| 0.5718        | 1.3892 | 32768 | 0.6304          |
| 0.7933        | 1.4326 | 33792 | 0.6282          |
| 0.5082        | 1.4760 | 34816 | 0.6285          |
| 0.4851        | 1.5194 | 35840 | 0.6280          |
| 0.4756        | 1.5628 | 36864 | 0.6282          |
| 0.437         | 1.6062 | 37888 | 0.6266          |
| 0.4588        | 1.6497 | 38912 | 0.6266          |
| 0.3814        | 1.6931 | 39936 | 0.6259          |
| 0.581         | 1.7365 | 40960 | 0.6257          |
| 0.6018        | 1.7799 | 41984 | 0.6252          |
| 0.5189        | 1.8233 | 43008 | 0.6252          |
| 0.3318        | 1.8667 | 44032 | 0.6246          |
| 0.6071        | 1.9101 | 45056 | 0.6245          |
| 0.5319        | 1.9535 | 46080 | 0.6232          |
| 0.7135        | 1.9969 | 47104 | 0.6230          |
| 0.3897        | 2.0404 | 48128 | 0.6333          |
| 0.6878        | 2.0838 | 49152 | 0.6359          |
| 0.3358        | 2.1272 | 50176 | 0.6354          |
| 0.6161        | 2.1706 | 51200 | 0.6355          |
| 0.3781        | 2.2140 | 52224 | 0.6362          |
| 0.2301        | 2.2574 | 53248 | 0.6354          |
| 0.2913        | 2.3008 | 54272 | 0.6355          |
| 0.3434        | 2.3442 | 55296 | 0.6351          |
| 0.3801        | 2.3877 | 56320 | 0.6351          |
| 0.4559        | 2.4311 | 57344 | 0.6352          |
| 0.3845        | 2.4745 | 58368 | 0.6350          |
| 0.3882        | 2.5179 | 59392 | 0.6352          |
| 0.5683        | 2.5613 | 60416 | 0.6353          |
| 0.7223        | 2.6047 | 61440 | 0.6348          |
| 0.6721        | 2.6481 | 62464 | 0.6346          |
| 0.3656        | 2.6915 | 63488 | 0.6348          |
| 0.441         | 2.7349 | 64512 | 0.6350          |
| 0.4434        | 2.7784 | 65536 | 0.6352          |
| 0.3688        | 2.8218 | 66560 | 0.6350          |
| 0.3774        | 2.8652 | 67584 | 0.6350          |
| 0.4117        | 2.9086 | 68608 | 0.6350          |
| 0.2889        | 2.9520 | 69632 | 0.6350          |
| 0.7988        | 2.9954 | 70656 | 0.6350          |

Evaluation results

| Tasks            | Version | Filter            | n-shot | Metric      | Value  | Stderr   |
|------------------|---------|-------------------|--------|-------------|--------|----------|
| sl_arc_challenge | 1       | none              | 0      | acc         | 0.2705 | ± 0.0130 |
|                  |         | none              | 0      | acc_norm    | 0.2944 | ± 0.0133 |
| sl_arc_easy      | 1       | none              | 0      | acc         | 0.4785 | ± 0.0103 |
|                  |         | none              | 0      | acc_norm    | 0.4465 | ± 0.0102 |
| sl_boolq         | 2       | none              | 0      | acc         | 0.5508 | ± 0.0087 |
| sl_hellaswag     | 1       | none              | 0      | acc         | 0.3575 | ± 0.0048 |
|                  |         | none              | 0      | acc_norm    | 0.4460 | ± 0.0050 |
| sl_nq_open       | 4       | remove_whitespace | 0      | exact_match | 0.0033 | ± 0.0010 |
| sl_openbookqa    | 1       | none              | 0      | acc         | 0.2240 | ± 0.0187 |
|                  |         | none              | 0      | acc_norm    | 0.3760 | ± 0.0217 |
| sl_piqa          | 1       | none              | 0      | acc         | 0.6279 | ± 0.0113 |
|                  |         | none              | 0      | acc_norm    | 0.6257 | ± 0.0113 |
| sl_triviaqa      | 3       | remove_whitespace | 0      | exact_match | 0.0176 | ± 0.0010 |
| sl_winogrande    | 1       | none              | 0      | acc         | 0.5572 | ± 0.0140 |
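
The column layout (Tasks / Version / Filter / n-shot / Metric / Value / Stderr) matches the report format of EleutherAI's lm-evaluation-harness. A sketch of a comparable zero-shot run is below; the sl_* task definitions are assumed to be registered in the local harness install (they are not part of the default task set), so treat this as an illustration rather than the authors' exact invocation:

```python
# Sketch of a zero-shot evaluation with lm-evaluation-harness (v0.4.x API).
# Assumes the Slovenian sl_* tasks are available locally; they are not
# shipped with the default harness install.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=zidsi/SLlamica_PT4SFT_v2,dtype=bfloat16",
    tasks=["sl_arc_challenge", "sl_hellaswag", "sl_winogrande"],
    num_fewshot=0,
)
print(results["results"])
```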

Framework versions

  • Transformers 4.45.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.20.0