Baby-Llama-58M

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 4.9058
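
If the reported loss were mean per-token cross-entropy in nats, it would correspond to a perplexity of exp(4.9058) ≈ 135; the card does not state the loss definition, so treat this only as a rough reference.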

Model description

More information needed

Intended uses & limitations

More information needed
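
Pending fuller documentation, here is a minimal loading-and-generation sketch, assuming the checkpoint is a standard causal language model saved in transformers format (the repo id below is a placeholder, not the real path):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id: substitute the actual hub id or local directory of this checkpoint.
model_id = "path/to/Baby-Llama-58M"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

prompt = "The little llama"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```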

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 0.00025
  • train_batch_size: 128
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 80
  • mixed_precision_training: Native AMP
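
As referenced above, a minimal sketch of how these values map onto transformers' TrainingArguments. This is an illustration under stated assumptions, not the authors' actual training script: output_dir is a placeholder, and note that Trainer's default optimizer is AdamW rather than plain Adam (with the same betas/epsilon listed above):

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="baby-llama-58m",      # placeholder
    learning_rate=2.5e-4,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=80,
    fp16=True,  # "Native AMP" mixed precision (assuming fp16 rather than bf16)
)
```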

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---------------|-------|------|-----------------|
| 308.4964 | 1.0 | 3 | 274.9261 |
| 307.2173 | 2.0 | 6 | 270.1939 |
| 293.1988 | 3.0 | 9 | 254.5227 |
| 274.059 | 4.0 | 12 | 241.7988 |
| 254.2515 | 5.0 | 15 | 224.8893 |
| 242.4326 | 6.0 | 18 | 214.8814 |
| 235.586 | 7.0 | 21 | 208.6857 |
| 235.9312 | 8.0 | 24 | 202.9560 |
| 224.2102 | 9.0 | 27 | 196.3082 |
| 215.8342 | 10.0 | 30 | 188.9904 |
| 206.017 | 11.0 | 33 | 180.7418 |
| 186.8781 | 12.0 | 36 | 168.0520 |
| 172.4825 | 13.0 | 39 | 145.3422 |
| 152.0806 | 14.0 | 42 | 126.3429 |
| 127.6911 | 15.0 | 45 | 111.5025 |
| 114.9669 | 16.0 | 48 | 99.2848 |
| 105.7803 | 17.0 | 51 | 91.4366 |
| 96.6882 | 18.0 | 54 | 83.6074 |
| 85.8417 | 19.0 | 57 | 74.4550 |
| 74.8959 | 20.0 | 60 | 64.7636 |
| 65.7121 | 21.0 | 63 | 56.4248 |
| 54.3815 | 22.0 | 66 | 48.4127 |
| 47.917 | 23.0 | 69 | 40.9706 |
| 39.5198 | 24.0 | 72 | 34.3440 |
| 33.711 | 25.0 | 75 | 28.6207 |
| 27.3896 | 26.0 | 78 | 23.5210 |
| 23.4138 | 27.0 | 81 | 19.5687 |
| 18.9363 | 28.0 | 84 | 16.8098 |
| 16.6662 | 29.0 | 87 | 14.3299 |
| 13.9003 | 30.0 | 90 | 12.4524 |
| 12.0831 | 31.0 | 93 | 11.2232 |
| 10.505 | 32.0 | 96 | 10.0853 |
| 9.5992 | 33.0 | 99 | 9.3580 |
| 8.8814 | 34.0 | 102 | 8.9046 |
| 7.9504 | 35.0 | 105 | 8.1708 |
| 7.3651 | 36.0 | 108 | 7.7294 |
| 6.8279 | 37.0 | 111 | 7.2767 |
| 6.507 | 38.0 | 114 | 7.0724 |
| 6.228 | 39.0 | 117 | 6.9470 |
| 6.0787 | 40.0 | 120 | 6.5948 |
| 5.7443 | 41.0 | 123 | 6.4305 |
| 5.607 | 42.0 | 126 | 6.2583 |
| 5.3911 | 43.0 | 129 | 6.0870 |
| 5.2864 | 44.0 | 132 | 5.9922 |
| 5.2063 | 45.0 | 135 | 5.8702 |
| 5.1295 | 46.0 | 138 | 5.7636 |
| 5.0156 | 47.0 | 141 | 5.7078 |
| 4.7705 | 48.0 | 144 | 5.7188 |
| 4.8265 | 49.0 | 147 | 5.5697 |
| 4.8814 | 50.0 | 150 | 5.4942 |
| 4.7241 | 51.0 | 153 | 5.4862 |
| 4.6709 | 52.0 | 156 | 5.4192 |
| 4.473 | 53.0 | 159 | 5.3817 |
| 4.5304 | 54.0 | 162 | 5.3086 |
| 4.4462 | 55.0 | 165 | 5.2772 |
| 4.3478 | 56.0 | 168 | 5.2420 |
| 4.1911 | 57.0 | 171 | 5.2188 |
| 4.3088 | 58.0 | 174 | 5.1736 |
| 4.2529 | 59.0 | 177 | 5.1341 |
| 4.3505 | 60.0 | 180 | 5.1085 |
| 4.2754 | 61.0 | 183 | 5.0898 |
| 4.2691 | 62.0 | 186 | 5.0628 |
| 4.3049 | 63.0 | 189 | 5.0646 |
| 4.1317 | 64.0 | 192 | 5.0228 |
| 4.2919 | 65.0 | 195 | 5.0214 |
| 4.2777 | 66.0 | 198 | 4.9936 |
| 4.2473 | 67.0 | 201 | 4.9851 |
| 3.9754 | 68.0 | 204 | 4.9721 |
| 4.2845 | 69.0 | 207 | 4.9520 |
| 4.1962 | 70.0 | 210 | 4.9529 |
| 4.0952 | 71.0 | 213 | 4.9481 |
| 4.0827 | 72.0 | 216 | 4.9285 |
| 4.0752 | 73.0 | 219 | 4.9251 |
| 4.1187 | 74.0 | 222 | 4.9239 |
| 4.144 | 75.0 | 225 | 4.9110 |
| 4.0002 | 76.0 | 228 | 4.9076 |
| 4.0264 | 77.0 | 231 | 4.9095 |
| 4.0018 | 78.0 | 234 | 4.9098 |
| 4.052 | 79.0 | 237 | 4.9071 |
| 4.0436 | 80.0 | 240 | 4.9058 |
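
The table covers 240 optimizer steps (3 per epoch over 80 epochs), of which the first 50 are warmup. A short sketch of the resulting learning-rate curve, assuming transformers' standard cosine-with-warmup scheduler:

```python
import torch
from transformers import get_cosine_schedule_with_warmup

# Dummy parameter so the optimizer has something to schedule.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.Adam([param], lr=2.5e-4, betas=(0.9, 0.999), eps=1e-8)

total_steps = 240  # 3 steps/epoch x 80 epochs, per the table above
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=50, num_training_steps=total_steps
)

for step in range(total_steps):
    optimizer.step()
    scheduler.step()
    if step in (0, 49, 119, 239):  # start, end of warmup, midpoint, final step
        print(f"step {step:3d}: lr = {scheduler.get_last_lr()[0]:.2e}")
```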

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0