Baby-Llama-58M

This model (54.5M parameters, F32 safetensors) is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set (a perplexity conversion is sketched below the list):

  • Loss: 6.1610

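Assuming the reported loss is mean per-token cross-entropy in nats (the usual convention for the Transformers Trainer; the card does not state the metric explicitly), this converts to perplexity as follows:

```python
import math

# Perplexity = exp(mean cross-entropy in nats). Assumes the reported
# 6.1610 is a per-token average, as is standard for the HF Trainer.
perplexity = math.exp(6.1610)
print(f"{perplexity:.1f}")  # ≈ 473.9
```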
Model description

More information needed

Intended uses & limitations

More information needed
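Pending proper documentation, the sketch below shows how a checkpoint like this could be loaded with the standard Transformers API. The repo id is a placeholder, and the prompt and generation settings are illustrative only.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Baby-Llama-58M"  # placeholder; substitute the actual Hub repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation as a smoke test.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```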

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TrainingArguments sketch follows the list):

  • learning_rate: 0.00025
  • train_batch_size: 128
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 80
  • mixed_precision_training: Native AMP

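As a non-authoritative sketch, the settings above map onto Transformers' TrainingArguments roughly as follows; output_dir is a placeholder, and the actual training script is not part of this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="baby-llama-58m",   # placeholder path
    learning_rate=2.5e-4,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,                # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=80,
    fp16=True,                     # Native AMP mixed precision
)
```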
Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 81.7512 | 1.0 | 2 | 74.4291 |
| 81.3083 | 2.0 | 4 | 73.3596 |
| 78.6216 | 3.0 | 6 | 71.5365 |
| 80.396 | 4.0 | 8 | 70.3538 |
| 75.3713 | 5.0 | 10 | 67.4044 |
| 74.0418 | 6.0 | 12 | 64.0233 |
| 70.1637 | 7.0 | 14 | 60.8437 |
| 67.5864 | 8.0 | 16 | 57.9300 |
| 64.8984 | 9.0 | 18 | 55.0383 |
| 61.2535 | 10.0 | 20 | 52.0253 |
| 57.6171 | 11.0 | 22 | 48.9365 |
| 54.2922 | 12.0 | 24 | 45.8747 |
| 50.3849 | 13.0 | 26 | 43.0132 |
| 49.0703 | 14.0 | 28 | 40.4715 |
| 45.5158 | 15.0 | 30 | 38.1415 |
| 44.3002 | 16.0 | 32 | 35.9572 |
| 41.2208 | 17.0 | 34 | 33.8684 |
| 39.8837 | 18.0 | 36 | 31.8991 |
| 38.1152 | 19.0 | 38 | 29.8574 |
| 35.239 | 20.0 | 40 | 28.0249 |
| 33.6748 | 21.0 | 42 | 26.4792 |
| 30.4729 | 22.0 | 44 | 25.4216 |
| 29.436 | 23.0 | 46 | 24.1119 |
| 27.72 | 24.0 | 48 | 22.8196 |
| 25.5231 | 25.0 | 50 | 21.7862 |
| 24.8119 | 26.0 | 52 | 20.4891 |
| 23.3658 | 27.0 | 54 | 19.3795 |
| 21.4143 | 28.0 | 56 | 18.1634 |
| 20.032 | 29.0 | 58 | 17.0348 |
| 18.43 | 30.0 | 60 | 16.1163 |
| 16.897 | 31.0 | 62 | 15.2508 |
| 15.7483 | 32.0 | 64 | 14.3147 |
| 15.1794 | 33.0 | 66 | 13.5753 |
| 13.7129 | 34.0 | 68 | 12.8868 |
| 12.6031 | 35.0 | 70 | 12.6810 |
| 11.8192 | 36.0 | 72 | 11.9060 |
| 11.6487 | 37.0 | 74 | 11.3454 |
| 10.9525 | 38.0 | 76 | 10.8465 |
| 10.2164 | 39.0 | 78 | 10.1026 |
| 9.5492 | 40.0 | 80 | 9.6511 |
| 9.0438 | 41.0 | 82 | 9.2800 |
| 8.6141 | 42.0 | 84 | 8.8036 |
| 7.9373 | 43.0 | 86 | 8.6612 |
| 7.5371 | 44.0 | 88 | 8.1757 |
| 7.3186 | 45.0 | 90 | 8.1665 |
| 7.033 | 46.0 | 92 | 7.7424 |
| 6.7923 | 47.0 | 94 | 7.6650 |
| 6.4384 | 48.0 | 96 | 7.4306 |
| 6.2449 | 49.0 | 98 | 7.4175 |
| 6.1012 | 50.0 | 100 | 7.1466 |
| 6.0502 | 51.0 | 102 | 7.1740 |
| 5.7839 | 52.0 | 104 | 6.9619 |
| 5.6905 | 53.0 | 106 | 6.9416 |
| 5.665 | 54.0 | 108 | 6.7945 |
| 5.5401 | 55.0 | 110 | 6.7485 |
| 5.4773 | 56.0 | 112 | 6.6674 |
| 5.4169 | 57.0 | 114 | 6.6132 |
| 5.3628 | 58.0 | 116 | 6.5787 |
| 5.2021 | 59.0 | 118 | 6.4972 |
| 5.2817 | 60.0 | 120 | 6.4866 |
| 5.1901 | 61.0 | 122 | 6.4256 |
| 5.1268 | 62.0 | 124 | 6.3659 |
| 5.1105 | 63.0 | 126 | 6.3563 |
| 5.0539 | 64.0 | 128 | 6.3159 |
| 4.9715 | 65.0 | 130 | 6.3178 |
| 4.872 | 66.0 | 132 | 6.2741 |
| 4.9422 | 67.0 | 134 | 6.2699 |
| 4.944 | 68.0 | 136 | 6.2551 |
| 4.9487 | 69.0 | 138 | 6.2148 |
| 4.8968 | 70.0 | 140 | 6.2089 |
| 4.822 | 71.0 | 142 | 6.2093 |
| 4.965 | 72.0 | 144 | 6.1853 |
| 4.8401 | 73.0 | 146 | 6.1747 |
| 4.8539 | 74.0 | 148 | 6.1738 |
| 4.7751 | 75.0 | 150 | 6.1674 |
| 4.8871 | 76.0 | 152 | 6.1644 |
| 4.9347 | 77.0 | 154 | 6.1618 |
| 4.8009 | 78.0 | 156 | 6.1613 |
| 4.8121 | 79.0 | 158 | 6.1610 |
| 4.8048 | 80.0 | 160 | 6.1610 |
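
The step column implies two optimizer updates per epoch; with the batch size of 128 above, and assuming a single device without gradient accumulation, that puts the training split at roughly 2 × 128 = 256 examples (a back-of-the-envelope estimate, not stated in the card). Validation loss plateaus near 6.16 over the final epochs as the cosine schedule anneals the learning rate.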

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0
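
As a convenience (not part of the original card), an environment can be checked against these versions:

```python
# Quick environment check against the framework versions listed above.
import transformers, torch, datasets, tokenizers

print(transformers.__version__)  # expected: 4.39.1
print(torch.__version__)         # expected: 2.1.2+cu121
print(datasets.__version__)      # expected: 2.16.1
print(tokenizers.__version__)    # expected: 0.15.0
```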