# Baby-Llama-58M
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 6.1610
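For context, a mean cross-entropy loss can be converted to perplexity by exponentiation. A minimal sketch, assuming the reported value is the mean cross-entropy in nats (the convention typically used by the Hugging Face Trainer):

```python
import math

# Perplexity is exp(loss) when loss is mean cross-entropy in nats.
eval_loss = 6.1610
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 473.9
```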
## Model description
More information needed
## Intended uses & limitations
More information needed
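In the absence of documented usage details, the model should load through the standard Hugging Face causal-LM interface. A minimal, untested sketch; the repository id below is a placeholder and is not confirmed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Baby-Llama-58M"  # placeholder: substitute the actual Hub repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

inputs = tokenizer("Once upon a time", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```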
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.00025
- train_batch_size: 128
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 80
- mixed_precision_training: Native AMP
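As a reference point, the hyperparameters above map onto `transformers.TrainingArguments` roughly as follows. This is a reconstruction sketch under the assumptions stated in the comments, not the original training script; the output directory is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="baby-llama-58m",     # placeholder path
    learning_rate=2.5e-4,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=80,
    fp16=True,                       # "Native AMP" mixed precision
)
```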
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 81.7512 | 1.0 | 2 | 74.4291 |
| 81.3083 | 2.0 | 4 | 73.3596 |
| 78.6216 | 3.0 | 6 | 71.5365 |
| 80.396 | 4.0 | 8 | 70.3538 |
| 75.3713 | 5.0 | 10 | 67.4044 |
| 74.0418 | 6.0 | 12 | 64.0233 |
| 70.1637 | 7.0 | 14 | 60.8437 |
| 67.5864 | 8.0 | 16 | 57.9300 |
| 64.8984 | 9.0 | 18 | 55.0383 |
| 61.2535 | 10.0 | 20 | 52.0253 |
| 57.6171 | 11.0 | 22 | 48.9365 |
| 54.2922 | 12.0 | 24 | 45.8747 |
| 50.3849 | 13.0 | 26 | 43.0132 |
| 49.0703 | 14.0 | 28 | 40.4715 |
| 45.5158 | 15.0 | 30 | 38.1415 |
| 44.3002 | 16.0 | 32 | 35.9572 |
| 41.2208 | 17.0 | 34 | 33.8684 |
| 39.8837 | 18.0 | 36 | 31.8991 |
| 38.1152 | 19.0 | 38 | 29.8574 |
| 35.239 | 20.0 | 40 | 28.0249 |
| 33.6748 | 21.0 | 42 | 26.4792 |
| 30.4729 | 22.0 | 44 | 25.4216 |
| 29.436 | 23.0 | 46 | 24.1119 |
| 27.72 | 24.0 | 48 | 22.8196 |
| 25.5231 | 25.0 | 50 | 21.7862 |
| 24.8119 | 26.0 | 52 | 20.4891 |
| 23.3658 | 27.0 | 54 | 19.3795 |
| 21.4143 | 28.0 | 56 | 18.1634 |
| 20.032 | 29.0 | 58 | 17.0348 |
| 18.43 | 30.0 | 60 | 16.1163 |
| 16.897 | 31.0 | 62 | 15.2508 |
| 15.7483 | 32.0 | 64 | 14.3147 |
| 15.1794 | 33.0 | 66 | 13.5753 |
| 13.7129 | 34.0 | 68 | 12.8868 |
| 12.6031 | 35.0 | 70 | 12.6810 |
| 11.8192 | 36.0 | 72 | 11.9060 |
| 11.6487 | 37.0 | 74 | 11.3454 |
| 10.9525 | 38.0 | 76 | 10.8465 |
| 10.2164 | 39.0 | 78 | 10.1026 |
| 9.5492 | 40.0 | 80 | 9.6511 |
| 9.0438 | 41.0 | 82 | 9.2800 |
| 8.6141 | 42.0 | 84 | 8.8036 |
| 7.9373 | 43.0 | 86 | 8.6612 |
| 7.5371 | 44.0 | 88 | 8.1757 |
| 7.3186 | 45.0 | 90 | 8.1665 |
| 7.033 | 46.0 | 92 | 7.7424 |
| 6.7923 | 47.0 | 94 | 7.6650 |
| 6.4384 | 48.0 | 96 | 7.4306 |
| 6.2449 | 49.0 | 98 | 7.4175 |
| 6.1012 | 50.0 | 100 | 7.1466 |
| 6.0502 | 51.0 | 102 | 7.1740 |
| 5.7839 | 52.0 | 104 | 6.9619 |
| 5.6905 | 53.0 | 106 | 6.9416 |
| 5.665 | 54.0 | 108 | 6.7945 |
| 5.5401 | 55.0 | 110 | 6.7485 |
| 5.4773 | 56.0 | 112 | 6.6674 |
| 5.4169 | 57.0 | 114 | 6.6132 |
| 5.3628 | 58.0 | 116 | 6.5787 |
| 5.2021 | 59.0 | 118 | 6.4972 |
| 5.2817 | 60.0 | 120 | 6.4866 |
| 5.1901 | 61.0 | 122 | 6.4256 |
| 5.1268 | 62.0 | 124 | 6.3659 |
| 5.1105 | 63.0 | 126 | 6.3563 |
| 5.0539 | 64.0 | 128 | 6.3159 |
| 4.9715 | 65.0 | 130 | 6.3178 |
| 4.872 | 66.0 | 132 | 6.2741 |
| 4.9422 | 67.0 | 134 | 6.2699 |
| 4.944 | 68.0 | 136 | 6.2551 |
| 4.9487 | 69.0 | 138 | 6.2148 |
| 4.8968 | 70.0 | 140 | 6.2089 |
| 4.822 | 71.0 | 142 | 6.2093 |
| 4.965 | 72.0 | 144 | 6.1853 |
| 4.8401 | 73.0 | 146 | 6.1747 |
| 4.8539 | 74.0 | 148 | 6.1738 |
| 4.7751 | 75.0 | 150 | 6.1674 |
| 4.8871 | 76.0 | 152 | 6.1644 |
| 4.9347 | 77.0 | 154 | 6.1618 |
| 4.8009 | 78.0 | 156 | 6.1613 |
| 4.8121 | 79.0 | 158 | 6.1610 |
| 4.8048 | 80.0 | 160 | 6.1610 |
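(Assuming a single device and no gradient accumulation, 2 optimizer steps per epoch at a train batch size of 128 implies a training set of roughly 256 examples.)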
### Framework versions
- Transformers 4.39.1
- Pytorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0