# GPT2-705M
This model is a fine-tuned version of an unspecified base model, trained on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 5.5538
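Assuming this is the usual token-level cross-entropy loss of a causal language model, it corresponds to an evaluation perplexity of exp(5.5538) ≈ 258.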
## Model description
More information needed
## Intended uses & limitations
More information needed
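The card does not include a usage example. Below is a minimal inference sketch, assuming the model exposes the standard GPT-2 causal-LM interface; the repo id `your-username/GPT2-705M` is a hypothetical placeholder, not the actual published path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with the actual model path or local directory.
model_id = "your-username/GPT2-705M"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation from a prompt.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```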
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.00025
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 50
- mixed_precision_training: Native AMP
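The card does not include the training script, but assuming the standard `Trainer` API was used, these settings map onto `transformers.TrainingArguments` roughly as follows. This is a sketch, not the actual configuration; the `output_dir` is a hypothetical placeholder, and a single GPU is assumed (so the effective batch size is 16 × 8 = 128, matching `total_train_batch_size` above).

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above expressed as TrainingArguments.
training_args = TrainingArguments(
    output_dir="GPT2-705M",          # hypothetical output directory
    learning_rate=2.5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,   # 16 per device * 8 steps = 128 effective
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=50,
    fp16=True,                       # "Native AMP" mixed-precision training
)
```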
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 9.7407 | 0.57 | 1 | 9.7354 |
| 8.0949 | 1.71 | 3 | 9.2987 |
| 8.037 | 2.86 | 5 | 7.9942 |
| 8.4143 | 4.0 | 7 | 8.3825 |
| 7.7196 | 4.57 | 8 | 8.7978 |
| 7.2632 | 5.71 | 10 | 7.6261 |
| 6.9715 | 6.86 | 12 | 7.4135 |
| 6.4835 | 8.0 | 14 | 8.2776 |
| 7.1529 | 8.57 | 15 | 7.0085 |
| 6.1255 | 9.71 | 17 | 6.8228 |
| 5.9176 | 10.86 | 19 | 6.5603 |
| 5.5785 | 12.0 | 21 | 6.3862 |
| 5.4833 | 12.57 | 22 | 6.3011 |
| 5.1483 | 13.71 | 24 | 6.0480 |
| 4.9268 | 14.86 | 26 | 6.0532 |
| 4.6602 | 16.0 | 28 | 5.7750 |
| 4.5647 | 16.57 | 29 | 5.7046 |
| 4.3202 | 17.71 | 31 | 5.5333 |
| 4.1764 | 18.86 | 33 | 5.5809 |
| 4.1745 | 20.0 | 35 | 5.4089 |
| 4.0056 | 20.57 | 36 | 5.3978 |
| 3.8024 | 21.71 | 38 | 5.4085 |
| 3.5845 | 22.86 | 40 | 5.3279 |
| 3.4378 | 24.0 | 42 | 5.3881 |
| 3.3361 | 24.57 | 43 | 5.2754 |
| 3.2585 | 25.71 | 45 | 5.2913 |
| 3.168 | 26.86 | 47 | 5.4232 |
| 2.9045 | 28.0 | 49 | 5.5044 |
| 2.8709 | 28.57 | 50 | 5.5538 |
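Note that validation loss bottoms out at 5.2754 (step 43) and then drifts upward over the final epochs while training loss keeps falling, which suggests the model begins to overfit; the final reported loss of 5.5538 is therefore not the best checkpoint's.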
### Framework versions
- Transformers 4.39.1
- PyTorch 2.1.2+cu121
- Datasets 2.16.1
- Tokenizers 0.15.0