GPT2-705M

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 5.4628
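
If this loss is the mean per-token cross-entropy in nats (the usual quantity reported by the Transformers Trainer for causal language modeling, though the card does not say so explicitly), it corresponds to a perplexity of roughly exp(5.4628) ≈ 236:

```python
import math

# Perplexity from mean cross-entropy loss.
# Assumption: the reported eval loss is in nats per token.
eval_loss = 5.4628
perplexity = math.exp(eval_loss)
print(f"{perplexity:.0f}")  # ≈ 236
```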

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.00025
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • num_epochs: 40
  • mixed_precision_training: Native AMP
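
The card does not include the training script, but as a rough, hypothetical sketch, the settings above map onto Hugging Face `TrainingArguments` roughly as follows. The `output_dir` is a placeholder, `fp16=True` is assumed to correspond to the "Native AMP" entry, and the total train batch size of 128 is implied by 16 per device × 8 accumulation steps.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above
# (argument names follow Transformers 4.39; paths are placeholders).
training_args = TrainingArguments(
    output_dir="gpt2-705m-finetune",    # placeholder
    learning_rate=2.5e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,      # 16 x 8 = 128 effective train batch size
    lr_scheduler_type="cosine",
    warmup_steps=50,
    num_train_epochs=40,
    fp16=True,                          # "Native AMP" mixed precision
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```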

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 9.7135        | 0.57  | 1    | 9.7272          |
| 8.0222        | 1.71  | 3    | 9.3213          |
| 7.6063        | 2.86  | 5    | 8.5841          |
| 7.5596        | 4.0   | 7    | 7.9271          |
| 7.4194        | 4.57  | 8    | 8.0942          |
| 7.1644        | 5.71  | 10   | 7.5409          |
| 6.8531        | 6.86  | 12   | 7.3028          |
| 6.3614        | 8.0   | 14   | 9.3796          |
| 8.5129        | 8.57  | 15   | 7.6361          |
| 6.1325        | 9.71  | 17   | 6.7577          |
| 5.8526        | 10.86 | 19   | 6.5249          |
| 5.5941        | 12.0  | 21   | 6.2490          |
| 5.4307        | 12.57 | 22   | 6.2442          |
| 5.1381        | 13.71 | 24   | 5.9595          |
| 4.8705        | 14.86 | 26   | 5.8944          |
| 4.7083        | 16.0  | 28   | 5.7005          |
| 4.5355        | 16.57 | 29   | 5.7459          |
| 4.4187        | 17.71 | 31   | 5.5387          |
| 4.3123        | 18.86 | 33   | 5.4863          |
| 4.0269        | 20.0  | 35   | 5.3277          |
| 3.942         | 20.57 | 36   | 5.3274          |
| 3.784         | 21.71 | 38   | 5.3998          |
| 3.4991        | 22.86 | 40   | 5.4628          |

Framework versions

  • Transformers 4.39.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.16.1
  • Tokenizers 0.15.0
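
To reproduce the environment, the versions above can be checked against the installed packages. The sketch below is only an illustration; note that the PyTorch entry on this card is a CUDA 12.1 build (2.1.2+cu121), and the check compares the base version string only.

```python
# Hypothetical environment check against the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.39.1",
    "torch": "2.1.2",       # card lists 2.1.2+cu121
    "datasets": "2.16.1",
    "tokenizers": "0.15.0",
}

installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__.split("+")[0],  # drop the +cu121 suffix
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}

for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, card lists {want}")
```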