# LLaMA 1B - Fine-tuned Language Model
This is a ~1B-parameter LLaMA-architecture model trained on the FineWeb-Edu dataset (100BT sample) with a tuned learning rate of 4e-4 and a 4096-token sequence length.
## Model Details
- Model Name: llama_1B_lr_4e-4_100bt
- Architecture: LLaMA (Large Language Model Meta AI)
- Parameters: ~1B parameters
- Training Step: 340,000
- Sequence Length: 4096
- Vocabulary Size: 128256
## Architecture Details
### Model Configuration
- Hidden Dimension: 2048
- Number of Layers: 18
- Number of Heads: 16
- Head Dimension: None
- KV Heads: None
- Max Sequence Length: 4096
- RoPE Theta: 10000.0
- Norm Epsilon: 1e-05
- FFN Dimension Multiplier: None
- Weight Tying: False
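
For reference, the sketch below lays out the hyperparameters above as a plain Python configuration. The field names are assumptions (the uploaded `params.json` is authoritative), and the `None` entries fall back to the training framework's defaults.

```python
# Sketch of the architecture configuration above as a Python dict.
# Field names are assumptions; see the uploaded params.json for the exact schema.
model_args = {
    "dim": 2048,                 # hidden dimension
    "n_layers": 18,
    "n_heads": 16,
    "head_dim": None,            # framework default, typically dim // n_heads = 128
    "n_kv_heads": None,          # framework default, typically equal to n_heads (no GQA)
    "max_seqlen": 4096,
    "rope_theta": 10000.0,
    "norm_eps": 1e-05,
    "ffn_dim_multiplier": None,  # framework default FFN width
    "vocab_size": 128256,
    "weight_tying": False,
}
```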
## Training Details
### Data
- Dataset: fineweb_edu_100bt_shuffled
- Batch Size: 9
- Tokenizer: tiktoken
- Tokenizer Path: /fsx-pretraining/home/chunyyyy/blt/bytelatent/tokenizers/original/tokenizer.model
- Add BOS Token: True
- Add EOS Token: True
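
The card does not spell out how training sequences are built, but with BOS/EOS enabled and a 4096-token sequence length, document packing along the lines of the sketch below is a common setup; `tokenizer` here is a stand-in for the tiktoken-based tokenizer listed above, and the packing itself is illustrative rather than taken from the training code.

```python
# Illustrative sketch only: prepend BOS and append EOS to each document, then pack the
# token stream into fixed-length 4096-token training sequences (remainder dropped).
def pack_documents(documents, tokenizer, seq_len=4096):
    stream = []
    for text in documents:
        stream.append(tokenizer.bos_id)        # Add BOS Token: True
        stream.extend(tokenizer.encode(text))
        stream.append(tokenizer.eos_id)        # Add EOS Token: True
    return [stream[i:i + seq_len] for i in range(0, len(stream) - seq_len + 1, seq_len)]
```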
### Optimization
- Learning Rate: 0.0004
- Weight Decay: 0.1
- Scheduler: cosine
- Warmup Steps: 5000
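
The optimizer itself is not listed in this section, so the sketch below assumes AdamW; it shows how the learning rate, weight decay, warmup, and cosine schedule above combine, using the 340,000 training steps from Model Details as the decay horizon.

```python
# Sketch of the schedule above: linear warmup over 5,000 steps to the peak LR of 4e-4,
# then cosine decay over the remaining steps. AdamW and the decay-to-zero floor are assumptions.
import math
import torch

def warmup_cosine(step, warmup_steps=5_000, total_steps=340_000):
    if step < warmup_steps:
        return step / max(1, warmup_steps)                       # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))  # cosine decay

model = torch.nn.Linear(2048, 2048)  # stand-in module; the real model is the 1B LLaMA
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4, weight_decay=0.1)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_cosine)
# In the training loop, call optimizer.step() followed by scheduler.step() each step.
```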
### Distributed Training
- Data Parallel Replicas: 8
- Model Dtype: bf16
- FSDP Type: full_shard
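
A minimal sketch of how the `full_shard` / `bf16` settings above map onto PyTorch FSDP with 8 data-parallel ranks; the actual framework's wrapping policy and launcher are not described in this card, and `build_llama_1b()` is a hypothetical constructor.

```python
# Sketch: fully sharded data parallelism with bf16 mixed precision, as in the settings above.
# Launch with e.g. `torchrun --nproc_per_node=8 train.py` (8 data-parallel replicas).
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import MixedPrecision, ShardingStrategy

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = build_llama_1b()  # hypothetical constructor for the ~1B-parameter model
model = FSDP(
    model.cuda(),
    sharding_strategy=ShardingStrategy.FULL_SHARD,   # FSDP Type: full_shard
    mixed_precision=MixedPrecision(                  # Model Dtype: bf16
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    ),
)
```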
## Usage
This model uses the LLaMA architecture, and its weights are stored as PyTorch distributed checkpoint files (`*.distcp`). The checkpoint can be loaded with PyTorch and, after conversion, with the Hugging Face Transformers framework.
```python
# Example loading code
import torch
from transformers import AutoTokenizer, LlamaForCausalLM

# Note: the checkpoint is saved in a distributed format (*.distcp) by the training
# framework; it must be consolidated and its parameter names mapped to Hugging Face
# conventions before from_pretrained() can load it. See the conversion sketch below.
```
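
The `*.distcp` files suggest PyTorch's `torch.distributed.checkpoint` (DCP) format; if that is the case, the checkpoint can be consolidated into a single `torch.save`-style file roughly as below. Paths are placeholders, and the resulting parameter names may still need remapping to the Hugging Face layout.

```python
# Sketch: consolidate a *.distcp distributed checkpoint into one file, assuming it uses
# torch.distributed.checkpoint (DCP). Paths are placeholders.
import torch
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

dcp_to_torch_save("path/to/checkpoint_dir", "consolidated.pt")

state = torch.load("consolidated.pt", map_location="cpu")
print(list(state.keys())[:10])   # inspect parameter names before mapping to the HF layout
```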
## Evaluation Tasks
The model was evaluated with the following configuration:
- Validation Steps: 1000
- Validation Source: /fsx-pretraining/home/sllokega/intern_workspace/data/fineweb_edu_10bt_val
- Generator Max Tokens: 4096
- Temperature: 1.0
- Top-p: 0.95
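
Once converted to a Hugging Face-compatible checkpoint (see Usage), the generation settings listed above map onto `generate()` roughly as follows; the model path and prompt are placeholders.

```python
# Sketch: sampling with the generation settings listed above (placeholder model path;
# assumes a checkpoint already converted to the Hugging Face format).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "path/to/converted_model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=torch.bfloat16)

inputs = tokenizer("The history of public education", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,   # Generator Max Tokens
    do_sample=True,
    temperature=1.0,       # Temperature
    top_p=0.95,            # Top-p
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```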
## Training Configuration
The complete training configuration is preserved in the uploaded files (see `config.yaml` below).
## Files Description
- `*.distcp`: Distributed checkpoint files containing model weights
- `params.json`: Model parameters and configuration
- `train_state_*.json`: Training state information, including optimizer states
- `config.yaml`: Complete training configuration
## Citation
If you use this model, please cite the LLaMA paper and the FineWeb-Edu dataset.