zBotta/SmolLM2-360M-AccidentReports-distilled-kd1.7B
A compact 360M parameter instruction model specialized for single-paragraph accident/incident reports.
This student model was distilled (logit KD) from SmolLM2-1.7B-Instruct using QLoRA on zBotta/traffic-accidents-reports-5k.
It converts structured 5W1H inputs (What/When/Where/Who/How/Why + ContingencyActions) into concise, factual narratives.
Performance note: In quick spot-checks, this distilled model did not outperform the baseline
zBotta/smollm2-accident-reporter-360m-800 on cross-encoder similarity under the same settings.
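A similarity spot-check of that kind can be reproduced with a sentence-transformers cross-encoder. The sketch below is only an illustration: the `cross-encoder/stsb-roberta-base` checkpoint and the example strings are assumptions, not the setup used for the original comparison.

```python
# Hedged sketch of a cross-encoder similarity spot-check between a generated
# report and a reference narrative. The checkpoint name is an assumption.
from sentence_transformers import CrossEncoder

scorer = CrossEncoder("cross-encoder/stsb-roberta-base")
generated = "A rear-end collision between car A and van B occurred at Main St & 3rd Ave ..."
reference = "On 2025-05-17 at 08:25, car A rear-ended van B at a red light downtown ..."
score = scorer.predict([(generated, reference)])[0]  # higher means more similar
print(f"cross-encoder similarity: {score:.3f}")
```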
✨ Intended Use
- Turn structured 5W1H fields into a single, factual paragraph describing an accident/incident.
- Suitable for demos, prototypes, or lightweight backends needing small-footprint text generation.
🚫 Out of Scope / Limitations
- Not a general-purpose chat or reasoning model.
- May omit facts not present in the input by design; may still hallucinate with ambiguous prompts.
- Not a legal/safety authority; human validation is required before operational use.
🧾 Prompt Format
Training used a simple instruction schema with an explicit response marker:
```text
Instruction:
You are a reporting agent.
You task is to create a report when provided the what, when, why, who, how and where questions about the events.
You are also given information about the contingency actions regarding the event.
Guidelines:
Generate only one report given the informations about the event
Generate the report as text in one paragraph
It is important to focus on accuracy and coherence when generating the report so that the description content matches the information provided (what, when, where, who, how , why, contingency actions).
If an information is not provided in (what, when, where, who, how , why, contingency actions), it must not be part of the generated text description.
Input:
What: ...
When: ...
Where: ...
Who: ...
How: ...
Why: ...
ContingencyActions: ...
Response:
```
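A small helper along these lines (hypothetical, not part of the released code) can assemble the prompt from structured fields, using the `### ...` section markers shown in the usage example below:

```python
# Hypothetical prompt builder: omitted fields are dropped so they cannot leak
# into the generated report, per the guidelines above.
FIELDS = ["What", "When", "Where", "Who", "How", "Why", "ContingencyActions"]

def build_prompt(instruction: str, event: dict) -> str:
    lines = [f"{name}: {event[name]}" for name in FIELDS if event.get(name)]
    return (
        f"### Instruction:\n{instruction}\n"
        "### Input:\n" + "\n".join(lines) + "\n"
        "### Response:\n"
    )
```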
🚀 How to Use
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "zBotta/SmolLM2-360M-AccidentReports-distilled-kd1.7B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # merged FP16 weights

prompt = """### Instruction:
You are a reporting agent...
[full instruction text as above]
### Input:
What: rear-end collision between car A and van B at a traffic light
When: 2025-05-17 08:25 (occurrence)
Where: Main St & 3rd Ave, downtown
Who: Driver A (car), Driver B (van); police on scene
How: A failed to stop in time at red
Why: suspected distraction; investigation pending
ContingencyActions: police report filed; medical check for minor neck pain; vehicles towed
### Response:
"""

inputs = tok(prompt, return_tensors="pt").to(model.device)
gen = model.generate(
    **inputs,
    do_sample=False,              # deterministic (greedy) decoding
    max_new_tokens=256,
    eos_token_id=tok.eos_token_id,
    pad_token_id=tok.pad_token_id,
    no_repeat_ngram_size=4,
    repetition_penalty=1.05,
    renormalize_logits=True,
)
print(tok.decode(gen[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip())
```
🧰 Training Details
Distillation (Advanced KD)
- Teacher: HuggingFaceTB/SmolLM2-1.7B-Instruct
- Student: HuggingFaceTB/SmolLM2-360M-Instruct
- KD temperature (KD_T): 3
- Top-K KD (KD_TOP_K): 64 (partial KL on the teacher's top-K logits)
- CE weight schedule (KD_ALPHA): start 0.6 → end 0.3 (KL weight is 1 − CE weight)
- Soft KD weighting: gamma=1.0, w_min=0.15
- Unlikelihood (anti-repeat): β_UL = 0.05
- Demo prompting: DEMO_PROB = 1.0 (always one demonstration)
- KD schedule: trainer defaults (warmup 2 epochs, ramp 2 epochs)
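Put together, the per-token objective roughly follows the sketch below. This is a simplified illustration, not the exact trainer code: the soft per-token KD weighting and the unlikelihood term are only noted in comments, and restricting the student's softmax to the teacher's top-K entries is a simplification.

```python
# Simplified sketch of the distillation loss described above (hypothetical
# helper): hard-label CE mixed with a temperature-scaled KL restricted to the
# teacher's top-K logits.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, alpha=0.6, T=3.0, top_k=64):
    # Hard-label cross-entropy on the ground-truth tokens (padding = -100)
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )

    # Partial KL: only the teacher's top-K vocabulary entries contribute
    topk_vals, topk_idx = teacher_logits.topk(top_k, dim=-1)
    student_topk = student_logits.gather(-1, topk_idx)
    p_teacher = F.softmax(topk_vals / T, dim=-1)
    log_p_student = F.log_softmax(student_topk / T, dim=-1)
    kl = F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T ** 2)

    # alpha (the CE weight) is annealed from 0.6 to 0.3 over training; the KL
    # term gets 1 - alpha. Soft per-token KD weighting (gamma, w_min) and the
    # unlikelihood penalty (β_UL) are omitted from this sketch.
    return alpha * ce + (1.0 - alpha) * kl
```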
QLoRA
- Quantization: 4-bit NF4 with double quantization; compute dtype fp16
- LoRA: rank r=8, lora_dropout=0.05, lora_alpha ≈ 2r
- Target modules: typical attention/MLP projections (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
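A minimal sketch of this QLoRA setup with bitsandbytes and peft is shown below; the hyperparameters follow the list above, but the exact training script may differ.

```python
# Sketch of the QLoRA configuration described above (not the exact training code).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # 4-bit NormalFloat
    bnb_4bit_use_double_quant=True,      # double quantization
    bnb_4bit_compute_dtype=torch.float16,
)

student = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM2-360M-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
student = prepare_model_for_kbit_training(student)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,            # ≈ 2 * r
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
student = get_peft_model(student, lora_config)
```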
SFT / Trainer Config
```python
from trl import SFTConfig

sft_config = SFTConfig(
    output_dir=OUT_DIR,
    num_train_epochs=20,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,   # effective batch ≈ 64
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.05,
    label_smoothing_factor=0.05,
    max_grad_norm=0.5,
    logging_steps=50,
    eval_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    fp16=False, bf16=False,           # training precision handled by bnb/device
    optim="adamw_bnb_8bit",
    packing=False,
    max_length=1024,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    remove_unused_columns=False,
    dataloader_num_workers=4,
    dataloader_pin_memory=True,
    report_to="none",
    seed=42,
)
```
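Wiring this config into TRL looks roughly as follows. This is only a sketch: the actual run used a KD-aware trainer on top of the standard SFT loop, and the dataset variables below are placeholders.

```python
# Rough sketch of plugging the config above into TRL's SFTTrainer.
# train_ds / eval_ds are placeholder dataset variables.
from trl import SFTTrainer

trainer = SFTTrainer(
    model=student,            # QLoRA-wrapped student from the previous section
    args=sft_config,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    processing_class=tok,     # `tokenizer=tok` on older TRL versions
)
trainer.train()
```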
⚠️ Limitations & Bias
- English-focused; short outputs only.
- Domain-narrow: optimized for accident/incident narratives only.
- Susceptible to input ambiguity; ensure clear, complete 5W1H fields.
- May inherit biases from training data; do not rely on demographic attributes.
🛡️ Responsible / Safety
- Treat outputs as drafts; human review is mandatory.
- Avoid use in contexts where factual errors could cause harm without oversight.
- Do not include PII unless you have a legal basis and consent.
⚙️ Hardware & Inference
- Designed for small-footprint serving; works well on consumer GPUs.
- CPU inference is possible with the merged FP16 weights (transformers loads them in float32 on CPU); see the sketch below.
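A minimal CPU-only load is just the default `from_pretrained` call (a sketch; transformers falls back to float32 on CPU unless a dtype is requested):

```python
# Minimal CPU-only loading sketch; weights are upcast to float32 by default.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zBotta/SmolLM2-360M-AccidentReports-distilled-kd1.7B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # CPU, float32
model.eval()
```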
Citation
If you use this model, please cite:
- The source dataset: DSTI/traffic-accidents-reports-5k
- Relevant distillation literature, e.g. Hinton, Vinyals, Dean. "Distilling the Knowledge in a Neural Network" (2015).
```bibtex
@misc{accident_reporter_360m_distilled_kd1.7B,
  title  = {Accident Reporting distilled kd model (One-Paragraph)},
  author = {zBotta, SamdGuizani},
  year   = {2025}
}
```