arxiv:2507.16782
Changhao
lichangh20
AI & ML interests: RL, Agent, Efficient ML
Recent Activity
Updated a model about 11 hours ago: lichangh20/Qwen3-4B-Instruct_sft_2epoch_data_analysis
Published a model about 11 hours ago: lichangh20/Qwen3-4B-Instruct_sft_2epoch_data_analysis
Upvoted an article about 2 months ago: Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment