Full Publications
(* equal contribution or alphabetical order)
Wei Xiong*, Chenlu Ye*, Baohao Liao*, Hanze Dong*, Xinxing Xu, Christof Monz, Jiang Bian, Nan Jiang, Tong Zhang, “Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training”, Preprint.
Chenlu Ye, Zhou Yu, Ziji Zhang, Hao Chen, Narayanan Sadagopan, Jing Huang, Tong Zhang, Anurag Beniwal, “Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training”, Preprint.
Wei Xiong*, Hanning Zhang*, Chenlu Ye*, Lichang Chen, Nan Jiang, Tong Zhang, “Self-rewarding correction for mathematical reasoning”, Preprint.
Chenlu Ye*, Wei Xiong*, Yuheng Zhang*, Hanze Dong*, Nan Jiang, Tong Zhang, “Online iterative reinforcement learning from human feedback with general preference model”, NeurIPS 2024.
Wei Xiong*, Hanze Dong*, Chenlu Ye*, Han Zhong, Nan Jiang, Tong Zhang, “Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint”, ICML 2024.
Heyang Zhao*, Chenlu Ye*, Wei Xiong, Quanquan Gu, Tong Zhang, “Logarithmic Regret for Online KL-Regularized Reinforcement Learning”, ICML 2025.
Chenlu Ye, Yujia Jin, Alekh Agarwal, Tong Zhang, “Catoni Contextual Bandits are Robust to Heavy-tailed Rewards”, ICML 2025 (Spotlight).
Yifan Hao*, Xingyuan Pan*, Hanning Zhang*, Chenlu Ye, Rui Pan, Tong Zhang, “Understanding Overadaptation in Supervised Fine-Tuning: The Role of Ensemble Methods”, ICML 2025.
Heyang Zhao, Chenlu Ye, Quanquan Gu, Tong Zhang, “Sharp Analysis for KL-Regularized Contextual Bandits and RLHF”, NeurIPS 2025.
Chenlu Ye*, Jiafan He*, Quanquan Gu, Tong Zhang, “Towards robust model-based reinforcement learning against adversarial corruption”, ICML 2024.
Chenlu Ye*, Rui Yang*, Quanquan Gu, Tong Zhang, “Corruption-Robust Offline Reinforcement Learning with General Function Approximation”, NeurIPS 2023.
Yong Lin*, Chen Liu*, Chenlu Ye*, Qing Lian, Yuan Yao, Tong Zhang, “Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning”, JMLR.
Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang, “Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes”, ICML 2023.
Jianqing Fan*, Zhaoran Wang*, Zhuoran Yang*, Chenlu Ye*, “Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks”, Preprint.
Xingyuan Pan*, Chenlu Ye*, Joseph Melkonian, Jiaqi W. Ma, Tong Zhang, “Daunce: Data Attribution through Uncertainty Estimation”, Preprint.
Yifan Hao*, Chenlu Ye*, Chi Han, Tong Zhang, “Transformers as Multi-task Learners: Decoupling Features in Hidden Markov Models”, Preprint.