Research
Papers and Preprints
Gibbs Sampling from Human Feedback: A Provable KL-constrained Framework for RLHF
Wei Xiong*, Hanze Dong*, Chenlu Ye*, Han Zhong, Nan Jiang, Tong Zhang, Preprint.
Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks
Jianqing Fan*, Zhaoran Wang*, Zhuoran Yang*, Chenlu Ye*, Preprint.
Corruption-Robust Offline Reinforcement Learning with General Function Approximation
Chenlu Ye*, Rui Yang*, Quanquan Gu and Tong Zhang, NeurIPS 2023.
Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning
Yong Lin*, Chen Liu*, Chenlu Ye*, Qing Lian, Yuan Yao and Tong Zhang, Preprint.
Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
Chenlu Ye, Wei Xiong, Quanquan Gu and Tong Zhang, ICML 2023.
(* equal contribution or alphabetical order)