Chenlu Ye

Ph.D. student
University of Illinois Urbana-Champaign, Computer Science

[Curriculum Vitae]      [Google Scholar]

I am a second-year Ph.D. student in computer science at UIUC, where I am fortunate to be advised by Prof. Tong Zhang. Prior to this, I obtained a master's degree in IIP (AI) from The Hong Kong University of Science and Technology and received a B.S. in Statistics from the University of Science and Technology of China in 2021. Additionally, I was a visiting scholar at the AGI LAB @ UCLA from August to December 2023, working with Prof. Quanquan Gu.

Research Interests

My research interests lie at the intersection of reinforcement learning for LLM post-training and decision-making problems, with a particular emphasis on reasoning in LLM post-training and multi-turn agentic RL.

If you are interested in discussing or collaborating, please feel free to contact me via email: chenluy3 AT illinois DOT edu.

Publications and Preprints

(*) denotes alphabetical order or equal contribution.

  1. Reinforcement Learning for Reasoning and Post-Training

    1. Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training
      Wei Xiong*, Chenlu Ye*, Baohao Liao*, Hanze Dong*, Xinxing Xu, Christof Monz, Jiang Bian, Nan Jiang, Tong Zhang, Preprint.
      Developed an adaptive-sampling framework that dynamically allocates the inference budget across prompts for online RL post-training to avoid signal elimination and increase signal diversity.

    2. Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
      Chenlu Ye, Zhou Yu, Ziji Zhang, Hao Chen, Narayanan Sadagopan, Jing Huang, Tong Zhang, Anurag Beniwal, Preprint.
      Proposed PRocess cOnsistency Filtering (PROF) to robustly integrate noisy Process Reward Models (PRMs) with Outcome Reward Models (ORMs) in RL training through data-consistency filtering and balancing of the correct-incorrect ratio, which not only increases final outcome accuracy but also shapes the intermediate reasoning steps and improves process reasoning quality.

    3. Self-rewarding correction for mathematical reasoning
      Wei Xiong*, Hanning Zhang*, Chenlu Ye*, Lichang Chen, Nan Jiang, Tong Zhang, Preprint.
      Proposed a self-rewarding correction framework to enhance the policy model's ability to perform self-verification and correction for mathematical reasoning.

    4. Online iterative reinforcement learning from human feedback with general preference model
      Chenlu Ye*, Wei Xiong*, Yuheng Zhang*, Hanze Dong*, Nan Jiang, Tong Zhang, NeurIPS 2024.
      We study preference learning under a general preference model, without assuming the Bradley–Terry model, and propose sample-efficient algorithms for both online and offline settings, validating their efficiency theoretically and empirically.

    5. Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint
      Wei Xiong*, Hanze Dong*, Chenlu Ye*, Han Zhong, Nan Jiang, Tong Zhang, ICML 2024.
      We formulate the real-world RLHF process as a reverse-KL regularized contextual bandit and study its theoretical properties, proposing statistically efficient algorithms with finite-sample guarantees (the regularized objective is sketched in the note below). We also connect our theoretical findings with practical algorithms (e.g., DPO and RSO), offering new tools and insights for the design of alignment algorithms.
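
      For context, the reverse-KL regularized objective studied in this line of work typically takes the following form; the notation here is illustrative rather than the paper's exact convention (d_0 denotes a prompt distribution, r a reward model, \pi_0 a reference policy, and \beta > 0 a regularization coefficient):
      \max_{\pi} \ \mathbb{E}_{x \sim d_0,\ a \sim \pi(\cdot \mid x)}\left[ r(x, a) \right] - \beta\, \mathbb{E}_{x \sim d_0}\left[ \mathrm{KL}\left( \pi(\cdot \mid x) \,\|\, \pi_0(\cdot \mid x) \right) \right].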

  2. Theory of Decision-Making Problems

    1. Logarithmic Regret for Online KL-Regularized Reinforcement Learning
      Heyang Zhao*, Chenlu Ye*, Wei Xiong, Quanquan Gu, Tong Zhang, ICML 2025.
      We show that KL-regularized reinforcement learning with online exploration enjoys a logarithmic regret bound.

    2. Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
      Heyang Zhao, Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang, NeurIPS 2025.
      We prove sharp sample-complexity bounds for KL-regularized contextual bandits and reinforcement learning from human feedback.

    3. Catoni Contextual Bandits are Robust to Heavy-tailed Rewards
      Chenlu Ye*, Yujia Jin, Alekh Agarwal, Tong Zhang, ICML 2025 (Spotlight).
      We build online contextual bandit algorithms based on Catoni's estimator from robust statistics under general function approximation and show that the regret depends only logarithmically on the reward range, for both known and unknown reward variances (Catoni's classical influence function is recalled in the note at the end of this list).

    4. Towards robust model-based reinforcement learning against adversarial corruption
      Chenlu Ye*, Jiafan He*, Quanquan Gu, Tong Zhang, ICML 2024.
      An analysis of uncertainty-aware algorithms in the model-based framework under adversarial corruption and general function approximation.

    5. Corruption-Robust Offline Reinforcement Learning with General Function Approximation
      Chenlu Ye*, Rui Yang*, Quanquan Gu, Tong Zhang, NeurIPS 2023.
      An application of the uncertainty-weighting technique to offline reinforcement learning under adversarial corruption and general function approximation. Moreover, we implement the uncertainty-weighting algorithm under various data-corruption scenarios, where it outperforms state-of-the-art baselines.

    6. Corruption-Robust Algorithms with Uncertainty Weighting for Nonlinear Contextual Bandits and Markov Decision Processes
      Chenlu Ye, Wei Xiong, Quanquan Gu, Tong Zhang, ICML 2023.
      An application of uncertainty-weighted regression under adversarial corruption and general function approximation: a new weight design and new techniques for controlling the sum of weighted bonuses (a counterpart of the elliptical potential lemma).

    7. Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks
      Jianqing Fan*, Zhaoran Wang*, Zhuoran Yang*, Chenlu Ye* (Alphabetical), Preprint.
      A batching framework for high-dimensional multi-armed bandit problems, with simulations on both synthetic and real-world data.
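
      For reference, Catoni-type estimation (used in item 3 above) replaces the empirical mean with an M-estimator built on a truncated influence function. A classical choice, recalled here only as a sketch of the general idea (the paper's exact construction and weighting may differ), is \psi(x) = \log(1 + x + x^2/2) for x \ge 0 and \psi(x) = -\log(1 - x + x^2/2) for x < 0, with the estimate \hat{\theta} defined as a root of \sum_i \psi(\alpha (X_i - \hat{\theta})) = 0 for a scale parameter \alpha > 0.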

  3. Active Learning

    1. Optimal Sample Selection Through Uncertainty Estimation and Its Application in Deep Learning
      Yong Lin*, Chen Liu*, Chenlu Ye*, Qing Lian, Yuan Yao, Tong Zhang, JMLR.
      A theoretically optimal and computationally efficient sample-selection approach that can be effectively applied to deep learning and is robust to misspecification (by down-weighting highly uncertain samples).