🚀 I’m very excited to share two of our recent works on offline reinforcement learning and its application to sequential recommendation, accepted at SIGIR 2025 and KDD 2025!
1. Offline Trajectory Optimization for Offline Reinforcement Learning
(accepted at #KDD2025)
Authors: Ziqi Zhao, Zhaochun Ren, Liu Yang, Yunsen Liang, Fajie Yuan, Pengjie Ren, Zhumin Chen, Jun Ma, Xin Xin
Code: https://lnkd.in/esKZJrBc
Preprint: https://lnkd.in/eMJ7T4D6
In this work, we propose OTTO, a plug-in framework that improves offline RL by generating long-horizon trajectories using an ensemble of Transformers. We further introduce an uncertainty-aware evaluator to correct low-confidence simulations, leading to more reliable and effective data augmentation. OTTO significantly boosts policy performance across standard and challenging RL benchmarks.
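The core idea, ensemble-based rollouts with disagreement as an uncertainty signal, can be sketched as follows. This is a minimal illustration only: the `WorldModel` here is a toy linear dynamics model standing in for OTTO's Transformers, and all names and the threshold are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class WorldModel:
    """Toy stand-in for a learned Transformer world model: each ensemble
    member gets its own random linear dynamics."""
    def __init__(self, dim, seed):
        r = np.random.default_rng(seed)
        self.W = r.normal(scale=0.1, size=(dim, dim))

    def step(self, state, action):
        return state @ self.W + action

def rollout(ensemble, state, actions, threshold=0.5):
    """Generate a long-horizon trajectory; flag low-confidence steps
    where ensemble members disagree (high variance = high uncertainty)."""
    traj, confident = [state], []
    for a in actions:
        preds = np.stack([m.step(state, a) for m in ensemble])
        uncertainty = preds.std(axis=0).mean()  # ensemble disagreement
        state = preds.mean(axis=0)              # ensemble-mean next state
        traj.append(state)
        confident.append(uncertainty < threshold)
    return np.stack(traj), np.array(confident)

dim = 4
ensemble = [WorldModel(dim, seed=s) for s in range(5)]
actions = rng.normal(size=(10, dim))
traj, keep = rollout(ensemble, np.zeros(dim), actions)
print(traj.shape, keep.shape)  # (11, 4) (10,)
```

Low-confidence steps (where `keep` is False) are the ones OTTO's uncertainty-aware evaluator would correct before the generated trajectories are used for augmentation.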
2. Improving Sequential Recommenders through Counterfactual Augmentation of System Exposure
(accepted at #SIGIR2025)
Authors: Ziqi Zhao, Zhaochun Ren, Jiyuan Yang, Zuming Yan, Zihan Wang, Liu Yang, Pengjie Ren, Zhumin Chen, Maarten de Rijke, Xin Xin
Code: https://lnkd.in/eQK-KDzp
Preprint: https://lnkd.in/ekpVBR-s
We investigate sequential recommendation from a system exposure perspective and propose CaseRec, which uses a decision transformer-based offline RL model to capture user exposure behaviors. By generating counterfactual exposure sequences through data augmentation and a user simulator, CaseRec explores unseen user interests and effectively reduces exposure bias.
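The counterfactual-augmentation step can be sketched in a few lines: swap some logged exposures for unexposed items and relabel feedback with a simulator. This is a simplified illustration under stated assumptions, not CaseRec's actual pipeline; the `user_simulator` and its click rule are invented for the example, and the decision transformer component is omitted entirely.

```python
import random

random.seed(0)

def user_simulator(user_pref, item):
    # Hypothetical simulator: the user clicks items whose (toy)
    # category, item % 3, matches their preferred category.
    return item % 3 == user_pref

def counterfactual_augment(exposure_seq, user_pref, catalog, n_swaps=2):
    """Replace a few logged exposures with unexposed catalog items and
    relabel feedback via the simulator, yielding a counterfactual sequence."""
    seq = list(exposure_seq)
    for pos in random.sample(range(len(seq)), n_swaps):
        seq[pos] = random.choice([i for i in catalog if i not in seq])
    feedback = [user_simulator(user_pref, item) for item in seq]
    return seq, feedback

logged = [3, 7, 12, 5, 9]           # items the system actually exposed
aug_seq, aug_fb = counterfactual_augment(logged, user_pref=0,
                                         catalog=range(50))
print(len(aug_seq), len(aug_fb))    # 5 5
```

The augmented sequences expose the recommender to item combinations the logging policy never showed, which is what lets the method probe user interests beyond the logged exposure distribution.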