
Seminar

Machine Learning and Data Science PhD Student Forum Series (Session 104): Data Quality in Mathematical Post-Training: From Answer Correctness to Verifiable Rewards

Speaker: 孙谌劼

Time: 2026-06-11 16:00-17:00

Venue: Tencent Meeting, ID 928-6293-8217

Abstract:
Mathematical reasoning has become a central target of LLM post-training, where data quality is not only about clean problem–solution pairs, but about reliable and verifiable training signals. This talk reviews the evolution of mathematical post-training data: from human-curated solutions and synthetic reasoning traces, to answer-level verification, rejection sampling, and process reward models.

We then discuss recent work on reinforcement learning with verifiable rewards (RLVR), where the key data unit shifts from a static problem–solution pair to a query–verifier pair. In this setting, high-quality data should be correct, learnable, sufficiently challenging, diverse, and automatically verifiable. The talk concludes with open challenges, including verifier reliability, process-level errors, difficulty selection, synthetic-data noise, and benchmark contamination.
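To make the query–verifier idea concrete, here is a minimal sketch of answer-level verification, rejection sampling, and a pass-rate difficulty estimate. It is illustrative only and not taken from the talk: the names `QueryVerifierPair`, `make_numeric_verifier`, `rejection_sample`, and `pass_rate` are hypothetical, and the verifier assumes final answers are plain numbers or simple fractions.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, List


@dataclass
class QueryVerifierPair:
    """Hypothetical data unit: a problem plus a program that checks answers."""
    query: str
    verifier: Callable[[str], bool]  # maps a candidate final answer to pass/fail


def make_numeric_verifier(reference: str) -> Callable[[str], bool]:
    """Build an answer-level verifier that compares exact rational values."""
    target = Fraction(reference.strip())

    def verify(candidate: str) -> bool:
        try:
            return Fraction(candidate.strip()) == target
        except (ValueError, ZeroDivisionError):
            return False  # unparseable answers earn no reward

    return verify


def rejection_sample(pair: QueryVerifierPair, candidates: List[str]) -> List[str]:
    """Keep only the sampled answers that the verifier accepts."""
    return [c for c in candidates if pair.verifier(c)]


def pass_rate(pair: QueryVerifierPair, candidates: List[str]) -> float:
    """Fraction of samples that pass; a rough proxy for problem difficulty."""
    if not candidates:
        return 0.0
    return sum(1.0 for c in candidates if pair.verifier(c)) / len(candidates)


# Usage with made-up model outputs for a toy problem.
pair = QueryVerifierPair("What is 1/4 + 1/4?", make_numeric_verifier("1/2"))
samples = ["0.5", "2/4", "0.25", "one half"]
kept = rejection_sample(pair, samples)   # ["0.5", "2/4"]
rate = pass_rate(pair, samples)          # 0.5
```

Under these assumptions, a pass rate near 0 or 1 would flag a problem as too hard or too easy to be a useful training signal, which is one simple way to read the "learnable" and "sufficiently challenging" criteria above. Note that "one half" is rejected even though it is correct, a small instance of the verifier-reliability problem.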

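About the forum: This online forum is organized by the machine learning lab of Professor Zhang Zhihua (张志华) and is held every two weeks, except on public holidays. Each session invites one PhD student to give a systematic, in-depth introduction to a frontier topic. Topics include, but are not limited to, machine learning, high-dimensional statistics, operations research and optimization, and theoretical computer science.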
