Title: Unified continuous-time q-learning for mean-field game and mean-field control problems
Speaker: Xiaoli Wei (魏曉利)
Time: Thursday, November 20, 2025, 19:00-20:00
Venue: Tencent Meeting (Meeting ID: 706282801)
Abstract: In this talk, we study continuous-time q-learning in mean-field jump-diffusion models when the population distribution is not directly observable. We propose the integrated q-function in decoupled form (decoupled Iq-function) from the representative agent's perspective and establish its martingale characterization, which provides a unified policy evaluation rule for both mean-field game (MFG) and mean-field control (MFC) problems. Moreover, we consider the learning procedure in which the representative agent updates the population distribution based on its own state values. Depending on whether the task is to solve the MFG or the MFC problem, the decoupled Iq-function can be employed differently to characterize the mean-field equilibrium policy or the mean-field optimal policy, respectively. Based on these theoretical findings, we devise a unified q-learning algorithm for both MFG and MFC problems by utilizing test policies and the averaged martingale orthogonality condition. For several financial applications in the jump-diffusion setting, we obtain exact parameterizations of the decoupled Iq-functions and the value functions, and illustrate that our q-learning algorithm achieves satisfactory performance.
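As background, the martingale characterization mentioned in the abstract extends the single-agent continuous-time q-learning criterion of Jia and Zhou to the mean-field setting. The following is only a minimal sketch of that single-agent condition in an undiscounted finite-horizon formulation, not the paper's exact decoupled Iq-function, whose precise form also involves the population distribution: a candidate $\hat q$ is the q-function of a policy $\pi$ with value function $J$ only if, along state trajectories $X^{\pi}$ with actions $a^{\pi}$ sampled from $\pi$, the process
\[
J(t, X^{\pi}_t) + \int_0^t \bigl[ r(s, X^{\pi}_s, a^{\pi}_s) - \hat q(s, X^{\pi}_s, a^{\pi}_s) \bigr] \, \mathrm{d}s
\]
is a martingale on $[0,T]$, where $r$ denotes the running reward. On sampled data, such a property can be enforced by requiring the increments of this process to be orthogonal, on average, to chosen test processes; this is roughly the role played by the test policies and the averaged martingale orthogonality condition referred to above.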
Speaker bio: Xiaoli Wei is an Associate Professor (tenure-track) at Harbin Institute of Technology. She received her bachelor's degree from the University of Science and Technology of China and her Ph.D. from Université Paris Diderot (Paris 7) in 2018. From 2019 to 2021 she was a postdoctoral researcher at the University of California, Berkeley, and from 2021 to 2023 she worked at Tsinghua Shenzhen International Graduate School. Her research focuses on stochastic differential games and reinforcement learning. Her papers have appeared in journals including Operations Research, Mathematical Finance, and SIAM Journal on Control and Optimization.