School of Physics "Boyue Academic Forum" - Dong Yan - No. 435

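Source: Prof. Jin-Jian Zhou  Author: Dr. Dong Yan (BaiChuan Intelligence)  Published: 2024-03-21

Invited by: Prof. Jin-Jian Zhou

Speaker: Dr. Dong Yan (BaiChuan Intelligence)

Time: 2024-03-21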

Venue: Room 229, Physics Experiment Center, Liangxiang Campus

Speaker profile:

School of Physics Boyue Academic Forum Lecture Series

No. 435

Title: From Scale to Interaction - The LLM Journey

Speaker: Dr. Dong Yan (BaiChuan Intelligence)

Time: 2024-03-21 (Thursday), 10:00 am

Venue: Room 229, Physics Experiment Center, Liangxiang Campus

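Abstract:

Large language models (LLMs), represented by the GPT series, are profoundly changing how human society operates. This talk discusses the two training stages of LLMs, Pretrain (pretraining) and Alignment, from two angles: Scale and Interaction. The pretraining part starts from next token prediction, currently the only first principle for AGI that is able to scale, and traces how the technology has developed. The Alignment part starts from the perspective of Exploration & Exploitation and explains how Human Feedback is used to align a model with human preferences.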

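Biography:

Dong Yan received his Ph.D. from the Department of Computer Science at Tsinghua University. He has served as a researcher at Intel China, a postdoctoral fellow in the Department of Computer Science at Tsinghua University, and head of the frontier decision-making direction of machine intelligence foundations at Qi Yuan Laboratory. His research focuses on decision-making algorithms and systems. On the algorithm side, he proposed a framework that connects model-free and model-based reinforcement learning algorithms through a reward distribution mechanism. On the systems side, he was the architect of the reinforcement learning programming framework Tianshou (天授), which has received over 6.6k stars and 1k forks on GitHub, with the related paper published in JMLR. Awards: runner-up in the 2017 ViZDoom challenge and champion in 2018 (team captain); champion of the Tencent Kaiwu ("Enlightenment") Honor of Kings challenge in 2022/2023 (advisor); 9th place in the beyond-visual-range category of the 2023 Tian Xing Cup intelligent air combat competition (306 teams in total; team lead). He is currently head of reinforcement learning at BaiChuan Intelligence.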

Contact: jjzhou@bit.edu.cn

Invited by: Prof. Jin-Jian Zhou

Website: /

Organizer: Key Laboratory of Advanced Optoelectronic Quantum Architecture and Measurement (Ministry of Education), School of Physics

*Title: From Scale to Interaction - The LLM Journey

*Speaker: Dr. Dong Yan, Head of Reinforcement Learning at BaiChuan Intelligence

*Time: Mar. 21st, 2024 (Thursday), 10:00 am

*Place: Room 229, Physics Experiment Center, Liangxiang Campus

*Contact Person: Prof. Jin-Jian Zhou, jjzhou@bit.edu.cn

*Abstract:

Large Language Models (LLMs), represented by the GPT series, are profoundly changing the way human society operates. This talk discusses the two training phases of LLMs, Pretrain and Alignment, from the aspects of Scale and Interaction. The pretraining part starts from next token prediction, the only first-principle approach at the current stage of AGI that is able to scale, and traces the development of the technology. The Alignment part starts from the perspective of Exploration & Exploitation and explains how Human Feedback is used to align a model with human preferences.

*Profile:

Dong Yan received his Ph.D. from the Department of Computer Science at Tsinghua University. He has been a researcher at Intel China, a postdoctoral fellow in the Department of Computer Science at Tsinghua University, and the head of the frontier decision-making direction of machine intelligence foundations at Qi Yuan Laboratory. His research focuses on decision-making algorithms and systems. On the algorithm side, he proposed a framework that connects model-free and model-based reinforcement learning algorithms through a reward distribution mechanism. On the systems side, he was the architect of the reinforcement learning programming framework Tianshou, which has received over 6.6k stars and 1k forks on GitHub, with the related paper published in JMLR. His awards include runner-up in the 2017 ViZDoom challenge and champion in 2018 (as team captain), champion of the Tencent Kaiwu ("Enlightenment") Honor of Kings challenge in 2022/2023 (as advisor), and 9th place (out of 306 teams, as team lead) in the beyond-visual-range category of the 2023 Tian Xing Cup intelligent air combat competition. He is currently the head of Reinforcement Learning at BaiChuan Intelligence.
