Building a GENERAL AI agent with reinforcement learning

Published On Mar 20, 2024

Dr. Minqi Jiang and Dr. Marc Rigter explain a new method for making agents more general-purpose: training them to model many different worlds before their usual goal-directed training, known as "reinforcement learning".

Their new paper is called "Reward-free curricula for training robust world models" https://arxiv.org/pdf/2306.09205.pdf

  / minqijiang
  / marcrigter

Interviewer: Dr. Tim Scarfe

Please support us on Patreon. Tim is now doing MLST full-time and taking a massive financial hit. If you love MLST and want it to continue, please show your support! In return, you get very early access to shows, plus a private Discord and networking.   / mlst

We are also looking for show sponsors. Please get in touch if interested: mlstreettalk at gmail.

MLST Discord:   / discord

00:00:00 - Intro
00:01:05 - Model-based Setting
00:02:41 - Similar to POET Paper
00:05:27 - Minimax Regret
00:07:21 - Why Explicitly Model the World?
00:12:47 - Minimax Regret Continued
00:18:17 - Why Would It Converge
00:20:36 - Latent Dynamics Model
00:24:34 - MDPs
00:27:11 - Latent
00:29:53 - Intelligence is Specialised / Overfitting / Sim2real
00:39:39 - Open-endedness
00:44:38 - Creativity
00:48:06 - Intrinsic Motivation
00:51:12 - Deception / Stanley
00:53:56 - Sutton / Reward is Enough
01:00:43 - Are LLMs Just Model Retrievers?
01:03:14 - Do LLMs Model the World?
01:09:49 - Dreamer and Plan to Explore
01:13:14 - Synthetic Data
01:15:21 - WAKER Paper Algorithm
01:21:24 - Emergent Curriculum
01:31:16 - Even Current AI is Externalised/Mimetic
01:36:39 - Brain Drain Academia
01:40:10 - Bitter Lesson / Do We Need Computation
01:44:31 - The Need for Modelling Dynamics
01:47:48 - Need for Memetic Systems
01:50:14 - Results of the Paper and OOD Motifs
01:55:47 - Interface Between Humans and ML
