CSCI-4800/5800: AI with Reinforcement Learning

Graduate course, SCIENCE-1067 (Tuesdays only) + online asynchronous, 2023

We live in an era of Artificial Intelligence (AI) in which we rely on the responses and actions of numerous autonomous systems woven into our daily lives. These systems are powered by AI that learns to provide reasonable answers tailored to our individual perspectives. Reinforcement learning is one of the most advanced and powerful ways of developing such systems, and it closely mirrors a learning paradigm we have used since childhood: learning from our mistakes. In this course, students will gain a solid foundation in the field of reinforcement learning, study its core challenges, and explore ideas for making such systems more robust, more human-like, and generally better. Through a combination of lectures and programming assignments, students will receive hands-on experience exploring the field. In addition, through the final project, students will deepen their understanding of the reinforcement learning paradigm and, by the end of the semester, will be able to design, develop, and demonstrate applications such as competitive video-game players, autonomous chatbots, autonomous vehicle control systems, and early detection of malicious activity in communication networks in the field of cybersecurity.

Course objectives

By the end of the course, you are expected to have gained the following skills:

  1. learn the key ideas and algorithms of reinforcement learning – a powerful paradigm in the field of machine learning;
  2. understand which problems reinforcement learning algorithms are suited to solve;
  3. apply reinforcement learning algorithms to a variety of practical problems.

Prerequisites

  1. Graduate standing.

Recommended Textbooks

  1. Winder, Phil (2020). Reinforcement Learning: Industrial Applications of Intelligent Agents. O'Reilly Media. [ official-web ]
  2. Sutton, Richard S. and Barto, Andrew G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. [ official-web ], [ PDF ]

Topics planned to be covered

  1. Introduction to Reinforcement Learning
  2. Introduction to Amazon AWS Deepracer
  3. Policy based and value based learning algorithms
  4. Monte Carlo methods
  5. Deep Learning introduction
  6. Comparative analysis of three reinforcement learning paradigms: Dynamic Programming (DP), Monte Carlo (MC), and Temporal Difference (TD) learning
  7. The SARSA algorithm
  8. Non-tabular (function-approximation) approaches in RL
  9. Continuous action spaces
  10. Actor-critic
  11. Value function approximation
  12. Temporal difference learning – TD(λ)

Schedule

Week 1

Total watch time: x hours, y minutes, and z seconds

  1. Introduction to Reinforcement Learning
  2. Course Logistics & expectations
  3. Reinforcement Learning applications (a high-level overview)
  4. Introduction to Reinforcement Learning with OpenAI-Gym [ Notebook/Slide ] [ Github-Repo ] [ Video-Recording, 37:11 ]
  5. Working with OpenAI-Gym environments [Notebook/Slides: ALE/Breakout, Blackjack, CarRacing, ALE/Pong, ALE/Riverraid ][ Github-Repo ] [ Video-Recording, 16:00 ]
  6. Making of an Intelligent CartPole agent [Notebook/Slides: Random CartPole Agent, Q-learning CartPole Agent] [ Github-Repo ] [ Video-Recording, 48:27 ] (a minimal interaction-loop sketch follows this list)
  7. Non-gym environment and Reinforcement learning from scratch [Notebook/Slides: Goal-vs-Hole-v0, Goal-vs-Hole-v1][ Github-Repo ] [ Video-Recording, 35:09 ]
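
As a warm-up for these notebooks, here is a minimal sketch of the agent-environment interaction loop with a random CartPole agent. It is written against the Gymnasium maintained fork of OpenAI Gym (`reset` returning `(obs, info)` and `step` returning a five-tuple); older gym releases use slightly different signatures, so treat the exact API as an assumption.

```python
# A minimal random-agent loop on CartPole, sketched against the
# Gymnasium API; older gym versions differ slightly in reset/step.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random policy: sample any valid action
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"Episode return: {total_reward}")
env.close()
```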

Week 2

Total watch time: x hours, y minutes, and z seconds

  1. Multi-armed bandit problem [Notebook+slides: ashiskb@github link] [A supporting video lecture by Connor Shorten] [Reference study: SB-2] (an ε-greedy sketch follows this list)
  2. Markov Decision Process (MDP) and an introduction to Dynamic Programming for action selection [ Slides ] [Video-Lecture by Dr. B] [Reference study: SB-3, 4]
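
To make the bandit item concrete, below is a minimal ε-greedy sketch for a k-armed bandit with incremental sample-average value estimates, in the spirit of SB-2. The Gaussian reward distributions are illustrative assumptions, not taken from the course materials.

```python
# Epsilon-greedy action selection on a k-armed bandit with
# sample-average estimates (SB, Chapter 2 style).
import numpy as np

rng = np.random.default_rng(0)
k = 10
true_means = rng.normal(0.0, 1.0, size=k)  # hidden arm values (illustrative)
Q = np.zeros(k)                            # estimated action values
N = np.zeros(k)                            # pull counts
epsilon = 0.1

for t in range(2000):
    if rng.random() < epsilon:
        a = int(rng.integers(k))           # explore: random arm
    else:
        a = int(np.argmax(Q))              # exploit: current best arm
    r = rng.normal(true_means[a], 1.0)     # sample a reward
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]              # incremental sample average

print("best arm:", int(np.argmax(true_means)), "most-pulled arm:", int(np.argmax(N)))
```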

Week 3

Total watch time: x hours, y minutes, and z seconds

  1. A deeper dive into MDPs and Dynamic Programming (a value-iteration sketch follows)
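
The value-iteration sketch referenced above: a Bellman optimality backup on a tiny, hypothetical two-state MDP. The transition probabilities and rewards are made up purely for illustration.

```python
# Value iteration on a hypothetical 2-state, 2-action MDP.
import numpy as np

gamma = 0.9
# P[s, a, s'] = transition probability; R[s, a] = expected reward (illustrative).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:   # stop once values converge
        break
    V = V_new

print("V* ≈", V, "greedy policy:", Q.argmax(axis=1))
```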

Week 4

Total watch time: x hours, y minutes, and z seconds

  1. Temporal Difference Learning
  2. Q-learning (a tabular sketch follows this list)
  3. n-step algorithms
  4. Monte Carlo methods
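
The tabular Q-learning sketch referenced in item 2: one-step Q-learning on a hypothetical 1-D chain environment (invented for illustration; it is not one of the course notebooks).

```python
# Tabular Q-learning on a chain: states 0..5, start at 0, reward +1
# on reaching state 5; action 0 = left, action 1 = right.
# Update rule: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 6, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy behavior policy
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # One-step Q-learning (off-policy TD control) backup
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print("greedy policy (1 = right):", Q.argmax(axis=1))
```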

Week 5

Total watch time: x hours, y minutes, and z seconds

  1. Introduction to AWS Deepracer in the ML lab (LSC-822)

    a. Manufacturer website
    b. Getting started with Deepracer
    c. Deepracer developer guide

Week 6

Total watch time: x hours, y minutes, and z seconds

  1. Value functions with all three algorithms: Notebook (a TD(0) sketch follows)

    a. Dynamic Programming (DP)
    b. Monte Carlo (MC)
    c. Temporal Difference learning (TD), specifically TD(0), also known as one-step temporal difference learning
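
The TD(0) sketch referenced above: estimating the state-value function of a fixed random policy on a five-state random-walk chain similar to the one in SB-6, used here purely as an illustration.

```python
# TD(0) prediction under a fixed random policy on a 5-state random
# walk: non-terminal states 0..4, terminals off both ends, reward +1
# for exiting to the right. True values are 1/6, 2/6, ..., 5/6.
import numpy as np

rng = np.random.default_rng(0)
n = 5
V = np.zeros(n)
alpha, gamma = 0.1, 1.0

for episode in range(5000):
    s = n // 2                             # start in the middle
    while 0 <= s < n:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == n else 0.0
        v_next = V[s_next] if 0 <= s_next < n else 0.0  # terminal value is 0
        # One-step TD backup: V(s) += alpha * (r + gamma * V(s') - V(s))
        V[s] += alpha * (r + gamma * v_next - V[s])
        s = s_next

print("estimated V:", np.round(V, 2))
```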

Week 7

Total watch time: x hours, y minutes, and z seconds

  1. Deep Q-networks
  2. Policy Gradient Methods (a REINFORCE sketch follows this list)
  3. Beyond Policy Gradients
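
For the policy-gradient item, here is a compact REINFORCE sketch on CartPole. It assumes PyTorch and the Gymnasium API; the network size and hyperparameters are illustrative rather than tuned, and this is a sketch, not the course's reference implementation.

```python
# REINFORCE (Monte-Carlo policy gradient) on CartPole with a small
# softmax policy network; assumes PyTorch and Gymnasium.
import gymnasium as gym
import torch
import torch.nn as nn
from torch.distributions import Categorical

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
gamma = 0.99

for episode in range(500):
    obs, info = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        dist = Categorical(logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, info = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns G_t, computed backwards over the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    returns = torch.as_tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction

    # Policy gradient loss: maximize sum_t log pi(a_t|s_t) * G_t.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```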

Week 8

  • Reserved for midterm

Week 9

Total watch time: x hours, y minutes, and z seconds

  1. Reinforcement Learning in action

Week 10

  • SPRING BREAK!!! No classes scheduled.

Week 11 (3/28/2023)

Total watch time: x hours, y minutes, and z seconds

  1. Model-based planning with Dynamic Programming [Slides] [Video Lecture by David Silver]
  2. Model-free prediction – an introduction to Monte Carlo learning and Temporal Difference learning (a first-visit MC sketch follows this list) [Slides] [Video Lecture by David Silver]
  3. Model-free control – on-policy Monte Carlo control and Temporal Difference learning, as well as off-policy learning [Slides] [Video Lecture by David Silver]
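
The first-visit Monte Carlo sketch referenced in item 2, estimating state values on the same hypothetical five-state random-walk chain by averaging the return observed from each state's first visit per episode.

```python
# First-visit Monte-Carlo prediction on the 5-state random walk:
# sample full episodes, then average returns from first visits.
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 5, 1.0
returns_sum = np.zeros(n)
returns_cnt = np.zeros(n)

for episode in range(5000):
    s, trajectory = n // 2, []
    while 0 <= s < n:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == n else 0.0
        trajectory.append((s, r))          # reward received on leaving s
        s = s_next
    # Walk backwards accumulating the return G; earlier visits overwrite
    # later ones, so each state keeps the return from its FIRST visit.
    G, first_visit = 0.0, {}
    for s, r in reversed(trajectory):
        G = r + gamma * G
        first_visit[s] = G
    for s, G in first_visit.items():
        returns_sum[s] += G
        returns_cnt[s] += 1

print("MC estimate of V:", np.round(returns_sum / np.maximum(returns_cnt, 1), 2))
```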

Week 12 (4/4/2023)

Total watch time: x hours, y minutes, and z seconds

  1. Value function approximation – a deep dive into incremental and batch methods of value function approximation (a linear-approximation sketch follows this list) [Slides] [Video Lecture by David Silver]
  2. Policy gradient methods – a look at different policy gradient approaches, including finite-difference, Monte Carlo, and actor-critic methods [Slides] [Video Lecture by David Silver]
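
The linear-approximation sketch referenced in item 1: semi-gradient TD(0) with a hand-picked two-dimensional feature vector on the same random-walk chain. The features and environment are illustrative assumptions.

```python
# Semi-gradient TD(0) with linear value-function approximation:
# V(s) ≈ w · x(s) on the 5-state random walk.
import numpy as np

rng = np.random.default_rng(0)
n, gamma, alpha = 5, 1.0, 0.05

def features(s):
    # Two features: normalized position and a bias term (illustrative).
    return np.array([(s + 1) / n, 1.0])

w = np.zeros(2)
for episode in range(5000):
    s = n // 2
    while 0 <= s < n:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == n else 0.0
        v_next = w @ features(s_next) if 0 <= s_next < n else 0.0
        # Semi-gradient update: w += alpha * delta * grad_w V(s) = alpha * delta * x(s)
        td_error = r + gamma * v_next - w @ features(s)
        w += alpha * td_error * features(s)
        s = s_next

print("w:", np.round(w, 3), "V:", np.round([w @ features(s) for s in range(n)], 2))
```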

Week 13

Total watch time: x hours, y minutes, and z seconds

  1. Integrating learning and planning – introduces model-based RL, along with integrated architectures and simulation-based search [Slides] [Video Lecture by David Silver]
  2. Exploration and exploitation (a UCB sketch follows this list) [Slides] [Video Lecture by David Silver]
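
The UCB sketch referenced in item 2: UCB1 action selection on the same hypothetical k-armed bandit, trading off the current value estimate against an optimism bonus for rarely tried arms. The reward distributions are again illustrative assumptions.

```python
# UCB1 action selection on a k-armed bandit: pick the arm with the
# highest value estimate plus an exploration bonus c*sqrt(ln t / N).
import numpy as np

rng = np.random.default_rng(0)
k, c = 10, 2.0
true_means = rng.normal(0.0, 1.0, size=k)  # hidden arm values (illustrative)
Q, N = np.zeros(k), np.zeros(k)

for t in range(1, 2001):
    # Optimism bonus shrinks as an arm is pulled more often.
    ucb = Q + c * np.sqrt(np.log(t) / np.maximum(N, 1e-12))
    a = int(np.argmax(np.where(N == 0, np.inf, ucb)))  # try each arm once first
    r = rng.normal(true_means[a], 1.0)
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]              # incremental sample average

print("best arm:", int(np.argmax(true_means)), "most-pulled:", int(np.argmax(N)))
```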

Week 14

Total watch time: x hours, y minutes, and z seconds

  • Final notes (and/or exam review)

Week 15

  1. Project demo

Week 16

  1. Reserved for final exam