CSCI-4800/5800: AI with Reinforcement Learning

Graduate course, SCIENCE-1067 (Tuesdays only) + online asynchronous, 2023

We live in an era of Artificial Intelligence (AI) in which we rely on the responses and actions of numerous autonomous systems woven into our daily lives. These systems are powered by AI that learns to provide reasonable answers tailored to our individual perspectives. Reinforcement learning is one of the most advanced and powerful ways of developing such systems, and it closely mirrors a learning paradigm we have used since childhood: learning from our mistakes. In this course, students will gain a solid foundation in the field of reinforcement learning, study its core challenges, and explore ideas for making such systems more robust, more human-like, and generally better. Through a combination of lectures and programming assignments, students will receive hands-on experience exploring the field. In addition, through the final project, students will deepen their understanding of the reinforcement learning paradigm and, by the end of the semester, will be able to design, develop, and demonstrate applications such as competitive video-game players, autonomous chatbots, autonomous vehicle control systems, and early detection of malicious activity in communication networks in the field of cybersecurity.

Course objectives

By the end of the course, you are expected to have gained the following skills:

  1. learn the key ideas and algorithms of reinforcement learning – a powerful paradigm in the field of machine learning;
  2. understand which problems reinforcement learning algorithms are suited to solve;
  3. apply reinforcement learning algorithms to a variety of practical problems.

Prerequisites

  1. Graduate standing.

Recommended Textbooks

  1. Winder, Phil (2020). Reinforcement Learning: Industrial Applications of Intelligent Agents. O'Reilly Media. [ official-web ]
  2. Sutton, Richard S. and Barto, Andrew G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. [ official-web ], [ PDF ]

Topics planned to be covered

  1. Introduction to Reinforcement Learning
  2. Introduction to Amazon AWS Deepracer
  3. Policy based and value based learning algorithms
  4. Monte Carlo methods
  5. Deep Learning introduction
  6. Comparative analysis of three reinforcement learning paradigms: Dynamic Programming (DP), Monte Carlo (MC), and Temporal Difference (TD) learning
  7. The SARSA algorithm
  8. Non-tabular (function-approximation) approaches in RL
  9. Continuous action spaces
  10. Actor-critic
  11. Value function approximation
  12. Temporal difference learning – TD(λ)

Schedule

Week 1

Total watch time: x hours, y minutes, and z seconds

  1. Introduction to Reinforcement Learning
  2. Course Logistics & expectations
  3. Reinforcement Learning applications (a high-level overview)
  4. Introduction to Reinforcement Learning with OpenAI-Gym [ Notebook/Slide ] [ Github-Repo ] [ Video-Recording, 37:11 ]
  5. Working with OpenAI-Gym environments [Notebook/Slides: ALE/Breakout, Blackjack, CarRacing, ALE/Pong, ALE/Riverraid ][ Github-Repo ] [ Video-Recording, 16:00 ]
  6. Making of an Intelligent CartPole agent [Notebook/Slides: Random CartPole Agent, Q-learning CartPole Agent] [ Github-Repo ] [ Video-Recording, 48:27 ] (a minimal interaction-loop sketch follows this list)
  7. Non-gym environment and Reinforcement learning from scratch [Notebook/Slides: Goal-vs-Hole-v0, Goal-vs-Hole-v1][ Github-Repo ] [ Video-Recording, 35:09 ]
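
As a warm-up for these notebooks, here is a minimal sketch of the agent-environment interaction loop with a random CartPole agent. It is written against the Gymnasium maintained fork of OpenAI Gym (`reset` returning `(obs, info)` and `step` returning a five-tuple); older gym releases use slightly different signatures, so treat the exact API as an assumption.

```python
# A minimal random-agent loop on CartPole, sketched against the
# Gymnasium API; older gym versions differ slightly in reset/step.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random policy: sample any valid action
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print(f"Episode return: {total_reward}")
env.close()
```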

Week 2

Total watch time: x hours, y minutes, and z seconds

  1. Multi-armed bandit problem [Notebook+slides: ashiskb@github link] [A supporting video lecture by Connor Shorten] [Reference study: SB-2] (an ε-greedy sketch follows this list)
  2. Markov Decision Process (MDP) and an introduction to Dynamic Programming for action selection [ Slides ] [Video-Lecture by Dr. B] [Reference study: SB-3, 4]
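
To make the bandit item concrete, below is a minimal ε-greedy sketch for a k-armed bandit with incremental sample-average value estimates, in the spirit of SB-2. The Gaussian reward distributions are illustrative assumptions, not taken from the course materials.

```python
# Epsilon-greedy action selection on a k-armed bandit with
# sample-average estimates (SB, Chapter 2 style).
import numpy as np

rng = np.random.default_rng(0)
k = 10
true_means = rng.normal(0.0, 1.0, size=k)  # hidden arm values (illustrative)
Q = np.zeros(k)                            # estimated action values
N = np.zeros(k)                            # pull counts
epsilon = 0.1

for t in range(2000):
    if rng.random() < epsilon:
        a = int(rng.integers(k))           # explore: random arm
    else:
        a = int(np.argmax(Q))              # exploit: current best arm
    r = rng.normal(true_means[a], 1.0)     # sample a reward
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]              # incremental sample average

print("best arm:", int(np.argmax(true_means)), "most-pulled arm:", int(np.argmax(N)))
```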

Week 3

Total watch time: x hours, y minutes, and z seconds

  1. A deeper dive into MDPs and Dynamic Programming (a value-iteration sketch follows)
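
The value-iteration sketch referenced above: a Bellman optimality backup on a tiny, hypothetical two-state MDP. The transition probabilities and rewards are made up purely for illustration.

```python
# Value iteration on a hypothetical 2-state, 2-action MDP.
import numpy as np

gamma = 0.9
# P[s, a, s'] = transition probability; R[s, a] = expected reward (illustrative).
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s,a,s') V(s')
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:   # stop once values converge
        break
    V = V_new

print("V* ≈", V, "greedy policy:", Q.argmax(axis=1))
```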

Week 4

Total watch time: x hours, y minutes, and z seconds

  1. Temporal Difference Learning
  2. Q-learning (a tabular sketch follows this list)
  3. n-step algorithms
  4. Monte Carlo methods
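
The tabular Q-learning sketch referenced in item 2: one-step Q-learning on a hypothetical 1-D chain environment (invented for illustration; it is not one of the course notebooks).

```python
# Tabular Q-learning on a chain: states 0..5, start at 0, reward +1
# on reaching state 5; action 0 = left, action 1 = right.
# Update rule: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 6, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy behavior policy
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # One-step Q-learning (off-policy TD control) backup
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print("greedy policy (1 = right):", Q.argmax(axis=1))
```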

Week 5

Total watch time: x hours, y minutes, and z seconds

  1. Introduction to AWS Deepracer in the ML lab (LSC-822)

    a. Manufacturer website
    b. Getting started with Deepracer
    c. Deepracer developer guide

Week 6

Total watch time: x hours, y minutes, and z seconds

  1. Value functions with all three algorithms: Notebook (a TD(0) sketch follows)

    a. Dynamic Programming (DP)
    b. Monte Carlo (MC)
    c. Temporal Difference learning (TD), specifically TD(0), also known as one-step temporal difference learning
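
The TD(0) sketch referenced above: estimating the state-value function of a fixed random policy on a five-state random-walk chain similar to the one in SB-6, used here purely as an illustration.

```python
# TD(0) prediction under a fixed random policy on a 5-state random
# walk: non-terminal states 0..4, terminals off both ends, reward +1
# for exiting to the right. True values are 1/6, 2/6, ..., 5/6.
import numpy as np

rng = np.random.default_rng(0)
n = 5
V = np.zeros(n)
alpha, gamma = 0.1, 1.0

for episode in range(5000):
    s = n // 2                             # start in the middle
    while 0 <= s < n:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == n else 0.0
        v_next = V[s_next] if 0 <= s_next < n else 0.0  # terminal value is 0
        # One-step TD backup: V(s) += alpha * (r + gamma * V(s') - V(s))
        V[s] += alpha * (r + gamma * v_next - V[s])
        s = s_next

print("estimated V:", np.round(V, 2))
```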

Week 7

Total watch time: x hours, y minutes, and z seconds

  1. Deep Q-networks
  2. Policy Gradient Methods (a REINFORCE sketch follows this list)
  3. Beyond Policy Gradients
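
For the policy-gradient item, here is a compact REINFORCE sketch on CartPole. It assumes PyTorch and the Gymnasium API; the network size and hyperparameters are illustrative rather than tuned, and this is a sketch, not the course's reference implementation.

```python
# REINFORCE (Monte-Carlo policy gradient) on CartPole with a small
# softmax policy network; assumes PyTorch and Gymnasium.
import gymnasium as gym
import torch
import torch.nn as nn
from torch.distributions import Categorical

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
gamma = 0.99

for episode in range(500):
    obs, info = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        dist = Categorical(logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, info = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted returns G_t, computed backwards over the episode.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    returns = torch.as_tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # variance reduction

    # Policy gradient loss: maximize sum_t log pi(a_t|s_t) * G_t.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```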

Week 8

  • Reserved for midterm

Week 9

Total watch time: x hours, y minutes, and z seconds

  1. Reinforcement Learning in action

Week 10

  • SPRING BREAK!!! No classes scheduled.

Week 11 (3/28/2023)

Total watch time: x hours, y minutes, and z seconds

  1. Model-based planning with Dynamic Programming [Slides] [Video Lecture by David Silver]
  2. Model-free prediction – an introduction to Monte Carlo learning and Temporal Difference learning (a first-visit MC sketch follows this list) [Slides] [Video Lecture by David Silver]
  3. Model-free control – on-policy Monte Carlo control and Temporal Difference learning, as well as off-policy learning [Slides] [Video Lecture by David Silver]
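
The first-visit Monte Carlo sketch referenced in item 2, estimating state values on the same hypothetical five-state random-walk chain by averaging the return observed from each state's first visit per episode.

```python
# First-visit Monte-Carlo prediction on the 5-state random walk:
# sample full episodes, then average returns from first visits.
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 5, 1.0
returns_sum = np.zeros(n)
returns_cnt = np.zeros(n)

for episode in range(5000):
    s, trajectory = n // 2, []
    while 0 <= s < n:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == n else 0.0
        trajectory.append((s, r))          # reward received on leaving s
        s = s_next
    # Walk backwards accumulating the return G; earlier visits overwrite
    # later ones, so each state keeps the return from its FIRST visit.
    G, first_visit = 0.0, {}
    for s, r in reversed(trajectory):
        G = r + gamma * G
        first_visit[s] = G
    for s, G in first_visit.items():
        returns_sum[s] += G
        returns_cnt[s] += 1

print("MC estimate of V:", np.round(returns_sum / np.maximum(returns_cnt, 1), 2))
```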

Week 12 (4/4/2023)

Total watch time: x hours, y minutes, and z seconds

  1. Value function approximation – a deep dive into incremental and batch methods of value function approximation (a linear-approximation sketch follows this list) [Slides] [Video Lecture by David Silver]
  2. Policy gradient methods – a look at different policy gradient approaches, including finite-difference, Monte Carlo, and actor-critic methods [Slides] [Video Lecture by David Silver]
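
The linear-approximation sketch referenced in item 1: semi-gradient TD(0) with a hand-picked two-dimensional feature vector on the same random-walk chain. The features and environment are illustrative assumptions.

```python
# Semi-gradient TD(0) with linear value-function approximation:
# V(s) ≈ w · x(s) on the 5-state random walk.
import numpy as np

rng = np.random.default_rng(0)
n, gamma, alpha = 5, 1.0, 0.05

def features(s):
    # Two features: normalized position and a bias term (illustrative).
    return np.array([(s + 1) / n, 1.0])

w = np.zeros(2)
for episode in range(5000):
    s = n // 2
    while 0 <= s < n:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == n else 0.0
        v_next = w @ features(s_next) if 0 <= s_next < n else 0.0
        # Semi-gradient update: w += alpha * delta * grad_w V(s) = alpha * delta * x(s)
        td_error = r + gamma * v_next - w @ features(s)
        w += alpha * td_error * features(s)
        s = s_next

print("w:", np.round(w, 3), "V:", np.round([w @ features(s) for s in range(n)], 2))
```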

Week 13

Total watch time: x hours, y minutes, and z seconds

  1. Integrating learning and planning – introduces model-based RL, along with integrated architectures and simulation-based search [Slides] [Video Lecture by David Silver]
  2. Exploration and exploitation (a UCB sketch follows this list) [Slides] [Video Lecture by David Silver]
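
The UCB sketch referenced in item 2: UCB1 action selection on the same hypothetical k-armed bandit, trading off the current value estimate against an optimism bonus for rarely tried arms. The reward distributions are again illustrative assumptions.

```python
# UCB1 action selection on a k-armed bandit: pick the arm with the
# highest value estimate plus an exploration bonus c*sqrt(ln t / N).
import numpy as np

rng = np.random.default_rng(0)
k, c = 10, 2.0
true_means = rng.normal(0.0, 1.0, size=k)  # hidden arm values (illustrative)
Q, N = np.zeros(k), np.zeros(k)

for t in range(1, 2001):
    # Optimism bonus shrinks as an arm is pulled more often.
    ucb = Q + c * np.sqrt(np.log(t) / np.maximum(N, 1e-12))
    a = int(np.argmax(np.where(N == 0, np.inf, ucb)))  # try each arm once first
    r = rng.normal(true_means[a], 1.0)
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]              # incremental sample average

print("best arm:", int(np.argmax(true_means)), "most-pulled:", int(np.argmax(N)))
```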

Week 14

Total watch time: x hours, y minutes, and z seconds

  • Final notes (and/or exam review)

Week 15

  1. Project demo

Week 16

  1. Reserved for final exam