
CS 728: Theory of Multi-Armed Bandits

Credits: 3-0-0-0 [9] 

Prerequisites: Knowledge of discrete probability

Course Description

In many real-world scenarios, decision-makers face environments where the best action is unknown and choices must be made under uncertainty. The challenge lies in balancing exploration (learning which actions yield the highest rewards) and exploitation (leveraging known actions to maximize reward). This tension, known as the exploration-exploitation trade-off, is central to the theory of multi-armed bandits, a framework that has been studied for nearly a century. Bandit algorithms are widely applicable in areas such as online advertising, recommendation systems, auctions, network routing, e-commerce, and any setting where information is collected progressively over time. In this course, we will study a variety of bandit algorithms designed for different random environments, with the goal of maximizing reward over the duration of play. The course will cover both foundational concepts and recent research developments, and it will serve as a complement to courses on online convex optimization and reinforcement learning.

 

Course Contents

• Stochastic Bandits - ETC (Explore-Then-Commit), Successive Elimination, and UCB algorithms. Overview of information theory; worst-case and instance-dependent lower bounds. (A brief illustrative UCB sketch appears after this list.)

• Adversarial Bandits - Hedge, EXP3, EXP4 algorithms.

• Online Convex Optimization, Follow the Regularized Leader (FTRL) algorithm.

• Contextual Bandits, Pure Exploration.

• Miscellaneous topics: Bandits and Games, Reinforcement Learning, Combinatorial Bandits, Thompson Sampling.
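
As an illustration of the flavor of the stochastic bandit algorithms listed above, the following is a minimal Python sketch of the UCB1 index rule on Bernoulli arms. It is not part of the official course material; the function name, arm means, and horizon below are hypothetical choices made only for the example.

import math
import random

def ucb1(arm_means, horizon, seed=0):
    # Minimal UCB1 sketch: after pulling each arm once, play the arm
    # maximizing (empirical mean) + sqrt(2 * ln(t) / number of pulls).
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k       # number of pulls per arm
    sums = [0.0] * k       # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # initialization: pull every arm once
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

# Example: three Bernoulli arms with hypothetical means; UCB1's regret
# against the best arm (mean 0.7) grows only logarithmically in the horizon.
print(ucb1([0.3, 0.5, 0.7], horizon=10000))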

 

Reference Books

1. Lattimore, T., & Szepesvári, C. (2020). Bandit algorithms. Cambridge University Press.

2. Slivkins, A. (2019). Introduction to multi-armed bandits. Foundations and Trends in Machine Learning, 12(1-2), 1-286.