Reinforcement Learning Projects

This repository contains two reinforcement learning projects:

- Treasure Hunt in the Frozen Lake (Dynamic Programming)
- Optimizing Movie Recommendations Using Multi-Armed Bandits

Project 1: Treasure Hunt in the Frozen Lake

Overview

This project implements a reinforcement learning agent using dynamic programming to solve a modified version of the FrozenLake environment. The agent navigates a slippery 5x5 grid to collect treasures while avoiding holes, with the ultimate goal of reaching the exit.

Environment Description

Grid: 5x5 FrozenLake with the following tile types:
- Start (S): Initial position, safe to step
- Frozen Tiles (F): Safe to step
- Hole (H): Falling in ends the game with a -10 reward
- Goal (G): Exit point, reaching it ends the game with a +10 reward
- Treasure Tiles (T): Awards +5 reward and converts to a frozen tile after collection

State Space

Current position of the agent (row, column)
Boolean flags indicating whether each treasure has been collected
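
A minimal sketch of how such a composite state could be packed into a single integer index for tabular dynamic programming. The function names, the bitmask layout, and the choice of two treasures in the test values are assumptions made for illustration; the notebook may encode states differently.

```python
def encode_state(row, col, collected, size=5):
    """Pack (row, col, treasure flags) into one integer index.

    `collected` is a tuple of booleans, one per treasure (hypothetical layout).
    """
    position = row * size + col
    # Turn the boolean flags into a bitmask: flag i sets bit i.
    mask = sum(1 << i for i, flag in enumerate(collected) if flag)
    return mask * size * size + position


def decode_state(index, n_treasures, size=5):
    """Inverse of encode_state."""
    mask, position = divmod(index, size * size)
    row, col = divmod(position, size)
    collected = tuple(bool((mask >> i) & 1) for i in range(n_treasures))
    return row, col, collected
```

With k treasures on a 5x5 grid this layout yields 25 * 2^k distinct states, which is why the state space grows quickly as treasures are added.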

Action Space

Four possible moves: up, down, left, right

Rewards

Goal (G): +10
Treasure (T): +5 per treasure
Hole (H): -10
Frozen tiles (F): 0
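
The reward scheme above can be sketched as a small lookup. The names `REWARDS`, `TERMINAL`, and `step_outcome` are hypothetical, and giving the start tile a reward of 0 is an assumption (the list above only says it is safe to step on).

```python
# Reward per tile type, taken from the list above; "S" = 0 is an assumption.
REWARDS = {"G": 10, "T": 5, "H": -10, "F": 0, "S": 0}
TERMINAL = {"G", "H"}  # tiles that end the game


def step_outcome(tile, already_collected=False):
    """Return (reward, done) for landing on `tile`.

    A treasure that was already collected behaves like a frozen tile.
    """
    if tile == "T" and already_collected:
        tile = "F"
    return REWARDS[tile], tile in TERMINAL
```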

Implementation Details

Custom environment created by modifying the existing "FrozenLakeNotSlippery-v0" in OpenAI Gym
Dynamic programming using value iteration and policy improvement
Calculation of state-value function (V*) for each state
Comparison of agent performance with and without treasures
Visualization of the agent's direction on the map using the learned policy
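
For reference, a generic value iteration loop followed by greedy policy extraction might look like the sketch below. It is not the notebook's code. It assumes a transition table `P[state][action]` holding `(probability, next_state, reward, done)` tuples, which is the shape Gym's FrozenLake exposes as `P`; the discount factor, the convergence threshold, and the three-state toy chain are illustrative assumptions.

```python
def q_value(P, V, state, action, gamma):
    """Expected return of taking `action` in `state`, then following V."""
    total = 0.0
    for prob, next_state, reward, done in P[state][action]:
        future = 0.0 if done else gamma * V[next_state]
        total += prob * (reward + future)
    return total


def value_iteration(P, gamma=0.9, theta=1e-8):
    """Return (V, policy) for the transition table P."""
    V = {state: 0.0 for state in P}
    while True:
        delta = 0.0
        for state in P:
            best = max(q_value(P, V, state, a, gamma) for a in P[state])
            delta = max(delta, abs(best - V[state]))
            V[state] = best
        if delta < theta:  # stop once a full sweep changes no value noticeably
            break
    # Policy improvement step: act greedily with respect to the converged V.
    policy = {
        state: max(P[state], key=lambda a: q_value(P, V, state, a, gamma))
        for state in P
    }
    return V, policy


# Toy chain: 0 -> 1 -> 2 (goal). Action 0 moves left, action 1 moves right.
toy_P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 10.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
toy_V, toy_policy = value_iteration(toy_P)
```

In the toy chain the value of a state shrinks by the discount factor for each step it sits away from the goal, which is the same effect that makes distant treasures less attractive than nearby ones.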

Key Findings

The agent successfully learns to navigate the environment and collect treasures
Trade-offs exist between risk-taking (collecting treasures) and safety (avoiding holes)
The optimal policy balances reward maximization with risk minimization

Project 2: Optimizing Movie Recommendations Using Multi-Armed Bandits

Overview

This project implements a movie recommendation system using Multi-Armed Bandit (MAB) algorithms to maximize cumulative user satisfaction. The system dynamically allocates recommendations by learning user preferences in real-time, balancing exploration and exploitation.

Scenario

A movie streaming platform (TrendMovie Inc.) aims to optimize its recommendation strategy to deliver maximum user satisfaction while maintaining high engagement. Each movie recommendation is treated as an interaction with the user, and their feedback is used to refine the recommendation strategy dynamically.

Dataset

The dataset contains user ratings for various movies, including:

User ID: A unique identifier for each user
Movie ID: A unique identifier for each movie
Rating: A score provided by the user for a movie (on a scale of 1 to 5)
Timestamp: The time when the rating was given
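
One way to turn rows of this shape into bandit arms is to group ratings by movie, so that each movie is an arm and each rating is a reward sample. The sketch below assumes rows are `(user_id, movie_id, rating, timestamp)` tuples; the function name and the sample values in the test are made up for illustration and do not come from the dataset.

```python
from collections import defaultdict


def ratings_by_movie(rows):
    """Group ratings by movie ID, ordered by timestamp within each movie."""
    arms = defaultdict(list)
    for user_id, movie_id, rating, timestamp in sorted(rows, key=lambda r: r[3]):
        arms[movie_id].append(rating)
    return dict(arms)
```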

Implemented Algorithms

Random Policy: Randomly selects movies without considering past performance
Greedy Policy: Always selects the movie with the highest estimated reward
Epsilon-Greedy Policy: Selects the best movie with probability 1-ε, and a random movie with probability ε
- Implemented with ε values of 0.1, 0.2, and 0.5
Upper Confidence Bound (UCB): Selects movies based on their potential reward, considering uncertainty
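
The selection rules above can be sketched as follows. This is an illustrative version, not the notebook's implementation: the function names are hypothetical, and the UCB variant shown is the common UCB1 rule with an exploration constant `c`, which is an assumption since the variant used is not specified here.

```python
import math
import random


def select_epsilon_greedy(estimates, epsilon, rng=random):
    """Pick a random arm with probability epsilon, else the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=estimates.__getitem__)


def select_ucb1(estimates, counts, t, c=2.0):
    """UCB1: estimate plus a bonus that shrinks as an arm is pulled more."""
    for arm, n in enumerate(counts):
        if n == 0:  # try every arm once before using the bonus formula
            return arm
    scores = [q + math.sqrt(c * math.log(t) / n) for q, n in zip(estimates, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def update(estimates, counts, arm, reward):
    """Incremental mean update for the chosen arm."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
```

In this sketch the Greedy and Random policies fall out as special cases of `select_epsilon_greedy` with `epsilon=0.0` and `epsilon=1.0` respectively.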

Performance Metrics

Cumulative reward over time
Average reward per recommendation
Exploration vs. exploitation balance
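
These metrics can be computed from a per-step log of rewards. The sketch below assumes such a log exists as plain Python lists; the function names, and the idea of logging a boolean per step to mark exploratory picks, are assumptions made for illustration.

```python
from itertools import accumulate


def cumulative_rewards(rewards):
    """Running total of rewards, one entry per recommendation."""
    return list(accumulate(rewards))


def average_reward(rewards):
    """Mean reward per recommendation (0.0 for an empty log)."""
    return sum(rewards) / len(rewards) if rewards else 0.0


def exploration_fraction(was_exploratory):
    """Share of steps where the policy picked a non-greedy (random) arm."""
    if not was_exploratory:
        return 0.0
    return sum(1 for flag in was_exploratory if flag) / len(was_exploratory)
```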

Key Findings

Epsilon-Greedy (ε=0.1) performed best, achieving the highest cumulative reward
The performance ranking was: Epsilon-Greedy (ε=0.1) > Epsilon-Greedy (ε=0.2) > UCB > Random Policy > Epsilon-Greedy (ε=0.5)
A low exploration rate (ε=0.1) gave the best balance, among the values tested, between exploiting known preferences and discovering new options
Excessive exploration (ε=0.5) undermined performance by making too many random selections
The results suggest that, in this environment, exploiting known preference patterns pays off more than extensive exploration

Installation and Usage

Prerequisites

Python 3.8 or higher
Required packages listed in requirements.txt

Setup

Clone the repository: git clone https://github.com/Saurabhjalendra/Treasure-Hunt-in-the-Frozen-Lake-and-Optimizing-Movie-Recommendations-Using-Multi-Armed-Bandits.git
Install dependencies: pip install -r requirements.txt
Run the Jupyter notebooks listed under Files

Files

FrozenLake_using_Dynamic_programming_Final_version.ipynb: Implementation of the Treasure Hunt in FrozenLake
MAB_Assignment_1 1.ipynb: Implementation of the Multi-Armed Bandit algorithms for movie recommendations

Contributors

Saurabh Jalendra
Tushar Shandilya
Monica Malik
Reddy Balaji C
