This repository contains two reinforcement learning projects:
- Treasure Hunt in the Frozen Lake (Dynamic Programming)
- Optimizing Movie Recommendations Using Multi-Armed Bandits
This project implements a reinforcement learning agent using dynamic programming to solve a modified version of the FrozenLake environment. The agent navigates a slippery 5x5 grid to collect treasures while avoiding holes, with the ultimate goal of reaching the exit.
- Grid: 5x5 FrozenLake with the following tile types:
  - Start (S): Initial position, safe to step on
  - Frozen (F): Safe to step on
  - Hole (H): Falling in ends the episode with a reward of -10
  - Goal (G): Exit point; reaching it ends the episode with a reward of +10
  - Treasure (T): Awards +5 and converts to a frozen tile once collected
- State: the agent's current position (row, column), plus one Boolean flag per treasure indicating whether it has been collected
- Actions: four possible moves (up, down, left, right)
- Rewards:
  - Goal (G): +10
  - Treasure (T): +5 per treasure
  - Hole (H): -10
  - Frozen tiles (F): 0
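
The state, action, and reward description above can be sketched in a few lines of Python. This is a minimal illustration rather than the notebook's code: the map layout `GRID`, the `State` tuple, and the `step` helper are all hypothetical names, and the sketch assumes deterministic moves (no slipping) with off-grid moves leaving the agent in place.

```python
from typing import NamedTuple, Tuple

# Hypothetical 5x5 layout for illustration only; the project's actual map may differ.
GRID = [
    "SFFFT",
    "FHFFF",
    "FFFHF",
    "TFFFF",
    "FFHFG",
]
# Treasure coordinates in row-major order; flag i in a state refers to TREASURES[i].
TREASURES = [(r, c) for r, row in enumerate(GRID) for c, ch in enumerate(row) if ch == "T"]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


class State(NamedTuple):
    row: int
    col: int
    collected: Tuple[bool, ...]  # one flag per treasure


def step(state: State, action: str):
    """Apply one deterministic move and return (next_state, reward, done)."""
    dr, dc = MOVES[action]
    # Assumption: a move off the grid is clamped, so the agent stays on the edge.
    row = min(max(state.row + dr, 0), len(GRID) - 1)
    col = min(max(state.col + dc, 0), len(GRID[0]) - 1)
    tile = GRID[row][col]
    collected = state.collected
    if tile == "H":
        return State(row, col, collected), -10, True
    if tile == "G":
        return State(row, col, collected), 10, True
    if tile == "T":
        idx = TREASURES.index((row, col))
        if not collected[idx]:
            # First visit: pay +5 and mark the treasure as taken,
            # so the tile behaves like a frozen tile afterwards.
            collected = collected[:idx] + (True,) + collected[idx + 1:]
            return State(row, col, collected), 5, False
    return State(row, col, collected), 0, False
```

Keeping the treasure flags inside the state is what lets a dynamic programming solver distinguish "treasure still available" from "treasure already taken" at the same grid position.
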
- Custom environment created by modifying the existing "FrozenLakeNotSlippery-v0" in OpenAI Gym
- Dynamic programming using value iteration and policy improvement
- Calculation of state-value function (V*) for each state
- Comparison of agent performance with and without treasures
- Visualization of the agent's direction on the map using the learned policy
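
As a rough sketch of the value iteration step listed above, the function below computes V* and a greedy policy for a small tabular MDP. The transition table format, `P[state][action] = [(probability, next_state, reward, done), ...]`, mirrors the one Gym's FrozenLake exposes, but the function names, the discount `gamma=0.9`, and the four-state toy MDP are assumptions made for illustration, not values taken from the notebook.

```python
def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2] * (not done)) for p, s2, r, done in P[s][a])


def value_iteration(P, gamma=0.9, theta=1e-8):
    """Return (V, policy) for a tabular MDP; states with no actions are terminal."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:
                continue  # terminal state keeps value 0
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Policy improvement: act greedily with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P if P[s]}
    return V, policy


# Toy MDP (hypothetical): 0 -> 1 -> goal (state 2), with a hole (state 3) below state 0.
toy = {
    0: {"right": [(1.0, 1, 0, False)], "down": [(1.0, 3, -10, True)]},
    1: {"right": [(1.0, 2, 10, True)], "left": [(1.0, 0, 0, False)]},
    2: {},
    3: {},
}
V, policy = value_iteration(toy)
```

In the toy MDP the goal reward of +10 propagates back one step at a time, discounted by `gamma`, which is the same mechanism that makes a detour to a treasure worthwhile only when its +5 outweighs the added risk of a hole.
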
- The agent successfully learns to navigate the environment and collect treasures
- Trade-offs exist between risk-taking (collecting treasures) and safety (avoiding holes)
- The optimal policy balances reward maximization with risk minimization
This project implements a movie recommendation system using Multi-Armed Bandit (MAB) algorithms to maximize cumulative user satisfaction. The system dynamically allocates recommendations by learning user preferences in real-time, balancing exploration and exploitation.
A movie streaming platform (TrendMovie Inc.) aims to optimize its recommendation strategy to deliver maximum user satisfaction while maintaining high engagement. Each movie recommendation is treated as an interaction with the user, and their feedback is used to refine the recommendation strategy dynamically.
The dataset contains user ratings for various movies, including:
- User ID: A unique identifier for each user
- Movie ID: A unique identifier for each movie
- Rating: A score provided by the user for a movie (on a scale of 1 to 5)
- Timestamp: The time when the rating was given
- Random Policy: Randomly selects movies without considering past performance
- Greedy Policy: Always selects the movie with the highest estimated reward
- Epsilon-Greedy Policy: Selects the best movie with probability 1-ε, and a random movie with probability ε
  - Implemented with ε values of 0.1, 0.2, and 0.5
- Upper Confidence Bound (UCB): Selects movies based on their potential reward, considering uncertainty
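
The selection rules above can be sketched as follows, treating each movie as an arm with a running mean rating. This is an illustrative sketch, not the notebook's implementation: the function names are hypothetical, and the UCB variant shown is the common UCB1 bonus `sqrt(c * ln(t) / n)` with `c = 2`, which is an assumption since the source does not say which variant was used.

```python
import math
import random


def epsilon_greedy(estimates, epsilon, rng=random):
    """Pick a random arm with probability epsilon, else the best-looking arm.

    epsilon=0 gives the greedy policy; epsilon=1 gives the random policy.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=estimates.__getitem__)


def ucb1(estimates, counts, t, c=2.0):
    """Pick the arm with the highest mean plus uncertainty bonus (t = current step, >= 1)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # try every arm once before trusting the bonus
    return max(
        range(len(estimates)),
        key=lambda a: estimates[a] + math.sqrt(c * math.log(t) / counts[a]),
    )


def update(estimates, counts, arm, reward):
    """Incrementally update the running mean reward of the chosen arm."""
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
```

A typical loop would call one of the selectors, observe the rating for the chosen movie, and then call `update` with that rating as the reward.
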
- Cumulative reward over time
- Average reward per recommendation
- Exploration vs. exploitation balance
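
One way to compute these metrics from a single run is sketched below. The `summarize` helper and its inputs are hypothetical, and measuring the exploration/exploitation balance as the fraction of steps on which the policy explored is an assumption; the notebook may quantify it differently.

```python
from itertools import accumulate


def summarize(rewards, explored):
    """Summarize one bandit run.

    rewards:  rating received at each recommendation step
    explored: True at each step where the policy picked an arm at random
    """
    n = len(rewards)
    cumulative = list(accumulate(rewards))  # running total, one entry per step
    return {
        "cumulative": cumulative,
        "average": cumulative[-1] / n if n else 0.0,
        "exploration_rate": sum(explored) / n if n else 0.0,
    }


stats = summarize([4, 5, 3, 2], [True, False, False, True])
```

Plotting the `cumulative` list for each policy on the same axes gives the usual side-by-side comparison of cumulative reward over time.
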
- Epsilon-Greedy (ε=0.1) performed best, achieving the highest cumulative reward
- The performance ranking was: Epsilon-Greedy (ε=0.1) > Epsilon-Greedy (ε=0.2) > UCB > Random Policy > Epsilon-Greedy (ε=0.5)
- A low exploration rate (ε=0.1) gave the best balance among the tested settings between exploiting known preferences and discovering new options
- Excessive exploration (ε=0.5) undermined performance by making too many random selections
- The results suggest that, in this environment, exploiting known user preferences pays off more than exploring extensively
- Python 3.8 or higher
- Required packages listed in requirements.txt
- Clone the repository: git clone https://github.com/Saurabhjalendra/Treasure-Hunt-in-the-Frozen-Lake-and-Optimizing-Movie-Recommendations-Using-Multi-Armed-Bandits.git
- Install dependencies: pip install -r requirements.txt
- Run the Jupyter notebooks:
  - FrozenLake_using_Dynamic_programming_Final_version.ipynb: Implementation of the Treasure Hunt in FrozenLake
  - MAB_Assignment_1 1.ipynb: Implementation of the Multi-Armed Bandit algorithms for movie recommendations
- Saurabh Jalendra
- Tushar Shandilya
- Monica Malik
- Reddy Balaji C