GetRec'd is a board game recommendation engine built on data scraped from BoardGameGeek (BGG), which is kind of like a wiki for board games. BGG has a comprehensive collection of first-party data and community-based metrics for the games in its database, which made it a great source for feature engineering. After the data is collected and cleaned, a K-Nearest Neighbors model identifies similar games based on their distance from one another in the feature vector space.
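As a rough illustration of the nearest-neighbor idea, here is a minimal sketch using scikit-learn's `NearestNeighbors`. The game titles and the three-column feature matrix are made up for the example. The real model works on the engineered BGG features, and the distance metric and `k` shown here are assumptions rather than the project's actual settings.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy feature matrix: one row per game. Columns are hypothetical engineered
# features (e.g. scaled weight, scaled play time, a binary mechanic flag).
titles = ["Game A", "Game B", "Game C", "Game D"]
features = np.array([
    [0.90, 0.80, 1.0],
    [0.85, 0.75, 1.0],
    [0.10, 0.20, 0.0],
    [0.15, 0.10, 0.0],
])

# Fit an unsupervised nearest-neighbor index over the game vectors.
knn = NearestNeighbors(n_neighbors=2, metric="euclidean")
knn.fit(features)

def similar_games(title, k=1):
    """Return the k closest titles to `title`, excluding the title itself."""
    idx = titles.index(title)
    # Ask for k + 1 neighbors because the closest match is the query itself.
    _, neighbors = knn.kneighbors(features[idx].reshape(1, -1), n_neighbors=k + 1)
    return [titles[i] for i in neighbors[0] if i != idx][:k]

print(similar_games("Game A"))  # → ['Game B']
```

Games whose vectors sit close together (A and B, C and D) come back as each other's recommendations; nothing about the titles themselves is used.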
See also: data collection Jupyter Notebook
Data was scraped from BGG based on its top rankings pages, drawing on 3 source pages over 2+ days.
Tools:
- pandas
- requests
- BeautifulSoup
- json
- re
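A minimal sketch of one parsing step, assuming game links on the rankings pages follow BGG's `/boardgame/<id>/<slug>` URL pattern. It uses only `re` and `json` on an HTML string. Fetching the page (e.g. with `requests.get`) is left out, and a BeautifulSoup-based parser is generally more robust to markup changes than a regex. The sample HTML and the `parse_game_links` helper are hypothetical.

```python
import json
import re

# Assumed link format: href="/boardgame/<numeric id>/<slug>".
GAME_LINK = re.compile(r'href="/boardgame/(\d+)/([\w-]+)"')

def parse_game_links(html):
    """Extract unique (id, slug) records from a rankings-page HTML string, in page order."""
    seen = set()
    games = []
    for game_id, slug in GAME_LINK.findall(html):
        # The same game is often linked twice (thumbnail and title), so dedupe by id.
        if game_id not in seen:
            seen.add(game_id)
            games.append({"id": int(game_id), "slug": slug})
    return games

# Illustrative markup only; the real page structure may differ.
sample = '''
<a href="/boardgame/174430/gloomhaven">Gloomhaven</a>
<a href="/boardgame/174430/gloomhaven">thumbnail</a>
<a href="/boardgame/13/catan">CATAN</a>
'''
print(json.dumps(parse_game_links(sample)))
# → [{"id": 174430, "slug": "gloomhaven"}, {"id": 13, "slug": "catan"}]
```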
See also: preprocessing and modelling Jupyter Notebook
The collected data is cleaned (NaN values filled, datatypes converted), some feature engineering is applied, and the processed data is then fed to the model.
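A minimal sketch of the cleaning step with pandas. The column names (`min_players`, `avg_weight`, `mechanics`), the sample rows, and the median-fill strategy are assumptions for illustration, not necessarily what the notebook does.

```python
import pandas as pd

# Hypothetical raw scrape: numbers arrive as strings and some fields are missing.
raw = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C"],
    "min_players": ["1", "3", None],
    "avg_weight": ["3.86", "", "2.30"],
    "mechanics": ["Hand Management, Cooperative", None, "Dice Rolling"],
})

def clean(df):
    out = df.copy()
    # Convert numeric-looking text to numbers; unparseable values become NaN.
    for col in ["min_players", "avg_weight"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Fill missing numeric values with the column median, then fix the dtype.
    out["min_players"] = out["min_players"].fillna(out["min_players"].median()).astype(int)
    out["avg_weight"] = out["avg_weight"].fillna(out["avg_weight"].median())
    # Missing text becomes an empty list; comma-separated text becomes a list of names.
    out["mechanics"] = out["mechanics"].fillna("").apply(
        lambda s: [m.strip() for m in s.split(",") if m.strip()]
    )
    return out

cleaned = clean(raw)
print(cleaned.dtypes)
```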
Tools:
- pandas
- NumPy
- scikit-learn
- K-Nearest Neighbors
- Binarization
- TF-IDF
- K-Means
- PCA
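The feature-engineering tools listed above can be chained roughly as follows. This sketch assumes hypothetical mechanics lists, short descriptions, and weights for four games. It binarizes mechanics with `MultiLabelBinarizer`, vectorizes descriptions with `TfidfVectorizer`, compresses everything with `PCA`, and groups games with `KMeans`. The component and cluster counts are arbitrary choices for the toy data, not the project's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical cleaned inputs for four games.
mechanics = [
    ["Cooperative", "Hand Management"],
    ["Cooperative", "Dice Rolling"],
    ["Auction/Bidding"],
    ["Auction/Bidding", "Hand Management"],
]
descriptions = [
    "cooperative dungeon crawl campaign",
    "cooperative dice adventure",
    "economic auction trading",
    "auction trading card game",
]
weights = np.array([[3.9], [2.5], [2.8], [2.2]])  # BGG weight, 1-5 scale

# Binarization: one 0/1 column per distinct mechanic.
mlb = MultiLabelBinarizer()
mech_matrix = mlb.fit_transform(mechanics)

# TF-IDF: weight description terms by how distinctive they are across games.
tfidf = TfidfVectorizer()
text_matrix = tfidf.fit_transform(descriptions).toarray()

# Stack the feature groups side by side; weight is scaled down to roughly [0, 1].
features = np.hstack([mech_matrix, text_matrix, weights / 5.0])

# PCA: compress the wide feature matrix into a few dense components.
pca = PCA(n_components=2, random_state=0)
reduced = pca.fit_transform(features)

# K-Means: group games into coarse clusters; labels could serve as an extra feature.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = kmeans.fit_predict(reduced)

print(reduced.shape, clusters)
```

Scaling matters here: because KNN and K-Means are distance-based, a feature group with large raw values would dominate the distances, which is why the sketch divides weight by 5 before stacking.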
See also: the live site (migration in progress!)
Finally, I built a simple site that accepts input games and produces recommendations. It uses a Bootstrap-based frontend with a Flask backend. Users can tweak their recommendations by number of players and complexity (derived from BGG's user-generated "weight" score for each game). Search queries are interpreted with fuzzy matching based on Levenshtein distance.
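A minimal sketch of the player-count and complexity filter, assuming candidate recommendations arrive as a pandas DataFrame with hypothetical `min_players`, `max_players`, and `avg_weight` columns. Treating complexity as a simple maximum-weight cutoff is an assumption; the site may map weight to complexity levels differently.

```python
import pandas as pd

# Hypothetical candidates, e.g. the nearest neighbors returned by the model.
candidates = pd.DataFrame({
    "name": ["Game B", "Game C", "Game D", "Game E"],
    "min_players": [1, 2, 3, 2],
    "max_players": [4, 2, 6, 5],
    "avg_weight": [2.1, 3.4, 1.6, 4.2],
})

def filter_recommendations(df, players=None, max_weight=None):
    """Keep games that support `players` and are no heavier than `max_weight` (BGG 1-5 scale)."""
    mask = pd.Series(True, index=df.index)
    if players is not None:
        # The requested player count must fall inside the game's supported range.
        mask &= (df["min_players"] <= players) & (df["max_players"] >= players)
    if max_weight is not None:
        mask &= df["avg_weight"] <= max_weight
    return df[mask]

print(filter_recommendations(candidates, players=4, max_weight=3.0)["name"].tolist())
# → ['Game B', 'Game D']
```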
Tools:
- Flask
- Bootstrap
- WTForms
- TheFuzz
- Heroku
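The site relies on TheFuzz for fuzzy matching. The sketch below is a dependency-free stand-in that shows the underlying idea: a classic dynamic-programming Levenshtein distance, normalized into a 0-100 similarity score. That normalization (1 minus distance over the longer string's length) is a simplification and not the exact formula TheFuzz's scorers use; the title list is illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution, free when characters match
            ))
        previous = current
    return previous[-1]

def ratio(a: str, b: str) -> int:
    """Similarity in [0, 100]; 100 means identical after lowercasing."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b)) or 1
    return round(100 * (1 - levenshtein(a, b) / longest))

def best_match(query: str, titles: list[str]) -> str:
    """Return the title with the highest similarity to the query."""
    return max(titles, key=lambda t: ratio(query, t))

titles = ["Gloomhaven", "Wingspan", "Terraforming Mars", "Twilight Imperium"]
print(best_match("glomhaven", titles))  # → Gloomhaven
```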