The intro to this series explained why I was making an Anime recommendation system, part 1 gave a brief overview of the approach I was taking, and part 2 explained how I got my data. In this part I will explore how I tuned my three different methods of determining Item-Item similarity. Method 1: Item-Item similarity based on user scores. Anime_Score_Sim in my GitHub shows the code for this. First, all 0’s (or statuses without a score) are dropped. Next, I find the average score for each user, and subtract that average from each of that user’s scores. This is to […]
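The mean-centering step described in that excerpt can be sketched in a few lines of pandas. This is a minimal illustration, not the post's actual Anime_Score_Sim code: the DataFrame layout and column names here are assumptions, chosen to show the drop-zeros-then-center procedure.

```python
import pandas as pd

# Hypothetical toy ratings table; on MyAnimeList exports, a score of 0
# means the user has the show on a list but never actually rated it.
ratings = pd.DataFrame({
    "user":  ["a", "a", "a", "b", "b"],
    "anime": ["x", "y", "z", "x", "y"],
    "score": [8, 0, 6, 10, 4],
})

# Step 1: drop 0's (statuses without a score).
ratings = ratings[ratings["score"] != 0].copy()

# Step 2: find each user's average score and subtract it from
# each of that user's scores, so ratings reflect relative preference.
ratings["centered"] = (
    ratings["score"] - ratings.groupby("user")["score"].transform("mean")
)
```

Centering per user before computing item-item similarity is a standard adjustment for the fact that some users rate everything high and others rate everything low.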
The intro to these posts explained why I was making an Anime recommendation system, and Part 1 gave a brief overview of the approach I was taking. In the next part I will start to explore how I tuned my Item-Item similarity models. Before diving into that, I wanted to go through my data collection process and the initial analysis that helped guide me. Every project like this starts with the data, and while “Data is the New Oil” is a platitude, it’s worth going past the platitudes to figure out where my data came from, […]
As a Data Scientist coming from the Networking and DevOps world, I’m a firm believer in the Homelab philosophy: the best way to learn things is to experiment and build within a lab environment. This was crucial to my getting my CCNA, and it’s how I learned virtualization as well. Now that I’m transitioning into Data Science and Machine Learning, I recently updated my lab. In this post I explain the specific HW and SW setup I followed to convert my z620 into an Nvidia-Docker workstation. I wanted to go through my thought process in how I decided […]
This is the 2nd of 3 blog posts about my process and discoveries working with data from Riot’s online game, League of Legends. This post is a technical writeup of the code I used for my initial ‘baseline’ modeling, and my data preparation (and imputation) code. The code I wrote for the initial sanity-check modeling work can be found here. The feature engineering/data prep code is here. The code for my ‘final’ models is here. Background: Summary of the previous post. In the 5 v 5 Videogame/Esport, two teams of 5 players compete against each other. […]
This is the 1st of 3 blog posts about my process and discoveries working with data from Riot’s online game, League of Legends. The code I wrote to do this can be found here. Background: Project Purpose. In the 5 v 5 Videogame/Esport, two teams of 5 players compete against each other. Before the ‘Game’ starts, each player chooses 1 of (currently 133) characters, known as a champion. No two players may play the same champion. In ranked and professional play, players may “ban” a champion and prevent either side from choosing that champion. There are […]
One of the most famous modern machine learning training datasets is the Titanic passenger list, used to predict which passengers survived. I used this project while experimenting with KNearestNeighbor classifiers, SVMs, LogReg, pipelines (and the problems with dummying categorical data in pandas), as well as the TFlearn front-end for Tensorflow. My source code for that project can be found here.
Anime_Rec is a Data Science project to generate Anime recommendations based on publicly available data from the website myanimelist.net. I’m an Anime fan. In fact, I watch enough Anime to have hit the point where finding something to watch becomes difficult. As an Anime fan and Data Scientist, the obvious solution was to build a recommendation engine to recommend Anime for me to watch. This post explains my overall approach and architecture. The first step in any machine learning or Data Science project is gathering the data, and thankfully for me, other Anime fans have done […]
After deciding, in my previous post, to switch my z620 to an Nvidia-Docker workstation, I wanted to give a writeup of exactly how I did that, because some of the specific technical steps (such as disabling a graphics card in the BIOS to install the Nvidia driver) aren’t all documented in one place. Part 1: HW Setup. First I open up my z620 and remove the quad port NICs that I’m no longer going to use. The z620 has two ‘compartments’ inside the case: the PCI Express ports sit on one side of a partition, and […]
Lol_Scout was a Data Science/Esports analytics project I pursued to answer questions about how League of Legends players’ champion choices can be optimized. The current source code and readme can be found here. Part 1: Background, initial design, data gathering, data acquisition lessons learned, and ways to pursue the project further in the future. Part 2: Modeling with Neural Net classifiers using just champion selection; comparing to XGBoost; adding data from each player’s experience with a character. Part 3: Modeling with Neural Nets and XGBoost with more features, and conclusions.