Sam Kenkel

Data Science, Machine Learning, DevOps, CCNA, ACSR
Kaggle:KKBox music recs

When I saw that Kaggle had a challenge for a recommendation engine, an area I’ve been interested in since Anime_Rec, I knew I was going to take part. In my previous post, I discuss the importance of research; besides all of the recommendation engine architecture I learned while building Anime_Rec, the research from other Kaggle users was key here. In short, the order of this data matters: it is sequential in time, which means that not only is the song year an important feature, but the index location is as well. The current […]
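The index-location idea can be sketched as follows; the column names and toy values here are hypothetical, not taken from the actual KKBox data:

```python
import pandas as pd

# The training data is ordered in time, so a row's position can serve as a
# proxy "time" feature alongside song_year. Values are invented.
df = pd.DataFrame({
    "song_year": [2010, 2012, 2015, 2016],
    "target": [1, 0, 1, 1],
})

# Capture each row's position before any shuffling or train/test splitting.
df["row_index"] = df.index

print(df["row_index"].tolist())  # [0, 1, 2, 3]
```

The key design point is to record the index before any resampling, since shuffling destroys exactly the ordering signal the feature is meant to carry.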

Kaggle:Port_Seguro

Like everyone with an interest in Data Science, I use the Data Science learning/competition site Kaggle. Recently, I’ve been working on a competition (the Porto Seguro challenge) that has been frustrating to take part in, but is also a great chance to walk through my standard Kaggle process, as well as a great opportunity to dig into some of the limitations of Kaggle for data science projects. The current sourcecode and readme can be found here. This is a good project for talking about, at a high level, how I approach Kaggle projects […]

Anime_Rec4: Predicting User Scores with Neural Nets

Part 1 of this series explained why I was making an Anime recommendation system and gave a brief overview of the approach I was taking. Part 2 explained how I got my data. Part 3 explained how I tuned my 3 item-item similarity models to generate ‘possible’ recs. In this part, I’ll talk about predicting user scores with neural nets. Why 3 neural nets? Ensembling by targeting different scores: there are 3 pre-trained neural nets, each trained to predict one type of score: Score, User Scaled Score, or Anime Scaled Score. The nets are loaded […]
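A minimal sketch of the ensembling idea, assuming each net's prediction is mapped back to the raw score scale and the three estimates are averaged; all numbers, and the per-user and per-anime means, are invented for illustration:

```python
import numpy as np

# Hypothetical predictions from three nets, each trained on a different target.
raw_pred = np.array([7.2, 8.1])           # net trained on raw Score
user_mean = np.array([6.5, 7.0])          # per-user mean score (assumed known)
user_scaled_pred = np.array([0.8, 1.0])   # net trained on score - user mean
anime_mean = np.array([7.0, 7.5])         # per-anime mean score (assumed known)
anime_scaled_pred = np.array([0.1, 0.7])  # net trained on score - anime mean

# Undo each scaling, then average the three estimates of the raw score.
ensemble = np.mean(
    [raw_pred, user_scaled_pred + user_mean, anime_scaled_pred + anime_mean],
    axis=0,
)
print(ensemble)
```

Because each net sees a differently normalized target, their errors tend to be less correlated than three nets trained on the raw score, which is what makes the average worth more than any single model.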

Anime_Rec3: Generating possible recommendations (Cosine Similarity methods)

The intro to this series explained why I was making an Anime recommendation system, and part 1 gave a brief overview of the approach I was taking. Part 2 explained how I got my data. In this part I will explore how I tuned my three different methods for determining item-item similarity. Method 1: item-item similarity based on user scores. Anime_Score_Sim in my github shows the code for this. First, all 0s (statuses without a score) are dropped. Next, I find each user’s average score and subtract it from all of that user’s scores. This is to […]
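The mean-centering step described above, followed by a cosine similarity over co-raters, can be sketched like this; the matrix is a toy user-by-anime example, not the actual data:

```python
import numpy as np

# Toy user-by-anime score matrix: rows are users, columns are anime.
# 0 means "no score" and is treated as missing, as in the post.
scores = np.array([
    [8.0, 6.0, 0.0],
    [7.0, 0.0, 9.0],
    [9.0, 5.0, 7.0],
])
masked = np.ma.masked_equal(scores, 0.0)

# Subtract each user's average score so harsh and generous raters compare fairly.
centered = masked - masked.mean(axis=1)[:, None]

def cosine_sim(a, b):
    """Cosine similarity between two item columns, using only co-raters."""
    both = ~(a.mask | b.mask)
    a, b = a.data[both], b.data[both]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Items 0 and 1 are rated oppositely by their co-raters, so similarity is -1.
print(cosine_sim(centered[:, 0], centered[:, 1]))  # -1.0
```

Restricting each pairwise similarity to users who rated both items avoids treating a missing score as a zero, which would otherwise drag every similarity toward items nobody has rated.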

Anime_Rec2: Data Collection, EDA

The intro to these posts explained why I was making an Anime recommendation system, and Part 1 gave a brief overview of the approach I was taking. In the next part I will start to explore how I tuned my item-item similarity models; before diving into that, I wanted to go through my data collection process and the initial analysis that helped guide me. Every project like this starts with the data, and since “Data is the New Oil,” it’s always worth going past the platitudes to figure out where my data came from, […]

Setting up an Nvidia-Docker workstation for DataScience/DeepLearning

After deciding, in my previous post, to switch my z620 to an Nvidia-Docker workstation, I wanted to write up exactly how I did that, because some of the specific technical steps (such as disabling a graphics card in the BIOS to install the Nvidia driver) aren’t all documented in one place. Part 1: HW Setup. First I open up my z620 and remove the quad-port NICs that I’m no longer going to use. The z620 has two ‘compartments’ inside the case: the PCI Express ports sit on one side of a partition, and […]

Designing a DeepLearning Homelab: Cloud vs Virtualization vs Docker

As a Data Scientist coming from the Networking and DevOps world, I’m a firm believer in the Homelab philosophy: the best way to learn things is to experiment and build within a lab environment. This was crucial to getting my CCNA, and it’s how I learned virtualization as well. Now that I’m transitioning into Data Science and Machine Learning, I recently updated my lab. In my next post I explain the specific HW and SW setup I followed to convert my z620 into an Nvidia-Docker workstation; here, I wanted to go through my thought process in how I decided […]

Lol_Scout 3: Final Modelling and Results

Background: Summary of the previous posts. In the 5v5 videogame/esport, two teams of 5 players compete against each other. I have gathered data using the Riot Games API. I’m trying to use machine learning to predict wins or losses based on the characters (champions) that players choose, and those players’ skill/practice with those champions. This is the 3rd of 3 blog posts about my process and discoveries working with data from Riot’s online game, League of Legends. The code I wrote for initial sanity-check modelling work can be found here. The feature […]

Lol_Scout 2: Feature Engineering, initial Modelling

This is the 2nd of 3 blog posts about my process and discoveries working with data from Riot’s online game, League of Legends. This post is a technical writeup of the code I used for my initial ‘baseline’ modelling, and my data preparation (and imputation) code. The code I wrote for initial sanity-check modelling work can be found here. The feature engineering/data prep code is here. The code for my ‘final’ models is here. Background: Summary of the previous post. In the 5v5 videogame/esport, two teams of 5 players compete against each other. […]
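As a sketch of the kind of imputation step this covers, assume a per-champion skill feature that is missing when a player has no recorded history on that champion; the matrix and its values are invented for illustration, and median-filling is one reasonable choice, not necessarily the post's exact method:

```python
import numpy as np

# Hypothetical feature matrix: one row per player, with columns such as
# win rate and games played on the chosen champion. NaN marks players
# with no history on that champion.
X = np.array([
    [0.52, np.nan],
    [0.48, 120.0],
    [0.61, 300.0],
    [0.45, np.nan],
])

# Fill each missing value with its column's median, ignoring NaNs.
col_medians = np.nanmedian(X, axis=0)
missing = np.isnan(X)
X[missing] = np.take(col_medians, np.nonzero(missing)[1])

print(X[0, 1])  # 210.0, the median of 120.0 and 300.0
```

The median is a common choice here because game-count style features are heavily skewed, so the mean would be pulled upward by a few very active players.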

Lol_Scout 1: Data Collection

This is the 1st of 3 blog posts about my process and discoveries working with data from Riot’s online game, League of Legends. The code I wrote to do this can be found here. Background: Project Purpose. In the 5v5 videogame/esport, two teams of 5 players compete against each other. Before the game starts, each player chooses one of the (currently) 133 characters, known as champions. No two players may play the same champion. In ranked and professional play, players may “ban” a champion and prevent either side from choosing it. There are […]