
Sam Kenkel

Data Science, Machine Learning, DevOps, CCNA, ACSR

Kaggle:KKBox music recs

When I saw that Kaggle had a recommendation engine challenge, a topic I have been interested in since Anime_Rec, I knew I was going to take part. In my previous post, I discussed the importance of research. Besides all of the recommendation engine architecture I learned while building Anime_Rec, the research from other Kaggle users was key. In short, the order of this data matters: the data is sequential in time, which means that not only is the song year an important feature, but so is the index location. The current […]

Kaggle:Port_Seguro

Like everyone with an interest in Data Science, I use the Data Science learning/competition site Kaggle. Recently, I’ve been working on a competition (the Port Seguro challenge) that is frustrating to take part in, but it is a great chance to walk through my standard Kaggle process, as well as a great opportunity to dig into some of the limitations of Kaggle for data science projects. The current source code and readme can be found here. This is a good project for talking, at a high level, about how I approach Kaggle projects […]

Anime_Rec4: Predicting User Scores with Neural Nets

Part 1 of this series explained why I was making an Anime recommendation system and gave a brief overview of the approach I was taking. Part 2 explained how I got my data. Part 3 explained how I tuned my 3 Item-Item similarity models to generate ‘possible’ recs. In this part, I’ll talk about predicting user scores with neural nets. Why 3 Neural Nets: Ensembling by targeting different scores: There are 3 pre-trained neural nets. Each net has been trained to predict one type of score: Score, User Scaled Score, or Anime Scaled Score. The Nets are loaded […]

Anime_Rec3: Generating possible recommendations (Cosine Similarity methods)

The intro to this series explained why I was making an Anime recommendation system, and Part 1 gave a brief overview of the approach I was taking. Part 2 explained how I got my data. In this part I will explore how I tuned my three different methods for determining Item-Item similarity. Method 1: Item-Item similarity based on user scores. Anime_Score_Sim in my GitHub shows the code for this. First, all 0’s (or statuses without a score) are dropped. Next, I find the average score for each user and subtract that average from each of that user’s scores. This is to […]
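The two preprocessing steps described in this excerpt (dropping 0 scores, then subtracting each user's average) can be sketched roughly as below. This is a minimal illustration and not the code from Anime_Score_Sim: the function name `mean_center_scores`, the `(user, anime, score)` tuple layout, and the sample data are all assumptions made for the example, and it reads "subtract that score" as subtracting each user's own average.

```python
from collections import defaultdict


def mean_center_scores(ratings):
    """Drop unscored entries, then subtract each user's mean score.

    `ratings` is an iterable of (user, anime, score) tuples. This layout is
    hypothetical; a score of 0 is treated as "no score given".
    """
    # Step 1: drop all 0's (statuses without a score).
    scored = [(user, anime, score) for user, anime, score in ratings if score != 0]

    # Step 2: compute the average score for each user.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, _, score in scored:
        totals[user] += score
        counts[user] += 1
    means = {user: totals[user] / counts[user] for user in totals}

    # Step 3: subtract the user's own average from each of their scores.
    return [(user, anime, score - means[user]) for user, anime, score in scored]


# Made-up sample data, for illustration only.
sample = [
    ("alice", "A", 8),
    ("alice", "B", 6),
    ("alice", "C", 0),  # unscored, dropped in step 1
    ("bob", "A", 10),
    ("bob", "B", 10),
]
centered = mean_center_scores(sample)
```

With this sample, alice's two scores end up on opposite sides of zero while both of bob's land at zero, even though his raw scores are higher.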

Anime_Rec2: Data Collection, EDA

The intro to these posts explained why I was making an Anime recommendation system, and Part 1 gave a brief overview of the approach I was taking. In the next part I will start to explore how I tuned my Item-Item similarity models. Before diving into that, I wanted to go through my data collection process and the initial analysis that helped guide me. Every project like this starts with the data, and while “Data is the New Oil,” it’s always worth going past the platitudes to figure out where my data came from, […]