When I saw that Kaggle had a recommendation engine challenge, a topic I’ve been interested in since Anime_Rec, I knew I was going to take part.
In my previous post, I discussed the importance of research. Besides all of the recommendation engine architecture I learned while building Anime_Rec, the research shared by other Kaggle users was key.
In short, the order of this data matters: the rows are sequential in time, which means that not only is the song’s year an important feature, but so is each row’s index position.
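To make the “index location as a feature” idea concrete, here is a minimal sketch that attaches each row’s position as an explicit feature. It assumes the rows are plain dicts kept in their original, time-ordered file order; the function name `add_position_features` and the `row_index` key are hypothetical, not taken from the competition data or my repository.

```python
from typing import Dict, List


def add_position_features(rows: List[Dict], start: int = 0) -> List[Dict]:
    """Return copies of `rows` with their position in the file as a feature.

    Hypothetical helper. Assumes `rows` are already in the original,
    chronological order, so a larger index means a later event.
    `start` lets a later file continue the numbering of an earlier one.
    """
    out = []
    for i, row in enumerate(rows):
        features = dict(row)  # copy so the caller's rows are not mutated
        features["row_index"] = start + i
        out.append(features)
    return out


train_rows = [{"song_id": "a"}, {"song_id": "b"}]
test_rows = [{"song_id": "c"}]

train = add_position_features(train_rows)
# Assumption: the test rows come after the training rows in time,
# so their index continues where the training index stopped.
test = add_position_features(test_rows, start=len(train_rows))
```

Continuing the index across the split (rather than restarting at zero) only makes sense if the test file really does follow the training file in time; if it doesn’t, the offset should be dropped.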
The current source code and README can be found here.
It’s worth noting that I don’t personally consider this a recommendation system: it’s a classification problem (a binary yes/no on whether a song gets played). It’s certainly an important classification problem. I don’t understand KKBox’s business model well enough to explain how they negotiate for streaming rights, but the predictions from a model like this help them understand how much they can afford to pay for each song.
However, because of how this Kaggle competition is scored, no value is placed on recommending songs that a user is unlikely to discover on their own. Other Kaggle users have been getting very good scores simply by taking averages across the dataset (if lots of people listened to a song last month, a lot will listen to it this month). That is certainly valid from the business case of ‘How much should we pay for the license to this song?’, but it isn’t a personalized recommendation.
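A minimal sketch of that kind of averaging baseline is below. It is not any particular Kaggle user’s kernel; it assumes the training data can be reduced to hypothetical `(song_id, target)` pairs where `target` is 1 if the song was played and 0 otherwise, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fit_song_means(train: Iterable[Tuple[str, int]]) -> Tuple[Dict[str, float], float]:
    """Average the binary target per song, plus a global average as a fallback."""
    sums: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    total, n = 0, 0
    for song_id, target in train:
        sums[song_id] += target
        counts[song_id] += 1
        total += target
        n += 1
    means = {song: sums[song] / counts[song] for song in sums}
    global_mean = total / n if n else 0.0
    return means, global_mean


def predict(song_ids: List[str], means: Dict[str, float], global_mean: float) -> List[float]:
    """Score each song by its historical average; unseen songs get the global mean."""
    return [means.get(song, global_mean) for song in song_ids]


means, global_mean = fit_song_means([("a", 1), ("a", 1), ("a", 0), ("b", 0)])
scores = predict(["a", "b", "c"], means, global_mean)
```

Notice that `predict` never looks at who the user is: every user gets the same score for song `a`. That is exactly why this kind of baseline can score well while still not being a personalized recommendation.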