Recommender Systems Demystified
Let’s say you’re a young professional woman, aged 25–35, living in Washington, D.C. You occasionally like pictures of puppies and local brunch events. You and your friends are thinking about going to the Wiener 500 event this weekend, but haven’t decided yet (it’s a beer and Wiener dog racing event so it’s not really a hard decision). Now you pull up Instagram, and lo and behold, there is an ad for the Wiener 500! Instagram must be spying on you! How else would they know that you want to go?
There’s a lot of money to be made from data-driven advertising. Whether you know it or not, you’ve been the subject of recommender systems designed to get you to buy more or click that recommended movie. Ever wonder why Amazon’s ‘recommended for you’ page feels just a little too on the nose? Don’t feel too paranoid; Amazon and other companies are just very efficient at grouping you with others like you.
Recommender systems do their job by putting the content you want on your screen with a ‘buy now’ option just below it. In machine learning, there are two main types of recommender systems: content-based filtering and collaborative filtering. Usually, companies will use a combination of both to learn over time what you and others like you are willing to buy.
With content-based filtering, the model tries to group items with similar features. These features can be almost anything. Let’s take movies, for example. We can group movies by genre, actors, MPAA rating, whether the movie is animated, or even whether it has a non-English-speaking protagonist and was released before 1986. As long as we have the data, we can pick features as specific or as vague as we want and create as many clusters as we want.
After we’ve assigned numeric values to the features of our data points, we can compare them. We’ll calculate the distances between the items’ feature values, and the closest items will become our recommendations. All math, no spying.
A collaborative recommender works similarly to content-based filtering in that it also calculates distances. But this time we’re measuring how close users are to each other and creating recommendations from users with similar likes, ratings, or purchases.
Let’s refer back to the Wiener 500 event from Instagram. How on earth would Instagram know that you’d be interested in going? Let’s break it down. Instagram would know your demographics. They would also know your friends on Instagram. They would also know your likes.
So if we put all that together, Instagram can cluster you with similar users. When other users like you bought tickets for the Wiener 500 for this upcoming weekend, Instagram figured out that you’d be interested. It’s not that creepy, it’s just math. People who are like you will most likely like the things you like.
Movie Recommender: Content-based
So let’s take a simple, real-life example. Let’s say we have a selection of 4 movies: The Incredibles, Ratatouille, Avengers: Endgame, and Chef. We can take as many features from these movies as we like. In this case, we’ll take 5 binary features: whether the movie is animated, whether it is about superheroes, whether there is cooking involved, whether Jon Favreau stars in it, and whether it contains characters that are related to each other. These features are the content of each movie that we will compare against the others.
Next, we can start filling in each of these categories. Usually, whenever you are using distance, you’ll want to normalize your variables so they are on the same scale. Since we’re using binary features (0 to represent ‘No’ and 1 to represent ‘Yes’), they are already on the same scale, so we won’t have to normalize them.
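Our movie features don't need it, but here is a rough sketch of what normalizing could look like if we had features on very different scales. It uses min-max scaling, which is one common choice and not necessarily what any particular company does. The function name and the runtime and release-year numbers are illustrative assumptions, not data from this example.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Every value is identical, so the feature carries no information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative (assumed) non-binary features for four movies.
runtimes_minutes = [115, 111, 181, 114]
release_years = [2004, 2007, 2019, 2014]

scaled_runtimes = min_max_scale(runtimes_minutes)
scaled_years = min_max_scale(release_years)

# After scaling, both features live in [0, 1], so neither one
# dominates a distance calculation just because its raw numbers are bigger.
print(scaled_runtimes)
print(scaled_years)
```

Without this step, a gap of 2,000 in one feature would swamp a gap of 1 in another, even if the second one matters more.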
Let’s look specifically at The Incredibles. We can measure how similar The Incredibles is to each of the other movies by comparing them feature by feature. For every feature value that differs, we increase the scoring distance by 1. The shorter the distance, the more similar the movies.
In our example, The Incredibles has the shortest scoring distance to Avengers: Endgame (based on our features). We could tell someone who liked The Incredibles that they would most likely enjoy Avengers: Endgame. Ratatouille would also be a good recommendation since their scoring distance is 3.
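If you want to see the scoring distance in code, here is a minimal sketch. Counting the positions where two binary vectors differ is commonly called the Hamming distance. The 0/1 values below are assumptions filled in for illustration and chosen so the distances agree with the ones quoted above; the function names are made up for this sketch.

```python
# Order of features: animated, superheroes, cooking, Jon Favreau, related characters.
# NOTE: these 0/1 values are assumed for illustration.
MOVIES = {
    "The Incredibles":   [1, 1, 0, 0, 1],
    "Ratatouille":       [1, 0, 1, 0, 0],
    "Avengers: Endgame": [0, 1, 0, 1, 1],
    "Chef":              [0, 0, 1, 1, 1],
}


def scoring_distance(a, b):
    """Count the features on which two movies differ (Hamming distance)."""
    return sum(x != y for x, y in zip(a, b))


def recommend(title, movies):
    """Return (other_title, distance) pairs, closest first."""
    target = movies[title]
    distances = [
        (other, scoring_distance(target, features))
        for other, features in movies.items()
        if other != title
    ]
    return sorted(distances, key=lambda pair: pair[1])


ranking = recommend("The Incredibles", MOVIES)
print(ranking)
# [('Avengers: Endgame', 2), ('Ratatouille', 3), ('Chef', 4)]
```

Change a single feature value and the ranking can flip, which is a good reminder that the recommendations are only as good as the features we pick.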
Movie Recommender: Collaborative
So now let’s extend this example to collaborative recommenders. We have a group of people that have given their ratings on movies. Sade also has given her ratings but hasn’t seen the movie Chef. Will she like it?
Using our other users’ ratings, we can eliminate users who do not share Sade’s tastes. Since Nick and Temple do not have movie ratings similar to hers, their ratings of Chef will not be considered. On the flip side, Adi and Russ are very similar to Sade, so we can use their ratings to predict whether Sade will like Chef.
We can then predict that Sade will like Chef and recommend it to her. If she watches Chef and ends up not liking it, we can update our recommender system and regroup her with users whose tastes are closer to hers. We can repeat this process with more movies and more users until we have built a pretty decent recommender system.
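Here is a rough sketch of how that prediction for Sade could be computed. The ratings (on a 1 to 5 scale) are hypothetical numbers invented for illustration. Measuring taste distance as the average rating gap and averaging the two nearest users are simplifying assumptions, not the way any specific platform does it.

```python
# Hypothetical 1-5 ratings; Sade has not rated Chef yet.
RATINGS = {
    "Sade":   {"The Incredibles": 5, "Ratatouille": 4, "Avengers: Endgame": 5},
    "Adi":    {"The Incredibles": 5, "Ratatouille": 4, "Avengers: Endgame": 4, "Chef": 5},
    "Russ":   {"The Incredibles": 4, "Ratatouille": 5, "Avengers: Endgame": 5, "Chef": 4},
    "Nick":   {"The Incredibles": 1, "Ratatouille": 2, "Avengers: Endgame": 1, "Chef": 2},
    "Temple": {"The Incredibles": 2, "Ratatouille": 1, "Avengers: Endgame": 2, "Chef": 1},
}


def taste_distance(a, b):
    """Average absolute rating gap over the movies both users have rated."""
    shared = [movie for movie in a if movie in b]
    return sum(abs(a[m] - b[m]) for m in shared) / len(shared)


def predict_rating(user, movie, ratings, k=2):
    """Average the ratings of the k users whose tastes are closest to `user`."""
    candidates = [
        (taste_distance(ratings[user], other_ratings), name)
        for name, other_ratings in ratings.items()
        if name != user and movie in other_ratings
    ]
    nearest = [name for _, name in sorted(candidates)[:k]]
    prediction = sum(ratings[name][movie] for name in nearest) / len(nearest)
    return nearest, prediction


neighbors, predicted = predict_rating("Sade", "Chef", RATINGS)
print(neighbors, predicted)
# ['Adi', 'Russ'] 4.5
```

Once Sade actually rates Chef, her rating goes into the same table, and the next prediction uses the updated distances. That is the "update our recommender system" step in miniature.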
Recommenders and beyond!
This is just the tip of the iceberg when it comes to recommender systems. Most companies use a hybrid of content-based and collaborative approaches. As the number of products and users grows, your database basically becomes a gold mine. So the next time you feel creeped out by a relevant ad, just remind yourself that it’s just a bunch of math.