Movie Distance Problem
You're building a Netflix clone. You have a dataset of movie reviews, where each review is a (user_id, movie_id, rating) triplet.
- movie_ids are integers in the range [0, Nmovies)
- user_ids are integers in the range [0, Nusers)
- ratings are integers in the range [1, 5]
import random
Nmovies = 10
Nusers = 10
Nreviews = 30
movie_ids = random.choices(range(Nmovies), k=Nreviews)
user_ids = random.choices(range(Nusers), k=Nreviews)
ratings = random.choices(range(1,6), k=Nreviews)
print(movie_ids)
# [4, 9, 8, 7, 3, ... ]
print(user_ids)
# [1, 7, 4, 1, 2, ... ]
print(ratings)
# [1, 3, 2, 1, 1, ... ]
- Build a compressed sparse matrix where (i,j) gives the ith person's review of movie j.
- Normalize the movie vectors (column vectors) so that each of them has unit length.
- Calculate the Euclidean distance between normalized movie 2 and normalized movie 4.
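The three steps above can be sketched with SciPy's sparse module. This is one possible approach under stated assumptions, not a reference solution: the function name `normalized_movie_distance` and its parameters are made up for illustration, repeated (user, movie) pairs are summed (which is what the COO-to-CSC conversion does), and a movie with no reviews is left as a zero vector rather than normalized.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import norm as sparse_norm


def normalized_movie_distance(user_ids, movie_ids, ratings, n_users, n_movies, a, b):
    """Euclidean distance between unit-length columns a and b of the review matrix."""
    # Rows are users, columns are movies. Converting COO to CSC sums duplicate
    # (user, movie) pairs; that handling of repeats is an assumption.
    reviews = sparse.coo_matrix(
        (ratings, (user_ids, movie_ids)), shape=(n_users, n_movies), dtype=float
    ).tocsc()

    # L2 norm of every movie column.
    col_norms = sparse_norm(reviews, axis=0)

    # Scale each column by 1 / norm. Columns with no reviews have norm 0 and
    # are left as zero vectors (also an assumption).
    scale = np.zeros_like(col_norms)
    np.divide(1.0, col_norms, out=scale, where=col_norms > 0)
    normalized = (reviews @ sparse.diags(scale)).tocsc()

    # Norm of the difference of the two single-column slices.
    return float(sparse_norm(normalized[:, [a]] - normalized[:, [b]]))
```

With the lists generated earlier, the call would be `normalized_movie_distance(user_ids, movie_ids, ratings, Nusers, Nmovies, 2, 4)`. The result changes from run to run because the data is random.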
For example, if our Netflix clone had three users and two movies with a review matrix like this
[[1 0]
[0 1]
[3 0]]
The normalized movie vectors would be
[[0.32 0. ]
[0. 1. ]
[0.95 0. ]]
The Euclidean distance between these two normalized movie vectors is 1.41.
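The numbers in this small example can be checked with a dense NumPy array. This is only a sanity check of the arithmetic, not the sparse solution the problem asks for. The first column (1, 0, 3) has length sqrt(10), so it becomes roughly (0.32, 0, 0.95), and the squared distance to (0, 1, 0) is 0.1 + 1 + 0.9 = 2. The variable names are illustrative.

```python
import numpy as np

# Review matrix from the example: 3 users (rows) by 2 movies (columns).
reviews = np.array([[1, 0],
                    [0, 1],
                    [3, 0]], dtype=float)

# Divide each column by its L2 norm so every movie vector has unit length.
normalized = reviews / np.linalg.norm(reviews, axis=0)

# Euclidean distance between the two normalized movie vectors.
distance = np.linalg.norm(normalized[:, 0] - normalized[:, 1])

print(normalized.round(2))
print(round(float(distance), 2))
```

The dense division works here because neither column is all zeros. On the random dataset a movie may have no reviews, in which case this division would produce NaN values.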