Collaborative Filtering with FastAI

fastai
Author: Sean Dokko

Published: January 11, 2023

Here is some boilerplate for collaborative filtering with FastAI: load a ratings table, build the data loaders, train an embedding model, and query predictions, all in a few lines of code.

# fastai star-imports come first so the explicit imports below are never shadowed
from fastai.collab import *        # CollabDataLoaders, collab_learner
from fastai.tabular.all import *   # tabular helpers

import warnings
from datetime import datetime
from pathlib import Path

import pandas as pd

pd.set_option('mode.chained_assignment', 'warn')
warnings.filterwarnings('ignore')

Data Prep Step

base_path = Path('/kaggle/input/the-movies-dataset')

credits = pd.read_csv(base_path/'credits.csv')
keywords = pd.read_csv(base_path/'keywords.csv')
movies = pd.read_csv(base_path/'movies_metadata.csv').\
                     drop(['belongs_to_collection', 'homepage', 'imdb_id', 'poster_path', 'status', 'title', 'video'], axis=1).\
                     drop([19730, 29503, 35587]) # rows with an incorrect data type; they would break the int64 cast of 'id' below

movies['id'] = movies['id'].astype('int64')

df = movies.merge(keywords, on='id').\
    merge(credits, on='id')

df['original_language'] = df['original_language'].fillna('')
df['runtime'] = df['runtime'].fillna(0)
df['tagline'] = df['tagline'].fillna('')

df.dropna(inplace=True)

ratings_df = pd.read_csv(base_path/'ratings_small.csv')

ratings_df['date'] = ratings_df['timestamp'].apply(lambda x: datetime.fromtimestamp(x))
ratings_df.drop('timestamp', axis=1, inplace=True)

ratings_df = ratings_df.merge(df[['id', 'original_title', 'genres', 'overview']], left_on='movieId',right_on='id', how='left')
ratings_df = ratings_df[~ratings_df['id'].isna()]
ratings_df.drop('id', axis=1, inplace=True)
ratings_df.reset_index(drop=True, inplace=True)

ratings_df.head()

# Rename on a fresh frame instead of in place on a slice (avoids chained-assignment warnings)
movies_df = df[['id', 'original_title']].rename(columns={'id': 'movieId'})
ratings_df.merge(movies_df)

userId movieId rating date original_title genres overview
0 1 1371 2.5 2009-12-14 02:52:15 Rocky III [{'id': 18, 'name': 'Drama'}] Now the world champion, Rocky Balboa is living in luxury and only fighting opponents who pose no threat to him in the ring. His lifestyle of wealth and idleness is shaken when a powerful young fighter known as Clubber Lang challenges him to a bout. After taking a pounding from Lang, the humbled champ turns to former bitter rival Apollo Creed to help him regain his form for a rematch with Lang.
1 4 1371 4.0 2000-02-06 04:11:42 Rocky III [{'id': 18, 'name': 'Drama'}] Now the world champion, Rocky Balboa is living in luxury and only fighting opponents who pose no threat to him in the ring. His lifestyle of wealth and idleness is shaken when a powerful young fighter known as Clubber Lang challenges him to a bout. After taking a pounding from Lang, the humbled champ turns to former bitter rival Apollo Creed to help him regain his form for a rematch with Lang.
2 7 1371 3.0 1996-12-29 14:19:20 Rocky III [{'id': 18, 'name': 'Drama'}] Now the world champion, Rocky Balboa is living in luxury and only fighting opponents who pose no threat to him in the ring. His lifestyle of wealth and idleness is shaken when a powerful young fighter known as Clubber Lang challenges him to a bout. After taking a pounding from Lang, the humbled champ turns to former bitter rival Apollo Creed to help him regain his form for a rematch with Lang.
3 19 1371 4.0 1997-02-06 01:43:24 Rocky III [{'id': 18, 'name': 'Drama'}] Now the world champion, Rocky Balboa is living in luxury and only fighting opponents who pose no threat to him in the ring. His lifestyle of wealth and idleness is shaken when a powerful young fighter known as Clubber Lang challenges him to a bout. After taking a pounding from Lang, the humbled champ turns to former bitter rival Apollo Creed to help him regain his form for a rematch with Lang.
4 21 1371 3.0 1997-01-21 13:11:03 Rocky III [{'id': 18, 'name': 'Drama'}] Now the world champion, Rocky Balboa is living in luxury and only fighting opponents who pose no threat to him in the ring. His lifestyle of wealth and idleness is shaken when a powerful young fighter known as Clubber Lang challenges him to a bout. After taking a pounding from Lang, the humbled champ turns to former bitter rival Apollo Creed to help him regain his form for a rematch with Lang.
... ... ... ... ... ... ... ...
45184 652 129009 4.0 2015-09-19 19:27:07 Love Is a Ball [{'id': 35, 'name': 'Comedy'}, {'id': 10749, 'name': 'Romance'}] Etienne makes a good living out of marrying off poor but titled young men to rich but untitled young ladies. Millicent is now in his sights on the Riviera, and Grand Duke Gaspar is the bait. But what if Millicent starts to fancy planted chauffeur John instead, and Gaspar takes a shine to Etienne's secretary Janine?
45185 653 2103 3.0 2000-01-18 02:04:26 Solaris [{'id': 18, 'name': 'Drama'}, {'id': 878, 'name': 'Science Fiction'}, {'id': 9648, 'name': 'Mystery'}, {'id': 10749, 'name': 'Romance'}] Upon arrival at the space station orbiting an ocean world called Solaris a psychologist discovers that the commander of an expedition to the planet has died mysteriously. Other strange events soon start happening as well, such as the appearance of old acquaintances of the crew, including some who are dead.
45186 659 167 4.0 1996-06-30 12:25:50 K-PAX [{'id': 18, 'name': 'Drama'}, {'id': 878, 'name': 'Science Fiction'}] Prot is a patient at a mental hospital who claims to be from a far away Planet. His psychiatrist tries to help him, only to begin to doubt his own explanations.
45187 659 563 3.0 1996-06-13 19:29:47 Starship Troopers [{'id': 12, 'name': 'Adventure'}, {'id': 28, 'name': 'Action'}, {'id': 53, 'name': 'Thriller'}, {'id': 878, 'name': 'Science Fiction'}] Set in the future, the story follows a young soldier named Johnny Rico and his exploits in the Mobile Infantry. Rico's military career progresses from recruit to non-commissioned officer and finally to officer against the backdrop of an interstellar war between mankind and an arachnoid species known as "the Bugs".
45188 665 129 3.0 2001-07-15 21:28:48 千と千尋の神隠し [{'id': 14, 'name': 'Fantasy'}, {'id': 12, 'name': 'Adventure'}, {'id': 16, 'name': 'Animation'}, {'id': 10751, 'name': 'Family'}] A ten year old girl who wanders away from her parents along a path that leads to a world ruled by strange and unusual monster-like animals. Her parents have been changed into pigs along with others inside a bathhouse full of these creatures. Will she ever see the world how it once was?

45189 rows × 7 columns
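
The `date` column above comes from `datetime.fromtimestamp`, which converts to the local time zone of the machine running the code when no `tz` argument is given, so the same rating can show a different date elsewhere. Here is a minimal sketch of a zone-independent conversion. The helper name `to_utc` is hypothetical, and the epoch value is back-computed from the first displayed row on the assumption that the output above was produced in UTC.

```python
from datetime import datetime, timezone

def to_utc(epoch_seconds):
    # Passing tz makes the result independent of the machine's local time zone.
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Epoch seconds for 2009-12-14 02:52:15 UTC (assumed to match the first row shown above).
ts = int(datetime(2009, 12, 14, 2, 52, 15, tzinfo=timezone.utc).timestamp())
stamp = to_utc(ts)
print(ts, stamp.isoformat())
```

For a whole column, `pd.to_datetime(ratings_df['timestamp'], unit='s')` is a vectorized alternative to the row-by-row `apply`.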

DataLoader Creation Step

dls = CollabDataLoaders.from_df(ratings_df, user_name='userId', item_name='original_title', rating_name='rating', bs=64)
dls.show_batch()

   userId  original_title                              rating
0  346     The Getaway                                 1.0
1  615     Yesterday                                   3.5
2  19      Jaws: The Revenge                           1.0
3  452     The Aviator                                 3.0
4  102     Houseboat                                   2.0
5  294     Confessions of a Dangerous Mind             4.0
6  357     Pirates of the Caribbean: Dead Man's Chest  5.0
7  518     Stuck on You                                4.0
8  279     Crustacés et coquillages                    3.0
9  115     バトル・ロワイアル                          4.5
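
Before training, the user IDs and titles have to become integer indices into the embedding tables. As far as I understand it, fastai's categorical encoding reserves index 0 for a `#na#` placeholder and maps unknown values to it. The functions below are a plain-Python sketch of that idea with hypothetical names (`build_vocab`, `encode`), not fastai's own code.

```python
def build_vocab(values, na_token="#na#"):
    # Index 0 is reserved for the placeholder; real values start at 1.
    vocab = [na_token] + sorted(set(values))
    lookup = {value: index for index, value in enumerate(vocab)}
    return vocab, lookup

def encode(values, lookup):
    # Values missing from the vocabulary fall back to index 0.
    return [lookup.get(value, 0) for value in values]

titles = ["The Getaway", "Yesterday", "Houseboat", "Yesterday"]
vocab, lookup = build_vocab(titles)
codes = encode(["Yesterday", "Houseboat", "Not In Training Data"], lookup)
print(vocab)
print(codes)
```

Under this scheme an embedding table needs `len(vocab)` rows, one more than the number of distinct values. Any title that never appeared in the training data shares the placeholder row, so its predicted score says nothing specific about that movie.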

Model Creation/Training Step

learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))
learn.fit_one_cycle(5, 5e-3, wd=.1)

epoch  train_loss  valid_loss  time
0      0.930709    0.902386    00:05
1      0.786654    0.789312    00:05
2      0.572339    0.765915    00:05
3      0.408200    0.764980    00:06
4      0.322286    0.766817    00:05
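
If the reported loss is mean squared error, which I believe is the default for `collab_learner`, then its square root gives the typical prediction error in rating points. The sketch below makes that assumption and reuses the numbers printed above to find the best validation epoch and the final gap between training and validation loss.

```python
import math

# (epoch, train_loss, valid_loss) copied from the training log above.
history = [
    (0, 0.930709, 0.902386),
    (1, 0.786654, 0.789312),
    (2, 0.572339, 0.765915),
    (3, 0.408200, 0.764980),
    (4, 0.322286, 0.766817),
]

best_epoch, _, best_valid = min(history, key=lambda row: row[2])
best_rmse = math.sqrt(best_valid)            # assumes the loss is MSE
final_gap = history[-1][2] - history[-1][1]  # valid minus train at the last epoch

print(f"best epoch: {best_epoch}, RMSE ~ {best_rmse:.3f}")
print(f"final train/valid gap: {final_gap:.3f}")
```

Validation loss flattens after epoch 2 while training loss keeps falling, which suggests the later epochs mostly fit the training set rather than improve generalization.
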
learn.model
EmbeddingDotBias(
  (u_weight): Embedding(672, 50)
  (i_weight): Embedding(2777, 50)
  (u_bias): Embedding(672, 1)
  (i_bias): Embedding(2777, 1)
)
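
The printed module holds one 50-dimensional factor vector and one scalar bias per user and per item. As I understand the `EmbeddingDotBias` design, a prediction is the dot product of the two vectors plus both biases, squashed by a sigmoid into `y_range`. The sketch below reproduces that arithmetic in plain Python with made-up three-dimensional factors; `sigmoid_range` is written out here for illustration, and `predict_rating` and all numeric inputs are hypothetical.

```python
import math

def sigmoid_range(x, low, high):
    # Squash any real number into the open interval (low, high).
    return low + (high - low) / (1.0 + math.exp(-x))

def predict_rating(user_vec, item_vec, user_bias, item_bias, y_range=(0.0, 5.5)):
    # Dot product of the factor vectors, plus both biases, mapped into y_range.
    dot = sum(u * i for u, i in zip(user_vec, item_vec))
    return sigmoid_range(dot + user_bias + item_bias, *y_range)

# Toy factors: the real model uses 50 per user and per item.
rating = predict_rating([0.5, -0.2, 0.1], [0.8, 0.3, -0.4], user_bias=0.1, item_bias=0.2)

# Parameter count implied by the printed embedding shapes.
n_users, n_items, n_factors = 672, 2777, 50
n_params = (n_users + n_items) * (n_factors + 1)
print(round(rating, 2), n_params)
```

The upper bound of 5.5 sits slightly above the highest possible rating. Because a sigmoid only approaches its limits, this leaves room for the model to predict a full 5.0.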

Prediction

movie_names = list(movies_df.drop_duplicates(subset='movieId', keep='first').original_title)
size = len(movie_names)

def predict_top_movies(userId, count=5):
  # Score every title for this user, then print the `count` highest predictions.
  query_df = pd.DataFrame({'userId': [userId] * size, 'original_title': movie_names})
  query_dl = dls.test_dl(query_df)
  preds, _ = learn.get_preds(dl=query_dl)
  scores = preds.flatten().tolist()  # plain floats sort more cheaply than tensors
  results = sorted(zip(scores, movie_names), reverse=True)[:count]
  for score, name in results:
    print("Score: ", round(score, 2), " for movie: ", name)

def predict_user_rating(userId, movieName):
  # A single (user, movie) pair needs only a one-row query.
  query_df = pd.DataFrame({'userId': [userId], 'original_title': [movieName]})
  query_dl = dls.test_dl(query_df)
  preds, _ = learn.get_preds(dl=query_dl)
  print("Score: ", round(float(preds[0]), 2), " for movie: ", movieName)

predict_top_movies(123)
predict_user_rating(123, 'Minions')
Score:  4.83  for movie:  Sleepless in Seattle
Score:  4.64  for movie:  Laura
Score:  4.6  for movie:  Galaxy Quest
Score:  4.58  for movie:  Men in Black II
Score:  4.56  for movie:  Lonely Hearts
Score:  3.29  for movie:  Minions
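
`predict_top_movies` ranks every title, including ones the user has already rated. A common refinement is to drop those before taking the top results. The sketch below shows the idea on toy data with `heapq.nlargest`; the function name `top_unseen` and the set of already-rated titles are hypothetical, and the scores are illustrative values borrowed from the output above.

```python
import heapq

def top_unseen(scores, seen_titles, count=5):
    # Keep only titles the user has not rated, then take the highest scores.
    candidates = ((score, title) for title, score in scores.items()
                  if title not in seen_titles)
    return heapq.nlargest(count, candidates)

# Toy predictions keyed by title; values are illustrative only.
scores = {
    "Sleepless in Seattle": 4.83,
    "Laura": 4.64,
    "Galaxy Quest": 4.60,
    "Minions": 3.29,
}
already_rated = {"Laura"}  # hypothetical: pretend the user rated this one

recommendations = top_unseen(scores, already_rated, count=2)
for score, title in recommendations:
    print(f"{title}: {score:.2f}")
```

In the notebook, the set of rated titles could be built with something like `set(ratings_df.loc[ratings_df['userId'] == userId, 'original_title'])`.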