Stanford Online / DeepLearning.AI. Unsupervised Learning, Recommender Systems and Reinforcement Learning: Deep Learning for Content-Based Filtering.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "Lzk7iX_CodX6",
"tags": []
},
"source": [
"# <img align=\"left\" src=\"./images/film_strip_vertical.png\" style=\" width:40px; \" > Practice lab: Deep Learning for Content-Based Filtering\n",
"\n",
"In this exercise, you will implement content-based filtering using a neural network to build a recommender system for movies. \n",
"\n",
"\n",
"# Outline\n",
"- [ 1 - Packages ](#1)\n",
"- [ 2 - Movie ratings dataset ](#2)\n",
"- [ 3 - Content-based filtering with a neural network](#3)\n",
" - [ 3.1 Training Data](#3.1)\n",
" - [ 3.2 Preparing the training data](#3.2)\n",
"- [ 4 - Neural Network for content-based filtering](#4)\n",
" - [ Exercise 1](#ex01)\n",
"- [ 5 - Predictions](#5)\n",
" - [ 5.1 - Predictions for a new user](#5.1)\n",
" - [ 5.2 - Predictions for an existing user.](#5.2)\n",
" - [ 5.3 - Finding Similar Items](#5.3)\n",
" - [ Exercise 2](#ex02)\n",
"- [ 6 - Congratulations! ](#6)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"_**NOTE:** To prevent errors from the autograder, you are not allowed to edit or delete non-graded cells in this lab. Please also refrain from adding any new cells. \n",
"**Once you have passed this assignment** and want to experiment with any of the non-graded code, you may follow the instructions at the bottom of this notebook._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"1\"></a>\n",
"## 1 - Packages <img align=\"left\" src=\"./images/movie_camera.png\" style=\" width:40px; \">\n",
"We will use familiar packages: NumPy, TensorFlow, and helpful routines from [scikit-learn](https://scikit-learn.org/stable/). We will also use [tabulate](https://pypi.org/project/tabulate/) to neatly print tables and [Pandas](https://pandas.pydata.org/) to organize tabular data."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"deletable": false,
"id": "Xu-w_RmNwCV5"
},
"outputs": [],
"source": [
"import numpy as np # NumPy for array handling and numeric computation\n",
"import numpy.ma as ma # Masked arrays, for handling 'undefined' values\n",
"import pandas as pd # Pandas for data analysis and tabular manipulation\n",
"import tensorflow as tf # TensorFlow, to build the deep learning model\n",
"from tensorflow import keras # Keras API from TensorFlow\n",
"\n",
"# 'StandardScaler' NORMALIZES variables -> (x - mu)/sigma\n",
"# 'MinMaxScaler' SCALES variables into new range -> [0-1] default\n",
"from sklearn.preprocessing import StandardScaler, MinMaxScaler\n",
"\n",
"from sklearn.model_selection import train_test_split # Split main dataset into 'train' set and 'test' set\n",
"import tabulate # 'Pretty-print' tabular data in Python\n",
"from recsysNN_utils import * # Contains ALL (*) helper functions\n",
"pd.set_option(\"display.precision\", 1) # 'Round pandas numbers' to just one (1) decimal point -> x.x"
]
},
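As a quick aside (not part of the graded lab), the two scalers imported above behave differently; a minimal sketch on toy data makes the contrast concrete:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # toy feature column

# StandardScaler normalizes: (x - mu) / sigma -> zero mean, unit variance
X_std = StandardScaler().fit_transform(X)

# MinMaxScaler rescales values into [0, 1] by default
X_mm = MinMaxScaler().fit_transform(X)

print(X_std.mean(), X_std.std())  # ~0.0, ~1.0
print(X_mm.min(), X_mm.max())     # 0.0, 1.0
```

Both scalers will be used later when preparing the features and targets for training.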
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"2\"></a>\n",
"## 2 - Movie ratings dataset <img align=\"left\" src=\"./images/film_rating.png\" style=\" width:40px;\" >\n",
"The data set is derived from the [MovieLens ml-latest-small](https://grouplens.org/datasets/movielens/latest/) dataset. \n",
"\n",
"[F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>]\n",
"\n",
"The original dataset has roughly 9000 movies rated by 600 users, with ratings on a scale of 0.5 to 5 in 0.5-step increments. The dataset has been reduced in size to focus on movies released since 2000 and on popular genres. The reduced dataset has $n_u = 397$ users, $n_m = 847$ movies and 25,521 ratings. For each movie, the dataset provides a movie title, release date, and one or more genres. For example, \"Toy Story 3\" was released in 2010 and has several genres: \"Adventure|Animation|Children|Comedy|Fantasy\". The dataset contains little information about users other than their ratings. It is used to create the training vectors for the neural networks described below. \n",
"Let's learn a bit more about this data set. The table below shows the top 10 movies, ranked by the number of ratings they received from users. These movies also happen to have high average ratings. How many of these movies have you watched? "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"deletable": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>movie id</th>\n",
" <th>num ratings</th>\n",
" <th>ave rating</th>\n",
" <th>title</th>\n",
" <th>genres</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>4993</td>\n",
" <td>198</td>\n",
" <td>4.1</td>\n",
" <td>Lord of the Rings: The Fellowship of the Ring,...</td>\n",
" <td>Adventure|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>5952</td>\n",
" <td>188</td>\n",
" <td>4.0</td>\n",
" <td>Lord of the Rings: The Two Towers, The</td>\n",
" <td>Adventure|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>7153</td>\n",
" <td>185</td>\n",
" <td>4.1</td>\n",
" <td>Lord of the Rings: The Return of the King, The</td>\n",
" <td>Action|Adventure|Drama|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4306</td>\n",
" <td>170</td>\n",
" <td>3.9</td>\n",
" <td>Shrek</td>\n",
" <td>Adventure|Animation|Children|Comedy|Fantasy|Ro...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>58559</td>\n",
" <td>149</td>\n",
" <td>4.2</td>\n",
" <td>Dark Knight, The</td>\n",
" <td>Action|Crime|Drama</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6539</td>\n",
" <td>149</td>\n",
" <td>3.8</td>\n",
" <td>Pirates of the Caribbean: The Curse of the Bla...</td>\n",
" <td>Action|Adventure|Comedy|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>79132</td>\n",
" <td>143</td>\n",
" <td>4.1</td>\n",
" <td>Inception</td>\n",
" <td>Action|Crime|Drama|Mystery|Sci-Fi|Thriller</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>6377</td>\n",
" <td>141</td>\n",
" <td>4.0</td>\n",
" <td>Finding Nemo</td>\n",
" <td>Adventure|Animation|Children|Comedy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>4886</td>\n",
" <td>132</td>\n",
" <td>3.9</td>\n",
" <td>Monsters, Inc.</td>\n",
" <td>Adventure|Animation|Children|Comedy|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>7361</td>\n",
" <td>131</td>\n",
" <td>4.2</td>\n",
" <td>Eternal Sunshine of the Spotless Mind</td>\n",
" <td>Drama|Romance|Sci-Fi</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" movie id num ratings ave rating \\\n",
"0 4993 198 4.1 \n",
"1 5952 188 4.0 \n",
"2 7153 185 4.1 \n",
"3 4306 170 3.9 \n",
"4 58559 149 4.2 \n",
"5 6539 149 3.8 \n",
"6 79132 143 4.1 \n",
"7 6377 141 4.0 \n",
"8 4886 132 3.9 \n",
"9 7361 131 4.2 \n",
"\n",
" title \\\n",
"0 Lord of the Rings: The Fellowship of the Ring,... \n",
"1 Lord of the Rings: The Two Towers, The \n",
"2 Lord of the Rings: The Return of the King, The \n",
"3 Shrek \n",
"4 Dark Knight, The \n",
"5 Pirates of the Caribbean: The Curse of the Bla... \n",
"6 Inception \n",
"7 Finding Nemo \n",
"8 Monsters, Inc. \n",
"9 Eternal Sunshine of the Spotless Mind \n",
"\n",
" genres \n",
"0 Adventure|Fantasy \n",
"1 Adventure|Fantasy \n",
"2 Action|Adventure|Drama|Fantasy \n",
"3 Adventure|Animation|Children|Comedy|Fantasy|Ro... \n",
"4 Action|Crime|Drama \n",
"5 Action|Adventure|Comedy|Fantasy \n",
"6 Action|Crime|Drama|Mystery|Sci-Fi|Thriller \n",
"7 Adventure|Animation|Children|Comedy \n",
"8 Adventure|Animation|Children|Comedy|Fantasy \n",
"9 Drama|Romance|Sci-Fi "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Load the table of the top 10 movies, ranked by number of ratings\n",
"top10_df = pd.read_csv(\"./data/content_top10_df.csv\")\n",
"\n",
"# Load the table of rating counts and average ratings aggregated by genre\n",
"bygenre_df = pd.read_csv(\"./data/content_bygenre_df.csv\")\n",
"\n",
"# Display the top 10 most-rated movies\n",
"top10_df"
]
},
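The lab ships this ranking as a precomputed CSV, but a table like it could be derived from a per-movie summary; here is a minimal sketch with hypothetical toy data and column names:

```python
import pandas as pd

# Hypothetical per-movie summary; the lab loads a precomputed CSV instead
movies = pd.DataFrame({
    "title":       ["A", "B", "C"],
    "num ratings": [198, 131, 170],
    "ave rating":  [4.1, 4.2, 3.9],
})

# Rank from MAX to MIN number of ratings and keep the top rows
top = movies.nlargest(2, "num ratings")
print(top["title"].tolist())  # ['A', 'C']
```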
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next table shows information sorted by genre. The number of ratings per genre varies substantially. Note that a movie may have multiple genres, so the sum of the ratings below is larger than the number of original ratings."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"deletable": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>genre</th>\n",
" <th>num movies</th>\n",
" <th>ave rating/genre</th>\n",
" <th>ratings per genre</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Action</td>\n",
" <td>321</td>\n",
" <td>3.4</td>\n",
" <td>10377</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Adventure</td>\n",
" <td>234</td>\n",
" <td>3.4</td>\n",
" <td>8785</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Animation</td>\n",
" <td>76</td>\n",
" <td>3.6</td>\n",
" <td>2588</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Children</td>\n",
" <td>69</td>\n",
" <td>3.4</td>\n",
" <td>2472</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Comedy</td>\n",
" <td>326</td>\n",
" <td>3.4</td>\n",
" <td>8911</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Crime</td>\n",
" <td>139</td>\n",
" <td>3.5</td>\n",
" <td>4671</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Documentary</td>\n",
" <td>13</td>\n",
" <td>3.8</td>\n",
" <td>280</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>Drama</td>\n",
" <td>342</td>\n",
" <td>3.6</td>\n",
" <td>10201</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>Fantasy</td>\n",
" <td>124</td>\n",
" <td>3.4</td>\n",
" <td>4468</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Horror</td>\n",
" <td>56</td>\n",
" <td>3.2</td>\n",
" <td>1345</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Mystery</td>\n",
" <td>68</td>\n",
" <td>3.6</td>\n",
" <td>2497</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Romance</td>\n",
" <td>151</td>\n",
" <td>3.4</td>\n",
" <td>4468</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Sci-Fi</td>\n",
" <td>174</td>\n",
" <td>3.4</td>\n",
" <td>5894</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Thriller</td>\n",
" <td>245</td>\n",
" <td>3.4</td>\n",
" <td>7659</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" genre num movies ave rating/genre ratings per genre\n",
"0 Action 321 3.4 10377\n",
"1 Adventure 234 3.4 8785\n",
"2 Animation 76 3.6 2588\n",
"3 Children 69 3.4 2472\n",
"4 Comedy 326 3.4 8911\n",
"5 Crime 139 3.5 4671\n",
"6 Documentary 13 3.8 280\n",
"7 Drama 342 3.6 10201\n",
"8 Fantasy 124 3.4 4468\n",
"9 Horror 56 3.2 1345\n",
"10 Mystery 68 3.6 2497\n",
"11 Romance 151 3.4 4468\n",
"12 Sci-Fi 174 3.4 5894\n",
"13 Thriller 245 3.4 7659"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Display the table of rating counts and average ratings aggregated by genre\n",
"bygenre_df"
]
},
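The per-genre counts above double-count multi-genre movies. A minimal sketch (toy data, hypothetical column names, not the lab's helpers) of how such an aggregation can be built with pandas:

```python
import pandas as pd

# Hypothetical toy ratings table with pipe-separated genres
ratings = pd.DataFrame({
    "title":  ["Shrek", "Inception", "Shrek"],
    "genres": ["Comedy|Fantasy", "Action|Sci-Fi", "Comedy|Fantasy"],
    "rating": [4.0, 5.0, 3.0],
})

# One row per (rating, genre): a rating of a movie with k genres contributes
# k rows, which is why the per-genre counts sum to more than the raw ratings
exploded = ratings.assign(genre=ratings["genres"].str.split("|")).explode("genre")
bygenre = exploded.groupby("genre")["rating"].agg(["count", "mean"])
print(bygenre)
```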
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"3\"></a>\n",
"## 3 - Content-based filtering with a neural network\n",
"\n",
"In the collaborative filtering lab, you generated two vectors, a user vector $\\mathbf{w}^{(j)}$ and an item/movie vector $\\mathbf{x}^{(i)}$, whose dot product predicts the rating: $y^{(i,j)} = \\mathbf{w}^{(j)} \\cdot \\mathbf{x}^{(i)}$. The vectors were derived solely from the ratings. \n",
"\n",
"Content-based filtering also generates a user and movie feature vector but recognizes there may be other information available about the user and/or movie that may improve the prediction. The additional information is provided to a neural network which then generates the user and movie vector as shown below.\n",
"<figure>\n",
" <center> <img src=\"./images/RecSysNN.png\" style=\"width:500px;height:280px;\" ></center>\n",
"</figure>\n",
"\n",
"<a name=\"3.1\"></a>\n",
"### 3.1 Training Data\n",
"The movie content provided to the network is a combination of the original data and some 'engineered features'. Recall the feature engineering discussion and lab from Course 1, Week 2, lab 4. The original features are the year the movie was released and the movie's genres, presented as a binary vector over the 14 genres. The engineered feature is an average rating derived from the user ratings. \n",
"\n",
"The user content is composed of engineered features. A per genre average rating is computed per user. Additionally, a user id, rating count and rating average are available but not included in the training or prediction content. They are carried with the data set because they are useful in interpreting data.\n",
"\n",
"The training set consists of all the ratings made by the users in the data set. Some ratings are repeated to boost the number of training examples for underrepresented genres. The training set is split into two arrays with the same number of entries: a user array and a movie/item array. \n",
"\n",
"Below, let's load and display some of the data."
]
},
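The prediction step in the figure can be sketched with toy numbers: two random vectors stand in for the user and item network outputs (often called v_u and v_m), and the predicted rating is their dot product:

```python
import numpy as np

rng = np.random.default_rng(0)
v_u = rng.normal(size=32)  # toy user vector, standing in for the user network's output
v_m = rng.normal(size=32)  # toy movie vector, standing in for the item network's output

# The predicted rating is the dot product of the two vectors
y_hat = np.dot(v_u, v_m)

# Equivalent element-wise form: sum over k of v_u[k] * v_m[k]
print(y_hat)
```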
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"deletable": false,
"id": "M5gfMLYgxCD1"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"USER train x_u^(j) size: (50884, 17) Selected Features at x_u^(j): 14\n",
"MOVIE train x_m^(i) size: (50884, 17) Selected Features at x_m^(i): 16\n",
"Number of training vectors/rows: 50884\n"
]
}
],
"source": [
"# Load Data, set configuration variables\n",
"item_train, user_train, y_train, item_features, user_features, item_vecs, movie_dict, user_to_genre = load_data()\n",
"\n",
"# remove 'userid', 'rating count' and 'ave rating' cols during training at user_train = x_u^(j) \n",
"# 17 features/cols - 3 = 14 features/cols\n",
"num_user_features = user_train.shape[1] - 3\n",
"print(\"USER train x_u^(j) size:\",user_train.shape,\"Selected Features at x_u^(j):\",num_user_features)\n",
"\n",
"# remove 'movie id' col at movie_train = x_m^(i) \n",
"# 17 features/cols - 1 = 16 features/cols\n",
"num_item_features = item_train.shape[1] - 1\n",
"print(\"MOVIE train x_m^(i) size:\",item_train.shape,\"Selected Features at x_m^(i):\",num_item_features)\n",
"\n",
"uvs = 3 # USER genre vector start (skip cols 0 'user id', 1 'rating count', 2 'rating ave')\n",
"ivs = 3 # ITEM / MOVIE genre vector start (skip cols 0 'movie id', 1 'year', 2 'ave rating')\n",
"u_s = 3 # start of columns to use in training, USER (skip cols 0 'user id', 1 'rating count', 2 'rating ave')\n",
"i_s = 1 # start of columns to use in training, MOVIES (skip col 0 'movie id')\n",
"\n",
"print(f\"Number of training vectors/rows: {len(item_train)}\")"
]
},
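The column offsets u_s and i_s simply select which columns of each array reach the network. A minimal sketch with toy arrays of the same width (17 columns, as above):

```python
import numpy as np

# Toy stand-ins for user_train / item_train with the layouts described above:
# user rows: [user id, rating count, rating ave, 14 per-genre averages]
# item rows: [movie id, year, ave rating, 14 genre flags]
user_train_demo = np.zeros((2, 17))
item_train_demo = np.zeros((2, 17))

u_s, i_s = 3, 1  # first column used in training for users / items

# Training uses only the columns from u_s / i_s onward
print(user_train_demo[:, u_s:].shape)  # (2, 14)
print(item_train_demo[:, i_s:].shape)  # (2, 16)
```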
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look at the first few entries in the user training array."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"deletable": false
},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th style=\"text-align: center;\"> [user id] </th><th style=\"text-align: center;\"> [rating count] </th><th style=\"text-align: center;\"> [rating ave] </th><th style=\"text-align: center;\"> Act ion </th><th style=\"text-align: center;\"> Adve nture </th><th style=\"text-align: center;\"> Anim ation </th><th style=\"text-align: center;\"> Chil dren </th><th style=\"text-align: center;\"> Com edy </th><th style=\"text-align: center;\"> Crime </th><th style=\"text-align: center;\"> Docum entary </th><th style=\"text-align: center;\"> Drama </th><th style=\"text-align: center;\"> Fan tasy </th><th style=\"text-align: center;\"> Hor ror </th><th style=\"text-align: center;\"> Mys tery </th><th style=\"text-align: center;\"> Rom ance </th><th style=\"text-align: center;\"> Sci -Fi </th><th style=\"text-align: center;\"> Thri ller </th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td style=\"text-align: center;\"> 2 </td><td style=\"text-align: center;\"> 22 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.2 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.1 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.9 </td><td style=\"text-align: center;\"> 3.9 </td></tr>\n",
"<tr><td style=\"text-align: center;\"> 2 </td><td style=\"text-align: center;\"> 22 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.2 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.1 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.9 </td><td style=\"text-align: center;\"> 3.9 </td></tr>\n",
"<tr><td style=\"text-align: center;\"> 2 </td><td style=\"text-align: center;\"> 22 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.2 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.1 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.9 </td><td style=\"text-align: center;\"> 3.9 </td></tr>\n",
"<tr><td style=\"text-align: center;\"> 2 </td><td style=\"text-align: center;\"> 22 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.2 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.1 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.9 </td><td style=\"text-align: center;\"> 3.9 </td></tr>\n",
"<tr><td style=\"text-align: center;\"> 2 </td><td style=\"text-align: center;\"> 22 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.2 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.1 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.9 </td><td style=\"text-align: center;\"> 3.9 </td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
"'<table>\\n<thead>\\n<tr><th style=\"text-align: center;\"> [user id] </th><th style=\"text-align: center;\"> [rating count] </th><th style=\"text-align: center;\"> [rating ave] </th><th style=\"text-align: center;\"> Act ion </th><th style=\"text-align: center;\"> Adve nture </th><th style=\"text-align: center;\"> Anim ation </th><th style=\"text-align: center;\"> Chil dren </th><th style=\"text-align: center;\"> Com edy </th><th style=\"text-align: center;\"> Crime </th><th style=\"text-align: center;\"> Docum entary </th><th style=\"text-align: center;\"> Drama </th><th style=\"text-align: center;\"> Fan tasy </th><th style=\"text-align: center;\"> Hor ror </th><th style=\"text-align: center;\"> Mys tery </th><th style=\"text-align: center;\"> Rom ance </th><th style=\"text-align: center;\"> Sci -Fi </th><th style=\"text-align: center;\"> Thri ller </th></tr>\\n</thead>\\n<tbody>\\n<tr><td style=\"text-align: center;\"> 2 </td><td style=\"text-align: center;\"> 22 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.2 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.1 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.9 </td><td style=\"text-align: center;\"> 3.9 </td></tr>\\n<tr><td style=\"text-align: center;\"> 2 </td><td style=\"text-align: center;\"> 22 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.2 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 
4.0 </td><td style=\"text-align: center;\"> 4.1 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.9 </td><td style=\"text-align: center;\"> 3.9 </td></tr>\\n<tr><td style=\"text-align: center;\"> 2 </td><td style=\"text-align: center;\"> 22 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.2 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.1 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.9 </td><td style=\"text-align: center;\"> 3.9 </td></tr>\\n<tr><td style=\"text-align: center;\"> 2 </td><td style=\"text-align: center;\"> 22 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.2 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.1 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.9 </td><td style=\"text-align: center;\"> 3.9 </td></tr>\\n<tr><td style=\"text-align: center;\"> 2 </td><td style=\"text-align: 
center;\"> 22 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.2 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.1 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.0 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 3.9 </td><td style=\"text-align: center;\"> 3.9 </td></tr>\\n</tbody>\\n</table>'"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# First 5 rows of x_u^(j) for user j=2, including the bracketed columns not used in training\n",
"pprint_train(user_train, user_features, uvs, u_s, maxcount=5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some of the user and item/movie features are not used in training. In the table above, the features in brackets \"[]\", such as \"user id\", \"rating count\" and \"rating ave\", are not included when the model is trained or used.\n",
"Above you can see the per-genre rating average for user 2. Zero entries are genres the user has not rated. The user vector is the same for all of the movies rated by that user. \n",
"Let's look at the first few entries of the movie/item array."
]
},
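Because the user vector is identical across all of a user's ratings, building the user array amounts to replicating one row per rated movie. A minimal sketch with toy numbers (not the lab's helper functions):

```python
import numpy as np

# Toy user feature row and a count of movies that user rated
user_vec = np.array([2.0, 22.0, 4.0])  # e.g. [user id, rating count, rating ave]
num_rated = 4

# One identical row per rated movie, matching the repeated rows shown above
user_rows = np.tile(user_vec, (num_rated, 1))
print(user_rows.shape)  # (4, 3)
```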
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"deletable": false
},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th style=\"text-align: center;\"> [movie id] </th><th style=\"text-align: center;\"> year </th><th style=\"text-align: center;\"> ave rating </th><th style=\"text-align: center;\"> Act ion </th><th style=\"text-align: center;\"> Adve nture </th><th style=\"text-align: center;\"> Anim ation </th><th style=\"text-align: center;\"> Chil dren </th><th style=\"text-align: center;\"> Com edy </th><th style=\"text-align: center;\"> Crime </th><th style=\"text-align: center;\"> Docum entary </th><th style=\"text-align: center;\"> Drama </th><th style=\"text-align: center;\"> Fan tasy </th><th style=\"text-align: center;\"> Hor ror </th><th style=\"text-align: center;\"> Mys tery </th><th style=\"text-align: center;\"> Rom ance </th><th style=\"text-align: center;\"> Sci -Fi </th><th style=\"text-align: center;\"> Thri ller </th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td style=\"text-align: center;\"> 6874 </td><td style=\"text-align: center;\"> 2003 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td></tr>\n",
"<tr><td style=\"text-align: center;\"> 8798 </td><td style=\"text-align: center;\"> 2004 </td><td style=\"text-align: center;\"> 3.8 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td></tr>\n",
"<tr><td style=\"text-align: center;\"> 46970 </td><td style=\"text-align: center;\"> 2006 </td><td style=\"text-align: center;\"> 3.2 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td></tr>\n",
"<tr><td style=\"text-align: center;\"> 48516 </td><td style=\"text-align: center;\"> 2006 </td><td style=\"text-align: center;\"> 4.3 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td></tr>\n",
"<tr><td style=\"text-align: center;\"> 58559 </td><td style=\"text-align: center;\"> 2008 </td><td style=\"text-align: center;\"> 4.2 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
"'<table>\\n<thead>\\n<tr><th style=\"text-align: center;\"> [movie id] </th><th style=\"text-align: center;\"> year </th><th style=\"text-align: center;\"> ave rating </th><th style=\"text-align: center;\"> Act ion </th><th style=\"text-align: center;\"> Adve nture </th><th style=\"text-align: center;\"> Anim ation </th><th style=\"text-align: center;\"> Chil dren </th><th style=\"text-align: center;\"> Com edy </th><th style=\"text-align: center;\"> Crime </th><th style=\"text-align: center;\"> Docum entary </th><th style=\"text-align: center;\"> Drama </th><th style=\"text-align: center;\"> Fan tasy </th><th style=\"text-align: center;\"> Hor ror </th><th style=\"text-align: center;\"> Mys tery </th><th style=\"text-align: center;\"> Rom ance </th><th style=\"text-align: center;\"> Sci -Fi </th><th style=\"text-align: center;\"> Thri ller </th></tr>\\n</thead>\\n<tbody>\\n<tr><td style=\"text-align: center;\"> 6874 </td><td style=\"text-align: center;\"> 2003 </td><td style=\"text-align: center;\"> 4.0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td></tr>\\n<tr><td style=\"text-align: center;\"> 8798 </td><td style=\"text-align: center;\"> 2004 </td><td style=\"text-align: center;\"> 3.8 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: 
center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td></tr>\\n<tr><td style=\"text-align: center;\"> 46970 </td><td style=\"text-align: center;\"> 2006 </td><td style=\"text-align: center;\"> 3.2 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td></tr>\\n<tr><td style=\"text-align: center;\"> 48516 </td><td style=\"text-align: center;\"> 2006 </td><td style=\"text-align: center;\"> 4.3 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td></tr>\\n<tr><td style=\"text-align: center;\"> 58559 </td><td style=\"text-align: center;\"> 2008 </td><td style=\"text-align: center;\"> 4.2 </td><td style=\"text-align: 
center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0 </td></tr>\\n</tbody>\\n</table>'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# First (5) examples/rows of x_m^(i) for MOVIES i = 0, 1, 2, 3, 4, including non-used to train features\n",
"pprint_train(item_train, item_features, ivs, i_s, maxcount=5, user=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Above, the movie array contains the year the film was released, the average rating and an indicator for each potential genre. The indicator is one for each genre that applies to the movie. The movie id is not used in training but is useful when interpreting the data."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"deletable": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"y_train / labels size: (50884,)\n",
"y_train[:5]: [4. 3.5 4. 4. 4.5]\n"
]
}
],
"source": [
"# y_train -> Some USERS j give 50884 average ratings \n",
"# to some MOVIES i.\n",
"print('y_train / labels size:',y_train.shape)\n",
"\n",
"# Print 1st (5) average ratings related to MOVIES i, from a USER j.\n",
"print(f\"y_train[:5]: {y_train[:5]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The target, y, is the movie rating given by the user. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Above, we can see that movie 6874 is an Action/Crime/Thriller movie released in 2003. \"[userid]\" = 2 rates thriller movies as 3.9 on average. MovieLens \"[movieid]\" = 6874 users, gave the movie an average rating \"ave rating\" of 4. 'y' = 'y_train' is 4 avergae rating at 1st position, indicating \"[userid]\" = 2 rated \"[movieid]\" = 6874 as a \"ave rating\" of 4 as well. A single training example consists of a row from both, the USER and ITEM arrays/vectors displayed as tables above, and an average rating from 'y_train'."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"3.2\"></a>\n",
"### 3.2 Preparing the training data\n",
"Recall in Course 1, Week 2, you explored feature scaling as a means of improving convergence. We'll scale the input features using the [scikit learn StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) -> scaled x = (x - mu) / sigma. This was used in Course 1, Week 2, Lab 5. Below, the inverse_transform is also shown to produce the original inputs. We'll scale the target ratings using a MinMaxScaler((-1, 1)) which scales the target to be between -1 and 1. [scikit learn MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html)"
]
},
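   {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal, self-contained sketch (toy values, not this lab's arrays) of the scale-and-invert round trip described above:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.preprocessing import StandardScaler, MinMaxScaler\n",
    "\n",
    "X = np.array([[2000, 3.0], [2010, 4.0], [2020, 5.0]])  # toy (year, rating) rows\n",
    "scaler = StandardScaler().fit(X)             # learn mu and sigma per column\n",
    "X_scaled = scaler.transform(X)               # (x - mu) / sigma\n",
    "X_back = scaler.inverse_transform(X_scaled)  # recover the original inputs\n",
    "print(np.allclose(X, X_back))                # True\n",
    "\n",
    "y = np.array([0.5, 2.5, 5.0]).reshape(-1, 1)\n",
    "y_scaled = MinMaxScaler((-1, 1)).fit_transform(y)  # targets now span [-1, 1]\n",
    "print(y_scaled.min(), y_scaled.max())        # -1.0 1.0\n",
    "```"
   ]
  },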
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"deletable": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Scaled item_train = x_m^(i) 2D array with size: (50884, 17)\n",
"Scaled user_train = x_u^(j) 2D array with size: (50884, 17)\n",
"Scaled y_train = y 2D array with size: (50884, 1)\n",
"True\n",
"True\n"
]
}
],
"source": [
"# Store UNSCALED 2D arrays of features x_m^(i) and x_u^(j), \n",
"# and also store UNSCALED 1D array with average ratings / labels \n",
"# of some USERS j to some MOVIES i.\n",
"item_train_unscaled = item_train\n",
"user_train_unscaled = user_train\n",
"y_train_unscaled = y_train\n",
"\n",
"# https://www.youtube.com/watch?v=sxEqtjLC0aM\n",
"# SCALING is performed to get ALL the features, contributing equally to the model,\n",
"# preventing that features with larger values, dominate the model.\n",
"\n",
"# Create (obj) to scale MOVIES training data -> scaled x = (x - mu) / sigma\n",
"# Scaling, we'll center the distribution of MOVIES data at mu = 0, \n",
"# keeping a standard deviation of sigma = 1.\n",
"scalerItem = StandardScaler()\n",
"\n",
"# Compute the mean = mu of USERS data and std = sigma of USERS data, \n",
"# to be used for later scaling. \n",
"scalerItem.fit(item_train)\n",
"\n",
"# Perform standardization / scaling, by centering (mu = 0) \n",
"# and scaling (sigma = 1) the MOVIES training data.\n",
"item_train = scalerItem.transform(item_train)\n",
"print('Scaled item_train = x_m^(i) 2D array with size:',item_train.shape)\n",
"\n",
"# Create (obj) to scale USERS training data -> scaled x = (x - mu) / sigma\n",
"# Scaling, we'll center the distribution of USERS data at mu = 0, \n",
"# keeping a standard deviation of sigma = 1.\n",
"scalerUser = StandardScaler()\n",
"\n",
"# Compute the mean = mu of USERS data and std = sigma of USERS data, \n",
"# to be used for later scaling. \n",
"scalerUser.fit(user_train)\n",
"\n",
"# Perform standardization / scaling, by centering (mu = 0) \n",
"# and scaling (sigma = 1) the USERS training data.\n",
"user_train = scalerUser.transform(user_train)\n",
"print('Scaled user_train = x_u^(j) 2D array with size:',user_train.shape)\n",
"\n",
"# y_new = (y - min_y) / (max_y - min_y) -> MinMax normalization\n",
"# by default, scales the data in a feature_range=(0, 1) \n",
"# between [0 -> 1], but when doing feature_range=(-1, 1), it will ensure that \n",
"# 'y_train' values will be between [-1 -> 1] for training.\n",
"scalerTarget = MinMaxScaler((-1, 1))\n",
"\n",
"# y_train.reshape(-1, 1) -> ALL elements (-1,) in 1D array 'y_train', are put\n",
"# in just (,1) column, as a 2D array of [50884 rows x 1 col]\n",
"# Then, Compute the minimum 'y_min' and maximum 'y_max',\n",
"# to be used for later scaling.\n",
"scalerTarget.fit(y_train.reshape(-1, 1))\n",
"\n",
"# y_train.reshape(-1, 1) -> ALL elements (-1,) in 1D array 'y_train', are put\n",
"# in just (,1) column, as a 2D array of [50884 rows x 1 col]\n",
"# Scale values of 'y_train' according to feature_range=(-1, 1)\n",
"y_train = scalerTarget.transform(y_train.reshape(-1, 1))\n",
"#ynorm_test = scalerTarget.transform(y_test.reshape(-1, 1))\n",
"print('Scaled y_train = y 2D array with size:', y_train.shape)\n",
"\n",
"# 'scalerItem.inverse_transform(item_train)' and 'scalerUser.inverse_transform(user_train)',\n",
"# Scale BACK the USERS and MOVIES data, to the ORIGINAL REPRESENTATION.\n",
"\n",
"# Then 'np.allclose(A, B)' checks whether two 2D arrays, A and B, are equal element-by-element,\n",
"# within a specified tolerance / positive float value, to be comparable, so\n",
"# If a small number is inside a large value -> True)\n",
"# The function returns 'True' if all elements are within that tolerance / positive float value \n",
"# and 'False' otherwise. \n",
"# if the following equation is element-wise True, then allclose returns True:\n",
"# absolute(a - b) <= (atol + rtol * absolute(b))\n",
"\n",
"# 'np.allclose(A, B, rtol=1e-05, atol=1e-08)'\n",
"# np.allclose([1e10,1e-8], [1.00001e10,1e-9]) -> True\n",
"# np.allclose([1e10,1e-8], [1.0001e10,1e-9]) -> False\n",
"\n",
"# In BOTH cases, 'originals' and 'scaled back' 2D arrays are EQUAL, so function\n",
"# 'np.allclose(original, scaled back)' returns 'True'.\n",
"print(np.allclose(item_train_unscaled, scalerItem.inverse_transform(item_train)))\n",
"print(np.allclose(user_train_unscaled, scalerUser.inverse_transform(user_train)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To allow us to evaluate the results, we will split the SCALED data into training and test sets as was discussed in Course 2, Week 3. Here we will use [sklean train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) to split and shuffle the SCALED data. Note that setting the initial random state to the same value ensures ITEM, USER, and y are SHUFFLED IDENTICALLY, so ORDER RELATIONSHIP between rows/vectors per each of them (ITEM, USER, and y) CONTINUES."
]
},
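  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small sketch (toy arrays, not this lab's data) of why a shared random_state keeps paired arrays aligned across separate train_test_split calls:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "a = np.arange(10)\n",
    "b = a * 100  # paired with 'a', row by row\n",
    "\n",
    "# The same random_state produces the identical shuffle for both calls,\n",
    "# so the row-wise pairing between 'a' and 'b' survives the split.\n",
    "a_tr, a_te = train_test_split(a, train_size=0.80, shuffle=True, random_state=1)\n",
    "b_tr, b_te = train_test_split(b, train_size=0.80, shuffle=True, random_state=1)\n",
    "print(np.array_equal(a_tr * 100, b_tr))  # True\n",
    "```"
   ]
  },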
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"deletable": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MOVIE/ITEM training data 80% shape: (40707, 17)\n",
"MOVIE/ITEM test data 20% shape: (10177, 17)\n",
"USER training data 80% shape: (40707, 17)\n",
"USER test data 20% shape: (10177, 17)\n",
"y_train training data 80% shape: (40707, 1)\n",
"y_test test data 20% shape: (10177, 1)\n"
]
}
],
"source": [
"# Get shuffled 80% of item_train data as TRAIN set -> 'x_m^(i)_train' \n",
"# and shuffled 20% of item_train data as TEST set -> 'x_m^(i)_test' \n",
"item_train, item_test = train_test_split(item_train, train_size=0.80, shuffle=True, random_state=1)\n",
"\n",
"# Get shuffled 80% of user_train data as TRAIN set -> 'x_u^(j)_train' \n",
"# and shuffled 20% of user_train data as TEST set -> 'x_u^(j)_test'\n",
"user_train, user_test = train_test_split(user_train, train_size=0.80, shuffle=True, random_state=1)\n",
"\n",
"# Get shuffled 80% of y_train data as TRAIN set -> 'y_train' \n",
"# and shuffled 20% of y_train data as TEST set -> 'y_test'\n",
"y_train, y_test = train_test_split(y_train, train_size=0.80, shuffle=True, random_state=1)\n",
"\n",
"# MOVIES TRAIN SET 'x_m^(i)_train' size is 0.8 * (50884, 17) = (40707, 17) \n",
"print(f\"MOVIE/ITEM training data 80% shape: {item_train.shape}\")\n",
"\n",
"# MOVIES TEST SET 'x_m^(i)_test' size is 0.2 * (50884, 17) = (10177, 17)\n",
"print(f\"MOVIE/ITEM test data 20% shape: {item_test.shape}\")\n",
"\n",
"# USERS TRAIN SET 'x_u^(j)_train' size is 0.8 * (50884, 17) = (40707, 17) \n",
"print(f\"USER training data 80% shape: {user_train.shape}\")\n",
"\n",
"# USERS TEST SET 'x_m^(i)_test' size is 0.2 * (50884, 17) = (10177, 17)\n",
"print(f\"USER test data 20% shape: {user_test.shape}\") \n",
"\n",
"# TRAIN SET 'y_train' size is 0.8 * (50884, 1) = (40707, 1)\n",
"print(f\"y_train training data 80% shape: {y_train.shape}\")\n",
"\n",
"# TEST SET 'y_test' size is 0.2 * (50884, 1) = (10177, 1)\n",
"print(f\"y_test test data 20% shape: {y_test.shape}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The scaled, shuffled data now has a mean of zero, so the distribution is centered a mu = 0."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"deletable": false
},
"outputs": [
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th style=\"text-align: center;\"> [user id] </th><th style=\"text-align: center;\"> [rating count] </th><th style=\"text-align: center;\"> [rating ave] </th><th style=\"text-align: center;\"> Act ion </th><th style=\"text-align: center;\"> Adve nture </th><th style=\"text-align: center;\"> Anim ation </th><th style=\"text-align: center;\"> Chil dren </th><th style=\"text-align: center;\"> Com edy </th><th style=\"text-align: center;\"> Crime </th><th style=\"text-align: center;\"> Docum entary </th><th style=\"text-align: center;\"> Drama </th><th style=\"text-align: center;\"> Fan tasy </th><th style=\"text-align: center;\"> Hor ror </th><th style=\"text-align: center;\"> Mys tery </th><th style=\"text-align: center;\"> Rom ance </th><th style=\"text-align: center;\"> Sci -Fi </th><th style=\"text-align: center;\"> Thri ller </th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> -1.0 </td><td style=\"text-align: center;\"> -0.8 </td><td style=\"text-align: center;\"> -0.7 </td><td style=\"text-align: center;\"> 0.1 </td><td style=\"text-align: center;\"> -0.0 </td><td style=\"text-align: center;\"> -1.2 </td><td style=\"text-align: center;\"> -0.4 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> -0.5 </td><td style=\"text-align: center;\"> -0.5 </td><td style=\"text-align: center;\"> -0.1 </td><td style=\"text-align: center;\"> -0.6 </td><td style=\"text-align: center;\"> -0.6 </td><td style=\"text-align: center;\"> -0.7 </td><td style=\"text-align: center;\"> -0.7 </td></tr>\n",
"<tr><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> -0.7 </td><td style=\"text-align: center;\"> -0.5 </td><td style=\"text-align: center;\"> -0.7 </td><td style=\"text-align: center;\"> -0.1 </td><td style=\"text-align: center;\"> -0.2 </td><td style=\"text-align: center;\"> -0.6 </td><td style=\"text-align: center;\"> -0.2 </td><td style=\"text-align: center;\"> 0.7 </td><td style=\"text-align: center;\"> -0.5 </td><td style=\"text-align: center;\"> -0.8 </td><td style=\"text-align: center;\"> 0.1 </td><td style=\"text-align: center;\"> -0.0 </td><td style=\"text-align: center;\"> -0.6 </td><td style=\"text-align: center;\"> -0.5 </td><td style=\"text-align: center;\"> -0.4 </td></tr>\n",
"<tr><td style=\"text-align: center;\"> -1 </td><td style=\"text-align: center;\"> -1 </td><td style=\"text-align: center;\"> -0.2 </td><td style=\"text-align: center;\"> 0.3 </td><td style=\"text-align: center;\"> -0.4 </td><td style=\"text-align: center;\"> 0.4 </td><td style=\"text-align: center;\"> 0.5 </td><td style=\"text-align: center;\"> 1.0 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> -1.2 </td><td style=\"text-align: center;\"> -0.3 </td><td style=\"text-align: center;\"> -0.6 </td><td style=\"text-align: center;\"> -2.3 </td><td style=\"text-align: center;\"> -0.1 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.4 </td><td style=\"text-align: center;\"> -0.0 </td></tr>\n",
"<tr><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> -1 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> 0.5 </td><td style=\"text-align: center;\"> 0.5 </td><td style=\"text-align: center;\"> 0.2 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> -0.1 </td><td style=\"text-align: center;\"> 0.5 </td><td style=\"text-align: center;\"> -1.2 </td><td style=\"text-align: center;\"> 0.9 </td><td style=\"text-align: center;\"> 1.2 </td><td style=\"text-align: center;\"> -2.3 </td><td style=\"text-align: center;\"> -0.1 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.2 </td><td style=\"text-align: center;\"> 0.3 </td></tr>\n",
"<tr><td style=\"text-align: center;\"> -1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0.7 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> 0.5 </td><td style=\"text-align: center;\"> 0.3 </td><td style=\"text-align: center;\"> 0.5 </td><td style=\"text-align: center;\"> 0.4 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> 1.0 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> 0.3 </td><td style=\"text-align: center;\"> 0.8 </td><td style=\"text-align: center;\"> 0.8 </td><td style=\"text-align: center;\"> 0.4 </td><td style=\"text-align: center;\"> 0.7 </td><td style=\"text-align: center;\"> 0.7 </td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
"'<table>\\n<thead>\\n<tr><th style=\"text-align: center;\"> [user id] </th><th style=\"text-align: center;\"> [rating count] </th><th style=\"text-align: center;\"> [rating ave] </th><th style=\"text-align: center;\"> Act ion </th><th style=\"text-align: center;\"> Adve nture </th><th style=\"text-align: center;\"> Anim ation </th><th style=\"text-align: center;\"> Chil dren </th><th style=\"text-align: center;\"> Com edy </th><th style=\"text-align: center;\"> Crime </th><th style=\"text-align: center;\"> Docum entary </th><th style=\"text-align: center;\"> Drama </th><th style=\"text-align: center;\"> Fan tasy </th><th style=\"text-align: center;\"> Hor ror </th><th style=\"text-align: center;\"> Mys tery </th><th style=\"text-align: center;\"> Rom ance </th><th style=\"text-align: center;\"> Sci -Fi </th><th style=\"text-align: center;\"> Thri ller </th></tr>\\n</thead>\\n<tbody>\\n<tr><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> -1.0 </td><td style=\"text-align: center;\"> -0.8 </td><td style=\"text-align: center;\"> -0.7 </td><td style=\"text-align: center;\"> 0.1 </td><td style=\"text-align: center;\"> -0.0 </td><td style=\"text-align: center;\"> -1.2 </td><td style=\"text-align: center;\"> -0.4 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> -0.5 </td><td style=\"text-align: center;\"> -0.5 </td><td style=\"text-align: center;\"> -0.1 </td><td style=\"text-align: center;\"> -0.6 </td><td style=\"text-align: center;\"> -0.6 </td><td style=\"text-align: center;\"> -0.7 </td><td style=\"text-align: center;\"> -0.7 </td></tr>\\n<tr><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 1 </td><td style=\"text-align: center;\"> -0.7 </td><td style=\"text-align: center;\"> -0.5 </td><td style=\"text-align: center;\"> -0.7 </td><td style=\"text-align: center;\"> -0.1 </td><td style=\"text-align: center;\"> -0.2 </td><td 
style=\"text-align: center;\"> -0.6 </td><td style=\"text-align: center;\"> -0.2 </td><td style=\"text-align: center;\"> 0.7 </td><td style=\"text-align: center;\"> -0.5 </td><td style=\"text-align: center;\"> -0.8 </td><td style=\"text-align: center;\"> 0.1 </td><td style=\"text-align: center;\"> -0.0 </td><td style=\"text-align: center;\"> -0.6 </td><td style=\"text-align: center;\"> -0.5 </td><td style=\"text-align: center;\"> -0.4 </td></tr>\\n<tr><td style=\"text-align: center;\"> -1 </td><td style=\"text-align: center;\"> -1 </td><td style=\"text-align: center;\"> -0.2 </td><td style=\"text-align: center;\"> 0.3 </td><td style=\"text-align: center;\"> -0.4 </td><td style=\"text-align: center;\"> 0.4 </td><td style=\"text-align: center;\"> 0.5 </td><td style=\"text-align: center;\"> 1.0 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> -1.2 </td><td style=\"text-align: center;\"> -0.3 </td><td style=\"text-align: center;\"> -0.6 </td><td style=\"text-align: center;\"> -2.3 </td><td style=\"text-align: center;\"> -0.1 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.4 </td><td style=\"text-align: center;\"> -0.0 </td></tr>\\n<tr><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> -1 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> 0.5 </td><td style=\"text-align: center;\"> 0.5 </td><td style=\"text-align: center;\"> 0.2 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> -0.1 </td><td style=\"text-align: center;\"> 0.5 </td><td style=\"text-align: center;\"> -1.2 </td><td style=\"text-align: center;\"> 0.9 </td><td style=\"text-align: center;\"> 1.2 </td><td style=\"text-align: center;\"> -2.3 </td><td style=\"text-align: center;\"> -0.1 </td><td style=\"text-align: center;\"> 0.0 </td><td style=\"text-align: center;\"> 0.2 </td><td style=\"text-align: center;\"> 0.3 </td></tr>\\n<tr><td 
style=\"text-align: center;\"> -1 </td><td style=\"text-align: center;\"> 0 </td><td style=\"text-align: center;\"> 0.7 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> 0.5 </td><td style=\"text-align: center;\"> 0.3 </td><td style=\"text-align: center;\"> 0.5 </td><td style=\"text-align: center;\"> 0.4 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> 1.0 </td><td style=\"text-align: center;\"> 0.6 </td><td style=\"text-align: center;\"> 0.3 </td><td style=\"text-align: center;\"> 0.8 </td><td style=\"text-align: center;\"> 0.8 </td><td style=\"text-align: center;\"> 0.4 </td><td style=\"text-align: center;\"> 0.7 </td><td style=\"text-align: center;\"> 0.7 </td></tr>\\n</tbody>\\n</table>'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# First (5) examples/rows of x_u^(j) SCALED for USER j=2, including non-used to train features\n",
"pprint_train(user_train, user_features, uvs, u_s, maxcount=5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"4\"></a>\n",
"## 4 - Neural Network for content-based filtering\n",
"Now, let's construct a neural network as described in the figure above. It will have two networks that are combined by a dot product. You will construct the two networks. In this example, they will be identical. Note that these networks do not need to be the same. If the user content was substantially larger than the movie content, you might elect to increase the complexity of the user network relative to the movie network. In this case, the content is similar, so the networks are the same.\n",
"\n",
"<a name=\"ex01\"></a>\n",
"### Exercise 1\n",
"\n",
"- Use a Keras sequential model\n",
" - The first layer is a dense layer with 256 units and a relu activation.\n",
" - The second layer is a dense layer with 128 units and a relu activation.\n",
" - The third layer is a dense layer with `num_outputs` units and a linear or no activation. \n",
" \n",
"The remainder of the network will be provided. The provided code does not use the Keras sequential model but instead uses the Keras [functional api](https://keras.io/guides/functional_api/). This format allows for more flexibility in how components are interconnected.\n"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"deletable": false,
"id": "CBjZ2HhRwpa0"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: \"model\"\n",
"__________________________________________________________________________________________________\n",
"Layer (type) Output Shape Param # Connected to \n",
"==================================================================================================\n",
"input_1 (InputLayer) [(None, 14)] 0 \n",
"__________________________________________________________________________________________________\n",
"input_2 (InputLayer) [(None, 16)] 0 \n",
"__________________________________________________________________________________________________\n",
"sequential (Sequential) (None, 32) 40864 input_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"sequential_1 (Sequential) (None, 32) 41376 input_2[0][0] \n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize/Square [(None, 32)] 0 sequential[0][0] \n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize_1/Squa [(None, 32)] 0 sequential_1[0][0] \n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize/Sum (T [(None, 1)] 0 tf_op_layer_l2_normalize/Square[0\n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize_1/Sum [(None, 1)] 0 tf_op_layer_l2_normalize_1/Square\n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize/Maximu [(None, 1)] 0 tf_op_layer_l2_normalize/Sum[0][0\n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize_1/Maxi [(None, 1)] 0 tf_op_layer_l2_normalize_1/Sum[0]\n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize/Rsqrt [(None, 1)] 0 tf_op_layer_l2_normalize/Maximum[\n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize_1/Rsqr [(None, 1)] 0 tf_op_layer_l2_normalize_1/Maximu\n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize (Tenso [(None, 32)] 0 sequential[0][0] \n",
" tf_op_layer_l2_normalize/Rsqrt[0]\n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize_1 (Ten [(None, 32)] 0 sequential_1[0][0] \n",
" tf_op_layer_l2_normalize_1/Rsqrt[\n",
"__________________________________________________________________________________________________\n",
"dot (Dot) (None, 1) 0 tf_op_layer_l2_normalize[0][0] \n",
" tf_op_layer_l2_normalize_1[0][0] \n",
"==================================================================================================\n",
"Total params: 82,240\n",
"Trainable params: 82,240\n",
"Non-trainable params: 0\n",
"__________________________________________________________________________________________________\n"
]
}
],
"source": [
"# EXERCISE 1\n",
"# GRADED_CELL\n",
"# UNQ_C1\n",
"\n",
"num_outputs = 32 # Ouput Layer Units\n",
"tf.random.set_seed(1) # Establish a seed to generate the same results, each time this cell is executed.\n",
"\n",
"# Create a tf.keras 'Sequential([Dense HL1, Dense HL2, Dense OL])' model \n",
"user_NN = tf.keras.models.Sequential([\n",
" \n",
" ### START CODE HERE ###\n",
" \n",
" tf.keras.layers.Dense(256, activation = 'relu'), # HL1 has 256 hidden units and 'relu' activation function\n",
" tf.keras.layers.Dense(128, activation = 'relu'), # HL1 has 128 hidden units and 'relu' activation function\n",
" tf.keras.layers.Dense(num_outputs, activation = 'linear') # OL has 32 output units and 'linear' activation function\n",
" \n",
" ### END CODE HERE ### \n",
" ])\n",
"\n",
"item_NN = tf.keras.models.Sequential([\n",
" \n",
" ### START CODE HERE ###\n",
" \n",
" tf.keras.layers.Dense(256, activation = 'relu'), # HL1 has 256 hidden units and 'relu' activation function\n",
" tf.keras.layers.Dense(128, activation = 'relu'), # HL1 has 128 hidden units and 'relu' activation function\n",
" tf.keras.layers.Dense(num_outputs, activation = 'linear') # OL has 32 output units and 'linear' activation function\n",
" \n",
" ### END CODE HERE ### \n",
"])\n",
"\n",
"# Extracts out ALL the USER j input features -> x_u^(j), so create the USER input (obj) of USER network, \n",
"# with shape = (num_user_features) = (14) cols/feat -> \n",
"# x_u^(j)=[x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14] features x rows/vectors/batch size\n",
"# tf.Keras -> input_shape=(cols, rows), so if we use just (cols,) = (cols) \n",
"# then tf.keras 'ADDs' automatically a (cols, None) which is later replaced by the batch size (vectors/rows)\n",
"input_user = tf.keras.layers.Input(shape=(num_user_features,))\n",
"\n",
"# FEEDS x_u^(j) to the USER Network 'user_NN', computing USER vector V_u^(j) with 32 elements.\n",
"vu = user_NN(input_user)\n",
"\n",
"# Each of 32 elements at V_u^(j) is divided by its magnitude or norm -> || V_u^(j) ||, so\n",
"# we end up with a UNITARY VECTOR V_u^(j) WITH LENGTH = ‘1’. \n",
"# This code, NORMALIZES to ‘1’ the LENGHT of vector V_u^(j).\n",
"# It turns out to make this algorithm WORK A BIT BETTER.\n",
"# Normalizes each element in row-> direction of the vu tensor (defined by axis=1) \n",
"# so that the L2 norm at row-> direction is equal to 1\n",
"vu = tf.linalg.l2_normalize(vu, axis=1)\n",
"\n",
"# Extracts out ALL the MOVIE i input features -> x_m^(i), so create the MOVIE input (obj) of MOVIE network, \n",
"# with shape = (num_item_features) = (16) cols/feat -> \n",
"# x_m^(i)=[x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16] features x rows/vectors/batch size\n",
"# tf.Keras -> input_shape=(cols, rows), so if we add just (cols,) = (cols) \n",
"# then tf.keras 'ADDs' automatically a (cols, None) which is later replaced by the batch size (vectors/rows)\n",
"input_item = tf.keras.layers.Input(shape=(num_item_features))\n",
"\n",
"# FEEDS x_m^(i) to the MOVIE Network 'item_NN', computing MOVIE vector V_m^(i) with 32 elements.\n",
"vm = item_NN(input_item)\n",
"\n",
"# Each of 32 elements at V_m^(j) is divided by its magnitude or norm -> || V_m^(i) ||, so\n",
"# we end up with a UNITARY VECTOR V_m^(i) WITH LENGTH = ‘1’. \n",
"# This code, NORMALIZES to ‘1’ the LENGHT of vector V_m^(i).\n",
"# It turns out to make this algorithm WORK A BIT BETTER.\n",
"# Normalizes each element in row-> direction of the vm tensor (defined by axis=1) \n",
"# so that the L2 norm at row-> direction is equal to 1\n",
"vm = tf.linalg.l2_normalize(vm, axis=1)\n",
"\n",
"# Compute the dot (.) product between these two (2) vectors with the same size (32 elements), computed above. \n",
"# output = y^(i, j) = V_u^(j) . V_m^(i) done through rows (->) axis, so (axes=1)\n",
"# vu = [vu1, vu2, ... , vu32] \n",
"# vm = [vm1, vm2, ... , vm32]\n",
"# y^ = vu . vm = (vu1 * vm1) + (vu2 * vm2) + ... + (vu32 * vm32) = scalar\n",
"output = tf.keras.layers.Dot(axes=1)([vu, vm])\n",
"\n",
"# Tell Keras what are the inputs = [ input_user x_u^(j) , input_item x_m^(i) ] and the output y^(i, j) of the model.\n",
"# Say that the overall model, is a model with inputs = [ input_user x_u^(j) , input_item x_m^(i) ] \n",
"# being the USER j features x_u^(j) and ITEM / MOVIE i features x_m^(i), and the output = y^(i, j).\n",
"model = tf.keras.Model([input_user, input_item], output)\n",
"\n",
"# Provides a concise and useful summary of the model, including:\n",
"# 'Name' and 'type' of each layer in the model, the 'output dimensions' of each layer, \n",
"# and the 'total number of trainable parameters'.\n",
"model.summary()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"deletable": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[92mAll tests passed!\n",
"\u001b[92mAll tests passed!\n"
]
}
],
"source": [
"# Import ALL (*) modules from public_tests DL.ai own library\n",
"from public_tests import *\n",
"\n",
"# TEST 'user_NN' model\n",
"test_tower(user_NN)\n",
"\n",
"# TEST 'item_NN' model\n",
"test_tower(item_NN)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary><font size=\"3\" color=\"darkgreen\"><b>Click for hints</b></font></summary>\n",
" \n",
" You can create a dense layer with a relu activation as shown.\n",
" \n",
"```python \n",
"user_NN = tf.keras.models.Sequential([\n",
" ### START CODE HERE ### \n",
" tf.keras.layers.Dense(256, activation='relu'),\n",
"\n",
" \n",
" ### END CODE HERE ### \n",
"])\n",
"\n",
"item_NN = tf.keras.models.Sequential([\n",
" ### START CODE HERE ### \n",
" tf.keras.layers.Dense(256, activation='relu'),\n",
"\n",
" \n",
" ### END CODE HERE ### \n",
"])\n",
"``` \n",
"<details>\n",
" <summary><font size=\"2\" color=\"darkblue\"><b> Click for solution</b></font></summary>\n",
" \n",
"```python \n",
"user_NN = tf.keras.models.Sequential([\n",
" ### START CODE HERE ### \n",
" tf.keras.layers.Dense(256, activation='relu'),\n",
" tf.keras.layers.Dense(128, activation='relu'),\n",
" tf.keras.layers.Dense(num_outputs),\n",
" ### END CODE HERE ### \n",
"])\n",
"\n",
"item_NN = tf.keras.models.Sequential([\n",
" ### START CODE HERE ### \n",
" tf.keras.layers.Dense(256, activation='relu'),\n",
" tf.keras.layers.Dense(128, activation='relu'),\n",
" tf.keras.layers.Dense(num_outputs),\n",
" ### END CODE HERE ### \n",
"])\n",
"```\n",
"</details>\n",
"</details>\n",
"\n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will use a Mean Squared Error (MSE) as loss and an 'Adam' OPTIMIZER."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"deletable": false,
"id": "pGK5MEUowxN4"
},
"outputs": [],
"source": [
"# Establish a seed to generate the same results, \n",
"# each time this cell is executed.\n",
"tf.random.set_seed(1)\n",
"\n",
"# Create loss function J = MSE = Σ (y^ - y)2 / m\n",
"cost_fn = tf.keras.losses.MeanSquaredError()\n",
"\n",
"# Create Adam (obj) optimizer with initial lr = 0.01. This\n",
"# adapts assigning an individual lr each parameter (in NNs),\n",
"# It modifies lr automatically during training, \n",
"# reaching a lower/best cost convergence, more quickly.\n",
"opt = keras.optimizers.Adam(learning_rate=0.01)\n",
"\n",
"# Perform GD minimizing COST J as MSE function,\n",
"# via 'Adam' optimizer.\n",
"model.compile(optimizer=opt,\n",
" loss=cost_fn)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"deletable": false,
"id": "6zHf7eASw0tN"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 40707 samples\n",
"Epoch 1/30\n",
"40707/40707 [==============================] - 5s 122us/sample - loss: 0.1232\n",
"Epoch 2/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.1146\n",
"Epoch 3/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.1089\n",
"Epoch 4/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.1039\n",
"Epoch 5/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.1001\n",
"Epoch 6/30\n",
"40707/40707 [==============================] - 5s 115us/sample - loss: 0.0973\n",
"Epoch 7/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.0956\n",
"Epoch 8/30\n",
"40707/40707 [==============================] - 5s 114us/sample - loss: 0.0935\n",
"Epoch 9/30\n",
"40707/40707 [==============================] - 5s 115us/sample - loss: 0.0916\n",
"Epoch 10/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.0897\n",
"Epoch 11/30\n",
"40707/40707 [==============================] - 5s 114us/sample - loss: 0.0880\n",
"Epoch 12/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.0865\n",
"Epoch 13/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.0852\n",
"Epoch 14/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.0839\n",
"Epoch 15/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.0830\n",
"Epoch 16/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.0815\n",
"Epoch 17/30\n",
"40707/40707 [==============================] - 5s 115us/sample - loss: 0.0807\n",
"Epoch 18/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.0796\n",
"Epoch 19/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.0786\n",
"Epoch 20/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.0776\n",
"Epoch 21/30\n",
"40707/40707 [==============================] - 5s 111us/sample - loss: 0.0769\n",
"Epoch 22/30\n",
"40707/40707 [==============================] - 5s 112us/sample - loss: 0.0761\n",
"Epoch 23/30\n",
"40707/40707 [==============================] - 5s 112us/sample - loss: 0.0755\n",
"Epoch 24/30\n",
"40707/40707 [==============================] - 5s 111us/sample - loss: 0.0746\n",
"Epoch 25/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.0741\n",
"Epoch 26/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.0733\n",
"Epoch 27/30\n",
"40707/40707 [==============================] - 5s 112us/sample - loss: 0.0728\n",
"Epoch 28/30\n",
"40707/40707 [==============================] - 5s 113us/sample - loss: 0.0723\n",
"Epoch 29/30\n",
"40707/40707 [==============================] - 5s 111us/sample - loss: 0.0717\n",
"Epoch 30/30\n",
"40707/40707 [==============================] - 5s 111us/sample - loss: 0.0713\n"
]
},
{
"data": {
"text/plain": [
"<tensorflow.python.keras.callbacks.History at 0x7450604a9dd0>"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Establish a seed to generate the same results, \n",
"# each time this cell is executed.\n",
"tf.random.set_seed(1)\n",
"\n",
"# TRAIN / LEARN /FIT the model, to determine 'loss' on the TRAIN data. \n",
"# Input arguments are a list of arrays. \n",
"# (in case the model has multiple inputs) -> x=[x_users,x_movies]\n",
"# x = [x_u^(j)[All rows, from feature 3], x_m^(i)[All rows, from feature 1]]\n",
"# 0,1,2 cols NO 0 col NO\n",
"\n",
"# An epoch is an iteration over the ENTIRE x and y data provided.\n",
"# 1 epoch -> 40707 rows/iters\n",
"# 30 epochs -> 40707 iters * 30 times = 1.221.210 iters\n",
"\n",
"# model.fit([x_users_selected_features, x_movies_selected_features], y_train, epochs=30)\n",
"model.fit([user_train[:, u_s:], item_train[:, i_s:]], y_train, epochs=30)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Evaluate the model to determine loss on the test data. "
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"deletable": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"10177/10177 [==============================] - 0s 37us/sample - loss: 0.0815\n"
]
},
{
"data": {
"text/plain": [
"0.08146006993124337"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# EVALUATE the model, to determine 'loss' on the TEST data. \n",
"# Input arguments are a list of arrays. \n",
"# (in case the model has multiple inputs) -> x=[x_users,x_movies]\n",
"# x = [x_u^(j)[All rows, from feature 3], x_m^(i)[All rows, from feature 1]]\n",
"# 0,1,2 cols NO 0 col NO \n",
"\n",
"# 1 epoch -> 10177 rows/iters * 1 time\n",
"# model.evaluate([x_users_selected_features, x_movies_selected_features], y_test)\n",
"model.evaluate([user_test[:, u_s:], item_test[:, i_s:]], y_test)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Loss = 0.0713 in TRAIN set, and Loss = 0.0814 in TEST set. It is comparable to the training loss indicating the model has NOT substantially OVERFIT (Loss in the TEST set is NOT MUCH MORE HIGHER than Loss in the TRAINING data)."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Xsre-gquwEls"
},
"source": [
"<a name=\"5\"></a>\n",
"## 5 - Predictions\n",
"Below, you'll use your model to make predictions in a number of circumstances. \n",
"<a name=\"5.1\"></a>\n",
"### 5.1 - Predictions for a new user\n",
"First, we'll create a new user and have the model suggest movies for that user. After you have tried this on the example user content, feel free to change the user content to match your own preferences and see what the model suggests. Note that ratings are between [0.5 and 5.0], inclusive, in half-step increments (+0.5)."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"deletable": false,
"id": "4_7nZyPiVJ4r"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NEW USER j 2D array of features = [[new_user_id,new_rating_count,new_rating_ave,new_action,\n",
"new_adventure,new_animation,new_childrens,new_comedy,new_crime,new_documentary,new_drama,\n",
"new_fantasy,new_horror,new_mystery,new_romance,new_scifi,new_thriller]] = \n",
"[[5.e+03 3.e+00 0.e+00 0.e+00 5.e+00 0.e+00 0.e+00 0.e+00 0.e+00 0.e+00\n",
" 0.e+00 5.e+00 0.e+00 0.e+00 0.e+00 0.e+00 0.e+00]] (1, 17)\n"
]
}
],
"source": [
"new_user_id = 5000 # x1 feature / col = new_user_id = 5000 <-\n",
"new_rating_count = 3 # x2 feature / col = new_rating_count = 3 <-\n",
"new_rating_ave = 0.0 # x3 feature / col = new_rating_ave = 0.0\n",
"new_action = 0.0 # x4 feature / col = new_action = 0.0\n",
"new_adventure = 5.0 # x5 feature / col = new_adventure = 5.0 <-\n",
"new_animation = 0.0 # x6 feature / col = new_childrens = 0.0\n",
"new_childrens = 0.0 # x7 feature / col = new_childrens = 0.0\n",
"new_comedy = 0.0 # x8 feature / col = new_comedy = 0.0\n",
"new_crime = 0.0 # x9 feature / col = new_crime = 0.0\n",
"new_documentary = 0.0 # x10 feature / col = new_documentary = 0.0\n",
"new_drama = 0.0 # x11 feature / col = new_drama = 0.0\n",
"new_fantasy = 5.0 # x12 feature / col = new_fantasy = 5.0 <-\n",
"new_horror = 0.0 # x13 feature / col = new_horror = 0.0\n",
"new_mystery = 0.0 # x14 feature / col = new_mystery = 0.0\n",
"new_romance = 0.0 # x15 feature / col = new_romance = 0.0\n",
"new_scifi = 0.0 # x16 feature / col = new_scifi = 0.0\n",
"new_thriller = 0.0 # x17 feature / col = new_thriller = 0.0\n",
"\n",
"# Create 2D array/vector for NEW USER j features\n",
"# x_u^(new j) = [x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17]\n",
"user_vec = np.array([[new_user_id, new_rating_count, new_rating_ave,\n",
" new_action, new_adventure, new_animation, new_childrens,\n",
" new_comedy, new_crime, new_documentary,\n",
" new_drama, new_fantasy, new_horror, new_mystery,\n",
" new_romance, new_scifi, new_thriller]])\n",
"\n",
"print('NEW USER j 2D array of features = [[new_user_id,new_rating_count,new_rating_ave,new_action,')\n",
"print('new_adventure,new_animation,new_childrens,new_comedy,new_crime,new_documentary,new_drama,')\n",
"print('new_fantasy,new_horror,new_mystery,new_romance,new_scifi,new_thriller]] = ')\n",
"print(user_vec, user_vec.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The new user enjoys movies from the adventure, fantasy genres. Let's find the top-rated movies for the new user. \n",
"Below, we'll use a set of movie/item vectors, `item_vecs` that have a vector for each movie in the training/test set. This is matched with the new user vector above and the scaled vectors are used to predict ratings for all the movies."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"deletable": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"NEW USER [vector] is repeated * 847 times -> (2D array)\n",
" [[5.e+03 3.e+00 0.e+00 ... 0.e+00 0.e+00 0.e+00]\n",
" [5.e+03 3.e+00 0.e+00 ... 0.e+00 0.e+00 0.e+00]\n",
" [5.e+03 3.e+00 0.e+00 ... 0.e+00 0.e+00 0.e+00]\n",
" ...\n",
" [5.e+03 3.e+00 0.e+00 ... 0.e+00 0.e+00 0.e+00]\n",
" [5.e+03 3.e+00 0.e+00 ... 0.e+00 0.e+00 0.e+00]\n",
" [5.e+03 3.e+00 0.e+00 ... 0.e+00 0.e+00 0.e+00]] (847, 17)\n",
"\n",
"847 MOVIES data set for TRAIN or TEST (2D array)\n",
" [[4.05400000e+03 2.00100000e+03 2.84375000e+00 ... 1.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [4.06900000e+03 2.00100000e+03 2.90909091e+00 ... 1.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [4.14800000e+03 2.00100000e+03 2.93589744e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 1.00000000e+00]\n",
" ...\n",
" [1.77765000e+05 2.01700000e+03 3.53846154e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [1.79819000e+05 2.01700000e+03 3.12500000e+00 ... 0.00000000e+00\n",
" 1.00000000e+00 0.00000000e+00]\n",
" [1.87593000e+05 2.01800000e+03 3.87500000e+00 ... 0.00000000e+00\n",
" 1.00000000e+00 0.00000000e+00]] (847, 17)\n",
"\n",
"SCALED user_vecs 2D array \n",
" [[26.3020864 -1.2142439 -7.48604051 ... -4.86517161 -5.00476899\n",
" -5.67635045]\n",
" [26.3020864 -1.2142439 -7.48604051 ... -4.86517161 -5.00476899\n",
" -5.67635045]\n",
" [26.3020864 -1.2142439 -7.48604051 ... -4.86517161 -5.00476899\n",
" -5.67635045]\n",
" ...\n",
" [26.3020864 -1.2142439 -7.48604051 ... -4.86517161 -5.00476899\n",
" -5.67635045]\n",
" [26.3020864 -1.2142439 -7.48604051 ... -4.86517161 -5.00476899\n",
" -5.67635045]\n",
" [26.3020864 -1.2142439 -7.48604051 ... -4.86517161 -5.00476899\n",
" -5.67635045]] (847, 17)\n",
"\n",
"SCALED item_vecs 2D array \n",
" [[-0.9488947 -1.19927702 -1.98317861 ... 2.50507064 -0.50822265\n",
" -0.65976722]\n",
" [-0.94849765 -1.19927702 -1.81511436 ... 2.50507064 -0.50822265\n",
" -0.65976722]\n",
" [-0.94640649 -1.19927702 -1.74616492 ... -0.39919034 -0.50822265\n",
" 1.51568608]\n",
" ...\n",
" [ 3.64928572 2.816088 -0.19630151 ... -0.39919034 -0.50822265\n",
" -0.65976722]\n",
" [ 3.7036557 2.816088 -1.25977162 ... -0.39919034 1.96764154\n",
" -0.65976722]\n",
" [ 3.90943572 3.06704831 0.6693137 ... -0.39919034 1.96764154\n",
" -0.65976722]] (847, 17)\n",
"\n",
"SCALED y^ predictions\n",
" [[-0.23707277]\n",
" [-0.04912706]\n",
" [-0.06851547]] ... (847, 1)\n",
"\n",
"UNSCALED y^ predictions\n",
" [[2.2165864]\n",
" [2.639464 ]\n",
" [2.5958402]] ... (847, 1)\n",
"\n",
"min: 1.3867546 \n",
"max: 4.459245\n",
"\n",
"Indices of UNSCALED predictions y^, sorted in DESCENDING order:\n",
" [717, 254, 467, 367, 749, 636, 521, 142, 133, 470, 307, 68, 809, 822, 126, 181, 222, 166, 583, 507, 838, 798, 816, 652, 659, 56, 474, 429, 793, 797, 584, 606, 341, 664, 522, 399, 242, 371, 625, 615, 703, 558, 224, 778, 571, 840, 674, 804, 519, 788, 765, 803, 761, 716, 805, 726, 800, 160, 630, 830, 807, 264, 748, 802, 506, 691, 731, 63, 303, 841, 569, 815, 289, 445, 782, 458, 666, 442, 685, 811, 514, 799, 106, 76, 312, 562, 84, 191, 122, 497, 760, 769, 836, 476, 528, 818, 746, 827, 17, 425, 739, 533, 772, 195, 801, 209, 58, 611, 619, 575, 568, 736, 167, 53, 677, 610, 786, 255, 781, 843, 197, 655, 488, 419, 426, 708, 846, 718, 629, 171, 795, 137, 215, 486, 516, 729, 730, 667, 845, 600, 217, 357, 469, 359, 578, 400, 808, 586, 758, 695, 780, 375, 302, 164, 205, 322, 754, 694, 712, 643, 383, 745, 473, 103, 279, 498, 72, 320, 551, 680, 721, 706, 502, 828, 592, 550, 775, 210, 176, 285, 559, 71, 637, 690, 641, 384, 770, 250, 129, 283, 829, 379, 594, 494, 272, 660, 380, 481, 259, 839, 108, 417, 203, 239, 742, 777, 538, 747, 612, 491, 774, 410, 49, 128, 246, 415, 151, 267, 783, 287, 705, 247, 825, 243, 671, 673, 740, 333, 138, 487, 638, 163, 100, 101, 627, 462, 601, 329, 268, 675, 446, 639, 80, 423, 219, 356, 62, 238, 552, 553, 589, 485, 844, 598, 654, 345, 618, 681, 149, 744, 785, 394, 334, 409, 688, 200, 817, 806, 192, 253, 542, 621, 146, 127, 290, 732, 234, 332, 55, 412, 837, 791, 454, 508, 330, 713, 789, 12, 358, 484, 684, 599, 428, 541, 461, 722, 548, 672, 416, 707, 232, 305, 436, 168, 544, 687, 402, 537, 170, 83, 344, 278, 699, 245, 495, 361, 269, 702, 700, 372, 373, 318, 87, 821, 218, 102, 143, 216, 472, 734, 77, 832, 438, 78, 693, 751, 309, 581, 585, 44, 656, 16, 505, 382, 260, 640, 698, 724, 683, 427, 251, 824, 157, 120, 155, 339, 135, 310, 814, 573, 753, 764, 297, 66, 236, 23, 779, 493, 784, 653, 513, 196, 743, 596, 397, 366, 741, 766, 199, 11, 812, 90, 273, 582, 771, 389, 327, 422, 546, 479, 536, 450, 193, 95, 147, 69, 554, 390, 362, 70, 790, 762, 678, 288, 376, 
750, 180, 337, 348, 530, 725, 424, 404, 757, 834, 125, 831, 296, 281, 336, 274, 331, 19, 709, 43, 208, 475, 787, 378, 152, 67, 186, 564, 292, 720, 300, 52, 523, 97, 32, 202, 145, 201, 31, 679, 45, 231, 328, 313, 520, 30, 483, 642, 306, 710, 326, 172, 527, 737, 1, 187, 556, 534, 154, 566, 282, 248, 230, 123, 10, 496, 15, 38, 354, 110, 96, 112, 468, 411, 204, 119, 104, 174, 512, 617, 405, 244, 418, 212, 2, 169, 842, 453, 51, 5, 595, 756, 81, 759, 388, 480, 704, 323, 3, 466, 363, 340, 392, 13, 563, 228, 263, 456, 256, 431, 634, 321, 308, 132, 347, 50, 489, 258, 60, 319, 88, 477, 183, 577, 588, 355, 211, 796, 452, 509, 613, 301, 835, 86, 241, 73, 525, 432, 85, 111, 61, 826, 398, 299, 545, 692, 9, 82, 833, 591, 351, 35, 669, 57, 650, 714, 207, 490, 153, 421, 158, 131, 93, 42, 603, 820, 609, 437, 189, 368, 276, 381, 190, 628, 433, 24, 188, 738, 349, 413, 492, 304, 543, 39, 440, 275, 447, 46, 225, 41, 33, 324, 26, 614, 633, 733, 342, 252, 117, 570, 604, 286, 316, 113, 161, 144, 576, 459, 792, 315, 261, 34, 139, 767, 291, 36, 130, 715, 115, 471, 752, 121, 408, 59, 374, 185, 338, 173, 8, 646, 150, 7, 298, 249, 156, 608, 407, 532, 620, 464, 21, 293, 295, 696, 140, 518, 504, 94, 435, 547, 233, 501, 4, 177, 386, 455, 114, 414, 266, 22, 385, 574, 265, 503, 823, 723, 364, 406, 622, 572, 387, 227, 277, 346, 448, 148, 794, 605, 240, 515, 109, 510, 37, 353, 6, 511, 220, 271, 451, 92, 635, 178, 213, 526, 555, 420, 819, 0, 593, 270, 182, 403, 449, 631, 531, 676, 623, 443, 727, 99, 535, 47, 517, 647, 649, 352, 524, 662, 134, 661, 116, 311, 567, 393, 284, 444, 165, 48, 529, 735, 597, 141, 54, 237, 478, 29, 460, 560, 540, 91, 206, 65, 162, 194, 632, 776, 587, 401, 658, 294, 314, 499, 579, 370, 75, 159, 391, 74, 235, 317, 670, 369, 434, 223, 396, 226, 561, 500, 360, 682, 580, 365, 644, 64, 184, 539, 335, 105, 665, 350, 463, 626, 711, 18, 107, 221, 262, 728, 118, 686, 607, 40, 624, 465, 697, 136, 719, 663, 79, 14, 395, 668, 657, 689, 214, 482, 557, 602, 441, 565, 280, 701, 179, 25, 98, 
651, 325, 590, 645, 20, 377, 175, 457, 124, 755, 89, 229, 343, 768, 549, 198, 439, 648, 430, 763, 810, 257, 616, 813, 773, 27, 28]\n",
"\n",
"UNSCALED predictions y^, sorted in DESCENDING way:\n",
" [[4.459245 ]\n",
" [4.386574 ]\n",
" [4.356962 ]\n",
" [4.3426695]\n",
" [4.317039 ]\n",
" [4.3073645]\n",
" [4.2947803]\n",
" [4.2924314]\n",
" [4.275124 ]\n",
" [4.2571397]\n",
" [4.230271 ]\n",
" [4.1942124]\n",
" [4.188543 ]\n",
" [4.175948 ]\n",
" [4.170507 ]\n",
" [4.1418047]\n",
" [4.0994697]\n",
" [4.0632615]\n",
" [4.027624 ]\n",
" [4.0164332]\n",
" [4.016054 ]\n",
" [4.011395 ]\n",
" [4.004601 ]\n",
" [3.9898522]\n",
" [3.9830139]\n",
" [3.9795961]\n",
" [3.966485 ]\n",
" [3.9224193]\n",
" [3.8783858]\n",
" [3.8728654]\n",
" [3.8682032]\n",
" [3.862696 ]\n",
" [3.858278 ]\n",
" [3.8417935]\n",
" [3.8351555]\n",
" [3.8336205]\n",
" [3.8277843]\n",
" [3.822945 ]\n",
" [3.8089836]\n",
" [3.7802577]\n",
" [3.7743752]\n",
" [3.7605236]\n",
" [3.757972 ]\n",
" [3.7539535]\n",
" [3.7464256]\n",
" [3.7420485]\n",
" [3.7341976]\n",
" [3.7309663]\n",
" [3.7300212]\n",
" [3.7193718]\n",
" [3.715157 ]\n",
" [3.7145047]\n",
" [3.712654 ]\n",
" [3.710556 ]\n",
" [3.7024705]\n",
" [3.7008436]\n",
" [3.6914954]\n",
" [3.6834488]\n",
" [3.681274 ]\n",
" [3.6790495]\n",
" [3.675489 ]\n",
" [3.6681402]\n",
" [3.656627 ]\n",
" [3.6542082]\n",
" [3.646564 ]\n",
" [3.6444435]\n",
" [3.6262605]\n",
" [3.6232762]\n",
" [3.622971 ]\n",
" [3.6115336]\n",
" [3.6092427]\n",
" [3.6058416]\n",
" [3.6057715]\n",
" [3.6020522]\n",
" [3.589968 ]\n",
" [3.5831838]\n",
" [3.5829515]\n",
" [3.578299 ]\n",
" [3.5778322]\n",
" [3.5687437]\n",
" [3.5542655]\n",
" [3.5497556]\n",
" [3.5350182]\n",
" [3.5291631]\n",
" [3.520699 ]\n",
" [3.511447 ]\n",
" [3.5097666]\n",
" [3.5094917]\n",
" [3.508343 ]\n",
" [3.5033417]\n",
" [3.4993262]\n",
" [3.4943023]\n",
" [3.4778545]\n",
" [3.4760363]\n",
" [3.4748478]\n",
" [3.4566607]\n",
" [3.4558184]\n",
" [3.4547846]\n",
" [3.4486434]\n",
" [3.4427898]\n",
" [3.4353976]\n",
" [3.4306555]\n",
" [3.429614 ]\n",
" [3.428885 ]\n",
" [3.4287713]\n",
" [3.4279718]\n",
" [3.4247575]\n",
" [3.4213297]\n",
" [3.4182765]\n",
" [3.4130368]\n",
" [3.4110832]\n",
" [3.4102583]\n",
" [3.4097643]\n",
" [3.4063737]\n",
" [3.4002748]\n",
" [3.3987231]\n",
" [3.398623 ]\n",
" [3.3906348]\n",
" [3.3903694]\n",
" [3.389829 ]\n",
" [3.3871489]\n",
" [3.3866808]\n",
" [3.3850012]\n",
" [3.3811111]\n",
" [3.377124 ]\n",
" [3.3718107]\n",
" [3.3699038]\n",
" [3.3665583]\n",
" [3.3641217]\n",
" [3.3586016]\n",
" [3.356424 ]\n",
" [3.3532934]\n",
" [3.349794 ]\n",
" [3.344625 ]\n",
" [3.341354 ]\n",
" [3.3403924]\n",
" [3.329246 ]\n",
" [3.3230593]\n",
" [3.315028 ]\n",
" [3.3147888]\n",
" [3.3134296]\n",
" [3.3104444]\n",
" [3.3101196]\n",
" [3.3042507]\n",
" [3.3017254]\n",
" [3.29436 ]\n",
" [3.285603 ]\n",
" [3.2848089]\n",
" [3.2833614]\n",
" [3.2800026]\n",
" [3.2757351]\n",
" [3.2754014]\n",
" [3.2743013]\n",
" [3.2714224]\n",
" [3.2710786]\n",
" [3.2680635]\n",
" [3.2673464]\n",
" [3.2672486]\n",
" [3.2671494]\n",
" [3.2651696]\n",
" [3.2641792]\n",
" [3.2618062]\n",
" [3.2618036]\n",
" [3.255795 ]\n",
" [3.2507322]\n",
" [3.246841 ]\n",
" [3.2463443]\n",
" [3.2446918]\n",
" [3.24319 ]\n",
" [3.2395382]\n",
" [3.237233 ]\n",
" [3.2371721]\n",
" [3.2298684]\n",
" [3.2273216]\n",
" [3.22687 ]\n",
" [3.2258818]\n",
" [3.2234397]\n",
" [3.2187822]\n",
" [3.216659 ]\n",
" [3.2164807]\n",
" [3.2162209]\n",
" [3.2130265]\n",
" [3.2122602]\n",
" [3.2121081]\n",
" [3.212059 ]\n",
" [3.210527 ]\n",
" [3.205883 ]\n",
" [3.205141 ]\n",
" [3.2012467]\n",
" [3.1993237]\n",
" [3.1984463]\n",
" [3.1955576]\n",
" [3.1876934]\n",
" [3.1871533]\n",
" [3.1851745]\n",
" [3.1850007]\n",
" [3.1792161]\n",
" [3.1791651]\n",
" [3.1772995]\n",
" [3.1753993]\n",
" [3.1741834]\n",
" [3.172162 ]\n",
" [3.1686847]\n",
" [3.1673284]\n",
" [3.163831 ]\n",
" [3.163402 ]\n",
" [3.1623733]\n",
" [3.159294 ]\n",
" [3.1570888]\n",
" [3.1557903]\n",
" [3.152236 ]\n",
" [3.1468313]\n",
" [3.1457014]\n",
" [3.1442165]\n",
" [3.1440685]\n",
" [3.1379778]\n",
" [3.1367998]\n",
" [3.1359854]\n",
" [3.134318 ]\n",
" [3.1331477]\n",
" [3.1282246]\n",
" [3.1268792]\n",
" [3.1199179]\n",
" [3.1192653]\n",
" [3.1180048]\n",
" [3.1165247]\n",
" [3.1159902]\n",
" [3.1151383]\n",
" [3.1149805]\n",
" [3.1146512]\n",
" [3.1142213]\n",
" [3.113139 ]\n",
" [3.1121082]\n",
" [3.1114218]\n",
" [3.111372 ]\n",
" [3.1089427]\n",
" [3.1055648]\n",
" [3.1033432]\n",
" [3.1011677]\n",
" [3.1002688]\n",
" [3.0984313]\n",
" [3.0970464]\n",
" [3.0952497]\n",
" [3.0907462]\n",
" [3.0839705]\n",
" [3.082735 ]\n",
" [3.081305 ]\n",
" [3.080568 ]\n",
" [3.0774834]\n",
" [3.0771737]\n",
" [3.07623 ]\n",
" [3.07568 ]\n",
" [3.0672991]\n",
" [3.0668454]\n",
" [3.0663552]\n",
" [3.0650723]\n",
" [3.064833 ]\n",
" [3.0640829]\n",
" [3.0623057]\n",
" [3.0581558]\n",
" [3.0572338]\n",
" [3.0569944]\n",
" [3.0564816]\n",
" [3.0562506]\n",
" [3.0560067]\n",
" [3.0518785]\n",
" [3.0476475]\n",
" [3.0434175]\n",
" [3.0411425]\n",
" [3.0372245]\n",
" [3.0361981]\n",
" [3.0331616]\n",
" [3.0314822]\n",
" [3.0297985]\n",
" [3.0274296]\n",
" [3.026195 ]\n",
" [3.0255785]\n",
" [3.0237148]\n",
" [3.0189033]\n",
" [3.0089145]\n",
" [3.0030246]\n",
" [2.9998238]\n",
" [2.9988348]\n",
" [2.9968467]\n",
" [2.995912 ]\n",
" [2.9888647]\n",
" [2.9882288]\n",
" [2.9871707]\n",
" [2.9848285]\n",
" [2.9826763]\n",
" [2.9823775]\n",
" [2.9807193]\n",
" [2.977845 ]\n",
" [2.9765027]\n",
" [2.9733286]\n",
" [2.9731193]\n",
" [2.9700468]\n",
" [2.9686456]\n",
" [2.9666126]\n",
" [2.9633884]\n",
" [2.9529583]\n",
" [2.950475 ]\n",
" [2.947521 ]\n",
" [2.9428596]\n",
" [2.9398391]\n",
" [2.931745 ]\n",
" [2.9313867]\n",
" [2.9281583]\n",
" [2.9249833]\n",
" [2.9243996]\n",
" [2.9174228]\n",
" [2.9155014]\n",
" [2.9153402]\n",
" [2.9139547]\n",
" [2.9108362]\n",
" [2.9097197]\n",
" [2.9086504]\n",
" [2.907938 ]\n",
" [2.9067705]\n",
" [2.9036984]\n",
" [2.9030735]\n",
" [2.8977413]\n",
" [2.8974717]\n",
" [2.8937263]\n",
" [2.8906958]\n",
" [2.8904102]\n",
" [2.887924 ]\n",
" [2.8871226]\n",
" [2.88349 ]\n",
" [2.8802798]\n",
" [2.8802757]\n",
" [2.8758712]\n",
" [2.8748074]\n",
" [2.8735645]\n",
" [2.8732839]\n",
" [2.8722389]\n",
" [2.8705797]\n",
" [2.8704703]\n",
" [2.8704703]\n",
" [2.8687508]\n",
" [2.868703 ]\n",
" [2.8679636]\n",
" [2.8654406]\n",
" [2.8601098]\n",
" [2.8555186]\n",
" [2.8517022]\n",
" [2.8501189]\n",
" [2.8489628]\n",
" [2.8448539]\n",
" [2.8427477]\n",
" [2.8419244]\n",
" [2.840814 ]\n",
" [2.838309 ]\n",
" [2.8373907]\n",
" [2.8367581]\n",
" [2.83457 ]\n",
" [2.8327494]\n",
" [2.830987 ]\n",
" [2.8256779]\n",
" [2.8212893]\n",
" [2.8193748]\n",
" [2.8189123]\n",
" [2.8184104]\n",
" [2.8183722]\n",
" [2.8171496]\n",
" [2.810015 ]\n",
" [2.8095067]\n",
" [2.8073156]\n",
" [2.806714 ]\n",
" [2.8050184]\n",
" [2.8033564]\n",
" [2.8027973]\n",
" [2.8023014]\n",
" [2.800736 ]\n",
" [2.7985754]\n",
" [2.7972455]\n",
" [2.7964704]\n",
" [2.7940426]\n",
" [2.792991 ]\n",
" [2.7922904]\n",
" [2.7911847]\n",
" [2.790652 ]\n",
" [2.7862058]\n",
" [2.7859676]\n",
" [2.7853107]\n",
" [2.7766385]\n",
" [2.7765558]\n",
" [2.7750258]\n",
" [2.7742536]\n",
" [2.7720919]\n",
" [2.7686663]\n",
" [2.7667444]\n",
" [2.7657218]\n",
" [2.7644656]\n",
" [2.7631004]\n",
" [2.7618058]\n",
" [2.7590315]\n",
" [2.7581983]\n",
" [2.7550712]\n",
" [2.7484536]\n",
" [2.7480764]\n",
" [2.7439861]\n",
" [2.7434502]\n",
" [2.7399552]\n",
" [2.7378337]\n",
" [2.7304523]\n",
" [2.7286909]\n",
" [2.7277074]\n",
" [2.7271566]\n",
" [2.7249606]\n",
" [2.724156 ]\n",
" [2.7232275]\n",
" [2.716372 ]\n",
" [2.7152648]\n",
" [2.7115605]\n",
" [2.7075589]\n",
" [2.70336 ]\n",
" [2.7017856]\n",
" [2.7003314]\n",
" [2.700186 ]\n",
" [2.6998413]\n",
" [2.6987233]\n",
" [2.6950955]\n",
" [2.6940217]\n",
" [2.6913233]\n",
" [2.688191 ]\n",
" [2.6865904]\n",
" [2.6850533]\n",
" [2.6823823]\n",
" [2.6805615]\n",
" [2.6803775]\n",
" [2.6773064]\n",
" [2.6758685]\n",
" [2.6752088]\n",
" [2.6741009]\n",
" [2.6730125]\n",
" [2.6722443]\n",
" [2.669653 ]\n",
" [2.6694539]\n",
" [2.6682634]\n",
" [2.6679778]\n",
" [2.6677628]\n",
" [2.6677172]\n",
" [2.6656177]\n",
" [2.664939 ]\n",
" [2.6644602]\n",
" [2.6632714]\n",
" [2.6623666]\n",
" [2.6580782]\n",
" [2.6579041]\n",
" [2.6548362]\n",
" [2.6548073]\n",
" [2.654058 ]\n",
" [2.648099 ]\n",
" [2.6459012]\n",
" [2.645074 ]\n",
" [2.6432161]\n",
" [2.641315 ]\n",
" [2.6398938]\n",
" [2.639464 ]\n",
" [2.6378903]\n",
" [2.6374006]\n",
" [2.6363246]\n",
" [2.6350818]\n",
" [2.6324053]\n",
" [2.6319304]\n",
" [2.6300516]\n",
" [2.6222234]\n",
" [2.6218534]\n",
" [2.6193838]\n",
" [2.6192203]\n",
" [2.61805 ]\n",
" [2.616669 ]\n",
" [2.6165226]\n",
" [2.615718 ]\n",
" [2.6150997]\n",
" [2.614617 ]\n",
" [2.6145685]\n",
" [2.6143036]\n",
" [2.6116984]\n",
" [2.6115143]\n",
" [2.6099882]\n",
" [2.606089 ]\n",
" [2.6046457]\n",
" [2.6045628]\n",
" [2.6038723]\n",
" [2.6025136]\n",
" [2.6003213]\n",
" [2.5971851]\n",
" [2.5958402]\n",
" [2.5955691]\n",
" [2.595026 ]\n",
" [2.592542 ]\n",
" [2.5905085]\n",
" [2.589953 ]\n",
" [2.589952 ]\n",
" [2.5882008]\n",
" [2.5879226]\n",
" [2.586005 ]\n",
" [2.5858579]\n",
" [2.5770447]\n",
" [2.570223 ]\n",
" [2.5681543]\n",
" [2.5677862]\n",
" [2.5625234]\n",
" [2.5567896]\n",
" [2.5543346]\n",
" [2.5540805]\n",
" [2.5520144]\n",
" [2.551978 ]\n",
" [2.5486193]\n",
" [2.5478237]\n",
" [2.5452108]\n",
" [2.5449238]\n",
" [2.5423439]\n",
" [2.541688 ]\n",
" [2.541147 ]\n",
" [2.540767 ]\n",
" [2.5402722]\n",
" [2.5399518]\n",
" [2.5384429]\n",
" [2.5366092]\n",
" [2.5364537]\n",
" [2.5358658]\n",
" [2.5350237]\n",
" [2.534743 ]\n",
" [2.5336113]\n",
" [2.532395 ]\n",
" [2.5308697]\n",
" [2.5289187]\n",
" [2.5288603]\n",
" [2.5285912]\n",
" [2.5267336]\n",
" [2.5262392]\n",
" [2.523992 ]\n",
" [2.5204244]\n",
" [2.519497 ]\n",
" [2.5194392]\n",
" [2.5169785]\n",
" [2.5153475]\n",
" [2.5153377]\n",
" [2.5151036]\n",
" [2.510258 ]\n",
" [2.509283 ]\n",
" [2.5080023]\n",
" [2.5044427]\n",
" [2.5029643]\n",
" [2.499252 ]\n",
" [2.4983377]\n",
" [2.490515 ]\n",
" [2.4893146]\n",
" [2.4890385]\n",
" [2.4853323]\n",
" [2.4814494]\n",
" [2.47972 ]\n",
" [2.4736404]\n",
" [2.4729714]\n",
" [2.4726954]\n",
" [2.471537 ]\n",
" [2.4703338]\n",
" [2.4695883]\n",
" [2.4691405]\n",
" [2.4688356]\n",
" [2.4656997]\n",
" [2.4656487]\n",
" [2.4648743]\n",
" [2.4606938]\n",
" [2.460663 ]\n",
" [2.4547756]\n",
" [2.450889 ]\n",
" [2.4507804]\n",
" [2.4505074]\n",
" [2.4500084]\n",
" [2.4448385]\n",
" [2.443828 ]\n",
" [2.4427283]\n",
" [2.4425435]\n",
" [2.4424853]\n",
" [2.441924 ]\n",
" [2.4308195]\n",
" [2.429411 ]\n",
" [2.4286537]\n",
" [2.4271176]\n",
" [2.4249985]\n",
" [2.424007 ]\n",
" [2.4237623]\n",
" [2.4222713]\n",
" [2.4194424]\n",
" [2.418102 ]\n",
" [2.4167125]\n",
" [2.415009 ]\n",
" [2.4142053]\n",
" [2.41281 ]\n",
" [2.412647 ]\n",
" [2.411808 ]\n",
" [2.4112964]\n",
" [2.4112897]\n",
" [2.410063 ]\n",
" [2.4082572]\n",
" [2.4074452]\n",
" [2.4060278]\n",
" [2.4057636]\n",
" [2.4044743]\n",
" [2.4042602]\n",
" [2.4041007]\n",
" [2.4036908]\n",
" [2.4018984]\n",
" [2.4016447]\n",
" [2.4010084]\n",
" [2.4004822]\n",
" [2.3985963]\n",
" [2.398595 ]\n",
" [2.3967388]\n",
" [2.3953075]\n",
" [2.395129 ]\n",
" [2.3937993]\n",
" [2.3916276]\n",
" [2.3877718]\n",
" [2.3860486]\n",
" [2.3859446]\n",
" [2.3826723]\n",
" [2.3731287]\n",
" [2.3709443]\n",
" [2.3688664]\n",
" [2.3654697]\n",
" [2.3629036]\n",
" [2.3599215]\n",
" [2.3581252]\n",
" [2.3571038]\n",
" [2.3554688]\n",
" [2.3549633]\n",
" [2.3502882]\n",
" [2.3497207]\n",
" [2.348881 ]\n",
" [2.346566 ]\n",
" [2.3414059]\n",
" [2.3411784]\n",
" [2.3376496]\n",
" [2.3365414]\n",
" [2.3362248]\n",
" [2.3356526]\n",
" [2.32226 ]\n",
" [2.3202283]\n",
" [2.3192773]\n",
" [2.3184304]\n",
" [2.3158858]\n",
" [2.3140495]\n",
" [2.3125045]\n",
" [2.3122332]\n",
" [2.3069193]\n",
" [2.3043463]\n",
" [2.302479 ]\n",
" [2.3020773]\n",
" [2.3019671]\n",
" [2.2963629]\n",
" [2.2942834]\n",
" [2.2865484]\n",
" [2.2852025]\n",
" [2.2845268]\n",
" [2.2822044]\n",
" [2.281864 ]\n",
" [2.280296 ]\n",
" [2.2796564]\n",
" [2.2757158]\n",
" [2.2735412]\n",
" [2.2714844]\n",
" [2.2711594]\n",
" [2.2687004]\n",
" [2.2666905]\n",
" [2.2666657]\n",
" [2.2664237]\n",
" [2.2652128]\n",
" [2.26455 ]\n",
" [2.2622373]\n",
" [2.25876 ]\n",
" [2.2585402]\n",
" [2.257276 ]\n",
" [2.2560818]\n",
" [2.2510543]\n",
" [2.250623 ]\n",
" [2.2492592]\n",
" [2.249111 ]\n",
" [2.2468185]\n",
" [2.2437153]\n",
" [2.2415767]\n",
" [2.2406766]\n",
" [2.2374372]\n",
" [2.2327676]\n",
" [2.2275515]\n",
" [2.2274659]\n",
" [2.227354 ]\n",
" [2.226612 ]\n",
" [2.2263954]\n",
" [2.2257166]\n",
" [2.2252402]\n",
" [2.22102 ]\n",
" [2.2198718]\n",
" [2.2197866]\n",
" [2.2191205]\n",
" [2.2189407]\n",
" [2.2183263]\n",
" [2.217338 ]\n",
" [2.2165864]\n",
" [2.2140675]\n",
" [2.2095556]\n",
" [2.2080624]\n",
" [2.2034793]\n",
" [2.2025797]\n",
" [2.2013853]\n",
" [2.1997306]\n",
" [2.1991022]\n",
" [2.1990845]\n",
" [2.1976357]\n",
" [2.194289 ]\n",
" [2.1904447]\n",
" [2.1899855]\n",
" [2.18791 ]\n",
" [2.1843772]\n",
" [2.1827168]\n",
" [2.182671 ]\n",
" [2.1768842]\n",
" [2.1756914]\n",
" [2.1744227]\n",
" [2.1712582]\n",
" [2.1689332]\n",
" [2.167 ]\n",
" [2.1669657]\n",
" [2.1649854]\n",
" [2.164376 ]\n",
" [2.1640785]\n",
" [2.1616585]\n",
" [2.1602838]\n",
" [2.15687 ]\n",
" [2.1566195]\n",
" [2.1488578]\n",
" [2.148465 ]\n",
" [2.14821 ]\n",
" [2.1415062]\n",
" [2.1378906]\n",
" [2.1358144]\n",
" [2.1351008]\n",
" [2.1295419]\n",
" [2.1291142]\n",
" [2.1287608]\n",
" [2.1261158]\n",
" [2.1250408]\n",
" [2.1250272]\n",
" [2.1235154]\n",
" [2.120979 ]\n",
" [2.11898 ]\n",
" [2.1178327]\n",
" [2.1158757]\n",
" [2.1083198]\n",
" [2.1059887]\n",
" [2.104644 ]\n",
" [2.1006114]\n",
" [2.096213 ]\n",
" [2.0949194]\n",
" [2.0940018]\n",
" [2.09121 ]\n",
" [2.086474 ]\n",
" [2.0852158]\n",
" [2.0838983]\n",
" [2.0833483]\n",
" [2.0823803]\n",
" [2.076673 ]\n",
" [2.0758686]\n",
" [2.0756547]\n",
" [2.0716026]\n",
" [2.0701756]\n",
" [2.0619073]\n",
" [2.0584733]\n",
" [2.0582154]\n",
" [2.0578017]\n",
" [2.0570757]\n",
" [2.0569513]\n",
" [2.0549898]\n",
" [2.054412 ]\n",
" [2.053999 ]\n",
" [2.0517542]\n",
" [2.050838 ]\n",
" [2.050426 ]\n",
" [2.043964 ]\n",
" [2.0426598]\n",
" [2.0415776]\n",
" [2.0405672]\n",
" [2.0400157]\n",
" [2.0395718]\n",
" [2.03688 ]\n",
" [2.0311208]\n",
" [2.0277994]\n",
" [2.0270715]\n",
" [2.0188103]\n",
" [2.018795 ]\n",
" [2.0145507]\n",
" [2.0142775]\n",
" [2.013943 ]\n",
" [2.008393 ]\n",
" [1.9970746]\n",
" [1.9947002]\n",
" [1.9939047]\n",
" [1.9897671]\n",
" [1.9850045]\n",
" [1.9794842]\n",
" [1.9794585]\n",
" [1.9757434]\n",
" [1.9735702]\n",
" [1.9697552]\n",
" [1.9694178]\n",
" [1.9674922]\n",
" [1.964678 ]\n",
" [1.9584919]\n",
" [1.946818 ]\n",
" [1.9170725]\n",
" [1.9090813]\n",
" [1.9021583]\n",
" [1.9020703]\n",
" [1.896089 ]\n",
" [1.8950799]\n",
" [1.8934582]\n",
" [1.8916398]\n",
" [1.8851914]\n",
" [1.8687224]\n",
" [1.8522754]\n",
" [1.8298396]\n",
" [1.8277471]\n",
" [1.8194891]\n",
" [1.8017738]\n",
" [1.7828372]\n",
" [1.7364458]\n",
" [1.7360994]\n",
" [1.7264173]\n",
" [1.7087297]\n",
" [1.686799 ]\n",
" [1.6548401]\n",
" [1.64811 ]\n",
" [1.6300786]\n",
" [1.6179663]\n",
" [1.5992339]\n",
" [1.5755935]\n",
" [1.5595846]\n",
" [1.5460145]\n",
" [1.5141395]\n",
" [1.5105387]\n",
" [1.4863265]\n",
" [1.4122096]\n",
" [1.3867546]] (847, 1)\n",
"\n",
"MOVIES/vectors dataset (item_vecs), sorted in DESCENDING way:\n",
" [[9.88090000e+04 2.01200000e+03 3.81250000e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [8.36800000e+03 2.00400000e+03 3.91397849e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [5.40010000e+04 2.00700000e+03 3.86206897e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" ...\n",
" [1.12183000e+05 2.01400000e+03 3.34615385e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [4.37000000e+03 2.00100000e+03 3.33928571e+00 ... 0.00000000e+00\n",
" 1.00000000e+00 0.00000000e+00]\n",
" [4.38600000e+03 2.00100000e+03 2.81818182e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]] (847, 17)\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>y_pu</th>\n",
" <th>movie_id</th>\n",
" <th>rating ave</th>\n",
" <th>title</th>\n",
" <th>genres</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>4.5</td>\n",
" <td>98809</td>\n",
" <td>3.8</td>\n",
" <td>Hobbit: An Unexpected Journey, The (2012)</td>\n",
" <td>Adventure|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>4.4</td>\n",
" <td>8368</td>\n",
" <td>3.9</td>\n",
" <td>Harry Potter and the Prisoner of Azkaban (2004)</td>\n",
" <td>Adventure|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>4.4</td>\n",
" <td>54001</td>\n",
" <td>3.9</td>\n",
" <td>Harry Potter and the Order of the Phoenix (2007)</td>\n",
" <td>Adventure|Drama|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4.3</td>\n",
" <td>40815</td>\n",
" <td>3.8</td>\n",
" <td>Harry Potter and the Goblet of Fire (2005)</td>\n",
" <td>Adventure|Fantasy|Thriller</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>4.3</td>\n",
" <td>106489</td>\n",
" <td>3.6</td>\n",
" <td>Hobbit: The Desolation of Smaug, The (2013)</td>\n",
" <td>Adventure|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>4.3</td>\n",
" <td>81834</td>\n",
" <td>4.0</td>\n",
" <td>Harry Potter and the Deathly Hallows: Part 1 (...</td>\n",
" <td>Action|Adventure|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>4.3</td>\n",
" <td>59387</td>\n",
" <td>4.0</td>\n",
" <td>Fall, The (2006)</td>\n",
" <td>Adventure|Drama|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>4.3</td>\n",
" <td>5952</td>\n",
" <td>4.0</td>\n",
" <td>Lord of the Rings: The Two Towers, The (2002)</td>\n",
" <td>Adventure|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>4.3</td>\n",
" <td>5816</td>\n",
" <td>3.6</td>\n",
" <td>Harry Potter and the Chamber of Secrets (2002)</td>\n",
" <td>Adventure|Fantasy</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>4.3</td>\n",
" <td>54259</td>\n",
" <td>3.6</td>\n",
" <td>Stardust (2007)</td>\n",
" <td>Adventure|Comedy|Fantasy|Romance</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" y_pu movie_id rating ave \\\n",
"0 4.5 98809 3.8 \n",
"1 4.4 8368 3.9 \n",
"2 4.4 54001 3.9 \n",
"3 4.3 40815 3.8 \n",
"4 4.3 106489 3.6 \n",
"5 4.3 81834 4.0 \n",
"6 4.3 59387 4.0 \n",
"7 4.3 5952 4.0 \n",
"8 4.3 5816 3.6 \n",
"9 4.3 54259 3.6 \n",
"\n",
" title \\\n",
"0 Hobbit: An Unexpected Journey, The (2012) \n",
"1 Harry Potter and the Prisoner of Azkaban (2004) \n",
"2 Harry Potter and the Order of the Phoenix (2007) \n",
"3 Harry Potter and the Goblet of Fire (2005) \n",
"4 Hobbit: The Desolation of Smaug, The (2013) \n",
"5 Harry Potter and the Deathly Hallows: Part 1 (... \n",
"6 Fall, The (2006) \n",
"7 Lord of the Rings: The Two Towers, The (2002) \n",
"8 Harry Potter and the Chamber of Secrets (2002) \n",
"9 Stardust (2007) \n",
"\n",
" genres \n",
"0 Adventure|Fantasy \n",
"1 Adventure|Fantasy \n",
"2 Adventure|Drama|Fantasy \n",
"3 Adventure|Fantasy|Thriller \n",
"4 Adventure|Fantasy \n",
"5 Action|Adventure|Fantasy \n",
"6 Adventure|Drama|Fantasy \n",
"7 Adventure|Fantasy \n",
"8 Adventure|Fantasy \n",
"9 Adventure|Comedy|Fantasy|Romance "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# REPEAT the NEW USER vector 847 times to match with \n",
"# the number of MOVIES (847) in the 'items_vec' TRAIN/TEST set.\n",
"# user_vecs= np.repeat(user_vec,len(item_vecs),axis=0) -> \n",
"# len(item_vecs) = 847 MOVIES/vectors, each MOVIE/vector has 17 MOVIE features\n",
"# axis = 0 -> Repeats 'user_vec' array n=847 times (1 per MOVIE) \n",
"# along x axis (rows)\n",
"user_vecs = gen_user_vecs(user_vec,len(item_vecs))\n",
"\n",
"# [[5.e+03 3.e+00 0.e+00 ... 0.e+00 0.e+00 0.e+00] ->[user_vec]1 \n",
"# [5.e+03 3.e+00 0.e+00 ... 0.e+00 0.e+00 0.e+00] ->[user_vec]2\n",
"# [5.e+03 3.e+00 0.e+00 ... 0.e+00 0.e+00 0.e+00] ->[user_vec]3\n",
"# ...\n",
"# [5.e+03 3.e+00 0.e+00 ... 0.e+00 0.e+00 0.e+00] ->[user_vec]845\n",
"# [5.e+03 3.e+00 0.e+00 ... 0.e+00 0.e+00 0.e+00] ->[user_vec]846\n",
"# [5.e+03 3.e+00 0.e+00 ... 0.e+00 0.e+00 0.e+00]]->[user_vec]847 \n",
"# (847, 17) \n",
"print('NEW USER [vector] is repeated * 847 times -> (2D array)\\n',user_vecs, user_vecs.shape)\n",
"\n",
"# [[4.05400000e+03 2.00100000e+03 2.84375000e+00 ... 1.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00] ->[movie 1_vec]\n",
"# [4.06900000e+03 2.00100000e+03 2.90909091e+00 ... 1.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00] ->[movie 2_vec]\n",
"# [4.14800000e+03 2.00100000e+03 2.93589744e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 1.00000000e+00] ->[movie 3_vec]\n",
"# ...\n",
"# [1.77765000e+05 2.01700000e+03 3.53846154e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00] ->[movie 845_vec]\n",
"# [1.79819000e+05 2.01700000e+03 3.12500000e+00 ... 0.00000000e+00\n",
"# 1.00000000e+00 0.00000000e+00] ->[movie 846_vec]\n",
"# [1.87593000e+05 2.01800000e+03 3.87500000e+00 ... 0.00000000e+00\n",
"# 1.00000000e+00 0.00000000e+00]]->[movie 847_vec] \n",
"# (847, 17)\n",
"print('\\n847 MOVIES data set for TRAIN or TEST (2D array)\\n',item_vecs, item_vecs.shape)\n",
"\n",
"# SCALE 'user_vecs' 2D array (847, 17), with 'scalerUser' (obj) \n",
"# Perform standardization / scaling, by centering (mu = 0) \n",
"# and scaling (sigma = 1).\n",
"suser_vecs = scalerUser.transform(user_vecs)\n",
"\n",
"# [[26.3020864 -1.2142439 -7.48604051 ... -4.86517161 -5.00476899\n",
"# -5.67635045] ->[scaled_user_vec]1\n",
"# [26.3020864 -1.2142439 -7.48604051 ... -4.86517161 -5.00476899\n",
"# -5.67635045] ->[scaled_user_vec]2\n",
"# [26.3020864 -1.2142439 -7.48604051 ... -4.86517161 -5.00476899\n",
"# -5.67635045] ->[scaled_user_vec]3\n",
"# ...\n",
"# [26.3020864 -1.2142439 -7.48604051 ... -4.86517161 -5.00476899\n",
"# -5.67635045] ->[scaled_user_vec]845\n",
"# [26.3020864 -1.2142439 -7.48604051 ... -4.86517161 -5.00476899\n",
"# -5.67635045] ->[scaled_user_vec]846\n",
"# [26.3020864 -1.2142439 -7.48604051 ... -4.86517161 -5.00476899\n",
"# -5.67635045]] ->[scaled_user_vec]847 \n",
"# (847, 17)\n",
"print('\\nSCALED user_vecs 2D array \\n', suser_vecs, suser_vecs.shape)\n",
"\n",
"# SCALE 'item_vecs' 2D array (847, 17) with 'scalerItem' (obj)\n",
"# Perform standardization / scaling, by centering (mu = 0) \n",
"# and scaling (sigma = 1).\n",
"sitem_vecs = scalerItem.transform(item_vecs)\n",
"\n",
"# [[-0.9488947 -1.19927702 -1.98317861 ... 2.50507064 -0.50822265\n",
"# -0.65976722] ->[scaled_item_vec]1\n",
"# [-0.94849765 -1.19927702 -1.81511436 ... 2.50507064 -0.50822265\n",
"# -0.65976722] ->[scaled_item_vec]2\n",
"# [-0.94640649 -1.19927702 -1.74616492 ... -0.39919034 -0.50822265\n",
"# 1.51568608] ->[scaled_item_vec]3\n",
"# ...\n",
"# [ 3.64928572 2.816088 -0.19630151 ... -0.39919034 -0.50822265\n",
"# -0.65976722] ->[scaled_item_vec]845\n",
"# [ 3.7036557 2.816088 -1.25977162 ... -0.39919034 1.96764154\n",
"# -0.65976722] ->[scaled_item_vec]846\n",
"# [ 3.90943572 3.06704831 0.6693137 ... -0.39919034 1.96764154\n",
"# -0.65976722]] ->[scaled_item_vec]847\n",
"# (847, 17)\n",
"print('\\nSCALED item_vecs 2D array \\n', sitem_vecs, sitem_vecs.shape)\n",
"\n",
"# Make a y^ scaled prediction. \n",
"# We defined OVERALL model, as a model with a list of multiple inputs \n",
"# inputs = [ x_u^(j)[All rows, from feature 3] , x_m^(i)[All rows, from feature 1] ]\n",
"# 0,1,2 cols NO 0 col NO \n",
"y_p = model.predict([suser_vecs[:, u_s:], sitem_vecs[:, i_s:]])\n",
"\n",
"# [[-0.23707277]\n",
"# [-0.04912706]\n",
"# [-0.06851547]] \n",
"# ... \n",
"# (847, 1)\n",
"print('\\nSCALED y^ predictions\\n',y_p[:3],'...', y_p.shape)\n",
"\n",
"# UNSCALE y^ prediction, to get back predicted RATINGS between [0-5]\n",
"# with 'scalerTarget' (obj) and '.inverse_transform(scaled prediction)' \n",
"y_pu = scalerTarget.inverse_transform(y_p)\n",
"\n",
"# [[2.2165864]\n",
"# [2.639464 ]\n",
"# [2.5958402]] ... (847, 1) \n",
"# min: 1.3867546 \n",
"# max: 4.459245\n",
"print('\\nUNSCALED y^ predictions\\n',y_pu[:3],'...', y_pu.shape)\n",
"print('\\nmin:',np.min(y_pu),'\\nmax:',np.max(y_pu))\n",
"\n",
"# Returns the INDICES that would sort the 'y_pu' 2D array along \n",
"# the 0 axis/cols (vertically) in ASCENDING order.\n",
"# The negative sign '-y_pu' reverses the order, so 'argsort()' \n",
"# returns the indices to sort 'y_pu' in DESCENDING order.\n",
"# Sort the results, HIGHEST prediction first.\n",
"# (-) negate 'y_pu' to get LARGEST rating 1st.\n",
"\n",
"# .reshape(-1) = .reshape(-1,) -> All rows (-1,) \n",
"# can be re-organized into ALL NEEDED columns (,empty).\n",
"\n",
"# .tolist() -> Converts 'numpy' array into 'pandas' list \n",
"sorted_index = np.argsort(-y_pu,axis=0).reshape(-1,).tolist()\n",
"\n",
"# [717, 254, 467, ... , 773, 27, 28] (847,) \n",
"print('\\nIndices of UNSCALED predictions y^, sorted in DESCENDING order:\\n',sorted_index)\n",
"\n",
"# Select the predictions/RATINGS, related with 'sorted indices' in DESCENDING way\n",
"# as -> unscaled_predictions[sorted_indices]\n",
"sorted_ypu = y_pu[sorted_index]\n",
"\n",
"# [[4.459245 ]\n",
"# [4.386574 ]\n",
"# [4.356962 ]\n",
"# ...\n",
"# [1.4863265]\n",
"# [1.4122096]\n",
"# [1.3867546]] (847,1)\n",
"print('\\nUNSCALED predictions y^, sorted in DESCENDING way:\\n',sorted_ypu, sorted_ypu.shape )\n",
"\n",
"# Select MOVIES rows/vectors, with 'indices' in DESENDING way\n",
"# as -> item_vecs[sorted_indices]\n",
"# Use unscaled vectors to get 'movie_id' (i.e 98809, 8368,...) for display\n",
"sorted_items = item_vecs[sorted_index] \n",
"\n",
"# [[9.88090000e+04 2.01200000e+03 3.81250000e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00] -> [non-scaled_item_vec]1\n",
"# [8.36800000e+03 2.00400000e+03 3.91397849e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00] -> [non-scaled_item_vec]2\n",
"# [5.40010000e+04 2.00700000e+03 3.86206897e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00] -> [non-scaled_item_vec]3\n",
"# ...\n",
"# [1.12183000e+05 2.01400000e+03 3.34615385e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00] -> [non-scaled_item_vec]845\n",
"# [4.37000000e+03 2.00100000e+03 3.33928571e+00 ... 0.00000000e+00\n",
"# 1.00000000e+00 0.00000000e+00] -> [non-scaled_item_vec]846\n",
"# [4.38600000e+03 2.00100000e+03 2.81818182e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00]] -> [non-scaled_item_vec]847\n",
"# (847,17)\n",
"print('\\nMOVIES/vectors dataset (item_vecs), sorted in DESCENDING way:\\n',sorted_items,sorted_items.shape )\n",
"\n",
"# Number of loop iterations, to repeat appended rows [x1, x2, x3, x4, x5] list\n",
"maxcount=10\n",
"\n",
    "# Init counter to 0, so when count == maxcount -> break\n",
"count = 0\n",
"\n",
"# Init 'disp' list as empty []\n",
"disp=[]\n",
"#disp = [[\"y_pu\", \"movie id\", \"rating ave\", \"title\", \"genres\"]]\n",
"\n",
"# Iterate 847 times I=0,1,2,...,846\n",
"for i in range(sorted_ypu.shape[0]):\n",
" \n",
" # If count = maxcount -> True \n",
" if count == maxcount:\n",
" \n",
" # Stop\n",
" break\n",
" \n",
" # count = count + 1\n",
" # (init 0)\n",
" count += 1\n",
" \n",
" # Pick feature / col0 -> 'movie_id' per row/vector, at 'sorted_items'\n",
" # Display as integer 9.88090000e+04 -> 98809\n",
" movie_id = sorted_items[i,0].astype(int)\n",
" \n",
" # Feature / col 1 = 'sorted y^ unscaled', rounded to '1' decimal x.x\n",
" x1 = np.around(sorted_ypu[i, 0],1)\n",
" \n",
" # Feature / col 2 = 'movie_id'(col0) each row/vector at 'sorted_items'\n",
" x2 = sorted_items[i, 0].astype(int)\n",
" \n",
" # Feature / col 3 = 'rating average' of movie (col2) each row/vector at 'sorted_items'\n",
" # rounded to '1' decimal x.x\n",
" x3 = np.around(sorted_items[i, 2].astype(float),1)\n",
" \n",
" # Feature / col 4 = 'title' of MOVIE at 'movie_dict[movie_id]['title']'\n",
" x4 = movie_dict[movie_id]['title']\n",
" \n",
" # Feature / col 5 = 'genres' of MOVIE at 'movie_dict[movie_id]['genres']' \n",
" x5 = movie_dict[movie_id]['genres']\n",
" \n",
" # Construct a row/vector of [x1,x2,x3,x4,x5] features per iteration i.\n",
" # .append(vector) ADDS a new 'vector,' each iteration i to disp=[] list:\n",
" # disp=[[x1,x2,x3,x4,x5]i=0, [x1,x2,x3,x4,x5]i=1,...,[x1,x2,x3,x4,x5]i=846]\n",
" # but [ [x1,x2,x3,x4,x5]0, [x1,x2,x3,x4,x5]1,..., [x1,x2,x3,x4,x5]846 ] -> \n",
" # 2D array with different vectors/rows and the SAME cols / features.\n",
" disp.append([x1, x2, x3, x4, x5])\n",
" \n",
"#table = tabulate.tabulate(disp, tablefmt='html', headers=\"firstrow\")\n",
"\n",
"# Converts numpy array / list -> pandas df / table\n",
"df=pd.DataFrame(disp)\n",
"\n",
"# Rename cols / features 'names'\n",
"df = df.rename(columns={0: 'y_pu', 1: 'movie_id', 2: 'rating ave', 3: 'title', 4: 'genres'})\n",
"\n",
    "# Sort 'df' by the unscaled y^ prediction column 'y_pu', in DESCENDING order\n",
"df.sort_values(by=['y_pu'], ascending=False)\n",
"\n",
"\n",
"# Select MOVIES 'sorted_items', related to DESCENDING order \n",
"# unscaled predictions/RATINGS 'sorted_ypu', \n",
"# at df/table/dict 'movie_dict', and then display just its first 10 rows\n",
"#print_pred_movies(sorted_ypu, sorted_items, movie_dict, maxcount = 10)"
]
},
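  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sorting step above can be sketched in isolation. The snippet below is a minimal, self-contained illustration (on a made-up 5-element prediction column, not the lab's data) of how `np.argsort` on the negated array yields descending-order indices:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Toy stand-in for y_pu: a (5, 1) column of unscaled predictions\n",
    "y_demo = np.array([[2.1], [4.5], [3.3], [1.8], [4.0]])\n",
    "\n",
    "# Negating makes argsort's ASCENDING order a DESCENDING order on the originals\n",
    "order = np.argsort(-y_demo, axis=0).reshape(-1).tolist()\n",
    "print(order)  # [1, 4, 2, 0, 3]\n",
    "\n",
    "# Indexing the rows with 'order' returns the predictions largest-first\n",
    "print(y_demo[order].reshape(-1))  # [4.5 4.  3.3 2.1 1.8]\n",
    "```"
   ]
  },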
{
"cell_type": "markdown",
"metadata": {
"tags": []
},
"source": [
"<a name=\"5.2\"></a>\n",
"### 5.2 - Predictions for an existing user.\n",
"Let's look at the predictions for \"user 2\", one of the users in the data set. We can compare the predicted ratings with the model's ratings."
]
},
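  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before predicting, the single user vector is tiled once per candidate movie. A minimal sketch of that step, using a toy 3-feature user row and a hypothetical 4-movie catalog, mirroring the `np.repeat` call that the earlier cell's comments describe inside `gen_user_vecs`:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# One user row with 3 toy features (the lab's real vectors have 17)\n",
    "user_vec = np.array([[2.0, 22.0, 4.0]])\n",
    "\n",
    "# Repeat the row once per candidate movie (4 here, 847 in the lab)\n",
    "user_vecs = np.repeat(user_vec, 4, axis=0)\n",
    "print(user_vecs.shape)  # (4, 3)\n",
    "```"
   ]
  },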
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"deletable": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"EXISTING USER 2 [vector] is repeated * 847 times -> (2D array)\n",
" [[ 2. 22. 4. ... 0. 3.88 3.89]\n",
" [ 2. 22. 4. ... 0. 3.88 3.89]\n",
" [ 2. 22. 4. ... 0. 3.88 3.89]\n",
" ...\n",
" [ 2. 22. 4. ... 0. 3.88 3.89]\n",
" [ 2. 22. 4. ... 0. 3.88 3.89]\n",
" [ 2. 22. 4. ... 0. 3.88 3.89]] (847, 17)\n",
"EXISTING USER 2, 847 Ratings/labels (One per MOVIE)\n",
" [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 4. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 3.5 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 4. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 4. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 4.5 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 5. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 4.5 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 3. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 4. 0. 0. 0. 0. 0. 0. 3. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 4. 0. 0. 0. 0. 0. 0. 0. 0. 4.5 0.\n",
" 5. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 5. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 3.5 0. 0. 0. 2.5 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 3.5 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 5. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 3. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 4. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 3.5 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 5. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. ] (847,)\n",
"\n",
"847 MOVIES data set for TRAIN or TEST (2D array)\n",
" [[4.05400000e+03 2.00100000e+03 2.84375000e+00 ... 1.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [4.06900000e+03 2.00100000e+03 2.90909091e+00 ... 1.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [4.14800000e+03 2.00100000e+03 2.93589744e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 1.00000000e+00]\n",
" ...\n",
" [1.77765000e+05 2.01700000e+03 3.53846154e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [1.79819000e+05 2.01700000e+03 3.12500000e+00 ... 0.00000000e+00\n",
" 1.00000000e+00 0.00000000e+00]\n",
" [1.87593000e+05 2.01800000e+03 3.87500000e+00 ... 0.00000000e+00\n",
" 1.00000000e+00 0.00000000e+00]] (847, 17)\n",
"\n",
"SCALED user_vecs 2D array \n",
" [[-1.79251238 -1.09062531 0.86345827 ... -4.86517161 0.62123083\n",
" 0.60679525]\n",
" [-1.79251238 -1.09062531 0.86345827 ... -4.86517161 0.62123083\n",
" 0.60679525]\n",
" [-1.79251238 -1.09062531 0.86345827 ... -4.86517161 0.62123083\n",
" 0.60679525]\n",
" ...\n",
" [-1.79251238 -1.09062531 0.86345827 ... -4.86517161 0.62123083\n",
" 0.60679525]\n",
" [-1.79251238 -1.09062531 0.86345827 ... -4.86517161 0.62123083\n",
" 0.60679525]\n",
" [-1.79251238 -1.09062531 0.86345827 ... -4.86517161 0.62123083\n",
" 0.60679525]] (847, 17)\n",
"\n",
"SCALED item_vecs 2D array \n",
" [[-0.9488947 -1.19927702 -1.98317861 ... 2.50507064 -0.50822265\n",
" -0.65976722]\n",
" [-0.94849765 -1.19927702 -1.81511436 ... 2.50507064 -0.50822265\n",
" -0.65976722]\n",
" [-0.94640649 -1.19927702 -1.74616492 ... -0.39919034 -0.50822265\n",
" 1.51568608]\n",
" ...\n",
" [ 3.64928572 2.816088 -0.19630151 ... -0.39919034 -0.50822265\n",
" -0.65976722]\n",
" [ 3.7036557 2.816088 -1.25977162 ... -0.39919034 1.96764154\n",
" -0.65976722]\n",
" [ 3.90943572 3.06704831 0.6693137 ... -0.39919034 1.96764154\n",
" -0.65976722]] (847, 17)\n",
"\n",
"SCALED y^ predictions\n",
" [[0.28323317]\n",
" [0.22770423]\n",
" [0.36427093]] ... (847, 1)\n",
"\n",
"UNSCALED y^ predictions\n",
" [[3.3872745]\n",
" [3.2623346]\n",
" [3.5696096]] ... (847, 1)\n",
"\n",
"min: 1.5109389 \n",
"max: 4.520821\n",
"\n",
"Indices of UNSCALED predictions y^, sorted in DESCENDING order:\n",
" [630, 43, 830, 95, 197, 203, 742, 224, 840, 783, 306, 718, 242, 246, 384, 625, 721, 71, 698, 568, 104, 541, 782, 185, 559, 446, 11, 52, 426, 46, 842, 578, 146, 841, 322, 506, 481, 143, 786, 740, 753, 562, 743, 125, 795, 333, 128, 61, 381, 166, 769, 488, 373, 287, 706, 307, 703, 811, 619, 329, 193, 147, 688, 712, 489, 569, 677, 493, 423, 383, 243, 691, 667, 150, 479, 514, 690, 205, 273, 356, 433, 476, 376, 452, 681, 176, 751, 770, 363, 195, 533, 99, 428, 380, 361, 113, 144, 279, 63, 255, 486, 505, 149, 84, 741, 62, 756, 781, 250, 629, 151, 225, 491, 550, 192, 685, 388, 137, 803, 234, 824, 754, 253, 804, 296, 126, 470, 282, 305, 425, 516, 66, 603, 303, 628, 239, 484, 120, 705, 299, 780, 708, 318, 699, 209, 191, 10, 537, 502, 400, 429, 348, 328, 309, 444, 103, 78, 424, 163, 633, 473, 498, 789, 805, 744, 836, 843, 551, 405, 778, 106, 752, 839, 653, 267, 508, 459, 784, 269, 222, 310, 334, 247, 217, 678, 91, 210, 77, 775, 760, 829, 302, 665, 76, 747, 174, 637, 285, 419, 202, 347, 26, 231, 483, 774, 73, 382, 707, 846, 675, 673, 215, 445, 777, 412, 32, 641, 827, 671, 639, 627, 645, 487, 109, 187, 521, 272, 564, 121, 35, 552, 288, 420, 68, 598, 416, 362, 436, 244, 548, 592, 389, 622, 57, 101, 332, 759, 654, 683, 278, 47, 292, 838, 469, 802, 152, 454, 815, 165, 367, 649, 809, 472, 720, 594, 739, 464, 142, 480, 581, 585, 584, 714, 542, 635, 323, 715, 801, 117, 765, 81, 797, 219, 324, 51, 437, 513, 213, 825, 538, 313, 241, 58, 787, 807, 19, 790, 427, 725, 601, 44, 326, 286, 7, 442, 545, 800, 410, 546, 710, 135, 674, 582, 355, 761, 554, 539, 798, 127, 467, 733, 211, 831, 72, 194, 398, 359, 260, 12, 397, 49, 248, 240, 482, 676, 119, 254, 237, 70, 520, 22, 157, 566, 527, 2, 293, 172, 687, 60, 40, 726, 817, 415, 776, 85, 571, 315, 249, 16, 366, 399, 456, 529, 534, 232, 342, 694, 818, 411, 497, 799, 544, 341, 669, 567, 659, 500, 274, 216, 748, 344, 679, 458, 737, 153, 422, 519, 190, 686, 806, 636, 680, 731, 455, 788, 796, 80, 689, 823, 230, 300, 173, 132, 413, 417, 576, 536, 275, 
563, 93, 312, 131, 281, 297, 298, 236, 218, 181, 379, 717, 339, 475, 660, 138, 235, 59, 695, 291, 507, 164, 732, 555, 69, 450, 485, 116, 643, 141, 374, 92, 577, 385, 201, 270, 600, 608, 656, 595, 6, 844, 451, 591, 583, 792, 256, 431, 130, 613, 561, 364, 626, 406, 664, 0, 115, 474, 587, 462, 308, 340, 749, 368, 38, 392, 503, 779, 565, 553, 652, 499, 387, 704, 183, 808, 684, 461, 438, 791, 276, 207, 122, 492, 510, 327, 468, 5, 304, 466, 517, 264, 535, 764, 526, 724, 515, 238, 557, 261, 330, 745, 346, 845, 729, 477, 435, 511, 200, 590, 736, 372, 56, 528, 55, 158, 336, 655, 168, 414, 118, 746, 208, 82, 624, 771, 421, 133, 206, 651, 670, 540, 258, 586, 295, 490, 617, 812, 148, 531, 589, 463, 102, 386, 320, 268, 353, 220, 1, 233, 170, 793, 167, 351, 620, 711, 96, 666, 110, 114, 265, 169, 245, 663, 134, 570, 826, 266, 94, 611, 161, 448, 159, 575, 650, 229, 573, 41, 632, 596, 39, 21, 188, 696, 316, 518, 512, 259, 155, 42, 609, 105, 409, 3, 719, 453, 23, 186, 599, 314, 123, 773, 722, 618, 572, 180, 100, 50, 369, 556, 393, 228, 640, 223, 404, 45, 820, 140, 31, 522, 702, 289, 401, 532, 727, 227, 67, 447, 111, 371, 610, 48, 440, 199, 204, 319, 631, 602, 321, 395, 375, 29, 606, 615, 432, 75, 145, 160, 97, 672, 558, 136, 24, 177, 154, 657, 834, 354, 504, 738, 198, 18, 74, 408, 226, 693, 443, 212, 822, 90, 349, 418, 478, 252, 762, 271, 524, 501, 816, 64, 139, 171, 390, 221, 646, 108, 360, 262, 365, 768, 34, 79, 378, 832, 496, 33, 370, 644, 728, 8, 605, 449, 441, 86, 634, 112, 15, 391, 604, 509, 716, 772, 612, 156, 88, 331, 394, 523, 621, 828, 810, 668, 53, 83, 283, 755, 54, 358, 30, 290, 179, 530, 700, 763, 352, 597, 36, 27, 182, 301, 357, 730, 65, 837, 701, 25, 403, 758, 87, 338, 345, 588, 4, 184, 543, 835, 692, 623, 14, 196, 17, 465, 13, 682, 794, 495, 642, 785, 471, 263, 525, 574, 560, 709, 547, 189, 821, 277, 638, 37, 734, 214, 20, 735, 819, 697, 178, 766, 494, 813, 814, 311, 175, 723, 757, 335, 396, 317, 337, 750, 647, 614, 294, 833, 107, 460, 713, 162, 402, 98, 767, 350, 
284, 607, 251, 658, 648, 280, 439, 434, 124, 377, 9, 593, 325, 407, 579, 89, 662, 661, 129, 257, 343, 580, 430, 457, 549, 28, 616]\n",
"\n",
"UNSCALED predictions y^, sorted in DESCENDING way:\n",
" [[4.520821 ]\n",
" [4.3526335]\n",
" [4.332033 ]\n",
" [4.3269444]\n",
" [4.294288 ]\n",
" [4.274542 ]\n",
" [4.2387724]\n",
" [4.2352633]\n",
" [4.2253733]\n",
" [4.2187924]\n",
" [4.1973453]\n",
" [4.196561 ]\n",
" [4.18116 ]\n",
" [4.1781654]\n",
" [4.175674 ]\n",
" [4.1716084]\n",
" [4.1579013]\n",
" [4.1506577]\n",
" [4.13163 ]\n",
" [4.130816 ]\n",
" [4.1302857]\n",
" [4.1259737]\n",
" [4.120397 ]\n",
" [4.114049 ]\n",
" [4.111725 ]\n",
" [4.1086707]\n",
" [4.1076713]\n",
" [4.0909667]\n",
" [4.0896845]\n",
" [4.0861955]\n",
" [4.0828543]\n",
" [4.0777464]\n",
" [4.0767074]\n",
" [4.0685062]\n",
" [4.063311 ]\n",
" [4.06126 ]\n",
" [4.060198 ]\n",
" [4.055799 ]\n",
" [4.0538654]\n",
" [4.0493217]\n",
" [4.049248 ]\n",
" [4.046773 ]\n",
" [4.0357685]\n",
" [4.030957 ]\n",
" [4.029517 ]\n",
" [4.0274925]\n",
" [4.0256963]\n",
" [4.0246506]\n",
" [4.0241365]\n",
" [4.02365 ]\n",
" [4.023339 ]\n",
" [4.0219173]\n",
" [4.018738 ]\n",
" [4.0127788]\n",
" [4.0064964]\n",
" [4.0063944]\n",
" [3.9977314]\n",
" [3.997599 ]\n",
" [3.9963002]\n",
" [3.9934545]\n",
" [3.9885924]\n",
" [3.9873452]\n",
" [3.9869547]\n",
" [3.9860296]\n",
" [3.985028 ]\n",
" [3.9838858]\n",
" [3.9830863]\n",
" [3.980987 ]\n",
" [3.9799259]\n",
" [3.9730735]\n",
" [3.972077 ]\n",
" [3.9707043]\n",
" [3.9694135]\n",
" [3.9683084]\n",
" [3.9678898]\n",
" [3.9633696]\n",
" [3.9627445]\n",
" [3.9613037]\n",
" [3.9502451]\n",
" [3.9474206]\n",
" [3.947393 ]\n",
" [3.946128 ]\n",
" [3.9453936]\n",
" [3.9453337]\n",
" [3.9436204]\n",
" [3.940915 ]\n",
" [3.940053 ]\n",
" [3.9391418]\n",
" [3.9387786]\n",
" [3.9363618]\n",
" [3.9347227]\n",
" [3.931423 ]\n",
" [3.9280694]\n",
" [3.9264405]\n",
" [3.9260025]\n",
" [3.9218287]\n",
" [3.9194458]\n",
" [3.9167166]\n",
" [3.912341 ]\n",
" [3.910011 ]\n",
" [3.905148 ]\n",
" [3.9037392]\n",
" [3.9026308]\n",
" [3.9020524]\n",
" [3.8993447]\n",
" [3.8983898]\n",
" [3.89767 ]\n",
" [3.8976064]\n",
" [3.8962998]\n",
" [3.8960648]\n",
" [3.8958395]\n",
" [3.8938658]\n",
" [3.891298 ]\n",
" [3.890253 ]\n",
" [3.8901772]\n",
" [3.8895314]\n",
" [3.8894029]\n",
" [3.8876696]\n",
" [3.8873744]\n",
" [3.887227 ]\n",
" [3.8867252]\n",
" [3.8841748]\n",
" [3.8841233]\n",
" [3.882327 ]\n",
" [3.8752275]\n",
" [3.8743203]\n",
" [3.8727245]\n",
" [3.869627 ]\n",
" [3.8666272]\n",
" [3.8646393]\n",
" [3.863223 ]\n",
" [3.8601277]\n",
" [3.8589897]\n",
" [3.8588793]\n",
" [3.8587024]\n",
" [3.8580463]\n",
" [3.8569658]\n",
" [3.8547559]\n",
" [3.854102 ]\n",
" [3.852309 ]\n",
" [3.8510444]\n",
" [3.850939 ]\n",
" [3.8508573]\n",
" [3.8497057]\n",
" [3.8491626]\n",
" [3.8491538]\n",
" [3.8490915]\n",
" [3.8464203]\n",
" [3.8453343]\n",
" [3.845195 ]\n",
" [3.8414707]\n",
" [3.838554 ]\n",
" [3.83814 ]\n",
" [3.8350573]\n",
" [3.8332915]\n",
" [3.830314 ]\n",
" [3.8284802]\n",
" [3.8264244]\n",
" [3.826198 ]\n",
" [3.8258626]\n",
" [3.825408 ]\n",
" [3.823984 ]\n",
" [3.8238113]\n",
" [3.8236327]\n",
" [3.822183 ]\n",
" [3.8221397]\n",
" [3.8221352]\n",
" [3.8183885]\n",
" [3.8183482]\n",
" [3.8161035]\n",
" [3.8141875]\n",
" [3.8132849]\n",
" [3.810636 ]\n",
" [3.8092926]\n",
" [3.8072543]\n",
" [3.8054996]\n",
" [3.8035455]\n",
" [3.8031912]\n",
" [3.8027012]\n",
" [3.8019705]\n",
" [3.8015225]\n",
" [3.7990623]\n",
" [3.7977595]\n",
" [3.79775 ]\n",
" [3.7975771]\n",
" [3.7963006]\n",
" [3.7921822]\n",
" [3.791008 ]\n",
" [3.790154 ]\n",
" [3.789637 ]\n",
" [3.7873707]\n",
" [3.7855685]\n",
" [3.783267 ]\n",
" [3.7802148]\n",
" [3.7761626]\n",
" [3.7743309]\n",
" [3.7695866]\n",
" [3.7662287]\n",
" [3.766175 ]\n",
" [3.7647152]\n",
" [3.7632074]\n",
" [3.7626848]\n",
" [3.7615693]\n",
" [3.7587304]\n",
" [3.7582934]\n",
" [3.758292 ]\n",
" [3.75672 ]\n",
" [3.7563844]\n",
" [3.7563012]\n",
" [3.756258 ]\n",
" [3.755007 ]\n",
" [3.754918 ]\n",
" [3.7540374]\n",
" [3.7539902]\n",
" [3.753873 ]\n",
" [3.7536473]\n",
" [3.7536247]\n",
" [3.753468 ]\n",
" [3.7523444]\n",
" [3.75109 ]\n",
" [3.7502656]\n",
" [3.750068 ]\n",
" [3.7481256]\n",
" [3.7458887]\n",
" [3.7455597]\n",
" [3.7419596]\n",
" [3.741449 ]\n",
" [3.7387888]\n",
" [3.7368915]\n",
" [3.7364235]\n",
" [3.7357419]\n",
" [3.7354097]\n",
" [3.7353704]\n",
" [3.7351027]\n",
" [3.728754 ]\n",
" [3.7270775]\n",
" [3.7188926]\n",
" [3.7183664]\n",
" [3.717781 ]\n",
" [3.7168448]\n",
" [3.7150695]\n",
" [3.713875 ]\n",
" [3.7136772]\n",
" [3.7124743]\n",
" [3.7083786]\n",
" [3.7057443]\n",
" [3.7041874]\n",
" [3.7034314]\n",
" [3.7004423]\n",
" [3.700422 ]\n",
" [3.6971037]\n",
" [3.6950197]\n",
" [3.6941662]\n",
" [3.6918392]\n",
" [3.6910498]\n",
" [3.6901383]\n",
" [3.6883585]\n",
" [3.6879942]\n",
" [3.6878521]\n",
" [3.6855958]\n",
" [3.6848779]\n",
" [3.6844733]\n",
" [3.684379 ]\n",
" [3.683479 ]\n",
" [3.682289 ]\n",
" [3.681918 ]\n",
" [3.6791112]\n",
" [3.677717 ]\n",
" [3.6751869]\n",
" [3.672771 ]\n",
" [3.672771 ]\n",
" [3.672314 ]\n",
" [3.671944 ]\n",
" [3.6711032]\n",
" [3.6702316]\n",
" [3.66923 ]\n",
" [3.6675904]\n",
" [3.6668558]\n",
" [3.6668367]\n",
" [3.6659517]\n",
" [3.664831 ]\n",
" [3.6641996]\n",
" [3.6635642]\n",
" [3.6626222]\n",
" [3.6624548]\n",
" [3.66204 ]\n",
" [3.661615 ]\n",
" [3.6614149]\n",
" [3.657568 ]\n",
" [3.6556783]\n",
" [3.655218 ]\n",
" [3.6539855]\n",
" [3.6515052]\n",
" [3.6493068]\n",
" [3.648517 ]\n",
" [3.6483858]\n",
" [3.6464648]\n",
" [3.6457078]\n",
" [3.6428258]\n",
" [3.640536 ]\n",
" [3.639556 ]\n",
" [3.6389205]\n",
" [3.6361115]\n",
" [3.6358786]\n",
" [3.6347058]\n",
" [3.6308308]\n",
" [3.629764 ]\n",
" [3.6289346]\n",
" [3.6271317]\n",
" [3.6240761]\n",
" [3.6231186]\n",
" [3.622825 ]\n",
" [3.6226513]\n",
" [3.6212265]\n",
" [3.620967 ]\n",
" [3.6180854]\n",
" [3.6163943]\n",
" [3.6156042]\n",
" [3.6114717]\n",
" [3.6086886]\n",
" [3.607524 ]\n",
" [3.6069868]\n",
" [3.606602 ]\n",
" [3.6042297]\n",
" [3.601604 ]\n",
" [3.5997992]\n",
" [3.5997386]\n",
" [3.5997262]\n",
" [3.5982533]\n",
" [3.5977087]\n",
" [3.5959706]\n",
" [3.5912554]\n",
" [3.589594 ]\n",
" [3.5895593]\n",
" [3.5893717]\n",
" [3.5878499]\n",
" [3.5868886]\n",
" [3.586129 ]\n",
" [3.5844188]\n",
" [3.5824177]\n",
" [3.5770621]\n",
" [3.5763261]\n",
" [3.5753484]\n",
" [3.5700347]\n",
" [3.5696096]\n",
" [3.5688353]\n",
" [3.5687244]\n",
" [3.5664363]\n",
" [3.5633423]\n",
" [3.5600803]\n",
" [3.5569475]\n",
" [3.556798 ]\n",
" [3.553372 ]\n",
" [3.5521843]\n",
" [3.5512857]\n",
" [3.5510983]\n",
" [3.548401 ]\n",
" [3.5479023]\n",
" [3.5474617]\n",
" [3.546868 ]\n",
" [3.5434153]\n",
" [3.5425587]\n",
" [3.5421383]\n",
" [3.5391083]\n",
" [3.5381424]\n",
" [3.5364845]\n",
" [3.5360792]\n",
" [3.5356088]\n",
" [3.533208 ]\n",
" [3.5331104]\n",
" [3.5311673]\n",
" [3.5287175]\n",
" [3.5284834]\n",
" [3.5277765]\n",
" [3.5274963]\n",
" [3.5271685]\n",
" [3.527152 ]\n",
" [3.5254078]\n",
" [3.524228 ]\n",
" [3.5230818]\n",
" [3.5221605]\n",
" [3.5199492]\n",
" [3.5188367]\n",
" [3.5185525]\n",
" [3.5124958]\n",
" [3.5118246]\n",
" [3.5115724]\n",
" [3.5102067]\n",
" [3.507531 ]\n",
" [3.5073102]\n",
" [3.5058823]\n",
" [3.5056899]\n",
" [3.503522 ]\n",
" [3.501753 ]\n",
" [3.5007935]\n",
" [3.5006313]\n",
" [3.5005631]\n",
" [3.4988012]\n",
" [3.4954436]\n",
" [3.4936526]\n",
" [3.4914613]\n",
" [3.4879138]\n",
" [3.485554 ]\n",
" [3.483366 ]\n",
" [3.4810266]\n",
" [3.4748304]\n",
" [3.4740093]\n",
" [3.4727678]\n",
" [3.4678876]\n",
" [3.467232 ]\n",
" [3.4671443]\n",
" [3.466784 ]\n",
" [3.4627235]\n",
" [3.4613435]\n",
" [3.4582977]\n",
" [3.456813 ]\n",
" [3.45644 ]\n",
" [3.4561985]\n",
" [3.4537065]\n",
" [3.453392 ]\n",
" [3.453363 ]\n",
" [3.448248 ]\n",
" [3.448227 ]\n",
" [3.444355 ]\n",
" [3.4423018]\n",
" [3.440249 ]\n",
" [3.4393287]\n",
" [3.4388435]\n",
" [3.43754 ]\n",
" [3.4347217]\n",
" [3.4345264]\n",
" [3.4316914]\n",
" [3.431552 ]\n",
" [3.4313042]\n",
" [3.431292 ]\n",
" [3.426632 ]\n",
" [3.4253712]\n",
" [3.4253178]\n",
" [3.4249964]\n",
" [3.4183292]\n",
" [3.41789 ]\n",
" [3.417804 ]\n",
" [3.4172366]\n",
" [3.4141316]\n",
" [3.4128537]\n",
" [3.4119272]\n",
" [3.4090767]\n",
" [3.4088035]\n",
" [3.4058132]\n",
" [3.4040709]\n",
" [3.4037752]\n",
" [3.4017792]\n",
" [3.4013405]\n",
" [3.4000869]\n",
" [3.3995547]\n",
" [3.3990002]\n",
" [3.3988295]\n",
" [3.398637 ]\n",
" [3.394179 ]\n",
" [3.3935704]\n",
" [3.3915648]\n",
" [3.3912199]\n",
" [3.3894246]\n",
" [3.3872745]\n",
" [3.3848386]\n",
" [3.3822286]\n",
" [3.3812757]\n",
" [3.3800313]\n",
" [3.377747 ]\n",
" [3.3773115]\n",
" [3.3770132]\n",
" [3.3760521]\n",
" [3.3759272]\n",
" [3.375547 ]\n",
" [3.3753684]\n",
" [3.3722134]\n",
" [3.371961 ]\n",
" [3.3715703]\n",
" [3.370985 ]\n",
" [3.370687 ]\n",
" [3.365469 ]\n",
" [3.3641512]\n",
" [3.3636737]\n",
" [3.3602688]\n",
" [3.359964 ]\n",
" [3.3592062]\n",
" [3.3585815]\n",
" [3.3575191]\n",
" [3.3569334]\n",
" [3.356351 ]\n",
" [3.3562894]\n",
" [3.3562038]\n",
" [3.3560872]\n",
" [3.3557541]\n",
" [3.3542943]\n",
" [3.352324 ]\n",
" [3.3522375]\n",
" [3.3522265]\n",
" [3.3522127]\n",
" [3.3520985]\n",
" [3.3486633]\n",
" [3.3486152]\n",
" [3.3473604]\n",
" [3.3443758]\n",
" [3.3417647]\n",
" [3.3397837]\n",
" [3.3397815]\n",
" [3.3393583]\n",
" [3.3385193]\n",
" [3.3385034]\n",
" [3.3368976]\n",
" [3.3320863]\n",
" [3.3319523]\n",
" [3.3312914]\n",
" [3.331225 ]\n",
" [3.331003 ]\n",
" [3.3309972]\n",
" [3.3309898]\n",
" [3.3304615]\n",
" [3.326533 ]\n",
" [3.3237963]\n",
" [3.3233335]\n",
" [3.323092 ]\n",
" [3.3167117]\n",
" [3.3152144]\n",
" [3.3151114]\n",
" [3.3142154]\n",
" [3.3121421]\n",
" [3.3066702]\n",
" [3.3062146]\n",
" [3.303853 ]\n",
" [3.3030903]\n",
" [3.3017504]\n",
" [3.3006084]\n",
" [3.2999685]\n",
" [3.2986944]\n",
" [3.2976775]\n",
" [3.2964976]\n",
" [3.295895 ]\n",
" [3.2957573]\n",
" [3.2951908]\n",
" [3.2923412]\n",
" [3.2917373]\n",
" [3.2915683]\n",
" [3.2858973]\n",
" [3.2837114]\n",
" [3.2818575]\n",
" [3.2815952]\n",
" [3.280331 ]\n",
" [3.2783985]\n",
" [3.2752998]\n",
" [3.2734401]\n",
" [3.2714875]\n",
" [3.2678227]\n",
" [3.2654 ]\n",
" [3.2629154]\n",
" [3.2623346]\n",
" [3.2615294]\n",
" [3.261502 ]\n",
" [3.2598805]\n",
" [3.2580748]\n",
" [3.2568758]\n",
" [3.2550483]\n",
" [3.2547085]\n",
" [3.2543325]\n",
" [3.2516768]\n",
" [3.250217 ]\n",
" [3.2497327]\n",
" [3.2496552]\n",
" [3.2496378]\n",
" [3.2490332]\n",
" [3.2448418]\n",
" [3.2406929]\n",
" [3.2395015]\n",
" [3.239248 ]\n",
" [3.2390718]\n",
" [3.2276201]\n",
" [3.2227578]\n",
" [3.2217066]\n",
" [3.2208114]\n",
" [3.2206976]\n",
" [3.2185478]\n",
" [3.2175274]\n",
" [3.21741 ]\n",
" [3.215905 ]\n",
" [3.2151272]\n",
" [3.2151146]\n",
" [3.2150717]\n",
" [3.214353 ]\n",
" [3.2124004]\n",
" [3.2123008]\n",
" [3.2111673]\n",
" [3.2101593]\n",
" [3.206691 ]\n",
" [3.2026687]\n",
" [3.2026548]\n",
" [3.2007709]\n",
" [3.1997688]\n",
" [3.199403 ]\n",
" [3.1985497]\n",
" [3.1978478]\n",
" [3.197446 ]\n",
" [3.1954942]\n",
" [3.1934826]\n",
" [3.1928911]\n",
" [3.1900964]\n",
" [3.1871915]\n",
" [3.1811774]\n",
" [3.178562 ]\n",
" [3.1755004]\n",
" [3.1735594]\n",
" [3.1728945]\n",
" [3.169547 ]\n",
" [3.167243 ]\n",
" [3.1669295]\n",
" [3.1660824]\n",
" [3.1659513]\n",
" [3.164906 ]\n",
" [3.164497 ]\n",
" [3.1614704]\n",
" [3.1595712]\n",
" [3.1581316]\n",
" [3.1552708]\n",
" [3.1544304]\n",
" [3.1529028]\n",
" [3.148159 ]\n",
" [3.1479537]\n",
" [3.1459684]\n",
" [3.1422086]\n",
" [3.137998 ]\n",
" [3.1369293]\n",
" [3.1362145]\n",
" [3.1360586]\n",
" [3.133715 ]\n",
" [3.1332748]\n",
" [3.1324682]\n",
" [3.1307843]\n",
" [3.130145 ]\n",
" [3.1300101]\n",
" [3.129267 ]\n",
" [3.1282744]\n",
" [3.1267204]\n",
" [3.1266656]\n",
" [3.1240056]\n",
" [3.1212623]\n",
" [3.1208658]\n",
" [3.1143823]\n",
" [3.1128042]\n",
" [3.1103473]\n",
" [3.1089778]\n",
" [3.1062849]\n",
" [3.1062045]\n",
" [3.105783 ]\n",
" [3.1048224]\n",
" [3.1043098]\n",
" [3.1037962]\n",
" [3.1034667]\n",
" [3.1029966]\n",
" [3.1027431]\n",
" [3.1010256]\n",
" [3.0994887]\n",
" [3.098975 ]\n",
" [3.0970602]\n",
" [3.0966039]\n",
" [3.0956476]\n",
" [3.0933673]\n",
" [3.0880148]\n",
" [3.080049 ]\n",
" [3.0779386]\n",
" [3.0756278]\n",
" [3.0707138]\n",
" [3.0670824]\n",
" [3.053687 ]\n",
" [3.052639 ]\n",
" [3.0513768]\n",
" [3.0449255]\n",
" [3.0413196]\n",
" [3.0402198]\n",
" [3.0367684]\n",
" [3.036332 ]\n",
" [3.0304322]\n",
" [3.028099 ]\n",
" [3.0262036]\n",
" [3.025929 ]\n",
" [3.0258193]\n",
" [3.021489 ]\n",
" [3.0150995]\n",
" [3.0120177]\n",
" [3.0061603]\n",
" [3.0023756]\n",
" [3.0020638]\n",
" [3.0005748]\n",
" [2.9982178]\n",
" [2.9981656]\n",
" [2.9969645]\n",
" [2.996848 ]\n",
" [2.9949226]\n",
" [2.9949222]\n",
" [2.9949045]\n",
" [2.994885 ]\n",
" [2.993776 ]\n",
" [2.9927406]\n",
" [2.9922206]\n",
" [2.9895105]\n",
" [2.9820132]\n",
" [2.9812298]\n",
" [2.9803925]\n",
" [2.9749563]\n",
" [2.9621365]\n",
" [2.9557693]\n",
" [2.9516983]\n",
" [2.9505053]\n",
" [2.950485 ]\n",
" [2.9504294]\n",
" [2.9472122]\n",
" [2.9456592]\n",
" [2.9450002]\n",
" [2.9445148]\n",
" [2.9368682]\n",
" [2.9348989]\n",
" [2.933586 ]\n",
" [2.9294858]\n",
" [2.9274023]\n",
" [2.9272645]\n",
" [2.9254858]\n",
" [2.9130788]\n",
" [2.909559 ]\n",
" [2.9089494]\n",
" [2.9065766]\n",
" [2.8973398]\n",
" [2.896379 ]\n",
" [2.895289 ]\n",
" [2.8943915]\n",
" [2.892326 ]\n",
" [2.8817203]\n",
" [2.869389 ]\n",
" [2.8548818]\n",
" [2.85002 ]\n",
" [2.845116 ]\n",
" [2.8421295]\n",
" [2.8390074]\n",
" [2.8387945]\n",
" [2.837181 ]\n",
" [2.8353884]\n",
" [2.8307767]\n",
" [2.8263333]\n",
" [2.8083706]\n",
" [2.7986035]\n",
" [2.785888 ]\n",
" [2.7766073]\n",
" [2.7721667]\n",
" [2.7577717]\n",
" [2.7550907]\n",
" [2.7548707]\n",
" [2.7507787]\n",
" [2.7336214]\n",
" [2.7258072]\n",
" [2.7205925]\n",
" [2.7205715]\n",
" [2.7166212]\n",
" [2.698691 ]\n",
" [2.6937573]\n",
" [2.6930623]\n",
" [2.6865757]\n",
" [2.6831324]\n",
" [2.6827147]\n",
" [2.680496 ]\n",
" [2.6753476]\n",
" [2.6639264]\n",
" [2.6622434]\n",
" [2.6619878]\n",
" [2.6511734]\n",
" [2.6507695]\n",
" [2.6484559]\n",
" [2.6405838]\n",
" [2.632195 ]\n",
" [2.616626 ]\n",
" [2.6143005]\n",
" [2.6071675]\n",
" [2.5984168]\n",
" [2.59621 ]\n",
" [2.5941417]\n",
" [2.5903933]\n",
" [2.5875957]\n",
" [2.581897 ]\n",
" [2.577505 ]\n",
" [2.5726297]\n",
" [2.563754 ]\n",
" [2.5532749]\n",
" [2.5420365]\n",
" [2.5416348]\n",
" [2.5295515]\n",
" [2.5279195]\n",
" [2.5182545]\n",
" [2.5135386]\n",
" [2.4929137]\n",
" [2.4861045]\n",
" [2.4829078]\n",
" [2.4667675]\n",
" [2.4558377]\n",
" [2.4555817]\n",
" [2.4535222]\n",
" [2.44599 ]\n",
" [2.422094 ]\n",
" [2.4159842]\n",
" [2.4146128]\n",
" [2.4058738]\n",
" [2.3996449]\n",
" [2.3801227]\n",
" [2.3694572]\n",
" [2.3646634]\n",
" [2.329875 ]\n",
" [2.3047626]\n",
" [2.2951803]\n",
" [2.2928665]\n",
" [2.291357 ]\n",
" [2.2833285]\n",
" [2.263295 ]\n",
" [2.258167 ]\n",
" [2.252688 ]\n",
" [2.2277029]\n",
" [2.220504 ]\n",
" [2.2079935]\n",
" [2.1857708]\n",
" [2.1842527]\n",
" [2.164769 ]\n",
" [2.1378274]\n",
" [2.1236062]\n",
" [2.0792027]\n",
" [2.0754855]\n",
" [2.0737507]\n",
" [2.0438292]\n",
" [2.0259166]\n",
" [2.0259087]\n",
" [1.9996274]\n",
" [1.9932872]\n",
" [1.9416091]\n",
" [1.9331213]\n",
" [1.9047036]\n",
" [1.811736 ]\n",
" [1.7325655]\n",
" [1.7224923]\n",
" [1.7072055]\n",
" [1.683081 ]\n",
" [1.6795535]\n",
" [1.6130847]\n",
" [1.5109389]] (847, 1)\n",
"\n",
"MOVIES/vectors dataset (item_vecs), sorted in DESCENDING way:\n",
" [[8.09060000e+04 2.01000000e+03 4.29166667e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [4.77600000e+03 2.00100000e+03 3.79411765e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 1.00000000e+00]\n",
" [1.50548000e+05 2.01600000e+03 3.85000000e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 1.00000000e+00]\n",
" ...\n",
" [6.39920000e+04 2.00800000e+03 2.40909091e+00 ... 1.00000000e+00\n",
" 0.00000000e+00 1.00000000e+00]\n",
" [4.38600000e+03 2.00100000e+03 2.81818182e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [7.87720000e+04 2.01000000e+03 2.45000000e+00 ... 1.00000000e+00\n",
" 0.00000000e+00 1.00000000e+00]] (847, 17)\n",
"\n",
"USER/vectors dataset (user_vecs), sorted in DESCENDING way:\n",
" [[ 2. 22. 4. ... 0. 3.88 3.89]\n",
" [ 2. 22. 4. ... 0. 3.88 3.89]\n",
" [ 2. 22. 4. ... 0. 3.88 3.89]\n",
" ...\n",
" [ 2. 22. 4. ... 0. 3.88 3.89]\n",
" [ 2. 22. 4. ... 0. 3.88 3.89]\n",
" [ 2. 22. 4. ... 0. 3.88 3.89]] (847, 17)\n",
"\n",
"RATINGS/labels y, given by USER 2, each of 847 MOVIES, sorted in DESCENDING way:\n",
" [5. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 3.5 0.\n",
" 0. 4.5 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 3.5 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 4. 0. 0. 0. 0. 0. 0. 0. 0. 0. 4. 0. 0. 0.\n",
" 0. 0. 0. 4.5 0. 4. 3.5 0. 0. 0. 0. 0. 0. 0. 5. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 3.5 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 4. 0. 4.5 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 4. 3. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 5. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 3. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 5. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 5. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 2.5 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 3. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 4. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.\n",
" 0. ] (847,)\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th style=\"text-align: right;\"> y_p</th><th style=\"text-align: right;\"> y</th><th style=\"text-align: right;\"> user</th><th>user genre ave </th><th style=\"text-align: right;\"> movie rating ave</th><th style=\"text-align: right;\"> movie id</th><th>title </th><th>genres </th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td style=\"text-align: right;\"> 4.5</td><td style=\"text-align: right;\">5.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0] </td><td style=\"text-align: right;\"> 4.3</td><td style=\"text-align: right;\"> 80906</td><td>Inside Job (2010) </td><td>Documentary </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 4.2</td><td style=\"text-align: right;\">3.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.0] </td><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\"> 99114</td><td>Django Unchained (2012) </td><td>Action|Drama </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 4.1</td><td style=\"text-align: right;\">4.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.0] </td><td style=\"text-align: right;\"> 4.1</td><td style=\"text-align: right;\"> 68157</td><td>Inglourious Basterds (2009) </td><td>Action|Drama </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 4.1</td><td style=\"text-align: right;\">3.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,3.9,3.9] </td><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\"> 115713</td><td>Ex Machina (2015) </td><td>Drama|Sci-Fi|Thriller </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\">4.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.1,4.0,4.0,3.9,3.9]</td><td style=\"text-align: right;\"> 4.1</td><td style=\"text-align: right;\"> 79132</td><td>Inception (2010) </td><td>Action|Crime|Drama|Mystery|Sci-Fi|Thriller</td></tr>\n",
"<tr><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\">4.0</td><td style=\"text-align: right;\"> 2</td><td>[4.1,4.0,3.9] </td><td style=\"text-align: right;\"> 4.3</td><td style=\"text-align: right;\"> 48516</td><td>Departed, The (2006) </td><td>Crime|Drama|Thriller </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\">4.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.1,4.0] </td><td style=\"text-align: right;\"> 4.2</td><td style=\"text-align: right;\"> 58559</td><td>Dark Knight, The (2008) </td><td>Action|Crime|Drama </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\">4.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.1,3.9] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 6874</td><td>Kill Bill: Vol. 1 (2003) </td><td>Action|Crime|Thriller </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\">3.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.1,4.0,3.9] </td><td style=\"text-align: right;\"> 3.8</td><td style=\"text-align: right;\"> 8798</td><td>Collateral (2004) </td><td>Action|Crime|Drama|Thriller </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\">5.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.1,4.0] </td><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\"> 106782</td><td>Wolf of Wall Street, The (2013) </td><td>Comedy|Crime|Drama </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\">3.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.2,4.1] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 91529</td><td>Dark Knight Rises, The (2012) </td><td>Action|Adventure|Crime </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\">4.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.0,3.9] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 74458</td><td>Shutter Island (2010) </td><td>Drama|Mystery|Thriller </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\">4.5</td><td style=\"text-align: right;\"> 2</td><td>[4.1,4.0,3.9] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 80489</td><td>Town, The (2010) </td><td>Crime|Drama|Thriller </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 3.8</td><td style=\"text-align: right;\">4.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 112552</td><td>Whiplash (2014) </td><td>Drama </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 3.8</td><td style=\"text-align: right;\">3.0</td><td style=\"text-align: right;\"> 2</td><td>[3.9] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 109487</td><td>Interstellar (2014) </td><td>Sci-Fi </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 3.8</td><td style=\"text-align: right;\">5.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0] </td><td style=\"text-align: right;\"> 3.7</td><td style=\"text-align: right;\"> 89774</td><td>Warrior (2011) </td><td>Drama </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 3.7</td><td style=\"text-align: right;\">3.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.0,3.0] </td><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\"> 71535</td><td>Zombieland (2009) </td><td>Action|Comedy|Horror </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 3.7</td><td style=\"text-align: right;\">5.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.2,3.9,3.9] </td><td style=\"text-align: right;\"> 3.8</td><td style=\"text-align: right;\"> 122882</td><td>Mad Max: Fury Road (2015) </td><td>Action|Adventure|Sci-Fi|Thriller </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 3.5</td><td style=\"text-align: right;\">5.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0] </td><td style=\"text-align: right;\"> 3.6</td><td style=\"text-align: right;\"> 60756</td><td>Step Brothers (2008) </td><td>Comedy </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 3.5</td><td style=\"text-align: right;\">2.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,3.9] </td><td style=\"text-align: right;\"> 3.5</td><td style=\"text-align: right;\"> 91658</td><td>Girl with the Dragon Tattoo, The (2011) </td><td>Drama|Thriller </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 3.1</td><td style=\"text-align: right;\">3.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.0] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 77455</td><td>Exit Through the Gift Shop (2010) </td><td>Comedy|Documentary </td></tr>\n",
"<tr><td style=\"text-align: right;\"> 3.1</td><td style=\"text-align: right;\">4.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.0] </td><td style=\"text-align: right;\"> 3.2</td><td style=\"text-align: right;\"> 46970</td><td>Talladega Nights: The Ballad of Ricky Bobby (2006)</td><td>Action|Comedy </td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
"'<table>\\n<thead>\\n<tr><th style=\"text-align: right;\"> y_p</th><th style=\"text-align: right;\"> y</th><th style=\"text-align: right;\"> user</th><th>user genre ave </th><th style=\"text-align: right;\"> movie rating ave</th><th style=\"text-align: right;\"> movie id</th><th>title </th><th>genres </th></tr>\\n</thead>\\n<tbody>\\n<tr><td style=\"text-align: right;\"> 4.5</td><td style=\"text-align: right;\">5.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0] </td><td style=\"text-align: right;\"> 4.3</td><td style=\"text-align: right;\"> 80906</td><td>Inside Job (2010) </td><td>Documentary </td></tr>\\n<tr><td style=\"text-align: right;\"> 4.2</td><td style=\"text-align: right;\">3.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.0] </td><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\"> 99114</td><td>Django Unchained (2012) </td><td>Action|Drama </td></tr>\\n<tr><td style=\"text-align: right;\"> 4.1</td><td style=\"text-align: right;\">4.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.0] </td><td style=\"text-align: right;\"> 4.1</td><td style=\"text-align: right;\"> 68157</td><td>Inglourious Basterds (2009) </td><td>Action|Drama </td></tr>\\n<tr><td style=\"text-align: right;\"> 4.1</td><td style=\"text-align: right;\">3.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,3.9,3.9] </td><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\"> 115713</td><td>Ex Machina (2015) </td><td>Drama|Sci-Fi|Thriller </td></tr>\\n<tr><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\">4.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.1,4.0,4.0,3.9,3.9]</td><td style=\"text-align: right;\"> 4.1</td><td style=\"text-align: right;\"> 79132</td><td>Inception (2010) </td><td>Action|Crime|Drama|Mystery|Sci-Fi|Thriller</td></tr>\\n<tr><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\">4.0</td><td style=\"text-align: right;\"> 2</td><td>[4.1,4.0,3.9] 
</td><td style=\"text-align: right;\"> 4.3</td><td style=\"text-align: right;\"> 48516</td><td>Departed, The (2006) </td><td>Crime|Drama|Thriller </td></tr>\\n<tr><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\">4.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.1,4.0] </td><td style=\"text-align: right;\"> 4.2</td><td style=\"text-align: right;\"> 58559</td><td>Dark Knight, The (2008) </td><td>Action|Crime|Drama </td></tr>\\n<tr><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\">4.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.1,3.9] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 6874</td><td>Kill Bill: Vol. 1 (2003) </td><td>Action|Crime|Thriller </td></tr>\\n<tr><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\">3.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.1,4.0,3.9] </td><td style=\"text-align: right;\"> 3.8</td><td style=\"text-align: right;\"> 8798</td><td>Collateral (2004) </td><td>Action|Crime|Drama|Thriller </td></tr>\\n<tr><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\">5.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.1,4.0] </td><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\"> 106782</td><td>Wolf of Wall Street, The (2013) </td><td>Comedy|Crime|Drama </td></tr>\\n<tr><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\">3.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.2,4.1] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 91529</td><td>Dark Knight Rises, The (2012) </td><td>Action|Adventure|Crime </td></tr>\\n<tr><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\">4.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.0,3.9] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 74458</td><td>Shutter Island (2010) 
</td><td>Drama|Mystery|Thriller </td></tr>\\n<tr><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\">4.5</td><td style=\"text-align: right;\"> 2</td><td>[4.1,4.0,3.9] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 80489</td><td>Town, The (2010) </td><td>Crime|Drama|Thriller </td></tr>\\n<tr><td style=\"text-align: right;\"> 3.8</td><td style=\"text-align: right;\">4.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 112552</td><td>Whiplash (2014) </td><td>Drama </td></tr>\\n<tr><td style=\"text-align: right;\"> 3.8</td><td style=\"text-align: right;\">3.0</td><td style=\"text-align: right;\"> 2</td><td>[3.9] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 109487</td><td>Interstellar (2014) </td><td>Sci-Fi </td></tr>\\n<tr><td style=\"text-align: right;\"> 3.8</td><td style=\"text-align: right;\">5.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0] </td><td style=\"text-align: right;\"> 3.7</td><td style=\"text-align: right;\"> 89774</td><td>Warrior (2011) </td><td>Drama </td></tr>\\n<tr><td style=\"text-align: right;\"> 3.7</td><td style=\"text-align: right;\">3.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.0,3.0] </td><td style=\"text-align: right;\"> 3.9</td><td style=\"text-align: right;\"> 71535</td><td>Zombieland (2009) </td><td>Action|Comedy|Horror </td></tr>\\n<tr><td style=\"text-align: right;\"> 3.7</td><td style=\"text-align: right;\">5.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.2,3.9,3.9] </td><td style=\"text-align: right;\"> 3.8</td><td style=\"text-align: right;\"> 122882</td><td>Mad Max: Fury Road (2015) </td><td>Action|Adventure|Sci-Fi|Thriller </td></tr>\\n<tr><td style=\"text-align: right;\"> 3.5</td><td style=\"text-align: right;\">5.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0] </td><td style=\"text-align: right;\"> 3.6</td><td 
style=\"text-align: right;\"> 60756</td><td>Step Brothers (2008) </td><td>Comedy </td></tr>\\n<tr><td style=\"text-align: right;\"> 3.5</td><td style=\"text-align: right;\">2.5</td><td style=\"text-align: right;\"> 2</td><td>[4.0,3.9] </td><td style=\"text-align: right;\"> 3.5</td><td style=\"text-align: right;\"> 91658</td><td>Girl with the Dragon Tattoo, The (2011) </td><td>Drama|Thriller </td></tr>\\n<tr><td style=\"text-align: right;\"> 3.1</td><td style=\"text-align: right;\">3.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.0] </td><td style=\"text-align: right;\"> 4.0</td><td style=\"text-align: right;\"> 77455</td><td>Exit Through the Gift Shop (2010) </td><td>Comedy|Documentary </td></tr>\\n<tr><td style=\"text-align: right;\"> 3.1</td><td style=\"text-align: right;\">4.0</td><td style=\"text-align: right;\"> 2</td><td>[4.0,4.0] </td><td style=\"text-align: right;\"> 3.2</td><td style=\"text-align: right;\"> 46970</td><td>Talladega Nights: The Ballad of Ricky Bobby (2006)</td><td>Action|Comedy </td></tr>\\n</tbody>\\n</table>'"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# USER ID = 2\n",
"uid = 2 \n",
"\n",
"# Form a set of USER 2 vectors: the SAME USER 2 'user_vec' (1, 17)\n",
"# repeated n = 847 times, one copy per MOVIE vector in 'item_vecs'.\n",
"# Internally: user_vecs = np.repeat(user_vec, len(item_vecs), axis=0) ->\n",
"# len(item_vecs) = 847 MOVIE vectors, each with 17 MOVIE features;\n",
"# axis = 0 repeats the 'user_vec' row n = 847 times (down the rows).\n",
"user_vecs, y_vecs = get_user_vecs(uid, user_train_unscaled, item_vecs, user_to_genre)\n",
"\n",
"# [[ 2. 22. 4. ... 0. 3.88 3.89] ->[user_vec]1\n",
"# [ 2. 22. 4. ... 0. 3.88 3.89] ->[user_vec]2\n",
"# [ 2. 22. 4. ... 0. 3.88 3.89] ->[user_vec]3\n",
"# ...\n",
"# [ 2. 22. 4. ... 0. 3.88 3.89] ->[user_vec]845\n",
"# [ 2. 22. 4. ... 0. 3.88 3.89] ->[user_vec]846\n",
"# [ 2. 22. 4. ... 0. 3.88 3.89]]->[user_vec]847 \n",
"# (847, 17)\n",
"print('EXISTING USER 2 [vector] is repeated * 847 times -> (2D array)\\n', user_vecs, user_vecs.shape)\n",
"\n",
"# [0.0,..., 4.0,..., 3.5,..., 4.0,...,4.5,...,3.5,...,3.0,...,\n",
"# 2.5,...,...,3.5,...,5.0,...,4.0,...,3.5,...,5.0,...,0.0] (847,) \n",
"print('EXISTING USER 2, 847 Ratings/labels (One per MOVIE)\\n',y_vecs, y_vecs.shape)\n",
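"\n",
"# Illustration only (not part of the assignment): np.repeat tiles a single\n",
"# (1, n) row along axis 0, which is how 'user_vecs' repeats the same user\n",
"# row once per movie, e.g.:\n",
"#   np.repeat(np.array([[1, 2]]), 3, axis=0)  # -> [[1 2] [1 2] [1 2]], shape (3, 2)\n",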
"\n",
"# [[4.05400000e+03 2.00100000e+03 2.84375000e+00 ... 1.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00] ->[movie 1_vec]\n",
"# [4.06900000e+03 2.00100000e+03 2.90909091e+00 ... 1.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00] ->[movie 2_vec]\n",
"# [4.14800000e+03 2.00100000e+03 2.93589744e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 1.00000000e+00] ->[movie 3_vec]\n",
"# ...\n",
"# [1.77765000e+05 2.01700000e+03 3.53846154e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00] ->[movie 845_vec]\n",
"# [1.79819000e+05 2.01700000e+03 3.12500000e+00 ... 0.00000000e+00\n",
"# 1.00000000e+00 0.00000000e+00] ->[movie 846_vec]\n",
"# [1.87593000e+05 2.01800000e+03 3.87500000e+00 ... 0.00000000e+00\n",
"# 1.00000000e+00 0.00000000e+00]]->[movie 847_vec] \n",
"# (847, 17)\n",
"print('\\n847 MOVIES data set for TRAIN or TEST (2D array)\\n', item_vecs, item_vecs.shape)\n",
"\n",
"# SCALE the 'user_vecs' 2D array (847, 17) with the 'scalerUser' object:\n",
"# standardize each feature by centering (mu = 0)\n",
"# and scaling to unit variance (sigma = 1).\n",
"suser_vecs = scalerUser.transform(user_vecs)\n",
"\n",
"# [[-1.79251238 -1.09062531 0.86345827 ... -4.86517161 0.62123083\n",
"# 0.60679525] ->[scaled_user_vec]1\n",
"# [-1.79251238 -1.09062531 0.86345827 ... -4.86517161 0.62123083\n",
"# 0.60679525] ->[scaled_user_vec]2\n",
"# [-1.79251238 -1.09062531 0.86345827 ... -4.86517161 0.62123083\n",
"# 0.60679525] ->[scaled_user_vec]3\n",
"# ...\n",
"# [-1.79251238 -1.09062531 0.86345827 ... -4.86517161 0.62123083\n",
"# 0.60679525] ->[scaled_user_vec]845\n",
"# [-1.79251238 -1.09062531 0.86345827 ... -4.86517161 0.62123083\n",
"# 0.60679525] ->[scaled_user_vec]846\n",
"# [-1.79251238 -1.09062531 0.86345827 ... -4.86517161 0.62123083\n",
"# 0.60679525]] ->[scaled_user_vec]847 \n",
"# (847, 17)\n",
"print('\\nSCALED user_vecs 2D array \\n', suser_vecs, suser_vecs.shape)\n",
"\n",
"# SCALE the 'item_vecs' 2D array (847, 17) with the 'scalerItem' object:\n",
"# standardize each feature by centering (mu = 0)\n",
"# and scaling to unit variance (sigma = 1).\n",
"sitem_vecs = scalerItem.transform(item_vecs)\n",
"\n",
"# [[-0.9488947 -1.19927702 -1.98317861 ... 2.50507064 -0.50822265\n",
"# -0.65976722] ->[scaled_item_vec]1\n",
"# [-0.94849765 -1.19927702 -1.81511436 ... 2.50507064 -0.50822265\n",
"# -0.65976722] ->[scaled_item_vec]2\n",
"# [-0.94640649 -1.19927702 -1.74616492 ... -0.39919034 -0.50822265\n",
"# 1.51568608] ->[scaled_item_vec]3\n",
"# ...\n",
"# [ 3.64928572 2.816088 -0.19630151 ... -0.39919034 -0.50822265\n",
"# -0.65976722] ->[scaled_item_vec]845\n",
"# [ 3.7036557 2.816088 -1.25977162 ... -0.39919034 1.96764154\n",
"# -0.65976722] ->[scaled_item_vec]846\n",
"# [ 3.90943572 3.06704831 0.6693137 ... -0.39919034 1.96764154\n",
"# -0.65976722]]->[scaled_item_vec]847 \n",
"# (847, 17)\n",
"print('\\nSCALED item_vecs 2D array \\n', sitem_vecs, sitem_vecs.shape)\n",
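"\n",
"# Sketch of what the fitted scalers do (illustration only): a standard\n",
"# scaler maps x -> (x - mu) / sigma per feature, and inverse_transform\n",
"# maps z -> mu + sigma * z, e.g.:\n",
"#   x = np.array([[1.0], [3.0]])      # mu = 2.0, sigma = 1.0\n",
"#   z = (x - x.mean(0)) / x.std(0)    # -> [[-1.], [ 1.]]\n",
"#   x.mean(0) + x.std(0) * z          # -> recovers [[1.], [3.]]\n",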
"\n",
"# Make a SCALED y^ prediction.\n",
"# The overall model takes a list of two inputs:\n",
"# inputs = [ x_u^(j) (user features from column u_s = 3 on, dropping cols 0-2),\n",
"#            x_m^(i) (item features from column i_s = 1 on, dropping col 0) ]\n",
"y_p = model.predict([suser_vecs[:, u_s:], sitem_vecs[:, i_s:]])\n",
"\n",
"# [[0.28323317]\n",
"# [0.22770423]\n",
"# [0.36427093]] ... (847, 1)\n",
"print('\\nSCALED y^ predictions\\n',y_p[:3],'...', y_p.shape)\n",
"\n",
"# UNSCALE y^ prediction, to get back predicted RATINGS between [0-5]\n",
"# with 'scalerTarget' (obj) and '.inverse_transform(scaled prediction)' \n",
"y_pu = scalerTarget.inverse_transform(y_p)\n",
"\n",
"# [[3.3872745]\n",
"# [3.2623346]\n",
"# [3.5696096]] ... (847, 1) \n",
"# min: 1.5109389 \n",
"# max: 4.520821\n",
"print('\\nUNSCALED y^ predictions\\n',y_pu[:3],'...', y_pu.shape)\n",
"print('\\nmin:',np.min(y_pu),'\\nmax:',np.max(y_pu))\n",
"\n",
"# np.argsort returns the INDICES that would sort the 'y_pu' 2D array\n",
"# along axis 0 (down each column) in ASCENDING order.\n",
"# Negating the input, '-y_pu', makes those indices sort 'y_pu' in\n",
"# DESCENDING order, so the HIGHEST prediction / LARGEST rating comes 1st.\n",
"\n",
"# .reshape(-1) flattens the (847, 1) index array into a 1D (847,) array.\n",
"\n",
"# .tolist() -> converts the 'numpy' array into a plain Python list\n",
"sorted_index = np.argsort(-y_pu,axis=0).reshape(-1).tolist()\n",
"\n",
"# [630, 43, 830, ... , 549, 28, 616] (847,) \n",
"print('\\nIndices of UNSCALED predictions y^, sorted in DESCENDING order:\\n',sorted_index)\n",
"\n",
"# Select the predictions/RATINGS associated with the sorted indices,\n",
"# in DESCENDING order -> unscaled_predictions[sorted_indices]\n",
"sorted_ypu = y_pu[sorted_index]\n",
"\n",
"# [[4.520821 ]\n",
"# [4.3526335]\n",
"# [4.332033 ]\n",
"# ...\n",
"# [1.6795535]\n",
"# [1.6130847]\n",
"# [1.5109389]] (847, 1)\n",
"print('\\nUNSCALED predictions y^, sorted in DESCENDING order:\\n',sorted_ypu, sorted_ypu.shape )\n",
"\n",
"# Select MOVIE rows/vectors associated with the sorted indices,\n",
"# in DESCENDING order -> item_vecs[sorted_indices]\n",
"# Use unscaled vectors to get 'movie_id' (e.g. 80906, 47760, ...) for display\n",
"sorted_items = item_vecs[sorted_index]\n",
"\n",
"# [[8.09060000e+04 2.01000000e+03 4.29166667e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00] -> [non-scaled_item_vec]1\n",
"# [4.77600000e+03 2.00100000e+03 3.79411765e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 1.00000000e+00] -> [non-scaled_item_vec]2\n",
"# [1.50548000e+05 2.01600000e+03 3.85000000e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 1.00000000e+00] -> [non-scaled_item_vec]3\n",
"# ...\n",
"# [6.39920000e+04 2.00800000e+03 2.40909091e+00 ... 1.00000000e+00\n",
"# 0.00000000e+00 1.00000000e+00] -> [non-scaled_item_vec]845\n",
"# [4.38600000e+03 2.00100000e+03 2.81818182e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00] -> [non-scaled_item_vec]846\n",
"# [7.87720000e+04 2.01000000e+03 2.45000000e+00 ... 1.00000000e+00\n",
"# 0.00000000e+00 1.00000000e+00]] -> [non-scaled_item_vec]847 \n",
"# (847, 17)\n",
"print('\\nMOVIES/vectors dataset (item_vecs), sorted in DESCENDING order:\\n',sorted_items,sorted_items.shape )\n",
"\n",
"# Select USER rows/vectors associated with the sorted indices,\n",
"# in DESCENDING order -> user_vecs[sorted_indices]\n",
"sorted_user = user_vecs[sorted_index]\n",
"\n",
"# [[ 2. 22. 4. ... 0. 3.88 3.89] -> [non-scaled_user_vec]1st higher\n",
"# [ 2. 22. 4. ... 0. 3.88 3.89] -> [non-scaled_user_vec]2nd higher\n",
"# [ 2. 22. 4. ... 0. 3.88 3.89] -> [non-scaled_user_vec]3rd higher\n",
"# ...\n",
"# [ 2. 22. 4. ... 0. 3.88 3.89] -> [non-scaled_user_vec]845th higher\n",
"# [ 2. 22. 4. ... 0. 3.88 3.89] -> [non-scaled_user_vec]846th higher\n",
"# [ 2. 22. 4. ... 0. 3.88 3.89]] -> [non-scaled_user_vec]847th higher\n",
"# (847, 17)\n",
"print('\\nUSER/vectors dataset (user_vecs), sorted in DESCENDING order:\\n',sorted_user,sorted_user.shape )\n",
"\n",
"# Select the RATINGS/labels given by USER 2 to each of the 847 MOVIES,\n",
"# associated with the sorted indices, in DESCENDING order\n",
"# -> ratings[sorted_indices]\n",
"sorted_y = y_vecs[sorted_index]\n",
"\n",
"# [5.0, 0.0,..., 3.5,..., 4.5,..., 2.5,..., 0.0] (847,) \n",
"print('\\nRATINGS/labels y, given by USER 2 to each of 847 MOVIES, sorted in DESCENDING order:\\n',sorted_y, sorted_y.shape )\n",
"\n",
"#def print_existing_user(y_p, y, user, items, ivs, uvs, movie_dict, maxcount=10):\n",
"# \"\"\" print results of prediction for a user who was in the database.\n",
"# Inputs are expected to be in sorted order, unscaled.\n",
"# \"\"\"\n",
"# count = 0\n",
"# disp = [[\"y_p\", \"y\", \"user\", \"user genre ave\", \"movie rating ave\", \"movie id\", \"title\", \"genres\"]]\n",
"\n",
"# for i in range(y.shape[0]):\n",
"\n",
"# # Display just RATED MOVIES -> y!=0\n",
"# # zero means not rated\n",
"# if y[i, 0] != 0:\n",
"\n",
"# # If count = maxcount -> True\n",
"# if count == maxcount:\n",
"\n",
"# # stop\n",
"# break\n",
"\n",
"# # count = count + 1\n",
"# # (init 0)\n",
"# count += 1\n",
"\n",
"# # Pick feature / col0 -> 'movie_id' per row/vector, at 'sorted_items'\n",
"# # Display as integer 8.09060000e+04 -> 80906\n",
"# movie_id = items[i, 0].astype(int)\n",
"\n",
"# # Column offsets of the genres this movie belongs to\n",
"# # (one-hot genre cols == 1, starting at item col 'ivs')\n",
"# offsets = np.nonzero(items[i, ivs:] == 1)[0]\n",
"\n",
"# # User's per-genre rating averages for those genres\n",
"# genre_ratings = user[i, uvs + offsets]\n",
"\n",
"# # Feature / col 1 = 'sorted y^ unscaled prediction'\n",
"# x1 = y_p[i, 0]\n",
"\n",
"# # Feature / col 2 = 'sorted y unscaled label/rating'\n",
"# x2 = y[i, 0]\n",
"\n",
"# # Feature / col 3 = 'sorted user id', displayed as integer\n",
"# x3 = user[i, 0].astype(int)\n",
"\n",
"# # Feature / col 4 = 'movie genre avg' \n",
"# # -> [genre 1 avg, genre 2 avg,... ] -> documentary|action\n",
"# x4 =np.array2string(genre_ratings,\n",
"# formatter={'float_kind':lambda x: \"%.1f\" % x},\n",
"# separator=',', suppress_small=True)\n",
"\n",
"# # Feature / col 5 = 'movie average rating'\n",
"# x5 = items[i, 2].astype(float)\n",
"\n",
"# # Feature / col 6 = 'movie_id'\n",
"# x6 = movie_id\n",
"\n",
"# # Feature / col 7 = 'movie title'\n",
"# x7 = movie_dict[movie_id]['title']\n",
"\n",
"# # Feature / col 8 = 'movie genres'\n",
"# x8 = movie_dict[movie_id]['genres']\n",
"# disp.append([x1, x2, x3, x4, x5, x6, x7, x8])\n",
"# col values format -> col1 col2 col3 col4 col5 \n",
"# table = tabulate.tabulate(disp, tablefmt='html', headers=\"firstrow\", floatfmt=[\".1f\", \".1f\", \".0f\", \".2f\", \".1f\"])\n",
"# return table\n",
" \n",
"#Print sorted predictions for movies rated by the user\n",
"print_existing_user(sorted_ypu, sorted_y.reshape(-1,1), sorted_user, sorted_items, ivs, uvs, movie_dict, maxcount = 50)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The model prediction is generally within 1 of the actual rating, though it is not a very accurate predictor of how a user rates specific movies. This is especially true when the user's rating is significantly different from the user's genre average. You can vary the user id above to try different users. Note that not all user ids were used in the training set."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"5.3\"></a>\n",
"### 5.3 - Finding Similar Items\n",
"The neural network above produces two feature vectors, a user feature vector $v_u$, and a movie feature vector, $v_m$. These are 32 entry vectors whose values are difficult to interpret. However, similar items will have similar vectors. This information can be used to make recommendations. For example, if a user has rated \"Toy Story 3\" highly, one could recommend similar movies by selecting movies with similar movie feature vectors.\n",
"\n",
"A similarity measure is the squared distance between the two vectors $ \\mathbf{v_m^{(k)}}$ and $\\mathbf{v_m^{(i)}}$ :\n",
"$$\\left\\Vert \\mathbf{v_m^{(k)}} - \\mathbf{v_m^{(i)}} \\right\\Vert^2 = \\sum_{l=1}^{n}(v_{m_l}^{(k)} - v_{m_l}^{(i)})^2\\tag{1}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"ex02\"></a>\n",
"### Exercise 2\n",
"\n",
"Write a function to compute the squared distance."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"deletable": false
},
"outputs": [],
"source": [
"# EXERCISE 2\n",
"# GRADED_FUNCTION: sq_dist\n",
"# UNQ_C2\n",
"def sq_dist(a,b):\n",
" \"\"\"\n",
" Returns the squared distance between two vectors\n",
" Args:\n",
" a (ndarray (n,)): vector with n features\n",
" b (ndarray (n,)): vector with n features\n",
" Returns:\n",
" d (float) : distance\n",
" \"\"\"\n",
" ### START CODE HERE ### \n",
" \n",
" d = np.sum(np.square(a - b)) # ||v_m^(k) - v_m(i)||^2 = SUM 1->n (v_m^(k) - v_m(i))^2\n",
" \n",
" ### END CODE HERE ### \n",
" return d"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"deletable": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"squared distance between a1 and b1: 0.000\n",
"squared distance between a2 and b2: 0.030\n",
"squared distance between a3 and b3: 2.000\n"
]
}
],
"source": [
"# V_m^(k) = a1 = [1.0, 2.0, 3.0] V_m^(i) = b1 = [1.0, 2.0, 3.0] \n",
"a1 = np.array([1.0, 2.0, 3.0]); b1 = np.array([1.0, 2.0, 3.0])\n",
"\n",
"# V_m^(k) = a2 = [1.1, 2.1, 3.1] V_m^(i) = b2 = [1.0, 2.0, 3.0] \n",
"a2 = np.array([1.1, 2.1, 3.1]); b2 = np.array([1.0, 2.0, 3.0])\n",
"\n",
"# V_m^(k) = a3 = [0, 1, 0] V_m^(i) = b3 = [1, 0, 0]\n",
"a3 = np.array([0, 1, 0]); b3 = np.array([1, 0, 0])\n",
"\n",
"# Squared Distance / Error between v_m^(k) = a1 and v_m^(i) = b1 \n",
"print(f\"squared distance between a1 and b1: {sq_dist(a1, b1):0.3f}\")\n",
"\n",
"# Squared Distance / Error between v_m^(k) = a2 and v_m^(i) = b2\n",
"print(f\"squared distance between a2 and b2: {sq_dist(a2, b2):0.3f}\")\n",
"\n",
"# Squared Distance / Error between v_m^(k) = a3 and v_m^(i) = b3\n",
"print(f\"squared distance between a3 and b3: {sq_dist(a3, b3):0.3f}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Expected Output**:\n",
"\n",
"squared distance between a1 and b1: 0.000 \n",
"squared distance between a2 and b2: 0.030 \n",
"squared distance between a3 and b3: 2.000"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"deletable": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[92mAll tests passed!\n"
]
}
],
"source": [
"# TEST Squared Distance/Error function\n",
"# ||v_m^(k) - v_m^(i)||^2 = Σ_{l=1..n} (v_m_l^(k) - v_m_l^(i))^2\n",
"test_sq_dist(sq_dist)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary><font size=\"3\" color=\"darkgreen\"><b>Click for hints</b></font></summary>\n",
" \n",
" While a summation is often an indication that a for loop should be used, here the subtraction can be done element-wise in one statement. Further, you can utilize np.square to square, element-wise, the result of the subtraction. np.sum can then be used to sum the squared elements.\n",
" \n",
"</details>\n",
"\n",
" \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A matrix of distances between movies can be computed once when the model is trained and then reused for new recommendations without retraining. The first step, once a model is trained, is to obtain the movie feature vector, $v_m$, for each of the movies. To do this, we will use the trained `item_NN` and build a small model to allow us to run the movie vectors through it to generate $v_m$."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"deletable": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: \"model_1\"\n",
"__________________________________________________________________________________________________\n",
"Layer (type) Output Shape Param # Connected to \n",
"==================================================================================================\n",
"input_3 (InputLayer) [(None, 16)] 0 \n",
"__________________________________________________________________________________________________\n",
"sequential_1 (Sequential) (None, 32) 41376 input_3[0][0] \n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize_2/Squa [(None, 32)] 0 sequential_1[1][0] \n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize_2/Sum [(None, 1)] 0 tf_op_layer_l2_normalize_2/Square\n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize_2/Maxi [(None, 1)] 0 tf_op_layer_l2_normalize_2/Sum[0]\n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize_2/Rsqr [(None, 1)] 0 tf_op_layer_l2_normalize_2/Maximu\n",
"__________________________________________________________________________________________________\n",
"tf_op_layer_l2_normalize_2 (Ten [(None, 32)] 0 sequential_1[1][0] \n",
" tf_op_layer_l2_normalize_2/Rsqrt[\n",
"==================================================================================================\n",
"Total params: 41,376\n",
"Trainable params: 41,376\n",
"Non-trainable params: 0\n",
"__________________________________________________________________________________________________\n"
]
}
],
"source": [
"# Extract ALL the MOVIE i input features -> x_m^(i), creating the MOVIE input (obj) of the MOVIE network, \n",
"# with shape = (num_item_features) = (16) cols/feat -> \n",
"# x_m^(i)=[x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16] features per row/vector of the batch\n",
"# tf.keras -> passing shape=(cols,) makes Keras PREPEND the batch\n",
"# dimension automatically, giving (None, cols); 'None' is later\n",
"# replaced by the batch size, here 847 vectors / MOVIES.\n",
"# Input layer\n",
"input_item_m = tf.keras.layers.Input(shape=(num_item_features,)) \n",
"\n",
"# FEED x_m^(i) to the MOVIE Network 'item_NN', computing the MOVIE vector V_m^(i) with 32 elements.\n",
"# Reuse the trained NN model -> 'item_NN' in our SMALL (movies-only) model\n",
"vm_m = item_NN(input_item_m)\n",
"\n",
"# Each of the 32 elements of V_m^(i) is divided by its magnitude or norm -> || V_m^(i) ||, so\n",
"# we end up with a UNIT VECTOR V_m^(i) WITH LENGTH = ‘1’. \n",
"# This code NORMALIZES the LENGTH of vector V_m^(i) to ‘1’,\n",
"# which turns out to make this algorithm WORK A BIT BETTER.\n",
"# Incorporate normalization as was done in the original model:\n",
"# normalize each row of the vm tensor (axis=1) so that the\n",
"# L2 norm along each row is equal to 1.\n",
"vm_m = tf.linalg.l2_normalize(vm_m, axis=1)\n",
"\n",
"# Tell Keras the input = input_item_m, x_m^(i), and the output = V_m^(i)\n",
"# of this SMALL (movies-only) model: the ITEM / MOVIE i features\n",
"# x_m^(i) with shape (None, 16 features) in, and the\n",
"# output = V_m^(i) with shape (None, 32 elems) out.\n",
"model_m = tf.keras.Model(input_item_m, vm_m)\n",
"\n",
"# Provides a concise and useful summary of the model, including:\n",
"# 'Name' and 'type' of each layer in the model, the 'output dimensions' of each layer, \n",
"# and the 'total number of trainable parameters'.\n",
"model_m.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you have the SMALL (movies-only) model, you can create a set of movie feature vectors by using the model to predict on a set of item/movie vectors. `item_vecs` is the set of all movie vectors. It must be scaled before use with the trained model. The result of the prediction is a 32-entry feature vector for each movie."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"deletable": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"V_m^(i) 1D vector size: (None, 32)\n",
"\n",
"Non-scaled item_vecs 2D array \n",
" [[4.05400000e+03 2.00100000e+03 2.84375000e+00 ... 1.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [4.06900000e+03 2.00100000e+03 2.90909091e+00 ... 1.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [4.14800000e+03 2.00100000e+03 2.93589744e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 1.00000000e+00]\n",
" ...\n",
" [1.77765000e+05 2.01700000e+03 3.53846154e+00 ... 0.00000000e+00\n",
" 0.00000000e+00 0.00000000e+00]\n",
" [1.79819000e+05 2.01700000e+03 3.12500000e+00 ... 0.00000000e+00\n",
" 1.00000000e+00 0.00000000e+00]\n",
" [1.87593000e+05 2.01800000e+03 3.87500000e+00 ... 0.00000000e+00\n",
" 1.00000000e+00 0.00000000e+00]] (847, 17)\n",
"\n",
"SCALED item_vecs 2D array [-1 -> 1] \n",
" [[-0.9488947 -1.19927702 -1.98317861 ... 2.50507064 -0.50822265\n",
" -0.65976722]\n",
" [-0.94849765 -1.19927702 -1.81511436 ... 2.50507064 -0.50822265\n",
" -0.65976722]\n",
" [-0.94640649 -1.19927702 -1.74616492 ... -0.39919034 -0.50822265\n",
" 1.51568608]\n",
" ...\n",
" [ 3.64928572 2.816088 -0.19630151 ... -0.39919034 -0.50822265\n",
" -0.65976722]\n",
" [ 3.7036557 2.816088 -1.25977162 ... -0.39919034 1.96764154\n",
" -0.65976722]\n",
" [ 3.90943572 3.06704831 0.6693137 ... -0.39919034 1.96764154\n",
" -0.65976722]] (847, 17)\n",
"\n",
"Size of all predicted MOVIE feature vectors Vm 2D SCALED array\n",
"[[-0.01857825 0.02509343 0.18280987 ... -0.00534167 0.26681036\n",
" -0.1346159 ]\n",
" [-0.03679933 0.10367652 0.14471677 ... -0.00934093 0.15746793\n",
" -0.09405302]\n",
" [ 0.01894972 -0.15505672 0.00943602 ... 0.07133339 0.30358016\n",
" -0.0658022 ]\n",
" ...\n",
" [-0.00257265 -0.17672591 0.06223017 ... 0.02854471 -0.09042729\n",
" 0.06269106]\n",
" [ 0.07080412 0.125177 0.17506361 ... -0.10165994 -0.00500654\n",
" 0.0288814 ]\n",
" [-0.1291337 -0.22834606 -0.08211754 ... 0.17481491 -0.25011542\n",
" 0.03010405]] (847, 32) MAX 0.659007 MIN -0.5849552\n",
"\n",
"Size of all predicted movie feature vectors Vm 2D non-scaled array\n",
"[[2.7081988 2.8064601 3.161322 ... 2.7379813 3.3503234 2.4471142]\n",
" [2.6672015 2.9832723 3.0756125 ... 2.728983 3.104303 2.5383806]\n",
" [2.7926369 2.4011223 2.771231 ... 2.9105 3.4330554 2.601945 ]\n",
" ...\n",
" [2.7442114 2.3523667 2.890018 ... 2.8142254 2.5465386 2.8910549]\n",
" [2.9093091 3.0316482 3.143893 ... 2.5212653 2.7387352 2.8149831]\n",
" [2.459449 2.2362213 2.5652354 ... 3.1433337 2.1872404 2.817734 ]] (847, 32) MAX 4.2327657 MIN 1.4338508\n"
]
}
],
"source": [
"# The V_m^(i) feature vector for each MOVIE has shape (None, 32 elems)\n",
"print('V_m^(i) 1D vector size:',vm_m.shape)\n",
"\n",
"# [[4.05400000e+03 2.00100000e+03 2.84375000e+00 ... 1.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00]\n",
"# [4.06900000e+03 2.00100000e+03 2.90909091e+00 ... 1.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00]\n",
"# [4.14800000e+03 2.00100000e+03 2.93589744e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 1.00000000e+00]\n",
"# ...\n",
"# [1.77765000e+05 2.01700000e+03 3.53846154e+00 ... 0.00000000e+00\n",
"# 0.00000000e+00 0.00000000e+00]\n",
"# [1.79819000e+05 2.01700000e+03 3.12500000e+00 ... 0.00000000e+00\n",
"# 1.00000000e+00 0.00000000e+00]\n",
"# [1.87593000e+05 2.01800000e+03 3.87500000e+00 ... 0.00000000e+00\n",
"# 1.00000000e+00 0.00000000e+00]] (847, 17)\n",
"print('\\nNon-scaled item_vecs 2D array \\n', item_vecs, item_vecs.shape)\n",
"\n",
"# SCALE 'item_vecs' 2D array (847, 17) with 'scalerItem' (obj)\n",
"# Perform standardization / scaling, by centering (mu = 0) \n",
"# and scaling (sigma = 1).\n",
"scaled_item_vecs = scalerItem.transform(item_vecs)\n",
"\n",
"# [[-0.9488947 -1.19927702 -1.98317861 ... 2.50507064 -0.50822265\n",
"# -0.65976722]\n",
"# [-0.94849765 -1.19927702 -1.81511436 ... 2.50507064 -0.50822265\n",
"# -0.65976722]\n",
"# [-0.94640649 -1.19927702 -1.74616492 ... -0.39919034 -0.50822265\n",
"# 1.51568608]\n",
"# ...\n",
"# [ 3.64928572 2.816088 -0.19630151 ... -0.39919034 -0.50822265\n",
"# -0.65976722]\n",
"# [ 3.7036557 2.816088 -1.25977162 ... -0.39919034 1.96764154\n",
"# -0.65976722]\n",
"# [ 3.90943572 3.06704831 0.6693137 ... -0.39919034 1.96764154\n",
"# -0.65976722]] (847, 17)\n",
"print('\\nSCALED item_vecs 2D array [-1 -> 1] \\n', scaled_item_vecs, scaled_item_vecs.shape)\n",
"\n",
"# Make the V_m^(i) MOVIE vector SCALED prediction. \n",
"# input = x_m^(i) -> All rows, from feature 1 on\n",
"# (col 0, the movie id, is dropped)\n",
"vms = model_m.predict(scaled_item_vecs[:,i_s:])\n",
"\n",
"# V_m^(i) shape is (847 vectors / MOVIES, 32 elements/cols) 2D array\n",
"# 847 MOVIES vectors, each with 32 output elements / cols\n",
"\n",
"# [[-0.01857825 0.02509343 0.18280987 ... -0.00534167 0.26681036\n",
"# -0.1346159 ]\n",
"# [-0.03679933 0.10367652 0.14471677 ... -0.00934093 0.15746793\n",
"# -0.09405302]\n",
"# [ 0.01894972 -0.15505672 0.00943602 ... 0.07133339 0.30358016\n",
"# -0.0658022 ]\n",
"# ...\n",
"# [-0.00257265 -0.17672591 0.06223017 ... 0.02854471 -0.09042729\n",
"# 0.06269106]\n",
"# [ 0.07080412 0.125177 0.17506361 ... -0.10165994 -0.00500654\n",
"# 0.0288814 ]\n",
"# [-0.1291337 -0.22834606 -0.08211754 ... 0.17481491 -0.25011542\n",
"# 0.03010405]] (847, 32) MAX 0.659007 MIN -0.5849552\n",
"print('\\nSize of all predicted MOVIE feature vectors Vm 2D SCALED array')\n",
"print(vms, vms.shape,'MAX',np.max(vms),'MIN',np.min(vms))\n",
"\n",
"# UNSCALE V_m^(i) MOVIES vector SCALED prediction, to get back predicted values \n",
"# between [0-5] with 'scalerTarget' (obj) and '.inverse_transform(scaled prediction)'\n",
"vms_u = scalerTarget.inverse_transform(vms)\n",
"\n",
"# [[2.7081988 2.8064601 3.161322 ... 2.7379813 3.3503234 2.4471142]\n",
"# [2.6672015 2.9832723 3.0756125 ... 2.728983 3.104303 2.5383806]\n",
"# [2.7926369 2.4011223 2.771231 ... 2.9105 3.4330554 2.601945 ]\n",
"# ...\n",
"# [2.7442114 2.3523667 2.890018 ... 2.8142254 2.5465386 2.8910549]\n",
"# [2.9093091 3.0316482 3.143893 ... 2.5212653 2.7387352 2.8149831]\n",
"# [2.459449 2.2362213 2.5652354 ... 3.1433337 2.1872404 2.817734 ]] (847, 32) MAX 4.2327657 MIN 1.4338508\n",
"print('\\nSize of all predicted movie feature vectors Vm 2D non-scaled array')\n",
"print(vms_u,vms_u.shape,'MAX',np.max(vms_u),'MIN',np.min(vms_u))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now compute a matrix of the squared distance between each movie feature vector and all other movie feature vectors:\n",
"<figure>\n",
" <center> <img src=\"./images/distmatrix.PNG\" style=\"width:400px;height:225px;\" ></center>\n",
"</figure>"
]
},
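{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch (not part of the graded lab code, and using a small random stand-in for the `vms` array of movie feature vectors), the distance matrix can be built with two loops over `sq_dist`, or equivalently vectorized with numpy broadcasting:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def sq_dist(a, b):\n",
"    # ||a - b||^2: element-wise difference, squared, then summed\n",
"    return np.sum(np.square(a - b))\n",
"\n",
"rng = np.random.default_rng(0)\n",
"vms = rng.normal(size=(5, 3))   # stand-in for the (847, 32) movie vectors\n",
"\n",
"# Loop version: dist[i, k] = ||vms[i] - vms[k]||^2\n",
"count = vms.shape[0]\n",
"dist = np.zeros((count, count))\n",
"for i in range(count):\n",
"    for k in range(count):\n",
"        dist[i, k] = sq_dist(vms[i, :], vms[k, :])\n",
"\n",
"# Vectorized equivalent via broadcasting: (n,1,d) - (1,n,d) -> (n,n,d)\n",
"dist_v = np.sum(np.square(vms[:, None, :] - vms[None, :, :]), axis=2)\n",
"\n",
"assert np.allclose(dist, dist_v)\n",
"```\n",
"\n",
"Both give the same symmetric matrix with zeros on the diagonal; the broadcasting version trades memory (an (n, n, d) intermediate) for speed."
]
},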
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can then find the CLOSEST MOVIE to each movie by taking the MINIMUM of ALL the squared-distance values along each row ->. We will make use of [numpy masked arrays](https://numpy.org/doc/1.21/user/tutorial-ma.html) to avoid selecting the same movie: the masked / invalid values along the diagonal WON'T be included in the computation."
]
},
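{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the masked-array step (again with a small hand-made stand-in `dist` matrix, not the lab's graded code): mask the diagonal so a movie can never be selected as its own nearest neighbour, then take the row-wise minimum:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Stand-in (4, 4) squared-distance matrix; the lab's is (847, 847)\n",
"dist = np.array([[0.0, 0.2, 0.9, 1.8],\n",
"                 [0.2, 0.0, 1.2, 1.7],\n",
"                 [0.9, 1.2, 0.0, 2.1],\n",
"                 [1.8, 1.7, 2.1, 0.0]])\n",
"\n",
"# Mask the diagonal: masked entries are ignored by min()/argmin()\n",
"m_dist = np.ma.masked_array(dist, mask=np.identity(dist.shape[0]))\n",
"\n",
"for i in range(dist.shape[0]):\n",
"    min_idx = np.argmin(m_dist[i])   # index of the CLOSEST other movie\n",
"    print(f'row {i}: closest movie is {min_idx}, distance {m_dist[i, min_idx]}')\n",
"```\n",
"\n",
"For row 0 the diagonal 0.0 is masked out, so the minimum is the 0.2 at index 1, which matches the row-wise search performed below on the full (847, 847) matrix."
]
},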
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"deletable": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"dist 2D array\n",
" [[0. 0.20204717 0.87640804 ... 1.8606956 1.19628048 1.97381568]\n",
" [0.20204717 0. 1.23392963 ... 1.70345736 0.88395357 1.6185534 ]\n",
" [0.87640804 1.23392963 0. ... 2.11601639 1.75096154 2.01017022]\n",
" ...\n",
" [1.8606956 1.70345736 2.11601639 ... 0. 0.89416355 0.39250469]\n",
" [1.19628048 0.88395357 1.75096154 ... 0.89416355 0. 0.94981474]\n",
" [1.97381568 1.6185534 2.01017022 ... 0.39250469 0.94981474 0. ]] (847, 847)\n",
"\n",
"m_dist 2D array\n",
" [[-- 0.2020471692085266 0.876408040523529 ... 1.8606956005096436\n",
" 1.1962804794311523 1.973815679550171]\n",
" [0.2020471692085266 -- 1.2339296340942383 ... 1.7034573554992676\n",
" 0.8839535713195801 1.6185534000396729]\n",
" [0.876408040523529 1.2339296340942383 -- ... 2.116016387939453\n",
" 1.7509615421295166 2.0101702213287354]\n",
" ...\n",
" [1.8606956005096436 1.7034573554992676 2.116016387939453 ... --\n",
" 0.8941635489463806 0.3925046920776367]\n",
" [1.1962804794311523 0.8839535713195801 1.7509615421295166 ...\n",
" 0.8941635489463806 -- 0.9498147368431091]\n",
" [1.973815679550171 1.6185534000396729 2.0101702213287354 ...\n",
" 0.3925046920776367 0.9498147368431091 --]] (847, 847) \n",
"\n",
"INDEX of MIN distance for MOVIE/row 0 -> 223 MIN distance= 0.031507059931755066\n",
"INDEX of MIN distance for MOVIE/row 1 -> 110 MIN distance= 0.0005734078004024923\n",
"INDEX of MIN distance for MOVIE/row 2 -> 152 MIN distance= 0.02548859268426895\n",
"INDEX of MIN distance for MOVIE/row 3 -> 169 MIN distance= 0.024568267166614532\n",
"INDEX of MIN distance for MOVIE/row 4 -> 335 MIN distance= 0.04095177352428436\n",
"INDEX of MIN distance for MOVIE/row 5 -> 38 MIN distance= 0.0006838899571448565\n",
"INDEX of MIN distance for MOVIE/row 6 -> 92 MIN distance= 0.0025973529554903507\n",
"INDEX of MIN distance for MOVIE/row 7 -> 323 MIN distance= 0.10817582905292511\n",
"INDEX of MIN distance for MOVIE/row 8 -> 374 MIN distance= 0.06968577206134796\n",
"INDEX of MIN distance for MOVIE/row 9 -> 124 MIN distance= 0.2927837669849396\n",
"INDEX of MIN distance for MOVIE/row 10 -> 104 MIN distance= 0.10015091300010681\n",
"INDEX of MIN distance for MOVIE/row 11 -> 143 MIN distance= 0.00946224294602871\n",
"INDEX of MIN distance for MOVIE/row 12 -> 127 MIN distance= 0.0037179533392190933\n",
"INDEX of MIN distance for MOVIE/row 13 -> 290 MIN distance= 0.40510135889053345\n",
"INDEX of MIN distance for MOVIE/row 14 -> 335 MIN distance= 0.054576575756073\n",
"INDEX of MIN distance for MOVIE/row 15 -> 199 MIN distance= 0.04322923347353935\n",
"INDEX of MIN distance for MOVIE/row 16 -> 31 MIN distance= 0.08330267667770386\n",
"INDEX of MIN distance for MOVIE/row 17 -> 638 MIN distance= 0.14336174726486206\n",
"INDEX of MIN distance for MOVIE/row 18 -> 262 MIN distance= 0.022530732676386833\n",
"INDEX of MIN distance for MOVIE/row 19 -> 291 MIN distance= 0.13572460412979126\n",
"INDEX of MIN distance for MOVIE/row 20 -> 175 MIN distance= 0.01738070324063301\n",
"INDEX of MIN distance for MOVIE/row 21 -> 60 MIN distance= 0.07999090850353241\n",
"INDEX of MIN distance for MOVIE/row 22 -> 85 MIN distance= 0.07254807651042938\n",
"INDEX of MIN distance for MOVIE/row 23 -> 589 MIN distance= 0.25931304693222046\n",
"INDEX of MIN distance for MOVIE/row 24 -> 501 MIN distance= 0.04436425119638443\n",
"INDEX of MIN distance for MOVIE/row 25 -> 179 MIN distance= 0.005933586973696947\n",
"INDEX of MIN distance for MOVIE/row 26 -> 117 MIN distance= 0.07589443773031235\n",
"INDEX of MIN distance for MOVIE/row 27 -> 198 MIN distance= 0.24498337507247925\n",
"INDEX of MIN distance for MOVIE/row 28 -> 325 MIN distance= 0.40055668354034424\n",
"INDEX of MIN distance for MOVIE/row 29 -> 75 MIN distance= 0.005713851191103458\n",
"INDEX of MIN distance for MOVIE/row 30 -> 543 MIN distance= 0.2806133031845093\n",
"INDEX of MIN distance for MOVIE/row 31 -> 45 MIN distance= 0.00018907437333837152\n",
"INDEX of MIN distance for MOVIE/row 32 -> 244 MIN distance= 0.05046743527054787\n",
"INDEX of MIN distance for MOVIE/row 33 -> 111 MIN distance= 0.05047108232975006\n",
"INDEX of MIN distance for MOVIE/row 34 -> 139 MIN distance= 0.0011726617813110352\n",
"INDEX of MIN distance for MOVIE/row 35 -> 202 MIN distance= 0.05993355065584183\n",
"INDEX of MIN distance for MOVIE/row 36 -> 252 MIN distance= 0.04784968122839928\n",
"INDEX of MIN distance for MOVIE/row 37 -> 396 MIN distance= 0.11428750306367874\n",
"INDEX of MIN distance for MOVIE/row 38 -> 5 MIN distance= 0.0006838899571448565\n",
"INDEX of MIN distance for MOVIE/row 39 -> 41 MIN distance= 6.836782267782837e-05\n",
"INDEX of MIN distance for MOVIE/row 40 -> 286 MIN distance= 0.22644734382629395\n",
"INDEX of MIN distance for MOVIE/row 41 -> 39 MIN distance= 6.836782267782837e-05\n",
"INDEX of MIN distance for MOVIE/row 42 -> 5 MIN distance= 0.027735963463783264\n",
"INDEX of MIN distance for MOVIE/row 43 -> 95 MIN distance= 0.01779981702566147\n",
"INDEX of MIN distance for MOVIE/row 44 -> 157 MIN distance= 0.01901673898100853\n",
"INDEX of MIN distance for MOVIE/row 45 -> 31 MIN distance= 0.00018907437333837152\n",
"INDEX of MIN distance for MOVIE/row 46 -> 741 MIN distance= 0.16669386625289917\n",
"INDEX of MIN distance for MOVIE/row 47 -> 165 MIN distance= 0.0037854413967579603\n",
"INDEX of MIN distance for MOVIE/row 48 -> 719 MIN distance= 0.2380872666835785\n",
"INDEX of MIN distance for MOVIE/row 49 -> 72 MIN distance= 0.04667433723807335\n"
]
},
{
"data": {
"text/html": [
"<table>\n",
"<thead>\n",
"<tr><th>movie1 </th><th>genres </th><th>movie2 </th><th>genres </th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><td>Save the Last Dance (2001) </td><td>Drama|Romance </td><td>Mona Lisa Smile (2003) </td><td>Drama|Romance </td></tr>\n",
"<tr><td>Wedding Planner, The (2001) </td><td>Comedy|Romance </td><td>Mr. Deeds (2002) </td><td>Comedy|Romance </td></tr>\n",
"<tr><td>Hannibal (2001) </td><td>Horror|Thriller </td><td>Final Destination 2 (2003) </td><td>Horror|Thriller </td></tr>\n",
"<tr><td>Saving Silverman (Evil Woman) (2001) </td><td>Comedy|Romance </td><td>Down with Love (2003) </td><td>Comedy|Romance </td></tr>\n",
"<tr><td>Down to Earth (2001) </td><td>Comedy|Fantasy|Romance </td><td>Bewitched (2005) </td><td>Comedy|Fantasy|Romance </td></tr>\n",
"<tr><td>Mexican, The (2001) </td><td>Action|Comedy </td><td>Rush Hour 2 (2001) </td><td>Action|Comedy </td></tr>\n",
"<tr><td>15 Minutes (2001) </td><td>Thriller </td><td>Panic Room (2002) </td><td>Thriller </td></tr>\n",
"<tr><td>Enemy at the Gates (2001) </td><td>Drama </td><td>Kung Fu Hustle (Gong fu) (2004) </td><td>Action|Comedy </td></tr>\n",
"<tr><td>Heartbreakers (2001) </td><td>Comedy|Crime|Romance </td><td>Fun with Dick and Jane (2005) </td><td>Comedy|Crime </td></tr>\n",
"<tr><td>Spy Kids (2001) </td><td>Action|Adventure|Children|Comedy </td><td>Tuxedo, The (2002) </td><td>Action|Comedy </td></tr>\n",
"<tr><td>Along Came a Spider (2001) </td><td>Action|Crime|Mystery|Thriller </td><td>Insomnia (2002) </td><td>Action|Crime|Drama|Mystery|Thriller </td></tr>\n",
"<tr><td>Blow (2001) </td><td>Crime|Drama </td><td>25th Hour (2002) </td><td>Crime|Drama </td></tr>\n",
"<tr><td>Bridget Jones&#x27;s Diary (2001) </td><td>Comedy|Drama|Romance </td><td>Punch-Drunk Love (2002) </td><td>Comedy|Drama|Romance </td></tr>\n",
"<tr><td>Joe Dirt (2001) </td><td>Adventure|Comedy|Mystery|Romance </td><td>Polar Express, The (2004) </td><td>Adventure|Animation|Children|Fantasy </td></tr>\n",
"<tr><td>Crocodile Dundee in Los Angeles (2001) </td><td>Comedy|Drama </td><td>Bewitched (2005) </td><td>Comedy|Fantasy|Romance </td></tr>\n",
"<tr><td>Mummy Returns, The (2001) </td><td>Action|Adventure|Comedy|Thriller </td><td>Rundown, The (2003) </td><td>Action|Adventure|Comedy </td></tr>\n",
"<tr><td>Knight&#x27;s Tale, A (2001) </td><td>Action|Comedy|Romance </td><td>Legally Blonde (2001) </td><td>Comedy|Romance </td></tr>\n",
"<tr><td>Shrek (2001) </td><td>Adventure|Animation|Children|Comedy|Fantasy|Romance</td><td>Tangled (2010) </td><td>Animation|Children|Comedy|Fantasy|Romance </td></tr>\n",
"<tr><td>Moulin Rouge (2001) </td><td>Drama|Romance </td><td>Notebook, The (2004) </td><td>Drama|Romance </td></tr>\n",
"<tr><td>Pearl Harbor (2001) </td><td>Action|Drama|Romance </td><td>Bridget Jones: The Edge of Reason (2004) </td><td>Comedy|Drama|Romance </td></tr>\n",
"<tr><td>Animal, The (2001) </td><td>Comedy </td><td>Dumb and Dumberer: When Harry Met Lloyd (2003) </td><td>Comedy </td></tr>\n",
"<tr><td>Evolution (2001) </td><td>Comedy|Sci-Fi </td><td>Behind Enemy Lines (2001) </td><td>Action|Drama </td></tr>\n",
"<tr><td>Swordfish (2001) </td><td>Action|Crime|Drama </td><td>We Were Soldiers (2002) </td><td>Action|Drama </td></tr>\n",
"<tr><td>Atlantis: The Lost Empire (2001) </td><td>Adventure|Animation|Children|Fantasy </td><td>Cloudy with a Chance of Meatballs (2009) </td><td>Animation|Children|Fantasy </td></tr>\n",
"<tr><td>Lara Croft: Tomb Raider (2001) </td><td>Action|Adventure </td><td>National Treasure: Book of Secrets (2007) </td><td>Action|Adventure </td></tr>\n",
"<tr><td>Dr. Dolittle 2 (2001) </td><td>Comedy </td><td>Legally Blonde 2: Red, White &amp; Blonde (2003) </td><td>Comedy </td></tr>\n",
"<tr><td>Fast and the Furious, The (2001) </td><td>Action|Crime|Thriller </td><td>xXx (2002) </td><td>Action|Crime|Thriller </td></tr>\n",
"<tr><td>A.I. Artificial Intelligence (2001) </td><td>Adventure|Drama|Sci-Fi </td><td>Bubba Ho-tep (2002) </td><td>Comedy|Horror </td></tr>\n",
"<tr><td>Cats &amp; Dogs (2001) </td><td>Children|Comedy </td><td>Robots (2005) </td><td>Adventure|Animation|Children|Comedy|Fantasy|Sci-Fi</td></tr>\n",
"<tr><td>Scary Movie 2 (2001) </td><td>Comedy </td><td>Orange County (2002) </td><td>Comedy </td></tr>\n",
"<tr><td>Final Fantasy: The Spirits Within (2001)</td><td>Adventure|Animation|Fantasy|Sci-Fi </td><td>Madagascar: Escape 2 Africa (2008) </td><td>Action|Adventure|Animation|Children|Comedy </td></tr>\n",
"<tr><td>Legally Blonde (2001) </td><td>Comedy|Romance </td><td>Serendipity (2001) </td><td>Comedy|Romance </td></tr>\n",
"<tr><td>Score, The (2001) </td><td>Action|Drama </td><td>Punisher, The (2004) </td><td>Action|Crime|Thriller </td></tr>\n",
"<tr><td>Jurassic Park III (2001) </td><td>Action|Adventure|Sci-Fi|Thriller </td><td>Men in Black II (a.k.a. MIIB) (a.k.a. MIB 2) (2002)</td><td>Action|Comedy|Sci-Fi </td></tr>\n",
"<tr><td>America&#x27;s Sweethearts (2001) </td><td>Comedy|Romance </td><td>Maid in Manhattan (2002) </td><td>Comedy|Romance </td></tr>\n",
"<tr><td>Ghost World (2001) </td><td>Comedy|Drama </td><td>Station Agent, The (2003) </td><td>Comedy|Drama </td></tr>\n",
"<tr><td>Planet of the Apes (2001) </td><td>Action|Adventure|Drama|Sci-Fi </td><td>Day After Tomorrow, The (2004) </td><td>Action|Adventure|Drama|Sci-Fi|Thriller </td></tr>\n",
"<tr><td>Princess Diaries, The (2001) </td><td>Children|Comedy|Romance </td><td>Lake House, The (2006) </td><td>Drama|Fantasy|Romance </td></tr>\n",
"<tr><td>Rush Hour 2 (2001) </td><td>Action|Comedy </td><td>Mexican, The (2001) </td><td>Action|Comedy </td></tr>\n",
"<tr><td>American Pie 2 (2001) </td><td>Comedy </td><td>Rat Race (2001) </td><td>Comedy </td></tr>\n",
"<tr><td>Others, The (2001) </td><td>Drama|Horror|Mystery|Thriller </td><td>The Machinist (2004) </td><td>Drama|Mystery|Thriller </td></tr>\n",
"<tr><td>Rat Race (2001) </td><td>Comedy </td><td>American Pie 2 (2001) </td><td>Comedy </td></tr>\n",
"<tr><td>Jay and Silent Bob Strike Back (2001) </td><td>Adventure|Comedy </td><td>Mexican, The (2001) </td><td>Action|Comedy </td></tr>\n",
"<tr><td>Training Day (2001) </td><td>Crime|Drama|Thriller </td><td>Frailty (2001) </td><td>Crime|Drama|Thriller </td></tr>\n",
"<tr><td>Zoolander (2001) </td><td>Comedy </td><td>Old School (2003) </td><td>Comedy </td></tr>\n",
"<tr><td>Serendipity (2001) </td><td>Comedy|Romance </td><td>Legally Blonde (2001) </td><td>Comedy|Romance </td></tr>\n",
"<tr><td>Mulholland Drive (2001) </td><td>Crime|Drama|Mystery|Thriller </td><td>Prisoners (2013) </td><td>Drama|Mystery|Thriller </td></tr>\n",
"<tr><td>From Hell (2001) </td><td>Crime|Horror|Mystery|Thriller </td><td>Identity (2003) </td><td>Crime|Horror|Mystery|Thriller </td></tr>\n",
"<tr><td>Waking Life (2001) </td><td>Animation|Drama|Fantasy </td><td>Warm Bodies (2013) </td><td>Comedy|Horror|Romance </td></tr>\n",
"<tr><td>K-PAX (2001) </td><td>Drama|Fantasy|Mystery|Sci-Fi </td><td>Gosford Park (2001) </td><td>Comedy|Drama|Mystery </td></tr>\n",
"</tbody>\n",
"</table>"
],
"text/plain": [
"'<table>\\n<thead>\\n<tr><th>movie1 </th><th>genres </th><th>movie2 </th><th>genres </th></tr>\\n</thead>\\n<tbody>\\n<tr><td>Save the Last Dance (2001) </td><td>Drama|Romance </td><td>Mona Lisa Smile (2003) </td><td>Drama|Romance </td></tr>\\n<tr><td>Wedding Planner, The (2001) </td><td>Comedy|Romance </td><td>Mr. Deeds (2002) </td><td>Comedy|Romance </td></tr>\\n<tr><td>Hannibal (2001) </td><td>Horror|Thriller </td><td>Final Destination 2 (2003) </td><td>Horror|Thriller </td></tr>\\n<tr><td>Saving Silverman (Evil Woman) (2001) </td><td>Comedy|Romance </td><td>Down with Love (2003) </td><td>Comedy|Romance </td></tr>\\n<tr><td>Down to Earth (2001) </td><td>Comedy|Fantasy|Romance </td><td>Bewitched (2005) </td><td>Comedy|Fantasy|Romance </td></tr>\\n<tr><td>Mexican, The (2001) </td><td>Action|Comedy </td><td>Rush Hour 2 (2001) </td><td>Action|Comedy </td></tr>\\n<tr><td>15 Minutes (2001) </td><td>Thriller </td><td>Panic Room (2002) </td><td>Thriller </td></tr>\\n<tr><td>Enemy at the Gates (2001) </td><td>Drama </td><td>Kung Fu Hustle (Gong fu) (2004) </td><td>Action|Comedy </td></tr>\\n<tr><td>Heartbreakers (2001) </td><td>Comedy|Crime|Romance </td><td>Fun with Dick and Jane (2005) </td><td>Comedy|Crime </td></tr>\\n<tr><td>Spy Kids (2001) </td><td>Action|Adventure|Children|Comedy </td><td>Tuxedo, The (2002) </td><td>Action|Comedy </td></tr>\\n<tr><td>Along Came a Spider (2001) </td><td>Action|Crime|Mystery|Thriller </td><td>Insomnia (2002) </td><td>Action|Crime|Drama|Mystery|Thriller </td></tr>\\n<tr><td>Blow (2001) </td><td>Crime|Drama </td><td>25th Hour (2002) </td><td>Crime|Drama </td></tr>\\n<tr><td>Bridget Jones&#x27;s Diary (2001) </td><td>Comedy|Drama|Romance </td><td>Punch-Drunk Love (2002) </td><td>Comedy|Drama|Romance </td></tr>\\n<tr><td>Joe Dirt (2001) </td><td>Adventure|Comedy|Mystery|Romance </td><td>Polar Express, The (2004) </td><td>Adventure|Animation|Children|Fantasy </td></tr>\\n<tr><td>Crocodile Dundee in Los Angeles (2001) 
</td><td>Comedy|Drama </td><td>Bewitched (2005) </td><td>Comedy|Fantasy|Romance </td></tr>\\n<tr><td>Mummy Returns, The (2001) </td><td>Action|Adventure|Comedy|Thriller </td><td>Rundown, The (2003) </td><td>Action|Adventure|Comedy </td></tr>\\n<tr><td>Knight&#x27;s Tale, A (2001) </td><td>Action|Comedy|Romance </td><td>Legally Blonde (2001) </td><td>Comedy|Romance </td></tr>\\n<tr><td>Shrek (2001) </td><td>Adventure|Animation|Children|Comedy|Fantasy|Romance</td><td>Tangled (2010) </td><td>Animation|Children|Comedy|Fantasy|Romance </td></tr>\\n<tr><td>Moulin Rouge (2001) </td><td>Drama|Romance </td><td>Notebook, The (2004) </td><td>Drama|Romance </td></tr>\\n<tr><td>Pearl Harbor (2001) </td><td>Action|Drama|Romance </td><td>Bridget Jones: The Edge of Reason (2004) </td><td>Comedy|Drama|Romance </td></tr>\\n<tr><td>Animal, The (2001) </td><td>Comedy </td><td>Dumb and Dumberer: When Harry Met Lloyd (2003) </td><td>Comedy </td></tr>\\n<tr><td>Evolution (2001) </td><td>Comedy|Sci-Fi </td><td>Behind Enemy Lines (2001) </td><td>Action|Drama </td></tr>\\n<tr><td>Swordfish (2001) </td><td>Action|Crime|Drama </td><td>We Were Soldiers (2002) </td><td>Action|Drama </td></tr>\\n<tr><td>Atlantis: The Lost Empire (2001) </td><td>Adventure|Animation|Children|Fantasy </td><td>Cloudy with a Chance of Meatballs (2009) </td><td>Animation|Children|Fantasy </td></tr>\\n<tr><td>Lara Croft: Tomb Raider (2001) </td><td>Action|Adventure </td><td>National Treasure: Book of Secrets (2007) </td><td>Action|Adventure </td></tr>\\n<tr><td>Dr. Dolittle 2 (2001) </td><td>Comedy </td><td>Legally Blonde 2: Red, White &amp; Blonde (2003) </td><td>Comedy </td></tr>\\n<tr><td>Fast and the Furious, The (2001) </td><td>Action|Crime|Thriller </td><td>xXx (2002) </td><td>Action|Crime|Thriller </td></tr>\\n<tr><td>A.I. 
Artificial Intelligence (2001) </td><td>Adventure|Drama|Sci-Fi </td><td>Bubba Ho-tep (2002) </td><td>Comedy|Horror </td></tr>\\n<tr><td>Cats &amp; Dogs (2001) </td><td>Children|Comedy </td><td>Robots (2005) </td><td>Adventure|Animation|Children|Comedy|Fantasy|Sci-Fi</td></tr>\\n<tr><td>Scary Movie 2 (2001) </td><td>Comedy </td><td>Orange County (2002) </td><td>Comedy </td></tr>\\n<tr><td>Final Fantasy: The Spirits Within (2001)</td><td>Adventure|Animation|Fantasy|Sci-Fi </td><td>Madagascar: Escape 2 Africa (2008) </td><td>Action|Adventure|Animation|Children|Comedy </td></tr>\\n<tr><td>Legally Blonde (2001) </td><td>Comedy|Romance </td><td>Serendipity (2001) </td><td>Comedy|Romance </td></tr>\\n<tr><td>Score, The (2001) </td><td>Action|Drama </td><td>Punisher, The (2004) </td><td>Action|Crime|Thriller </td></tr>\\n<tr><td>Jurassic Park III (2001) </td><td>Action|Adventure|Sci-Fi|Thriller </td><td>Men in Black II (a.k.a. MIIB) (a.k.a. MIB 2) (2002)</td><td>Action|Comedy|Sci-Fi </td></tr>\\n<tr><td>America&#x27;s Sweethearts (2001) </td><td>Comedy|Romance </td><td>Maid in Manhattan (2002) </td><td>Comedy|Romance </td></tr>\\n<tr><td>Ghost World (2001) </td><td>Comedy|Drama </td><td>Station Agent, The (2003) </td><td>Comedy|Drama </td></tr>\\n<tr><td>Planet of the Apes (2001) </td><td>Action|Adventure|Drama|Sci-Fi </td><td>Day After Tomorrow, The (2004) </td><td>Action|Adventure|Drama|Sci-Fi|Thriller </td></tr>\\n<tr><td>Princess Diaries, The (2001) </td><td>Children|Comedy|Romance </td><td>Lake House, The (2006) </td><td>Drama|Fantasy|Romance </td></tr>\\n<tr><td>Rush Hour 2 (2001) </td><td>Action|Comedy </td><td>Mexican, The (2001) </td><td>Action|Comedy </td></tr>\\n<tr><td>American Pie 2 (2001) </td><td>Comedy </td><td>Rat Race (2001) </td><td>Comedy </td></tr>\\n<tr><td>Others, The (2001) </td><td>Drama|Horror|Mystery|Thriller </td><td>The Machinist (2004) </td><td>Drama|Mystery|Thriller </td></tr>\\n<tr><td>Rat Race (2001) </td><td>Comedy </td><td>American Pie 2 
(2001) </td><td>Comedy </td></tr>\\n<tr><td>Jay and Silent Bob Strike Back (2001) </td><td>Adventure|Comedy </td><td>Mexican, The (2001) </td><td>Action|Comedy </td></tr>\\n<tr><td>Training Day (2001) </td><td>Crime|Drama|Thriller </td><td>Frailty (2001) </td><td>Crime|Drama|Thriller </td></tr>\\n<tr><td>Zoolander (2001) </td><td>Comedy </td><td>Old School (2003) </td><td>Comedy </td></tr>\\n<tr><td>Serendipity (2001) </td><td>Comedy|Romance </td><td>Legally Blonde (2001) </td><td>Comedy|Romance </td></tr>\\n<tr><td>Mulholland Drive (2001) </td><td>Crime|Drama|Mystery|Thriller </td><td>Prisoners (2013) </td><td>Drama|Mystery|Thriller </td></tr>\\n<tr><td>From Hell (2001) </td><td>Crime|Horror|Mystery|Thriller </td><td>Identity (2003) </td><td>Crime|Horror|Mystery|Thriller </td></tr>\\n<tr><td>Waking Life (2001) </td><td>Animation|Drama|Fantasy </td><td>Warm Bodies (2013) </td><td>Comedy|Horror|Romance </td></tr>\\n<tr><td>K-PAX (2001) </td><td>Drama|Fantasy|Mystery|Sci-Fi </td><td>Gosford Park (2001) </td><td>Comedy|Drama|Mystery </td></tr>\\n</tbody>\\n</table>'"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"count = 50 # number of movies to be displayed\n",
"\n",
"# dim = len(vms) = 847 MOVIES/vectors\n",
"dim = len(vms)\n",
"\n",
"# Init 2D square array with '0', of size [847 x 847]\n",
"# dist = [ [0, 0, 0, ... , 0]847\n",
"# [0, 0, 0, ... , 0]\n",
"# ...\n",
"# [0, 0, 0, ... , 0] ]\n",
"# 847\n",
"dist = np.zeros((dim,dim))\n",
"\n",
"# 1st Iterate each row i = 0, 1, 2, ... , 846\n",
"for i in range(dim):\n",
" \n",
" # 2nd Iterate ALL cols j = 0, 1, 2, ... , 846 per row\n",
" for j in range(dim):\n",
" \n",
" # Each MOVIE with features row/vector\n",
" row_movie = vms[i, :]\n",
" \n",
" # Iterate through ALL MOVIES with features cols/vectors\n",
" col_movies = vms[j, :]\n",
" \n",
" # Get Squared Distance/Error between ||each row movie - all col movies||^2\n",
" distance = sq_dist(row_movie, col_movies)\n",
" \n",
" # Assign computed Squared Distance/Error between MOVIES vectors \n",
" # to [i row, j col] position of 'dist' MATRIX\n",
" dist[i,j] = distance\n",
" \n",
"# if dist[i,j] == 0: # ALL equal movies with distance=0 values\n",
"# dist[i,j] = 10 # Will be changed by a HIGHER distance=10 than MAX distance=3.346163034439087\n",
"# # So they CAN'T be picked as the LOWEST distance each row/vector\n",
"\n",
"# [[0. 0.20204717 0.87640804 ... 1.8606956 1.19628048 1.97381568]\n",
"# [0.20204717 0. 1.23392963 ... 1.70345736 0.88395357 1.6185534 ]\n",
"# [0.87640804 1.23392963 0. ... 2.11601639 1.75096154 2.01017022]\n",
"# ...\n",
"# [1.8606956 1.70345736 2.11601639 ... 0. 0.89416355 0.39250469]\n",
"# [1.19628048 0.88395357 1.75096154 ... 0.89416355 0. 0.94981474]\n",
"# [1.97381568 1.6185534 2.01017022 ... 0.39250469 0.94981474 0. ]] (847, 847)\n",
"# Display 2D FILLED array, with ALL col movies distances, per row movie\n",
"print('dist 2D array\\n',dist,dist.shape)\n",
"\n",
"# mask = np.identity(n=dist.shape[0]) create an IDENTITY MATRIX / SQUARE MATRIX [n x n] \n",
"# with the same quantity of n rows/vectors and n cols at 'dist' -> dist.shape = (847,847) \n",
"# -> n = dist.shape[0] = 847 rows/vectors.\n",
"\n",
"# mask = [ [1, 0, 0, ... ,0]847\n",
"# [0, 1, 0, ..., 0]\n",
"# [0, 0, 1, ..., 0]\n",
"# ...\n",
"# [0, 0, 0, ..., 1] ]\n",
"# 847\n",
"\n",
"# All '1's values in mask, mean we wish to mark as INVALID \n",
"# these position values in 'dist' MATRIX\n",
"# 'dist' diagonal values = 0 mean, distances between each row movie and its equal col movie -> '0'\n",
"# So it masks/invalid ALL the diagonal values = 0 at 'dist' ('0' -> '--')\n",
"m_dist = ma.masked_array(dist, mask=np.identity(dist.shape[0]))\n",
"\n",
"# [[-- 0.2020471692085266 0.876408040523529 ... 1.8606956005096436 1.1962804794311523 1.973815679550171] \n",
"# [0.2020471692085266 -- 1.2339296340942383 ... 1.7034573554992676 0.8839535713195801 1.6185534000396729]\n",
"# [0.876408040523529 1.2339296340942383 -- ... 2.116016387939453 1.7509615421295166 2.0101702213287354]\n",
"# ...\n",
"# [1.8606956005096436 1.7034573554992676 2.116016387939453 ... -- 0.8941635489463806 0.3925046920776367]\n",
"# [1.1962804794311523 0.8839535713195801 1.7509615421295166 ... 0.8941635489463806 -- 0.9498147368431091]\n",
"# [1.973815679550171 1.6185534000396729 2.0101702213287354 ... 0.3925046920776367 0.9498147368431091 --]] (847, 847)\n",
"# Display 2D filled array, with ALL col movies distances, per row movie\n",
"# with Masked/invalid values in main diagonal '0' -> '--'\n",
"print('\\nm_dist 2D array\\n',m_dist,m_dist.shape,'\\n')\n",
"\n",
"# Set cols/features/header names (1st row of disp), to be displayed at table/df\n",
"# disp = [ [header names]0, [x1,x2,x3,x4]1,..., [x1,x2,x3,x4]50 MOVIES ]\n",
"disp = [[\"movie1\", \"genres\", \"movie2\", \"genres\"]]\n",
"\n",
"# Iterate/Display just count=50 MOVIES\n",
"for i in range(count):\n",
" \n",
" # Each row/vector at 'm_dist', find the LOWEST squared distance\n",
" # to other 846 possible distances, related to other 846 MOVIES\n",
" # Each row return the 'index' position, related to the LOWEST squared distance\n",
" min_idx = np.argmin(m_dist[i]) # When uncomment if -> min_idx = np.argmin(dist[i])\n",
" \n",
" # INDEX of MIN distance for MOVIE/row i -> min_index MIN distance= x.xxxxxxxxxxxxxxxxx\n",
" print('INDEX of MIN distance for MOVIE/row',i,'->',min_idx, 'MIN distance=',np.min(m_dist[i]))\n",
" \n",
" # Each of 847 row MOVIES at 'item_vecs', returns the MOVIE ID as integer \n",
" movie1_id = int(item_vecs[i,0])\n",
" \n",
" # Each of 847 row MOVIES at 'item_vecs', returns the MOVIE ID\n",
" # to be placed at LOWEST squared distance, between ALL available col movies \n",
" movie2_id = int(item_vecs[min_idx,0])\n",
" \n",
" # Feature / col 1 = movie 1_id -> 'title'\n",
" x1 = movie_dict[movie1_id]['title']\n",
" \n",
" # Feature / col 2 = movie 1_id -> 'genres' Drama|Romance|Action|Comedy|Thriller...\n",
" x2 = movie_dict[movie1_id]['genres']\n",
" \n",
" # Feature / col 3 = movie 2_id -> 'title'\n",
" x3 = movie_dict[movie2_id]['title']\n",
" \n",
" # Feature / col 4 = movie 2_id -> 'genres' Drama|Romance|Action|Comedy|Thriller...\n",
" x4 = movie_dict[movie2_id]['genres']\n",
" \n",
" # [ [\"movie1\", \"genres\", \"movie2\", \"genres\"]_0, [x1,x2,x3,x4]_1, [x1,x2,x3,x4]_2, ... , [x1,x2,x3,x4]_50 MOVIES ] \n",
" disp.append( [x1, x2, x3, x4] )\n",
"\n",
"# Create table/df from 'disp [ [ headers], [ ], ... , [ ] ]' array , using headers as 1st row of array\n",
"table = tabulate.tabulate(disp, tablefmt='html', headers=\"firstrow\")\n",
"\n",
"# Display table/df\n",
"table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The results show the SMALL built model 'item_NN' will generally SUGGEST a MOVIE k with similar genre's to MOVIE i."
]
},
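{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside (not part of the graded lab), the nested Python loop used above to fill `dist` can be replaced by a single broadcasted NumPy computation, using the identity ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2. The sketch below is illustrative only; it assumes `vms` is a 2-D `np.ndarray` and that `sq_dist` computes the ordinary squared Euclidean distance:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Vectorized pairwise squared distances: ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2\n",
"sq_norms = np.sum(vms**2, axis=1)                  # shape (dim,)\n",
"dist_vec = sq_norms[:, None] - 2 * (vms @ vms.T) + sq_norms[None, :]\n",
"dist_vec = np.maximum(dist_vec, 0)                 # clip tiny negative round-off\n",
"\n",
"# Should agree with the loop-built 'dist' matrix up to floating-point error\n",
"print(np.allclose(dist_vec, dist, atol=1e-4))"
]
},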
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<a name=\"6\"></a>\n",
"## 6 - Congratulations! <img align=\"left\" src=\"./images/film_award.png\" style=\" width:40px;\">\n",
"You have completed a content-based recommender system. \n",
"\n",
"This structure is the basis of many commercial recommender systems. The user content can be greatly expanded to incorporate more information about the user if it is available. Items are not limited to movies. This can be used to recommend any item, books, cars or items that are similar to an item in your 'shopping cart'."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<details>\n",
" <summary><font size=\"2\" color=\"darkgreen\"><b>Please click here if you want to experiment with any of the non-graded code.</b></font></summary>\n",
" <p><i><b>Important Note: Please only do this when you've already passed the assignment to avoid problems with the autograder.</b></i>\n",
" <ol>\n",
" <li> On the notebook’s menu, click “View” > “Cell Toolbar” > “Edit Metadata”</li>\n",
" <li> Hit the “Edit Metadata” button next to the code cell which you want to lock/unlock</li>\n",
" <li> Set the attribute value for “editable” to:\n",
" <ul>\n",
" <li> “true” if you want to unlock it </li>\n",
" <li> “false” if you want to lock it </li>\n",
" </ul>\n",
" </li>\n",
" <li> On the notebook’s menu, click “View” > “Cell Toolbar” > “None” </li>\n",
" </ol>\n",
" <p> Here's a short demo of how to do the steps above: \n",
" <br>\n",
" <img src=\"https://lh3.google.com/u/0/d/14Xy_Mb17CZVgzVAgq7NCjMVBvSae3xO1\" align=\"center\" alt=\"unlock_cells.gif\">\n",
"</details>"
]
}
],
"metadata": {
"colab": {
"authorship_tag": "ABX9TyOFYdA6zQJ1FpgYwYmRIeXa",
"collapsed_sections": [],
"name": "Recsys_NN.ipynb",
"private_outputs": true,
"provenance": [
{
"file_id": "1RO0HLb7kRE0Tj_0D4E5I-vQz2QLu3CUm",
"timestamp": 1655169179306
}
]
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}