Factored Models

Problem and Data

We demonstrate how to run Matrix Factorization on the MovieLens1M dataset (a subset of the MovieLens dataset). The descriptions are the same as in this example. The schema of the mlens1m table is as follows:

 Column |       Type       |                       Modifiers                       
--------+------------------+-------------------------------------------------------
 did    | integer          | not null default nextval('mlens1m_did_seq'::regclass)
 row    | integer          | 
 col    | integer          | 
 rating | double precision | 
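
For readers preparing the table themselves, the following is a minimal sketch of turning raw MovieLens 1M rating lines into (row, col, rating) triples that match the schema above. It assumes the standard ratings.dat layout, UserID::MovieID::Rating::Timestamp, and assumes that users index rows and movies index columns. The function names parse_rating_line and to_tsv are hypothetical and are not part of Bismarck.

```python
def parse_rating_line(line):
    """Turn 'UserID::MovieID::Rating::Timestamp' into (row, col, rating)."""
    user_id, movie_id, rating, _timestamp = line.strip().split("::")
    # Assumption: users index rows and movies index columns.
    return int(user_id), int(movie_id), float(rating)


def to_tsv(lines):
    """Yield tab-separated 'row<TAB>col<TAB>rating' lines, skipping blank lines."""
    for line in lines:
        if line.strip():
            row, col, rating = parse_rating_line(line)
            yield f"{row}\t{col}\t{rating}"
```

The tab-separated output could then be loaded with PostgreSQL's COPY into the row, col, and rating columns, leaving did to its sequence default.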

Python-Based Front-End

The spec file for this task is given below (it is also available in the bin folder as factor-spec.py):

verbose = False
model = 'factor'
model_id = 333
data_table = 'mlens1m'
feature_cols = 'row, col'
label_col = 'rating'
nrows = 6040
ncols = 3952
maxrank = 10
stepsize = 0.05
decay = 0.95
is_shmem = True

The stepsize and decay values were chosen for this dataset by a grid search that minimized the loss value. The maximum rank (maxrank) of the factored matrices is set to 10. To run the task, use the following command:

python bin/bismarck_front.py factor-spec.py
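
To make the roles of stepsize, decay, and maxrank in the spec file more concrete, here is a minimal pure-Python sketch of stochastic gradient descent for matrix factorization with a geometrically decaying step size. It is an illustration under stated assumptions, not Bismarck's implementation: the function names, the initialization range, the plain squared-error loss, and the choice to apply the decay once per pass over the data are all assumptions.

```python
import random


def sgd_factorize(ratings, nrows, ncols, maxrank, stepsize, decay, epochs, seed=0):
    """Fit ratings[(i, j)] ~= dot(L[i], R[j]) by SGD on the squared error."""
    rng = random.Random(seed)
    # Assumption: small positive random initial factors.
    L = [[rng.uniform(0.0, 0.1) for _ in range(maxrank)] for _ in range(nrows)]
    R = [[rng.uniform(0.0, 0.1) for _ in range(maxrank)] for _ in range(ncols)]
    examples = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(examples)
        for (i, j), value in examples:
            err = sum(a * b for a, b in zip(L[i], R[j])) - value
            for k in range(maxrank):
                li, rj = L[i][k], R[j][k]
                L[i][k] -= stepsize * err * rj
                R[j][k] -= stepsize * err * li
        stepsize *= decay  # shrink the step after every pass over the data
    return L, R


def squared_loss(ratings, L, R):
    """Sum of squared errors over the observed ratings."""
    return sum((sum(a * b for a, b in zip(L[i], R[j])) - v) ** 2
               for (i, j), v in ratings.items())
```

A grid search of the kind mentioned above would call sgd_factorize for each candidate (stepsize, decay) pair and keep the pair with the smallest squared_loss.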

SQL-Based Front-End

An SQL query for running low-rank matrix factorization (LMF) is as follows:

SELECT factor('mlens1m', 333, 6040, 3952, 10, 20, 3, 0.01, 0.05, 0.95, 't', 't');

The same values are passed here, together with iteration = 20 and b = 3. The column names are implicitly assumed to match those in the schema above. An alternative SQL query that relies on implicit default values for many of the parameters (see Using Bismarck) is as follows:

SELECT factor('mlens1m', 333, 6040, 3952, 10);
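
The short form of the query can also be issued from a Python script. The sketch below assumes a DB-API cursor whose driver accepts %s placeholders (psycopg2 is one such driver) and assumes that the five arguments are, in order, the data table, model id, number of rows, number of columns, and maximum rank, as inferred from the spec file values. The helper name run_factor is hypothetical.

```python
def run_factor(cursor, data_table, model_id, nrows, ncols, maxrank):
    """Issue the five-argument factor() call through a DB-API cursor."""
    # Assumption: the driver accepts %s placeholders, as psycopg2 does.
    cursor.execute(
        "SELECT factor(%s, %s, %s, %s, %s)",
        (data_table, model_id, nrows, ncols, maxrank),
    )
    return cursor.fetchone()
```

For this dataset the call would be run_factor(cursor, 'mlens1m', 333, 6040, 3952, 10), with the cursor obtained from whatever database connection the reader already uses.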

Model Application

The factored model can be applied for prediction (completion) using the factor_pred function:

SELECT factor_init(333);
CREATE TABLE factor_pred AS SELECT did, factor_pred(333, row, col) FROM mlens1m;
SELECT factor_clear(333);
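
As a rough illustration of what prediction (completion) with a factored model means, the sketch below computes a predicted rating as the inner product of a row factor and a column factor, and measures the root-mean-square error over observed ratings. Whether factor_pred computes exactly this is an assumption; the names predict and rmse, and the list-of-lists representation of the factors L and R, are hypothetical.

```python
import math


def predict(L, R, row, col):
    """Predicted rating: inner product of the row factor and the column factor."""
    return sum(a * b for a, b in zip(L[row], R[col]))


def rmse(ratings, L, R):
    """Root-mean-square error of the predictions over the observed ratings."""
    total = sum((predict(L, R, i, j) - v) ** 2 for (i, j), v in ratings.items())
    return math.sqrt(total / len(ratings))
```

Comparing such predictions against the rating column of mlens1m gives a simple check of how well the factored model fits the observed entries.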