Factored Models
Problem and Data
We demonstrate how to run Matrix Factorization on the MovieLens1M dataset (a subset of the MovieLens dataset).
The descriptions are the same as in this example.
The schema of the mlens1m table is as follows:
 Column |       Type       |                       Modifiers
--------+------------------+-------------------------------------------------------
 did    | integer          | not null default nextval('mlens1m_did_seq'::regclass)
 row    | integer          |
 col    | integer          |
 rating | double precision |
Python-Based Front-End
The spec file for this task is as given below (also available in the bin folder as factor-spec.py):
verbose = False
model = 'factor'
model_id = 333
data_table = 'mlens1m'
feature_cols = 'row, col'
label_col = 'rating'
nrows = 6040
ncols = 3952
maxrank = 10
stepsize = 0.05
decay = 0.95
is_shmem = True
The stepsize and decay values were picked for this dataset by a grid search that minimized the loss value. The maxrank of the factored matrices is set to 10. Then, run the following command:
python bin/bismarck_front.py factor-spec.py
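To illustrate the roles of maxrank, stepsize, and decay in the spec file, here is a minimal sketch of stochastic gradient descent for low-rank factorization in plain Python. It is not Bismarck's implementation: the function name sgd_factorize, the random initialization, the squared-error loss, the 0-based indices, and the assumption that stepsize is multiplied by decay after every pass over the data are all illustrative choices.

```python
import random


def sgd_factorize(ratings, nrows, ncols, maxrank, stepsize, decay, epochs, seed=0):
    """Illustrative SGD factorization (hypothetical helper, not Bismarck code).

    ratings is a list of (row, col, rating) triples with 0-based indices.
    Returns the two factor matrices and the squared-error loss seen in each pass.
    """
    rng = random.Random(seed)
    # Assumed initialization: small random factors.
    # L is nrows x maxrank, R is ncols x maxrank.
    L = [[rng.uniform(-0.1, 0.1) for _ in range(maxrank)] for _ in range(nrows)]
    R = [[rng.uniform(-0.1, 0.1) for _ in range(maxrank)] for _ in range(ncols)]
    losses = []
    for _ in range(epochs):
        loss = 0.0
        for i, j, rating in ratings:
            pred = sum(l * r for l, r in zip(L[i], R[j]))
            err = pred - rating
            loss += err * err
            # Gradient step on 0.5 * err^2 for the two factor rows involved.
            for k in range(maxrank):
                l_ik, r_jk = L[i][k], R[j][k]
                L[i][k] -= stepsize * err * r_jk
                R[j][k] -= stepsize * err * l_ik
        losses.append(loss)
        # Assumed schedule: shrink the step size geometrically after each pass.
        stepsize *= decay
    return L, R, losses


# Tiny made-up 2 x 2 example, using the stepsize and decay from the spec file.
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 1.0)]
L, R, losses = sgd_factorize(toy_ratings, nrows=2, ncols=2, maxrank=2,
                             stepsize=0.05, decay=0.95, epochs=200)
print(losses[0], losses[-1])
```

Under these assumptions, a grid search would amount to calling such a routine for several (stepsize, decay) pairs and keeping the pair with the lowest final loss.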
SQL-Based Front-End
A SQL query for running low-rank matrix factorization (LMF) is as follows:
SELECT factor('mlens1m', 333, 6040, 3952, 10, 20, 3, 0.01, 0.05, 0.95, 't', 't');
The same values as in the spec file are passed here, along with iteration = 20 and b = 3. The column names are implicitly assumed to match those in the schema given above. An alternate SQL query that relies on default values for many of the parameters (refer to Using Bismarck) is as follows:
SELECT factor('mlens1m', 333, 6040, 3952, 10);
Model Application
The factored model can be applied for prediction (completion) using the factor_pred function:
SELECT factor_init(333);
CREATE TABLE factor_pred AS SELECT did, factor_pred(333, row, col) FROM mlens1m;
SELECT factor_clear(333);
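As a rough illustration of what prediction with a factored model usually means, the sketch below computes a completed matrix entry as the dot product of a row factor and a column factor. This is the standard low-rank formulation, assumed here rather than taken from Bismarck's source; the name predict_rating, the factor matrices L and R, and the 0-based indices are hypothetical.

```python
def predict_rating(L, R, row, col):
    """Predicted value at (row, col): dot product of the two factor rows.

    L is assumed to be nrows x maxrank and R ncols x maxrank (hypothetical layout).
    """
    return sum(l * r for l, r in zip(L[row], R[col]))


# Made-up rank-2 factors for a 2 x 2 matrix.
L = [[1.0, 2.0], [0.5, 0.0]]
R = [[2.0, 1.0], [4.0, 3.0]]

print(predict_rating(L, R, 0, 1))  # 1.0*4.0 + 2.0*3.0 = 10.0
print(predict_rating(L, R, 1, 0))  # 0.5*2.0 + 0.0*1.0 = 1.0
```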