Victor: Factored Models
Problem Definition
Factored models arise when a matrix is approximated by a product of low-rank factors, as in multi-dimensional scaling, principal component analysis, and multi-task classification. In this class of problems, we partition the model and the data simultaneously.
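One standard way to state such a problem, assuming an m×n matrix M observed on a set Ω of entries and a target rank r, is the following. Here L_i is the i-th row of L and R_j is the j-th row of R; the squared-error term is the same per-entry loss that se_loss computes further down this page.

```latex
\min_{L \in \mathbb{R}^{m \times r},\; R \in \mathbb{R}^{n \times r}}
\;\sum_{(i,j) \in \Omega} \bigl( L_i^{\top} R_j - M_{ij} \bigr)^2
```

Each term depends only on L_i and R_j, which is what allows the data items and the rows of the model to be partitioned together.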
Data set
For this problem, we use the MovieLens data set, which contains ratings and tags applied to movies by users of the online movie recommender service MovieLens.
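The movielens1m table used below holds one (row, col, rating) triple per rating. As a sketch, assuming the MovieLens 1M `ratings.dat` layout `UserID::MovieID::Rating::Timestamp`, raw lines could be turned into such triples as follows. The function name `parse_ratings` is hypothetical and not part of Victor, and the sample lines are only illustrative.

```python
from typing import Iterable, List, Tuple


def parse_ratings(lines: Iterable[str]) -> List[Tuple[int, int, float]]:
    """Convert 'UserID::MovieID::Rating::Timestamp' lines into
    (row, col, rating) triples. The timestamp is dropped and the
    1-based ids are kept as-is."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, movie_id, rating, _timestamp = line.split("::")
        triples.append((int(user_id), int(movie_id), float(rating)))
    return triples


sample = ["1::1193::5::978300760", "1::661::3::978302109"]
print(parse_ratings(sample))
```

The ids are deliberately left 1-based, because the loss and gradient functions on this page subtract 1 themselves.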
Victor Model and Code
The following code shows the model specification and model instantiation for the Factored Model problem applied to the MovieLens data set.
-- This deletes the model specification
DELETE MODEL SPECIFICATION low_rank_nopara;
-- This creates the model specification
CREATE MODEL SPECIFICATION low_rank_nopara (
    model_type=(python) as (w),
    data_item_type=(int,int,float8) as (rowr,col,rating),
    objective=examples.factor_simple.factor_simple.se_loss,
    objective_agg=SUM,
    grad_step=examples.factor_simple.factor_simple.grad
);
-- This instantiates the model
CREATE MODEL INSTANCE movielens1m
EXAMPLES movielens1m(row,col,rating)
MODEL SPEC low_rank_nopara
INIT_FUNCTION examples.factor_simple.factor_simple.initialize_model
STOP WHEN examples.factor_simple.factor_simple.stopping_condition
;
This specification defines "low_rank_nopara", a model of type python, meaning the model is a Python object stored in the database as a byte array. Each data item is a triple of row, column, and rating, stored as an integer, an integer, and a float8 (double-precision float), respectively. The specification also names the loss function (objective), states that the per-example losses are combined with the SUM aggregator (objective_agg), and names the gradient step that updates the model (grad_step).
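To make `objective_agg=SUM` concrete: the reported objective is the sum of the per-example losses over all data items. The sketch below mimics that in plain Python. The `TinyModel` class, the local `dot` helper, and `total_loss` are hypothetical stand-ins, not Victor code; only the squared-error loss mirrors `se_loss`.

```python
from typing import List, Sequence, Tuple


class TinyModel:
    """Hypothetical stand-in for the stored model: L holds row factors,
    R holds column factors."""

    def __init__(self, L: List[List[float]], R: List[List[float]]):
        self.L = L
        self.R = R


def dot(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def se_loss(m: TinyModel, row: int, col: int, rating: float) -> float:
    # ids are 1-based in the data, Python lists are 0-based
    v = dot(m.L[row - 1], m.R[col - 1]) - rating
    return v * v


def total_loss(m: TinyModel, examples: List[Tuple[int, int, float]]) -> float:
    # objective_agg=SUM: add up the loss of every data item
    return sum(se_loss(m, r, c, y) for r, c, y in examples)


m = TinyModel(L=[[1.0, 0.0], [0.0, 1.0]], R=[[2.0, 1.0], [1.0, 3.0]])
examples = [(1, 1, 2.0), (1, 2, 0.0), (2, 2, 4.0)]
print(total_loss(m, examples))  # 0 + 1 + 1 = 2.0
```

Because SUM is associative, the per-example losses can be computed on separate partitions of the data and added afterwards.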
The code below contains the loss and gradient functions that the user provides. Each is only a few lines of Python, written with the helper utilities (low_rank_helper) that Victor provides.
def se_loss(m, row, col, rating):
    i = row - 1  # ids are 1-based (MATLAB-style); convert to 0-based
    j = col - 1
    v = low_rank_helper.dot(m.L[i], m.R[j]) - rating
    return v*v

def grad(m, row, col, rating):
    L = m.L
    R = m.R
    i = row - 1  # ids are 1-based (MATLAB-style); convert to 0-based
    j = col - 1
    err = low_rank_helper.dot(L[i], R[j]) - rating
    e = -(m.stepsize * err)
    tempLi = list(L[i])  # copy, so the R[j] update below uses the old L[i]
    low_rank_helper.scale_and_add(tempLi, R[j], e)
    low_rank_helper.scale_and_add(R[j], L[i], e)
    L[i] = tempLi
    low_rank_helper.ball_project(L[i], m.B, m.B2)
    low_rank_helper.ball_project(R[j], m.B, m.B2)
    m.set_L(i, L[i])
    m.set_R(j, R[j])
    return m
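The gradient step above is one stochastic-gradient update on a single rating. The self-contained sketch below reproduces that update in plain Python so it can be run without Victor. The helper semantics are assumptions inferred from how grad calls them: `scale_and_add(x, y, a)` is taken to mean x ← x + a·y in place, and `ball_project` is simplified to a projection onto a Euclidean ball of a single radius, since the meaning of the second bound `B2` is not documented here.

```python
import math
from typing import List


def dot(x: List[float], y: List[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def scale_and_add(x: List[float], y: List[float], a: float) -> None:
    # Assumed semantics: x <- x + a * y, updated in place
    for k in range(len(x)):
        x[k] += a * y[k]


def ball_project(x: List[float], radius: float) -> None:
    # Assumed semantics: rescale x so its Euclidean norm is at most radius
    norm = math.sqrt(dot(x, x))
    if norm > radius:
        s = radius / norm
        for k in range(len(x)):
            x[k] *= s


def sgd_step(L, R, row, col, rating, stepsize, radius):
    i, j = row - 1, col - 1        # 1-based ids -> 0-based indices
    err = dot(L[i], R[j]) - rating
    e = -stepsize * err
    new_Li = list(L[i])            # copy so the R[j] update sees the old L[i]
    scale_and_add(new_Li, R[j], e)
    scale_and_add(R[j], L[i], e)   # L[i] still holds the old row here
    L[i] = new_Li
    ball_project(L[i], radius)
    ball_project(R[j], radius)


L = [[0.5, 0.5]]
R = [[0.5, 0.5]]
before = (dot(L[0], R[0]) - 4.0) ** 2
sgd_step(L, R, 1, 1, 4.0, stepsize=0.1, radius=10.0)
after = (dot(L[0], R[0]) - 4.0) ** 2
print(before, after)  # the squared error shrinks after one step
```

The derivative of (L_i·R_j − r)² with respect to L_i is 2·err·R_j; like grad, this sketch drops the factor 2, which amounts to absorbing it into the step size.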
To instantiate the model, we name a function that initializes it (INIT_FUNCTION) and a function that decides when to stop refining it (STOP WHEN). Again, each is only a few lines of Python:
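def initialize_model():
    # Number of users and highest movie id in MovieLens 1M, hard-coded
    nRows = 6040
    nCols = 3952  # TODO QUERY FOR THESE
    return LowRankModel(20, nRows, nCols, 1.5)

def stopping_condition(s, loss):
    # Counts calls in the state dict s; the loss argument is ignored.
    # ('state' not in s) replaces dict.has_key(), which Python 3 removed.
    if 'state' not in s:
        s['state'] = 0
    s['state'] += 1
    return s['state'] > 5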
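How often Victor calls the stopping condition is not specified on this page. The sketch below assumes one call per full pass over the data and simulates that control flow in plain Python; `run_epochs` and its arguments are hypothetical, and Victor's actual engine runs inside the database.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[int, int, float]


def stopping_condition(s: Dict[str, int], loss: float) -> bool:
    # Same logic as the function above: count calls, ignore the loss
    if 'state' not in s:
        s['state'] = 0
    s['state'] += 1
    return s['state'] > 5


def run_epochs(examples: List[Example],
               step_fn: Callable[[int, int, float], None],
               loss_fn: Callable[[int, int, float], float]) -> int:
    """Hypothetical driver: one gradient pass over the data per epoch,
    then one call to stopping_condition with the summed loss."""
    state: Dict[str, int] = {}
    epochs = 0
    while True:
        for row, col, rating in examples:
            step_fn(row, col, rating)
        epochs += 1
        loss = sum(loss_fn(r, c, y) for r, c, y in examples)
        if stopping_condition(state, loss):
            return epochs


# Trivial step and loss functions, just to exercise the control flow
print(run_epochs([(1, 1, 4.0)], lambda r, c, y: None, lambda r, c, y: 0.0))
```

Under this assumption the loop runs six passes, because the counter first exceeds 5 on the sixth call. A loss-based rule (for example, stopping when the loss stops decreasing) would use the loss argument instead.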
Running the Example
Coming soon.