# Victor: Factored Models

## Problem Definition

Factored models arise when approximating low-rank matrices in multi-dimensional scaling, principal component analysis, and multi-task classification. In this class of problems, we simultaneously partition both the model and the data.

## Data set

For this problem, we use the MovieLens data set, which contains ratings and tags applied to movies by users of the online movie recommender service MovieLens.
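The model below consumes examples as (row, col, rating) triples. As a minimal sketch of how such triples could be produced, the following assumes the `UserID::MovieID::Rating::Timestamp` line format of the `ratings.dat` file in the MovieLens 1M release. The function name `parse_ratings` and the sample lines are illustrative only; how the data was actually loaded into the `movielens1m` table is not described here.

```python
def parse_ratings(lines):
    """Turn 'UserID::MovieID::Rating::Timestamp' lines into (row, col, rating) triples."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, movie_id, rating, _timestamp = line.split("::")
        # The user id becomes the row, the movie id the column; the timestamp is dropped.
        triples.append((int(user_id), int(movie_id), float(rating)))
    return triples


# Illustrative sample lines in the assumed format.
sample = ["1::1193::5::978300760", "1::661::3::978302109", ""]
triples = parse_ratings(sample)
print(triples)
```

The ids stay 1-based here, which matches the `row - 1` and `col - 1` adjustments made in the loss function.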

## Victor Model and Code

The following code shows the model specification and model instantiation for the Factored Model problem applied to the MovieLens data set.

```
-- This deletes the model specification
DELETE MODEL SPECIFICATION low_rank_nopara;

-- This creates the model specification
CREATE MODEL SPECIFICATION low_rank_nopara (
    model_type=(python) as (w),
    data_item_type=(int,int,float8) as (row,col,rating),
    objective=examples.factor_simple.factor_simple.se_loss,
    objective_agg=SUM,
);

-- This instantiates the model
CREATE MODEL INSTANCE movielens1m
    EXAMPLES movielens1m(row,col,rating)
    MODEL SPEC low_rank_nopara
    INIT_FUNCTION examples.factor_simple.factor_simple.initialize_model
    STOP WHEN examples.factor_simple.factor_simple.stopping_condition
;
```

This specification creates a Python-type model named `low_rank_nopara`, which is stored in the database as a byte array. Each data item consists of three values (row, column, and rating), stored as an integer, an integer, and a float, respectively. The specification names the loss function and states that the per-example scores are aggregated by the SUM aggregator. Finally, we define the gradient step for the model.
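Read together, the squared-error loss and the SUM aggregator suggest the objective below. This is a sketch under assumed notation, not a formula from the Victor documentation: $L_i$ and $R_j$ stand for row $i$ of the left factor and row $j$ of the right factor, $M_{ij}$ for the observed rating, and $E$ for the set of (row, col) pairs present in the examples table.

$$
\min_{L,\,R} \; \sum_{(i,j) \in E} \left( \langle L_i, R_j \rangle - M_{ij} \right)^2
$$

Any constraint on the factors, such as a norm bound, is left out of this formula.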

The code below shows the loss and gradient functions that the user provides. Each is defined in a few lines of Python using the utilities that Victor provides.

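```
def se_loss(m, row, col, rating):
    # Squared error of the predicted rating for one (row, col, rating) example.
    i = row - 1  # matlab... (ids are 1-based, Python lists are 0-based)
    j = col - 1
    v = low_rank_helper.dot(m.L[i], m.R[j]) - rating
    return v * v

# The header of the gradient function is missing from this listing;
# the name and signature below are assumed, mirroring se_loss.
def se_loss_grad(m, row, col, rating):
    L = m.L
    R = m.R
    i = row - 1  # matlab...
    j = col - 1
    err = low_rank_helper.dot(L[i], R[j]) - rating
    e = -(m.stepsize * err)  # signed step; not used by any line shown here
    tempLi = list(L[i])      # copy of the current row factor
    L[i] = tempLi
    # Project both factors back onto the ball described by m.B and m.B2.
    low_rank_helper.ball_project(L[i], m.B, m.B2)
    low_rank_helper.ball_project(R[j], m.B, m.B2)
    # Write the factors back into the model.
    m.set_L(i, L[i])
    m.set_R(j, R[j])
    return m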
```
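The gradient listing above computes the signed step `e` but shows no line that applies it. As a hedged illustration of what a stochastic gradient step for this squared loss usually looks like, the self-contained sketch below updates both factors and then projects them. The helpers `dot`, `ball_project`, and `sgd_step` are hypothetical stand-ins written for this example; they are not Victor's `low_rank_helper` functions, and the single-radius projection is an assumption.

```python
import math


def dot(x, y):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(x, y))


def ball_project(v, radius):
    """Hypothetical projection: rescale v in place if its L2 norm exceeds radius."""
    norm = math.sqrt(dot(v, v))
    if norm > radius:
        scale = radius / norm
        for k in range(len(v)):
            v[k] *= scale


def sgd_step(L, R, row, col, rating, stepsize, radius):
    """One stochastic gradient step on (dot(L[i], R[j]) - rating)**2."""
    i, j = row - 1, col - 1          # 1-based ids to 0-based indices
    err = dot(L[i], R[j]) - rating
    e = -(stepsize * err)            # the factor 2 of the gradient is folded into stepsize
    old_Li = list(L[i])              # copy so that R[j] is updated with the old L[i]
    for k in range(len(old_Li)):
        L[i][k] += e * R[j][k]       # R[j][k] is still the old value at this point
        R[j][k] += e * old_Li[k]
    ball_project(L[i], radius)
    ball_project(R[j], radius)


L = [[0.5, 0.5], [0.1, 0.2]]
R = [[0.5, 0.5], [0.3, 0.4]]
before = (dot(L[0], R[0]) - 4.0) ** 2
sgd_step(L, R, row=1, col=1, rating=4.0, stepsize=0.1, radius=10.0)
after = (dot(L[0], R[0]) - 4.0) ** 2
print(before, after)
```

Copying `L[i]` before the update mirrors the `tempLi = list(L[i])` line in the listing: both factors are moved using each other's pre-update values.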

To instantiate the model, we name a function that initializes it and a function that decides when to stop refining it. Again, these functions take only a few lines of Python, as shown below:

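```
def initialize_model():
    nRows = 6040
    nCols = 3952  # TODO QUERY FOR THESE
    return LowRankModel(20, nRows, nCols, 1.5)

def stopping_condition(s, loss):
    # s is a state dictionary that persists across calls; count the calls in it.
    if 'state' not in s:
        s['state'] = 0
    s['state'] += 1
    # Stop once the condition has been checked more than five times.
    return s['state'] > 5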
```
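To make the behaviour of the stopping condition concrete, the sketch below calls it from a simple driver loop. The driver `run_epochs` is hypothetical: it assumes that the condition is checked once per pass over the data and that the same state dictionary is handed back on every call, which is how the counter in `s['state']` appears to be intended. How Victor actually invokes the function is not shown here.

```python
def stopping_condition(s, loss):
    # Same logic as the listing above: count calls in the persistent dict s.
    if 'state' not in s:
        s['state'] = 0
    s['state'] += 1
    return s['state'] > 5


def run_epochs(losses):
    """Hypothetical driver: check the condition once per epoch until it fires."""
    state = {}                       # persists across calls
    epochs = 0
    for loss in losses:
        epochs += 1
        if stopping_condition(state, loss):
            break
    return epochs


# Made-up loss values; the condition ignores them and only counts calls.
epochs_run = run_epochs([10.0 - k for k in range(20)])
print(epochs_run)
```

Under these assumptions the condition first returns true on its sixth call, so the loop stops after six passes regardless of the loss values.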


## Running the Example

Coming soon.