Logistic Regression
Problem and Data
We demonstrate how to run Logistic Regression on the DBLife dataset.
The problem description is the same as in this example, and the dataset can be downloaded from Bismarck Download. Running Support Vector Machine is very similar to this.
The schema of the dblife table is as follows:
 Column |        Type        |                      Modifiers
--------+--------------------+------------------------------------------------------
 did    | integer            | not null default nextval('dblife_did_seq'::regclass)
 k      | integer[]          |
 v      | double precision[] |
 label  | integer            |
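Each row is one training example stored in a sparse format: k appears to hold the indices of the non-zero features and v the corresponding values. A quick, hypothetical sanity check on the loaded data (standard PostgreSQL only, nothing Bismarck-specific) is:

-- Inspect one example and count the rows; assumes dblife has been loaded as above.
SELECT did, array_length(k, 1) AS num_nonzero_features, label FROM dblife LIMIT 1;
SELECT count(*) AS num_examples FROM dblife;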
Python-Based Front-End
The spec file for this task is given below (it is also available in the bin folder as sparse-logit-spec.py):
verbose = False
model = 'sparse_logit'
model_id = 22
data_table = 'dblife'
feature_cols = 'k, v'
label_col = 'label'
ndims = 41270
stepsize = 0.5
decay = 0.9
is_shmem = True
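Here ndims is the dimensionality of the feature space, so it should cover every feature index that appears in k. A hypothetical check (assuming k stores 1-based indices of the non-zero features) is:

-- Verify that ndims = 41270 covers the largest feature index in the data.
-- Assumes k holds 1-based indices of the non-zero features.
SELECT max(idx) AS max_feature_index
FROM (SELECT unnest(k) AS idx FROM dblife) AS t;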
The stepsize and decay values were picked for this dataset via a grid search to minimize the loss. To invoke the training, run the following command:
python bin/bismarck_front.py bin/sparse-logit-spec.py
SQL-Based Front-End
A SQL query for training the LR model is as follows:
SELECT sparse_logit('dblife', 22, 41270, 20, 1, 0.5, 0.9, 't', 't');
The same values are passed here, along with iteration = 20 and mu = 1. The column names are implicitly assumed to be the same as in the schema given above. An alternate SQL query that uses default values for many of the parameters (refer to Using Bismarck) is as follows:
SELECT sparse_logit('dblife', 22, 41270);
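For reference, the following is a commented version of the full training call above, with each positional argument mapped to its parameter name from the spec file; the names of the two trailing boolean flags are an assumption (refer to Using Bismarck for the authoritative signature):

-- Annotated training call; argument names follow the spec file above.
-- The last two flags are assumed to be is_shmem and verbose.
SELECT sparse_logit(
    'dblife',  -- data_table
    22,        -- model_id
    41270,     -- ndims
    20,        -- iteration
    1,         -- mu
    0.5,       -- stepsize
    0.9,       -- decay
    't',       -- is_shmem (assumed)
    't'        -- verbose (assumed)
);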
Model Application
The trained model can be applied for prediction using the sparse_logit_pred function:
SELECT sparse_logit_init(22);
CREATE TABLE dblife_pred AS
    SELECT did, sparse_logit_pred(22, k, v) FROM dblife;
SELECT sparse_logit_clear(22);
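To get a rough sense of the fit, the predictions in dblife_pred can be compared against the labels in dblife. The query below is only a sketch: it assumes sparse_logit_pred returns a boolean prediction and that label stores 1 for the positive class; adjust the comparison if your encoding differs.

-- Hypothetical training-set accuracy; assumes a boolean prediction column
-- and labels encoded with 1 for the positive class.
SELECT avg(CASE WHEN p.sparse_logit_pred = (d.label = 1) THEN 1.0 ELSE 0.0 END)
       AS training_accuracy
FROM dblife_pred p JOIN dblife d ON p.did = d.did;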