Victor: Conditional Random Fields
Problem Definition
Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data (Definition taken from here).
Dataset
For this problem we are using a set of loaded documents.
Victor Model and Code
The following code shows the model specification and model instantiation for the Conditional Random Fields problem applied to a set of documents. This model is used to label the documents in this set.
-- This deletes the model specification
DELETE MODEL SPECIFICATION crf;

-- This creates the model specification
CREATE MODEL SPECIFICATION crf (
    model_type=(python,python) as (w,template),
    data_item_type=(python) as (featurized_docs),
    objective=examples.CRFs.crf_psql.compute_objective_item,
    objective_agg=SUM,
    grad_step=examples.CRFs.crf_psql.gradient_step
);

-- This instantiates the model
CREATE MODEL INSTANCE crf_single_doc
    EXAMPLES documents_loaded(featurized_docs)
    MODEL SPEC crf
    INIT_FUNCTION examples.CRFs.crf_psql.initialize_model
    STOP WHEN examples.CRFs.crf_psql.stopping_condition
;
This specification creates a model named "crf" whose model type is a (python,python) pair, referred to as (w,template). Each data item is a featurized document, stored as a byte array in the database and exposed as a python type. The specification names the function that computes the per-item objective (the loss), states that the per-item values are aggregated with the SUM aggregator, and names the function that performs the gradient step.
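To make the roles of these pieces concrete, here is a minimal Python sketch of a training loop that uses an initialization function, a gradient step, a per-item objective summed over all items, and a stopping condition. This is not Victor's engine: the `train` function, the toy functions, and the assumption that the stopping condition receives the history of aggregated objective values are all hypothetical and chosen only for illustration.

```python
# Hypothetical driver loop. This is NOT Victor's engine; it only illustrates
# the role that the specification assigns to each user-provided function.

def train(items, initialize_model, gradient_step, compute_objective_item,
          stopping_condition, max_epochs=100):
    model = initialize_model()
    history = []
    for _ in range(max_epochs):
        # One incremental gradient step per data item.
        for item in items:
            model = gradient_step(model, item)
        # Mirrors objective_agg=SUM: add up the per-item objective values.
        total = sum(compute_objective_item(model, item) for item in items)
        history.append(total)
        # Assumed signature: the condition sees the aggregated objective history.
        if stopping_condition(history):
            break
    return model, history


# Toy problem: fit a scalar w that minimizes the sum of (w - x)^2 / 2.
def toy_init():
    return 0.0

def toy_step(w, item):
    (x,) = item
    return w - 0.1 * (w - x)  # gradient descent with step size 0.1

def toy_objective(w, item):
    (x,) = item
    return 0.5 * (w - x) ** 2

def toy_stop(history):
    # Stop once the aggregated objective no longer changes noticeably.
    return len(history) >= 2 and abs(history[-2] - history[-1]) < 1e-9

model, history = train([(1.0,), (2.0,), (3.0,)], toy_init, toy_step,
                       toy_objective, toy_stop, max_epochs=1000)
```

In this sketch the model and each data item are passed as plain Python values, in the same (model, item) order used by the functions named in the specification.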
The code section below shows the loss and gradient functions that the user provides. Each is defined in a few lines of python using the utilities that Victor provides.
# The Gradient step function.
# We are maximizing so the rule is
# w^{k+1} = w^{k} + \nabla L(w^{k}, x)
# where L is the log likelihood.
def gradient_step((m, t), (labeled_doc,)):
    d = parse_template.template_single_document(labeled_doc, t)
    m = simple_crf.take_gradient_step(m, d)
    return (m, t)

# The objective value is the log likelihood
# Expressed below.
def compute_objective_item((m, t), (labeled_doc,)):
    d = parse_template.template_single_document(labeled_doc, t)
    z = simple_crf.compute_normalization(m, d)
    v = simple_crf.compute_weight_of_labeled(m, d)
    return z - v
To instantiate the model, we name the function that initializes it and the condition that decides when to stop refining it. Again, these functions are written in a few lines of python code, as seen below:
def internal_initialize_model(template_file_name, stepsize):
    t = parse_template.parse_template(template_file_name)
    m = simple_crf.build_model([], [], stepsize)
    return (m, t)

def initialize_model():
    return internal_initialize_model('<100
Coming soon.
Running the Example
Open VICTOR_SQL/examples/CRFs/crf_psql.py and make sure that it uses the right path to your VICTOR_SQL folder.
Use the following commands to run the example:
$ cd VICTOR_SQL/examples/CRFs
$ make
$ ../../bin/victor_front.py crf_spec