Victor: Minimum Cut
Problem Definition
In graph theory, a minimum cut of a graph is a cut whose cutset has the smallest number of elements (unweighted case), or smallest sum of weights possible (Definition taken from here).
Data set
DBLife is a prototype system that manages information for the database research community (see dblife.cs.wisc.edu). DBLife is developed by the Database group at University of Wisconsin-Madison and Yahoo! Research. It contains information about Databases papers, conferences, people and more.
Victor Model and Code
The following code shows the model specification and model instantiation for the Minimum Cut problem applied to the DBLife Web site data set.
-- This deletes the model specification DELETE MODEL SPECIFICATION cut; -- This creates the model specification CREATE MODEL SPECIFICATION cut ( model_type=(python) as (w), data_item_type=(int,int,float8) as (x,y,weight), objective=examples.CUTS.stochast_cuts.loss, objective_agg=SUM, grad_step=examples.CUTS.stochast_cuts.grad, bulk_grad=examples.CUTS.stochast_cuts.minibatch ); -- This instantiates the model CREATE MODEL INSTANCE dblife_cut_mi EXAMPLES dblife_cut(x,y,score) MODEL SPEC cut INIT_FUNCTION examples.CUTS.stochast_cuts.init STOP WHEN examples.CUTS.stochast_cuts.stopping_condition ;
1,2. Model Specification and Instantiation
This specification creates a python-type "cut" model which is stored in the database as a byte array. The data items are specified by 3 values: integer x, integer y, and float weight. We specify the loss function, and that the scores are going to be aggregated by the SUM aggregator. Finally, we define the gradient step for the model.
For instantiating the model, we specify how to initialize the model by giving it a function name. Also, we specify when we should stop refining the model.
In the code section below, you can see the loss, gradient, init, and stopping condition function that the user provides. Note that each of the functions are defined in a few lines of python.
class GraphModel: def __init__(self, nNodes, S, T, array): nNodes = max([nNodes,max(S), max(T)]) self.nodes = array for s in S: self.nodes[s] = 0.0 for t in T: self.nodes[t] = 1.0 self.terminals = set(S).union(set(T)) self.stepsize = 5e-2 self.steps = 1 def c_grad(self, int x, int y, double w, double stepsize): if self.nodes[x] == self.nodes[y]: return diff = math.copysign(stepsize*w, self.nodes[x] - self.nodes[y]) if x not in self.terminals: self.nodes[x] = clip(self.nodes[x] - diff) if y not in self.terminals: self.nodes[y] = clip(self.nodes[y] + diff) def grad(self, d): self.steps += 1 cdef double stepsize = 0.1/(1+self.steps) self.c_grad(d[0], d[1],d[2], stepsize) def loss(self, d): (x,y,w) = d return w*abs(self.nodes[x] - self.nodes[y]) def stopping_condition(s, loss): if not (s.has_key('state')): s['state'] = 0 s['last_loss'] = loss.value return False else: s['state'] += 1 print "\tlast_loss={} loss_value={} diff={}".format(s['last_loss'], loss.value, abs(s['last_loss'] - loss.value)) done = (s['state'] > 1000) or (abs(s['last_loss'] - loss.value) < 1e-3) done = done and (s['state'] > 2) s['last_loss'] = loss.value return done def init(): import src.globals.global_utils as gu conn = gu.connect_db() cur = conn.cursor() cur.execute("SELECT max(eid) FROM dblife_entities") (nNodes,) = cur.fetchone() conn.close() S = range(10) T = range(30,50) array = numpy.array([0.5]*nNodes) return (GraphModel(nNodes+1, S, T, array),array) def grad(m,data): m[0].grad(data) return m def loss(m,data): return m[0].loss(data)
Coming soon.
Running the Example
Open VICTOR_SQL/examples/CUTS/stochast_cuts_front.py and make sure that it uses the right paths for include.
Run the following commands to run the example:
$ cd VICTOR_SQL/examples/CUTs $ make $ ../../bin/victor_front.py cuts.spec