Victor: Minimum Cut

Problem Definition

In graph theory, a minimum cut of a graph is a cut whose cutset has the smallest number of elements (unweighted case), or smallest sum of weights possible (Definition taken from here).

Data set

DBLife is a prototype system that manages information for the database research community (see DBLife is developed by the Database group at University of Wisconsin-Madison and Yahoo! Research. It contains information about Databases papers, conferences, people and more.

Victor Model and Code

The following code shows the model specification and model instantiation for the Minimum Cut problem applied to the DBLife Web site data set.

-- This deletes the model specification

-- This creates the model specification
   model_type=(python) as (w),
   data_item_type=(int,int,float8) as (x,y,weight),   

-- This instantiates the model
   EXAMPLES dblife_cut(x,y,score)
   INIT_FUNCTION examples.CUTS.stochast_cuts.init
   STOP WHEN examples.CUTS.stochast_cuts.stopping_condition

1,2. Model Specification and Instantiation

This specification creates a python-type "cut" model which is stored in the database as a byte array. The data items are specified by 3 values: integer x, integer y, and float weight. We specify the loss function, and that the scores are going to be aggregated by the SUM aggregator. Finally, we define the gradient step for the model.

For instantiating the model, we specify how to initialize the model by giving it a function name. Also, we specify when we should stop refining the model.

In the code section below, you can see the loss, gradient, init, and stopping condition function that the user provides. Note that each of the functions are defined in a few lines of python.

class GraphModel:
   def __init__(self, nNodes, S, T, array):
      nNodes = max([nNodes,max(S), max(T)])
      self.nodes = array
      for s in S: self.nodes[s] = 0.0
      for t in T: self.nodes[t] = 1.0
      self.terminals     = set(S).union(set(T))
      self.stepsize      = 5e-2
      self.steps         = 1

   def c_grad(self, int x, int y, double w, double stepsize):
      if self.nodes[x] == self.nodes[y]:

      diff = math.copysign(stepsize*w, self.nodes[x] - self.nodes[y])

      if x not in self.terminals: self.nodes[x] = clip(self.nodes[x] - diff)
      if y not in self.terminals: self.nodes[y] = clip(self.nodes[y] + diff)

   def grad(self, d):
      self.steps += 1
      cdef double stepsize = 0.1/(1+self.steps)
      self.c_grad(d[0], d[1],d[2], stepsize)

   def loss(self, d):
      (x,y,w) = d
      return w*abs(self.nodes[x] - self.nodes[y])
def stopping_condition(s, loss):
   if not (s.has_key('state')):
      s['state'] = 0
      s['last_loss'] = loss.value
      return False
      s['state'] += 1
      print "\tlast_loss={} loss_value={} diff={}".format(s['last_loss'], loss.value, abs(s['last_loss'] - loss.value))
      done = (s['state'] > 1000) or (abs(s['last_loss'] - loss.value) < 1e-3)
      done = done and (s['state'] > 2)
      s['last_loss'] = loss.value
      return done

def init():
   import src.globals.global_utils as gu
   conn = gu.connect_db()
   cur  = conn.cursor()

   cur.execute("SELECT max(eid) FROM dblife_entities")
   (nNodes,) = cur.fetchone()
   S     = range(10)
   T     = range(30,50)
   array = numpy.array([0.5]*nNodes)
   return (GraphModel(nNodes+1, S, T, array),array)

def grad(m,data):
   return m

def loss(m,data):
   return m[0].loss(data)

3. Model Application

Coming soon.

Running the Example

Open VICTOR_SQL/examples/CUTS/ and make sure that it uses the right paths for include.

Run the following commands to run the example:

$ cd VICTOR_SQL/examples/CUTs
$ make
$ ../../bin/ cuts.spec