Columbus Usage with Examples

We now explain the syntax for using Columbus operations to perform exploratory feature selection. The operations are invoked from the R console.

Get the Dataset Names and Handle

An analyst can list the available dataset names and obtain a handle for a dataset as follows:

GetDatasetnames()
[1] "Telecom"
id <- GetDatasetId("Telecom")

GetDatasetnames retrieves the available datasets by querying the database. The analyst can then print the features in the dataset by issuing the following command:

print(GetFeatureNames(GetFeatureIndices(id), dataset.id = id))

Feature Set Operations
In the Columbus system, three feature set operations are supported:

AssignFeatureSet
AddFeatureSet
DelFeatureSet

An example usage of the feature set operations is illustrated below:

feat.set.1 <- AssignFeatureSet(c("DATAVOLUME", "NUMMMSOUT", "NUMVASOUT", "NUMSMSVASINC"), dataset.id = id)
feat.set.2 <- AssignFeatureSet(c("DURATIONFIXEDINC", "NUMSMSCMPINC", "NUMSMSINTEROUT"), dataset.id = id)
feat.set.3 <- AddFeatureSet(feat.set.1, feat.set.2, dataset.id = id)
feat.set.4 <- AssignFeatureSet(c("NUMSMSINTEROUT"), dataset.id = id)
feat.set.5 <- DelFeatureSet(feat.set.3, feat.set.4, dataset.id = id)

The data types of the parameters are given below:

FeatureSetVector : [Quoted string] or [Integer] Denotes the feature names in the dataset. If integer values are given, the indices are mapped internally to feature names.
dataset.id : [Integer] Dataset identifier.
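
The set semantics of these operations can be pictured with plain R vectors. The sketch below is not Columbus code: it assumes that AddFeatureSet behaves like a set union and DelFeatureSet like a set difference, and the `all.features` vector and the `to_names` helper are hypothetical stand-ins for the dataset's feature list and the index-to-name mapping.

```r
# Hypothetical feature catalogue standing in for the features of a dataset.
all.features <- c("DATAVOLUME", "NUMMMSOUT", "NUMVASOUT", "NUMSMSVASINC",
                  "DURATIONFIXEDINC", "NUMSMSCMPINC", "NUMSMSINTEROUT")

# Map integer indices to feature names; pass character names through unchanged.
to_names <- function(x) {
  if (is.numeric(x)) all.features[x] else as.character(x)
}

set.a <- to_names(c(1, 2, 3, 4))
set.b <- to_names(c("DURATIONFIXEDINC", "NUMSMSCMPINC", "NUMSMSINTEROUT"))

added   <- union(set.a, set.b)               # assumed analogue of AddFeatureSet
deleted <- setdiff(added, "NUMSMSINTEROUT")  # assumed analogue of DelFeatureSet
print(deleted)
```

Under these assumptions, `added` holds seven feature names and `deleted` holds six, mirroring feat.set.3 and feat.set.5 in the example above.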

Descriptive Statistic Operation
In the Columbus system, the following descriptive statistic operations are supported:

CorrelationX: Given a feature set and a dataset id, the function computes the pairwise correlation among the features in the dataset.
CorrelationY: The function computes the correlation of the features with the target. Note that the target is implied by the dataset.id given.
CoeffLearner: The function learns the coefficients for the given feature set and dataset. The function is generic, and two learning methods are currently supported: Incremental Gradient Descent (IGD) and Conjugate Gradient (CG). The configuration parameters of the learning methods are exposed to the user, so that she can specify appropriate values.
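
To make the two correlation operations concrete, here is a small base R sketch of the quantities they describe. It does not call Columbus: the data are synthetic, the column names are borrowed from the examples, and Pearson correlation (the default of `cor`) is an assumption, since the correlation measure is not stated here.

```r
set.seed(1)

# Synthetic data; the values are made up for illustration only.
n <- 200
X <- data.frame(DATAVOLUME = rnorm(n), NUMMMSOUT = rnorm(n), NUMVASOUT = rnorm(n))
y <- 2 * X$DATAVOLUME - X$NUMMMSOUT + rnorm(n, sd = 0.5)

corr.x <- cor(X)          # pairwise correlation among the features
corr.y <- cor(X, y)[, 1]  # correlation of each feature with the target

print(round(corr.x, 2))
print(round(corr.y, 2))
```

The first result is a symmetric feature-by-feature matrix with ones on the diagonal; the second is one value per feature.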

An example usage of the descriptive statistic operations is illustrated below:

corrx.val <- CorrelationX(feat.set.1, dataset.id = id)
corry.val <- CorrelationY(feat.set.2, dataset.id = id)
igd.coef.learn <- CoeffLearner(feat.set.1, type="igd", num.iters = 5, step.size = 0.01, decay = 1, init.wt = 0)
cg.coef.learn <- CoeffLearner(feat.set.2, type="cg", num.iters = 5, init.wt = 0)

The data types of the parameters are given below:

type : [Quoted string] Identifies the type of coefficient learner. Allowed string constants: "igd", "cg"
num.iters : [Integer] Denotes the number of iterations the coefficient learner should run.
step.size : [Float] Denotes the learning rate in IGD.
decay : [Float] Denotes the decay value in IGD.
init.wt : [Float] Initial weight to be assigned.
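
The role of these parameters can be illustrated with a minimal incremental gradient descent loop in base R. This is a sketch under stated assumptions, not the Columbus implementation: it assumes a squared-error loss, one weight update per row, and a step size that is multiplied by `decay` after each pass over the data (so `decay = 1` keeps it constant). The function name `igd_fit` is hypothetical.

```r
# Incremental gradient descent for least squares: one weight update per row.
igd_fit <- function(X, y, num.iters = 5, step.size = 0.01, decay = 1, init.wt = 0) {
  w <- rep(init.wt, ncol(X))               # every weight starts at init.wt
  for (iter in seq_len(num.iters)) {
    for (i in seq_len(nrow(X))) {
      err <- sum(X[i, ] * w) - y[i]        # residual on a single example
      w <- w - step.size * err * X[i, ]    # gradient step for that example
    }
    step.size <- step.size * decay         # assumed schedule: rescale after each pass
  }
  w
}

set.seed(42)
X <- cbind(1, matrix(rnorm(400), ncol = 2))  # intercept column plus two features
y <- as.vector(X %*% c(1, 2, -3)) + rnorm(200, sd = 0.1)

w <- igd_fit(X, y, num.iters = 5, step.size = 0.01, decay = 1, init.wt = 0)
print(round(w, 2))
```

On this synthetic data the learned weights land close to the generating values 1, 2 and -3; a larger `step.size` converges in fewer passes but is noisier.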

Evaluate Operation
In the Columbus system, a feature set evaluation involves two phases: train and test. The train phase learns coefficients for the features, and the test phase computes either a cross-validation score or an Akaike Information Criterion (AIC) score. An example usage of the evaluate operation is given below:

cv.eval <- Evaluate(feat.set.1, train.type = "igd", eval.type = "cv", num.iters = 5, step.size = 0.01, decay = 1, init.wt = 0, num.folds = 5)
aic.eval <- Evaluate(feat.set.1, train.type = "cg", eval.type = "aic", num.iters = 5, init.wt = 0)

Additional parameters for the evaluate operation include:

eval.type : [Quoted string] Denotes the evaluation type to be used. Valid string constants are "cv" and "aic".
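
The two evaluation types can be sketched in base R as follows. This is an illustration, not Columbus code: ordinary least squares via `lm` stands in for the IGD and CG learners, mean squared error is assumed as the cross-validation score, and the helper names `cv_mse` and `aic_score` are hypothetical.

```r
set.seed(7)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 1.5 * d$x1 - 2 * d$x2 + rnorm(n)   # x3 carries no signal

# k-fold cross-validation: train on k-1 folds, score on the held-out fold.
cv_mse <- function(features, data, num.folds = 5) {
  fold <- sample(rep(seq_len(num.folds), length.out = nrow(data)))
  form <- reformulate(features, response = "y")
  errs <- sapply(seq_len(num.folds), function(k) {
    fit  <- lm(form, data = data[fold != k, ])
    pred <- predict(fit, newdata = data[fold == k, ])
    mean((data$y[fold == k] - pred)^2)
  })
  mean(errs)
}

# AIC of the model fitted on all rows; lower is better.
aic_score <- function(features, data) {
  AIC(lm(reformulate(features, response = "y"), data = data))
}

cv.good  <- cv_mse(c("x1", "x2"), d)
cv.bad   <- cv_mse("x3", d)
aic.good <- aic_score(c("x1", "x2"), d)
aic.bad  <- aic_score("x3", d)
```

Both scores rank the informative feature set ahead of the noise-only one, which is the comparison an evaluation is meant to support.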

Explore Operation
In the Columbus system, given a feature set, the explore operations can be used to add one feature to it or delete one feature from it, drawing on the available set of features. Each operation evaluates a group of candidate feature sets and chooses the best one. An example usage of the explore operations is given below:

add.feat.set <- StepAdd(inp.set = feat.set.1, mask.set = feat.set.2, train.type = "igd", eval.type = "cv", num.iters = 5, step.size = 0.01, decay = 1, init.wt = 0, num.folds = 5)
del.feat.set <- StepDel(inp.set = feat.set.1, train.type = "cg", eval.type = "aic", num.iters = 5, init.wt = 0)

Additional parameters for the explore operations include:

mask.set : [Feature set] Denotes the list of features that should be omitted while adding a new feature.
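
A single add step with a mask can be sketched in base R as below. This is a hypothetical illustration rather than the Columbus implementation: `lm` with AIC stands in for the train and evaluate phases, the function `step_add` and its `all.features` argument are invented for the example, and the sketch always adds the best-scoring candidate, which may differ from how StepAdd decides.

```r
set.seed(11)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- d$x1 + 3 * d$x2 + 2 * d$x4 + rnorm(n)

# Score a feature set by the AIC of a least-squares fit (lower is better).
aic_of <- function(features) {
  AIC(lm(reformulate(features, response = "y"), data = d))
}

# Try each feature that is neither already chosen nor masked; keep the best one.
step_add <- function(inp.set, mask.set = character(0), all.features) {
  candidates <- setdiff(all.features, c(inp.set, mask.set))
  if (length(candidates) == 0) return(inp.set)
  scores <- sapply(candidates, function(f) aic_of(c(inp.set, f)))
  c(inp.set, candidates[which.min(scores)])
}

feats    <- c("x1", "x2", "x3", "x4")
unmasked <- step_add("x1", all.features = feats)
masked   <- step_add("x1", mask.set = "x2", all.features = feats)
print(unmasked)
print(masked)
```

Masking `x2` removes the strongest candidate from consideration, so the masked step falls back to the next most useful feature.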

An analyst can combine the above operations to write a feature selection program. An example feature selection program is given below.

fs1 <- AssignFeatureSet(c("DATAVOLUME", "NUMMMSOUT", "NUMVASOUT", "NUMCALLSFIXEDOUT", "DURATIONFIXEDINC", "NUMSMSINTEROUT", "NUMSMSCMPINC", "NUMSMSVASINC"), dataset.id = id)
fm1 <- CorrelationX(fs1, dataset.id = id) 
fs2 <- AssignFeatureSet(c("NUMCALLSFIXEDOUT"), dataset.id = id) 
fs3 <- DelFeatureSet(fs1, fs2, dataset.id = id) 
fm2 <- CoeffLearner(fs3, "igd", num.iters = 2, dataset.id = id) 
fs4 <- BestK(fm2, 6, dataset.id = id) 
fm6 <- Evaluate(fs4, "cg", "cv", num.iters=2, num.folds=3, dataset.id=id) # change to 5 folds
fs5 <- StepDel(fs4, "cg", "aic", dataset.id = id)

The analyst can choose to save the program as shown below:

SaveSession("ColumbusProgram1", dataset.id = id)

She can then run the same program in batch mode, on the same or a different dataset, as shown below:

ExecuteProgram("ColumbusProgram1", dataset.id = id)

For more detailed examples, please refer to the Examples Page.
