Installing Columbus

This page describes how to set up Columbus and run a simple example. It assumes that you have basic familiarity with the Linux operating system.

This documentation was written for Red Hat or Ubuntu Linux running the bash shell. You may need to make minor modifications to the commands for your environment. You do NOT need root access to your working machine in order to set up and run Columbus.

Source code (.tar.gz or .zip) is available on the download page.

1. Dependencies

You need to install the following dependency packages in order to run Columbus. The source code and examples in the Columbus release are compatible with the versions in parentheses.

Oracle 11g 64bit

Refer to the Oracle documentation for the installation procedure.

R >= 2.15.2
R packages to be installed:
  ROracle (1.1-10)
    Requires client libraries that can be downloaded from here.
    Detailed installation procedure is given here.
  DBI (0.2-7)
  hash (2.2.6)
  data.table (1.8.8)
  ggplot2 (0.9.3.1)
  Rgraphviz
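
As a quick sanity check of the list above, the following shell sketch compares the installed R version against the 2.15.2 minimum and reports which of the listed packages are missing. The helper names version_ge and check_r_deps are illustrative and not part of Columbus. The sketch assumes that Rscript is on the PATH and that sort supports the -V option (GNU coreutils). It does not check the package versions given in parentheses.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not shipped with Columbus.

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Checks the R version and the presence of the R packages listed above.
check_r_deps() {
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found on PATH" >&2
        return 1
    fi
    local r_version
    r_version="$(Rscript -e 'cat(as.character(getRversion()))')"
    if ! version_ge "$r_version" "2.15.2"; then
        echo "R $r_version is older than 2.15.2" >&2
        return 1
    fi
    # Exits with status 1 from R if any listed package is not installed.
    Rscript -e 'needed <- c("ROracle", "DBI", "hash", "data.table", "ggplot2", "Rgraphviz"); missing <- setdiff(needed, rownames(installed.packages())); if (length(missing) > 0) { cat("Missing packages:", missing, "\n"); quit(status = 1) }'
}
```

After sourcing the file, run check_r_deps. A non-zero exit status means the R version or one of the packages needs attention.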

2. Set up the Database

Once Oracle is installed, use the following command to start up the database from the sqlplus command line.

[Screenshot: Oracle startup screen]

If the Oracle instance starts successfully, sqlplus displays a success message as shown above.
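
For readers working from the text alone, here is a minimal sketch of starting the instance non-interactively. It assumes that your operating system account is allowed to connect with "/ as sysdba" (an assumption about your Oracle setup, not something this guide specifies). The function name start_oracle is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not shipped with Columbus.
# Assumes OS authentication as SYSDBA is permitted for the current account.
start_oracle() {
    if ! command -v sqlplus >/dev/null 2>&1; then
        echo "sqlplus not found on PATH" >&2
        return 1
    fi
    # -S suppresses the sqlplus banner; startup is the SQL*Plus command
    # that starts the instance.
    sqlplus -S / as sysdba <<'SQL'
startup
exit
SQL
}
```

If sqlplus is not found, check the PATH setting described in the Environment section.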

Create a user, for example columbus, with the password set to columbus, and grant all privileges to that user.
From the root of the source code, change to the scripts directory and execute the following command from sqlplus:

[Screenshot: sqlconfig]

The above SQL file creates the types and objects required by the Columbus system. If the execution is successful, the output should look like the following:

[Screenshot: sql config execution]

From the scripts directory, load the user-defined functions into the database using the loadscripts.sh file. Note that the file has to be modified to use the user name and password created in the previous steps. After the scripts load successfully, you should see the following output:

Next, create the tables and initial values required by the Columbus system by executing createColumbusTables.sql in the scripts directory from the sqlplus command prompt. The output of the script should look like the following:

[Screenshot: createColumbusTables]

If all the steps complete without any fatal errors, the database is set up to run the Columbus system. Congratulations.

If any of the above steps did not succeed, you can safely re-run that step after resolving the issue that caused the interruption.
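
The user-creation step earlier in this section can be sketched as follows. The function writes a small SQL script containing standard Oracle CREATE USER and GRANT statements. The function name and the default file name create_columbus_user.sql are hypothetical and not part of the Columbus release. Granting all privileges follows the instruction above and is suited to a test installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not shipped with Columbus.
# Usage: write_create_user_sql USER PASSWORD [OUTPUT_FILE]
write_create_user_sql() {
    local user="$1" pass="$2" out="${3:-create_columbus_user.sql}"
    # Unquoted heredoc delimiter so that ${user} and ${pass} are expanded.
    cat > "$out" <<SQL
CREATE USER ${user} IDENTIFIED BY ${pass};
GRANT ALL PRIVILEGES TO ${user};
EXIT
SQL
    echo "$out"
}
```

For example, write_create_user_sql columbus columbus writes the script and prints its name. You would then run it from an administrative sqlplus session, for instance with sqlplus / as sysdba @create_columbus_user.sql, assuming that connection method is available to you.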

3. Environment

The following environment variables are specific to Oracle and must be set correctly for Columbus to connect to the database instance. Their values depend on the path and SID of your Oracle installation. An example set of environment variables is given below.

export ORACLE_HOSTNAME=localhost.localdomain
export ORACLE_BASE=/u01/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/11.2.0/dbhome_1
export ORACLE_SID=AL32UTF8
export NLS_LANG=.AL32UTF8
export ORACLE_UNQNAME=AL32UTF8
export PATH=$PATH:$ORACLE_HOME/bin

Set the environment variable COLUMBUS_HOME to point to the base of the source code directory. For example:

 export COLUMBUS_HOME=/home/oracle/columbus
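
Before moving on, it can help to confirm that the variables are actually visible in the current shell. The sketch below checks a subset of the variables above for empty values and verifies that COLUMBUS_HOME is an existing directory. The function name check_columbus_env and the choice of which variables to check are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not shipped with Columbus.
# Reports unset or empty variables on stderr; returns non-zero if any are found.
check_columbus_env() {
    local name missing=0
    for name in ORACLE_BASE ORACLE_HOME ORACLE_SID COLUMBUS_HOME; do
        # ${!name} is bash indirect expansion: the value of the variable
        # whose name is stored in $name.
        if [ -z "${!name:-}" ]; then
            echo "not set: $name" >&2
            missing=1
        fi
    done
    if [ -n "${COLUMBUS_HOME:-}" ] && [ ! -d "$COLUMBUS_HOME" ]; then
        echo "COLUMBUS_HOME is not a directory: $COLUMBUS_HOME" >&2
        missing=1
    fi
    return "$missing"
}
```

To make the settings persistent across sessions, one common approach is to place the export lines in your ~/.bashrc.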


5. Load Test Data

Sample data files are located in the data folder. The schema for each data file is created by executing the corresponding create file in the scripts directory. Run these files from the sqlplus command prompt in the usual way.

Once the schema is created, load the data by running the sqlldr command with the control file from the scripts directory. Note that sqlldr must be invoked with the user name and password created earlier. An example is illustrated below.

[Screenshot: loading data]
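
As a text alternative to the screenshot, the sketch below assembles a sqlldr command line using the standard userid, control, and log parameters. The function name sqlldr_command and the control file name sample.ctl are placeholders. Substitute the actual control file names found in the scripts directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not shipped with Columbus.
# Prints the sqlldr command line for one control file, without executing it.
# Usage: sqlldr_command USER PASSWORD CONTROL_FILE
sqlldr_command() {
    local user="$1" pass="$2" ctl="$3"
    # The log file name is derived by stripping a trailing .ctl suffix.
    printf 'sqlldr userid=%s/%s control=%s log=%s.log\n' \
        "$user" "$pass" "$ctl" "${ctl%.ctl}"
}

# Example with placeholder names (sample.ctl is hypothetical):
sqlldr_command columbus columbus sample.ctl
```

Printing the command first lets you inspect it before running it from the scripts directory. Keep in mind that a password given on the command line may be visible to other users of the machine.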

Once the data is loaded, the system is ready to use.

6. Run Columbus

Detailed usage information for each operation is given on the Using Columbus page. Here we present a simple invocation to test the Columbus installation.

Before executing, we have to initialize some parameters for the Columbus system. In the DataScientist directory, the config.R file contains the basic configuration information. The variable config is a list that holds the values of the configuration parameters. The parameters are explained below.

repository: directory location where temporary R data files are stored
user: user name to access the Oracle database
pass: password to access the Oracle database
read.intercept: intercept value for the read cost
write.intercept: intercept value for the write cost
read.cost: cost to read a value from a column in a tuple
write.cost: cost to write a value to a column in a tuple
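
To illustrate the shape such a list might take, the sketch below writes an example R file from the shell. Only the parameter names come from the list above. Every value is a placeholder, the file name config.example.R is hypothetical, and the real config.R in the release may be laid out differently, so treat this as a reference for the field names and not as a replacement file.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not shipped with Columbus.
# Writes an illustrative R configuration list to the given file.
write_example_config() {
    local out="${1:-config.example.R}"
    # Quoted heredoc delimiter: the R code is written verbatim.
    cat > "$out" <<'R'
# Placeholder values only; adjust them to your installation.
config <- list(
  repository      = "/tmp/columbus_repository",
  user            = "columbus",
  pass            = "columbus",
  read.intercept  = 0,
  write.intercept = 0,
  read.cost       = 1,
  write.cost      = 1
)
R
}
```

The user and pass values should match the database user created in the Set up the Database section.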

From the R command prompt, change the working directory to DataScientist and execute the following:

source('testprogram.R')

The execution should complete without any errors. The test program executes Columbus operations in an interactive manner, followed by batch execution.
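
If you prefer to launch the test from the shell instead of an R prompt, the following sketch is one way to do it. It assumes that COLUMBUS_HOME is set as described in the Environment section, that Rscript is on the PATH, and that testprogram.R does not need keyboard input. If the interactive part of the test does require input, run it from the R prompt as shown above. The function name run_columbus_test is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not shipped with Columbus.
run_columbus_test() {
    if [ -z "${COLUMBUS_HOME:-}" ]; then
        echo "COLUMBUS_HOME is not set" >&2
        return 1
    fi
    local dir="$COLUMBUS_HOME/DataScientist"
    if [ ! -f "$dir/testprogram.R" ]; then
        echo "testprogram.R not found in $dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$dir" && Rscript -e "source('testprogram.R')" )
}
```

The function returns the exit status of Rscript, so it can be used in scripts that need to stop when the test fails.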

After successful execution, you should see the following message on the screen: