Project Elementary

Installation Guide for Elementary

SIGMOD Paper of Elementary

Table of Contents

This page contains the following information:

  1. Installation Guide
  2. User Guide
  3. More Options (Using Greenplum)
  4. Currently Unsupported Features

1. Installation Guide

Elementary relies on C++ and PostgreSQL. This section explains how to install the prerequisites.

Install C++

We require gcc/g++ 4.7.2 or newer. We have successfully tested our code with gcc/g++ 4.7.2 on RHEL 5 and Mac OS X 10.7.
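The version requirement can be checked before building. The following is a minimal sketch, not part of Elementary: `version_ge` is a hypothetical helper name, and the script assumes `gcc` on your PATH is the compiler you intend to use.

```shell
# version_ge A B: succeed when dotted version A is greater than or equal to B.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        len = (n > m) ? n : m
        for (i = 1; i <= len; i++) {
            # Missing components compare as 0, so "4.8" is treated as "4.8.0".
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# Report whether the gcc found on PATH meets the 4.7.2 minimum from this guide.
if command -v gcc >/dev/null 2>&1; then
    found=$(gcc -dumpversion)
    if version_ge "$found" 4.7.2; then
        echo "gcc $found: OK"
    else
        echo "gcc $found: older than 4.7.2"
    fi
else
    echo "gcc not found on PATH"
fi
```

Note that some systems install another compiler under the name `gcc`, in which case the reported number may not reflect a real gcc release.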

Install and Configure PostgreSQL

  1. We have tested our code on PostgreSQL 9.2.3. If you don't have PostgreSQL 9.2.3 on your machine, please download and install it. Let PG_DIST be the path where you unpack the source distribution. Let PG_PATH be the location where you want to install PostgreSQL. Run the following commands:
    cd PG_DIST
    ./configure --prefix=PG_PATH
    gmake && gmake install
    cd PG_PATH/bin
    ./initdb -D PG_PATH/data
    ./postgres -D PG_PATH/data &
    ./createdb test
    ./psql test
    
    initdb initializes a directory to store databases; postgres launches the PostgreSQL server daemon; createdb creates a new database (with the name 'test' in our example); and psql takes you to the interactive console of PostgreSQL, where you can issue numerous kinds of SQL queries. Type '\h' for help, and '\q' to quit.
  2. Create a PostgreSQL superuser named, say, "postgres". See the PostgreSQL documentation for createuser, or simply run the following command:
    PG_PATH/bin/createuser -s -P postgres
    
    You will be prompted for a password. Henceforth, let's assume the password is "strongPasswoRd".
  3. Create a database named, say, "bugs". See the PostgreSQL documentation for createdb, or simply run the following command:
    PG_PATH/bin/createdb bugs
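Once the three steps above are done, a quick connection test can confirm that the "postgres" superuser reaches the "bugs" database. This is a sketch under assumptions: `check_bugs_db` and the `DRY_RUN` switch are hypothetical names, `/usr/local/pgsql` is only a placeholder for your PG_PATH, and the port should match the one your server actually listens on. By default the function only prints the command (quoting is not preserved in the printed form); set `DRY_RUN=0` to execute it.

```shell
# check_bugs_db PG_PATH [PORT]: print the psql command that would confirm the
# "postgres" superuser can reach the "bugs" database; run it when DRY_RUN=0.
check_bugs_db() {
    pg_path=$1
    port=${2:-5432}   # PostgreSQL's default port; change it if your server uses another
    set -- "$pg_path/bin/psql" -h localhost -p "$port" -U postgres -d bugs \
        -c "SELECT current_user, current_database();"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Placeholder install prefix; substitute your own PG_PATH.
check_bugs_db "${PG_PATH:-/usr/local/pgsql}"
```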
    

Compile Elementary

Download the Elementary source from the download page, then follow the instructions below to build it.

  1. After unpacking, go to Elementary0.3 (let's call this $ELE_HOME), where you will see a Makefile. First build the dependencies using:
    $ CC="PATH_TO_CC" CPP="PATH_TO_CPP" make dep
    

    where CC and CPP point to the installed locations of gcc and g++, respectively. On a Mac, you also need to specify BUILD_PREFIX="--build=x86_64-apple-darwin10.0.0".

  2. Then build Elementary with:
    $ CC="PATH_TO_CC" CPP="PATH_TO_CPP" PG_PATH="PATH_TO_POSTGRES_INSTALLATION" make
    

    where PG_PATH is the directory where PostgreSQL was installed in the previous step. This will produce a binary called "ele" in the same folder.

  3. You then need to add ./lib/urcu/lib/ to the LD_LIBRARY_PATH environment variable. The path is relative, so it only resolves when you run ele from $ELE_HOME:
    $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./lib/urcu/lib/
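Because the path in the export above is relative, it may be more robust to add the absolute location of the urcu libraries instead. A minimal sketch, assuming ELE_HOME holds the unpacked source directory (it falls back to the current directory); `add_urcu_lib` is a hypothetical helper name, not something Elementary ships.

```shell
# add_urcu_lib ELE_HOME: append ELE_HOME/lib/urcu/lib to LD_LIBRARY_PATH
# unless it is already listed.
add_urcu_lib() {
    urcu_lib=$1/lib/urcu/lib
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$urcu_lib:"*) ;;   # already present, nothing to do
        *) LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$urcu_lib ;;
    esac
    export LD_LIBRARY_PATH
}

add_urcu_lib "${ELE_HOME:-$PWD}"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

Calling the function twice leaves the variable unchanged the second time, so it is safe to put in a shell startup file.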
    

Configuration

  1. Go to $ELE_HOME; you should see a configuration file named "bugs_config". This file lists the parameters that configure Elementary. The listing below shows the default values. (Lines starting with # are comments, included here only for explanation. Please don't use # in the "bugs_config" file itself.)
    # Database username
    PostgreSQL_uname=postgres
    
    # The password for the database user
    PostgreSQL_password=strongPasswoRd
    
    # The hostname
    PostgreSQL_host=localhost
    
    # The port on which postgres is running (PostgreSQL defaults to 5432 unless configured otherwise)
    PostgreSQL_port=5433
    
    # The database name
    PostgreSQL_dbname=bugs
    
    # This parameter controls the amount of memory that will be consumed by the OpenBUGS Interface. 
    # A conservative setting would be [Available_Memory_For_Elementary]/400.
    Rows_per_fetch=2000000
    
    # EXP_STORAGE: MM (main memory) or FILE. Default is MM.
    EXP_STORAGE=MM
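As a worked example of the Rows_per_fetch rule in the listing above, the division can be scripted. This sketch assumes the memory budget is expressed in bytes, which the guide does not state explicitly; `rows_per_fetch` is a hypothetical helper name.

```shell
# rows_per_fetch MEM_BYTES: apply the guide's conservative rule, memory / 400.
# ASSUMPTION: the memory budget is given in bytes.
rows_per_fetch() {
    echo $(( $1 / 400 ))
}

# An 800,000,000-byte budget prints 2000000.
rows_per_fetch 800000000
```

Under that assumption, the default of 2000000 corresponds to a budget of roughly 800 MB.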
    
    

2. User Guide

We use a simple example to illustrate how to use the OpenBUGS Model Specification Interface for Elementary. The input/output formats and command options are compatible with the OpenBUGS language specification.

Input

The input consists of the standard model file, data file, and inits file, as specified in OpenBUGS. The locations of these files are set in the "bugs_config" file, as shown below. As in OpenBUGS, you can also set monitors on the variables whose results you want to observe. Monitors are given as comma-separated variable names with no spaces between them. The default is empty, which monitors all variables.

#The model file. Default is test.model
Bugs_Model=test.model

#The data file. Default is test.data
Bugs_Data=test.data

#The inits file. Default is test.inits
Bugs_Inits=test.inits

#Monitors on variables.
Bugs_Monitors=alpha,beta
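Since this guide asks that # not appear in "bugs_config", annotated listings like the one above need their comment lines removed before use. A minimal sketch of one way to do that with sed; `strip_config_comments` is a hypothetical helper name, and dropping blank lines as well is a cautious assumption rather than a documented requirement.

```shell
# strip_config_comments: read an annotated config on stdin and write only the
# key=value lines to stdout, dropping "#" comment lines and blank lines.
strip_config_comments() {
    sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d'
}

# Example: filter a short annotated fragment.
strip_config_comments <<'EOF'
#The model file. Default is test.model
Bugs_Model=test.model

#Monitors on variables.
Bugs_Monitors=alpha,beta
EOF
```

The example prints only the two key=value lines, which can then be appended to "bugs_config".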

Inference

To run inference for the specified model, run the following command from $ELE_HOME. In the command, --work_dir specifies the TEMPORARY_DATA_DIR, which Elementary uses to store temporary data; --app "bugs" invokes the OpenBUGS interface; and --nepoch specifies the NUMBER_OF_UPDATES to the model (as in OpenBUGS).

./ele --work_dir "TEMPORARY_DATA_DIR" --app "bugs" --nepoch "NUMBER_OF_UPDATES"
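The placeholders in the command above can be filled in by a small wrapper that also rejects an invalid update count. This is only a sketch: `ele_command` is a hypothetical helper, the directory and count in the example call are illustrative values, and the function prints the command rather than running it.

```shell
# ele_command WORK_DIR NEPOCH: print the inference command from this guide,
# after checking that NEPOCH is a positive integer.
ele_command() {
    case $2 in
        ''|*[!0-9]*) echo "nepoch must be a positive integer" >&2; return 1 ;;
    esac
    if [ "$2" -le 0 ]; then
        echo "nepoch must be a positive integer" >&2
        return 1
    fi
    echo "./ele --work_dir \"$1\" --app \"bugs\" --nepoch \"$2\""
}

# Illustrative values: a scratch directory and 1000 updates.
ele_command /tmp/ele_work 1000
```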

[Figure: example inference output for the Pumps model]

3. More Options (Using Greenplum)

We also provide a version that uses Greenplum. For this version, in addition to the PostgreSQL installation described above, you need to install Greenplum. We have tested our code on Greenplum 4.2. If you don't have Greenplum installed on your machine, please download and install it. After installation, create a database named "bugs" in Greenplum. Once that is done, add the Greenplum-specific parameters to "bugs_config" as shown below:

# Database username
Greenplum_uname=greenplum

# The password for the database user
Greenplum_password=greenplumPassWOrd

# The hostname
Greenplum_host=localhost

# The port on which Greenplum is running
Greenplum_port=5432

# The database name
Greenplum_dbname=bugs

To use Greenplum during inference, run the following command:

./ele --work_dir "TEMPORARY_DATA_DIR" --app "bugs" --bugs_hybrid --nepoch "NUMBER_OF_UPDATES"
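Before running in hybrid mode, it may help to confirm that "bugs_config" contains all five Greenplum parameters listed above. A minimal sketch, assuming each parameter sits on its own line in key=value form; `missing_greenplum_keys` is a hypothetical helper, and the example writes a throwaway file so that it is self-contained.

```shell
# missing_greenplum_keys FILE: print each Greenplum parameter from this guide
# that has no "key=" line in FILE; print nothing when all five are present.
missing_greenplum_keys() {
    for key in Greenplum_uname Greenplum_password Greenplum_host \
               Greenplum_port Greenplum_dbname; do
        grep -q "^$key=" "$1" || echo "$key"
    done
}

# Example: a config that sets only two of the five parameters.
cfg=$(mktemp)
printf 'Greenplum_uname=greenplum\nGreenplum_port=5432\n' > "$cfg"
missing_greenplum_keys "$cfg"
rm -f "$cfg"
```

For the example file, the function lists Greenplum_password, Greenplum_host, and Greenplum_dbname as missing.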

4. Currently Unsupported Features of the OpenBUGS Language Specification

  1. Data Transformations
  2. Wishart and Generalized F distributions
  3. Truncation
  4. Rectangular format for data
  5. Comments in the model, data, or inits files
  6. Scalar Functions: cut, density, deviance, gammap, integral, post.p.value, prior.p.value, replicate.post(s), replicate.prior(s), solution, cumulative
  7. Vector Functions: interp.lin, inverse, logdet, eigen.vals, ode, prod, p.valueM, rank, ranked, replicate.postM, sort

Also, when sampling from distributions with very high variance, our results currently differ from those of OpenBUGS. We are working on this.