There are two Hogwild! releases. The newer one is an alpha release of Hogwild! that has been rewritten with a focus on performance and maintainability. The Hogwild!-Experiments release is the original Hogwild! used to perform the experiments in the Hogwild! paper; it behaves slightly differently from the current alpha release.
Alpha Release: Source Code and Datasets
Because the example datasets are relatively large, the release is available both with and without them.
(Version: 03a; last updated: 11/7/2012)
Tested with the following compilers:
- g++ (GCC) 4.4.6 20110731 (Red Hat 4.4.6-3)
- clang version 3.1 (tags/RELEASE_31/final)
Paper Version: Code and Experiments
This is the original version of Hogwild! used to conduct the experiments for the paper. The Hogwild! source code is small, but the data used in our experiments is quite sizable. For your convenience, we provide two packages: one with source code only, and another with both source code and several of the (prepared) datasets we used:
(last updated: 11/21/2011)
Below is a list of the datasets we experimented with and their links. ([+] indicates that the dataset is included in the "code and data" package; [-] marks datasets we left out because they are too large.)
- [+] Reuters Corpora RCV1 for SVM
- [+] DBLife Entity Mentions for MultiCut
- [+] Netflix for Matrix Factorization
- [-] Abdomen 3D Image for Cut
- [-] KDD Cup 2011 Data for Matrix Factorization
After downloading and unpacking, follow the instructions in README. As a quick guide, the following steps run the experiments and generate graphs:
- Run "make", which will build binaries in the directory "bin".
- If you don't have a "data" directory, you probably want to download the code-and-data package of HOGWILD! instead.
- Open "experiments/experiments_settings.py" to change data paths and other parameters as necessary.
- Run "python experiments/build_experiments.py" to generate command lines based on your settings in "experiments/experiments_settings.py".
- Run the generated command lines; they write their logs to the directory "output".
- Run "python experiments/produce_graphs.py" to generate experiment graphs based on the logs in "output". The graphs are stored as PDF files in the directory "graphs". (Note that matplotlib is required.)
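The quick-guide steps can be chained in a small driver script. This is a minimal sketch, not part of the Hogwild! package: it assumes Python 3.8+ (for `shlex.join`), that it is run from the unpacked paper-version directory, and that `python` on your PATH is the interpreter the experiment scripts expect. The helper names `prepare_steps`, `graph_steps` and `run` are hypothetical; only the `make` target and the two script paths come from the steps above. Running the generated command lines is left to you, since their exact form depends on your settings.

```python
# Hypothetical driver for the quick-guide steps (not part of the Hogwild! package).
# Assumes Python 3.8+ (for shlex.join) and that it is run from the unpacked
# paper-version directory; the script paths are the ones named in the steps.
import os
import shlex
import subprocess
import sys


def prepare_steps(python="python"):
    """Build the binaries into bin/, then generate experiment command lines."""
    return [
        ["make"],
        [python, "experiments/build_experiments.py"],
    ]


def graph_steps(python="python"):
    """Turn the logs in output/ into PDF graphs in graphs/ (requires matplotlib)."""
    return [[python, "experiments/produce_graphs.py"]]


def run(steps, dry_run=False):
    """Echo each command, then run it unless dry_run is set; stop on the first failure."""
    for cmd in steps:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    if not os.path.isdir("data"):
        # Source-only users may point experiments_settings.py at their own data instead.
        print('Warning: no "data" directory; consider the code-and-data package.',
              file=sys.stderr)
    # Edit experiments/experiments_settings.py (data paths, parameters) before running.
    run(prepare_steps())
    # The generated command lines are run by hand; they log to output/.
    input("Press Enter after the generated experiments have finished...")
    run(graph_steps())
```

Passing `dry_run=True` to `run` only prints the commands, which is a quick way to check the sequence before building anything.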
Christopher Ré, and Stephen J. Wright. HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. Published in NIPS 2011.