We provide both the source code of Staccato (with associated ETL utilities) and a virtual machine (VM) with Staccato preinstalled.
Staccato Source Code [Notes 1, 2] (Compressed: 1.6 MB)
For instructions on how to install Staccato, please see: Installation.
Staccato Data (Compressed: 312 MB): Download Data
These are the main datasets reported in the paper. Scalability datasets will be made available on request.
Staccato in Virtual Machine
This is an Ubuntu 10.10 Virtual Machine (.vmx). Login details: Username = staccato, Password = password.
[Note 1]: The code and the VM above assume that the alphabet of the FSTs (as output by the open-source OCR tool OCRopus in our case) is the set of ASCII codes 32-125. However, newer versions of OCRopus seem to use codes 0-125. Staccato patched for this alphabet change can be downloaded here.
[Note 2]: The older version of the Staccato approximation code, which was used for the paper, can be downloaded here. This version is substantially slower than the latest one above, since the implementation was subsequently optimized for performance.
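To decide which download Note 1 applies to, it can help to check which character codes actually appear as labels in your FSTs. The following is a minimal Python sketch under stated assumptions: the arc labels have already been extracted as integer character codes, and the names `OLD_ALPHABET`, `NEW_ALPHABET`, and `required_build` are hypothetical helpers for illustration, not part of Staccato or OCRopus.

```python
# Illustrative only: none of these names come from Staccato or OCRopus.
OLD_ALPHABET = range(32, 126)  # ASCII codes 32..125 inclusive (original code and VM)
NEW_ALPHABET = range(0, 126)   # ASCII codes 0..125 inclusive (patched code)


def required_build(labels):
    """Return which Staccato build a collection of FST arc labels calls for.

    `labels` is an iterable of integer character codes, assumed to have
    been read from the FSTs by some other means.
    """
    codes = set(labels)
    if all(code in OLD_ALPHABET for code in codes):
        return "original"
    if all(code in NEW_ALPHABET for code in codes):
        return "patched"
    outside = sorted(code for code in codes if code not in NEW_ALPHABET)
    raise ValueError(f"labels outside 0-125: {outside}")


if __name__ == "__main__":
    # Printable text only: every code lies in 32-125.
    print(required_build(ord(ch) for ch in "Staccato 1.0"))
    # Contains control codes below 32, as newer OCRopus output may.
    print(required_build([0, 10, 65]))
```

If any label falls below 32, the patched version is the one to download. Labels above 125 are covered by neither alphabet described in Note 1.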