Overview
Trio is a new kind of
database management system: one in which data,
uncertainty of the data, and data lineage are all
first-class citizens. Trio is based on an extended relational model
called ULDBs, and it supports a SQL-based query language called
TriQL. (See the online TriQL Language
Manual.)
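As a rough illustration of what "uncertainty as a first-class citizen" means in a ULDB, the sketch below models an uncertain relation as a list of x-tuples, each holding mutually exclusive alternatives and an optional "maybe" flag, and enumerates the ordinary relations (possible worlds) it represents. This is a minimal sketch, not Trio code: the names `XTuple` and `possible_worlds` and the sample data are hypothetical, and confidences and lineage are left out for brevity.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class XTuple:
    """One uncertain tuple: mutually exclusive alternatives, optionally absent."""
    alternatives: tuple   # each alternative is a plain tuple of attribute values
    maybe: bool = False   # True means the x-tuple may be absent altogether


def possible_worlds(relation):
    """Enumerate every ordinary relation the uncertain relation could represent."""
    choices = []
    for xt in relation:
        options = list(xt.alternatives)
        if xt.maybe:
            options.append(None)  # None stands for "no tuple chosen"
        choices.append(options)
    # Pick one option per x-tuple, then drop the "absent" placeholders.
    return [
        [row for row in combo if row is not None]
        for combo in product(*choices)
    ]


# Hypothetical sample data, not taken from the Trio distribution.
saw = [
    XTuple(alternatives=(("Amy", "Honda"), ("Amy", "Toyota"))),
    XTuple(alternatives=(("Betty", "Acura"),), maybe=True),
]

worlds = possible_worlds(saw)
print(len(worlds))  # 2 alternatives * 2 choices (present or absent) = 4
```

A real ULDB additionally attaches confidence values to alternatives and records lineage linking derived alternatives to the data they came from; neither is modeled here.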
A wide variety of application domains can benefit from using Trio, including but not limited to: scientific and sensor data management; data cleaning and integration; information extraction systems; and approximate and hypothetical query processing.
We have completed an initial working prototype of the Trio system,
available for download or use over the web as detailed below. A book chapter we
wrote provides a comprehensive overview of Trio as of July 2008.
The Trio project is supported in part by the National Science
Foundation under grants IIS-0414762
and IIS-0904497,
and by a grant from the Boeing Corporation.
Trio Online Demo
The Trio system is available for use over the web. You can create
your own databases or use our samples, and you can run TriQL queries
and browse the results.
- Click here to get
started.
We endeavor to keep the system up as much as possible;
if you find it is not working, please let us know.
- We develop primarily using the Firefox and Safari browsers, although there are usually no problems with IE.
- Problems or feedback? Email triohelp@db.stanford.edu
Download Trio Source Code and Binaries
The Trio prototype is available as open-source code under the BSD license.
To install the Trio system at your own site, download the package containing the source code and some precompiled binaries here.
The Trio client can be used via a convenient browser interface (TrioExplorer), through a command-line interface (trioplus), using direct API calls linked from another Python script, or as an external command-line call.
We are currently running Trio successfully under Linux, Mac OS X, and Win-32 (XP, Vista, and 32-bit Server). For more eccentric environments,
the required Trio binaries can be recompiled.
News
- [March 2009] One of our directions is to
customize and extend Trio for specific application domains. We're
currently developing an "entity-resolution workbench" using Trio; see
the Trio-ER
technical report.
- [July 2008] Anish Das Sarma posted an excellent blog entry: Why Uncertainty in Data is Great.
- [July 2008] We've written a chapter, Trio: A System for Data,
Uncertainty, and Lineage, that appears in the book Managing
and Mining Uncertain Data. This chapter is the best current
overview of the Trio project.
- [June 2007] We've made a small but important change
to the ULDB data model on which Trio is based. See
An Update to the Trio Data Model for an explanation and justification of the change.
- [June 2007] Dbworld message announcing the open-source release of the Trio prototype.
- [January 2007] Dbworld message announcing the web release of the Trio online demo.
- [January 2007] A demonstration of the latest
Trio prototype was given at the CIDR 2007 conference.
Here is the 6-page paper that appeared in the conference proceedings.
- [September 2006] On 9/22/06 several groups working in
the area of uncertain and probabilistic data management got together
at Stanford for an informal meeting. Here are some slides and notes from the meeting.
- [June 2006] A demonstration of the Trio prototype was
given at the VLDB 2006 conference.
Here is the 4-page demonstration description
that appeared in the conference proceedings, and here are some
photos from the demo session at the conference.
- [April 2006] A
news article on the Trio project (with a terrific photo) appeared in the March 22, 2006
Stanford Report.
Subsequently, Trio was featured in an April 20, 2006
PC World article.
Papers: Overviews and Demo Descriptions
In reverse chronological order of when they were written
- P. Agrawal, R. Ikeda, H. Park, and J. Widom. Trio-ER: The Trio System
as a Workbench for Entity-Resolution. Technical Report, March
2009.
- J. Widom. Trio: A System for Data,
Uncertainty, and Lineage. In C. Aggarwal, editor, Managing
and Mining Uncertain Data, Springer, 2009.
- M. Mutsuzaki, M. Theobald, A. de Keijzer, J. Widom,
P. Agrawal, O. Benjelloun, A. Das Sarma, R. Murthy, and T. Sugihara.
Trio-One: Layering
Uncertainty and Lineage on a Conventional DBMS. Proc.
Third Biennial Conference on Innovative Data Systems Research (CIDR '07),
Pacific Grove, California, January 2007. Demonstration description.
- P. Agrawal, O. Benjelloun, A. Das Sarma, C. Hayworth,
S. Nabar, T. Sugihara, and J. Widom.
Trio: A System for Data,
Uncertainty, and Lineage. Proc. 32nd Intl.
Conference on Very Large Data Bases, pages 1151-1154, Seoul, Korea,
September 2006. Demonstration description.
- O. Benjelloun, A. Das Sarma, C. Hayworth, and J. Widom.
An Introduction to ULDBs
and the Trio System. IEEE Data Engineering Bulletin, Special Issue
on Probabilistic Databases, 29(1):5-16, March 2006.
- J. Widom. Trio: A System for
Integrated Management of Data, Accuracy, and Lineage.
Proc. Second Biennial Conference on Innovative Data Systems Research
(CIDR '05), Pacific Grove, California, January 2005.
Papers: Technical Topics
In reverse chronological order of when they were written
- P. Agrawal and J. Widom. Generalized Uncertain
Databases: First Steps. Proc. Workshop on Management of Uncertain
Data, Singapore, September 2010.
- P. Agrawal and J. Widom. Continuous Uncertainty in
Trio. Proc. Workshop on Management
of Uncertain Data, Lyon, France, August 2009.
- R. Ikeda and J. Widom. Outerjoins in Uncertain
Databases. Proc. Workshop on Management
of Uncertain Data, Lyon, France, August 2009.
- A. Das Sarma, M. Theobald, and J. Widom. LIVE: A Lineage-Supported
Versioned DBMS. Proc. 22nd Intl.
Conference on Scientific and Statistical Database Management,
Heidelberg, Germany, June 2010.
- P. Agrawal, A. Das Sarma, J.D. Ullman, and J. Widom. Foundations of
Uncertain-Data Integration. Proc. 36th
International Conference on Very Large Data Bases, Singapore,
September 2010.
- A. Das Sarma, P. Agrawal, S. Nabar, and J. Widom. Towards Special-Purpose
Indexes and Statistics for Uncertain Data. Proc. 2008 Workshop on
Management of Uncertain Data, Auckland, New Zealand, August 2008.
- A. Das Sarma, J.D. Ullman, and J. Widom. Schema Design for
Uncertain Databases. Proc. 3rd Alberto Mendelzon Workshop on
Foundations of Data Management, Arequipa, Peru, May 2009.
- R. Murthy, R. Ikeda, and J. Widom. Making Aggregation Work
in Uncertain and Probabilistic Databases. IEEE Transactions on
Knowledge and Data Engineering, 23(8):1261-1273, August 2011. Initial
shorter version appeared in Proceedings of the 2007 Workshop on
Management of Uncertain Data, pages 76-90, Vienna, Austria, September
2007.
- A. Das Sarma, M. Theobald, and J. Widom.
Exploiting Lineage for
Confidence Computation in Uncertain and Probabilistic Databases.
Proc. 24th Intl.
Conference on Data Engineering, Cancun, Mexico, April 2008.
- P. Agrawal and J. Widom. Confidence-Aware Join
Algorithms. Proc. 25th Intl.
Conference on Data Engineering, Shanghai, China, March 2009.
- O. Benjelloun, A. Das Sarma, A. Halevy, M. Theobald, and
J. Widom. Databases
with Uncertainty and Lineage. VLDB Journal, 17(2):243-264, March
2008. Note: much of the material in this paper appeared in
preliminary form in ULDBs: Databases with Uncertainty and
Lineage, cited next, and Trio-One: Layering Uncertainty and
Lineage on a Conventional DBMS, cited above.
- O. Benjelloun, A. Das Sarma, A. Halevy, and J. Widom.
ULDBs: Databases with
Uncertainty and Lineage. Proc.
32nd Intl. Conference on Very Large Data Bases, pages 953-964,
Seoul, Korea, September 2006.
- A. Das Sarma, O. Benjelloun, A. Halevy, S. Nabar, and
J. Widom. Representing
Uncertain Data: Models, Properties, and Algorithms. VLDB Journal,
18(5):989-1019, October 2009. Note: This journal paper combines
material from the conference paper Working Models for Uncertain
Data, and the technical report Representing Uncertain Data:
Uniqueness, Equivalence, Minimization, and Approximation, cited
below.
- A. Das Sarma, S.U. Nabar, and J. Widom.
Representing Uncertain Data: Uniqueness, Equivalence, Minimization, and
Approximation. Technical Report, December 2005. Note: much of the
material in this technical report also appears in the journal paper
Representing Uncertain Data: Models, Properties, and
Algorithms, cited above.
- A. Das Sarma, O. Benjelloun, A. Halevy, and J. Widom. Working Models for
Uncertain Data. Proc. 22nd Intl. Conference on Data Engineering,
Atlanta, Georgia, April 2006. Note: much of the material in this
conference paper also appears in the journal paper Representing
Uncertain Data: Models, Properties, and Algorithms, cited above.
Talks
In reverse chronological order of when they were first given
- Trio: A System for Data, Uncertainty,
and Lineage (current overview talk, updated May '07)
Given by Jennifer at various venues, 2006-07
Slides in ppt or
pdf
- Representation Formalisms for Uncertain Data
Given by Jennifer at UW/Microsoft Summer Research Institute, Aug. 2005
Slides in ppt or
pdf
- Trio: A System for Integrated Management of Data, Accuracy,
and Lineage (original vision talk)
Given by Jennifer at various venues, 2004-05
Slides in ppt or
pdf
People
- Faculty
- Graduate students
- Alums