Wisci/DeepDive

Introduction

Wisci (IPA: /ˈwɪski/) is a research project with the goal of understanding the challenges in building, scaling, and maintaining a probabilistic inference system.

The demonstrations below (Wisci and DeepDive) enrich Wikipedia with structured data extracted both from unstructured sources, such as text, video, and audio, and from existing structured sources. Wikipedia (like any other human-curated database) is high-precision because its facts are entered manually by knowledgeable users. It is also low-recall: although entities in Wikipedia are mentioned on a host of webpages, blogs, tweets, news articles, and other databases, the information carried by those mentions is not described in Wikipedia. To improve the recall of Wikipedia, Wisci reads many of these sources and fuses the information extracted from them.

In contrast to many other Natural Language Understanding (NLU) projects, whose goal is to perform higher-quality linguistic processing, Wisci's goal is not to build better NLU tools. Instead, Wisci aims to combine many best-of-breed modules -- ideally in a plug-and-play fashion. Using the Felix system, Wisci addresses the technical challenges of reconciling conflicting sources of information in a way that is principled and scalable. A 6.5-minute video explaining the motivation for Wisci is here.

Wisci and DeepDive

Check out Barack Obama's enhanced page in Wisci and the same page in DeepDive (an early version of what will become Wisci's successor)! Every Wikipedia page is reflected in both Wisci and DeepDive. To browse different pages, either follow the links in a Wisci/DeepDive page, or edit the URL directly to see the page for any entity in Wikipedia. Below is a triple of URLs, a Wikipedia URL together with its corresponding Wisci and DeepDive URLs:

Wikipedia: http://en.wikipedia.org/wiki/Barack_Obama 
Wisci: http://research.cs.wisc.edu/hazy/wikidemo/index.php/Barack_Obama
DeepDive: http://research.cs.wisc.edu/hazy/demos/deepdive/index.php/Barack_Obama 
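
The URL editing described above can be sketched in a few lines of Python. This is only an illustration, not part of the demo itself: the function name demo_urls is hypothetical, the two base URLs are copied from the triple above, and the sketch assumes the article title is whatever follows "/wiki/" in the Wikipedia URL and is reused unchanged, as in the Barack_Obama example.

```python
from urllib.parse import urlsplit

# Base URLs copied from the example triple above.
WISCI_BASE = "http://research.cs.wisc.edu/hazy/wikidemo/index.php/"
DEEPDIVE_BASE = "http://research.cs.wisc.edu/hazy/demos/deepdive/index.php/"
WIKI_PREFIX = "/wiki/"


def demo_urls(wikipedia_url):
    """Return (wisci_url, deepdive_url) for an English Wikipedia article URL.

    Assumption: the article title is everything after "/wiki/" in the path
    and is reused unchanged, as in the Barack_Obama example.
    """
    path = urlsplit(wikipedia_url).path
    if not path.startswith(WIKI_PREFIX) or len(path) == len(WIKI_PREFIX):
        raise ValueError("not a Wikipedia article URL: " + wikipedia_url)
    title = path[len(WIKI_PREFIX):]
    return WISCI_BASE + title, DEEPDIVE_BASE + title


if __name__ == "__main__":
    wisci, deepdive = demo_urls("http://en.wikipedia.org/wiki/Barack_Obama")
    print("Wisci:", wisci)
    print("DeepDive:", deepdive)
```

Whether every Wikipedia title resolves on the demo sites depends on the crawl behind them, so treat the generated URLs as candidates rather than guaranteed pages.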

Video Demo

Watch on YouTube

Watch | Download (182MB)

NOTE: The demo website shown in the video is a very preliminary alpha version made to entertain the Hazy group: it uses an old Web crawl from 2009, the user interface has not been thoroughly tested or polished, and we have yet to tune the underlying algorithms for better data quality. Stay tuned for a major UI/data update and public release very soon!

Acknowledgements

Wisci is generously supported by DARPA under prime contract no. FA8750-09-C-0181, managed by the Air Force Research Laboratory (AFRL), and by gifts or research awards from Google, Greenplum, Johnson Controls, LogicBlox, Microsoft, and Oracle. Any opinions, findings, and conclusions or recommendations expressed in this work are those of the authors and do not necessarily reflect the views of any of the above sponsors, including DARPA, AFRL, or the US government. We are also grateful for the generous support of the Condor team at UW-Madison.