Aditya Parameswaran

I am an assistant professor of Computer Science at the University of Illinois (UIUC). My research interests are broadly in simplifying and improving data analytics, i.e., helping users make better use of their data.

My work involves building real data analytics systems with principled foundations, designing algorithms (with formal guarantees) for these systems, and mining the data obtained from them.

Biographical Sketch

Aditya Parameswaran is an Assistant Professor in Computer Science at the University of Illinois (UIUC). He spent the 2013-14 year visiting MIT CSAIL and Microsoft Research New England, after completing his Ph.D. at Stanford University, advised by Prof. Hector Garcia-Molina. He is broadly interested in data analytics, with research results in human computation, visual analytics, information extraction and integration, and recommender systems.

Aditya is a recipient of the Arthur Samuel award for the best dissertation in CS at Stanford (2014), the SIGMOD Jim Gray dissertation award (2014), the SIGKDD dissertation award runner-up (2014), the Key Scientific Challenges Award from Yahoo! Research (2010), three best-of-conference citations (VLDB 2010, KDD 2012, and ICDE 2014), the Terry Groswith graduate fellowship at Stanford (2007), and the Gold Medal in Computer Science at IIT Bombay (2007).

News

  • March 10, 2015: Four more new preprints in the last month! These were:
    • our paper on SeeDB for query-driven automatic visualization generation;
    • our jellybean paper on counting objects in images; turns out we can do way better than humans or computer vision algorithms!
    • our paper on debiasing of batches; crowdsourcing practitioners often use batching to save costs, but batching can make worker answers non-independent: our paper shows how to correct for this.
    • our versioning theory paper; to build a solid foundation for our DataHub project, we explored how to trade off storage and retrieval costs.
  • February 12, 2015: Many thanks to Google for their support via a Google Faculty Research Award! Excited to be building the next-generation visualization toolkit.
  • February 9, 2015: Our paper on exploiting correlations to avoid expensive predicate evaluations was accepted at SIGMOD 2015!
  • December 10, 2014: Three new preprints in the last month! These were:
    • smart drill-down, our tool for zooming into portions of a dataset quickly;
    • our paper on globally optimal crowdsourcing quality management; and
    • our paper on gathering data using the crowd, exploiting a hierarchy and MABs.
  • November 10, 2014: Three new paper acceptances in the last month!
  • October 10, 2014: Thrilled to be a part of the new NIH BD2K (Big Data to Knowledge) center for revolutionizing genomic data analysis. Thank you, NIH, for the support!
  • September 2, 2014: We can finally talk about our exciting new project, titled DataHub (i.e., GitHub for Data), on collaborative data science and version management. The ambitious goal is to eliminate the pain points of data bookkeeping while doing collaborative data science.
  • September 1, 2014: Our paper on pricing for crowdsourcing tasks has been accepted for presentation at VLDB 2015! The paper studies a simple but important problem: if you have a batch of tasks and a deadline, how should you vary the price to meet the deadline?
  • August 25, 2014: Pleasantly surprised to be selected as the KDD dissertation award runner-up, having already been given the SIGMOD dissertation award! Feel truly lucky to have two communities - SIGMOD and KDD - supporting my work!
  • August 24, 2014: Had a blast being a keynote speaker at KDD IDEA 2014 - a big thank you to the organizers for inviting me! If this year was any indication, IDEA is going to flourish as a workshop for many years!
  • August 20, 2014: Our paper on optimally learning maximum-likelihood worker accuracies has been accepted as a work-in-progress paper for HCOMP 2014! The paper tackles the problem of worker quality estimation in a way EM-based algorithms cannot - by providing optimality guarantees.
  • August 15, 2014: Started at Illinois; exciting times ahead!

Synergistic Activities

I currently serve on or have served on the Program Committees of VLDB 2013-15, KDD 2015, SIGMOD 2014-15, WSDM 2015, WWW 2014, SOCC 2014, HCOMP 2014, ICDE 2014, and EDBT 2014.

Visual Analytics

Automatically recommending visualizations or visual summaries on very large volumes of data

Interactive Analytics

Interactive querying of large datasets, keeping track of versions, while potentially trading a small amount of query-result accuracy for speed

Crowd-Powered Analytics

Using crowdsourcing to process and make sense of large volumes of data

Information Extraction

Extracting information from the web, integrating it with existing information, and surfacing this information to users

Recommendation Systems

Building scalable recommendation systems that take into account contextual information

Selected Projects


DataHub: Collaborative Dataset Version Management at Scale

DataHub (or "GitHub for Data") is a system that enables collaborative data science by compactly keeping track of large numbers of dataset versions and their dependencies, and by allowing users to progressively clean, integrate, and visualize their datasets.


DataSift: A Crowd-Powered Search Engine

DataSift is a crowd-powered search engine that is useful for long or complex queries that traditional search engines have trouble with, or for queries that contain rich media such as images or videos.


SeeDB: Automatic Visualization Recommendation

SeeDB automatically recommends visualizations for a given query, sparing analysts the laborious task of manually identifying the appropriate ones.


Crowd Algorithms

Our work has developed a number of algorithms for gathering, processing, and understanding data obtained from humans (or crowds), while minimizing cost, latency, and error.


NeedleTail: A System for Browsing

NeedleTail is a system tuned to return a small number (a "screenful") of query results near-instantaneously on extremely large datasets.