Anish Das Sarma's Publications


2014

Fusing Data With Correlations Ravali Pochampally, Anish Das Sarma, Xin Luna Dong, Alexandra Meliou, Divesh Srivastava. In SIGMOD, Utah, USA, June 2014.
Crowd-Powered Find Algorithms Anish Das Sarma, Aditya Parameswaran, Hector Garcia-Molina and Alon Halevy. In ICDE, Chicago, IL, USA, March 2014.
Anchor-Points Algorithms for Hamming and Edit Distances Using MapReduce Foto Afrati, Anish Das Sarma, Anand Rajaraman, Pokey Rule, Semih Salihoglu and Jeffrey Ullman. In ICDT, Athens, Greece, March 2014.

2013

Data Cleaning: A Practical Perspective Venkatesh Ganti, Anish Das Sarma. Morgan and Claypool Publishers, September 2013.
Consistent Thinning of Large Geographical Data for Map Visualization Anish Das Sarma, Hongrae Lee, Hector Gonzalez, Jayant Madhavan, Alon Halevy. In ACM Transactions on Database Systems (TODS), May 2013. (Invited from best papers of SIGMOD 2012.)
Optimal Hashing Schemes for Entity Matching Nilesh Dalvi, Vibhor Rastogi, Anirban Dasgupta, Anish Das Sarma, Tamas Sarlos. To appear in WWW, Rio de Janeiro, Brazil, May 2013.
Upper and Lower Bounds on the Cost of a Map-Reduce Computation Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey D. Ullman. To appear in PVLDB, Trento, Italy, August 2013.
Finding Connected Components on Map-reduce in Logarithmic Rounds Vibhor Rastogi, Ashwin Machanavajjhala, Laukik Chitnis, Anish Das Sarma. International Conference on Data Engineering (ICDE), Brisbane, Australia, April 2013.

2012

Designing Good Algorithms for Map-Reduce and Beyond. Foto N. Afrati, Magdalena Balazinska, Anish Das Sarma, Bill Howe, Semih Salihoglu, and Jeffrey D. Ullman. Symposium on Cloud Computing (SoCC), USA, 2012. (Tutorial)
An Automatic Blocking Mechanism for Large-Scale De-duplication Tasks. Anish Das Sarma, Ankur Jain, Ashwin Machanavajjhala, Philip Bohannon. To appear in CIKM, Maui, Hawaii, USA, 2012. (slides)
Dynamic Covering for Recommendation Systems. Ioannis Antonellis, Anish Das Sarma, Shaddin Dughmi. To appear in CIKM, Maui, Hawaii, USA, 2012. (slides)
Vision Paper: Towards an Understanding of the Limits of Map-Reduce Computation Foto N. Afrati, Anish Das Sarma, Semih Salihoglu, Jeffrey D. Ullman. In Cloud Futures, Berkeley, California, USA, May 2012. (slides)
Efficient Spatial Sampling of Large Geographical Tables Anish Das Sarma, Hongrae Lee, Hector Gonzalez, Jayant Madhavan, Alon Halevy. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), Arizona, USA, May 2012. (slides) (Selected among "best papers of SIGMOD '12", invited to TODS special issue.)
Finding Related Tables Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Cong Yu. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), Arizona, USA, May 2012.
REX: Explaining Relationships between Entity Pairs. Lujun Fang, Anish Das Sarma, Cong Yu, Philip Bohannon. In Proceedings of the conference on Very Large Data Bases (VLDB), Istanbul, Turkey, August 2012.
Understanding Cyclic Trends in Social Choices Anish Das Sarma, Sreenivas Gollapudi, Rina Panigrahy and Li Zhang. In Proceedings of the conference on Web Search and Data Mining (WSDM), Seattle, USA, 2012.
Fuzzy Joins Using MapReduce. Foto Afrati, Anish Das Sarma, David Menestrina, Aditya Parameswaran, Jeffrey Ullman. In Proceedings of the International Conference on Data Engineering (ICDE), Washington, USA, April 2012. (slides)
Extracting Information from Google Fusion Tables. Marco Brambilla, Stefano Ceri, Nicola Cinefra, Anish Das Sarma, Fabio Forghieri, Silvia Quarteroni. In SeCO Book, 2012.

2011

CoScan: Cooperative Scan Sharing in the Cloud. Xiaodan Wang, Anish Das Sarma, Christopher Olston, Randal Burns. In Proceedings of the Symposium on Cloud Computing (SoCC), Portugal, 2011.
Human-Assisted Graph Search: It's Okay to Ask Questions. Aditya Parameswaran, Anish Das Sarma, Hector Garcia-Molina, Alkis Polyzotis, Jennifer Widom. In Proceedings of the conference on Very Large Data Bases (VLDB), Seattle, USA, 2011.
Dynamic Relationship and Event Discovery. Anish Das Sarma, Alpa Jain, Cong Yu. Proceedings of the Web-Search and Data Mining Conference (WSDM), Hong Kong, 2011.
Data Integration with Dependent Sources. Anish Das Sarma, Luna Dong, Alon Halevy. Proceedings of the International Conference on Extending Database Technology (EDBT), 2011. (slides)
Building a Generic Debugger for Information Extraction Pipelines. Anish Das Sarma, Alpa Jain, Philip Bohannon. Poster paper, CIKM, October 2011. (Full version)
Ibis: A Provenance Manager for Multi-Layer Systems. Christopher Olston, Anish Das Sarma. In Proceedings of the 5th Biennial Conference on Innovative Data Systems Research (CIDR), Pacific Grove, California, January 2011.

2010

Uncertainty in Data Integration and Dataspace Support Platforms. Anish Das Sarma, Luna Dong, Alon Halevy. Book chapter, In Schema Matching and Mapping, ISBN: 978-3-642-16517-7, 2010.
Foundations of Uncertain-Data Integration. Parag Agrawal, Anish Das Sarma, Jeffrey Ullman, Jennifer Widom. Proceedings of the 36th International Conference on Very Large Data Bases (VLDB), Singapore, September 2010.
LIVE: A Lineage-Supported Versioned DBMS. Anish Das Sarma, Martin Theobald, Jennifer Widom. In Proceedings of the 22nd International Conference on Scientific and Statistical Database Management (SSDBM), Heidelberg, Germany, June 2010.
I4E: Interactive Investigation of Iterative Information Extraction. Anish Das Sarma, Alpa Jain, Divesh Srivastava. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), Indianapolis, Indiana, USA, June 2010.
Synthesizing View Definitions from Data. Anish Das Sarma, Aditya Parameswaran, Hector Garcia-Molina, Jennifer Widom. In Proceedings of the International Conference on Database Theory (ICDT), Lausanne, Switzerland, March 2010.
Ranking Mechanisms in Twitter-Like Forums. Anish Das Sarma, Atish Das Sarma, Sreenivas Gollapudi, Rina Panigrahy. In Proceedings of the International Conference on Web Search and Data Mining (WSDM), New York City, USA, February 2010.

2009

Managing Uncertain Data. Anish Das Sarma. Ph.D. Thesis, Stanford University, November 2009.

Functional Dependency Generation and Applications in Pay-As-You-Go Data Integration Systems. Daisy Zhe Wang, Luna Dong, Anish Das Sarma, Michael J. Franklin, Alon Halevy. In Proceedings of WebDB, Providence, Rhode Island, June 2009.
Representing Uncertain Data: Models, Properties, and Algorithms. Anish Das Sarma, Omar Benjelloun, Alon Halevy, Shubha Nabar, Jennifer Widom. In VLDB Journal, 18(5), 989-1019, October 2009. (Special issue on uncertain and probabilistic databases.)

Data Modeling in Dataspace Support Platforms. Anish Das Sarma, Luna Dong, Alon Halevy. In Conceptual Modeling: Foundations and Applications, Essays in Honor of John Mylopoulos, Springer Festschrift, LNCS 5600, 2009.

Schema Design for Uncertain Databases. Anish Das Sarma, Jeffrey Ullman, Jennifer Widom. Proceedings of the 3rd Alberto Mendelzon Workshop on Foundations of Data Management, Peru, May 2009.

Sailing the Information Ocean with Awareness of Currents: Discovery and Application of Source Dependence. Laure Berti-Equille, Anish Das Sarma, Xin Luna Dong, Amelie Marian, Divesh Srivastava. Proceedings of the 4th Biennial Conference on Innovative Data Systems Research (CIDR), Pacific Grove, California, January 2009.
Uncertainty In Data Integration. Anish Das Sarma, Luna Dong, Alon Halevy. In C. Aggarwal, editor, Managing and Mining Uncertain Data, Springer, 2009.

2008

Towards Special-Purpose Indexes and Statistics for Uncertain Data. Anish Das Sarma, Parag Agrawal, Shubha Nabar, Jennifer Widom. Proceedings of the Workshop on Management of Uncertain Data (MUD), Auckland, New Zealand, August 2008.

Bootstrapping Pay-As-You-Go Data Integration Systems. Anish Das Sarma, Luna Dong, Alon Halevy. Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), Vancouver, Canada, June 2008. (Selected among "outstanding papers of SIGMOD '08", invited to special issue of Information Systems - declined.)

Exploiting Lineage for Confidence Computation in Uncertain and Probabilistic Databases. Anish Das Sarma, Martin Theobald, Jennifer Widom. Proceedings of the 24th International Conference on Data Engineering (ICDE), Cancun, Mexico, April 2008. (DBClip)

Databases with Uncertainty and Lineage. Omar Benjelloun, Anish Das Sarma, Alon Halevy, Martin Theobald, Jennifer Widom. VLDB Journal, 17(2), 243-264, March 2008. (Special issue on Best papers of VLDB '06.)

2007

Leveraging Aggregate Constraints for Deduplication. Surajit Chaudhuri, Anish Das Sarma, Venkatesh Ganti, Raghav Kaushik. Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), Beijing, China, June 2007.

Detecting Near-Duplicates for Web-Crawling. Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma. Proceedings of the 16th International World Wide Web (WWW) Conference, Banff, Canada, May 2007.

Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS. Michi Mutsuzaki, Martin Theobald, Ander de Keijzer, Jennifer Widom, Parag Agrawal, Omar Benjelloun, Anish Das Sarma, Raghotham Murthy, Tomoe Sugihara. Proceedings of the 3rd Biennial Conference on Innovative Data Systems Research (CIDR), Pacific Grove, California, January 2007. Demonstration description.

2006

ULDBs: Databases with Uncertainty and Lineage. Omar Benjelloun, Anish Das Sarma, Alon Halevy, Jennifer Widom. Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), Seoul, Korea, September 2006. (Selected among "best papers of VLDB '06" for special issue of VLDB Journal.)

Trio: A System for Data, Uncertainty, and Lineage. Parag Agrawal, Omar Benjelloun, Anish Das Sarma, Chris Hayworth, Shubha Nabar, Tomoe Sugihara, Jennifer Widom. Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB), Seoul, Korea, September 2006. Demonstration description.

Estimating Data Stream Quality for Object-Detection Applications. Anish Das Sarma, Shawn R. Jeffery, Michael J. Franklin, Jennifer Widom. Proceedings of the Third International ACM SIGMOD Workshop on Information Quality in Information Systems, Chicago, Illinois, June 2006.

Working Models for Uncertain Data. Anish Das Sarma, Omar Benjelloun, Alon Halevy, Jennifer Widom. Proceedings of the Twenty-Second International Conference on Data Engineering (ICDE), Atlanta, Georgia, April 2006.

An Introduction to ULDBs and the Trio System. Omar Benjelloun, Anish Das Sarma, Chris Hayworth, Jennifer Widom. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Databases, 29(1), March 2006.

2005

Representing Uncertainty: Uniqueness, Equivalence, Minimization and Approximation. Anish Das Sarma, Shubha U. Nabar, Jennifer Widom. Technical Report, Stanford University, December 2005.

2004

A Decomposition Based Approach for Design of Supply Aggregation and Demand Aggregation Exchanges. Shantanu Biswas, Y. Narahari, Anish Das Sarma. The International Workshop on Theory Building and Formal Methods in Electronic/Mobile Commerce (TheFormEMC) collocated with FORTE, 2004. Published in LNCS, pp. 58-71, Volume 3236, 2004.

Generic Text Summarization Using WordNet. Kedar Bellare, Anish Das Sarma, Atish Das Sarma, Navneet Loiwal, Vaibhav Mehta, Ganesh Ramakrishnan, Pushpak Bhattacharya. International Conference on Language Resources and Evaluation (LREC), 2004.


Anish Das Sarma < Email: anish.dassarma@gmail.com >