# Jeffrey D. Ullman

Jeff Ullman is the Stanford W. Ascherman Professor of Computer Science (Emeritus). His interests include database theory, database integration, data mining, and education using the information infrastructure.

What's New | Polemics | Books | Biographical Information

## What's New

I refuse to get involved with Facebook, or Twitter, or LinkedIn, or any of the old social-network sites, but I have started to post observations and reports of my trips and such on Google+. I'm ullmanAtGmailDotCom (you can figure it out!).

### Map-Reduce Algorithms

Along with a number of colleagues, I have been looking at the question of algorithm design for Hadoop (map-reduce). Foto Afrati and I started with a technique for optimal implementation of multiway joins using a single MR job: Optimizing Joins in a Map-Reduce Environment. This idea was applied to the problem of finding triangles and more complex structures in graphs using MR: Enumerating Subgraph Instances Using Map-Reduce. We then started looking at the problem of similarity joins using MR: Fuzzy Joins Using MapReduce.

The outgrowth of these studies was something that I believe is really important: a model of map-reduce computation that gets at the heart of what makes it different from plain old parallel computing. In Upper and Lower Bounds on the Cost of a Map-Reduce Computation. The theory explained here lets you get lower bounds on the amount of communication needed, as a function of how much work each reducer is assigned. That, in turn, lets you know when your algorithm can be improved and when it cannot. We already have a number of surprises in this regard, and we invite interested researchers to help explore the many problems and algorithms that can be analyzed this way. I'm part of a team that will give a Tutorial at the SoCC on Oct. 17, 2012. The tutorial covers these developments and a number of others related to map-reduce and its extensions.

Gradiance is a system for creating and administering class exercises. These homeworks and progamming labs are designed to teach students, rather than merely to test. Through the concept of a "root question," students can repeat the same work several times, and are given advice when they make an error.

The Gradiance contract with Pearson (Addison-Wesley + Prentice-Hall) has terminated, and we have decided to turn Gradiance into a FREE service. If you are an instructor who wants to use the system, start by creating an account for yourself at www.gradiance.com/services NOT at the Pearson site. Note: passwords are >= 10 letters+digits, with at least one of each. Also, we cannot make an account be an instructor account for a book if the same account has registered as a student for a course using the same materials.

Then, email your chosen login, with the book whose materials you want, to support@gradiance.com We'll enable you to create a class using those materials. There are manuals at www.gradiance.com/info.html that should enable you to use the system without problems, but feel free to email support@gradiance.com if you encounter difficulties.

In addition, we have created eleven free "omnibus classes" covering Databases, Automata, Compilers, Operating Systems, Introductory Java, Data Structures, and Data Mining. Students wishing to join either one of these classes will find the Student Directions useful.

### EDBT-2011 Keynote

Here are the Slides from a keynote talk I gave at the 2011 EDBT Conference in Uppsala. They cover extensions of Map-Reduce to Workflow systems and the problems of how one deals with recursive applications. In particular, while nonrecursive applications allow tasks to be blocking (no delivery of output until done), you can't have recursive tasks that are blocking. Yet failure management in Map-Reduce and related systems depends on the blocking property. Further, recursive tasks often have a "long-tail" of rounds that involve communication of small files, with a high resulting overhead. As a result, a variety of new challenges are created, both for those interested in system implementation and those interested in algorithms.

### Free Book: Mining of Massive Datasets

The book Mining of Massive Dataasets by Anand Rajaraman and me, based on the CS345A course we have been teaching (now CS246 in the Winter and CS341 in the Spring), Has just been published in hardcopy by Cambridge University Press. It can be purchased in hardcopy Here. By agreement with Cambridge Press, it may still be downloaded gratis Here.

### Another Free Book: Foundations of Computer Science

In 1992, Al Aho and I published a book called Foundations of Computer Science, whose goal was to present CS theory as something integrally connected to CS practice. For example, we viewed recursive programs, recursive definitions, and inductive proofs as the same thing. We believe the book deserved a better fate, but the publisher took it out of print years ago. Having received back the rights to the book, we are happy to make it freely available Here.

### I Would Like to Hear From You, But...

I generally enjoy getting emails, even if it is to tell me of a mistake in a book. In fact, those notes are particularly important; they help not only me and my coauthors, but more importantly, later readers of the book. I try to respond helpfully and in a timely manner to email that isn't spam. However, there are two classes of emails that I think should not be written and that get a form response. Read More.

## Polemics

As part of my role as old curmudgeon (replacing my previous role as young curmudgeon), I have been writing a few articles about things I think especially stupid. I hope to write more.

• Answers to All Questions Iranian. Am I the only one who gets peppered with questions from Iranians? They're technical questions or political questions, but there seems to be a system behind them. So I decided to save time and answer them all here.

• Fundamentalism. Shortly after 9/11/01, I wrote this article on fundamentalism --- the nonnegotiable belief in some unproven hypothesis. It's been added to periodically, and rambles a bit, over the foolishness of the Palestinian leadership, the drug warriors, the politically correct, the anti-abortion crowd, and a few others. This article also advocated research on a system like "Total Information Awareness," although I doubt that it had any influence on that development.

• 4-Way Stop Signs. Dumb traffic policies have always been way up on my list of things I'd like to fix but can't. The global problem is that those with the power to decide, focus on specific risks, which they can see. But in solving one problem, they often introduce far more harmful effects --- e.g., pollution. The problem is that the harm is distributed sufficiently widely so that no one can point to a particular death and attribute it to a particular instance of some dumb traffic policy. This polemic, originally written for Club Nexus (the prototype for Orkut), focuses on the 4-way stop signs on Stanford Campus, but the principles apply broadly.

• Attack of the Fifty Foot NIMBY's. Throughout my life, I've been infuriated by the tendency of our social structure to allow theft, as long as the loss is distributed in tiny parcels among a large number of people. "Affirmative action" is probably the best known of these subtle thefts, but this essay is not about affirmative action. Rather, it's about a theft that occurs on our streets every day, yet no one notices.

• Software Patents. This article was written for the Knuth Prize. Don, who is of course ineligible for the Knuth Prize, would probably say we should get rid of all patents. I don't agree, but I do feel that in the software area, things have gotten out of hand. Here, I tried to argue that courts should apply the same standards that are used when choosing papers worthy of presentation at a conference, to the matter of whether a certain software idea deserves a patent.

• University of California (Non)Confidentiality Policy. The UC system, unlike every other university in this country, refuses to offer the maximum confidentiality that the law allows for letters of recommendation that they solicit. Members of the various computer-science departments in the UC system understand that this policy makes letter-writing irrelevant and deprives them of an important source of advice. However, the policy comes from so high up that it is impossible for them to do anything about it. I have therefore endeavored to put in place a solution similar to that used in response to the equally foolish "Buckley amendment," which gave student applicants the right to see their recommendations. The document referred to is a pledge that the person about whom you are writing a letter can sign, to assure you that you can write an honest and frank letter without fear of later reprisal or embarrassment. I wish the UC departments would themselves solicit this pledge, in order to preserve your anonymity as well as your privacy, but apparently they are forbidden to do so.

## Books --- Past and Future

For some sets of notes and materials for supplementing current books, click here:

## Biographical Information

Jeffrey D. Ullman
ullman @ cs.stanford.edu
650-494-8016 (home)
650-725-2588 (FAX)