Report Number: CS-TR-99-1621
Institution: Stanford University, Department of Computer Science
Title: Perceptual Metrics for Image Database Navigation
Author: Rubner, Yossi
Date: August 1999
Abstract: The increasing amount of information available in today's
world raises the need to retrieve relevant data efficiently.
Unlike text-based retrieval, where keywords are successfully
used to index into documents, content-based image retrieval
poses up front the fundamental questions of how to extract
useful image features and how to use them for intuitive
retrieval. We present a novel approach to the problem of
navigating through a collection of images for the purpose of
image retrieval, which leads to a new paradigm for image
database search. We summarize the appearance of images by
distributions of color or texture features, and we define a
metric between any two such distributions. This metric, which
we call the "Earth Mover's Distance" (EMD), represents the
least amount of work that is needed to rearrange the mass in
one distribution in order to obtain the other. We show that
the EMD matches perceptual dissimilarity better than other
dissimilarity measures, and argue that it has many desirable
properties for image retrieval. Using this metric, we employ
Multi-Dimensional Scaling techniques to embed a group of
images as points in a two- or three-dimensional Euclidean
space so that their distances reflect image dissimilarities
as well as possible. Such geometric embeddings exhibit the
structure in the image set at hand, allowing the user to
understand better the result of a database query and to
refine the query in a perceptually intuitive way. By
iterating this process, the user can quickly zoom in to the
portion of the image space of interest. We also apply these
techniques to other modalities such as mug-shot retrieval.
http://i.stanford.edu/pub/cstr/reports/cs/tr/99/1621/CS-TR-99-1621.pdf
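
Illustrative note: the abstract describes the EMD as the least amount of work needed to rearrange the mass of one distribution into another. The sketch below is not the report's implementation. It covers only a simple special case, under these assumptions: both distributions are one-dimensional histograms over the same unit-spaced bins, they carry equal total mass, and the cost of moving mass from bin i to bin j is |i - j|. In that case the minimum work equals the sum of the absolute running differences between the two histograms. The function name emd_1d and the tolerance parameter are hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q, tol=1e-9):
    """Minimum work to turn histogram p into histogram q.

    Assumes (for illustration only) equal-length 1-D histograms on
    unit-spaced bins, equal total mass, and ground distance |i - j|.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > tol:
        raise ValueError("histograms must have equal total mass")
    # The running surplus after each bin is the mass that has to be
    # carried across the boundary to the next bin; each unit carried
    # across one boundary costs one unit of work.
    carried = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(c) for c in carried)


if __name__ == "__main__":
    # All mass moves two bins to the right, so the work is 2.
    print(emd_1d([1, 0, 0], [0, 0, 1]))
```

For multi-dimensional color or texture distributions, or for distributions with unequal total mass, this shortcut does not apply and the distance would have to be computed by solving the underlying transportation problem.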