Report Number: CS-TN-93-1

Institution: Stanford University, Department of Computer Science

Title: Incremental Updates of Inverted Lists for Text Document Retrieval

Author: Tomasic, Anthony

Author: Garcia-Molina, Hector

Author: Shoens, Kurt

Date: December 1993

Abstract: With the proliferation of the world's "information highways," interest in efficient document indexing techniques has been renewed. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index. The index dynamically separates long and short inverted lists and optimizes the retrieval, update, and storage of each type of list. To study the behavior of the index, we describe a space of engineering trade-offs that ranges from optimizing update time to optimizing query performance. We quantitatively explore this space using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for a variety of criteria.
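
The long/short separation described above can be sketched concretely. The following toy is illustrative only, not the paper's design; the class name, migration threshold, and in-memory storage are invented for the example:

```python
# Toy dual-structure inverted index: short posting lists stay in one
# structure (cheap to update in place); a list that grows past a cutoff
# migrates to a separate "long list" store (cheap to scan at query time).
THRESHOLD = 4  # hypothetical cutoff between short and long lists

class DualIndex:
    def __init__(self):
        self.short = {}   # term -> small posting list
        self.long = {}    # term -> separately stored long posting list

    def add(self, term, doc_id):
        if term in self.long:
            self.long[term].append(doc_id)
        else:
            postings = self.short.setdefault(term, [])
            postings.append(doc_id)
            if len(postings) > THRESHOLD:      # migrate once the list grows
                self.long[term] = self.short.pop(term)

    def postings(self, term):
        return self.long.get(term) or self.short.get(term, [])

index = DualIndex()
for doc, text in enumerate(["a b", "a c", "a d", "a e", "a f"]):
    for term in text.split():
        index.add(term, doc)

print(sorted(index.long))   # only the frequent term "a" has migrated
print(index.postings("a"))
```

The point of the split is that updates touch mostly small, densely packed lists, while the few long lists are laid out for fast sequential reads.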

CS-TN-93-1

Report Number: CS-TN-93-2

Institution: Stanford University, Department of Computer Science

Title: The Efficacy of GlOSS for the Text Database Discovery Problem

Author: Gravano, Luis

Author: Garcia-Molina, Hector

Author: Tomasic, Anthony

Date: December 1993

Abstract: The popularity of information retrieval has led users to a new problem: finding which text databases (out of thousands of candidate choices) are the most relevant to a user. Answering a given query with a list of relevant databases is the text database discovery problem. The first part of this paper presents a practical method for attacking this problem based on estimating the result size of a query at each database. The method is termed GlOSS--Glossary-Of-Servers Server. The second part of this paper evaluates GlOSS using four different semantics to answer a user's queries. Real users' queries were used in the experiments. We also describe several variations of GlOSS and compare their efficacy. In addition, we analyze the storage cost of our approach to the problem.

CS-TN-93-2

Report Number: CS-TN-93-3

Institution: Stanford University, Department of Computer Science

Title: Correct View Update Translations via Containment

Author: Tomasic, Anthony

Date: December 1993

Abstract: One approach to the view update problem for deductive databases proves properties of translations -- that is, a language specifies the meaning of an update to the intensional database (IDB) in terms of updates to the extensional database (EDB). We argue that the view update problem should be viewed as a question of the expressive power of the translation language and the computational cost of demonstrating properties of a translation. We use an active rule-based database language as a means of specifying translations of updates on the IDB into updates on the EDB. This paper uses the containment of one datalog program (or conjunctive query) by another to demonstrate that a translation is semantically correct. We show that the complexity of correctness is lower for insertion than for deletion. Finally, we discuss extensions to the translation language.

CS-TN-93-3

Report Number: CS-TN-94-6

Institution: Stanford University, Department of Computer Science

Title: Ascribing Beliefs

Author: Brafman, Ronen I.

Author: Tennenholtz, Moshe

Date: December 1993

Abstract: Models of agents that employ formal notions of mental states are useful and often easier to construct than models at the symbol (e.g., programming language) or physical (e.g., mechanical) level. In order to enjoy these benefits, we must supply a coherent picture of mental-level models, that is, a description of the various components of the mental level, their dynamics and their inter-relations. However, these abstractions provide weak modelling tools unless (1) they are grounded in more concrete notions; and (2) we can show when it is appropriate to use them. In this paper we propose a model that grounds the mental state of the agent in its actions. We then characterize a class of goal-seeking agents that can be modelled as having beliefs. This paper emphasizes the task of belief ascription. On one level this is the practical task of deducing an agent's beliefs, and we look at assumptions that can help constrain the set of beliefs an agent can be ascribed, showing cases in which, under these assumptions, this set is unique. We also investigate the computational complexity of this task, characterizing a class of agents for whom belief ascription is tractable. But on a deeper level, our model of belief ascription supplies a concrete semantics for beliefs, one grounded in an observable notion -- action.

CS-TN-94-6

Report Number: CS-TN-94-7

Institution: Stanford University, Department of Computer Science

Title: Overcoming Unexpected Obstacles

Author: McCarthy, John

Date: May 1994

Abstract: The present note illustrates how logical formalizations of common sense knowledge and reasoning can achieve some of the open-endedness of human common sense reasoning. A plan is made to fly from Glasgow to Moscow and is shown by circumscription to lead to the traveller arriving in Moscow. Then a fact about an unexpected obstacle---the traveller losing his ticket---is added without changing any of the previous facts, and the original plan can no longer be shown to work if it must take into account the new fact. However, an altered plan that includes buying a replacement ticket can now be shown to work. The formalism used is a modification of one developed by Vladimir Lifschitz, and I have been informed that the modification isn't correct, and I should go back to Lifschitz's original formalism.

CS-TN-94-7

Report Number: CS-TN-94-8

Institution: Stanford University, Department of Computer Science

Title: Emulating Soft Real-Time Scheduling Using Traditional Operating System Schedulers

Author: Adelberg, Brad

Author: Garcia-Molina, Hector

Author: Kao, Ben

Date: May 1994

Abstract: Real-time scheduling algorithms are usually available only in the kernels of real-time operating systems, not in more general-purpose operating systems such as Unix. For some soft real-time problems, a traditional operating system may be the development platform of choice. This paper addresses methods of emulating real-time scheduling algorithms on top of standard time-share schedulers. We examine (through simulations) three strategies for priority assignment within a traditional multi-tasking environment. The results show that the emulation algorithms are comparable in performance to the real-time algorithms and in some instances outperform them.

CS-TN-94-8

Report Number: CS-TN-94-9

Institution: Stanford University, Department of Computer Science

Title: Reasoning About The Effects of Communication On Beliefs

Author: Young, R. Michael

Date: June 1994

Abstract: Perrault has presented a formal framework describing communicative action and the change of mental state of agents participating in the performance of speech acts. This approach, using an axiomatization in default logic, suffers from several drawbacks dealing with the persistence of beliefs and ignorance over time. We provide an example which illustrates these drawbacks and then present a second approach which avoids these problems. This second approach, an axiomatization of belief transfer in a nonmonotonic modal logic of belief and time, is a reformulation of Perrault's main ideas within a logic which uses an ignorance-based semantics to ensure that ignorance is maximized. We present an axiomatization of this logic and describe the associated techniques for nonmonotonic reasoning. We then show how this approach deals with inter-agent communications in a more intuitively appealing way.

CS-TN-94-9

Report Number: CS-TN-94-10

Institution: Stanford University, Department of Computer Science

Title: Precision and Recall of GlOSS Estimators for Database Discovery

Author: Tomasic, Anthony

Author: Gravano, Luis

Author: Garcia-Molina, Hector

Date: July 1994

Abstract: The availability of large numbers of network information sources has led to a new problem: finding which text databases (out of perhaps thousands of choices) are the most relevant to a query. We call this the text-database discovery problem. Our solution to this problem, GlOSS--Glossary-Of-Servers Server, keeps statistics on the available databases to decide which ones are potentially useful for a given query. In this paper we present different query-result size estimators for GlOSS and we evaluate them with metrics based on the precision and recall concepts of text-document information-retrieval theory. Our generalization of these metrics uses different notions of the set of relevant databases to define different query semantics.
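
As a rough illustration of the kind of estimator involved (a hedged sketch under an independence assumption, not one of the paper's actual estimators), a broker can rank databases by an estimated conjunctive-query result size computed from per-term document frequencies; the database names and statistics below are invented:

```python
# Each database exports (term -> document frequency) and its total size.
# Estimated result size for a conjunctive query, assuming terms occur
# independently: size * prod(df(t) / size) over the query terms.
def estimate_result_size(df, size, query_terms):
    est = size
    for t in query_terms:
        est *= df.get(t, 0) / size   # fraction of documents containing t
    return est

databases = {
    "db1": ({"database": 50, "discovery": 10}, 100),    # small, on-topic
    "db2": ({"database": 5, "discovery": 1}, 1000),     # large, off-topic
}
query = ["database", "discovery"]
ranked = sorted(databases,
                key=lambda name: estimate_result_size(*databases[name], query),
                reverse=True)
print(ranked)  # the on-topic database ranks first despite being smaller
```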

CS-TN-94-10

Report Number: CS-TN-94-11

Institution: Stanford University, Department of Computer Science

Title: On Exact and Approximate Cut Covers of Graphs

Author: Motwani, Rajeev

Author: Naor, Joseph

Date: September 1994

Abstract: We consider the minimum cut cover problem for a simple, undirected graph G(V,E): find a minimum-cardinality family of cuts C in G such that each edge e in E belongs to at least one cut c in C. The cardinality of the minimum cut cover of G is denoted by c(G). The motivation for this problem comes from the testing of electronic component boards. Loulou showed that the cardinality of a minimum cut cover of the complete graph is precisely the ceiling of log n. However, determining the minimum cut cover of an arbitrary graph was posed as an open problem by Loulou. In this note we settle this open problem by showing that the cut cover problem is closely related to the graph coloring problem, thereby also obtaining a simple proof of Loulou's main result. We show that the problem is NP-complete in general, and moreover, the approximation version of the problem remains NP-complete. Some other observations are made, all of which follow as a consequence of the close connection to graph coloring.
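
One direction of the coloring connection is a standard construction: from a proper k-coloring, write each vertex's color in binary and take one cut per bit position; two adjacent vertices have different colors, so some bit separates them, giving a cut cover of size ceil(log2 k). For the complete graph on n vertices this recovers the ceiling-of-log-n bound. A minimal sketch:

```python
import math

def cuts_from_coloring(color, num_colors):
    # cut i = (vertices whose color has bit i set, all other vertices)
    bits = max(1, math.ceil(math.log2(num_colors)))
    return [{v for v, c in color.items() if (c >> i) & 1} for i in range(bits)]

def covers(cuts, edges):
    # a cut (represented by one side S) covers edge (u,v) iff it crosses S
    return all(any((u in s) != (v in s) for s in cuts) for u, v in edges)

# K4: complete graph on 4 vertices, properly colored with colors 0..3
edges = [(u, v) for u in range(4) for v in range(u + 1, 4)]
color = {v: v for v in range(4)}
cuts = cuts_from_coloring(color, 4)
print(len(cuts), covers(cuts, edges))  # ceil(log2 4) = 2 cuts suffice
```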

CS-TN-94-11

Report Number: CS-TN-94-12

Institution: Stanford University, Department of Computer Science

Title: Cross-Validated C4.5: Using Error Estimation for Automatic Parameter Selection

Author: John, George H.

Date: October 1994

Abstract: Machine learning algorithms for supervised learning are in wide use. An important issue in the use of these algorithms is how to set the parameters of the algorithm. While the default parameter values may be appropriate for a wide variety of tasks, they are not necessarily optimal for a given task. In this paper, we investigate the use of cross-validation to select parameters for the C4.5 decision tree learning algorithm. Experimental results on five datasets show that when cross-validation is applied to selecting an important parameter for C4.5, the accuracy of the induced trees on independent test sets is generally higher than the accuracy when using the default parameter value.
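
The paper's protocol, selecting a parameter value by cross-validated error estimation, can be mimicked on a toy learner. This sketch tunes the k of a simple 1-D nearest-neighbour classifier rather than C4.5's pruning parameter; the data and candidate values are invented:

```python
def knn_predict(train, x, k):
    # majority vote among the k training points nearest to x
    votes = [y for _, y in sorted(train, key=lambda p: abs(p[0] - x))[:k]]
    return max(set(votes), key=votes.count)

def cv_accuracy(data, k, folds=5):
    # k-fold cross-validation: hold out every folds-th point in turn
    hits = 0
    for i in range(folds):
        test = data[i::folds]
        train = [p for j, p in enumerate(data) if j % folds != i]
        hits += sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(data)

# synthetic labelled points: class 0 near 0.0, class 1 near 1.5 and above
data = [(i / 10, 0) for i in range(10)] + [(1.5 + i / 10, 1) for i in range(10)]
best_k = max([1, 3, 5, 7], key=lambda k: cv_accuracy(data, k))
print(best_k, cv_accuracy(data, best_k))
```

The same loop structure applies to any parameterized learner: evaluate each candidate setting by cross-validated accuracy on the training data, then train the final model with the winner.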

CS-TN-94-12

Report Number: CS-TN-94-13

Institution: Stanford University, Department of Computer Science

Title: Formalizing Context (Expanded Notes)

Author: McCarthy, John

Author: Buvac, Sasa

Date: October 1994

Abstract: These notes discuss formalizing contexts as first-class objects. The basic relation is ist(c,p). It asserts that the proposition p is true in the context c. The most important formulas relate the propositions true in different contexts. Introducing contexts as formal objects will permit axiomatizations in limited contexts to be expanded to transcend the original limitations. This seems necessary to provide AI programs using logic with certain capabilities that human fact representation and human reasoning possess. Fully implementing transcendence seems to require further extensions to mathematical logic, i.e., beyond the nonmonotonic inference methods first invented in AI and now studied as a new domain of logic.

CS-TN-94-13

Report Number: CS-TN-94-14

Institution: Stanford University, Department of Computer Science

Title: Generalized Projections: A Powerful Query-Optimization Technique

Author: Harinarayan, Venky

Author: Gupta, Ashish

Date: November 1994

Abstract: In this paper we introduce generalized projections (GPs). GPs capture aggregations, group-bys, conventional projection with duplicate elimination (DISTINCT), and duplicate-preserving projections. We develop a technique for pushing GPs down the query trees of select-project-join queries that may use aggregations such as Max and Sum and that use arbitrary functions in their selection conditions. Our technique pushes aggregation computation, duplicate elimination, and function computation down to the lowest levels of a query tree. The technique also creates aggregations in queries that did not use aggregation to begin with. Our technique is important since applying aggregations early in query processing can provide significant performance improvements. In addition to their value in query optimization, generalized projections unify set and duplicate semantics and help better understand aggregations.
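
The performance intuition, that aggregating early shrinks intermediate results, can be seen in a toy two-relation query; this is a sketch with made-up tables, not the paper's GP algebra:

```python
from collections import defaultdict

sales = [("s1", 10), ("s1", 20), ("s2", 5)]   # (store, amount)
stores = [("s1", "east"), ("s2", "west")]     # (store, region)

def join_then_agg():
    # naive plan: map every sale row through the join, then aggregate
    region_of = dict(stores)
    totals = defaultdict(int)
    for store, amount in sales:
        totals[region_of[store]] += amount
    return dict(totals)

def agg_then_join():
    # pushed-down plan: pre-aggregate per store, then join the smaller result
    per_store = defaultdict(int)
    for store, amount in sales:
        per_store[store] += amount
    region_of = dict(stores)
    totals = defaultdict(int)
    for store, subtotal in per_store.items():
        totals[region_of[store]] += subtotal
    return dict(totals)

print(join_then_agg() == agg_then_join())  # same answer, fewer joined rows
```

With SUM (and other decomposable aggregates) the two plans are equivalent, but the pushed-down plan joins one row per store instead of one row per sale.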

CS-TN-94-14

Report Number: CS-TN-94-15

Institution: Stanford University, Department of Computer Science

Title: Reasoning Theories: Towards an Architecture for Open Mechanized Reasoning Systems

Author: Giunchiglia, Fausto

Author: Pecchiari, Paolo

Author: Talcott, Carolyn

Date: December 1994

Abstract: Our ultimate goal is to provide a framework and a methodology which will allow users, and not only system developers, to construct complex reasoning systems by composing existing modules, or to add new modules to existing systems, in a "plug and play" manner. These modules and systems might be based on different logics; have different domain models; use different vocabularies and data structures; use different reasoning strategies; and have different interaction capabilities. This paper makes two main contributions towards our goal. First, it proposes a general architecture for a class of reasoning modules and systems called Open Mechanized Reasoning Systems (OMRSs). An OMRS has three components: a reasoning theory component which is the counterpart of the logical notion of formal system, a control component which consists of a set of inference strategies, and an interaction component which provides an OMRS with the capability of interacting with other systems, including OMRSs and human users. Second, it develops the theory underlying the reasoning theory component. This development is motivated by an analysis of state-of-the-art systems. The resulting theory is then validated by using it to describe the integration of the linear arithmetic module into the simplification process of the Boyer-Moore system, NQTHM.

CS-TN-94-15

Report Number: CS-TN-95-16

Institution: Stanford University, Department of Computer Science

Title: The Meaning of Negative Premises in Transition System Specifications II

Author: Glabbeek, R.J. van

Date: February 1995

Abstract: This paper reviews several methods to associate transition relations to transition system specifications with negative premises in Plotkin's structural operational style. Besides a formal comparison on generality and relative consistency, the methods are also evaluated on their taste in determining which specifications are meaningful and which are not.

CS-TN-95-16

Report Number: CS-TN-95-17

Institution: Stanford University, Department of Computer Science

Title: Ntyft/ntyxt Rules Reduce to Ntree Rules

Author: Glabbeek, R.J. van

Date: February 1995

Abstract: Groote and Vaandrager introduced the tyft/tyxt format for Transition System Specifications (TSSs), and established that for each TSS in this format that is well-founded, the bisimulation equivalence it induces is a congruence. In this paper, we construct for each TSS in tyft/tyxt format an equivalent TSS that consists of tree rules only. As a corollary we can give an affirmative answer to an open question, namely whether the well-foundedness condition in the congruence theorem for tyft/tyxt can be dropped. These results extend to tyft/tyxt with negative premises and predicates.

CS-TN-95-17

Report Number: CS-TN-95-18

Institution: Stanford University, Department of Computer Science

Title: Effective models of polymorphism, subtyping and recursion

Author: Mitchell, John

Author: Viswanathan, Ramesh

Date: March 1995

Abstract: We develop a class of models of polymorphism, subtyping and recursion based on a combination of traditional recursion theory and simple domain theory. A significant property of our primary model is that types are coded by natural numbers using any index of their supremum operator. This leads to a distinctive view of polymorphic functions that has many of the usual parametricity properties. It also gives a distinctive but entirely coherent interpretation of subtyping. An alternate construction points out some peculiarities of computability theory based on natural number codings. Specifically, the polymorphic fixed point is computable by a single algorithm at all types when we construct the model over untyped call-by-value lambda terms, but not when we use Godel numbers for computable functions. This is consistent with trends away from natural numbers in the field of abstract recursion theory.

CS-TN-95-18

Report Number: CS-TN-95-19

Institution: Stanford University, Department of Computer Science

Title: Fast Approximation Algorithm for Minimum Cost Multicommodity Flow

Author: Kamath, Anil

Author: Palmon, Omri

Author: Plotkin, Serge

Date: March 1995

Abstract: The minimum-cost multicommodity flow problem is one of the classical optimization problems that arise in a variety of contexts. Applications range from finding optimal ways to route information through communication networks to VLSI layout. In this paper, we describe an efficient deterministic approximation algorithm which, given that there exists a multicommodity flow of cost $B$ that satisfies all the demands, produces a flow of cost at most $(1+\delta)B$ that satisfies a $(1-\epsilon)$-fraction of each demand. For constant $\delta$ and $\epsilon$, our algorithm runs in $O^*(kmn^2)$ time, which is an improvement over the previously fastest (deterministic) approximation algorithm for this problem, due to Plotkin, Shmoys, and Tardos, which runs in $O^*(k^2m^2)$ time.

CS-TN-95-19

Report Number: CS-TN-95-20

Institution: Stanford University, Department of Computer Science

Title: Interior Point Algorithms for Exact and Approximate Solution of Multicommodity Flow Problems

Author: Kamath, Anil

Author: Palmon, Omri

Date: March 1995

Abstract: In this paper, we present a new interior-point based polynomial algorithm for the multicommodity flow problem and its variants. Unlike all previously known interior point algorithms for multicommodity flow that have the same complexity for approximate and exact solutions, our algorithm improves running time in the approximate case by a polynomial factor. For many cases, the exact bounds are better as well. Instead of using the conventional linear programming formulation for the multicommodity flow problem, we model it as a quadratic optimization problem which is solved using interior-point techniques. This formulation allows us to exploit the underlying structure of the problem and to solve it efficiently. The algorithm is also shown to have improved stability properties. The improved complexity results extend to minimum cost multicommodity flow, concurrent flow and generalized flow problems.

CS-TN-95-20

Report Number: CS-TN-95-21

Institution: Stanford University, Department of Computer Science

Title: Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies

Author: Gravano, Luis

Author: Garcia-Molina, Hector

Date: April 1995

Abstract: As large numbers of text databases have become available on the Internet, it is getting harder to locate the right sources for given queries. In this paper we present gGlOSS, a generalized Glossary-Of-Servers Server, which keeps statistics on the available databases to estimate which databases are potentially the most useful for a given query. gGlOSS extends our previous work, which focused on databases using the boolean model of document retrieval, to cover databases using the more sophisticated vector-space retrieval model. We evaluate our new techniques using real-user queries and 53 databases. Finally, we further generalize our approach by showing how to build a hierarchy of gGlOSS brokers. The top level of the hierarchy is so small that it could be widely replicated, even at end-user workstations.

CS-TN-95-21

Report Number: CS-TN-95-22

Institution: Stanford University, Department of Computer Science

Title: Combining Register Allocation and Instruction Scheduling

Author: Motwani, Rajeev

Author: Palem, Krishna V.

Author: Sarkar, Vivek

Author: Reyen, Salem

Date: August 1995

Abstract: We formulate combined register allocation and instruction scheduling within a basic block as a single optimization problem, with an objective cost function that more directly captures the primary measure of interest in code optimization --- the completion time of the last instruction. We show that although a simple instance of the combined problem is NP-hard, the combined problem is much easier to solve approximately than graph coloring, which is a common formulation used for the register allocation phase in phase-ordered solutions. Using our framework, we devise a simple and effective heuristic algorithm for the combined problem, called the (alpha,beta)-Combined Heuristic; the parameters alpha and beta provide relative weights for controlling register pressure and instruction parallelism in the combined heuristic. Preliminary experiments indicate that the combined heuristic yields improvements in the range of 16-21% over phase-ordered solutions when the input graphs contain a balanced amount of register pressure and instruction-level parallelism.

CS-TN-95-22

Report Number: CS-TN-95-23

Institution: Stanford University, Department of Computer Science

Title: Dynamic Maintenance of Kinematic Structures

Author: Halperin, Dan

Author: Latombe, Jean-Claude

Author: Motwani, Rajeev

Date: August 1995

Abstract: We study the following dynamic data structure problem. Given a collection of rigid bodies moving in three-dimensional space and hinged together in a kinematic structure, our goal is to maintain a data structure that describes certain geometric features of these bodies, and efficiently update it as the bodies move. This data structure problem seems to be fundamental and it comes up in a variety of applications such as conformational search in molecular biology, simulation of hyper-redundant robots, collision detection and computer animation. In this note we present preliminary results on a few variants of the problem.

CS-TN-95-23

Report Number: CS-TN-95-24

Institution: Stanford University, Department of Computer Science

Title: Approximation Algorithms for $k$-Delivery TSP

Author: Chalasani, Prasad

Author: Motwani, Rajeev

Date: August 1995

Abstract: We provide O(1) approximation algorithms for the following NP-hard problem, called k-Delivery TSP: we have at our disposal a truck of capacity k; there are n depots and n customers at various locations in some metric space, with exactly one item (all items are identical) at each depot. We want to find an optimal tour using the truck to deliver one item to each customer. Our algorithms run in time polynomial in both n and k. The 1-Delivery problem is one of finding an optimal tour that alternately visits depots and customers. For this case we use matroid intersection to give a polynomial-time 2-approximation algorithm, improving upon a factor-2.5 algorithm of Anily and Hassin. Using this approximation combined with certain lower-bounding arguments, we show a factor-11.5 approximation to the optimal k-Delivery tour. For the infinite-k case we show a factor-2 approximation.

CS-TN-95-24

Report Number: CS-TN-95-25

Institution: Stanford University, Department of Computer Science

Title: Complexity Measures for Assembly Sequences

Author: Goldwasser, Michael

Author: Latombe, Jean-Claude

Author: Motwani, Rajeev

Date: October 1995

Abstract: Our work examines various complexity measures for two-handed assembly sequences. Many existing assembly sequencers take a description of a product and output a valid assembly sequence. For many products there exists an exponentially large set of valid sequences, and a natural goal is to use automated systems to attempt to select wisely from the choices. Since assembly sequencing is a preprocessing phase for a long and expensive manufacturing process, any work towards finding a "better" assembly plan is of great value when it comes time to assemble the physical product in mass quantities. We take a step in this direction by introducing a formal framework for studying the optimization of several complexity measures. This framework focuses on the combinatorial aspect of the family of valid assembly sequences, while temporarily separating out the specific geometric assumptions inherent to the problem. With an exponential number of possibilities, finding the true optimal-cost solution seems hard. In fact, in the most general case, our results suggest that even finding an approximate solution is hard. Future work is directed towards using this model to study how the original geometric assumptions can be reintroduced to prove stronger approximation results.

CS-TN-95-25

Report Number: CS-TN-95-26

Institution: Stanford University, Department of Computer Science

Title: Mediation and Software Maintenance

Author: Wiederhold, Gio

Date: October 1995

Abstract: This paper reports on recent work and directions in modern software architectures and their formal models with respect to software maintenance. Related earlier work, now entering practice, provides automatic creation of object structures for customer applications using such models and their algebra, and we will summarize that work. Our focus on maintenance intends to attack the most costly and frustrating aspect in dealing with large-scale software systems: keeping them up-to-date and responsive to user needs in changing environments. We introduce the concept of domain-specific mediators to partition the maintenance effort. Mediators are autonomous modules which create information objects out of source data. These modules are placed into an intermediate layer, bridging clients and servers. These mediators contain knowledge required to establish and maintain services in a coherent domain. A mediated architecture can reduce the cost growth of maintenance to a near-linear function of system size, whereas current system architectures have quadratic factors.

CS-TN-95-26

Report Number: CS-TN-95-27

Institution: Stanford University, Department of Computer Science

Title: Comparing Very Large Database Snapshots

Author: Labio, Wilburt Juan

Author: Garcia-Molina, Hector

Date: May 1995

Abstract: Detecting and extracting modifications from information sources is an integral part of data warehousing. For unsophisticated sources, in practice it is often necessary to infer modifications by periodically comparing snapshots of data from the source. Although this snapshot differential problem is closely related to traditional joins and outerjoins, there are significant differences, which lead to simple new algorithms. In particular, we present algorithms that perform (possibly lossy) compression of records. We also present a window algorithm that works very well if the snapshots are not "very different". The algorithms are studied via analysis and an implementation of two of them; the results illustrate the potential gains achievable with the new algorithms.
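
The window idea can be sketched in a few lines (an illustrative toy, not the paper's algorithm): when records rarely move far between snapshots, matching each old record against a small positional window of the new snapshot classifies most rows cheaply, at the cost of misreporting a record that moved far outside the window as a delete plus an insert:

```python
def diff_snapshots(old, new, window=3):
    # old/new are lists of (key, value) records in snapshot order
    matched = set()
    deletes, updates = [], []
    for i, (key, val) in enumerate(old):
        hit = None
        # search only a window of the new snapshot around the same position
        for j in range(max(0, i - window), min(len(new), i + window + 1)):
            if j not in matched and new[j][0] == key:
                hit = j
                break
        if hit is None:
            deletes.append(key)          # not found nearby: treated as deleted
        else:
            matched.add(hit)
            if new[hit][1] != val:
                updates.append(key)      # same key, changed value
    inserts = [new[j][0] for j in range(len(new)) if j not in matched]
    return inserts, deletes, updates

old = [("a", 1), ("b", 2), ("c", 3), ("d", 4)]
new = [("a", 1), ("c", 9), ("d", 4), ("e", 5)]
print(diff_snapshots(old, new))  # e inserted, b deleted, c updated
```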

CS-TN-95-27

Report Number: CS-TN-96-28

Institution: Stanford University, Department of Computer Science

Title: A Common Framework for Steerability, Motion Estimation and Invariant Feature Detection

Author: Hel-Or, Yacov

Author: Teo, Patrick C.

Date: January 1996

Abstract: Many problems in computer vision and pattern recognition involve groups of transformations. In particular, motion estimation, steerable filter design and invariant feature detection are often formulated with respect to a particular transformation group. Traditionally, these problems have been investigated independently. From a theoretical point of view, however, the issues they address are similar. In this paper, we examine these common issues and propose a theoretical framework within which they can be discussed in concert. This framework is based on constructing a more natural representation of the image for a given transformation group. Within this framework, many existing techniques of motion estimation, steerable filter design and invariant feature detection appear as special cases. Furthermore, several new results are direct consequences of this framework. First, a canonical decomposition of all filters that can be steered with respect to any one-parameter group and any multi-parameter Abelian group is proposed. Filters steerable under various subgroups of the affine group are also tabulated. Second, two approximation techniques are suggested to deal with filters that cannot be steered exactly. Approximating steerable filters can also be used for motion estimation. Third, within this framework, invariant features can easily be constructed using traditional techniques for computing point invariants.

CS-TN-96-28

Report Number: CS-TN-96-29

Institution: Stanford University, Department of Computer Science

Title: The Optimization Complexity of Constraint Satisfaction Problems

Author: Khanna, Sanjeev

Author: Sudan, Madhu

Date: December 1995

Abstract: In 1978, Schaefer considered a subclass of languages in NP and proved a "dichotomy theorem" for this class. The subclass considered comprises problems expressible as "constraint satisfaction problems", and the dichotomy theorem shows that every language in this class is either in P or is NP-hard. This result is in sharp contrast to a result of Ladner, which shows that such a dichotomy does not hold for NP unless NP=P. We consider the optimization version of the dichotomy question and show an analog of Schaefer's result for this case. More specifically, we consider optimization versions of "constraint satisfaction problems" and show that every optimization problem in this class is either solvable exactly in P or is MAX SNP-hard, and hence not approximable to within some constant factor in polynomial time unless NP=P. This result does not follow directly from Schaefer's result. In particular, the set of problems that turn out to be hard in this case is quite different from the set of languages shown hard by Schaefer's result.

CS-TN-96-29

Report Number: CS-TN-96-30

Institution: Stanford University, Department of Computer Science

Title: The Development of Type Systems for Object-Oriented Languages

Author: Fisher, Kathleen

Author: Mitchell, John C.

Date: January 1996

Abstract: This paper, which is partly tutorial in nature, summarizes some basic research goals in the study and development of typed object-oriented programming languages. These include both immediate repairs to problems with existing languages and the long-term development of more flexible and expressive, yet type-safe, approaches to program organization and design. The technical part of the paper is a summary and comparison of three object models from the literature. We conclude by discussing approaches to selected research problems, including changes in the type of a method from superclass to subclass and the use of types that give information about the implementations as well as the interfaces of objects. Such implementation types seem essential for adequate typing of binary operations on objects, for example.

CS-TN-96-30

Report Number: CS-TN-96-31

Institution: Stanford University, Department of Computer Science

Title: Classes = Objects + Data Abstraction

Author: Fisher, Kathleen

Author: Mitchell, John C.

Date: January 1996

Abstract: We describe a type-theoretic foundation for object systems that include ``interface types'' and ``implementation types.'' Our approach begins with a basic object calculus that provides a notion of object, method lookup, and object extension (an object-based form of inheritance). In this calculus, the type of an object gives its interface, as a set of methods and their types, but does not imply any implementation properties. We extend this object calculus with a higher-order form of data abstraction that allows us to declare supertypes of an abstract type and a list of methods guaranteed not to be present. This results in a flexible framework for studying and improving practical programming languages where the type of an object gives certain implementation guarantees, such as would be needed to statically determine the offset of a method or safely implement binary operations without exposing the internal representation of objects. We prove type soundness for the entire language using operational semantics and an analysis of typing derivations. One insight that is an immediate consequence of our analysis is a principled, type-theoretic explanation (for the first time, as far as we know) of the link between subtyping and inheritance in C++, Eiffel and related languages.

CS-TN-96-31

Report Number: CS-TN-96-32

Institution: Stanford University, Department of Computer Science

Title: Design of Multi-Parameter Steerable Functions Using Cascade Basis Reduction

Author: Teo, Patrick C.

Author: Hel-Or, Yacov

Date: March 1996

Abstract: A new cascade basis reduction method of computing the optimal least-squares set of basis functions steering a given function is presented. The method combines the Lie group-theoretic and the singular value decomposition approaches in such a way that their respective strengths complement each other. Since the Lie group-theoretic approach is used, the set of basis and steering functions computed can be expressed analytically. Because the singular value decomposition method is used, this set of basis and steering functions is optimal in the least-squares sense. Furthermore, the computational complexity in designing basis functions for transformation groups with large numbers of parameters is significantly reduced. The efficiency of the cascade basis reduction method is demonstrated by designing a set of basis functions that steers a Gabor function under the four-parameter linear transformation group.

CS-TN-96-32

Report Number: CS-TN-96-33

Institution: Stanford University, Department of Computer Science

Title: A Computational Group-Theoretic Approach to Steerable Functions

Author: Teo, Patrick C.

Author: Hel-Or, Yacov

Date: April 1996

Abstract: We present a computational, group-theoretic approach to steerable functions. The approach is group-theoretic in that the treatment involves continuous transformation groups for which elementary Lie group theory may be applied. The approach is computational in that the theory is constructive and leads directly to a procedural implementation. For functions that are steerable with $n$ basis functions under a $k$-parameter group, the procedure is efficient in that at most $nk+1$ iterations of the procedure are needed to compute all the basis functions. Furthermore, the procedure is guaranteed to return the minimum number of basis functions. If the function is not steerable, a numerical implementation of the procedure could be used to compute basis functions that approximately steer the function over a range of parameters. Examples of both applications are described.

CS-TN-96-33

Report Number: CS-TN-96-34

Institution: Stanford University, Department of Computer Science

Title: Fast Estimation of Diameter and Shortest Paths (without Matrix Multiplication)

Author: Aingworth, Donald

Author: Chekuri, Chandra

Author: Indyk, Piotr

Author: Motwani, Rajeev

Date: May 1996

Abstract: In the recent past, there has been considerable progress in devising algorithms for the all-pairs shortest paths problem running in time significantly smaller than the obvious time bound of O(n^3). Unfortunately, all the new algorithms are based on fast matrix multiplication algorithms that are notoriously impractical. Our work is motivated by the goal of devising purely combinatorial algorithms that match these improved running times. Our results come close to achieving this goal, in that we present algorithms with a small additive error in the length of the paths obtained. Our algorithms are easy to implement, have the desired property of being combinatorial in nature, and the hidden constants in the running time bound are fairly small. Our main result is an algorithm which solves the all-pairs shortest paths problem in unweighted, undirected graphs with an additive error of 2 in time O(n^{2.5} sqrt{log n}). This algorithm returns actual paths and not just the distances. In addition, we give more efficient algorithms with running time O(n^{1.5} sqrt{k log n} + n^2 log^2 n) for the case where we are only required to determine shortest paths between k specified pairs of vertices rather than all pairs of vertices. The starting point for all our results is an O(m sqrt{n log n}) algorithm for distinguishing between graphs of diameter 2 and 4, and this is later extended to obtaining a ratio 2/3 approximation to the diameter in time O(m sqrt{n log n} + n^2 log n). Unlike in the case of all-pairs shortest paths, our results for approximate diameter computation can be extended to the case of directed graphs with arbitrary positive real weights on the edges.
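As a point of reference for the combinatorial approach described above, the exact baseline these algorithms improve on is simply one breadth-first search per vertex, which takes O(nm) time on unweighted, undirected graphs. The sketch below is illustrative Python and not the paper's additive-error algorithm:

```python
from collections import deque

def bfs_dist(adj, s):
    """Single-source shortest-path distances in an unweighted graph.

    `adj` maps each vertex to its list of neighbors.
    """
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def diameter(adj):
    """Exact diameter of a connected graph via one BFS per vertex: the
    O(nm) baseline that the approximation algorithms above beat."""
    return max(max(bfs_dist(adj, s).values()) for s in adj)
```

The paper's contribution can be read as trading the exactness of this baseline for speed: a small additive error (2 on path lengths, a 2/3 ratio on the diameter) buys running times well below one BFS per vertex.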

CS-TN-96-34

Report Number: CS-TN-96-35

Institution: Stanford University, Department of Computer Science

Title: A Calculus for Concurrent Objects

Author: DiBlasio, Paolo

Author: Fisher, Kathleen

Date: June 1996

Abstract: This paper presents an imperative and concurrent extension of the functional object-oriented calculus described in [FHM94]. It belongs to the family of so-called prototype-based object-oriented languages, in which objects are created from existing ones via the inheritance primitives of object extension and method override. Concurrency is introduced through the identification of objects and processes. To our knowledge, the resulting calculus is the first concurrent object calculus to be studied. We define an operational semantics for the calculus via a transition relation between configurations, which represent snapshots of the run-time system. Our static analysis includes a type inference system, which statically detects message-not-understood errors, and an effect system, which guarantees that synchronization code, specified via guards, is side-effect free. We present a subject reduction theorem, modified to account for imperative and concurrent features, and type and effect soundness theorems.

CS-TN-96-35

Report Number: CS-TN-96-36

Institution: Stanford University, Department of Computer Science

Title: Efficient Snapshot Differential Algorithms for Data Warehousing

Author: Garcia-Molina, Hector

Author: Labio, Wilburt Juan

Date: June 1996

Abstract: Detecting and extracting modifications from information sources is an integral part of data warehousing. For unsophisticated sources, in practice it is often necessary to infer modifications by periodically comparing snapshots of data from the source. Although this {\em snapshot differential} problem is closely related to traditional joins and outerjoins, there are significant differences, which lead to simple new algorithms. In particular, we present algorithms that perform (possibly lossy) compression of records. We also present a {\em window} algorithm that works very well if the snapshots are not ``very different.'' The algorithms are studied via analysis and an implementation of two of them; the results illustrate the potential gains achievable with the new algorithms.
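To make the setting concrete, the following sketch diffs two snapshots of (key, value) records in one pass, buffering unmatched records from each side. It is an assumed illustration of the problem, not the paper's compression or window algorithms; a true window algorithm would additionally bound the buffers and evict stale entries:

```python
def snapshot_diff(old, new):
    """One-pass diff of two snapshots given as lists of (key, value) records.

    Returns (updates, inserts, deletes): changed records, records only in
    `new`, and keys only in `old`.  When the snapshots are nearly aligned,
    the unmatched-record buffers stay small.
    """
    old_buf, new_buf = {}, {}          # unmatched records seen so far
    updates, inserts, deletes = [], [], []
    for i in range(max(len(old), len(new))):
        if i < len(old):
            k, v = old[i]
            if k in new_buf:           # key already seen in new snapshot
                nv = new_buf.pop(k)
                if nv != v:
                    updates.append((k, nv))
            else:
                old_buf[k] = v
        if i < len(new):
            k, v = new[i]
            if k in old_buf:           # key already seen in old snapshot
                ov = old_buf.pop(k)
                if ov != v:
                    updates.append((k, v))
            else:
                new_buf[k] = v
    deletes.extend(old_buf)            # leftovers in old only
    inserts.extend(new_buf.items())    # leftovers in new only
    return updates, inserts, deletes
```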

CS-TN-96-36

Report Number: CS-TN-96-37

Institution: Stanford University, Department of Computer Science

Title: An Improved Lower Bound for Load Balancing of Tasks with Unknown Duration

Author: Plotkin, Serge

Author: Ma, Yuan

Date: October 1996

Abstract: Suppose there are n servers and a sequence of tasks, each of which arrives in an on-line fashion and can be handled by a subset of the servers. The level of the service required by a task is known upon arrival, but the duration of the service is unknown. The on-line load balancing problem is to assign each task to an appropriate server so that the maximum load on the servers is minimized. The best known lower bound on the competitive ratio for this problem was sqrt(n). However, the argument used to prove this lower bound used a sequence of tasks with exponential duration, and therefore this lower bound does not preclude an algorithm with a competitive ratio that is polylogarithmic in T, the maximum task duration. In this paper we prove a lower bound of sqrt(T), thereby proving that a competitive ratio that is polylogarithmic in T is impossible. This should be compared to the analogous case for known-duration tasks, where it is possible to achieve a competitive ratio that is logarithmic in T.

CS-TN-96-37

Report Number: CS-TN-96-38

Institution: Stanford University, Department of Computer Science

Title: Intractability of Assembly Sequencing: Unit Disks in the Plane

Author: Goldwasser, Michael

Author: Motwani, Rajeev

Date: December 1996

Abstract: We consider the problem of removing a given disk from a collection of unit disks in the plane. At each step, we allow a disk to be removed by a collision-free translation to infinity, and the goal is to access a given disk using as few steps as possible. Recently there has been a focus on optimizing assembly sequences over various cost measures, but with very limited algorithmic success. We explain this lack of success, proving strong inapproximability results in this simple geometric setting. These inapproximability results, to the best of our knowledge, are the strongest hardness results known for any purely combinatorial problem in a geometric setting. As a stepping stone, we study the approximability of scheduling with AND/OR precedence constraints. The disks problem can be formulated as a scheduling problem where the order of removals is to be scheduled. Before scheduling a disk to be removed, a path must be cleared, and so we get precedence constraints on the tasks; however, the form of such constraints differs from traditional scheduling in that there is a choice of which path to clear.

CS-TN-96-38

Report Number: CS-TN-96-39

Institution: Stanford University, Department of Computer Science

Title: Complexity Measures for Assembly Sequences

Author: Goldwasser, Michael

Author: Motwani, Rajeev

Date: December 1996

Abstract: Our work examines various complexity measures for two-handed assembly sequences. Although there has been a great deal of algorithmic success for finding feasible assembly sequences, there has been very little success towards optimizing the costs of sequences. We attempt to explain this lack of progress by proving the inherent difficulty of finding optimal, or even near-optimal, assembly sequences. We begin by introducing a formal framework for studying the optimization of several complexity measures. We consider a variety of different settings and natural cost measures for assembly sequences. We then define a graph-theoretic problem which is a generalization of assembly sequencing. For our virtual assembly sequencing problem we are able to use techniques common to the theory of approximability to prove the hardness of finding even near-optimal sequences for most cost measures in our generalized framework. Of course, hardness results in our generalized framework do not immediately carry over to the original geometric problems. We continue by realizing several of these hardness results in rather simple geometric settings, proving the difficulty of some of the original problems. These inapproximability results, to the best of our knowledge, are the strongest hardness results known for a purely combinatorial problem in a geometric setting.

CS-TN-96-39

Report Number: CS-TN-97-40

Institution: Stanford University, Department of Computer Science

Title: Content Ratings, and Other Third-Party Value-Added Information: Defining an Enabling Platform

Author: Roscheisen, Martin

Author: Winograd, Terry

Author: Paepcke, Andreas

Date: January 1997

Abstract: This paper describes the ComMentor annotation architecture and its usages, with a specific emphasis on the content rating application.

CS-TN-97-40

Report Number: CS-TN-97-41

Institution: Stanford University, Department of Computer Science

Title: Reducing Initial Latency in a Multimedia Storage System

Author: Chang, Edward

Author: Garcia-Molina, Hector

Date: February 1997

Abstract: A multimedia server delivers presentations (e.g., videos, movies, games), providing high bandwidth and continuous real-time delivery. In this paper we present techniques for reducing the initial latency of presentations, i.e., for reducing the time between the arrival of a request and the start of the presentation. Traditionally, initial latency has not received much attention. This is because one major application of multimedia servers is ``movies on demand'' where a delay of a few minutes before a new multi-hour movie starts is acceptable. However, latency reduction is important in interactive applications such as playing of video games and browsing of multimedia documents. Latency reduction is also crucial to improve access performance to media data in a multimedia database system. Various latency reduction schemes are proposed and analyzed, and their performance compared. We show that our techniques can significantly reduce (almost eliminate in some cases) initial latency without adversely affecting throughput. Moreover, a novel on-disk partial data replication scheme that we propose proves to be far more cost effective than any other previous attempts in reducing initial latency.

CS-TN-97-41

Report Number: CS-TN-97-42

Institution: Stanford University, Department of Computer Science

Title: From User Access Patterns to Dynamic Hypertext Linking

Author: Yan, Tak Woon

Author: Jacobsen, Matthew

Author: Garcia-Molina, Hector

Author: Dayal, Umeshwar

Date: February 1997

Abstract: This paper describes an approach for automatically classifying visitors of a web site according to their access patterns. User access logs are examined to discover clusters of users that exhibit similar information needs; e.g., users that access similar pages. This may result in a better understanding of how users visit the site, and lead to an improved organization of the hypertext documents for navigational convenience. More interestingly, based on what categories an individual user falls into, we can dynamically suggest links for him to navigate. In this paper, we describe the overall design of a system that implements these ideas, and elaborate on the preprocessing, clustering, and dynamic link suggestion tasks. We present some experimental results generated by analyzing the access log of a web site.
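A toy sketch of the pipeline described above, with each user reduced to the set of pages they accessed: users are greedily grouped by set overlap, and link suggestions for a user are the pages popular in their group that they have not yet visited. The Jaccard measure and threshold here are illustrative assumptions, not necessarily the paper's clustering method:

```python
def jaccard(a, b):
    """Set-overlap similarity between two page sets, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_users(access, threshold=0.5):
    """Greedily group users whose page sets overlap above `threshold`.

    `access` maps user -> set of accessed pages.  Each cluster is a pair
    (set of member users, union of their pages).
    """
    clusters = []
    for user, pages in access.items():
        for members, union in clusters:
            if jaccard(pages, union) >= threshold:
                members.add(user)
                union |= pages
                break
        else:
            clusters.append(({user}, set(pages)))
    return clusters

def suggest_links(user, access, clusters):
    """Suggest pages seen by the user's cluster but not by the user."""
    for members, union in clusters:
        if user in members:
            return sorted(union - access[user])
    return []
```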

CS-TN-97-42

Report Number: CS-TN-97-43

Institution: Stanford University, Department of Computer Science

Title: Report on the May 18-19 1995 IITA Digital Libraries Workshop: Final Draft for Participant Review, August 4, 1995

Author: Lynch, Clifford

Author: Garcia-Molina, Hector

Date: February 1997

Abstract: This report summarizes the outcomes of a workshop on Digital Libraries held under the auspices of the US Government's Information Infrastructure Technology and Applications (IITA) Working Group in Reston, Virginia on May 18-19, 1995. The objective of the workshop was to refine the research agenda for digital libraries with specific emphasis on scaling and interoperability and the infrastructure needed to enable this research agenda.

CS-TN-97-43

Report Number: CS-TN-97-44

Institution: Stanford University, Department of Computer Science

Title: Addressing Heterogeneity in the Networked Information Environment

Author: Baldonado, Michelle Q. Wang

Author: Cousins, Steve B.

Date: February 1997

Abstract: Several ongoing Stanford University Digital Library projects address the issue of heterogeneity in networked information environments. A networked information environment has the following components: users, information repositories, information services, and payment mechanisms. This paper describes three of the heterogeneity-focused Stanford projects-InfoBus, REACH, and DLITE. The InfoBus project is at the protocol level, while the REACH and DLITE projects are both at the conceptual model level. The InfoBus project provides the infrastructure necessary for accessing heterogeneous services and utilizing heterogeneous payment mechanisms. The REACH project sets forth a uniform conceptual model for finding information in networked information repositories. The DLITE project presents a general task-based strategy for building user interfaces to heterogeneous networked information services.

CS-TN-97-44

Report Number: CS-TN-97-45

Institution: Stanford University, Department of Computer Science

Title: Information Needs in Technical Work Settings and Their Implications for the Design of Computer Tools

Author: Paepcke, Andreas

Date: February 1997

Abstract: We interviewed information workers in multiple technical areas of a large, diverse company, and we describe some of the unsatisfied information needs we observed during our study. Two clusters of issues are described. The first covers how loosely coupled work groups use and share information. We show the need to structure information for multiple, partly unanticipated uses. We show how the construction of information compounds helps users accomplish some of this restructuring, and we explain how structuring flexibility is also required because of temperamental differences among users. The second cluster of issues revolves around collections of tightly coupled work groups. We show that information shared within such groups differs from information shared across group boundaries. We present the barriers to sharing which we saw operating both within groups and outside, and we explain the function of resource and contact broker which evolved in the settings we examined. For each of these issues we propose implications for information tool design.

CS-TN-97-45

Report Number: CS-TN-97-46

Institution: Stanford University, Department of Computer Science

Title: A Proposal for Basing our Protocols on a General Information Exchange Level

Author: Winograd, Terry

Date: February 1997

Abstract: In order to build our protocols in a way that will provide for long term growth and extensibility, we should define a standard level on top of the CORBA/ILU level, to deal with the management of information across objects on different servers. This level is based on a generalization of the protocols we have been working on.

CS-TN-97-46

Report Number: CS-TN-97-47

Institution: Stanford University, Department of Computer Science

Title: Modes of Information Integration

Author: Winograd, Terry

Date: February 1997

Abstract: In order to better understand the different approaches to digital libraries, I compared a number of existing and proposed systems and developed a taxonomy that can be used in identifying the different tradeoffs they make in the overall design space.

CS-TN-97-47

Report Number: CS-TN-97-48

Institution: Stanford University, Department of Computer Science

Title: Conceptual Models for Comparison of Digital Library Systems and Approaches

Author: Winograd, Terry

Date: February 1997

Abstract: This document is a working paper that grew out of the discussions at the Digital Libraries joint projects meeting in Washington on Nov. 8-9, 1994. It is intended as a first rough cut at a conceptual framework for understanding the significant differences among systems and ideas, so that we can better decide where to work for interoperability and where to take complementary approaches.

CS-TN-97-48

Report Number: CS-TN-97-49

Institution: Stanford University, Department of Computer Science

Title: Why You Won't Be Buying and Selling Information Yourself

Author: Winograd, Terry

Date: February 1997

Abstract: A large part of the economics of electronic publishing of library materials will be based on site licensing, not on per-use fees.

CS-TN-97-49

Report Number: CS-TN-97-50

Institution: Stanford University, Department of Computer Science

Title: Lightweight Objects for the Digital Library

Author: Winograd, Terry

Date: February 1997

Abstract: We have been looking at the potential for integrating Xerox's GAIA system into the INFObus architecture. Ramana suggested we look at Xerox/Novell's Document Enhanced Networking (DEN) specification which incorporates some of the GAIA ideas in a product. I went through the spec, and had some realizations about what we are trying to do with the INFObus that I thought would be generally useful. The latter part of this message is a proposal for our own architecture.

CS-TN-97-50

Report Number: CS-TN-97-51

Institution: Stanford University, Department of Computer Science

Title: The Proxy Is Where It's At!

Author: Winograd, Terry

Date: February 1997

Abstract: Soon, more than 90% of access to the Internet will be through commercial proxies.

CS-TN-97-51

Report Number: CS-TN-97-52

Institution: Stanford University, Department of Computer Science

Title: An Adaptive Agent for Automated Web Browsing

Author: Balabanovic, Marko

Author: Shoham, Yoav

Author: Yun, Yeogirl

Date: February 1997

Abstract: The current exponential growth of the Internet precipitates a need for new tools to help people cope with the volume of information. To complement recent work on creating searchable indexes of the World-Wide Web and systems for filtering incoming e-mail and Usenet news articles, we describe a system which learns to browse the Internet on behalf of a user. Every day it presents a selection of interesting Web pages. The user evaluates each page, and given this feedback the system adapts and attempts to produce better pages the following day. After demonstrating that our system is able to learn a model of a user with a single well-defined interest, we present an initial experiment where over the course of 24 days the output of our system was compared to both randomly-selected and human-selected pages. It consistently performed better than the random pages, and was better than the human-selected pages half of the time.
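The feedback loop described above can be caricatured as a keyword-weight profile: each evaluation nudges the weights of the page's words, and the next day's selection favors high-scoring pages. This is a hypothetical sketch of such an adaptive loop, not the system's actual learning method:

```python
from collections import defaultdict

class BrowsingAgent:
    """Toy adaptive agent: a keyword-weight model of user interest."""

    def __init__(self, rate=0.5):
        self.weights = defaultdict(float)  # word -> learned interest weight
        self.rate = rate                   # learning rate (assumed value)

    def score(self, words):
        """Predicted interest of a page given its word list."""
        return sum(self.weights[w] for w in words)

    def feedback(self, words, rating):
        """Shift word weights toward the user's evaluation in [-1, 1]."""
        for w in set(words):
            self.weights[w] += self.rate * rating

    def select(self, pages, k=1):
        """Return the k highest-scoring page urls from {url: word list}."""
        return sorted(pages, key=lambda u: self.score(pages[u]),
                      reverse=True)[:k]
```

After a positive rating on a jazz page and a negative one on a sports page, the agent prefers the jazz-related candidate the next day.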

CS-TN-97-52

Report Number: CS-TN-97-53

Institution: Stanford University, Department of Computer Science

Title: A Communication Agreement Framework of Access Control

Author: Roscheisen, Martin

Author: Winograd, Terry

Date: February 1997

Abstract: We introduce a framework of access control which shifts the emphasis from the participants to their relationship. The framework is based on a communication model in which participants negotiate the mutually agreed-upon boundary conditions of their relationship in compact "communication pacts," called "commpacts." Commpacts can be seen as a third fundamental type next to access-control lists (ACLs) and capabilities. We argue that in current networked environments characterized by multiple authorities and "trusted proxies," this model provides an encapsulation for interdependent authorization policies, which reduces the negotiation complexity of general (user- and content-dependent) distributed access control and provides a clear user-conceptual metaphor; it also generalizes work in electronic contracting and embeds naturally into the existing legal and institutional infrastructure. The framework is intended to provide a language enabling a social mechanism of coordinated expectation.

CS-TN-97-53

Report Number: CS-TN-97-54

Institution: Stanford University, Department of Computer Science

Title: Combining CORBA and the World-Wide Web in the Stanford Digital Library Project

Author: Paepcke, Andreas

Author: Hassan, Scott

Date: February 1997

Abstract: Describes in 1.5 pages how SIDL combines CORBA and the WWW.

CS-TN-97-54

Report Number: CS-TN-97-55

Institution: Stanford University, Department of Computer Science

Title: Performance Evaluation of Centralized and Distributed Index Schemes for a Page Server OODBMS

Author: Basu, Julie

Author: Keller, Arthur M.

Author: Poess, Meikel

Date: March 1997

Abstract: Recent work on client-server data-shipping OODBs has demonstrated the usefulness of local data caching at client sites. However, none of the studies has investigated index-related performance issues in particular. References to index pages arise from associative queries and from updates on indexed attributes, often making indexes the most heavily used "hot spots" in a database. System performance is therefore quite sensitive to the index management scheme. This paper examines the effects of index caching, and investigates two schemes, one centralized and the other distributed, for index page management in a page server OODB. In the centralized scheme, index pages are not allowed to be cached at client sites; thus, communication with the central server is required for all index-based queries and index updates. The distributed index management scheme supports inter-transaction caching of index pages at client sites, and enforces a distributed index consistency control protocol similar to that of data pages. We study via simulation the performance of these two index management schemes under several different workloads and contention profiles, and identify scenarios where each of the two schemes performs better than the other.

CS-TN-97-55

Report Number: CS-TN-97-56

Institution: Stanford University, Department of Computer Science

Title: Approximation Algorithms for Directed Steiner Tree Problems

Author: Charikar, Moses

Author: Chekuri, Chandra

Author: Goel, Ashish

Author: Guha, Sudipto

Date: March 1997

Abstract: We obtain the first non-trivial approximation algorithms for the Steiner Tree problem and the Generalized Steiner Tree problem in general directed graphs. Essentially no approximation algorithms were known for these problems. For the Directed Steiner Tree problem, we design a family of algorithms which achieve an approximation ratio of O(k^\epsilon) in time O(kn^{1/\epsilon}) for any fixed (\epsilon > 0), where k is the number of terminals to be connected. For the Directed Generalized Steiner Tree Problem, we give an algorithm which achieves an approximation ratio of O(k^{2/3}\log^{1/3} k), where k is the number of pairs to be connected. Related problems including the Group Steiner tree problem, the Node Weighted Steiner tree problem and several others can be reduced in an approximation preserving fashion to the problems we solve, giving the first non-trivial approximations to those as well.

CS-TN-97-56

Report Number: CS-TN-97-57

Institution: Stanford University, Department of Computer Science

Title: Axiomatizing Flat Iteration

Author: Glabbeek, R.J. van

Date: April 1997

Abstract: Flat iteration is a variation on the original binary version of the Kleene star operation P*Q, obtained by restricting the first argument to be a sum of atomic actions. It generalizes prefix iteration, in which the first argument is a single action. Complete finite equational axiomatizations are given for five notions of bisimulation congruence over basic CCS with flat iteration, viz. strong congruence, branching congruence, eta-congruence, delay congruence and weak congruence. Such axiomatizations were already known for prefix iteration and are known not to exist for general iteration. The use of flat iteration has two main advantages over prefix iteration: 1. The current axiomatizations generalize to full CCS, whereas the prefix iteration approach does not allow an elimination theorem for an asynchronous parallel composition operator. 2. The greater expressiveness of flat iteration allows for much shorter completeness proofs. In the setting of prefix iteration, the most convenient way to obtain the completeness theorems for eta-, delay, and weak congruence was by reduction to the completeness theorem for branching congruence. In the case of weak congruence this turned out to be much simpler than the only direct proof found. In the setting of flat iteration on the other hand, the completeness theorems for delay and weak (but not eta-) congruence can equally well be obtained by reduction to the one for strong congruence, without using branching congruence as an intermediate step. Moreover, the completeness results for prefix iteration can be retrieved from those for flat iteration, thus obtaining a second indirect approach for proving completeness for delay and weak congruence in the setting of prefix iteration.

CS-TN-97-57

Report Number: CS-TN-97-58

Institution: Stanford University, Department of Computer Science

Title: Precedence Constrained Scheduling to Minimize Weighted Completion Time on a Single Machine.

Author: Chekuri, Chandra

Author: Motwani, Rajeev

Date: August 1997

Abstract: We consider the problem of scheduling a set of jobs on a single machine with the objective of minimizing weighted (average) completion time. The problem is NP-hard when there are precedence constraints between jobs [12], and we provide a simple and efficient combinatorial 2-approximation algorithm. In contrast to our work, earlier approximation algorithms [9] achieving the same ratio are based on solving a linear programming relaxation of the problem.

CS-TN-97-58

Report Number: CS-TN-97-59

Institution: Stanford University, Department of Computer Science

Title: Stability of Networks and Protocols in the Adversarial Queueing Model for Packet Routing

Author: Goel, Ashish

Date: September 1997

Abstract: The adversarial queueing theory model for packet routing was suggested by Borodin et al. We give a complete and simple characterization of all networks that are universally stable in this model. We show that a specific greedy protocol, SIS (Shortest In System), is stable against a large class of stochastic adversaries. New applications such as multicast packet scheduling and job scheduling with precedence constraints are suggested for the adversarial model.

CS-TN-97-59

Report Number: CS-TN-97-60

Institution: Stanford University, Department of Computer Science

Title: Efficient Linear Re-rendering for Interactive Lighting Design

Author: Teo, Patrick C.

Author: Simoncelli, Eero P.

Author: Heeger, David J.

Date: October 1997

Abstract: We present a framework for interactive lighting design based on linear re-rendering. The rendering operation is linear with respect to light sources, assuming a fixed scene and camera geometry. This linearity means that a scene may be interactively re-rendered via linear combination of a set of basis images, each rendered under a particular basis light. We focus on choosing and designing a suitable set of basis lights. We provide examples of bases that allow 1) interactive adjustment of a spotlight direction, 2) interactive adjustment of the position of an area light, and 3) a combination in which light sources are adjusted in both position and direction. We discuss a method for reducing the size of the basis using principal components analysis in the image domain.
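The linearity property at the heart of this framework fits in a few lines: a re-rendered image under any lighting setup is the weighted sum of the pre-rendered basis images, one per basis light. In this minimal sketch the images are flat lists of floats standing in for pixel arrays:

```python
def rerender(basis_images, light_intensities):
    """Re-render a scene as a linear combination of basis images.

    Because rendering is linear in the light sources (for fixed scene and
    camera geometry), the image under lights with the given intensities is
    sum_i  intensity_i * basis_image_i, computed pixel by pixel.
    """
    assert len(basis_images) == len(light_intensities)
    n = len(basis_images[0])           # number of pixels per image
    out = [0.0] * n
    for img, w in zip(basis_images, light_intensities):
        for i in range(n):
            out[i] += w * img[i]
    return out
```

This is why interactive re-lighting is cheap: once the basis images are rendered offline, adjusting a light only re-weights the sum, and (as the abstract notes) principal components analysis can shrink the basis further.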

CS-TN-97-60

Report Number: CS-TN-97-61

Institution: Stanford University, Department of Computer Science

Title: Performance Analysis of an Associative Caching Scheme for Client-Server Databases

Author: Basu, Julie

Author: Poess, Meikel

Author: Keller, Arthur M.

Date: September 1997

Abstract: This paper presents a detailed performance study of the associative caching scheme proposed in "A Predicate-based Caching Scheme for Client-Server Database Architectures," The VLDB Journal, Jan 1996. A client cache dynamically loads query results in the course of transaction execution, and formulates a description of its current contents. Predicate-based reasoning is used on the cache description to examine and maintain the cache. The benefits of the scheme include local evaluation of associative queries, at the cost of maintaining the cached query results through update notifications from the server. In this paper, we investigate through detailed simulation the behavior of this caching scheme for a client-server database under different workloads and contention profiles. An optimized version of our basic caching scheme is also proposed and studied. We examine both read-only and update transactions, with the effect of updates on the caching performance as our primary focus. Using an extended version of a standard database benchmark, we identify scenarios where these caching schemes improve the system performance and scalability, as compared to systems without client-side caching. Our results demonstrate that associative caching can be beneficial even for moderately high update activity.

CS-TN-97-61

Report Number: CS-TN-98-62

Institution: Stanford University, Department of Computer Science

Title: A Type System for Object Initialization in the Java Bytecode Language

Author: Freund, Stephen N.

Author: Mitchell, John C.

Date: April 1998

Abstract: In the standard Java implementation, a Java language program is compiled to Java bytecode. This bytecode may be sent across the network to another site, where it is then interpreted by the Java Virtual Machine. Since bytecode may be written by hand, or corrupted during network transmission, the Java Virtual Machine contains a bytecode verifier that performs a number of consistency checks before code is interpreted. As illustrated by previous attacks on the Java Virtual Machine, these tests, which include type correctness, are critical for system security. In order to analyze existing bytecode verifiers and to understand the properties that should be verified, we develop a precise specification of statically-correct Java bytecode, in the form of a type system. Our focus in this paper is a subset of the bytecode language dealing with object creation and initialization. For this subset, we prove that for every Java bytecode program that satisfies our typing constraints, every object is initialized before it is used. The type system is easily combined with a previous system developed by Stata and Abadi for bytecode subroutines. Our analysis of subroutines and object initialization reveals a previously unpublished bug in the Sun JDK bytecode verifier.

CS-TN-98-62

Report Number: CS-TN-98-63

Institution: Stanford University, Department of Computer Science

Title: Collaborative value filtering on the Web

Author: Rodriguez-Mula, Gerard

Author: Garcia-Molina, Hector

Author: Paepcke, Andreas

Date: May 1998

Abstract: Today's Internet search engines help users locate information based on the textual similarity of a query and potential documents. Given the large number of documents available, the user often finds too many documents, and even if the textual similarity is high, in many cases the matching documents are not relevant or of interest. Our goal is to explore other ways to decide if documents are "of value" to the user, i.e., to perform what we call "value filtering." In particular, we would like to capture access information that may tell us, within limits of privacy concerns, which user groups are accessing what data, and how frequently. This information can then guide users, for example, helping identify information that is popular, or that may have helped others before. This is a type of collaborative filtering or community-based navigation. Access information can either be gathered by the servers that provide the information, or by the clients themselves. Tracing accesses at servers is simple, but often information providers are not willing to share this information. We therefore are exploring client-side gathering. Companies like Alexa are currently using client gathering in the large. We are studying client gathering at a much smaller scale, where a small community of users with shared interest collectively track their information accesses. For this, we have developed a proxy system called the Knowledge Sharing System (KSS) that monitors the behavior of a community of users. Through this system we hope to: 1. Develop mechanisms for sharing browsing expertise among a community of users; and 2. Better understand the access patterns of a group of people with common interests, and develop good schemes for sharing this information.

CS-TN-98-63

Report Number: CS-TN-98-64

Institution: Stanford University, Department of Computer Science

Title: A Standard Textual Interchange Format for the Object Exchange Model (OEM)

Author: Goldman, Roy

Author: Chawathe, Sudarshan

Author: Crespo, Arturo

Author: McHugh, Jason

Date: May 1998

Abstract: The Object Exchange Model (OEM) serves as the basic data model in numerous projects of the Stanford University Database Group, including Tsimmis, Lore and C. This document first defines and explains the model, and then it describes a syntax for textually encoding OEM. By adopting this syntax as a standard across all of our OEM projects, we hope to encourage interoperability and also to provide a consistent view of OEM to interested parties outside Stanford.

CS-TN-98-64

Report Number: CS-TN-98-65

Institution: Stanford University, Department of Computer Science

Title: Responsive Interaction for a Large Web Application: The Meteor Shower Architecture in the WebWriter II Editor

Author: Crespo, Arturo

Author: Chang, Bay-Wei

Author: Bier, Eric A.

Date: May 1998

Abstract: Traditional server-based web applications allow access to server-hosted resources, but often exhibit poor responsiveness due to server load and network delays. Client-side web applications, on the other hand, provide excellent interactivity at the expense of limited access to server resources. The WebWriter II Editor, a direct manipulation HTML editor that runs in a web browser, uses both server-side and client-side processing in order to achieve the advantages of both. In particular, this editor downloads the document data structure to the browser and performs all operations locally. The user interface is based on HTML frames and includes individual frames for previewing the document and displaying general and specific control panels. All editing is done by JavaScript code residing in roughly twenty HTML pages that are downloaded into these frames as needed. Such a client-server architecture, based on frames, client-side data structures, and multiple JavaScript-enhanced HTML pages appears promising for a wide variety of applications. This paper describes this architecture, the Meteor Shower Application Architecture, and its use in the WebWriter II Editor.

CS-TN-98-65

Report Number: CS-TN-98-66

Institution: Stanford University, Department of Computer Science

Title: Archival Storage for Digital Libraries

Author: Crespo, Arturo

Author: Garcia-Molina, Hector

Date: May 1998

Abstract: We propose an architecture for Digital Library Repositories that assures long-term archival storage of digital objects. The architecture is formed by a federation of independent but collaborating sites, each managing a collection of digital objects. The architecture is based on the following key components: use of signatures as object handles, no deletions of digital objects, functional layering of services, the presence of an awareness service in all layers, and use of disposable auxiliary structures. Long-term persistence of digital objects is achieved by creating replicas at several sites.

CS-TN-98-66

Report Number: CS-TN-98-67

Institution: Stanford University, Department of Computer Science

Title: A GUI-Based Version of the SenseMaker Interface for Information Exploration

Author: Baldonado, Michelle Q Wang

Author: Winograd, Terry

Date: May 1998

Abstract: SenseMaker is an interface for information exploration. The original HTML version of the interface relied on tables for display and forms for interaction. The new Java version is GUI-based. This video illustrates the new SenseMaker interface by presenting a hypothetical scenario of a user carrying out an information-exploration task.

CS-TN-98-67

Report Number: CS-TN-98-68

Institution: Stanford University, Department of Computer Science

Title: Presenting HTML Structure in Audio: User Satisfaction with Audio Hypertext

Author: James, Frankie

Date: May 1998

Abstract: This paper discusses the results of a 2 by 4 mixed design experiment testing various ways of presenting HTML structures in audio. Four interface styles were tested: (1) one speaker, minimal sound effects, (2) one speaker, many sound effects, (3) many speakers, minimal sound effects, and (4) many speakers, many sound effects. The results obtained were both specific to the interfaces used (i.e., that the use of three different speakers to present heading levels was confusing) and more general (for example, natural sounds are more distinguishable and easier to remember than tones). A short discussion of typical HTML usage is also presented.

CS-TN-98-68

Report Number: CS-TN-98-69

Institution: Stanford University, Department of Computer Science

Title: Distinguishability vs. Distraction in Audio HTML Interfaces

Author: James, Frankie

Date: May 1998

Abstract: In this paper, we present the findings and conclusions from a user study on audio interfaces. In the experiment we discuss, we studied a framework for choosing sounds for audio interfaces by comparing a prototype interface against two existing audio browsers. Our findings indicate that our initial framework, which was described as a separation between recognizable and non-recognizable sounds, could be better interpreted in the context of the distinguishability and distraction level of various types of sounds. We propose a new definition of how a sound can be called distracting and how to avoid this when creating audio interfaces.

CS-TN-98-69

Report Number: CS-TN-98-70

Institution: Stanford University, Department of Computer Science

Title: Merging Ranks from Heterogeneous Internet Sources

Author: Garcia-Molina, Hector

Author: Gravano, Luis

Date: May 1998

Abstract: Many sources on the Internet and elsewhere rank the objects in query results according to how well these objects match the original query. For example, a real-estate agent might rank the available houses according to how well they match the user's preferred location and price. In this environment, "meta-brokers" usually query multiple autonomous, heterogeneous sources that might use varying result-ranking strategies. A crucial problem that a meta-broker then faces is extracting from the underlying sources the top objects for a user query according to the meta-broker's ranking function. This problem is challenging because these top objects might not be ranked high by the sources where they appear. In this paper we discuss strategies for solving this "meta-ranking" problem. In particular, we present a condition that a source must satisfy so that a meta-broker can extract the top objects for a query from the source without examining its entire contents. Not only is this condition necessary but it is also sufficient, and we show an efficient algorithm to extract the top objects from sources that satisfy the given condition.

CS-TN-98-70

Report Number: CS-TN-98-71

Institution: Stanford University, Department of Computer Science

Title: Stanford Digital Library Interoperability Protocol

Author: Hassan, Scott

Author: Paepcke, Andreas

Date: May 1998

Abstract: Description of Stanford's interoperability protocol for interacting with search related proxy objects on the InfoBus.

CS-TN-98-71

Report Number: CS-TN-98-72

Institution: Stanford University, Department of Computer Science

Title: The Stanford InfoBus and Its Service Layers: Augmenting the Internet with Higher-Level Information Management Protocols

Author: Roscheisen, Martin

Author: Baldonado, Michelle

Author: Chang, Kevin

Author: Gravano, Luis

Author: Ketchpel, Steven

Author: Paepcke, Andreas

Date: May 1998

Abstract: The Stanford InfoBus is a prototype infrastructure developed as part of the Stanford Digital Libraries Project to extend the current Internet protocols with a suite of higher-level information management protocols. This paper surveys the five service layers provided by the Stanford InfoBus: protocols for managing items and collections (DLIOP), metadata (SMA), search (STARTS), payment (UPAI), and rights and obligations (FIRM).

CS-TN-98-72

Report Number: CS-TN-98-73

Institution: Stanford University, Department of Computer Science

Title: Proposal for I**3 Client Server Protocol

Author: Garcia-Molina, Hector

Author: Paepcke, Andreas

Date: May 1998

Abstract: This document proposes a CORBA-based protocol for submitting queries to servers and for obtaining the results. It is a subset of the Stanford Digital Library Interoperability protocol (DLIOP).

CS-TN-98-73

Report Number: CS-TN-98-74

Institution: Stanford University, Department of Computer Science

Title: Predicate Rewriting for Translating Boolean Queries in a Heterogeneous Information System

Author: Chang, Chen-Chuan K.

Author: Garcia-Molina, Hector

Author: Paepcke, Andreas

Date: May 1998

Abstract: Searching over heterogeneous information sources is difficult in part because of the non- uniform query languages. Our approach is to allow users to compose Boolean queries in one rich front-end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a filter query to yield the correct final results. In this paper we introduce the architecture and associated mechanism for query translation. In particular, we discuss techniques for rewriting predicates in Boolean queries into native subsuming forms, which is a basis of translating complex queries. In addition, we present experimental results for evaluating the cost of post-filtering. We also discuss the drawbacks of this approach and cases when it may not be effective. We have implemented prototype versions of these mechanisms and demonstrated them on heterogeneous Boolean systems.
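The subsume-then-filter idea in the abstract can be sketched concretely. The example below is a hypothetical miniature, not the paper's mechanism: the "source" supports only keyword conjunctions, so a phrase predicate is rewritten into the subsuming conjunction of its words, and the original phrase is re-applied locally as the filter query.

```python
# Toy sketch of translating a Boolean predicate into a subsuming form
# plus a post-filter. The document set, predicates, and helper names
# are illustrative assumptions, not part of the paper.

def subsuming_query(phrase):
    """Rewrite a phrase predicate into a keyword conjunction that
    matches every document the phrase matches (and possibly more)."""
    return phrase.split()

def source_search(docs, keywords):
    """A source that supports only conjunctions of keywords."""
    return [d for d in docs if all(k in d for k in keywords)]

def post_filter(results, phrase):
    """Apply the original, stronger predicate locally."""
    return [d for d in results if phrase in d]

docs = ["digital library systems", "library of digital media", "digital library"]
phrase = "digital library"
coarse = source_search(docs, subsuming_query(phrase))  # may return extras
final = post_filter(coarse, phrase)
print(final)  # ['digital library systems', 'digital library']
```

The gap between `coarse` and `final` is exactly the post-filtering cost the paper measures experimentally.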

CS-TN-98-74

Report Number: CS-TN-98-75

Institution: Stanford University, Department of Computer Science

Title: An Extensible Constructor Tool for the Rapid, Interactive Design of Query Synthesizers

Author: Baldonado, Michelle

Author: Katz, Seth

Author: Paepcke, Andreas

Author: Chang, Chen-Chuan K.

Author: Garcia-Molina, Hector

Author: Winograd, Terry

Date: May 1998

Abstract: We describe an extensible constructor tool that helps information experts (e.g., librarians) create specialized query synthesizers for heterogeneous digital-library environments. A query synthesizer provides a graphical user interface in which a digital-library patron can specify a high-level, fielded, multi-source query. Furthermore, a query synthesizer interacts with a query translator and an attribute translator to transform high-level queries into sets of source-specific queries. We discuss how the constructor can facilitate discovery of available attributes (e.g., title), collation of schemas from different sources, selection of input widgets for a synthesizer (e.g., a text box or a drop-down list widget to support input of controlled vocabulary), and other design aspects. We also describe a prototype constructor we implemented, based on the Stanford InfoBus and metadata architecture.

CS-TN-98-75

Report Number: CS-TN-98-76

Institution: Stanford University, Department of Computer Science

Title: Interoperability for Digital Libraries Worldwide

Author: Paepcke, Andreas

Author: Chang, Chen-Chuan K.

Author: Garcia-Molina, Hector

Author: Winograd, Terry

Date: May 1998

Abstract: Discusses the history and current directions of interoperability in different parts of computing systems relevant to Digital Libraries.

CS-TN-98-76

Report Number: CS-TN-98-77

Institution: Stanford University, Department of Computer Science

Title: A Linguistic Characterization of Bounded Oracle Computation and Probabilistic Polynomial Time

Author: Mitchell, J.

Author: Mitchell, M.

Author: Scedrov, A.

Date: May 4, 1998

Abstract: We present a higher-order functional notation for polynomial-time computation with an arbitrary 0,1-valued oracle. This provides a linguistic characterization for classes such as NP and BPP, as well as a notation for probabilistic polynomial-time functions. The language is derived from Hofmann's adaptation of Bellantoni-Cook safe recursion, extended to oracle computation via work derived from that of Kapron and Cook. Like Hofmann's language, ours is an applied version of typed lambda calculus with complexity bounds enforced by a type system. The type system uses a modal operator to distinguish between two types of numerical expressions, only one of which is allowed in recursion indices. The proof that the language captures precisely oracle polynomial time is model-theoretic, using adaptations of various techniques from category theory.

CS-TN-98-77

Report Number: CS-TN-98-78

Institution: Stanford University, Department of Computer Science

Title: A Probabilistic Poly-time Framework for Protocol Analysis

Author: Lincoln, P.

Author: Mitchell, J.

Author: Mitchell, M.

Author: Scedrov, A.

Date: April 3, 1998

Abstract: We develop a framework for analyzing security protocols in which protocol adversaries may be arbitrary probabilistic polynomial-time processes. In this framework, protocols are written in a restricted form of pi-calculus, and security may be expressed as a form of observational equivalence, a standard relation from programming language theory that involves quantifying over possible environments that might interact with the protocol. Using an asymptotic notion of probabilistic equivalence, we relate observational equivalence to polynomial-time statistical tests and discuss some example protocols to illustrate the potential strengths of our approach.

CS-TN-98-78

Report Number: CS-TN-98-79

Institution: Stanford University, Department of Computer Science

Title: 2D BubbleUp: Managing Parallel Disks for Media Servers

Author: Chang, Edward

Author: Garcia-Molina, Hector

Author: Li, Chen

Date: July 1998

Abstract: In this study we present a scheme called two-dimensional BubbleUp (2DB) for managing parallel disks in a multimedia server. Its goal is to reduce initial latency for interactive multimedia applications, while balancing disk loads to maintain high throughput. The 2DB scheme consists of a data placement and a request scheduling policy. The data placement policy replicates frequently accessed data and places them cyclically throughout the disks. The request scheduling policy attempts to maintain free "service slots" in the immediate future. These slots can then be used to quickly service newly arrived requests. Through examples and simulation, we show that our scheme significantly reduces initial latency and maintains throughput comparable to that of the traditional schemes.

CS-TN-98-79

Report Number: CS-TN-98-80

Institution: Stanford University, Department of Computer Science

Title: MEDIC: A Memory & Disk Cache for Multimedia Clients

Author: Chang, Edward

Author: Garcia-Molina, Hector

Date: July 1998

Abstract: In this paper we propose an integrated memory and disk cache for a multimedia client. The cache cushions the multimedia decoder from input rate fluctuations and mismatches, and because data can be cached to disk, the acceptable fluctuations can be very large. This gives the media server much greater flexibility for load balancing, and lets the client operate efficiently when the network rate is much larger or smaller than the media display rate. We analyze the memory requirements for this cache, and analytically derive safe values for its control parameters. Using a realistic case study, we study the interaction between memory size, peak input rate, and disk performance, and show that a relatively modest amount of main memory can support a wide range of scenarios.

CS-TN-98-80

Report Number: CS-TN-98-81

Institution: Stanford University, Department of Computer Science

Title: How to build a DLITE component

Author: Cousins, Steve B.

Date: July 1998

Abstract: This paper describes how to build a DLITE component.

CS-TN-98-81

Report Number: CS-TN-98-82

Institution: Stanford University, Department of Computer Science

Title: Minimizing Memory Requirements in Media Servers

Author: Chang, Edward

Author: Chen, Yi-Yin

Date: July 1998

Abstract: Poor memory management policies lead to lower throughput and excessive memory requirements. This problem is aggravated in multimedia databases by the large volume and real-time data requirements. This study explores the temporal and spatial relationships among concurrent media streams. Specifically, we propose adding proper delays to space out IOs in a media server to give more room for buffer sharing among streams. Memory requirements can be reduced by trading time for space. We present and prove theorems that state the optimal IO schedules for reducing memory requirements for two cases: streams with the same required display rate and different display rates. We also show how the theorems can be put in practice to improve system performance.

CS-TN-98-82

Report Number: CS-TN-98-83

Institution: Stanford University, Department of Computer Science

Title: Stanford DLITE User Study

Author: Mortensen, Mark

Date: July 1998

Abstract: User tests were conducted on the DLITE digital workspace. These consisted of observed use of the DLITE system, followed by an interview with the test administrator. The tests themselves were carried out both remotely over a network and locally in the digital libraries lab, on subjects with moderate computer knowledge. Initial tests resulted in system failures that caused DLITE to crash or become totally unusable. In subsequent tests, users noted a number of areas of DLITE that caused confusion: in particular, the instantiation of queries, the purpose and functioning of the graphics in the upper-left corner of objects, and the obscuring of objects in the workspace when dragging large components. Given the reactions of users in the post-test interview, these problems do not appear to be flaws in the design of DLITE, but rather implementation errors not intrinsic to the model upon which the functionality is based.

CS-TN-98-83

Report Number: CS-TN-98-84

Institution: Stanford University, Department of Computer Science

Title: Lessons from Developing Audio HTML Interfaces

Author: James, Frankie

Date: July 1998

Abstract: In this paper, we discuss our previous research on the establishment of guidelines and principles for choosing sounds to use in an audio interface to HTML, called the AHA framework. These principles, along with issues related to the target audience such as user tasks, goals, and interests are factors that can help us to choose specific sounds for the interface. We conclude by describing scenarios of two potential users and the interfaces that would seem to be appropriate for them.

CS-TN-98-84

Report Number: CS-TN-98-85

Institution: Stanford University, Department of Computer Science

Title: Conjunctive Constraint Mapping for Data Translation

Author: Chang, Chen-Chuan K.

Author: Garcia-Molina, Hector

Date: July 1998

Abstract: In this paper we present a mechanism for translating information in heterogeneous digital library environments. We model information as a set of conjunctive constraints that are satisfied by real- world objects (e.g, documents, their metadata). Through application of semantic rules and value transformation functions, constraints are mapped into ones understood and supported in another context. Our machinery can also deal with hierarchically structured information.

CS-TN-98-85

Report Number: CS-TN-98-86

Institution: Stanford University, Department of Computer Science

Title: The Earth Mover's Distance as a Metric for Image Retrieval

Author: Rubner, Yossi

Author: Tomasi, Carlo

Author: Guibas, Leonidas J.

Date: September 1998

Abstract: We introduce a metric between two distributions that we call the Earth Mover's Distance (EMD). The EMD is based on the minimal cost that must be paid to transform one distribution into the other, in a precise sense. We show that the EMD has attractive properties for content-based image retrieval. The most important one, as we show, is that it matches perceptual similarity better than other distances used for image retrieval. The EMD is based on a solution to the transportation problem from linear optimization, for which efficient algorithms are available, and also allows naturally for partial matching. It is more robust than histogram matching techniques, in that it can operate on variable-length representations of the distributions that avoid quantization and other binning problems typical of histograms. When used to compare distributions with the same overall mass, the EMD is a true metric. In this paper we focus on applications to color and texture, and we compare the retrieval performance of the EMD with that of other distances.
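The EMD in the paper is defined via a general transportation problem over arbitrary signatures; as a hedged illustration of the "minimal work" intuition, the special case of two 1-D histograms over the same bins with equal total mass reduces to the L1 distance between their cumulative distributions, which the sketch below computes.

```python
# 1-D special case of the Earth Mover's Distance: for equal-mass
# histograms over the same unit-spaced bins, the minimal transport
# cost equals the accumulated |CDF difference|. The general EMD of
# the paper requires a transportation-problem solver instead.

def emd_1d(h1, h2):
    assert len(h1) == len(h2) and abs(sum(h1) - sum(h2)) < 1e-9
    work, carry = 0.0, 0.0
    for a, b in zip(h1, h2):
        carry += a - b          # earth still in transit past this bin
        work += abs(carry)      # carrying it one bin costs |carry|
    return work

# Shifting a unit pile of earth by two bins costs 2 units of work:
print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0
```

Note that this closed form exists only in one dimension; for color or texture signatures the paper solves the transportation linear program directly.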

CS-TN-98-86

Report Number: CS-TN-98-87

Institution: Stanford University, Department of Computer Science

Title: Scheduling Algebra

Author: Glabbeek, R.J. van

Author: Rittgen, P.

Date: December 1998

Abstract: The goal of this paper is to develop an algebraic theory of process scheduling. We specify a syntax for denoting processes composed of actions with given durations. Subsequently, we propose axioms for transforming any specification term of a scheduling problem into a term of all valid schedules. Here a schedule is a process in which all (implementational) choices (e.g. precise timing) are resolved. In particular, we axiomatize an operator restricting attention to the efficient schedules. These schedules turn out to be representable as trees, because in an efficient schedule actions start only at time zero or when a resource is released, i.e. upon termination of the action binding a required resource. All further delay would be useless. Nevertheless, we do not consider resource constraints explicitly here. We show that a normal form exists for every term of the algebra and establish soundness of our axiom system with respect to a schedule semantics, as well as completeness for efficient processes.

CS-TN-98-87

Report Number: CS-TN-99-88

Institution: Stanford University, Department of Computer Science

Title: Truth Revelation in Rapid, Approximately Efficient Combinatorial Auctions

Author: Lehmann, Daniel

Author: O'Callaghan, Liadan Ita

Author: Shoham, Yoav

Date: July 1999

Abstract: Some important classical mechanisms considered in Microeconomics and Game Theory require the solution of a difficult optimization problem. This is true of mechanisms for combinatorial auctions, which have in recent years assumed practical importance, and in particular of the gold standard for combinatorial auctions, the Generalized Vickrey Auction (GVA). Traditional analysis of these mechanisms - in particular, their truth revelation properties - assumes that the optimization problems are solved precisely. In reality, these optimization problems can usually be solved only in an approximate fashion. We investigate the impact on such mechanisms of replacing exact solutions by approximate ones. Specifically, we look at a particular greedy optimization method, which has empirically been shown to perform well. We show that the GVA payment scheme does not provide for a truth revealing mechanism. We introduce another scheme that does guarantee truthfulness for a restricted class of players. We demonstrate the latter property by identifying sufficient conditions for a combinatorial auction to be truth-revealing, conditions which have applicability beyond the specific auction studied here.
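As a hedged sketch of the greedy allocation the abstract refers to, the fragment below ranks bids by value divided by the square root of the bundle size (one ranking of the kind studied in this line of work) and accepts a bid when its items are still unallocated. The bids and scores are illustrative, not data from the paper.

```python
# Greedy winner determination for a combinatorial auction: rank bids
# by value / sqrt(bundle size), accept greedily when item-disjoint.
# The ranking norm and example bids are assumptions for illustration.
import math

def greedy_allocate(bids):
    """bids: list of (bidder, items frozenset, value). Returns winners."""
    ranked = sorted(bids, key=lambda b: b[2] / math.sqrt(len(b[1])), reverse=True)
    taken, winners = set(), []
    for bidder, items, value in ranked:
        if taken.isdisjoint(items):
            taken |= items
            winners.append(bidder)
    return winners

bids = [("a", frozenset({1, 2}), 10.0),   # score 10/sqrt(2) ~ 7.07
        ("b", frozenset({2}), 6.0),       # score 6.0
        ("c", frozenset({3}), 5.0)]       # score 5.0
print(greedy_allocate(bids))  # ['a', 'c']
```

The paper's point is that pairing such a greedy allocator with the GVA payment rule breaks truthfulness, so the payment scheme must be redesigned around the greedy ranking itself.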

CS-TN-99-88

Report Number: CS-TN-99-89

Institution: Stanford University, Department of Computer Science

Title: Extending Greedy Multicast Routing to Delay Sensitive Applications

Author: Goel, Ashish

Author: Munagala, Kamesh

Date: July 1999

Abstract: Given a weighted undirected graph G(V,E) and a subset R of V, a Steiner tree is a subtree of G that contains each vertex in R. We present an online algorithm for finding a Steiner tree that simultaneously approximates the shortest path tree and the minimum weight Steiner tree, when the vertices in the set R are revealed in an online fashion. This problem arises naturally while trying to construct source-based multicast trees of low cost and good delay. The cost of the tree we construct is within an O(log |R|) factor of the optimal cost, and the path length from the root to any terminal is at most O(1) times the shortest path length. The algorithm needs to perform at most one reroute for each node in the tree. Our algorithm extends the results of Khuller et al. and Awerbuch et al., who looked at the offline problem. We conduct extensive simulations to compare the performance of our algorithm (in terms of cost and delay) with that of two popular multicast routing strategies: shortest path trees and the online greedy Steiner tree algorithm.
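For orientation, here is a hedged sketch of the online greedy Steiner baseline the abstract compares against: each revealed terminal is attached to the current tree by a shortest path. The paper's own algorithm additionally reroutes to bound root-to-terminal delay; that step is omitted here, and the example graph is illustrative.

```python
# Online greedy Steiner tree: join each newly revealed terminal to
# the current tree via a shortest path (multi-source Dijkstra).
# Graph format {u: {v: weight}} and the example are assumptions.
import heapq

def dijkstra(graph, sources):
    """Shortest distances from any node in `sources`."""
    dist = {s: 0 for s in sources}
    prev = {}
    pq = [(0, s) for s in sources]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    return dist, prev

def online_greedy_steiner(graph, terminals):
    tree_nodes, tree_edges, cost = {terminals[0]}, set(), 0
    for t in terminals[1:]:
        dist, prev = dijkstra(graph, tree_nodes)
        cost += dist[t]
        while t not in tree_nodes:      # walk the new path back into the tree
            tree_nodes.add(t)
            tree_edges.add((prev[t], t))
            t = prev[t]
    return tree_edges, cost

g = {"r": {"a": 1, "b": 4}, "a": {"r": 1, "b": 1}, "b": {"a": 1, "r": 4}}
edges, cost = online_greedy_steiner(g, ["r", "a", "b"])
print(cost)  # 2: attach a via r-a (1), then b via a-b (1)
```

This baseline is cheap but can make the root-to-terminal path long (here r to b has length 2 rather than the direct 4, but in other graphs the detour hurts), which is exactly the delay problem the paper's rerouting addresses.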

CS-TN-99-89

Report Number: CS-TN-99-90

Institution: Stanford University, Department of Computer Science

Title: Simulation of Iterative Matching for Combined Input and Output Queueing

Author: Pichai, Srinivasan

Author: Mudulodu, Sriram

Date: August 1999

Abstract: Since its introduction, the Stable Marriage problem has been a subject of interest in mathematics and computer science. Recently, stable matching algorithms have found application in the area of switch scheduling for high-performance switches. Input and output ports of the switch compute their preference lists based on expected departure times for an ideal output-queued switch. The stable matching computed by the Gale-Shapley algorithm for this set of preferences determines the configuration of the interconnection fabric. The nature of the stable matching enables the emulation of an output-queued switch with combined input and output queueing using a speedup factor of 2. However, it is important to compute the stable match efficiently for high performance. Hence parallel iterative versions of the algorithm have been proposed. In this report we investigate the convergence time of the parallel stable matching algorithm. The definition of the preference lists imposes special constraints on the problem, and this reduces the worst-case complexity of the algorithm. Simulations have shown that the average-case convergence time is also considerably lower than that of the general version of the algorithm.
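As background for the parallel variant the report simulates, here is a hedged sketch of the sequential Gale-Shapley algorithm on tiny port preference lists (the lists are illustrative, not switch data): free input ports propose in preference order, and each output port keeps its best proposal so far.

```python
# Gale-Shapley stable matching, input ports proposing.
# Preference lists map a port to its partners, best first.

def gale_shapley(input_prefs, output_prefs):
    rank = {o: {i: r for r, i in enumerate(p)} for o, p in output_prefs.items()}
    match = {}                                  # output -> input
    free = list(input_prefs)                    # inputs with no partner yet
    next_choice = {i: 0 for i in input_prefs}
    while free:
        i = free.pop()
        o = input_prefs[i][next_choice[i]]      # i's best untried output
        next_choice[i] += 1
        if o not in match:
            match[o] = i                        # o accepts its first offer
        elif rank[o][i] < rank[o][match[o]]:
            free.append(match[o])               # o trades up, jilting its match
            match[o] = i
        else:
            free.append(i)                      # o rejects i
    return match

inp = {"i0": ["o0", "o1"], "i1": ["o0", "o1"]}
out = {"o0": ["i1", "i0"], "o1": ["i0", "i1"]}
print(gale_shapley(inp, out))  # {'o0': 'i1', 'o1': 'i0'}
```

The parallel iterative versions studied in the report let all ports propose and accept in synchronized rounds; the report's question is how many such rounds the special switch-scheduling preference lists actually require.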

CS-TN-99-90

Report Number: CS-TN-00-92

Institution: Stanford University, Department of Computer Science

Title: Cost-Distance: Two Metric Network Design

Author: Meyerson, Adam

Author: Munagala, Kamesh

Author: Plotkin, Serge

Date: February 2000

Abstract: We present the Cost-Distance problem: finding a Steiner tree which optimizes the sum of edge costs along one metric and the sum of source-sink distances along an unrelated second metric. We give the first known O(log k) randomized approximation scheme for Cost-Distance, where k is the number of sources. We reduce many common network design problems to Cost-Distance, obtaining (in some cases) the first known logarithmic approximation for them. These problems include single-sink buy-at-bulk with variable pipe types between different sets of nodes, and facility location with buy-at-bulk type costs on edges. Our algorithm is also the algorithm of choice for several previous network design problems, due to its ease of implementation and fast running time.

CS-TN-00-92

Report Number: CS-TN-00-93

Institution: Stanford University, Department of Computer Science

Title: Hierarchical Placement and Network Design Problems.

Author: Guha, Sudipto

Author: Meyerson, Adam

Author: Munagala, Kamesh

Date: May 2000

Abstract: In this paper, we give the first constant-approximations for a number of layered network design problems. We begin by modeling hierarchical caching, where caches are placed in layers and each layer satisfies a fixed percentage of the demand (bounded miss rates). We present a constant approximation to the minimum total cost of placing the caches and routing demand through the layers. We extend this model to cover more general layered caching scenarios, giving the first constant approximation to the well studied multi-level facility location problem. We consider a facility location variant, the Load Balanced Facility Location problem in which every demand is served by a unique facility and each open facility must serve at least a certain amount of demand. By combining Load Balanced Facility Location with our results on hierarchical caching, we give the first constant approximation for the Access Network Design problem.

CS-TN-00-93

Report Number: CS-TN-00-94

Institution: Stanford University, Department of Computer Science

Title: Web Caching using Access Statistics

Author: Meyerson, Adam

Author: Munagala, Kamesh

Author: Plotkin, Serge

Date: May 2000

Abstract: We present the problem of caching web pages under the assumption that each user has a fixed, known demand vector for the pages. Such demands could be computed using access statistics. We wish to place web pages in the caches in order to optimize the latency from user to page, under the constraints that each cache has limited memory and can support a limited total number of requests. When C caches are present with fixed locations, we present a constant factor approximation to the latency while exceeding capacity constraints by log C. We improve this result to a constant factor provided no replication of web pages is allowed. We present a constant factor approximation where the goal is to minimize the maximum latency. We also consider the case where we can place our own caches in the network for a cost, and produce a constant approximation to the sum of cache cost plus weighted average latency. Finally, we extend our results to incorporate page update latency, temporal variation in request rates, and economies of scale in cache costs.

CS-TN-00-94

Report Number: CS-TN-00-95

Institution: Stanford University, Department of Computer Science

Title: Facility Location with Demand Dependent Costs and Generalized Clustering

Author: Guha, Sudipto

Author: Meyerson, Adam

Author: Munagala, Kamesh

Date: May 2000

Abstract: We solve a variant of the facility location problem in which the cost of a facility depends on the demand it serves; more specifically, the cost decreases as the demand served increases. We show an application of this problem to generalized clustering problems that do not penalize large clusters.

CS-TN-00-95

Report Number: CS-TN-00-96

Institution: Stanford University, Department of Computer Science

Title: Improved Combinatorial Algorithms for Single Sink Edge Installation Problems

Author: Guha, Sudipto

Author: Meyerson, Adam

Author: Munagala, Kamesh

Date: June 2000

Abstract: We present the first constant approximation to the single sink buy-at-bulk network design problem, where we have to design a network by buying pipes of different costs and capacities per unit length to route demands at a set of sources to a single sink. The distances in the underlying network form a metric. This result improves the previous bound of log |S|, where S is the set of sources. We also present an improved constant approximation to the related Access Network Design problem. Our algorithms are randomized and fully combinatorial. They can be derandomized easily at the cost of a constant factor loss in the approximation ratio.

CS-TN-00-96

Report Number: CS-TN-00-97

Institution: Stanford University, Department of Computer Science

Title: Hardware Support for Tamper-Resistant and Copy-Resistant Software

Author: Boneh, Dan

Author: Lie, David

Author: Lincoln, Pat

Author: Mitchell, John

Author: Mitchell, Mark

Date: July 2000

Abstract: Although there have been many attempts to develop code transformations that yield tamper-resistant software, no reliable software-only methods are known. Motivated by numerous potential applications, we investigate a prototype hardware mechanism that supports software tamper-resistance with an atomic decrypt-and-execute operation. Our hardware architecture uses a novel combination of standard architectural units. As usual, security has its costs. In this design, the most difficult security tradeoffs involve testability and performance.

CS-TN-00-97