Institution: Stanford University, Department of Computer Science

Title: Leases: an efficient fault-tolerant mechanism for distributed file cache consistency.

Author: Gray, Cary G.

Author: Cheriton, David R.

Date: January 1990

Abstract: Caching introduces the overhead and complexity of ensuring consistency, reducing some of its performance benefits. In a distributed system, caching must deal with the additional complications of commumcation and host failures. Leases are proposed as a time-based mechanism that provides efficient consistent access to cached data in distributed systems. Non-Byzantine failures affect performance, not correctness ,with their effect minimized by short leases. An analytic model and an evaluation for file access in the V system show that leases of short duration provide good performance. The impact of leases on performance grows more significant in systems of larger scale and higher processor performance.

CS-TR-90-1298

Report Number: CS-TR-90-1304

Institution: Stanford University, Department of Computer Science

Title: A model of object-identities and values

Author: Matsushima, Toshiyuki

Author: Wiederhold, Gio

Date: February 1990

Abstract: An algebraic formalization of the object-orlented data model is proposed. The formalism reveals that the semantics of the object-oriented model consists of two portions. One is expressed by an algebraic construct, which has essentially a value-oriented semantics. The other is expressed by object-identities, which characterize the essential difference of the object-oriented model and value-oriented models, such as the relational model and the logical database model. These two portions are integrated by a simple commutativity of modeling functions. The formalism includes the expression of integrity constraints in its construct, which provides the natural integration of the logical database model and the object-oriented database model.

CS-TR-90-1304

Report Number: CS-TR-90-1305

Institution: Stanford University, Department of Computer Science

Title: A comparative evaluation of nodal and supernodal parallel sparse matrix factorization: detailed simulation results

Author: Rathberg, Edward

Author: Gupta, Anoop

Date: February 1990

Abstract: In this paper we consider the problem of factoring a large sparse system of equations on a modestly parallel shared-memory multiprocessor with a non-trivial memory hierarchy. Using detailed multiprocessor simulation, we study the behavior of the parallel sparse factorization scheme developed at the Oak Ridge National Laboratory. We then extend the Oak Ridge scheme to incorporate the notion of supernodal elimination. We present detailed analyses of the sources of performance degradation for each of these schemes. We measure the impact of interprocessor communication costs, processor load imbalance, overheads introduced in order to distribute work, and cache behavior on overall parallel performance. For the three benchmark matrices which we study, we find that the supernodal scheme gives a factor of 1.7 to 2.7 performance advantage for 8 processors and a factor of 0.9 to 1.6 for 32 processors. The supemodal scheme exhibits higher performance due mainly to the fact that it executes many fewer memory operations and produces fewer cache misses. However, the natural task grain size for the supernodal scheme is much larger than that of the Oak Ridge scheme, making effective distnbution of work more difficult, especially when the number of processors is large.

CS-TR-90-1305

Report Number: CS-TR-90-1307

Institution: Stanford University, Department of Computer Science

Title: Real-time logics: complexity and expressiveness

Author: Alur, Rajeev

Author: Henzinger, Thomas A.

Date: March 1990

Abstract: The theory of the natural numbers with linear order and monadic predicates underlies propositional linear temporal logic. To study temporal logics for real-time systems, we combine this classical theory of infinite state sequences with a theory of time, via a monotonic function that maps every state to its time. The resulting theory of timed state sequences is shown to be decidable, albeit nonelementary, and its expressive power is characterized by omega-regular sets. Several more expressive variants are proved to be highly undecidable. This framework allows us to classify a wide variety of real-time logics according to their complexity and expressiveness. In fact, it follows that most formalisms proposed in the literature cannot be decided. We are, however, able to identify two elementary real-time temporal logics as expressively complete fragments of the theory of timed state sequences, and give tableau-based decision procedures. Consequently, these two formalisms are well-suited for the specification and verification of real-time systems.

CS-TR-90-1307

Report Number: CS-TR-90-1312

Institution: Stanford University, Department of Computer Science

Title: A validation structure based theory of plan modification and reuse

Author: Kambhampati, Subbarao

Author: Hendler, James A.

Date: June 1990

Abstract: A framework for the flexible and conservative modification of plans enables a planner to modify its plans in response to incremental changes in their specifications, to reuse its existing plans in new problem situations, and to efficiently replan in response to execution time failures. We present a theory of plan modification applicable to hierarchical nonlinear planning. Our theory utilizes the validation structure of stored plans to yield a flexible and conservative plan modification framework. The validation structure, which constitutes a hierarchical explanation of correctness of the plan with respect to the planner's own knowledge of the domain, is annotated on the plan as a by-product of initial planning. Plan modification is formalized as a process of removing inconsistencies in the validation structure of a plan when it is being reused in a new (changed) planning situation. The repair of these inconsistencies involves removing unnecessary parts of the plan and adding new non-primitive tasks to the plan to establish missing or failing validations. The resultant partially reduced plan (with a consistent validation structure) is sent to the planner for complete reduction. We discuss the development of this theory in the PRIAR system, present an empirical evaluation of this theory, and characterize its completeness, coverage, efficiency and limitations.

CS-TR-90-1312

Report Number: CS-TR-90-1313

Institution: Stanford University, Department of Computer Science

Title: Book review: Potokovye Algoritmy (Flow Algorithms) by G. M. Adel'son-Vel'ski, E. A. Dinic, and A. V. Karzanov.

Author: Goldberg, Andrew V.

Author: Gusfield, Dan

Date: June 1990

Abstract: This is a review of the book "Flow Algorithms" by Adel'son-Vel'ski, Dinic, and Karzanov, well-known researchers in the area of algorithm design and analysis. This remarkable book, published in 1975, is written in Russian and has never been translated into English. What is remarkable about the book is that it describes many major results obtained in the Soviet Union (and originally published in papers by 1976) that were independently discovered later (and in some cases much later) in the West. The book also contains some minor results that we believe are still unknown in the West. The book is well-written and a pleasure to read, at least for someone fluent in Russian. Although the book is fifteen years old and we believe that all the major results contained in it are known in the West by now, the book is still of great historical importance. Hence a complete review is in order. [from the Introduction]

CS-TR-90-1313

Report Number: CS-TR-90-1314

Institution: Stanford University, Department of Computer Science

Title: Genetic programming: a paradigm for genetically breeding populations of computer programs to solve problems

Author: Koza, John R.

Date: June 1990

Abstract: Many seemingly different problems in artificial intelligence, symbolic processing, and machine learning can be viewed as requiring discovery of a computer program that produces some desired output for particular inputs. When viewed in this way, the process of solving these problems becomes equivalent to searching a space of possible computer programs for a most fit individual computer program. The new "genetic programming" paradigm described herein provides a way to search for this most fit individual computer program. In this new "genetic programming" paradigm, populations of computer programs are genetically bred using the Darwinian principle of survival of the fittest and using a genetic crossover (recombination) operator appropriate for genetically mating computer programs. In this paper, the process of formulating and solving problems using this new paradigm is illustrated using examples from various areas. Examples come from the areas of machine learning of a function; planning; sequence induction; function function identification (including symbolic regression, empirical discovery, "data to function" symbolic integration, "data to function" symbolic differentiation); solving equations, including differential equations, integral equations, and functional equations); concept formation; automatic programming; pattern recognition, time-optimal control; playing differential pursuer-evader games; neural network design; and finding a game-playing strategyfor a discrete game in extensive form.

CS-TR-90-1314

Report Number: CS-TR-90-1318

Institution: Stanford University, Department of Computer Science

Title: Techniques for improving the performance of sparse matrix factorization on multiprocessor workstations

Author: Rothberg, Edward

Author: Gupta, Anoop

Date: June 1990

Abstract: In this paper we look at the problem of factoring large sparse systems of equations on high-performance multiprocessor workstations. While these multiprocessor workstations are capable of very high peak floating point computation rates, most existing sparse factorization codes achieve only a small fraction of this potential. A major limiting factor is the cost of memory accesses performed during the factorization. ln this paper, we describe a parallel factorization code which utilizes the supernodal structure of the matrix to reduce the number of memory references. We also propose enhancements that significantly reduce the overall cache miss rate. The result is greatly increased factorization performance. We present experimental results from executions of our codes on the Silicon Graphics 4D/380 multiprocessor. Using eight processors, we find that the supernodal parallel code achieves a computation rate of approximately 40 MFLOPS when factoring a range of benchmark matrices. This is more than twice as fast as the parallel nodal code developed at the Oak Ridge National Laboratory running on the SGI 4D/380.

CS-TR-90-1318

Report Number: CS-TR-90-1321

Institution: Stanford University, Department of Computer Science

Title: Tools and rules for the practicing verifier

Author: Manna, Z ohar

Author: Pnueli, Amir

Date: July 1990

Abstract: The paper presents a minimal proof theory which is adequate for proving the main important temporal properties of reactive programs. The properties we consider consist of the classes of invariance, response, and precedence properties. For each of these classes we present a small set of rules that is complete for verifying properties belonging to this class. We illustrate the application of these rules by analyzing and verifying the properties of a new algorithm for mutual exclusion.

CS-TR-90-1321

Report Number: CS-TR-90-1323

Institution: Stanford University, Department of Computer Science

Title: Protograms

Author: Mozes, Eyal

Author: Shoham, Yoav

Date: July 1990

Abstract: Motivated largely by tasks that require control of complex processes in a dynamic environment, we introduce a new computational construct called a protogram. A protogram is a program specifying an abstract course of action, a course that allows for a range of specific actions, from which a choice is made through interaction with other protograms. We discuss the intuition behind the notion, and then explore some of the details involved in implementing it. Specifically, we (a) describe a general scheme of protogram interaction, (b) describe a protogram interpreter that has been implemented, dealing with some special cases, (c) describe three applications of the protogram interpreter, one in data processing and two in robotics (both currently only implemented as simulations), (d) describe some more general possible implementations of a protogram interpreter, and (e) discuss how protograms can be useful for the Gofer project. We also briefly discuss the origins of protograms in psychology and linguistics, compare protograms to blackboard and subsumption architectures, and discuss directions for future research.

CS-TR-90-1323

Report Number: CS-TR-90-1324

Institution: Stanford University, Department of Computer Science

Title: On the complexity of monotonic inheritance with roles

Author: Guerreiro, Ramiro A. de T.

Author: Hemerly, S.

Author: Shoham, Yoav

Date: July 1990

Abstract: We investigate the complexity of reasoning with monotonic inheritance hierarchies that contain, beside ISA edges, also ROLE (or FUNCTION) edges. A ROLE edge is an edge labelled with a name such as spouse of or brother of. We call such networks ISAR networks. Given a network with n vertices and m edges, we consider two problems: ($P_1$) determining whether the network implies an isa relation between two particular nodes, and ($P_2$) determining all isa relations implied by the network. As is well known, without ROLE edges the time complexity of $P_1$, is O(m), and the time complexity of $P_2$ is O($n^3$). Unfortunately, the results do not extend naturally to ISAR networks, except in a very restricted case. For general ISAR network we first give an polynomial algorithm by an easy reduction to proposional Horn theory. As the degree of the polynomial is quite high (O(m$n^4$) for $P_1$, O(m$n^6$) for $P_2$), we then develop a more direct algorithm. For both $P_1$ and $P_2$ its complexity is O($n^3 + m^2$). Actually, a finer analysis of the algorithm reveals a complexity of O(nr(log r) + $n^2$r+ $n^3), where r is the number of different ROLE labels. One corolary is that if we fix the number of ROLE labels, the complexity of our algorithm drops back to O($n^3$).

CS-TR-90-1324

Report Number: CS-TR-90-1329

Institution: Stanford University, Department of Computer Science

Title: An interleaving model for real time.

Author: Henzinger, Thomas A.

Author: Manna, Z ohar

Author: Pnueli, Amir

Date: September 1990

Abstract: The interleaving model is both adequate and sufficiently abstract to allow for the practical specification and verification of many properties of concurrent systems. We incorporate real time into this model by defining the abstract notion of a real-time transition system as a conservative extension of traditional transition systems: qualitative fairness requirements are replaced (and superseded) by quantitative lower-bound and upper-bound real-time requirements for transitions. We present proof rules to establish lower and upper real-time bounds for response properties of real-time transition systems. This proof system can be used to verify bounded-invariance and bounded-response properties, such as timely terrnination of shared-variables multi-process systems, whose semantics is defined in terms of real-time transition systems.

CS-TR-90-1329

Report Number: CS-TR-90-1330

Institution: Stanford University, Department of Computer Science

Title: Parallel ICCG on a hierarchical memory multiprocessor - addressing the triangular solve bottleneck

Author: Rothberg, Edward

Author: Gupta, Anoop

Date: October 1990

Abstract: The incomplete Cholesky conjugate gradient (ICCG) algorithm is a commonly used iterative method for solving large sparse systems of equations. In this paper, we study the parallel solution of sparse triangular systems of equations, the most difficult aspect of implementing the ICCG method on a multiprocessor. We focus on shared-memory multiprocessor architectures with deep memory hierarchies. On such architectures we find that previously proposed parallelization approaches result in little or no speedup. The reason is that these approaches cause significant increases in the amount of memory system traffic as compared to a sequential approach. Increases of as much as a factor of 10 on four processors were observed. In this paper we propose new techniques for limiting these increases, including data remappings to increase spatial locality, new processor synchronization techniques to decrease the use of auxiliary data structures, and data partitioning techniques to reduce the amount of interprocessor communication. With these techniques, memory system traffic is reduced to as little as one sixth of its previous volume. The resulting speedups are greatly improved as well, although they are still much less than linear. We discuss the factors that limit further speedups. We present both simulation results and results of experiments on an SGI 4D/340 multiprocessor.

CS-TR-90-1330

Report Number: CS-TR-90-1337

Institution: Stanford University, Department of Computer Science

Title: A simplifier for untyped lambda expressions

Author: Galbiati, Louis

Author: Talcott, Carolyn

Date: October 1990

Abstract: Many applicative programming languages are based on the call-by-value lambda calculus. For these languages tools such as compilers, partial evaluators, and other transformation systems often make use of rewriting systems that incorporate some form of beta reduction. For purposes of automatic rewriting it is important to develop extensions of beta-value reduction and to develop methods for guaranteeing termination. This paper describes an extension of beta-value reduction and a method based on abstract interpretation for controlling rewriting to guarantee termination. The main innovations are (1) the use of rearrangement rules in combination with beta-value reduction to increase the power of the rewriting system and (2) the definition of a non-standard interpretation of expressions, the generates relation, as a basis for designing terminating strategies for rewriting.

CS-TR-90-1337

Report Number: CS-TR-90-1340

Institution: Stanford University, Department of Computer Science

Title: Programming in QLisp

Author: Mason, Ian A.

Author: Pehoushek, Joseph D.

Author: Talcott, Carolyn L.

Author: Weening, Joseph S.

Date: October 1990

Abstract: Qlisp is an extension of Common Lisp, to support parallel programming. It was initially designed by John McCarthy and Richard Gabriel in 1984. Since then it has been under development both at Stanford University and Lucid, Inc. and has been implemented on several commercial shared-memory parallel computers. Qlisp is a queue-based, shared-memory, multi-processing language. This report is a tutorial introduction to the Stanford dialect of Qlisp.

CS-TR-90-1340

Report Number: CS-TR-90-1342

Institution: Stanford University, Department of Computer Science

Title: Modeling concurrency with geometry

Author: Pratt, Vaughan

Date: November 1990

Abstract: The phenomena of branching time and true or noninterleaving concurrency find their respective homes in automata and schedules. But these two models of computation are formally equivalent via Birkhoff duality, an equivalence we expound on here in tutorial detail. So why should these phenomena prefer one over the other? We identify dimension as the culprit: 1-dimensional automata are skeletons permitting only interleaving concurrency, whereas rrue n-fold concurrency resides in transitions of dimension n. The truly concurrent automaton dual to a schedule is not a skeletal distributive lattice but a solid one! We introduce true nondeterminism and define it as monoidal homotopy; from this perspective nondeterminism in ordinary automata arises from forking and joining creating nontrivial homotopy. The automaton dual to a poset schedule is simply connected whereas that dual to an event structure schedule need not be, according to monoidal homotopy though not to group homotopy. We conclude with a formal definition of higher dimensional automaton as an n-complex or n-category, whose two essential axioms are associativity of concatenation within dimension and an interchange principle between dimensions.

CS-TR-90-1342

Report Number: CS-TR-90-1343

Institution: Stanford University, Department of Computer Science

Title: Action logic and pure induction

Author: Pratt, Vaughan

Date: November 1990

Abstract: In Floyd-Hoare logic, programs are dynamic while assertions are static (hold at states). In action logic the two notions become one, with programs viewed as on-the-fly assertions whose truth is evaluated along intervals instead of at states. Action logic is an equational theory ACT conservatively extending the equational theory REG of regular expressions with operations preimplication a --> b (had a then b) and postimplication b <-- a (b if-ever a). Unlike REG, ACT is finitely based, makes $a^*$ reflexive transitive closure, and has an equivalent Hilbert system. The crucial axiom is that of pure induction, ${(a --> a)}^*$ = a --> a.

CS-TR-90-1343

Report Number: CS-TR-90-1344

Institution: Stanford University, Department of Computer Science

Title: ParaDiGM: a highly scalable shared-memory multi-computer architecture

Author: Cheriton, David R.

Author: Goosen, Hendrik A.

Author: Boyle, Patrick D.

Date: November 1990

Abstract: ParaDiGM is a highly scalable shared-memory multi-computer architecture. It is being developed to demonstrate the feasibility of building a relatively low-cost shared-memory parallel computer that scales to large configurations, and yet provides sequential programs with performance comparable to a high-end microprocessor. A key problem is building a scalable memory hierarchy. In this paper we describe the ParaDiGm architecture, highlighting the innovations of our approach and presenting results of our evaluation of the design. We envision that scalable shared-memory multiprocessors like ParaDiGM will soon become the dominant form of parallel processing, even for very large-scale computation, providing a uniform platform for parallel programming systems and applications.

CS-TR-90-1344

Report Number: CS-TR-90-1345

Institution: Stanford University, Department of Computer Science

Title: Nonholonomic motion planning versus controllability via the multibody car system example

Author: Laumond, Jean-Paul

Date: December 1990

Abstract: A multibody car system is a non-nilpotent, non-regular, triangularizable and well-controllable system. One goal of the current paper is to prove this obscure assertion. But its main goal is to explain and enlighten what it means. Motion planning is an already old and classical problem in Robotics. A few years ago a new instance of this problem has appeared in the literature: motion planning for nonholonomic systems. While useful tools in motion planning come from Computer Science and Mathematics (Computational Geometry, Real Algebraic Geometry), nonholonomic motion planning needs some Control Theory and more Mathematics (Differential Geometry). First of all, this paper tries to give a computational reading of the tools from Differential Geometric Control Theory required by planning. Then it shows that the presence of obstacles in the real world of a real robot challenges Mathematics with some difficult questions which are topological in nature, and have been solved only recently, within the framework of Sub-Riemannian Geometry. This presentation is based upon a reading of works recently developed by (1) Murray and Sastry, (2) Lafferiere and Sussmann, and (3) Bellaiche, Jacobs and Laumond.

CS-TR-90-1345

Report Number: CS-TR-91-1350

Institution: Stanford University, Department of Computer Science

Title: A programming and problem solving seminar.

Author: Chang, Edward

Author: Phillips, Steven J.

Author: Ullman, Jeffrey D.

Date: February 1991

Abstract: This report contains transcripts of the classroom discussions of Stanford's Computer Science problem solving course for Ph.D. students, CS304, during Winter quarter 1990, and the first CS204 class for undergraduates, in the Spring of 1990. The problems, and the solutions offered by the classes, span a large range of ideas in computer science. Since they constitute a study both of programming and research paradigms, and of the problem solving process, these notes may be of interest to students of computer science, as well as computer science educators. The present report is the ninth in a series of such transcripts, continuing the tradition established in STAN-CS-77-606 (Michael J. Clancy, 1977), STAN-CS-79-707 (Chris Van Wyk, 1979), STAN-CS-81-863 (Allan A. Miller, 1981), STAN-CS-83-989 (Joseph S. Weening, 1983), STAN-CS-83-990 (John D. Hobby, 1983), STAN-CS-85-1055 (Ramsey W. Haddad, 1985), STAN-CS-87-1154 (Tomas G. Rokicki, 1987), and STAN-CS-89-1269 (Kenneth A. Ross, 1989).

CS-TR-91-1350

Report Number: CS-TR-91-1351

Institution: Stanford University, Department of Computer Science

Title: Sequence vs. pipeline parallel multiple joins in Paradata

Author: Z hu, Liping

Author: Keller, Arthur M.

Author: Wiederhold, Gio

Date: February 1991

Abstract: In this report we analyze and compare hash-join based parallel multi-join algorithms for sequenced and pipelined processing. The BBN Butterfly machine serves as the host for the performance analysis. The sequenced algorithm handles the multiple join operations in a conventional sequenced manner, except that it distributes the work load of each operation among all processors. The pipelined algorithms handle the different join operations in parallel, by dividing the processors into several groups, with the data flowing through these groups. The detailed timing tests revealed the bus/memory contention that grows linearly with the number of processors. The existence of such a contention leads to an optimal region for the number of processors, given the join operands fixed. We present the analytical and experimental formulae for both algorithms, which incorporate this contention. We discuss the way of finding an optimal point, and give the heuristics for choosing the best processor's partition in pipelined processing. The study shows that the pipelined algorithms produce the first joined result sooner than the sequenced algorithm and need less memory to store the intermediate result. The sequenced algorithm, on the other hand, takes less time to finish the whole join operations.

CS-TR-91-1351

Report Number: CS-TR-91-1359

Institution: Stanford University, Department of Computer Science

Title: The benefits of relaxing punctuality

Author: Alur, Rajeev

Author: Feder, Tomas

Author: Henzinger, Thomas A.

Date: May 1991

Abstract: The most natural, compositional way of modeling real-time systems uses a dense domain for time. The satisfiability of real-time constraints that are capable of expressing punctuality in this model is, however, known to be undecidable. We introduce a temporal language that can constrain the time difference between events only with finite (yet arbitrary) precision and show the resulting logic to be EXPSPACE-complete. This result allows us to develop an algorithm for the verification of timing properties of real-time systems with a dense semantics.

CS-TR-91-1359

Report Number: CS-TR-91-1360

Institution: Stanford University, Department of Computer Science

Title: Sooner is safer than later.

Author: Henzinger, Thomas A.

Date: May 1991

Abstract: It has been repeatedly observed that the standard safety-liveness classification of properties of reactive systems does not fit for real-time properties. This is because the implicit "liveness" of time shifts the spectrum towards the safety side. While, for example, response--that "something good" will happen, eventually--is a classical liveness property, bounded response--that "something good" will happen soon, within a certain amount of time--has many characteristics of safety. We account for this phenomenon formally by defining safety and liveness relative to a given condition, such as the progress of time.

CS-TR-91-1360

Report Number: CS-TR-91-1369

Institution: Stanford University, Department of Computer Science

Title: Approximating matchings in parallel

Author: Fischer, Ted

Author: Goldberg, Andrew V.

Author: Plotkin, Serge

Date: June 1991

Abstract: We show that for any constant k > O, a matching with cardinality at least 1 - 1/(k+1) times the maximum can be computed in NC.

CS-TR-91-1369

Report Number: CS-TR-91-1370

Institution: Stanford University, Department of Computer Science

Title: An NQTHM mechanization of "An Exercise in the Verification of Multi-Process Programs"

Author: Nagayama, Misao

Author: Talcott, Carolyn

Date: June 1991

Abstract: This report presents a formal verification of the local correctness of a mutex algorithm using the Boyer-Moore theorem prover. The formalization follows closely an informal proof of Manna and Pnuelli. The proof method of Manna and Pnueli is to first extract from the program a set of states and induced transition system. One then proves suitable invariants. There are two variants of the proof. In the first (atomic) variant, compound tests involving quantification over a finite set are viewed as atomic operations. In the second (molecular) variant, this assumption is removed, making the details of the transitions and proof somewhat more complicated. The original Manna-Pnueli proof was formulated in terms of finite sets. This led to concise and elegant informal proof, however one that is not easy to mechanize in the Boyer-Moore logic. In the mechanized version we use a dual isomorphic representation of program states based on finite sequences. Our approach was to outline the formal proof of each invariant, making explicit the case analyses, assumptions and properties of operations used. The outline served as our guide in developing the formal proof. The resulting sequence of events follows the informal plan quite closely. The main difficulties encountered were in discovering the precise form of the lemmas and hints necess to guide the theorem prover.

CS-TR-91-1370

Report Number: CS-TR-91-1374

Institution: Stanford University, Department of Computer Science

Title: Polynomial dual network simplex algorithms

Author: Orlin, James B.

Author: Plotkin, Serge A.

Author: Tardos, Eva

Date: August 1991

Abstract: We show how to use polynomial and strongly polynomial capacity scaling algorithms for the transshipment problem to design a polynomial dual network simplex pivot rule. Our best pivoting strategy leads to an O(m2 log n) bound on the number of pivots, where n and m denotes the number of nodes and arcs in the input network. If the demands are integral and at most B, we also give an O(m(m + n log n) min(log nB, m log n))-time implementation of a strategy that requires somewhat more pivots.

CS-TR-91-1374

Report Number: CS-TR-91-1375

Institution: Stanford University, Department of Computer Science

Title: Fast approximation algorithms for multicommodity flow problems

Author: Leighton, Tom

Author: Makedon, Fillia

Author: Plotkin, Serge

Author: Stein, Clifford

Author: Tardos, Eva

Author: Tragoudas, Spyros

Date: August 1991

Abstract: In this paper, we describe the first polynomial-time combinatorial algorithms for approximately solving the multicommodity flow problem. Our algorithms are significantly faster than the best previously known algorithms, that were based on linear programming. For a k-commodity multicommodity flow problem, the running time of our randomized algorithm is (up to log factors) the same as the time needed to solve k single-commodity flow problems, thus giving the surprising result that approximately computing a k-commodity maximum-flow is not much harder than computing about k single-commodity maximum-flows in isolation. Given any multicommodity flow problem as input, our algorithm is guaranteed to provide a feasible solution to a modified flow problem in which all capacities are increased by a (1 + epsilon)-factor, or to provide a proof that there is no feasible solution to the original problem. We also describe faster approximation algorithms for multicommodity flow problems with a special structure, such as those that arise in the "sparsest cut" problems and the uniform concurrent flow problems if k <= the square root of m.

CS-TR-91-1375

Report Number: CS-TR-91-1377

Institution: Stanford University, Department of Computer Science

Title: An evaluation of left-lookikng, right-looking and multifrontal approaches to sparse Cholesky factorization on hierarchical memory machines

Author: Rothberg, Edward

Author: Gupta, Anoop

Date: August 1991

Abstract: In this paper we present a comprehensive analysis of the performance of a variety of sparse Cholesky factorization methods on hierarchical-memory machines. We investigate methods that vary along two different axes. Along the first axis, we consider three different high-level approaches to sparse factorization: left-looking, right-looking, and multifrontal. Along the second axis, we consider the implementation of each of these high-level approaches using different sets of primitives. The primitives vary based on the structures they manipulate. One important structure in sparse Cholesky factorization is a single column of the matrix. We first consider primitives that manipulate single columns. These are the most commonly used primitives for expressing the sparse Cholesky computation. Another important structure is the supernode, a set of columns with identical non-zero structures. We consider sets of primitives that exploit the supemodal structure of the matrix to varying degrees. We find that primitives that manipulate larger structures greatly increase the amount of exploitable data reuse, thus leading to dramatically higher perfommance on hierarchical-memory machines. We observe performance increases of two to three times when comparing methods based on primitives that make extensive use of the supernodal structure to methods based on primitives that manipulate columns. We also find that the overall approach (left-looking, right-looking, or multifrontal) is less important for performance than the particular set of primitives used to implement the approach.

CS-TR-91-1377

Report Number: CS-TR-91-1381

Institution: Stanford University, Department of Computer Science

Title: Implementing hypertext database relationships through aggregations and exceptions

Author: Hara, Yoshinori

Author: Keller, Arthur M.

Author: Rathmann, Peter K.

Author: Wiederhold, Gio

Date: September 1991

Abstract: In order to combine hypertext with database facilities, we show how to extract an effective storage structure from given instance relationships. The schema of the structure recognizes clusters and exceptions. Extracting high-level structures is useful for providing a high performance browsing environment as well as efficient physical database design, especially when handling large amounts of data. This paper focuses on a clustering method, ACE, which generates aggregations and exceptions from the original graph structure in order to capture high level relationships. The problem of minimizing the cost function is NP-complete. We use a heuristic approach based on an extended Kernighan-Lin algorithm. We demonstrate our method on a hypertext application and on a standard random graph, compared with its analytical model. The storage reductions of input database size in main memory were 77.2% and 12.3%, respectively. It was also useful for secondary storage organization for efflcient retrieval.

CS-TR-91-1381

Report Number: CS-TR-91-1383

Institution: Stanford University, Department of Computer Science

Title: Temporal proof methodologies for real-time systems

Author: Henzinger, Thomas A.

Author: Manna, Z ohar

Author: Pnueli, Amir

Date: September 1991

Abstract: We extend the specification language of temporal logic, the corresponding verification framework, and the underlying computational model to deal with real-time properties of reactive systems. The abstract notion of timed transition systems generalizes traditional transition systems conservatively: qualitative fairness requirements are replaced (and superseded) by quantitative lower-bound and upper-bound timing constraints on transitions. This framework can model real-time systems that communicate either through shared variables or by message passing and real-time issues such as time-outs, process priorities (interrupts), and process scheduling. We exhibit two styles for the specification of real-time systems. While the first approach uses bounded versions of temporal operators, the second approach allows explicit references to time through a special clock variable. Corresponding to the two styles of specification, we present and compare two fundamentally different proof methodologies for the verification of timing requirements that are expressed in these styles. For the bounded-operatoT style, we provide a set of proof rules for establishing bounded-invariance and bounded-response properties of timed transition systems. This approach generalizes the standard temporal proof rules for verifying invariance and response properties conservatively. For the explicit-clock style, we exploit the observation that every time-bounded property is a safety property and use the standard temporal proof rules for establishing safety properties.

CS-TR-91-1383

Report Number: CS-TR-91-1387

Institution: Stanford University, Department of Computer Science

Title: Assembling polyhedra with single translations

Author: Wilson, Randall

Author: Schweikard, Achim

Date: October 1991

Abstract: The problem of partitioning an assembly of polyhedral objects into two subassemblies that can be separated arises in assembly planning. We describe an algorithm to compute the set of all translations separating two polyhedra with n vertices in O(n4) steps and show that this is optimal. Given an assembly of k polyhedra with a total of n vertices, an extension of this algorithm identifies a valid translation and removable subassembly in O(k2 n4) steps if one exists. Based on the second algorithm a polynomial time method for finding a complete assembly sequence consisting of single translations is derived. An implementation incorporates several changes to achieve better average-case performance; experimental results obtained for composite objects consisting of isothetic polyhedra are described.

CS-TR-91-1387

Report Number: CS-TR-91-1389

Institution: Stanford University, Department of Computer Science

Title: The AGENT0 manual

Author: Torrance, Mark C.

Author: Viola, Paul A.

Date: April 1991

Abstract: This document describes an implementation of AOP, an interpreter for programs written in a language called AGENTO. AGENTO is a first stab at a programming language for the paradigm of Agent-Oriented Programming. It is currently under development at Stanford under the direction of Yoav Shoham. This implementation is the work of Paul A. Viola of MIT and Mark C. Torrance of Stanford.

CS-TR-91-1389

Report Number: CS-TR-91-1391

Institution: Stanford University, Department of Computer Science

Title: A logic for perception and belief

Author: Shoham, Yoav

Author: del Val, Alvaro

Date: September 1991

Abstract: We present a modal logic for reasoning about perception and belief, captured respectively by the operators P and B. The B operator is the standard belief operator used in recent years, and the P operator is similarly defined. The contribution of the paper is twofold. First, in terms of P we provide a definition of perceptual indistinguishability, such as arises out of limited visual acuity. The definition is concise, intuitive (we find), and avoids traditional paradoxes. Second, we explore the bimodal B--P system. We argue that the relationship between the two modalities varies among settings: The agent may or may not have confidence in its perception, may or may not be accurate in it, and so on. We therefore define a number of agent types corresponding to these various assumptions, and for each such agent type we provide a sound and complete axiomatization of the B--P system.

CS-TR-91-1391

Report Number: CS-TR-91-1392

Institution: Stanford University, Department of Computer Science

Title: A classification of update methods for replicated databases

Author: Ceri, Stefano

Author: Houtsma, Maurice A. W.

Author: Keller, Arthur M.

Author: Samarati, Pierangela

Date: October 1991

Abstract: In this paper we present a classification of the methods for updating replicated databases. The main contribution of this paper is to present the various methods in the context of a structured taxonomy, which accommodates very heterogeneous methods. Classes of update methods are presented through their general properties, such as the invariants that hold for them. Methods are reviewed both in their normal and abnormal behaviour (e.g., after a network partition). We show that several methods presented in the literature, sometimes in independent papers with no cross-reference, are indeed very much related, for instance because they share the same basic technique. We also show in what sense they diverge from the basic technique. This classification can serve as a basis for choosing the method that is most suitable to a specific application. It can also be used as a guideline to researchers who aim at developing new mechanisms.

CS-TR-91-1392

Report Number: CS-TR-91-1394

Institution: Stanford University, Department of Computer Science

Title: Application-controlled physical memory using external page-cache management

Author: Harty, Kieran

Author: Cheriton, David R.

Date: October 1991

Abstract: Next generation computer systems will have gigabytes of physical memory and processors in the 100 MIPS range or higher. Contrary to some conjectures, this trend requires more sophisticated memory management support for memory-bound computations such as scientific simulations and systems such as large-scale database systems, even though memory management for most programs will be less of a concern. We describe the design, implementation and evaluation of a virtual memory system that provides application control of physical memory using external page-cache management. In this approach, a sophisticated application is able to monitor and control the amount of physical memory it has available for execution, the exact contents of this memory, and the scheduling and nature of page-in and page-out using the abstraction of a physical page cache provided by the kernel. We claim that this approach can significantly improve performance for many memory-bound applications while reducing kernel complexity, yet does not complicate other applications or reduce their performance.

CS-TR-91-1394

Report Number: CS-TR-92-1401

Institution: Stanford University, Department of Computer Science

Title: The performance impact of data reuse in parallel dense Cholesky factorization

Author: Rothberg, Edward

Author: Gupta, Anoop

Date: January 1992

Abstract: This paper explores performance issues for several prominent approaches to parallel dense Cholesky factorization. The primary focus is on issues that arise when blocking techniques are integrated into parallel factorization approaches to improve data reuse in the memory hierarchy. We first consider panel-oriented approaches, where sets of contiguous columns are manipulated as single units. These methods represent natural extensions of the column-oriented methods that have been widely used previously. On machines with memory hierarchies, panel-oriented methods significantly increase the achieved performance over column-oriented methods. However, we find that panel- oriented methods do not expose enough concurrency for problems that one might reasonably expect to solve on moderately parallel machines, thus significantly limiting their performance. We then explore block-oriented approaches, where square submatrices are manipulated instead of sets of columns. These methods greatly increase the amount of available concurrency, thus alleviating the problems encountered with panel-oriented methods. However, a number of issues, including scheduling choices and block- placement issues, complicate their implementation. We discuss these issues and consider approaches that solve the resulting problems. The resulting block-oriented implementation yields high processor utilization levels over a wide range of problem sizes.

CS-TR-92-1401

Report Number: CS-TR-92-1412

Institution: Stanford University, Department of Computer Science

Title: Toward agent programs with circuit semantics

Author: Nilsson, Nils J.

Date: January 1992

Abstract: New ideas are presented for computing and organizing actions for autonomous agents in dynamic environments - environments in which the agent's current situation cannot always be accurately discerned and in which the effects of actions cannot always be reliably predicted. The notion of "circuit semantics" for programs based on "teleo-reactive trees" is introduced. Program execution builds a combinational circuit which receives sensory inputs and controls actions. These formalisms embody a high degree of inherent conditionality and thus yield programs that are suitably reactive to their environments. At the same time, the actions computed by the programs are guided by the overall goals of the agent. The paper also speculates about how programs using these ideas could be automatically generated by artificial intelligence planning systems and adapted by learning methods.

CS-TR-92-1412

Report Number: CS-TR-92-1419

Institution: Stanford University, Department of Computer Science

Title: Fast approximation algorithms for fractional packing and covering problems

Author: Plotkin, Serge A.

Author: Shmoys, David B.

Author: Tardos, Eva

Date: November 1993

Abstract: This paper presents fast algorithms that find approximate solutions for a general class of problems, which we call fractional packing and covering problems. The only previously known algorithms for solving these problems are based on general linear programming techniques. The techniques developed in this paper greatly outperform the general methods in many applications, and are extensions of a method previously applied to find approximate solutions to multicommodity flow problems. Our algorithm is a Lagrangean relaxation technique; an important aspect of our results is that we obtain a theoretical analysis of the running time of a Lagrangean relaxation-based algorithm. We give several applications of our algorithms. The new approach yields several orders of magnitude of improvement over the best previously known running times for the scheduling of unrelated parallel machines in both the preemptive and the non-preemptive models, for the job shop problem, for the cutting-stock problem, and for the minimum cost multicommodity flow problem.

CS-TR-92-1419

Report Number: CS-TR-92-1423

Institution: Stanford University, Department of Computer Science

Title: Time-lapse snapshots

Author: Dwork, Cynthia

Author: Herlihy, Maurice

Author: Plotkin, Serge A.

Author: Waarts, Orli

Date: November 1993

Abstract: Abstract. A snapshot scan algorithm takes an "instantaneous" picture of a region of shared memory that may he updated by concurrent processes. Many complex shared memory algorithms can be greatly simplified by structuring them around the snapshot scan abstraction. Unforinnately, the substantial decrease in conceptual complity is quite often counterbalanced by an increase in computational complexity. In this paper, we introduce the notion of a weak snapshot scan, a slightly weaker primitive that has a more efficient implementation. We propose the following methodology for using this abstraction: first, design and verify an algorithm using the more powerful snapshot scan, and second, replace the more powerful but less efficient snapshot with the weaker but more efficient snapshot, and show that the weaker abstraction nevertheless suffices to ensure the correctness of the enclosing algorithm. We give two examples of algorithms whose performance can be enhanced while retaining a simple modular structure: bounded concurrent timestamping, and bounded randomized consensus. The resulting timestamping protocol is the fastest known bounded concurrent timestamping protocol. The resulting randomized consensus protocol matches the computational complexity of the best known protocol that uses only bouned values.

CS-TR-92-1423

Report Number: CS-TR-92-1426

Institution: Stanford University, Department of Computer Science

Title: Proceedings of the ACM SIGPLAN Workshop on Continuations CW92

Author: Danvy, Olivier (ed.)

Author: Talcott, Carolyn (ed.)

Date: November 1993

Abstract: The notion of continuation is ubiquitous in many different areas of computer science, including logic, constructive mathematics, programming languages, and programming. This workshop aims at providing a forum for discussion of: new results and work in progress; work aimed at a better understanding of the nature of continuations; applications of continuations, and the relation of continuations to other areas of logic and computer science. This technical report serves as informal proceedings for CW92. It consists of submitted manuscripts bound together according to the program order.

CS-TR-92-1426

Report Number: CS-TR-92-1431

Institution: Stanford University, Department of Computer Science

Title: Aggressive transmissions over redundant paths for time critical messages

Author: Kao, Ben

Author: Garcia-Molina, Hector

Author: Barbara, Daniel

Date: October 1993

Abstract: Fault tolerant computer systems have redundant paths connecting their components. Given these paths, it is possible to use aggressive techniques to reduce the average value and variability of the response time for critical messages. One technique is to send a copy of a packet over an alternate path before it is known if the first copy failed or was delayed. A second technique is to split a single stream of packets over multiple paths. We analize both approaches and show that they can provide significant improvements over conventional, conservative mechanisms.

CS-TR-92-1431

Report Number: CS-TR-92-1432

Institution: Stanford University, Department of Computer Science

Title: Overview of multidatabase transaction management

Author: Breitbart, Yuri

Author: Garcia-Molina, Hector

Author: Silberschatz, Avi

Date: October 1993

Abstract: A multidatabase system (MDBS) is a facility that allows users access to data located in multiple autonomous database management systems (DBMSs). In such a system, global transactions are executed under the control of the MDBS. Independently, local transactions are executed under the control of the local DBMSs. Each local DBMS integrated by the MDBS may employ a different transaction management scheme. In addition, each local DBMS has complete control over all transactions (global and local) executing at its site, including the ability to abort at any point any of the transactions executing at its site. Typically, no design or internal DBMS structure changes are allowed in order to accommodate the MDBS. Furthermore, the local DBMSs may not be aware of each other, and, as a consequence, cannot coordinate their actions. Thus, traditional techniques for ensuring transaction atomicity and consistency in homogeneous distributed database systems may not be appropriate for an MDBS environment. The objective of this paper is to provide a brief review of the most current work in the area of multidatabase transaction management. We first define the problem and argue that the multidatabase research will become increasingly important in the coming years. We then outline basic research issues in multidatabase transaction management and review recent results in the area. We conclude the paper with a discussion of open problems and practical implications of this research.

CS-TR-92-1432

Report Number: CS-TR-92-1435

Institution: Stanford University, Department of Computer Science

Title: Lecture notes on approximation algorithms: Volume I

Author: Motwani, Rajeev

Date: October 1993

Abstract: These lecture notes are based on the course CS351 (Dept. of Computer Science, Stanford University) offered during the academic year 1991-92. The notes below correspond to the first half of the course. The second half consists of topics such as AL4X SNP. cliques, and colorings, as well as more specialized material covering topics such as geometric problems, Steiner trees and multicommodity flows. The second half is being revised to incorporate the implications of recent results in approximation algorithms and the complexity of approximation problems. Please let me know if you would like to be on the mailing list for the second half. Comments, criticisms and corrections are welcome, please send them by electronic mail to rajeev@cs.Stanford.edu.

CS-TR-92-1435

Report Number: CS-TR-92-1441

Institution: Stanford University, Department of Computer Science

Title: Motion planning in stereotaxic radiosurgery

Author: Schweikard, Achim

Author: Adler, John R.

Author: Latombe, Jean-Claude

Date: September 1992

Abstract: Stereotaxic radiosurgery is a procedure which uses a beam of radiation as an ablative surgical instrument to destroy brain tumors. The beam is produced by a linear accelerator which is moved by a jointed mechanism. Radiation is concentrated by crossfiring at the tumor from multiple directions and the amount of energy deposited in normal brain tissues is reduced. Because access to the tumor is obstructed along some directions by critical regions (e.g., brainstem, optic nerves) and most tumors are not shaped like spheres, planning the path of the beam is often difficult and time-consuming. This paper describes a computer-based planner developed to assist the surgeon generate a satisfactory path, given the spatial distribution of the brain tissues obtained with medical imaging. Experimental results with the implemented planner are presented, including a comparison with manually generated paths. According to these results, automatic planning significantly improves energy deposition. It can also shorten the overall treatment, hence reducing the patient's pain and allowing the radiosurgery equipment to be used for more patients. Stereotaxic radiosurgery is an example of so-called "bloodless surgery". Computer-based planning techniques are expected to facilitate further development of this safer, less painful, and more cost effective type of surgery.

CS-TR-92-1441

Report Number: CS-TR-92-1446

Institution: Stanford University, Department of Computer Science

Title: Independent updates and incremental agreement in replicated databases

Author: Ceri, Stefano

Author: Houtsma, Maurice A. W.

Author: Keller, Arthur M.

Author: Samarati, Pierangela

Date: October 1992

Abstract: Update propagation and transaction atomicity are major obstacles to the development of replicated databases. Many practical applications, such as automated teller machine (ATM) networks, flight reservation, and part inventory control, do not really require these properties. In this paper we present an approach for incrementally updating a distributed, replicated database without requiring multi-site atomic commit protocols. We prove that the mechanism is correct, as it asymptotically performs all the updates on all the copies. Our approach has two important characteristics: it is progressive, and non-blocking. Progressive means that the transaction's coordinator always commits, possibly together with a group of other sites. The update is later propagated asynchronously to the remaining sites. Non-blocking means that each site can take unilateral decisions at each step of the algorithm. Sites which cannot commit updates are brought to the same final state by means of a reconciliation mechanism. This mechanism uses the history logs, which are stored locally at each site, to bring sites to agreement. It requires a small auxiliary data structure, called reception vector, to keep track of the time until which the other sites are guaranteed to be up-to-date. Several optimizations to the basic mechanism are also discussed.

CS-TR-92-1446

Report Number: CS-TR-92-1452

Institution: Stanford University, Department of Computer Science

Title: Deadline assignment in a distributed soft real-time system

Author: Kao, Ben

Author: Garcia-Molina, Hector

Date: October 1992

Abstract: In a distributed environment, tasks often have processing demands on multiple different sites. A distributed task is usually divided up into several subtasks, each one to be executed at some site in order. In a real-time system, an overall deadline is usually specified by an application designer indicating when a distributed task is to be finished. However, the problem of how a global deadline is automatically translated to the deadline of each individual subtask has not been well studied. This paper examines (through simulations) four strategies for subtask deadline assignment in a distributed soft real-time environment.

CS-TR-92-1452

Report Number: CS-TR-93-1491

Institution: Stanford University, Department of Computer Science

Title: Subtask Deadline Assignment for Complex Distributed Soft Real-Time Tasks

Author: Kao, Ben

Author: Garcia-Molina, Hector

Date: October 1993

Abstract: Complex distributed tasks often involve parallel execution of subtasks at different nodes. To meet the deadline of a global task, all of its parallel subtasks have to be finished on time. Comparing to a local task (which involves execution at only one node), a global task may have a much harder time making its deadline because it is fairly likely that at least one of its subtasks run into an overloaded node. Another problem with complex distributed tasks occurs when a global task consists of a number of serially executing subtasks. In this case, we have the problem of dividing up the end-to-end deadline of the global task and assigning them to the intermediate subtasks. In this paper, we study both of these problems. Different algorithms for assigning deadlines to subtasks are presented and evaluated.

CS-TR-93-1491

Report Number: CS-TR-93-1494

Institution: Stanford University, Department of Computer Science

Title: Index Structures for Information Filtering Under the Vector Space Model

Author: Yan, Tak W.

Author: Garcia-Molina, Hector

Date: December 1993

Abstract: With the ever increasing volumes of information generation, users of information systems are facing an information overload. It is desirable to support information filtering as a complement to traditional retrieval mechanism. The number of users, and thus profiles (representing users' long-term interests), handled by an information filtering system is potentially huge, and the system has to process a constant stream of incoming information in a timely fashion. The efficiency of the filtering process is thus an important issue. In this paper, we study what data structures and algorithms can be used to efficiently perform large-scale information filtering under the vector space model, a retrieval model established as being effective. We apply the idea of the standard inverted index to index user profiles. We devise an alternative to the standard inverted index, in which we, instead of indexing every term in a profile, select only the significant ones to index. We evaluate their performance and show that the indexing methods require orders of magnitude fewer I/Os to process a document than when no index is used. We also show that the proposed alternative performs better in terms of I/O and CPU processing time in many cases.

CS-TR-93-1494

Report Number: CS-TR-93-1499

Institution: Stanford University, Department of Computer Science

Title: The Sandwich Theorem

Author: Knuth, Donald E.

Date: January 1994

Abstract: This report contains expository notes about a function vartheta(G) that is popularly known as the Lovasz number of a graph G. There are many ways to define vartheta(G), and the surprising variety of different characterizations indicates in itself that vartheta(G) should be interesting. But the most interesting property of vartheta(G) is probably the fact that it can be computed efficiently, although it lies "sandwiched" between other classic graph numbers whose computation is NP-hard. I have tried to make these notes self-contained so that they might serve as an elementary introduction to the growing literature on Lovasz's fascinating function.

CS-TR-93-1499

Report Number: CS-TR-94-1500

Institution: Stanford University, Department of Computer Science

Title: 1993 Publications Summary for the Stanford Database Group

Author: Siroker, Marianne

Date: January 1994

Abstract: This Technical Report contains the first page of papers written by members of the Stanford Database Group during 1993. Readers interested in the full papers can fetch electronic copies via FTP.

CS-TR-94-1500

Report Number: CS-TR-94-1501

Institution: Stanford University, Department of Comuputer Science

Title: Deriving Properties of Belief Update from Theories of Action

Author: Val, Alvaro del

Author: Shoham, Yoav

Date: February 1994

Abstract: We present an approach to database update as a form of non monotonic temporal reasoning, the main idea of which is the (circumscriptive) minimization of changes with respect to a set of facts declared ``persistent by default.'' The focus of the paper is on the relation between this approach and the update semantics recently proposed by Katsuno and Mendelzon. Our contribution in this regard is twofold: - We prove a representation theorem for KM semantics in terms of a restricted subfamily of the operators defined by our construction. - We show how the KM semantics can be generalized by relaxing our construction in a number of ways, each justified in certain intuitive circumstances and each corresponding to one specific postulate. It follows that there are reasonable update operators outside the KM family. Our approach is not dependent for its plausibility on this connection with KM semantics. Rather, it provides a relatively rich and flexible framework in which the frame and ramification problems can be solved in a systematic way by reasoning about default persistence of facts.

CS-TR-94-1501

Report Number: CS-TR-94-1502

Institution: Stanford University, Department of Computer Science

Title: Natural Language Parsing as Statistical Pattern Recognition

Author: Magerman, David M.

Date: February 1994

Abstract: Traditional natural language parsers are based on rewrite rule systems developed in an arduous, time-consuming manner by grammarians. A majority of the grammarian's efforts are devoted to the disambiguation process, first hypothesizing rules which dictate constituent categories and relationships among words in ambiguous sentences, and then seeking exceptions and corrections to these rules. In this work, I propose an automatic method for acquiring a statistical parser from a set of parsed sentences which takes advantage of some initial linguistic input, but avoids the pitfalls of the iterative and seemingly endless grammar development process. Based on distributionally-derived and linguistically-based features of language, this parser acquires a set of statistical decision trees which assign a probability distribution on the space of parse trees given the input sentence. By basing the disambiguation criteria selection on entropy reduction rather than human intuition, this parser development method is able to consider more sentences than a human grammarian can when making individual disambiguation rules. In experiments, the decision tree parser significantly outperforms a grammarian's rule-based parser, achieving an accuracy rate of 78% compared to the rule-based parser's 69%.

CS-TR-94-1502

Report Number: CS-TR-94-1503

Institution: Stanford University, Department of Computer Science

Title: Deciding whether to plan to react

Author: Dabija, Vlad G.

Date: February 1994

Abstract: Intelligent agents that operate in real-world real-time environments have limited resources. An agent must take these limitations into account when deciding which of two control modes - planning versus reaction - should control its behavior in a given situation. The main goal of this thesis is to develop a framework that allows a resource-bounded agent to decide at planning time which control mode to adopt for anticipated possible run-time contingencies. Using our framework, the agent: (a) analyzes a complete (conditional) plan for achieving a particular goal; (b) decides which of the anticipated contingencies require and allow for preparation of reactive responses at planning time; and (c) enhances the plan with prepared reactions for critical contingencies, while maintaining the size of the plan, the planning and response times, and the use of all other critical resources of the agent within task-specific limits. For a given contingency, the decision to plan or react is based on the characteristics of the contingency, the associated reactive response, and the situation itself. Contingencies that may occur in the same situation compete for reactive response preparation because of the agent's limited resources. The thesis also proposes a knowledge representation formalism to facilitate the acquisition and maintenance of knowledge involved in this decision process. We also show how the proposed framework can be adapted for the problem of deciding, for a given contingency, whether to prepare a special branch in the conditional plan under development or to leave the contingency for opportunistic treatment at execution time. We make a theoretical analysis of the properties of our framework and then demonstrate them experimentally. We also show experimentally that this framework can simulate several different styles of human reactive behaviors described in the literature and, therefore, can be useful as a basis for describing and contrasting such behaviors. Finally we demonstrate that the framework can be applied in a challenging real domain. That is: (a) the knowledge and data needed for the decision making within our framework exist and can be acquired from experts, and (b) the behavior of an agent that uses our framework improves according to response time, reliability and resource utilization criteria.

CS-TR-94-1503

Report Number: CS-TR-94-1504

Institution: Stanford University, Department of Computer Science

Title: An Algebraic Approach to Rule Analysis in Expert Database Systems

Author: Baralis, Elena

Author: Widom, Jennifer

Date: February 1994

Abstract: Expert database systems extend the functionality of conventional database systems by providing a facility for creating and automatically executing Condition-Action rules. While Condition-Action rules in database systems are very powerful, they also can be very difficult to program, due to the unstructured and unpredictable nature of rule processing. We provide methods for static analysis of Condition-Action rules; our methods determine whether a given rule set is guaranteed to terminate, and whether rule execution is confluent (has a guaranteed unique final state). Our methods are based on previous methods for analyzing rules in active database systems. We improve considerably on the previous methods by providing analysis criteria that are much less conservative: our methods often determine that a rule set will terminate or is confluent when previous methods could not. Our improved analysis is based on a ``propagation'' algorithm, which uses a formal approach based on an extended relational algebra to accurately determine when the action of one rule can affect the condition of another. Our algebraic approach yields methods that are applicable to a broad class of expert database rule languages.

CS-TR-94-1504

Report Number: CS-TR-94-1505

Institution: Stanford University, Department of Computer Science

Title: Using a Position History-Based Protocol for Distributed Object Visualization

Author: Singhal, Sandeep K.

Author: Cheriton, David R.

Date: February 1994

Abstract: Users of distributed virtual reality applications interact with users located across the network. Similarly, distributed object visualization systems store dynamic data at one host and render it in real-time at other hosts. Because data in both systems is animated and exhibits unpredictable behavior, providing up-to-date information about remote objects is expensive. Remote hosts must instead apply extrapolation between successive update packets to render the object's true animated behavior. This paper describes and analyzes a ``position history-based'' protocol in which hosts apply several recent position updates to track the position of remote objects. The history-based approach offers smooth, accurate visualizations of remote objects while providing a scalable solution.

CS-TR-94-1505

Report Number: CS-TR-94-1506

Institution: Stanford University, Department of Computer Science

Title: Optimized Memory-Based Messaging: Leveraging the Memory System for High-Performance Communication

Author: Cheriton, David R.

Author: Kutter, Robert A.

Date: February 1994

Abstract: Memory-based messaging, passing messages between programs using shared memory, is a recognized technique for efficient communication that takes advantage of memory system performance. However, the conventional operating system support for this approach is inefficient, especially for large-scale multiprocessor interconnects, and is too complex to effectively support in hardware. This paper describes hardware and software optimizations for memory-based messaging that efficiently exploit the mechanisms of the memory system to provide superior communication performance. We describe the overall model of optimized memory-based messaging, its implementation in an operating system kernel and hardware support for this approach in a scalable multiprocessor architecture. The optimizations include address-valued signals, message-oriented memory consistency and automatic signaling on write. Performance evaluations show these extensions provide a three-to-five-fold improvement in communication performance over a comparable software-only implementation.

CS-TR-94-1506

Report Number: CS-TR-94-1507

Institution: Stanford University, Department of Computer Science

Title: Bibliography Department of Computer Science Technical Reports, 1963-1993

Author: Mashack, Thea

Date: March 1994

Abstract: This Bibliography lists all the reports published by the Department of Computer Science from 1963 through 1993

CS-TR-94-1507

Report Number: CS-TR-94-1508

Institution: Stanford University, Department of Computer Science

Title: Inverse Kinematics of a Human Arm

Author: Kondo, Koichi

Date: March 1994

Abstract: This paper describes a new inverse kinematics algorithm for a human arm. Potential applications of this algorithm include computer-aided design and concurrent engineering from the viewpoint of human factors. For example, it may be used to evaluate a new design in terms of its usability and to automatically generate instruction videos. The inverse kinematics algorithm is based on a sensorimotor transformation model developed in recent neurophysiological experiments. This method can be applied to both static arm postures and human manipulation motions.

CS-TR-94-1508

Report Number: CS-TR-94-1509

Institution: Stanford University, Department of Computer Science

Title: Global Price Updates Help

Author: Goldberg, Andrew V.

Author: Kennedy, Robert

Date: March 1994

Abstract: Periodic global updates of dual variables have been shown to yield a substantial speed advantage in implementations of push-relabel algorithms for the maximum flow and minimum cost flow problems. In this paper, we show that in the context of the bipartite matching and assignment problems, global updates yield a theoretical improvement as well. For bipartite matching, a push-relabel algorithm that matches the best bound when global updates are used achieves a bound that is worse by a square root of n factor without the updates. A similar result holds for the assignment problem.

CS-TR-94-1509

Report Number: CS-TR-94-1510

Institution: Stanford University, Department of Computer Science

Title: Key Objects in Garbage Collection

Author: Hayes, Barry

Date: March 1994

Abstract: When the cost of global garbage collection in a system grows large, the system can be redesigned to use generational collection. The newly-created objects usually have a much shorter half-life than average, and by concentrating the collector's efforts on them a large fraction of the garbage can be collected at a tiny fraction of the cost. The objects that survive generational collection may still become garbage, and the current practice is to perform occasional global garbage collections to purge these objects from the system, and again, the cost of doing these collections may become prohibitive when the volume of memory increases. Previous research has noted that the objects that survive generational collection often are born, promoted, and collected in large clusters. In this dissertation I show that carefully selected semantically or structurally important key objects can be drawn from the clusters and collected separately; when a key object becomes unreachable, the collector can take this as a hint to collect the cluster from which the key was drawn. To gauge the effectiveness of key objects, their use was simulated in ParcPlace's Objectworks\Smalltalk system. The objects selected as keys were those that, as young objects, had pointers to them stored into old objects. The collector attempts to create a cluster for each key by gathering together all of the objects reachable from that key and >From no previous key. Using this simple heuristic for key objects, the collector finds between 41% and 92% of the clustered garbage in a suite of simple test programs. Except for one program in the suite, about 95% of the time these key objects direct the collector to a cluster that is garbage. The exception should be heeded in improving the heuristics. In a replay of an interactive session, key object collection finds 59% of the clustered garbage and 66% of suggested targets are indeed garbage.

CS-TR-94-1510

Report Number: CS-TR-94-1511

Institution: Stanford University, Department of Computer Science

Title: Co-Learning and the Evolution of Social Acitivity

Author: Shoham, Yoav

Author: Tennenholtz, Moshe

Date: April 1994

Abstract: We introduce the notion of co-learning, which refers to a process in which several agents simultaneously try to adapt to one another's behavior so as to produce desirable global system properties. Of particular interest are two specific co-learning settings, which relate to the emergence of conventions and the evolution of cooperation in societies, respectively. We define a basic co-learning rule, called Highest Cumulative Reward (HCR), and show that it gives rise to quite nontrivial system dynamics. In general, we are interested in the eventual convergence of the co-learning system to desirable states, as well as in the efficiency with which this convergence is attained. Our results on eventual convergence are analytic; the results on efficiency properties include analytic lower bounds as well as empirical upper bounds derived from rigorous computer simulations.

CS-TR-94-1511

Report Number: CS-TR-94-1512

Institution: Stanford University, Department of Computer Science

Title: Abstraction Planning in Real Time

Author: Washington, Richard

Date: April 1994

Abstract: When a planning agent works in a complex, real-world domain, it is unable to plan for and store all possible contingencies and problem situations ahead of time. The agent needs to be able to fall back on an ability to construct plans at run time under time constraints. This thesis presents a method for planning at run time that incrementally builds up plans at multiple levels of abstraction. The plans are continually updated by information from the world, allowing the planner to adjust its plan to a changing world during the planning process. All the information is represented over intervals of time, allowing the planner to reason about durations, deadlines, and delays within its plan. In addition to the method, the thesis presents a formal model of the planning process and uses the model to investigate planning strategies. The method has been implemented, and experiments have been run to validate the overall approach and the theoretical model.

CS-TR-94-1512

Report Number: CS-TR-94-1513

Institution: Stanford University, Department of Computer Science

Title: Construction of Normative Decision Models Using Abstract Graph Grammars

Author: Egar, John W.

Date: May 1994

Abstract: This dissertation addresses automated assistance for decision analysis in medicine. In particular, I have investigated graph grammars as a representation for encoding how decision-theoretic models can be constructed from an unordered list of concerns. The modeling system that I have used requires a standard vocabulary to generate decision models; the models generated are qualitative, and require subsequent assessment of probabilities and utility values. This research has focused on the modeling of the qualitative structure of problems given a standard vocabulary and given that subsequent assessment of probabilities and utilities is possible. The usefulness of the graph-grammar representation depends on the graph-grammar formalism's ability to describe a broad spectrum of qualitative decision models, on its ability to maintain a high quality in the models it generates, and on its clarity in describing topological constraints to researchers who design and maintain the actual grammar. I have found that graph grammars can be used to generate automatically decision models that are comparable to those produced by decision analysts.

CS-TR-94-1513

Report Number: CS-TR-94-1514

Institution: Stanford University, Department of Computer Science

Title: Load Balancing Using Time Series Analysis for Soft Real Time Systems with Statistically Periodic Loads

Author: Hailperin, Max

Date: May 1994

Abstract: This thesis provides design and analysis of techniques for global load balancing on ensemble architectures running soft-real-time object-oriented applications with statistically periodic loads. It focuses on estimating the instantaneous average load over all the processing elements. The major contribution is the use of explicit stochastic process models for both the loading and the averaging itself. These models are exploited via statistical time-series analysis and Bayesian inference to provide improved average load estimates, and thus to facilitate global load balancing. This thesis explains the distributed algorithms used and provides some optimality results. It also describes the algorithms' implementation and gives performance results from simulation. These results show that our techniques allow more accurate estimation of the global system loading, resulting in fewer object migrations than local methods. Our method is shown to provide superior performance, relative not only to static load-balancing schemes but also to many adaptive load-balancing methods. Results from a preliminary analysis of another system and from simulation with a synthetic load provide some evidence of more general applicability.

CS-TR-94-1514

Report Number: CS-TR-94-1515

Institution: Stanford University, Department of Computer Science

Title: Retrieving Semantically Distant Analogies

Author: Wolverton, Michael

Date: June 1994

Abstract: Techniques that have traditionally been useful for retrieving same-domain analogies from small single-use knowledge bases, such as spreading activation and indexing on selected features, are inadequate for retrieving cross-domain analogies from large multi-use knowledge bases. Blind or near-blind search techniques like spreading activation will be overwhelmed by combinatorial explosion as the search goes deeper into the KB. And indexing a large multi-use KB on salient features is impractical, largely because a feature that may be useful for retrieval in one task may be useless for another task. This thesis describes Knowledge-Directed Spreading Activation (KDSA), a method for retrieving analogies in a large semantic network. KDSA uses task-specific knowledge to guide a spreading activation search to a case or concept in memory that meets a desired similarity condition. The thesis also describes a specific instantiation of this method for the task of innovative design. KDSA has been validated in two ways. First, a theoretical model of knowledge base search demonstrates that KDSA is tractable for retrieving semantically distant analogies under a wide range of knowledge base configurations. Second, an implemented system that uses KDSA to find analogies for innovative design shows that the method is able to retrieve semantically distant analogies for a real task. Experiments with that system show trends as the knowledge base size grows that suggest the theoretical model's prediction of large knowledge base tractability is accurate.

CS-TR-94-1515

Report Number: CS-TR-94-1516

Institution: Stanford University, Department of Computer Science

Title: A Framework for Reasoning Precisely with Vague Concepts

Author: Goyal, Nita

Date: May 1994

Abstract: Many knowledge-based systems need to represent vague concepts such as ``old'' and ``tall''. The practical approach of representing vague concepts as precise intervals over numbers (e.g., ``old'' as the interval [70,110]) is well-accepted in Artificial Intelligence. However, there have been no systematic procedures, but only ad hoc methods to delimit the boundaries of intervals representing the vague predicates. A key observation is that the vague concepts and their interval boundaries are constrained by the underlying domain knowledge. Therefore, any systematic approach to assigning interval boundaries must take the domain knowledge into account. Hence, in the dissertation, we present a framework to represent the domain knowledge and exploit it to reason about the interval boundaries via a query language. This framework is comprised of a constraint language to represent logical constraints on vague concepts, as well as numerical constraints on the interval boundaries; a query language to request information about the interval boundaries; and an algorithm to answer the queries. The algorithm preprocesses the constraints by extracting the numerical information from the logical constraints and combines them with the given numerical constraints. We have implemented the framework and applied it to medical domain to illustrate its usefulness.

CS-TR-94-1516

Report Number: CS-TR-94-1517

Institution: Stanford University, Department of Computer Science

Title: Reactive, Generative and Stratified Models of Probabilistic Processes

Author: Glabbeek, Rob J. van

Author: Smolka, Scott A.

Author: Steffen, Bernhard

Date: July 1994

Abstract: We introduce three models of probabilistic processes, namely, reactive, generative and stratified. These models are investigated within the context of PCCS, an extension of Milner's SCCS in which each summand of a process summation expression is guarded by a probability and the sum of these probabilities is 1. For each model we present a structural operational semantics of PCCS and a notion of bisimulation equivalence which we prove to be a congruence. We also show that the models form a hierarchy: the reactive model is derivable from the generative model by abstraction from the relative probabilities of different actions, and the generative model is derivable from the stratified model by abstraction from the purely probabilistic branching structure. Moreover the classical nonprobabilistic model is derivable from each of these models by abstraction from all probabilities.

CS-TR-94-1517

Report Number: CS-TR-94-1518

Institution: Stanford University, Department of Computer Science

Title: STeP: The Stanford Temporal Prover

Author: Manna, Z ohar

Author: Anuchitanukul, Anuchit

Author: Bjorner, Nikolaj

Author: Browne, Anca

Author: Chang, Edward

Author: Colon, Michael

Author: de Alfaro, Luca

Author: Devarajan, Harish

Author: Sipma, Henny

Author: Uribe, Tomas

Date: June 1994

Abstract: We describe the Stanford Temporal Prover (STeP), a system being developed to support the computer-aided formal verification of concurrent and reactive systems based on temporal specifications. Unlike systems based on model-checking, STeP is not restricted to finite-state systems. It combines model checking and deductive methods to allow the verification of a broad class of systems, including programs with infinite data domains, N-process programs, and N-component circuit designs, for arbitrary N. In short, STeP has been designed with the objective of combining the expressiveness of deductive methods with the simplicity of model checking. The verification process is for the most part automatic. User interaction occurs mostly at the highest, most intuitive level, primarily through a graphical proof language of verification diagrams. Efficient simplification methods, decision procedures, and invariant generation techniques are then invoked automatically to prove resulting first-order verification conditions with minimal assistance. We describe the performance of the system when applied to several examples, including the N-process dining philosopher's program, Szymanski's N-process mutual exclusion algorithm, and a distributed N-way arbiter circuit.

CS-TR-94-1518

Report Number: CS-TR-94-1519

Institution: Stanford University, Department of Computer Science

Title: Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces

Author: Kavraki, Lydia

Author: Svestka, Petr

Author: Latombe, Jean-Claude

Author: Overmars, Mark

Date: August 1994

Abstract: A new motion planning method for robots in static workspaces is presented. This method proceeds according to two phases: a learning phase and a query phase. In the learning phase, a probabilistic roadmap is constructed and stored as a graph whose nodes correspond to collision-free configurations and edges to feasible paths between these configurations. These paths are computed using a simple and fast local planner. In the query phase, any given start and goal configurations of the robot are connected to two nodes of the roadmap; the roadmap is then searched for a path joining these two nodes. The method is general and easy to implement. It can be applied to virtually any type of holonomic robot. It requires selecting certain parameters (e.g., the duration of the learning phase) whose values depend on the considered scenes, that is the robots and their workspaces. But these values turn out to be relatively easy to choose. Increased efficiency can also be achieved by tailoring some components of the method (e.g., the local planner) to the considered robots. In this paper the method is applied to planar articulated robots with many degrees of freedom. Experimental results show that path planning can be done in a fraction of a second on a contemporary workstation (approximately 150 MIPS), after learning for relatively short periods of time (a few dozen seconds).

CS-TR-94-1519

Report Number: CS-TR-94-1520

Institution: Stanford University, Department of Computer Science

Title: Adaptive Optimization for SELF: Reconciling High Performance with Exploratory Programming

Author: Holzle, Urs

Date: August 1994

Abstract: Crossing abstraction boundaries often incurs a substantial run-time overhead in the form of frequent procedure calls. Thus, pervasive use of abstraction, while desirable from a design standpoint, may lead to very inefficient programs. Aggressively optimizing compilers can reduce this overhead but conflict with interactive programming environments because they introduce long compilation pauses and often preclude source-level debugging. Thus, programmers are caught on the horns of two dilemmas: they have to choose between abstraction and efficiency, and between responsive programming environments and efficiency. This dissertation shows how to reconcile these seemingly contradictory goals. Four new techniques work together to achieve this: - Type feedback achieves high performance by allowing the compiler to inline message sends based on information extracted from the runtime system. - Adaptive optimization achieves high responsiveness without sacrificing performance by using a fast compiler to generate initial code while automatically recompiling heavily used program parts with an optimizing compiler. - Dynamic deoptimization allows source-level debugging of optimized code by transparently recreating non-optimized code as needed. - Polymorphic inline caching speeds up message dispatch and, more significantly, collects concrete type information for the compiler. With better performance yet good interactive behavior, these techniques reconcile exploratory programming, ubiquitous abstraction, and high performance.

CS-TR-94-1520

Report Number: CS-TR-94-1521

Institution: Stanford University, Department of Computer Science

Title: Chu Spaces : A Model for Concurrency

Author: Gupta, Vineet

Date: August 1994

Abstract: A Chu space is a binary relation between two sets. In this thesis we show that Chu spaces form a non-interleaving model of concurrency which extends event structures while endowing them with an algebraic structure whose natural logic is linear logic. We provide several equivalent definitions of Chu spaces, including two pictorial representations. Chu spaces represent processes as automata or schedules, and Chu duality gives a simple way of converting between schedules and automata. We show that Chu spaces can represent various concurrency concepts like conflict, temporal precedence and internal and external choice, and they distinguish between causing and enabling events. We present a process algebra for Chu spaces including the standard combinators like parallel composition, sequential composition, choice, interaction, restriction, and show that the various operational identities between these hold for Chu spaces. The solution of recursive domain equations is possible for most of these operations, giving us an expressive specification and programming language. We define a history preserving equivalence between Chu spaces, and show that it preserves the causal structure of a process.

CS-TR-94-1521

Report Number: CS-TR-94-1522

Institution: Stanford University, Department of Computer Science

Title: Compositional Verification of Reactive and Real-time Systems

Author: Chang, Edward

Date: December 1993

Abstract: This thesis presents a compositional methodology for the verification of reactive and real-time systems. The correctness of a given system is established from the correctness of the system's components, each of which may be treated as a system itself and further reduced. When no further reduction is possible or desirable, global techniques for verification may be used to verify the bottom-level components. Transition modules are introduced as a suitable compositional model of computation. Various composition operations are defined on transition modules, including parallel composition, sequential composition, and iteration. A restricted assumption-guarantee style of specification is advocated, wherein the environment assumption is stated as a restriction on the environment's next-state relation. Compositional proof rules are provided in accordance with the safety-progress hierarchy of temporal properties. The compositional framework is then extended naturally to real-time transition modules and discrete-time metric temporal logic.

CS-TR-94-1522

Report Number: CS-TR-94-1523

Institution: Stanford University, Department of Computer Science

Title: On Implementing Push-Relabel Method for the Maximum Flow Problem

Author: Cherkassky, Boris V.

Author: Goldberg, Andrew V.

Date: September 1994

Abstract: We study efficient implementations of the push-relabel method for the maximum flow problem. The resulting codes are faster than the previous codes, and much faster on some problem families. The speedup is due to the combination of heuristics used in our implementation. We also exhibit a family of problems for which all known methods seem to have almost quadratic time growth rate.

CS-TR-94-1523

Report Number: CS-TR-94-1524

Institution: Stanford University, Department of Computer Science

Title: Continuous Verification by Discrete Reasoning

Author: de Alfaro, Luca

Author: Manna, Z ohar

Date: September 1994

Abstract: Two semantics are commonly used for the behavior of real-time and hybrid systems: a discrete semantics, in which the temporal evolution is represented as a sequence of snapshots describing the state of the system at certain times, and a continuous semantics, in which the temporal evolution is represented by a series of time intervals, and therefore corresponds more closely to the physical reality. Powerful verification rules are known for temporal logic formulas based on the discrete semantics. This paper shows how to transfer the verification techniques of the discrete semantics to the continuous one. We show that if a temporal logic formula has the property of finite variability, its validity in the discrete semantics implies its validity in the continuous one. This leads to a verification method based on three components: verification rules for the discrete semantics, axioms about time, and some temporal reasoning to bring the results together. This approach enables the verification of properties of real-time and hybrid systems with respect to the continuous semantics.

CS-TR-94-1524

Report Number: CS-TR-94-1525

Institution: Stanford University, Department of Computer Science

Title: Differential BDDs

Author: Anuchitanukul, Anuchit

Author: Manna, Z ohar

Date: September 1994

Abstract: In this paper, we introduce a class of Binary Decision Diagrams (BDDs) which we call Differential BDDs (DBDDs), and two transformations over DBDDs, called Push-up and Delta transformations. In DBDDs and its derived classes such as Push-up DBDDs or Delta DBDDs, in addition to the ordinary node-sharing in the normal Ordered Binary Decision Diagrams (OBDDs), some isomorphic substructures are collapsed together forming an even more compact representation of boolean functions. The elimination of isomorphic substructures coincides with the repetitive occurrences of the same or similar small components in many applications of BDDs such as in the representation of hardware circuits. The reduction in the number of nodes, from OBDDs to DBDDs, is potentially exponential while boolean manipulations on DBDDs remain efficient.

CS-TR-94-1525

Report Number: CS-TR-94-1526

Institution: Stanford University, Department of Computer Science

Title: Combining Experiential and Theoretical Knowledge in the Domain of Semiconductor Manufacturing

Author: Mohammed, John Llewelyn

Date: September 1994

Abstract: Semiconductor Manufacturing is characterized by complexity and continual, rapid change. These characteristics reduce the effectiveness of traditional diagnostic expert systems: the knowledge represented cannot adapt to changes in the manufacturing plan because the dependence of the knowledge on the plan is not explicitly represented. It is impractical to manually encode all the dependencies in a complex plan. We address this problem in two ways. First, we employ model-based techniques to encode theoretical knowledge, so that symbolic simulation of a new manufacturing plan can automatically glean diagnostic information. Our representation is sufficiently detailed to capture the plan's inherent causal dependencies, yet sufficiently abstract to make symbolic simulation practical. This theoretical knowledge can adapt to changes in the manufacturing plan. However, the expressiveness and tractability of our representational machinery limit the range of phenomena that we can represent. Second, we describe Generic Rules, which combine the expressiveness of heuristic rules with the robustness of theoretical models. Generic Rules are general patterns for heuristic rules, associated with model-based restrictions on the situations in which the patterns can be instantiated to form rules for new contexts. In this way, theoretical knowledge is employed to encode the dependence of heuristic knowledge on the manufacturing plan.

CS-TR-94-1526

Report Number: CS-TR-94-1527

Institution: Stanford University, Department of Computer Science

Title: From Knowledge to Belief

Author: Koller, Daphne

Date: October 1994

Abstract: When acting in the real world, an intelligent agent must make decisions under uncertainty. The standard solution requires it to assign degrees of belief to the relevant assertions. These should be based on the agent's knowledge. For example, a doctor deciding on the treatment for a patient should use information about that patient, statistical correlations between symptoms and diseases, default rules, and more. The random-worlds method induces degrees of belief from very rich knowledge bases, expressed in a language that augments first-order logic with statistical statements and default rules (interpreted as qualitative statistics). The method is based on the principle of indifference, treating all possible worlds as equally likely. It naturally derives important patterns of reasoning such as specificity, inheritance, indifference to irrelevant information, and a default assumption of independence. Its expressive power and intuitive semantics allow it to deal well with examples that are too complex for most other reasoning systems. We use techniques from finite model theory to analyze the computational aspects of random worlds. The problem of computing degrees of belief is undecidable in general. However, for unary knowledge bases, a tight connection to the principle of maximum entropy often allows us to compute degrees of belief.

CS-TR-94-1527

Report Number: CS-TR-94-1528

Institution: Stanford University, Department of Computer Science

Title: Architecture-Altering Operations for Evolving the Architecture of a Multi-Part Program in Genetic Programming

Author: Koza, John R.

Date: October 1994

Abstract: Previous work described a way to evolutionarily select the architecture of a multi-part computer program >From among preexisting alternatives in the population while concurrently solving a problem during a run of genetic programming. This report describes six new architecture-altering operations that provide a way to evolve the architecture of a multi-part program in the sense of actually changing the architecture of programs dynamically during the run. The new architecture-altering operations are motivated by the naturally occurring operation of gene duplication as described in Susumu Ohno's provocative 1970 book Evolution by Means of Gene Duplication as well as the naturally occurring operation of gene deletion. The six new architecture-altering operations are branch duplication, argument duplication, branch creation, argument creation, branch deletion and argument deletion. A connection is made between genetic programming and other techniques of automated problem solving by interpreting the architecture-altering operations as providing an automated way to specialize and generalize programs. The report demonstrates that a hierarchical architecture can be evolved to solve an illustrative symbolic regression problem using the architecture- altering operations. Future work will study the amount of additional computational effort required to employ the architecture-altering operations.

CS-TR-94-1528

Report Number: CS-TR-94-1529

Institution: Stanford University, Department of Computer Science

Title: A knowledge-based method for temporal abstraction of clinical data

Author: Shahar, Yuval

Date: October 1994

Abstract: This dissertation describes a domain-independent method specific to the task of abstracting higher-level concepts from time-stamped data. The framework includes a model of time, parameters, events and contexts. I applied my framework to several domains of medicine. My goal is to create, from time-stamped patient data, interval-based temporal abstractions such as "severe anemia for 3 weeks in the context of administering AZ T." The knowledge-based temporal-abstraction method decomposes the task of abstracting higher-level abstractions from input data into five subtasks. These subtasks are solved by five domain-independent temporal-abstraction mechanisms. The temporal-abstraction mechanisms depend on four domain-specific knowledge types. I implemented the knowledge-based temporal-abstraction method in the RESUME system. RESUME accepts input and returns output at all levels of abstraction; accepts input out of temporal order, modifying a view of the past or of the present, as necessary; generates context-sensitive, controlled output; and maintains several possible concurrent interpretations of the data. I evaluated RESUME in the domains of protocol-based care, monitoring of children's growth, and therapy of diabetes. A formal specification of a domain's temporal-abstraction knowledge supports acquisition, maintenance, reuse, and sharing of that knowledge.

CS-TR-94-1529

Report Number: CS-TR-94-1530

Institution: Stanford University, Department of Computer Science

Title: On Computing Multi-Arm Manipulation Trajectories

Author: Koga, Yoshihito

Date: October 1994

Abstract: This dissertation considers the manipulation task planning problem of automatically generating the trajectories for several cooperating robot arms to manipulate a movable object to a goal location among obstacles. The planner must reason that the robots may need to change their grasp of the object to complete the task, for example, by passing it from one arm to another. Furthermore, the computed velocities and accelerations of the arms must satisfy the limits of the actuators. Past work strongly suggests that solving this problem in a rigorous fashion is intractable. We address this problem in a practical two-phase approach. In step one, using a heuristic we compute a collision-free path for the robots and the movable object. For the case of multiple robot arms with many degrees of freedom, this step may fail to find the desired path even though it exists. Despite this limitation, experimental results of the implemented planner (for solving step one) show that it is efficient and reliable; for example, the planner is able to find complex manipulation motions for a system with seventy eight degrees of freedom. In step two, we then find the time-parameterization of the path such that the dynamic constraints on the robot are satisfied. In fact, we find the time-optimal solution for the given path. We show simulation results for various complex examples.

CS-TR-94-1530

Report Number: CS-TR-94-1531

Institution: Stanford University, Department of Computer Science

Title: On-Line Manipulation Planning for Two Robot Arms in a Dynamic Environment

Author: Li, Tsai-Yen

Author: Latombe, Jean-Claude

Date: December 1994

Abstract: In a constantly changing and partially unpredictable environment, robot motion planning must be on-line. The planner receives a continuous flow of information about occurring events and generates new plans, while previously planned motions are being executed. This paper describes an on-line planner for two cooperating arms whose task is to grab parts of various types on a conveyor belt and transfer them to their respective goals while avoiding collision with obstacles. Parts arrive on the belt in random order, at any time. Both goals and obstacles may be dynamically changed. This scenario is typical of manufacturing cells serving machine-tools, assembling products, or packaging objects. The proposed approach breaks the overall planning problem into subproblems, each involving a low-dimensional configuration or configuration-time space, and orchestrates very fast primitives solving these subproblems. The resulting planner has been implemented and extensively tested in a simulated environment, as well as with a real dual-arm system. Its competitiveness has been evaluated against an oracle making (almost) the best decision at any one time; the results show that the planner compares extremely well.

CS-TR-94-1531

Report Number: CS-TR-94-1532

Institution: Stanford University, Department of Computer Science

Title: Planning the Collision-Free Paths of an Actively Flexible Manipulator

Author: Banon, Jose

Date: December 1994

Abstract: Most robot manipulators consist of a small sequence of rigid links connected by articulated joints. However, robot dexterity is considerably enhanced when the number of joints is large or infinite. Additional joints make it possible to manipulate objects in cluttered environments where non-redundant robots are useless. In this paper we consider a simulated actively flexible manipulator (AFM), i.e. a manipulator whose flexibility can be directly controlled by its actuators. We propose an efficient method for planning the collision-free paths of an AFM in a three-dimensional workspace. We implemented this method on a graphic workstation and experimented with it on several examples.

CS-TR-94-1532

Report Number: CS-TR-94-1533

Institution: Stanford University, Department of Computer Science

Title: Randomized Query Processing in Robot Motion Planning

Author: Raghavan, L. Kavraki, J-C. Latombe, R. Motwani, P.

Date: December 1994

Abstract: The subject of this paper is the analysis of a randomized preprocessing scheme that has been used for query processing in robot motion planning. The attractiveness of the scheme stems from its general applicability to virtually any motion-planning problem, and its empirically observed success. In this paper we initiate a theoretical basis for explaining this empirical success. Under a simple assumption about the configuration space, we show that it is possible to perform a preprocessing step following which queries can be answered quickly. En route, we pose and give solutions to related problems on graph connectivity in the evasiveness model, and art-gallery theorems.

CS-TR-94-1533

Report Number: CS-TR-95-1534

Institution: Stanford University, Department of Computer Science

Title: Partial Information Based Integrity Constraint Checking

Author: Gupta, Ashish

Date: January 1995

Abstract: Integrity constraints are useful for specifying consistent states of a database, especially in distributed database systems where data may be under the control of multiple database managers. Constraints need to be checked when the underlying database is updated. Integrity constraint checking in a distributed environment may involve a distributed transaction and the expenses associated with it: two phase commit protocols, distributed concurrency control, network communication costs, and multiple interface layers if the databases are heterogeneous. The information used for constraint checking may include the contents of base relations, constraint specifications, updates to the databases, schema restrictions, stored aggregates etc. We propose using only a subset of the information potentially available for constraint checking. Thus, only data that is local to a site may be used for constraint checking thus avoiding distributed transactions. The approach is useful also in centralized systems because relatively inexpensively accessible subsets may be used for constraint checking. We discuss constraint checking for the following three subsets of the afore mentioned information. 1. Constraint Subsumption: How to check one constraint C using a set of other constraint specifications {C0,...,Cn} and no data, and the knowledge that the constraints in set {C0,...,Cn} hold in the database? 2. Irrelevant Updates. How to check a constraint C using the database update, a set of other constraints {C0,...,Cn, and the knowledge that the constraints {C,C0,...,Cn} all hold before the update? 3. Local Checking. How to check a constraint C using the database update, the contents of the updated relation, a set of other constraints {C0,...,Cn}, and the knowledge that the constraints {C,C0,...,Cn} all hold before the update? Local checking is the main focus and the main contribution of this thesis.

CS-TR-95-1534

Report Number: CS-TR-95-1535

Institution: Stanford University, Department of Computer Science

Title: Random Networks in Configuration Space for Fast Path Planning

Author: Kavraki, Lydia E.

Date: January 1995

Abstract: In the main part of this dissertation we present a new path planning method which computes collision-free paths for robots of virtually any type moving among stationary obstacles. This method proceeds according to two phases: a preprocessing phase and a query phase. In the preprocessing phase, a probabilistic network is constructed and stored as a graph whose nodes correspond to collision-free configurations and edges to feasible paths between these configurations. In the query phase, any given start and goal configurations of the robot are connected to two nodes of the network; the network is then searched for a path joining these two nodes. We apply our method to articulated robots with many degrees of freedom. Experimental results show that path planning can be done in a fraction of a second on a contemporary workstation ($\approx$ 150 MIPS), after relatively short preprocessing times (a few dozen to a few hundred seconds). In the second part of this dissertation, we present a new method that uses the the Fast Fourier Transform to compute the obstacle map required by certain path planning algorithms. In the final part of this dissertation, we consider a problem from assembly planning. In assembly planning we are interested in generating feasible sequences of motions that construct a mechanical product from its individual parts. We prove that the monotone assembly partitioning problem in the plane is NP-complete.

CS-TR-95-1535

Report Number: CS-TR-95-1536

Institution: Stanford University, Department of Computer Science

Title: Locomotion With A Unit-Modular Reconfigurable Robot

Author: Yim, Mark

Date: January 1995

Abstract: A unit-modular robot is a robot that is composed of modules that are all identical. Here we study the design and control of unit-modular dynamically reconfigurable robots. This is based upon the design and construction of a robot called Polypod. We further choose statically stable locomotion as the task domain to evaluate the design and control strategy. The result is the creation of many unique locomotion modes. To gain insight into the capabilities of robots like Polypod we examine locomotion in general by building a functional taxonomy of locomotion. We show that Polypod is capable of generating all classes of statically stable locomotion, a feature unique to Polypod. Next, we propose methods to evaluate vehicles under different operating conditions such as different terrain conditions. We then evaluate and compare each mode of locomotion on Polypod. This study leads to interesting insights into the general characteristics of the corresponding classes of locomotion. Finally, since more modules are expected to increase robot capability, it is important to examine the limit to the number of modules that can be put together in a useful form. We answer this question by investigating the issues of structural stability, actuator strength, computation and control requirements.

CS-TR-95-1536

Report Number: CS-TR-95-1537

Institution: Stanford University, Department of Computer Science

Title: Real-Time Modification of Collision-Free Paths

Author: Quinlan, Sean

Date: January 1995

Abstract: The modification of collision-free paths is proposed as the basis for a new framework to close the gap between global path planning and real-time sensor-based robot control. A physically-based model of a flexible string-like object, called an elastic band, is used to determine the modification of a path. The initial shape of the elastic is the free path generated by a planner. Subjected to artificial forces, the elastic band deforms in real time to a short and smooth path that maintains clearance from the obstacles. The elastic continues to deform as changes in the environment are detected by sensors, enabling the robot to accommodate uncertainties and react to unexpected and moving obstacles. While providing a tight connection between the robot and its environment, the elastic band preserves the global nature of the planned path. The greater part of this thesis deals with the design and implementation of elastic bands, with emphasis on achieving real-time performance even for robots with many degrees of freedom. To achieve these goals, we propose the concept of bubbles of free-space---a region of free-space around a given configuration of the robot generated from distance information. We also develop a novel algorithm for efficiently computing the distance between non-convex objects and a real-time algorithm for calculating a discrete approximation to the time-optimal parameterization of a path. These various developments are combined in a system that demonstrates the elastic band framework for a Puma 560 manipulator.

CS-TR-95-1537

Report Number: CS-TR-95-1538

Institution: Stanford University, Department of Computer Science

Title: 1994 Publications Summary of the Stanford Database Group

Author: Hammer, Joachim

Date: January 1995

Abstract: This technical report contains the first four pages of papers written by members of the Stanford Database Group during 1994. We believe that the first four pages convey the main ideas behind each paper better than a simple title and abstract does.

CS-TR-95-1538

Report Number: CS-TR-95-1539

Institution: Stanford University, Department of Computer Science

Title: Reasoning About Uncertainty in Robot Motion Planning

Author: Lazanas, Anthony

Date: August 1994

Abstract: In this thesis, we investigate the effects of uncertainty on the difficulty of robot motion planning, and we study the tradeoff between physical and computational complexity. We present a formulation of the general robot motion planning with uncertainty problem, so that a complete, correct, polynomial planner can be derived. The key idea is the existence of reduced uncertainty regions in the workspace (landmark regions). Planning is performed using the preimage backchaining method. We extend the standard definition of a ``nondirectional preimage'' to the case where a motion command depends on an arbitrary number of control parameters. The resulting multi-dimensional preimage can be represented with a polynomial number of 2-D slices, each computed for a critical combination of values of the parameters. We present implemented algorithms for one parameter (the commanded direction of motion) and for two parameters (the commanded direction of motion and the directional uncertainty). Experimentation with the algorithm using a real mobile robot has been successful. By engineering the workspace, we have been able to satisfy all the assumptions of our planning model. As a result, the robot has been able to operate for long periods of time with no failures.

CS-TR-95-1539

Report Number: CS-TR-95-1540

Institution: Stanford University, Department of Computer Science

Title: Model-Matching and Individuation for Model-Based Diagnosis

Author: Murdock, Janet L.

Date: January 1995

Abstract: In model-based systems that reason about the physical world, models are attached to portions of the physical system. To make model-based systems more extensible and re-usable, this thesis explores automating model-matching. Models address particular individuals, portions of the physical world identified as separate entities. If the set of models is not fixed, one cannot carve the physical system into a fixed set of individuals. Our goals are to develop methods for matching and individuating and identify characteristics of physical equipment and models required by those methods. Our approach is to identify a set of characteristics, build a system which used them, and test re-usability and extensibility. If the system correctly defines individuals and matches models, even when models calls for individuals not previously defined, then we can conclude that we have identified some subset of the characteristics required. The system matches models to a series of equipment descriptions, simulating re-use. We also add a number of models, extending the system, having it match the new models. Our investigation shows characteristics required are the 3-dimensional space and how the space is filled by functional components, phases, materials, and parameters.

CS-TR-95-1540

Report Number: CS-TR-95-1541

Institution: Stanford University, Department of Computer Science

Title: Random Sampling in Graph Optimization Problems

Author: Karger, David R.

Date: February 1995

Abstract: The representative random sample is a central concept of statistics. It is often possible to gather a great deal of information about a large population by examining a small sample randomly drawn from it. This approach has obvious advantages in reducing the investigator's work, both in gathering and in analyzing the data. We apply the concept of a representative sample to combinatorial optimization. Our focus is optimization problems on undirected graphs. Highlights of our results include: The first (randomized) linear time minimum spanning tree algorithm; A (randomized) minimum cut algorithm with running time roughly O(n^2) as compared to previous roughly O(n^3) time bounds, as well as the first algorithm for finding all approximately minimal cuts and multiway cuts; An efficient parallelization of the minimum cut algorithm, providing the first parallel (RNC) algorithm for minimum cuts; A derandomization finding minimum cut in NC; Provably accurate approximations to network reliability; Very fast approximation algorithms for minimum cuts, s-t minimum cuts, and maximum flows; Significantly improved polynomial-time approximation bounds for network design problems; For coloring 3-colorable graphs, improvements in the approximation bounds from O(n^{3/8}) to O(n^{1/4}); An analysis of random sampling in Matroids.

CS-TR-95-1541

Report Number: CS-TR-95-1542

Institution: Stanford University, Department of Computer Science

Title: Parallel Genetic Programming on a Network of Transputers

Author: Koza, John R.

Author: Andre, David

Date: January 1995

Abstract: This report describes the parallel implementation of genetic programming in the C programming language using a PC 486 type computer (running Windows) acting as a host and a network of transputers acting as processing nodes. Using this approach, researchers of genetic algorithms and genetic programming can acquire computing power that is intermediate between the power of currently available workstations and that of supercomputers at a cost that is intermediate between the two. A comparison is made of the computational effort required to solve the problem of symbolic regression of the Boolean even-5-parity function with different migration rates. Genetic programming required the least computational effort with an 8% migration rate. Moreover, this computational effort was less than that required for solving the problem with a serial computer and a panmictic population of the same size. That is, apart from the nearly linear speed-up in executing a fixed amount of code inherent in the parallel implementation of genetic programming, parallelization delivered more than linear speed-up in solving the problem using genetic programming.

CS-TR-95-1542

Report Number: CS-TR-95-1543

Institution: Stanford University, Department of Computer Science

Title: Stereo Without Search

Author: Tomasi, Carlo

Author: Manduchi, Roberto

Date: February 1995

Abstract: Search is not inherent in the correspondence problem. We propose a representation of images, called intrinsic curves, that combines the ideas of associative storage of images with connectedness of the representation: intrinsic curves are the paths that a set of local image descriptors trace as an image scanline is traversed from left to right. Curves become surfaces when full images are considered instead of scanlines. Because only the path in the space of descriptors is used for matching, intrinsic curves lose track of space, and are invariant with respect to disparity under ideal circumstances. Establishing stereo correspondences then becomes a trivial lookup problem. We also show how to use intrinsic curves to match real images in the presence of noise, brightness bias, contrast fluctuations, and moderate geometric distortion, and we show how intrinsic curves can be used to deal with image ambiguity and occlusions. We carry out experiments on single-scanline matching to prove the feasibility of the approach and illustrate its main features.

CS-TR-95-1543

Report Number: CS-TR-95-1544

Institution: Stanford University, Department of Computer Science

Title: On Diameter Verification and Boolean Matrix Multiplication.

Author: Basch, Julien

Author: Khanna, Sanjeev

Author: Motwani, Rajeev

Date: February 1995

Abstract: We present a practical algorithm that verifies whether a graph has diameter 2 in time O(n^{3} / log^{2} n}). A slight adaptation of this algorithm yields a boolean matrix multiplication algorithm which runs in the same time bound; thereby allowing us to compute transitive closure and verification of the diameter of a graph for any constant $d$ in O(n^{3} / log^{2} n}) time.

CS-TR-95-1544

Report Number: CS-TR-95-1545

Institution: Stanford University, Department of Computer Science

Title: Approximation Algorithms for the Largest Common Subtree Problem.

Author: Khanna, Sanjeev

Author: Motwani, Rajeev

Author: Yao, Frances F.

Date: February 1995

Abstract: The largest common subtree problem is to find a largest subtree which occurs as a common subgraph in a given collection of trees. We show that in case of bounded degree trees, we can achieve an approximation ratio of O(( n*loglog n ) / log^{2} n). In case of unbounded degree nodes, we give an algorithm with approximation ratio O(( n*(loglog n)^{2}) / log^{2} n) when the trees are unlabeled. An approximation ratio of O(( n*(loglog n)^{2} ) / log^{2} n) is also achieved for the case of labeled unbounded degree trees provided the number of distinct labels is O(log^{O(1)} n).

CS-TR-95-1545

Report Number: CS-TR-95-1546

Institution: Stanford University, Department of Computer Science

Title: Symbolic Approximations for Verifying Real-Time Systems

Author: Wong-Toi, Howard

Date: December 1994

Abstract: Real-time systems are appearing in more and more applications where their proper operation is critical, e.g. transport controllers and medical equipment. However they are extremely difficult to design correctly. One approach to this problem is the use of formal description techniques and automatic verification. Unfortunately automatic verification suffers from the state-explosion problem even without considering timing information. This thesis proposes a state-based approximation scheme as a heuristic for efficient yet accurate verification. We first describe a generic iterative approximation algorithm for checking safety properties of a transition system. Successively more accurate approximations of the reachable states are generated until the specification is provably satisfied or not. The algorithm automatically decides where the analysis needs to be more exact, and uses state partitioning to force the approximations to converge towards a solution. The method is complete for finite-state systems. The algorithm is applied to systems with hard real-time bounds. State approximations are performed over both timing information and control information. We also approximate the system's transition structure. Case studies include some timing properties of the MAC sublayer of the Ethernet protocol, the tick-tock service protocol, and a timing-based communication protocol where the sender's and receiver's clocks advance at variable rates.

CS-TR-95-1546

Report Number: CS-TR-95-1547

Institution: Stanford University, Department of Computer Science

Title: Sharp, Reliable Predictions using Supervised Mixture Models

Author: Roy, H. Scott

Date: March 1995

Abstract: This dissertation develops a new way to make probabilistic predictions from a database of examples. The method looks for regions in the data where different predictions are appropriate, and it naturally extends clustering algorithms that have been used with great success in exploratory data analysis. In probabilistic terms, the new method looks at the same models as before, but it only evaluates them for the conditional probability they assign to a single feature rather than the joint probability they assign to all features. A good models is therefore forced to classify the data in a way that is useful for a single, desired prediction, rather than just identifying the strongest overall pattern in the data. The results of this dissertation extend the clean, Bayesian approach of the unsupervised AutoClass system to the supervised learning problems common in everyday practice. Highlights include clear probabilistic semantics, prediction and use of discrete, categorical, and continuous data, priors that avoid the overfitting problem, an explicit noise model to identify unreliable predictions, and the ability to handle missing data. A computer implementation, MultiClass, validates the ideas with performance that exceeds neural nets, decision trees, and other current supervised machine learning systems.

CS-TR-95-1547

Report Number: CS-TR-95-1548

Institution: Stanford University, Department of Computer Science

Title: Routing and Admission Control in General Topology Networks

Author: Gawlick, Rainer

Author: Kamath, Anil

Author: Plotkin, Serge

Author: Ramakrishnan, K. G.

Date: May 1995

Abstract: Emerging high speed Broadband Integrated Services Digital Networks (B-ISDN) will carry traffic for services such as video-on-demand and video teleconferencing -- that require resource reservation along the path on which the traffic is sent. As a result, such networks will need effective {\em admission control} algorithms. The simplest approach is to use greedy admission control; in other words, accept every resource request that can be physically accommodated. However, in the context of symmetric loss networks (networks with a complete graph topology), non-greedy admission control has been shown to be more effective than greedy admission control. This paper suggests a new {\em non-greedy} routing and admission control algorithm for {\em general topology} networks. In contrast to previous algorithms, our algorithm does not require advance knowledge of the traffic patterns. Our algorithm combines key ideas from a recently developed theoretical algorithm with a stochastic analysis developed in the context of reservation-based algorithms. We evaluate the performance of our algorithm using extensive simulations on an existing commercial network topology and on variants of that topology. The simulations show that our algorithm outperforms greedy admission control over a broad range of network environments. The simulations also illuminate some important characteristics of our algorithm. For example, we characterize the importance of the implicit routing effects of the admission control part of our algorithm.

CS-TR-95-1548

Report Number: CS-TR-95-1549

Institution: Stanford University, Department of Computer Science

Title: Dynamic Selection of Models

Author: Rutledge, Geoffrey William

Date: March 1995

Abstract: This dissertation develops an approach to high-stakes, model-based decision making under scarce computation resources, bringing together concepts and techniques from the disciplines of decision analysis, statistics, artificial intelligence, and simulation. A method is developed and implemented to solve a time-critical decision problem in the domain of critical-care medicine. This method selects models that balance the prediction accuracy and the need for rapid action. Under a computation-time constraint, the optimal model for a model-based control application is a model that maximizes the tradeoff of model benefit (a measure of how accurately the model predicts the effects of alternative control settings) and model cost (a measure of the length of the model-induced computation delay). This work describes a real-time algorithm that selects, from a graph of models (GoM), a model that is accurate and that is computable within a time constraint. The DSM algorithm is a metalevel reasoning strategy that relies on a dynamic-selection-of-models (DSM) metric to guide the search through a GoM that is organized according to the simplifying assumptions of the models. The DSM metric balances an estimate of the probability that a model will achieve the required prediction accuracy and the cost of the expected model-induced computation delay. The DSM algorithm provides an approach to automated reasoning about complex systems that applies at any level of computation-resource or computation-time constraint. The DSM algorithm is implemented in Konan, a program that performs dynamic selection of patient-specific models from a GoM of quantitative physiologic models. Konan selects models that allow a model-based control application (a ventilator-management advisor) to make real-time decisions for the control settings of a mechanical ventilator.

CS-TR-95-1549

Report Number: CS-TR-95-1550

Institution: Stanford University, Department of Computer Science

Title: Theory and Design of a Hybrid Pattern Recognition System

Author: Drakopoulos, John A.

Date: May 1995

Abstract: Pattern recognition methods can be divided into four different categories: statistical or probabilistic, structural, possibilistic or fuzzy, and neural methods. A formal analysis shows that there is a computational complexity versus representational power trade-off between probabilistic and possibilistic or fuzzy set measures, in general. Furthermore, sigmoidal theory shows that fuzzy set membership can be represented effectively by sigmoidal functions. Those results and the formalization of sigmoidal functions and subsequently multi-sigmoidal functions and neural networks led to the development of a hybrid pattern recognition system called tFPR. tFPR is a hybrid fuzzy, neural, and structural pattern recognition system that uses fuzzy sets to represent multi-variate pattern classes that can be either static or dynamic depending on time or some other parameter space. The membership functions of the fuzzy sets that represent pattern classes are modeled in three different ways. Simple sigmoidal configurations are used for simple patterns, a structural pattern recognition method is used for dynamic patterns, and multi-sigmoidal neural networks are used for pattern classes for which is difficult to obtain a formal definition. Although efficiency is a very important consideration in tFPR, the main issues are knowledge acquisition and knowledge representation (in terms of pattern class descriptions).

CS-TR-95-1550

Report Number: CS-TR-95-1551

Institution: Stanford University, Department of Computer Science

Title: Two Methods for Checking Formulas of Temporal Logic

Author: McGuire, Hugh W.

Date: June 1995

Abstract: This dissertation presents two methods for determining satisfiability or validity of formulas of Discrete Metric Annotated Linear Temporal Logic. This logic is convenient for representing and verifying properties of reactive and concurrent systems, including software and electronic circuits. The first method presented here is an algorithm for automatically deciding whether any given propositional temporal formula is satisfiable. This new algorithm efficiently extends the classical `semantic tableau'-algorithm to formulas with temporal operators which refer to the past or are metric. Then, whereas classical proofs of correctness for such algorithms are existential, the proof here is constructive; it shows that for any given formula being checked, any model of the formula is embedded in the tableau. The second method presented in this dissertation is a deduction-calculus for determining the validity of predicate temporal formulas. This new deduction-calculus employs a refined, conservative version of classical approaches involving translation from temporal forms to first-order expressions with time reified. Here, quantifications are elided, and addition is used instead of classical complicated combinations of comparisons. This scheme facilitates integration of powerful techniques such as associative-commutative unification and a Presburger decision-algorithm.

CS-TR-95-1551

Report Number: CS-TR-95-1552

Institution: Stanford University, Department of Computer Science

Title: Embedded Teaching of Reinforcement Learners

Author: Brafman, Ronen I.

Author: Tennenholtz, Moshe

Date: June 1995

Abstract: Knowledge plays an important role in an agent's ability to perform well in its environment. Teaching can be used to improve an agent's performance by enhancing its knowledge. We propose a specific model of teaching, which we call embedded teaching. An embedded teacher is an agent situated with a less knowledgeable ``student'' in a common environment. The teacher's goal is to lead the student to adopt a particular desired behavior. The teacher's ability to teach is affected by the dynamics of the common environment and may be limited by a restricted repertoire of actions or uncertainty about the outcome of actions; we explicitly represent these limitations as part of our model. In this paper, we address a number of theoretical issues including the characterization of a challenging embedded teaching domain and the computation of optimal teaching policies. We then incorporate these ideas in a series of experiments designed to evaluate our ability to teach two types of reinforcement learners.

CS-TR-95-1552

Report Number: CS-TR-95-1553

Institution: Stanford University, Department of Computer Science

Title: Modeling techniques and algorithms for probabilistic model-based diagnosis and repair

Author: Srinivas, Sampath

Date: July 1995

Abstract: Model-based diagnosis centers on the use of a behavioral model of a system to infer diagnoses of anomalous behavior. For model-based diagnosis techniques to become practical, some serious problems in the modeling of uncertainty and in the tractability of uncertainty management have to be addressed. These questions include: How can we tractably generate diagnoses in large systems? Where do the prior probabilities of component failure come from when modeling a system? How do we tractably compute low-cost repair strategies? How can we do diagnosis even if only partial descriptions of device operation are available? This dissertation seeks to bring model-based diagnosis closer to being a viable technology by addressing these problems. We develop a set of tractable algorithms and modeling techniques that address each of the problems introduced above. Our approach synthesizes the techniques used in model-based diagnosis and techniques from the field of Bayesian networks.

CS-TR-95-1553

Report Number: CS-TR-95-1554

Institution: Stanford University, Department of Computer Science

Title: The Computer Science Technical Report (CS-TR) Project: Considerations from the Library Perspective.

Author: Lasher, Rebecca

Author: Reich, Vicky

Author: Anderson, Greg

Date: July 1995

Abstract: In 1992 the Advanced Research Projects Agency (ARPA) funded a three year grant to investigate the questions related to large-scale, distributed, digital libraries. The award focused research on Computer Science Technical Reports (CS-TR) and was granted to the Corporation for National Research Initiatives (CNRI) and five research universities. The ensuing collaborative research has focused on a broad spectrum of technical, social, and legal issues, and has encompassed all aspects of a very large, heterogeneous distributed digital library environment: acquisition, storage, organization, search, retrieval, display, use and intellectual property. The initial corpus of this digital library is a coherent digital collection of CS-TRs created at the five participating universities: Carnegie Mellon, Cornell, MIT, Stanford, and the University of California at Berkeley. The Corporation for National Research Initiatives serves as a collaborator and agent for the project. This technical report summarizes the accomplishments and collaborative efforts of the CS-TR project from a librarian's perspective; to do this we address the following questions: 1. Why do librarians and computer scientists make good research partners? 2. What has been learned? 3. What new questions have been articulated? 4. How can the accomplishments be moved into a service environment? 5. What actions and activities might follow from this effort?

CS-TR-95-1554

Report Number: CS-TR-95-1555

Institution: Stanford University, Department of Computer Science

Title: Real-time Database Experiences in Network Management Application

Author: Kiriha, Yoshiaki

Date: September 1995

Abstract: This report discusses our experiences with real-time databases in the context of a network management system, in particular a MIB (Management Information Base) implementation. We propose an active and real-time MIB (ART-MIB) architecture that utilizes a real-time database system. The ART-MIB contains a variety of modules, such as transaction manager, task manager, and resource manager. Among the functionalities provided by ART-MIB, we focus on transaction scheduling within a memory based real-time database system. For the developed ART-MIB prototype, we have evaluated two typical real-time transaction scheduling algorithms: earliest deadline first (EDF) and highest value first (HVF). The main results of our performance comparison show that EDF outperforms HVF under a low load; however, HVF outperforms EDF in an overload situation. Furthermore, the fact that the performance crossover point closely depends on the magnitude of the scheduler queue, has been validated.

CS-TR-95-1555

Report Number: CS-TR-95-1556

Institution: Stanford University, Department of Computer Science

Title: Solving Unweighted and Weighted Bipartite Matching Problems in Theory and Practice

Author: Kennedy, J. Robert, Jr.

Date: August 1995

Abstract: The push-relabel method has been shown to be efficient for solving maximum flow and minimum cost flow problems in practice, and periodic global updates of dual variables have played an important role in the best implementations. Nevertheless, global updates had not been known to yield any theoretical improvement in running time. In this work, we study techniques for implementing push-relabel algorithms to solve bipartite matching and assignment problems. We show that global updates yield a theoretical improvement in the bipartite matching and assignment contexts, and we develop a suite of efficient cost-scaling push-relabel implementations to solve assignment problems. For bipartite matching, we show that a push-relabel algorithm using global updates matches the best time bound known (roughly the number of edges times the square root of the number of nodes --- better for dense graphs) and performs worse by a factor of the square root of the number of nodes without the updates. We present a similar result for the assignment problem, for which an algorithm that assumes integer costs has running time asymptotically dominated by the number of edges times the number of nodes times a scaling factor logarithmic in the number of nodes and the largest magnitude of an edge cost in the problem. The bound we obtain matches the best cost-scaling bound known. We develop cost-scaling push-relabel implementations that take advantage of the assignment problem's special structure, and compare our codes against the best codes from the literature. The results show that the push-relabel method is very promising for practical use.

CS-TR-95-1556

Report Number: CS-TR-95-1557

Institution: Stanford University, Department of Computer Science

Title: Hierarchical Models of Synchronous Circuits for Formal Verification and Substitution

Author: Wolf, Elizabeth Susan

Date: October 1995

Abstract: We develop a mathematical model of synchronous sequential circuits that supports both formal hierarchical verification and substitution. We have implemented and proved the correctness of automatic decision procedures for both of these applications using these models. For hierarchical verification, we model synchronous circuit specifications and implementations uniformly. Each of these descriptions provides both a behavioral and a structural view of the circuit or specification being modeled. We compare the behavior of a circuit model to a requirements specification in order to determine whether the circuit is an acceptable implementation of the specification. Our structural view of a circuit provides the capability to plug in one circuit component in place of another. We derive a requirements specification for the acceptable replacement components, in terms of the desired behavior of the full circuit. We also support nondeterministic specifications, which capture the minimum requirements of a circuit. Previous formalisms have relied on syntactic methods for distinguishing apparent from actual unlatched feedback loops in hierarchical hardware designs. However, these methods are not applicable to nondeterministic models. Our model of the behavior of a synchronous circuit within a single clock cycle provides a semantic method to identify cyclic dependencies even in the presence of nondeterminism.

CS-TR-95-1557

Report Number: CS-TR-95-1558

Institution: Stanford University, Department of Computer Science

Title: Designing an Academic Firewall: Policy, Practice and Experience With SURF

Author: Greenwald, Michael B.

Author: Singhal, Sandeep K.

Author: Stone, Jonathan R.

Author: Cheriton, David R.

Date: December 1995

Abstract: Corporate network firewalls are well-understood and are becoming commonplace. These firewalls establish a security perimeter that aims to block (or heavily restrict) both incoming and outgoing network communication. We argue that these firewalls are neither effective nor appropriate for academic or corporate research environments needing to maintain information security while still supporting the free exchange of ideas. In this paper, we present the Stanford University Research Firewall (SURF), a network firewall design that is suitable for a research environment. While still protecting information and computing resources behind the firewall, this firewall is less restrictive of outward information flow than the traditional model; can be easily deployed; and can give internal users the illusion of unrestricted e-mail, anonymous FTP, and WWW connectivity to the greater Internet. Our experience demonstrates that an adequate firewall for a research environment can be constructed for minimal cost using off-the-shelf software and hardware components.

CS-TR-95-1558

Report Number: CS-TR-95-1559

Institution: Stanford University, Department of Computer Science

Title: On the number of equilibrium placements of mass distributions in elliptic potential fields

Author: Kavraki, Lydia E.

Date: December 1995

Abstract: Recent papers have demonstrated the use of force fields for mechanical part orientation. The force field is realized on a plane on which the part is placed. The forces exerted on the part's contact surface translate and rotate the part to an equilibrium orientation. Part manipulation by force fields is very attractive since it requires no sensing. We describe force fields that result from elliptic potentials and induce only 2 stable equilibrium orientations for most parts. The proposed fields represent a considerable improvement over previously developed force fields which produced O(n) equilibria for polygonal parts with n vertices. The successful realization of these force fields could significantly affect part manipulation in industrial automation.

CS-TR-95-1559

Report Number: CS-TR-95-1560

Institution: Stanford University, Department of Computer Science

Title: Wrappers for Performance Enhancements and Oblivious Decision Graphs.

Author: Kohavi, Ron

Date: September 1995

Abstract: In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs). For accuracy estimation, we investigate cross-validation and the~.632 bootstrap. We show examples where they fail and conduct a large scale study comparing them. We conclude that repeated runs of five-fold cross-validation give a good tradeoff between bias and variance for the problem of model selection used in later chapters. We define the wrapper approach and use it for feature subset selection and parameter tuning. We relate definitions of feature relevancy to the set of optimal features, which is defined with respect to both a concept and an induction algorithm. The wrapper approach requires a search space, operators, a search engine, and an evaluation function. We investigate all of them in detail and introduce compound operators for feature subset selection. Finally, we abstract the search problem into search with probabilistic estimates. We introduce decision tables with a default majority rule (DTMs) to test the conjecture that feature subset selection is a very powerful bias. The accuracy of induced DTMs is surprisingly powerful, and we concluded that this bias is extremely important for many real-world datasets. We show that the resulting decision tables are very small and can be succinctly displayed. We study properties of oblivious read-once decision graphs (OODGs) and show that they do not suffer from some inherent limitations of decision trees. We describe a a general framework for constructing OODGs bottom-up and specialize it using the wrapper approach. We show that the graphs produced are use less features than C4.5, the state-of-the-art decision tree induction algorithm, and are usually easier for humans to comprehend.

CS-TR-95-1560

Report Number: CS-TR-95-1561

Institution: Stanford University, Department of Computer Science

Title: Techniques for Efficient Formal Verification Using Binary Decision Diagrams

Author: Hu, Alan John

Date: December 1995

Abstract: The appeal of automatic formal verification is that it's automatic -- minimal human labor and expertise should be needed to get useful results and counterexamples. BDD(binary decision diagram)-based approaches have promised to allow automatic verification of complex, real systems. For large classes of problems, however, (including many distributed protocols, multiprocessor systems, and network architectures) this promise has yet to be fulfilled. Indeed, the few successes have required extensive time and effort from sophisticated researchers in the field. This thesis identifies several common obstacles to BDD-based automatic formal verification and proposes techniques to overcome them by avoiding building certain problematic BDDs needed in the standard approaches and by exploiting automatically generated and user-supplied don't-care information. Several examples illustrate the effectiveness of the new techniques in enlarging the envelope of problems that can routinely be verified automatically.

CS-TR-95-1561

Report Number: CS-TR-95-1562

Institution: Stanford University, Department of Computer Science

Title: STeP: The Stanford Temporal Prover (Educational Release) User's Manual

Author: Bjorner, Nikolaj

Author: Browne, Anca

Author: Chang, Eddie

Author: Colon, Michael

Author: Kapur, Arjun

Author: Manna, Z ohar

Author: Sipma, Henny B.

Author: Uribe, Tomas E.

Date: November 1995

Abstract: The STeP (Stanford Temporal Prover) system supports the computer-aided verification of reactive and real-time systems. It combines deductive methods with algorithmic techniques to allow the verification of a broad class of systems, including infinite-state systems and parameterized N-process programs. STeP provides the visual language of verification diagrams that allow the user to construct proofs hierarchically, starting from a high-level proof sketch. The availability of automatically generated bottom-up and top-down invariants and an integrated suite of decision procedures allow most verification conditions to be checked without user intervention.

CS-TR-95-1562

Report Number: CS-TR-96-1563

Institution: Stanford University, Department of Computer Science

Title: Database Research: Achievements and Opportunities into the 21st Century

Author: Silberschatz, Avi

Author: Stonebraker, Michael

Author: Ullman, Jeffrey D.

Date: February 1996

Abstract: In May, 1995 an NSF workshop on the future of database management systems research was convened. This paper reports the conclusions of that meeting. Among the most important directions for future DBMS research recommended by the panel are: support for multimedia objects; managing distributed and loosely coupled information, as on the world-wide web; supporting new database applications such as data mining and warehousing; workflow and other complex transaction-management problems, and enhancing the ease-of-use of DBMS's for both users and system managers.

CS-TR-96-1563

Report Number: CS-TR-96-1564

Institution: Stanford University, Department of Computer Science

Title: Medical Applications of Neural Networks: Connectionist Models of Survival

Author: Ohno-Machado, Lucila

Date: March 1996

Abstract: Although neural networks have been applied to medical problems in recent years, their applicability has been limited for a variety of reasons. One of those barriers has been the problem of recognizing rare categories. In this dissertation, I demonstrate, and prove the utility of, a new method for tackling this problem. In particular, I have developed a method that allows the recognition of rare categories with high sensitivity and specificity, and will show that it is practical and robust. This method involves the construction of sequential neural networks. Rare categories occur and must be learned if practical application of neural-network technology is to be achieved. Survival analysis is one area in which this problem appears. In this work, I test the hypotheses that (1) sequential systems of neural networks produce results that are more accurate (in terms of calibration and resolution) than nonsequential neural networks; and (2) in certain circumstances, sequential neural networks produce more accurate estimates of survival time than Cox proportional hazards and logistic regression models. I use two sets of data to test the hypotheses: (1) a data set of HIV+ patients; and (2) a data set of patients followed prospectively for the development of cardiac conditions. I show that a neural network model can predict death due to AIDS more accurately than a Cox proportional hazards model. Furthermore, I show that a sequential neural network model is more accurate than a standard neural network model. I show that the predictions of logistic regression and neural networks are not significantly different, but that any of these models used sequentially is more accurate than its standard counterpart.

CS-TR-96-1564

Report Number: CS-TR-96-1565

Institution: Stanford University, Department of Computer Science

Title: A Formal Model for Bridging Heterogeneous Relational Databases in Clinical Medicine

Author: Sujansky, Walter

Date: April 1996

Abstract: This document describes the results of my thesis research, which focused on developing a standard query interface to heterogenous clinical databases. The high-level goal of this work was to *insulate* the developers of clinical computer applications from the implementation details of clinical databases, thereby facilitating the *sharing* of clinical computer applications across institutions with different database implementations. Most clinical databases store information about patients' diagnoses, laboratory results, medication orders, drug allergies, and demographic background. These data are valuable as the inputs to computer applications that provide real-time decision support, monitor the quality of care, and analyze data for research purposes. Clinical databases at different institutions, however, vary significantly in the way the databases model, represent, and retrieve clinical data. This database heterogeneity makes it impossible for a single computer application to retrieve data from the clinical databases of various institutions because the database queries included in the application must be formulated differently for each institution. Therefore, database heterogeneity makes it difficult to share computer applications across institutions with different database implementations. In my work, I have developed an *abstract* model of clinical data and an *abstract* query language that allow the developers of computer applications to formulate queries independently of the institution-specific features of clinical databases. I have also developed a database mapping language and a formal query-translation method that automatically translate the abstract queries that appear in applications into equivalent institution-specific queries. This framework ostensibly allows copies of a single computer application to be distributed to multiple institutions and to be customized automatically at each of the institutions such that the queries in each copy of the application can retrieve data from the local clinical database. This dissertation formally describes the abstract data model, the abstract query language, the mapping language, and the translation algorithm. It also presents the results of a formal evaluation that I performed to assess the feasibility and utility of this approach for sharing clinical computer applications.

CS-TR-96-1565

Report Number: CS-TR-96-1566

Institution: Stanford University, Department of Computer Science

Title: Clocked Transition Systems

Author: Manna, Z ohar

Author: Pnueli, Amir

Date: April 1996

Abstract: This paper presents a new computational model for real-time systems, called the clocked transition system model. The model is a development of our previous timed transition model, where some of the changes are inspired by the model of timed automata. The new model leads to a simpler style of temporal specification and verification, requiring no extension of the temporal language. For verifying safety properties, we present a run-preserving reduction from the new real-time model to the untimed model of fair transition systems. This reduction allows the (re)use of safety verification methods and tools, developed for untimed reactive systems, for proving safety properties of real-time systems.

CS-TR-96-1566

Report Number: CS-TR-96-1567

Institution: Stanford University, Department of Computer Science

Title: Synthesis of Reactive Programs

Author: Anuchitanukul, Anuchit

Date: April 1996

Abstract: We study various problems of synthesizing reactive programs. A reactive program is a program whose behaviors are not merely functional relationships between inputs and outputs, but sequences of actions as well as interactions between the program and its environment. The goal of program synthesis in general is to find an implementation of a program such that the behaviors of the implementation satisfy a given specification. The reactive behaviors that we study are omega-regular infinite sequences and regular finite sequences. The domain of the implementation is (finite) transition systems for closed system synthesis, and transition system modules for open system synthesis. We consider various solutions, e.g. basic, maximal, modular and exact, for any particular subclasses of the implementation language and investigate how characteristics of the program such as fairness, number of processes and composition operations, affect the synthesis algorithm. In addition to the automata-theoretic algorithms, we give a synthesis algorithm which synthesizes a program directly from the linear-time temporal logic ETL.

CS-TR-96-1567

Report Number: CS-TR-96-1568

Institution: Stanford University, Department of Computer Science

Title: Algorithms for computing intersection and union of toleranced polygons with applications

Author: Cazals, Frederic

Author: Ramkumar, G. D. S.

Date: April 1996

Abstract: Since mechanical operations are performed only up to a certain precision, the geometry of parts involved in real life products is never known precisely. Nevertheless, operations on toleranced objects have not been studied extensively. In this paper, we initiate a study of the analysis of the union and intersection of toleranced simple polygons. We provide a practical and efficient algorithm that stores in an implicit data structure the information necessary to answer a request for specific values of the tolerances without performing a computation from scratch. If the polygons are of sizes m and n, and s is the number of intersections between edges occuring for all the combinations of tolerance values, the pre-processed data structure takes O(s) space and the algorithm that computes a union/intersection from it takes O((n+m) log(s) + k' + k log(k)) time where k is the number of vertices of the union/intersection and k <= k' <= s. Although the algorithm is not output sensitive, we show that the expectations of k and k' remain within a constant factor tau, a function of the input geometry. Finally, we list interesting applications of the algorithms related to feasibility of assembly and assembly sequencing of real assemblies.

CS-TR-96-1568

Report Number: CS-TR-96-1569

Institution: Stanford University, Department of Computer Science

Title: Using Automatic Abstraction for Problem-Solving and Learning

Author: Unruh, Amy

Date: April 1996

Abstract: Abstraction is a powerful tool for controlling search combinatorics. This research presents a framework for automatic abstraction planning, and a family of associated abstraction methods, called SPATULA. The framework provides a structure within which different parameterized methods for automatic abstraction can be instantiated to generate abstraction planning behavior, and provides an integrated environment for abstract problem-solving and learning. A core idea underlying the abstraction techniques is that abstraction can arise as an obviation response to impasses in planning. Abstraction is performed at problem-solving time with respect to impasses in the current problem context, and thus the planner generates abstractions in response to specific situations. This approach is used to reduce the cost of lookahead evaluation searches, by performing abstract search in problem spaces which are automatically abstracted from the ground spaces during search. New search control rules are learned during abstract search; they constitute an abstract plan used in future situations, and produce an emergent multi-level abstraction behavior. The abstraction method has been implemented and evaluated. It has been shown to: reduce planning time, while still yielding good solutions; reduce learning time; and increase the effectiveness of learned rules by enabling them to transfer more widely.

CS-TR-96-1569

Report Number: CS-TR-96-1570

Institution: Stanford University, Department of Computer Science

Title: Optimization of SQL Queries for Parallel Machines

Author: Hasan, Waqar

Date: May 1996

Abstract: Parallel execution offers a method for reducing the response time of queries against large databases. We address the problem of parallel query optimization: Given a declarative SQL query, find a procedural parallel plan that delivers the query result in minimal time. We develop optimization algorithms using models that incorporate both sources and obstacles to speedup. We address independent, pipelined and partitioned parallelism. We incorporate inherent constraints on available parallelism and the extra cost of parallel execution. Our models are motivated by experiments with NonStop SQL, a commercial parallel DBMS. We adopt a two-phase approach to parallel query optimization: JOQR (join ordering and query rewrite), followed by parallelization. JOQR minimizes total work. Then, parallelization spreads work among processors to minimize response time. For JOQR, we model communication costs and abstract physical characteristics of data as colors. We devise tree coloring and reordering algorithms that are efficient and optimal. We model parallelization as scheduling a tree whose nodes represent operators and edges represent parallel/precedence constraints. Computation/communication costs are represented as node/edge weights. We prove worst-case bounds on the performance ratios of our algorithms and measure average cases using simulation. Our results enable the construction of SQL compilers that effectively exploit parallel machines.

CS-TR-96-1570

Report Number: CS-TR-96-1571

Institution: Stanford University, Department of Computer Science

Title: Formal Verification of Performance and Reliability of Real-Time Systems

Author: DeAlfaro, Luca

Date: June 1996

Abstract: In this paper we propose a methodology for the specification and verification of performance and reliability properties of real-time systems within the framework of temporal logic. The methodology is based on the system model of stochastic real-time systems (SRTSs), and on branching-time temporal logics that are extensions of the probabilistic logics pCTL and pCTL*. SRTSs are discrete-time transition systems that can model both probabilistic and nondeterministic behavior. The specification language extends the branching-time logics pCTL and pCTL* by introducing an operator to express bounds on the average time between events. We present model-checking algorithms for the algorithmic verification of system specifications, and we discuss their complexity.

CS-TR-96-1571

Report Number: CS-TR-96-1572

Institution: Stanford University, Department of Computer Science

Title: Caching and Non-Horn Inference in Model Elimination Theorem Provers

Author: Geddis, Donald F.

Date: June 1996

Abstract: Caching in an inference procedure holds the promise of replacing exponential search with constant-time lookup, at a cost of slightly-increased overhead for each node expansion. Caching will be useful if subgoals are repeated often enough during proofs. In experiments on solving queries using a backward chainer on Horn theories, caching appears to be very helpful on average. When trying to extend this success to first-order theories, however, intuition suggests that subgoal caches are no longer useful. The cause is that complete first-order backward chaining requires goal-goal resolutions in addition to resolutions with the database, and this introduces a context-sensitivity into the proofs for a subgoal. A cache is only feasible if the solutions are independent of context, so that they may be copied from one part of the space to another. It is shown here that a full exploration of a subgoal in one context actually provides complete information about the solutions to the same subgoal in all other contexts of the proof. In a straightforward way, individual solutions from one context may be copied over directly. More importantly, non-Horn failure caching is also feasible, so no additional solutions in the new context (that might affect the query) are possible and therefore there is no need to re-explore the space in the new context. Thus most Horn clause caching schemes may be used with minimal changes in a non-Horn setting. In addition, a new Horn clause caching scheme is proposed: postponement caching. This new scheme involves exploring the inference space as a graph instead of as a tree, so that a given literal will only occur once in the proof space. Despite the previous extension of failure caching to non-Horn theories, postponement caching is incomplete in the non-Horn case. A counterexample is presented, and possible enhancements to reclaim completeness are investigated.

CS-TR-96-1572

Report Number: CS-TR-96-1573

Institution: Stanford University, Department of Computer Science

Title: Depth Discontinuities by Pixel-To-Pixel Stereo

Author: Birchfield, Stan

Author: Tomasi, Carlo

Date: July 1996

Abstract: This report describes a two-pass binocular stereo algorithm that is specifically geared towards the detection of depth discontinuities. In the first pass, introduced in part I of the report, stereo matching is performed independently on each epipolar pair for maximum efficiency. In the second pass, described in part II, disparity information is propagated between the scanlines. Part I. Our stereo algorithm explicitly matches the pixels in the two images, leaving occluded pixels unpaired. Matching is based upon intensity alone without utilizing windows. Since the algorithm prefers piecewise constant disparity maps, it sacrifices depth accuracy for the sake of crisp boundaries, leading to precise localization of the depth discontinuities. Three features of the algorithm are worth noting: (1) unlike most stereo algorithms, it does not require texture throughout the images, making it useful in unmodified indoor settings, (2) it uses a measure of pixel dissimilarity that is provably insensitive to sampling, and (3) it prunes bad nodes during the search, resulting in a running time that is faster than that of standard dynamic programming. Part II. After the scanlines are processed independently, the disparity map is postprocessed, leading to more accurate disparities and depth discontinuities. Both the algorithm and the postprocessor are fast, producing a dense disparity map in about 1.5 microseconds per pixel per disparity on a workstation. Results on five stereo pairs are given.

CS-TR-96-1573

Report Number: CS-TR-96-1574

Institution: Stanford University, Department of Computer Science

Title: Effective Remote Modeling in Large-Scale Distributed Simulation and Visualization Environments

Author: Singhal, Sandeep K.

Date: September 1996

Abstract: A Distributed Interactive Simulation provides the illusion of a single, coherent virtual world to a group of users located at different machines connected by a network. Networked virtual environments are used for multiplayer video games, military and industrial training, and collaborative engineering. Network bandwidth, network latency, and host processing power limit the achievable size and detail of future simulations. This thesis describes network protocols and algorithms to support "remote modeling," allowing a host to model and render remote entities in large-scale distributed simulations. These techniques require fewer network resources and support more entity types than previous approaches. The Position History-Based Dead Reckoning (PHBDR) protocol provides accurate remote position modeling and minimizes dependencies on network performance and entity representation. PHBDR is a foundation for three protocols which model entity orientation, entity structural change, and entity groups. This thesis shows that a simple, efficient protocol can provide smooth, accurate remote position modeling and that it can be applied recursively to support entity orientation, structure, and aggregation at multiple levels of detail; these protocols offer performance and costs that are competitive with more complex and application-specific approaches, while providing simpler analyses of behavior by exploiting this recursive structure.

CS-TR-96-1574

Report Number: CS-TR-96-1575

Institution: Stanford University, Department of Computer Science

Title: Routing and Admission Control in General Topology Networks with Poisson Arrivals

Author: Kamath, Anil

Author: Palmon, Omri

Author: Plotkin, Serge

Date: October 1996

Abstract: Emerging high speed networks will carry traffic for services such as video-on-demand and video teleconferencing -- that require resource reservation along the path on which the traffic is sent. High bandwidth-delay product of these networks prevents circuit rerouting, i.e. once a circuit is routed on a certain path, the bandwidth taken by this circuit remains unavailable for the duration (holding time) of this circuit. As a result, such networks will need effective routing and admission control strategies. Recently developed online routing and admission control strategies have logarithmic competitive ratios with respect to the admission ratio (the fraction of admitted circuits). Such guarantees on performance are rather weak in the most interesting case where the rejection ratio of the optimum algorithm is very small or even 0. Unfortunately, these guarantees can not be improved in the context of the considered models, making it impossible to use these models to identify algorithms that are going to perform well in practice. In this paper we develop routing and admission control strategies for a more realistic model, where the requests for virtual circuits between any two points arrive according to a Poisson process and where the circuit holding times are exponentially distributed. Our model is close to the one that was developed to analyse and tune the (currently used) strategies for managing traffic in long-distance telephone networks. We strengthen this model by assuming that the rates of the Poisson processes (the ``traffic matrix'') are unknown to the algorithm and are chosen by the adversary. Our strategy is competitive with respect to the expected rejection ratio. More precisely, it achieves expected rejection ratio of at most R+epsilon, where R is the optimum expected rejection ratio. The expectations are taken over the distribution of the request sequences, and epsilon=Sqrt(r log n), where r is the maximum fraction of an edge bandwidth that can be requested by a single circuit.

CS-TR-96-1575

Report Number: CS-TR-96-1576

Institution: Stanford University, Department of Computer Science

Title: Query Reformulation under Incomplete Mappings

Author: Huyn, Nam

Date: December 1996

Abstract: This paper focuses on some of the important new translatability issues that arise in the problem of interoperation between two database schemas when mappings between these schemas are inherently more complex than traditional views or pure Datalog programs can capture. In many cases, sources cannot be redesigned, and mappings among them exhibit some form of incompleteness under which the question of whether a query can be translated across different schemas is not immediately obvious. The notion of query we consider here is the traditional one, in which the answers to a query are required to be definite: answers cannot be disjunctive or conditional and must refer only to domain constants. In this paper, mappings are modeled by Horn programs that allow existential variables, and queries are modeled by pure Datalog programs. We then consider the problem of eliminating functional terms from the answers to a Horn query where function symbols are allowed. We identify a class of Horn queries called "term-bounded" that are equivalent to pure Datalog queries. We present an algorithm that rewrites a term-bounded query into an "equivalent" pure Datalog query. Equivalence is defined here as yielding the same function-free answer.

CS-TR-96-1576

Report Number: CS-TR-96-1577

Institution: Stanford University, Department of Computer Science

Title: A More Aggressive Use Of Views To Extract Information

Author: Huyn, Nam

Date: December 1996

Abstract: Much recent work has focussed on using views to evaluate queries. More specifically, queries are rewritten to refer to views instead of the base relations over which the queries were originally written. The motivation is that the views represent the only ways in which some information source may be accessed. Another use of views that has been overlooked becomes important especially when no equivalent rewriting of a query in terms of views is possible: even though we cannot use the views to get all the answers to the query, we can still use them to deduce as many answers as possible. In many global information applications, the notion of equivalence used is often too restrictive. We propose a notion of pseudo-equivalence that allows more queries to be rewritten usefully: we show that if a query has an equivalent rewriting, the query also has a pseudo-equivalent rewriting. The converse is not true in general. In particular, when the views are conjunctive, we show that all Datalog queries over the source do have a pseudo-equivalent Datalog query over the views. We reduce the problem of finding pseudo-equivalent queries to that of rewriting Horn queries with Skolem functions as Datalog queries. We present an algorithm for the class of term-bounded Horn queries. We discuss extending the problem to larger classes of Horn queries, other non-Horn queries that result from ``inverting'' Datalog views and adding functional dependencies. The theory and methods developed in our work have important uses in query mediation between heterogeneous sources, automatic join discovery and view updates.

CS-TR-96-1577

Report Number: CS-TR-96-1578

Institution: Stanford University, Department of Computer Science

Title: State Reduction Methods for Automatic Formal Verification

Author: Ip, C. Norris

Date: December 1996

Abstract: Validation of industrial designs is becoming more challenging as technology advances. One of the most suitable debugging aids is automatic formal verification. This thesis presents several techniques for reducing the state explosion problem, that is, reducing the number of states that are examined. A major contribution of this thesis is the design of simple extensions to the Murphi description language, which enable us to convert two existing abstraction strategies into two fully automatic algorithms, making these strategies easy to use and safe to apply. These two algorithms rely on two facts about high-level designs: they frequently exhibit structural symmetry, and their behavior is often independent of the exact number of replicated components they contain. Another contribution is the design of a new state reduction algorithm, which relies on reversible rules (transitions that do not lose information) in a system description. This new reduction algorithm can be used simultaneously with the other two algorithms. These techniques, implemented in the Murphi verification system, have been applied to many applications, such as cache coherence protocols and distributed algorithms. In the cases of two important classes of infinite systems, infinite state graphs can be automatically converted to small finite state graphs.

CS-TR-96-1578

Report Number: CS-TR-97-1579

Institution: Stanford University, Department of Computer Science

Title: From the Valley of Heart's Delight to Silicon Valley: A Study of Stanford University's Role in the Transformation

Author: Tajnai, Carolyn

Date: January 1997

Abstract: This study examines the role of Stanford University in the transformation from the Valley of Heart's Delight to the Silicon Valley. At the dawn of the Twentieth Century, California's Santa Clara County was an agricultural paradise. Because of the benign climate and thousands of acres of fruit orchards, the area became known as the Valley of Heart's Delight. In the early 1890's, Leland and Jane Stanford donated land in the valley to build a university in memory of their son. Thus, Leland Stanford, Jr., University was founded. In the early 1930's, there were almost no jobs for young Stanford engineering graduates. This was about to change. Although there was no organized plan to help develop the economic base of the area around Stanford University, the concern about the lack of job opportunities for their graduates motivated Stanford faculty to begin the chain of events that led to the birth of Silicon Valley. Stanford University's role in the transformation of the Valley of Heart's Delight into Silicon Valley is history, but it is enduring history. Stanford continues to effect the local economy by spawning new and creative ideas, dreams, and ambitions.

CS-TR-97-1579

Report Number: CS-TR-97-1580

Institution: Stanford University, Department of Computer Science

Title: STARTS: Stanford Protocol Proposal for Internet Retrieval and Search

Author: Gravano, Luis

Author: Chang, Kevin

Author: Garcia-Molina, Hector

Author: Paepcke, Andreas

Date: January 1997

Abstract: Document databases are available everywhere, both within the internal networks of the organizations and on the Internet. The database contents are often "hidden" behind search interfaces. These interfaces vary from database to database. Also, the algorithms with which the associated search engines rank the documents in the query results are usually incompatible across databases. Even individual organizations use search engines from different vendors to index their internal document collections. These organizations could benefit from unified query interfaces to multiple search engines, for example, that would give users the illusion of a single big document database. Building such "metasearchers" is nowadays a hard task because different search engines are largely incompatible and do not allow for interoperability. To improve this situation, the Digital Library project at Stanford has coordinated among search-engine vendors and other key players to reach informal agreements for unifying basic interactions in these three areas. This is the final writeup of our informal "standards" effort. This draft is based on feedback from people from Excite, Fulcrum, GILS, Harvest, Hewlett-Packard Laboratories, Infoseek, Microsoft Network, Netscape, PLS, Verity, and WAIS, among others.

CS-TR-97-1580

Report Number: CS-TR-97-1581

Institution: Stanford University, Department of Computer Science

Title: Towards Interoperability in Digital Libraries: Overview and Selected Highlights of the Stanford Digital Library Project

Author: Paepcke, Andreas

Author: Cousins, Steve B.

Author: Garcia-Molina, Hector

Author: Hassan, Scott W.

Author: Ketchpel, Steven K.

Author: Roscheisen, Martin

Author: Winograd, Terry

Date: January 1997

Abstract: We outline the scope of the Stanford Digital Library Project which covers five areas: user interface work, technologies for locating information and library services, the emerging economic perspective of digital libraries, infrastructure technology and the use of agent technologies to support all of these aspects. We describe technical details for two specific efforts that have been realized in prototype implementions. First, we describe how we employ distributed object technology to move towards an implementation of our InfoBus vision. The InfoBus consists of translation services and wrappers around existing protocols to cope with the problem of interoperability and the distributed nature of emerging digital library services. We model autonomous, heterogeneous library services as CORBA proxy objects. This allows the construction of unified but extensible method-based interfaces for client programs to interact through. We describe how distributed objects enable the design of communication protocols that leave implementors a large degree of freedom. This is a benefit because the resulting implementations can allow users to choose among multiple performance profile tradeoffs while staying within the confines of the protocol. The second effort we cover describes InterPay which uses the object approach for an architecture that helps manage heterogeneity in payment mechanisms among autonomous services. The architecture is organized into three layers. The top layer contains elements involved in the task-level interaction with the services. The middle layer is responsible for enforcing user-specified payment policies. The lowest layer manages the mechanics of diverse online payment schemes.

CS-TR-97-1581

Report Number: CS-TR-97-1582

Institution: Stanford University, Department of Computer Science

Title: Shared Web Annotations as a Platform for Third-Party Value-Added, Information Providers: Architecture, Protocols, and Usage Examples

Author: Roscheisen, Martin

Author: Mogensen, Christian

Author: Winograd, Terry

Date: January 1997

Abstract: In this paper, we present an architecture, called "ComMentor", which provides a platform for third-party providers of lightweight super-structures to material provided by conventional content providers. It enables people to share structured in-place annotations about arbitrary on-line documents. The system is part of a general "virtual document" architecture ("PCD BRIO") in which--with the help of lightweight distributed meta information--documents are dynamically synthesized from distributed sources depending on the user context and the meta-information which has been attached to them. The meta-information is managed independently of the documents themselves on separate meta-information servers, both in terms of storage and authority. A wide range of useful scenarios can be readily realized on this platform. We give examples of how a more personalized content presentation can be achieved by leveraging the database storage of the uniform meta-information and generating documents dynamically for a particular user perspective. These include structured discussion about paper drafts, collaborative filtering, seals of approval, tours, shared "hotlists" with section-based visibility control, usage indicators, co-presence, and value-added trails. Our object model and request interface for the prototype implementation are defined in technical detail in the appendix.

CS-TR-97-1582

Report Number: CS-TR-97-1583

Institution: Stanford University, Department of Computer Science

Title: Boolean Query Mapping Across Heterogeneous Information Sources (Extended Version)

Author: Chang, Kevin Chen-Chuan

Author: Garcia-Molina, Hector

Author: Paepcke, Andreas

Date: January 1997

Abstract: Searching over heterogeneous information sources is difficult because of the non-uniform query languages. Our approach is to allow a user to compose Boolean queries in one rich front-end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a filter query to yield the correct final result. In this paper we introduce the architecture and associated algorithms for generating the supported subsuming queries and filters. We show that generated subsuming queries return a minimal number of documents; we also discuss how minimal cost filters can be obtained. We have implemented prototype versions of these algorithms and demonstrated them on heterogeneous Boolean systems.

CS-TR-97-1583

Report Number: CS-TR-97-1584

Institution: Stanford University, Department of Computer Science

Title: Grassroots: a System Providing a Uniform Framework for Communicating, Structuring, Sharing Information, and Organizing People

Author: Kamiya, Kenichi

Author: Roscheisen, Martin

Author: Winograd, Terry

Date: January 1997

Abstract: People keep pieces of information in diverse collections such as folders, hotlists, e-mail inboxes, newsgroups, and mailing lists. These collections mediate various types of collaborations including communicating, structuring, sharing information, and organizing people. Grassroots is a system that provides a uniform framework to support people's collaborative activities mediated by collections of information. The system seamlessly integrates functionalities currently found in such disparate systems as e-mail, newsgroups, shared hotlists, hierarchical indexes, hypermail, etc. Grassroots co-exists with these systems in that its users benefit from the uniform image provided by Grassroots, but other people can continue using other mechanisms, and Grassroots leverages from them. The current Grassroots prototype is based on an http-proxy implementation, and can be used with any Web browser. In the context of the design of a next-generation version of the Web, Grassroots demonstrates the utility of a uniform notification infrastructure.

CS-TR-97-1584

Report Number: CS-TR-97-1585

Institution: Stanford University, Department of Computer Science

Title: Techniques and Tools for Making Sense out of Heterogeneous Search Service Results

Author: Baldonado, Michelle Q. Wang

Author: Winograd, Terry

Date: January 1997

Abstract: We describe a set of techniques that allows users to interact with results at a higher level than the citation level, even when those results come from a variety of heterogeneous on-line search services. We believe that interactive result analysis allows users to "make sense" out of the potentially many results that may match the constraints they have supplied to the search services. The inspiration for this approach comes from reference librarians, who do not respond to patrons' questions with lists of citations, but rather give high-level answers that are tailored to the patrons' needs. We outline here the details of the methods we employ in order to meet our goal of allowing for dynamic, user-directed abstraction over result sets, as well as the prototype tool (SenseMaker) we have built based upon these techniques. We also take a brief look at the more general theory that underlies the tool, and hypothesize that it is applicable to flexible duplicate detection as well.

CS-TR-97-1585

Report Number: CS-TR-97-1586

Institution: Stanford University, Department of Computer Science

Title: Construction of a Three-dimensional Geometric Model for Segmentation and Visualization of Cervical Spine Images

Author: Pichumani, Ramani

Date: February 1997

Abstract: This report introduces a new technique for automatically extracting vertebral segments from three-dimensional computerized tomography (CT) and magnetic resonance (MR) images of the human cervical spine. An important motivation for this work is to provide accurate information for registration and for fusion of CT and MR images into a composite three-dimensional image. One of the major hurdles in performing image fusion is the difficulty of extracting and matching corresponding anatomical regions in an accurate, robust, and timely manner. The complementary properties of soft and bony tissues revealed in CT and MR imaging modalities makes it challenging to extract corresponding regions that can be correlated in an accurate and robust manner. Ambiguities in the images due to noise, distortion, limited resolution, and patient-specific structural variations also create additional challenges. Whereas fusion of CT and MR images of the cranium have already been performed, no one has yet developed an automated technique for fusing multimodality images of the spine. Unlike the head, which is relatively rigid, the spine is a complex, articulating object and is subject to structural deformation throughout the multimodal scanning process.

CS-TR-97-1586

Report Number: CS-TR-97-1587

Institution: Stanford University, Department of Computer Science

Title: Ensembles for Supervised Classification Learning

Author: Matan, Ofer

Date: March 1997

Abstract: This dissertation studies the use of multiple classifiers (ensembles or committees) in learning tasks. Both theoretical and practical aspects of combining classifiers are studied. First we analyze the representational ability of voting ensembles. A voting ensemble may perform either better or worse than each of its individual members. We give tight upper and lower bounds on the classification performance of a voting ensemble as a function of the classification performances of its individual members. Boosting is a method of combining multiple "weak" classifiers to form a "strong" classifier. Several issues concerning boosting are studied in this thesis. We study SBA, a hierarchical boosting algorithm proposed by Schapire, in terms of its representation and its search. We present a rejection boosting algorithm that trades-off exploration and exploitation: It requires fewer pattern labels at the expense of lower boosting ability. Ensembles may be useful in gaining information. We study their use to minimize labeling costs of data and to enable improvements on performance over time. For that purpose a model for on-site learning is presented. The system learns by querying "hard" patterns while classifying "easy" ones.

CS-TR-97-1587

Report Number: CS-TR-97-1588

Institution: Stanford University, Department of Computer Science

Title: Systems of Bilinear Equations

Author: Cohen, Scott

Author: Tomasi, Carlo

Date: April 1997

Abstract: How hard is it to solve a system of bilinear equations? No solutions are presented in this report, but the problem is posed and some preliminary remarks are made. In particular, solving a system of bilinear equations is reduced by a suitable transformation of its columns to solving a homogeneous system of bilinear equations. In turn, the latter has a nontrivial solution if and only if there exist two invertible matrices that, when applied to the tensor of the coefficients of the system, zero its first column. Matlab code is given to manipulate three-dimensional tensors, including a procedure that finds one solution to a bilinear system often, but not always.

CS-TR-97-1588

Report Number: CS-TR-97-1589

Institution: Stanford University, Department of Computer Science

Title: Learning Action Models for Reactive Autonomous Agents

Author: Benson, Scott Sherwood

Date: April 1997

Abstract: To be maximally effective, autonomous agents such as robots must be able both to react appropriately in dynamic environments and to plan new courses of action in novel situations. Reliable planning requires accurate models of the effects of actions---models which are often more appropriately learned through experience than designed. This thesis describes TRAIL (Teleo-Reactive Agent with Inductive Learning), an integrated agent architecture which learns models of actions based on experiences in the environment. These action models are then used to create plans that combine both goal-directed and reactive behaviors. Previous work on action-model learning has focused on domains that contain only deterministic, atomic action models that explicitly describe all changes that can occur in the environment. The thesis extends this previous work to cover domains that contain durative actions, continuous variables, nondeterministic action effects, and actions taken by other agents. Results have been demonstrated in several robot simulation environments and the Silicon Graphics, Inc. flight simulator. The main emphasis in this thesis is on the action-model learning process within TRAIL. The agent begins the learning process by recording experiences in its environment either by observing a trainer or by executing a plan. Second, the agent identifies instances of action success or failure during these experiences using a new analysis demonstrating nine possible causes of action failure. Finally, a variant of the Inductive Logic Programming algorithm DINUS is used to induce action models based on the action instances. As the action models are learned, they can be used for constructing plans whose execution contributes to additional learning experiences. Diminishing reliance on the teacher signals successful convergence of the learning process.

CS-TR-97-1589

Report Number: CS-TR-97-1590

Institution: Stanford University, Department of Computer Science

Title: Complexity Measures for Assembly Sequences

Author: Goldwasser, Michael

Date: June 1997

Abstract: Our work focuses on various complexity measures for two-handed assembly sequences. For many products, there exist an exponentially large set of valid sequences, and a natural goal is to use automated systems to select wisely from the choices. Although there has been a great deal of algorithmic success for finding feasible assembly sequences, there has been very little success towards optimizing the costs of sequences. We attempt to explain this lack of progress, by proving the inherent difficulty in finding optimal, or even near-optimal, assembly sequences. To begin, we define, "virtual assembly sequencing", a graph-theoretic problem that is a generalization of assembly sequencing, focusing on the combinatorial aspect of the family of feasible assembly sequences, while temporarily separating out the specific geometric assumptions inherent to assembly sequencing. We formally prove the hardness of finding even near-optimal sequences for most cost measures in our generalized framework. As a special case, we prove similar, strong inapproximability results for the problem of scheduling with AND/OR precedence constraints. Finally, we re-introduce the geometry, and continue by realizing several of these hardness results in rather simple geometric settings. We are able to show strong inapproximability results, for example using an assembly consisting solely of unit disks in the plane.

CS-TR-97-1590

Report Number: CS-TR-97-1592

Institution: Stanford University, Department of Computer Science

Title: Online Throughput-Competitive Algorithm for Multicast Routing and Admission Control

Author: Goel, Ashish

Author: Henzinger, Monika R.

Author: Plotkin, Serge

Date: July 1997

Abstract: We present the first polylog-competitive online algorithm for the general multicast problem in the throughput model. The ratio of the number of requests accepted by the optimum offline algorithm to the expected number of requests accepted by our algorithm is polylogarithmic in M and n, where M is the number of multicast groups and n is the number of nodes in the graph. We show that this is close to optimum by presenting an Omega(log n log M) lower bound on this ratio for any randomized online algorithm against an oblivious adversary. We also show that it is impossible to be competitive against an adaptive online adversary. As in the previous online routing algorithms, our algorithm uses edge-costs when deciding on which is the best path to use. In contrast to the previous competitive algorithms in the throughput model, our cost is not a direct function of the edge load. The new new cost definition allows us to decouple the effects of routing and admission decisions of different multicast groups.

CS-TR-97-1592

Report Number: CS-TR-97-1594

Institution: Stanford University, Department of Computer Science

Title: Interval and Point-Based Approaches to Hybrid System Verification

Author: Kapur, Arjun

Date: September 1997

Abstract: Hybrid systems are real-time systems consisting of both continuous and discrete components. This thesis presents deductive and diagrammatic methodologies for proving point-based and interval-based properties of hybrid systems, where the hybrid system is modeled in either a sampling semantics or a continuous semantics. Under a sampling semantics the behavior of the system consists of a discrete number of system snapshots, where each snapshot records the state of the system at a particular moment in time. Under a continuous semantics, the system behavior is given by a function mapping each point in time to a system state. Two continuous semantics are studied: a continuous interval semantics, where at any given point in time the system is in a unique state, and a super-dense semantics, where no such requirement is needed. We use Linear-time Temporal Logic for expressing properties under either a sampling semantics or a super-dense semantics, and we introduce Hybrid Temporal Logic for expressing properties under a continuous interval semantics. Linear-time Temporal Logic is useful for expressing point-based properties, whose validity is dependent on individual states, while Hybrid Temporal Logic is useful for expressing both interval-based properties, whose validity is dependent on intervals of time, and point-based properties. Finally, two different verification methodologies are presented: a diagrammatic approach for verifying properties specified in Linear-time Temporal Logic, and a deductive approach for verifying properties specified in Hybrid Temporal Logic.

CS-TR-97-1594

Report Number: CS-TR-97-1595

Institution: Stanford University, Department of Computer Science

Title: Maintaining data warehouses under limited source access

Author: Huyn, Nam

Date: September 1997

Abstract: A data warehouse stores views derived from data that may not reside at the warehouse. Using these materialized views, user queries can be answered quickly because querying the external sources where the base data reside is avoided. However, when the sources change, the views in the warehouse can become inconsistent with the base data and must be maintained. A variety of approaches have been proposed for maintaining these views incrementally. At the one end of the spectrum, the required view updates are computed without restricting which base relations can be used. View maintenance with this approach is simple but can be expensive, since it may involve querying the external data sources. At the other end of the spectrum, additional views are stored at the warehouse to make sure that there is enough information to maintain the views without ever having to query the data sources. While this approach saves on external source access, it may require a large amount of information to be stored and maintained at the warehouse. In this thesis, we propose an intermediate approach to warehouse maintenance based on what we call {\em Runtime View Self-Maintenance}, where the views are incrementally maintained without using all the base relations but without requiring additional views to facilitate maintenance. Under limited information, however, maintaining a view unambiguously may not always be possible. Thus, the main questions in runtime view self-maintenance are: - View self-maintainability. Under what conditions (on the given information) can a view be maintained unambiguously with respect to a given update? - View self-maintenance. If a view can be maintained unambiguously, how do we maintain it using only the given information? The information we consider using for maintaining a view includes: - At least the contents of the view itself and the update instance - Optionally, the contents of other views in the warehouse, functional dependencies the base relations are known to satisfy, a subset of the base relations, and partial contents of a base relation. Developing efficient complete solutions for the runtime self-maintenance of conjunctive-query views is the main focus and the main contribution of this thesis.

CS-TR-97-1595

Report Number: CS-TR-97-1596

Institution: Stanford University, Department of Computer Science

Title: Distributed Development of a Logic-Based Controlled Medical Terminology

Author: Campbell, Keith Eugene

Date: October 1997

Abstract: A controlled medical terminology (CMT) encodes clinical data: patient's physical signs, symptoms, and diagnoses. Application developers lack a robust CMT and the methodologies needed to coordinate terminology development within and between projects. In this dissertation, I argue that if a formal terminology model is adopted and integrated into a change-management process that supports dynamic CMTs, then CMTs can evolve from being an impediment to application development and data analysis to a valuable resource. My thesis states that such an evolutionary approach can be supported by using semantics-based methods for managing concurrent terminology development, thereby bypassing the disadvantages of traditional lock-based approaches common in database systems. By allowing developers to work concurrently on the terminology while relying on semantics-based methods to resolve the "collisions" that are inevitable in concurrent work, a scalable approach to terminology development can be supported. This dissertation discusses CMT development in terms of three research topics: 1. Representation of Clinical Data 2. Concurrency Control 3. Configuration Management

CS-TR-97-1596

Report Number: CS-TR-97-1597

Institution: Stanford University, Department of Computer Science

Title: The Earth Mover's Distance: Lower Bounds and Invariance under Translation

Author: Cohen, Scott

Author: Guibas, Leonidas

Date: November 1997

Abstract: The Earth Mover's Distance (EMD) between two finite distributions of weight is proportional to the minimum amount of work required to transform one distribution into the other. Current content-based retrieval work in the Stanford Vision Laboratory uses the EMD as a common framework for measuring image similarity with respect to color, texture, and shape content. In this report, we present some fast to compute lower bounds on the EMD which may allow a system to avoid exact, more expensive EMD computations during query processing. The effectiveness of the lower bounds is tested in a color-based retrieval system. In addition to the lower bound work, we also show how to compute the EMD under translation. In this problem, the points in one distribution are free to translate, and the goal is to find a translation that minimizes the EMD to the other distribution.

CS-TR-97-1597

Report Number: CS-TR-97-1598

Institution: Stanford University, Department of Computer Science

Title: Query Planning and Optimization in Information Integration

Author: Duschka, Oliver M.

Date: December 1997

Abstract: Information integration systems provide uniform user interfaces to varieties of different information sources. Our work focuses on query planning in such systems. Query planning is the task of transforming a user query, represented in the user's interface language and vocabulary, into queries that can be executed by the information sources. Every information source might require a different query language and might use different vocabularies. We show that query plans with a fixed number of database operations are insufficient to extract all information from the sources, if functional dependencies or limitations on binding patterns are present. Dependencies complicate query planning because they allow query plans that would otherwise be invalid. We present an algorithm that constructs query plans that are guaranteed to extract all available information in these more general cases. This algorithm is also able to handle datalog user queries. We examine further extensions of the languages allowed for user queries and for describing information sources: disjunction, recursion and negation in source descriptions, negation and inequality in user queries. For these more expressive cases, we determine the data complexity required of languages able to represent "best possible" query plans.

CS-TR-97-1598

Report Number: CS-TR-97-1599

Institution: Stanford University, Department of Computer Science

Title: Trial Banks: An Informatics Foundation for Evidence-Based Medicine

Author: PhD, Ida Sim, MD,

Date: December 1997

Abstract: Randomized clinical trials constitute one of our main sources of medical knowledge, yet trial reports are difficult to find, read, and apply to clinical care. I propose that authors report trials both as entries into electronic knowledge bases - or trial banks - and as text articles in traditional journals. Trial banks should be interoperable, and we thus require a shared ontology of clinical-trial concepts. My thesis work is the design, implementation, and evaluation of such an ontology. Using a new approach called competency decomposition, I show that my ontology design is reasonable, and that the ontology is competent for three of the four core tasks of clinical-trials interpretation for a broad range of trial types. Using this ontology, I implemented a frame-based trial bank that can be queried dynamically over the World Wide Web. Clinical researchers successfully used this system to critique trials in the trial bank. With the advent of digital publication, we have a window of opportunity to design our publication systems such that they support the transfer of evidence from the research world to the clinic. This dissertation presents foundational work for an interoperating trial-bank system that will help us achieve the day-to-day practice of evidence-based medicine.

CS-TR-97-1599

Report Number: CS-TR-97-1600

Institution: Stanford University, Department of Computer Science

Title: An Implementation of a Combinatorial Approximation Algorithm for Minimum-Cost Multicommodity Flow

Author: Goldberg, Andrew

Author: Oldham, Jeffrey D.

Author: Plotkin, Serge

Author: Stein, Cliff

Date: December 1997

Abstract: The minimum-cost multicommodity flow problem involves simultaneously shipping multiple commodities through a single network so that the total flow obeys arc capacity constraints and has minimum cost. Multicommodity flow problems can be expressed as linear programs, and most theoretical and practical algorithms use linear-programming algorithms specialized for the problems' structures. Combinatorial approximation algorithms yield flows with costs slightly larger than the minimum cost and use capacities slightly larger than the given capacities. Theoretically, the running times of these algorithms are much less than that of linear-programming-based algorithms. We combine and modify the theoretical ideas in these approximation algorithms to yield a fast, practical implementation solving the minimum-cost multicommodity flow problem. Experimentally, the algorithm solved our problem instances (to 1% accuracy) two to three orders of magnitude faster than the linear-programming package CPLEX and the linear-programming based multicommodity flow program PPRN.

CS-TR-97-1600

Report Number: CS-TR-98-1602

Institution: Stanford University, Department of Computer Science

Title: Type Systems for Object-Oriented Programming Languages

Author: Fisher, Kathleen

Date: February 1998

Abstract: Object-oriented programming languages (OOPL's) provide important support for today's large-scale software projects. Unfortunately, typed OOPL's have suffered from overly restrictive type systems that have forced programmers to use type-casts to achieve flexibility, a notorious source of hard-to-find bugs. One source of this inflexibility is the conflation of subtyping and inheritance, which reduces potential code reuse. Attempts to fix this rigidity have resulted in unsound type systems, most notably Eiffel's. This thesis develops a sound type system for a formal object-oriented language. It gains flexibility by separating subtyping and inheritance and by supporting method specialization, which allows the types of methods to be refined during inheritance. The lack of such a mechanism is a key source of type-casts in languages like C++. Abstraction primitives in this formal language support a class construct similar to the one found in C++ and Java, explaining the link between inheritance and subtyping: object types that include implementation information are a form of abstract type, and the only way to produce a subtype of an abstract type is via inheritance. Formally, the language is presented as an object calculus. The thesis proves type soundness with respect to an operational semantics via a subject reduction theorem.

CS-TR-98-1602

Report Number: CS-TR-98-1603

Institution: Stanford University, Department of Computer Science

Title: Using Complete Machine Simulation to Understand Computer System Behavior

Author: Herrod, Stephen Alan

Date: March 1998

Abstract: This dissertation describes complete machine simulation, a novel approach to understanding the behavior of modern computer systems. Complete machine simulation models all of the hardware found in modern computer systems, allowing it to investigate the behavior of highly configurable machines running commercial operating systems and important workloads such as database and web servers. Complete machine simulation extends the applicability of traditional machine simulation techniques by addressing speed and data organization challenges. To achieve the speed needed to investigate long-running workloads, complete machine simulation allows an investigator to dynamically adjust the characteristics of its hardware simulation. An investigator can select a high-speed, low-detail simulation setting to quickly pass through uninteresting portions of a workload's execution. Once the workload has reached a more interesting execution state, an investigator can switch to slower, more detailed simulation to obtain behavioral information. To efficiently organize low-level hardware simulation data into more useful information, complete machine simulation provides several mechanisms that incorporate higher-level workload knowledge into the data management process. These mechanisms are efficient and further improve simulation speed by customizing all data collection and reporting to the specific needs of an investigation.

CS-TR-98-1603

Report Number: CS-TR-98-1604

Institution: Stanford University, Department of Computer Science

Title: Theory and Applications of Steerable Functions

Author: Teo, Patrick C.

Date: March 1998

Abstract: A function is called steerable if transformed versions of the function can be expressed using linear combinations of a fixed set of basis functions. In this dissertation, we propose a framework, based on Lie group theory, for studying and constructing functions steerable under any smooth transformation group. Existing analytical approaches to steerability are consistently explained within the framework. The design of a suitable set of basis functions given any arbitrary steerable function is one of the main problems concerning steerable functions. To this end, we have developed two different algorithms. The first algorithm is a symbolic method that derives the minimal set of basis functions automatically given an arbitrary steerable function. In practice, functions that need to be steered might not be steerable with a finite number of basis functions. Moreover, it is often the case that only a small subset of transformations within the group of transformations needs to be considered. In response to these two concerns, the second algorithm computes the optimal set of k basis functions to steer an arbitrary function under a subset of the group of transformations. Lastly, we demonstrate the usefulness of steerable functions in a variety of applications.

CS-TR-98-1604

Report Number: CS-TR-98-1605

Institution: Stanford University, Department of Computer Science

Title: Learning to Surf: Multiagent Systems for Adaptive Web Page Recommendation

Author: Balabanovic, Marko

Date: March 1998

Abstract: Imagine a newspaper personalized for your tastes. Instead of a selection of articles chosen for a general audience by a human editor, a software agent picks items just for you, covering your particular topics of interest. Since there are no journalists at its disposal, the agent searches the Web for appropriate articles. Over time, it uses your feedback on recommended articles to build a model of your interests. This thesis investigates the design of "recommender systems" which create such personalized newspapers. Two research issues motivate this work and distinguish it from approaches usually taken by information retrieval or machine learning researchers. First, a recommender system will have many users, with overlapping interests. How can this be exploited? Second, each edition of a personalized newspaper consists of a small set of articles. Techniques for deciding on the relevance of individual articles are well known, but how is the composition of the set determined? One of the primary contributions of this research is an implemented architecture linking populations of adaptive software agents. Common interests among its users are used both to increase efficiency and scalability, and to improve the quality of recommendations. A novel interface infers document preferences by monitoring user drag-and-drop actions, and affords control over the composition of sets of recommendations. Results are presented from a variety of experiments: user tests measuring learning performance, simulation studies isolating particular tradeoffs, and usability tests investigating interaction designs.

CS-TR-98-1605

Report Number: CS-TR-98-1606

Institution: Stanford University, Department of Computer Science

Title: Associative Caching in Client-Server Databases

Author: Basu, Julie

Date: May 1998

Abstract: Client-server configuration is a popular architecture for modern databases. A traditional assumption in such systems is that clients have limited resources, and query processing is always performed by the server. The server is thus a potential performance bottleneck. To improve the system performance and scalability, today's powerful clients can cache data locally. In this dissertation, we study a new scheme, A*Cache, for associative client-side caching. In contrast to navigational data access using object or page identifiers, A*Cache supports content-based associative access for better data reuse. Query results are stored locally along with their description, and predicate-based reasoning is used to examine and maintain the client cache. Clients execute queries locally if the data is cached, and use update notifications generated by the server for cache maintenance. We first describe the architecture of A*Cache and its transaction execution model. We then develop new optimization techniques for improving the performance of A*Cache. Next, A*Cache performance is investigated through detailed simulation of a client-server database under many different workloads, and compared with other types of caching systems. The simulation results clearly demonstrate the effectiveness of our associative caching scheme for read-only environments, and also for read-write scenarios with moderately high data update probabilities.

CS-TR-98-1606

Report Number: CS-TR-98-1607

Institution: Stanford University, Department of Computer Science

Title: A Network-Centric Design for Relationship-Based Rights Management

Author: Roscheisen, Martin

Date: May 1998

Abstract: Networked environments such as the Internet provide a new platform for communication and information access. In this thesis, we address the question of how to articulate and enforce boundaries of control on top of this platform, while enabling collaboration and sharing in a peer-to-peer environment. We develop the concepts and technologies for a new Internet service layer, called FIRM, that enables structured rights/relationship management. Using a prototype implementation, RManage, we show how FIRM makes it possible to unify rights/relationship management from a user-centered perspective and to support full end-to-end integration of shared control state in network services and users' client applications. We present a network-centric architecture for managing control information, which generalizes previous, client/server-based models to a peer-to-peer environment. Principles and concepts from contract law are used to identify a generic way of representing the shared structure of different kinds of relationships.

CS-TR-98-1607

Report Number: CS-TR-98-1608

Institution: Stanford University, Department of Computer Science

Title: A New Perspective on Partial Evaluation and Use Analysis

Author: Katz, Morris J.

Date: June 1998

Abstract: Partial evaluators are compile time optimizers achieving performance improvements through a program modification technique called specialization. Partial evaluators produce one or more copies, or specializations, of each procedure in a source program in the output program. Specializations are distinguished by being optimized for invocation from call sites with different characteristics, for example, placing certain constraints on argument values. Specializations are created by partially executing procedures, leaving only unexecutable portions as residual code. Symbolic execution can replace variable references by the referenced values, executed primitives by their computed results, and function applications by the bodies of the applied functions, yielding inlining. One core challenge of partial evaluation is selecting what specializations to create. Attempting to produce an infinite number of specializations results in divergence. The termination mechanism of a partial evaluator decides whether or not to symbolically execute a procedure in order to create a new specialization. Creating a termination mechanism that precludes divergence is not difficult. However, crafting a termination mechanism resulting in the production of a sufficient number of appropriate specializations to produce high quality residual code while still terminating all, or most, of the time is quite challenging. This dissertation presents a new type of analysis, called use analysis, forming the basis of a termination mechanism designed to yield a better combination of residual code quality and frequent termination than the current state-of-the-art.

CS-TR-98-1608

Report Number: CS-TR-98-1601

Institution: Stanford University, Department of Computer Science

Title: Formal Verification of Probabilistic Systems

Author: Alfaro, Luca de

Date: June 1998

Abstract: This dissertation presents methods for the formal modeling and specification of probabilistic systems, and algorithms for the automated verification of these systems. Our system models describe the behavior of a system in terms of probability, nondeterminism, fairness and time. The formal specification languages we consider are based on extensions of branching-time temporal logics, and enable the expression of single-event and long-run average system properties. This latter class of properties, not expressible with previous formal languages, includes most of the performance properties studied in the field of performance evaluation, such as system throughput and average response time. Our choice of system models and specification languages has been guided by the goal of providing efficient verification algorithms. The algorithms rely on the theory of Markov decision processes, and exploit a connection between the graph-theoretical and probabilistic properties of these processes. This connection also leads to new results about classical problems, such as an extension to the solvable cases of the stochastic shortest path problem, an improved algorithm for the computation of reachability probabilities, and new results on the average reward problem for semi-Markov decision processes.

CS-TR-98-1601

Report Number: CS-TR-98-1609

Institution: Stanford University, Department of Computer Science

Title: Automated creation of clinical-practice guidelines from decision models

Author: Sanders, Gillian D.

Date: July 1998

Abstract: I developed an approach that allows clinical-practice guideline (CPG) developers to create, disseminate, and tailor CPGs, using decision models (DMs). I propose that guideline developers can use computer-based DMs that reflect global and site-specific data to generate CPGs. Such CPGs are high quality, can be tailored to specific settings, and can be modified automatically as the DM or evidence evolves. I defined conceptual models for representing CPGs and DMs, and formalized a method for mapping between these two representations. I designed a DM annotation editor that queries the decision analyst for missing knowledge. I implemented the ALCHEMIST system that encompasses the conceptual models, mapping algorithm, and the resulting tailoring abilities. I evaluated the design of both conceptual models, and the accuracy of the mapping algorithm. To show that ALCHEMIST produces high-quality CPGs, I had users rate the quality of produced CPGs using a guideline-rating key, and evaluate ALCHEMIST's tailoring abilities. ALCHEMIST automates the DM-to-CPG process and distributes the CPG over the web to allow local developers to apply, tailor, and maintain a global CPG. I argue that my framework is a method for guideline developers to create and maintain automated CPGs, and it thus promotes high-quality and cost-effective health care.

CS-TR-98-1609

Report Number: CS-TR-98-1611

Institution: Stanford University, Department of Computer Science

Title: Approximation Algorithms for Scheduling Problems

Author: Chekuri, Chandra

Date: September 1998

Abstract: This thesis describes efficient approximation algorithms for some NP-Hard deterministic machine scheduling and related problems. We study the objective functions of minimizing makespan (the time to complete all jobs) and minimizing average completion time in a variety of settings described below. 1. Minimizing average completion time and its weighted generalization for single and parallel machine problems. We introduce new techniques that either improve earlier results and/or result in simple and efficient approximation algorithms. In addition to improved results for specific problems, we give a general algorithm that converts an x approximate single machine schedule into a (2x + 2) approximate parallel machine schedule. 2. Minimizing makespan on machines with different speeds when jobs have precedence constraints. We obtain an O(log m) approximation (m is the number of machines) in O(n^3) time. 3. We introduce a class of new scheduling problems that arise from query optimization in parallel databases. The novel aspect consists of modeling communication costs in query execution. We devise algorithms for pipelined operator scheduling. We obtain a PTAS and also simpler O(n log n) time algorithms with ratios of 3.56 and 2.58. 4. Multi-dimensional generalizations of three well known problems in combinatorial optimization: multi-processor scheduling, bin packing, and the knapsack problems. We obtain several approximability and inapproximability results.

CS-TR-98-1611

Report Number: CS-TR-98-1612

Institution: Stanford University, Department of Computer Science

Title: Pleiades Project: Collected Work 1997-1998

Author: Cervesato, Iliano, (editor)

Author: Mitchell, John C., (editor)

Date: October 1998

Abstract: This report collects the papers that were written by the participants of the Pleiades Project and their collaborators from April 1997 to August 1998. Its intent is to give the reader an overview of our accomplishments during this initial phase of the project. Therefore, rather than including complete publications, we chose to reproduce only the first four pages of each paper. In order to satisfy the legitimate curiosity of readers interested in specific articles, each paper can be integrally retrieved from the World-Wide Web through the provided URL. A list of the current publications of the Pleiades Project is accessible at the URL http://theory.stanford.edu/muri/papers.html. Future articles will be posted there as they become available.

CS-TR-98-1612

Report Number: CS-TR-98-1613

Institution: Stanford University, Department of Computer Science

Title: On the synchronization of Poisson processes and queueing networks with service and synchronization nodes.

Author: Prabhakar, Balaji

Author: Bambos, Nicholas

Author: Mountford, Tom

Date: October 1998

Abstract: This paper investigates the dynamics of a synchronization node in isolation, and of networks of service and synchronization nodes. A synchronization node consists of $M$ infinite capacity buffers, where tokens arriving on $M$ distinct random input flows are stored (there is one buffer for each flow). Tokens are held in the buffers until one is available from each flow. When this occurs, a token is drawn from each buffer to form a group-token, which is instantaneously released as a synchronized departure. Under independent Poisson inputs, the output of a synchronization node is shown to converge weakly (and in certain cases strongly) to a Poisson process with rate equal to the minimum rate of the input flows. Hence synchronization preserves the Poisson property, as do superposition, Bernoulli sampling and M/M/1 queueing operations. We then consider networks of synchronization and exponential server nodes with Bernoulli routing and exogenous Poisson arrivals, extending the standard Jackson Network model to include synchronization nodes. It is shown that if the synchronization skeleton of the network is acyclic (i.e. no token visits any synchronization node twice although it may visit a service node repeatedly), then the distribution of the joint queue-length process of only the service nodes is product form (under standard stability conditions) and easily computable. Moreover, the network output flows converge weakly to Poisson processes. Finally, certain results for networks with finite capacity buffers are presented, and the limiting behavior of such networks as the buffer capacities become large is studied.

CS-TR-98-1613

Report Number: CS-TR-98-1614

Institution: Stanford University, Department of Computer Science

Title: Decomposing, Transforming and Composing Diagrams: The Joys of Modular Verification

Author: Alfaro, Luca de

Author: Manna, Z ohar

Author: Sipma, Henny

Date: October 1998

Abstract: The paper proposes a modular framework for the verification of temporal logic properties of systems based on the deductive transformation and composition of diagrams. The diagrams represent abstractions of the modules composing the system, together with information about the environment of the modules. The proof of a temporal specification is constructed with the help of diagram transformation and composition rules, which enable the gradual decomposition of the system into manageable modules, the study of the modules, and the final combination of the diagrams into a proof of the specification. We illustrate our methodology with the modular verification of a database demarcation protocol.

CS-TR-98-1614

Report Number: CS-TR-98-1615

Institution: Stanford University, Department of Computer Science

Title: Using Machine Learning to Improve Information Access

Author: Sahami, Mehran

Date: December 1998

Abstract: We address the problem of topical information space navigation. Specifically, we combine query tools with methods for automatically creating topic taxonomies in order to organize text collections. Our system, named SONIA (Service for Organizing Networked Information Autonomously), is implemented in the Stanford Digital Libraries testbed. It employs several novel probabilistic Machine Learning methods that enable the automatic creation of dynamic topic hierarchies based on the full-text content of documents. First, to generate such topical hierarchies, we employ a novel clustering scheme that outperforms traditional methods used in both Information Retrieval and Probabilistic Reasoning. Furthermore, we develop methods for classifying new articles into such automatically generated, or existing manually generated, hierarchies. Our method explicitly uses the hierarchical relationships between topics to improve classification accuracy. Much of this improvement is derived from the fact that the classification decisions in such a hierarchy can be made by considering only the presence (or absence) of a small number of features (words) in each document. The choice of relevant words is made using a novel information theoretic algorithm for feature selection. The algorithms used in SONIA are also general enough to have been successfully applied to data mining problems in different domains than text.

CS-TR-98-1615

Report Number: CS-TR-99-1617

Institution: Stanford University, Department of Computer Science

Title: Segmentation of Medical Image Volumes Using Intrinsic Shape Information

Author: Shiffman, Smadar

Date: February 1999

Abstract: I propose a novel approach to segmentation of image volumes that requires only a small amount of user intervention and that does not rely on prior global shape models. The approach, intrinsic shape for volume segmentation (IVSeg), comprises two methods. T he first method analyzes isolabel-contour maps to identify salient regions that correspond to major objects. The method detects transitions from within objects into the background by matching isolabel contours that form along the boundaries of objects as a result of multilevel thresholding with a fine partition of the intensity range. The second method searches in the entire sequence for regions that belong to an object that the user selects from one or a few sections. The method uses local overlap criter ia to determine whether regions that overlap in a given direction (coronal, sagittal, or axial) belong to the same object. For extraction of blood vessels, the method derives the criteria dynamically by fitting cylinders to regions in consecutive sections and computing the expected overlap of slices of these cylinders. In a formal evaluation study with CTA data, I showed that IVSeg reduced user editing time by a factor of 5 without affecting the results in any significant way.

CS-TR-99-1617

Report Number: CS-TR-99-1618

Institution: Stanford University, Department of Computer Science

Title: Abstraction-based Deductive-Algorithmic Verification of Reactive Systems

Author: Uribe, Tomas E.

Date: March 1999

Abstract: This thesis presents a framework that combines deductive and algorithmic methods for verifying temporal properties of reactive systems, to allow more automatic verification of general infinite-state systems and the verification of larger finite-state ones. Underlying these methods is the theory of property-preserving assertion-based abstractions, where a finite-state abstraction of the system is deductively justified and algorithmically model checked. After presenting an abstraction framework that accounts for fairness, we describe a method to automatically generate finite-state abstractions. We then show how a number of other verification methods, including deductive rules, (Generalized) Verification Diagrams, and Deductive Model Checking, can also be understood as constructing finite-state abstractions that are model checked. Our analysis leads to a better classification and understanding of these verification methods. Furthermore, it shows how the different abstractions that they construct can be combined. For this, we present an algorithmic Extended Model Checking procedure, which uses all the information that these methods produce, in a finite-state format that can be easily and incrementally combined. Besides a standard safety component, the combined abstractions include extra bounds on fair transitions, well-founded orders, and constrained transition relations for the generation of counterexamples. Thus, our approach minimizes the need for user interaction and maximizes the impact of the available automated deduction and model checking tools. Once proved, verification conditions are re-used as much as possible, leaving the temporal and combinatorial reasoning to automatic tools.

CS-TR-99-1618

Report Number: CS-TR-99-1619

Institution: Stanford University, Department of Computer Science

Title: Intelligent Alarms: Allocating Attention Among Concurrent Processes

Author: Huang, Cecil

Date: April 1999

Abstract: I have developed and evaluated a computable, normative framework for intelligent alarms: automated agents that allocate scarce attention resources to concurrent processes in a globally optimal manner. My approach is decision-theoretic, and relies on Markov decision processes to model time-varying, stochastic systems that respond to externally applied actions. Given a collection of continuing processes and a specified time horizon, my framework computes, for each process: (1) an attention allocation, which reflects how much attention the process is awarded, and (2) an activation price, which reflects the process's priority in receiving the allocated attention amount. I have developed a prototype, Simon, that computes these alarm signals for a simulated ICU. My validity experiments investigate whether sensible input results in sensible output. The results show that Simon produces alarm signals that are consistent with sound clinical judgment. To assess computability, I used Simon to generate alarm signals for an ICU that contained 144 simulated patients; the entire computation took about 2 seconds on a machine with only moderate processing capabilities. I thus conclude that my alarm framework is valid and computable, and therefore is potentially useful in a real-world ICU setting.

CS-TR-99-1619

Report Number: CS-TR-99-1620

Institution: Stanford University, Department of Computer Science

Title: Finding Color and Shape Patterns in Images

Author: Cohen, Scott

Date: May 1999

Abstract: This thesis is devoted to the Earth Mover's Distance (EMD), an edit distance between distributions, and its use within content-based image retrieval (CBIR). The major CBIR problem discussed is the pattern problem: Given an image and a query pattern, determine if the image contains a region which is visually similar to the pattern; if so, find at least one such image region. An important problem that arises in applying the EMD to CBIR is the EMD under transformation (EMD_G) problem: find a transformation of one distribution which minimizes its EMD to another, where the set of allowable transformations G is given. The problem of estimating the size/scale at which a pattern occurs in an image is phrased and efficiently solved as an EMD_G problem. For a large class of transformation sets, we also present a monotonically convergent iteration to find at least a locally optimal transformation. Our pattern problem solution is the SEDL (Scale Estimation for Directed Location) image retrieval system. Three important contributions of SEDL are (1) a general framework for finding both color and shape patterns, (2) the previously mentioned scale estimation algorithm using the EMD, and (3) a directed (as opposed to exhaustive) search strategy.

CS-TR-99-1620

Report Number: CS-TR-99-1621

Institution: Stanford University, Department of Computer Science

Title: Perceptual Metrics for Image Database Navigation

Author: Rubner, Yossi

Date: August 1999

Abstract: The increasing amount of information available in today's world raises the need to retrieve relevant data efficiently. Unlike text-based retrieval, where keywords are successfully used to index into documents, content-based image retrieval poses up front the fundamental questions how to extract useful image features and how to use them for intuitive retrieval. We present a novel approach to the problem of navigating through a collection of images for the purpose of image retrieval, which leads to a new paradigm for image database search. We summarize the appearance of images by distributions of color or texture features, and we define a metric between any two such distributions. This metric, which we call the "Earth Mover's Distance" (EMD), represents the least amount of work that is needed to rearrange the mass is one distribution in order to obtain the other. We show that the EMD matches perceptual dissimilarity better than other dissimilarity measures, and argue that it has many desirable properties for image retrieval. Using this metric, we employ Multi-Dimensional Scaling techniques to embed a group of images as points in a two- or three-dimensional Euclidean space so that their distances reflect image dissimilarities as well as possible. Such geometric embeddings exhibit the structure in the image set at hand, allowing the user to understand better the result of a database query and to refine the query in a perceptually intuitive way. By iterating this process, the user can quickly zoom in to the portion of the image space of interest. We also apply these techniques to other modalities such as mug-shot retrieval.

CS-TR-99-1621

Report Number: CS-TR-99-1622

Institution: Stanford University, Department of Computer Science

Title: Multicommodity and Generalized Flow Algorithms: Theory and Practice

Author: Oldham, Jeffrey David

Date: August 1999

Abstract: We present several simple, practical, and fast algorithms for linear programs, concentrating on network flow problems. Since the late 1980s, researchers developed different combinatorial approximation algorithms for fractional packing problems, obtaining the fastest theoretical running times to solve multicommodity minimum-cost and concurrent flow problems. A direct implementation of these multicommodity flow algorithms was several orders of magnitude slower than solving these problems using a commercial linear programming solver. Through experimentation, we determined which theoretically equivalent constructs are experimentally efficient. Guided by theory, we designed and implemented practical improvements while maintaining the same worst-case complexity bounds. The resulting algorithms solve problems orders of magnitude faster than commercial linear programming solvers and problems an order of magnitude larger. We also present simple, combinatorial algorithms for generalized flow problems. These problems generalize ordinary network flow problems by specifying a flow multiplier \mu(a) for each arc a. Using multipliers permit a flow problem to model transforming one type into another, e.g., currency exchange, and modification of the amount of flow, e.g., water evaporation from canals or accrual of interest in bank accounts. First, we show the generalized shortest paths problem can be solved using existing network flow ideas, i.e., by combining the Bellman-Ford-Moore shortest path framework and Megiddo's parametric search. Second, we combine this algorithm with fractional packing frameworks to yield the first polynomial-time combinatorial approximation algorithms for the generalized versions of the nonnegative-cost minimum-cost flow, concurrent flow, multicommodity maximum flow, and multicommodity nonnegative-cost minimum-cost flow problems. These algorithms show that generalized concurrent flow and multicommodity maximum flow have strongly polynomial approximation algorithms.

CS-TR-99-1622

Report Number: CS-TR-99-1623

Institution: Stanford University, Department of Computer Science

Title: Efficient Maintenance and Recovery of Data Warehouses

Author: Labio, Wilburt Juan

Date: August 1999

Abstract: Data warehouses collect data from multiple remote sources and integrate the information as materialized views in a local database. The materialized views are used to answer queries that analyze the collected data for patterns, and trends. This type of query processing is often called on-line analytical processing (OLAP). The warehouse views must be updated when changes are made to the remote information sources. Otherwise, the answers to OLAP queries are based on stale data. Answering OLAP queries based on stale data is clearly a problem especially if OLAP queries are used to support critical decisions made by the organization that owns the data warehouse. Because the primary purpose of the data warehouse is to answer OLAP queries, only a limited amount of time and/or resources can be devoted to the warehouse update. Hence, we have developed new techniques to ensure that the warehouse update can be done efficiently. Also, the warehouse update is not devoid of failures. Since only a limited amount of time and/or resources are devoted to the warehouse update, it is most likely infeasible to restart the warehouse update from scratch. Thus, we have developed new techniques for resuming failed warehouse updates. Finally, warehouse updates typically transfer gigabytes of data into the warehouse. Although the price of disk storage is decreasing, there will be a point in the ``lifetime" of a data warehouse when keeping and administering all of the collected is unreasonable. Thus, we have investigated techniques for reducing the storage cost of a data warehouse by selectively ``expiring'' information that is not needed.

CS-TR-99-1623

Report Number: CS-TR-99-1624

Institution: Stanford University, Department of Computer Science

Title: Non-blocking Synchronization and System Design

Author: Greenwald, Michael

Date: August 1999

Abstract: Non-blocking synchronization (NBS) has significant advantages over blocking synchronization in areas of fault-tolerance, system structure, portability, and performance. These advantages gain importance with the increased use of parallelism and multiprocessors, and as delays increase relative to processor speed. This thesis demonstrates that non-blocking synchronization is practical as the sole co-ordination mechanism in systems by showing that careful OS design eases implementation of efficient NBS, by demonstrating that DCAS (Double-Compare-and-Swap) is the necessary and sufficient primitive for implementing NBS, and by demonstrating that efficient hardware DCAS is practical for RISC processors. This thesis presents high-performance non-blocking implementations of common data-structures sufficient to implement an operating system kernel. I also present more general algorithms: non-blocking implementations of \casn\ and software transactional memory. Both have overhead proportional to the number of writes, support multi\--objects, and use a DCAS-based contention-reduction technique that is fault-tolerant and OS-independent yet performs as well as the best previously published techniques. I demonstrate that proposed OS implementations of DCAS are inefficient, and propose a design for efficient hardware DCAS specific to the R4000 but generalizable to other RISC processors.

CS-TR-99-1624

Report Number: CS-TR-99-1625

Institution: Stanford University, Department of Computer Science

Title: Pleiades Project: Collected Work 1998-1999

Author: Cervesato, Iliano (editor)

Author: Mitchell, John C. (editor)

Date: August 1999

Abstract: This report collects the papers that were written by the participants of the Pleiades Project and their collaborators from September 1998 to August 1999. Its intent is to give the reader an overview of our accomplishments during this central phase of the project. Therefore, rather than including complete publications, we chose to reproduce only the first four pages of each paper. The papers can be integrally retrieved from the World-Wide Web through the provided URLs. A list of the current publications of the Pleiades Project is accessible at the URL http://theory.stanford.edu/muri/papers.html". Future articles will be posted there as they become available.

CS-TR-99-1625