%R CS-TN-93-1 %Z Wed, 08 Dec 93 00:00:00 GMT %I Stanford University, Department of Computer Science %T Incremental Updates of Inverted Lists for Text Document Retrieval %A Tomasic, Anthony %A Garcia-Molina, Hector %A Shoens, Kurt %D December 1993 %X The proliferation of the world's "information highways" has brought about renewed interest in efficient document indexing techniques. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index. The index dynamically separates long and short inverted lists and optimizes the retrieval, update, and storage of each type of list. To study the behavior of the index, we describe a space of engineering trade-offs ranging from optimizing update time to optimizing query performance. We quantitatively explore this space by using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for a variety of criteria. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/93/1/CS-TN-93-1.pdf %R CS-TN-93-2 %Z Wed, 08 Dec 93 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Efficacy of GlOSS for the Text Database Discovery Problem %A Gravano, Luis %A Garcia-Molina, Hector %A Tomasic, Anthony %D December 1993 %X The popularity of information retrieval has led users to a new problem: finding which text databases (out of thousands of candidate choices) are the most relevant to a user. Answering a given query with a list of relevant databases is the text database discovery problem. The first part of this paper presents a practical method for attacking this problem based on estimating the result size of a query for a given database. The method is termed GlOSS--Glossary of Servers Server. The second part of this paper evaluates GlOSS using four different semantics to answer a user's queries. Real users' queries were used in the experiments. We also describe several variations of GlOSS and compare their efficacy. In addition, we analyze the storage cost of our approach to the problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/93/2/CS-TN-93-2.pdf %R CS-TN-93-3 %Z Wed, 08 Dec 93 00:00:00 GMT %I Stanford University, Department of Computer Science %T Correct View Update Translations via Containment %A Tomasic, Anthony %D December 1993 %X One approach to the view update problem for deductive databases proves properties of translations -- that is, a language specifies the meaning of an update to the intensional database (IDB) in terms of updates to the extensional database (EDB). We argue that the view update problem should be viewed as a question of the expressive power of the translation language and the computational cost of demonstrating properties of a translation. We use an active rule based database language as a means of specifying translations of updates on the IDB into updates on the EDB. This paper uses the containment of one datalog program (or conjunctive query) by another to demonstrate that a translation is semantically correct. We show that the complexity of correctness is lower for insertion than for deletion. Finally, we discuss extensions to the translation language. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/93/3/CS-TN-93-3.pdf %R CS-TN-94-6 %Z Tue, 10 May 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Ascribing Beliefs %A Brafman, Ronen I.
%A Tennenholtz, Moshe %D December 1993 %X Models of agents that employ formal notions of mental states are useful and often easier to construct than models at the symbol (e.g., programming language) or physical (e.g., mechanical) level. In order to enjoy these benefits, we must supply a coherent picture of mental-level models, that is, a description of the various components of the mental level, their dynamics, and their inter-relations. However, these abstractions provide weak modelling tools unless (1) they are grounded in more concrete notions, and (2) we can show when it is appropriate to use them. In this paper we propose a model that grounds the mental state of the agent in its actions. We then characterize a class of {\em goal-seeking\/} agents that can be modelled as having beliefs. This paper emphasizes the task of belief ascription. On one level this is the practical task of deducing an agent's beliefs, and we look at assumptions that can help constrain the set of beliefs an agent can be ascribed, showing cases in which, under these assumptions, this set is unique. We also investigate the computational complexity of this task, characterizing a class of agents to whom belief ascription is tractable. But on a deeper level, our model of belief ascription supplies a concrete semantics for beliefs, one that is grounded in an observable notion -- action. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/94/6/CS-TN-94-6.pdf %R CS-TN-94-7 %Z Tue, 10 May 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Overcoming Unexpected Obstacles %A McCarthy, John %D May 1994 %X The present note illustrates how logical formalizations of common sense knowledge and reasoning can achieve some of the open-endedness of human common sense reasoning. A plan is made to fly from Glasgow to Moscow and is shown by circumscription to lead to the traveller arriving in Moscow. Then a fact about an unexpected obstacle---the traveller losing his ticket---is added without changing any of the previous facts, and the original plan can no longer be shown to work if it must take into account the new fact. However, an altered plan that includes buying a replacement ticket can now be shown to work. The formalism used is a modification of one developed by Vladimir Lifschitz, and I have been informed that the modification isn't correct and that I should go back to Lifschitz's original formalism. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/94/7/CS-TN-94-7.pdf %R CS-TN-94-8 %Z Tue, 10 May 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Emulating Soft Real-Time Scheduling Using Traditional Operating System Schedulers %A Adelberg, Brad %A Garcia-Molina, Hector %A Kao, Ben %D May 1994 %X Real-time scheduling algorithms are usually only available in the kernels of real-time operating systems, and not in more general-purpose operating systems such as Unix. For some soft real-time problems, a traditional operating system may be the development platform of choice. This paper addresses methods of emulating real-time scheduling algorithms on top of standard time-share schedulers. We examine (through simulations) three strategies for priority assignment within a traditional multi-tasking environment. The results show that the emulation algorithms are comparable in performance to the real-time algorithms and in some instances outperform them.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/94/8/CS-TN-94-8.pdf %R CS-TN-94-9 %Z Tue, 07 Jun 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Reasoning About The Effects of Communication On Beliefs %A Young, R. Michael %D June 1994 %X Perrault has presented a formal framework describing communicative action and the change of mental state of agents participating in the performance of speech acts. This approach, using an axiomatization in default logic, suffers from several drawbacks concerning the persistence of beliefs and ignorance over time. We provide an example which illustrates these drawbacks and then present a second approach which avoids these problems. This second approach, an axiomatization of belief transfer in a nonmonotonic modal logic of belief and time, is a reformulation of Perrault's main ideas within a logic which uses an ignorance-based semantics to ensure that ignorance is maximized. We present an axiomatization of this logic and describe the associated techniques for nonmonotonic reasoning. We then show how this approach deals with inter-agent communications in a more intuitively appealing way. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/94/9/CS-TN-94-9.pdf %R CS-TN-94-10 %Z Tue, 12 Jul 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Precision and Recall of GlOSS Estimators for Database Discovery %A Tomasic, Anthony %A Gravano, Luis %A Garcia-Molina, Hector %D July 1994 %X The availability of large numbers of network information sources has led to a new problem: finding which text databases (out of perhaps thousands of choices) are the most relevant to a query. We call this the text-database discovery problem. Our solution to this problem, GlOSS--Glossary-Of-Servers Server, keeps statistics on the available databases to decide which ones are potentially useful for a given query. In this paper we present different query-result size estimators for GlOSS and we evaluate them with metrics based on the precision and recall concepts of text-document information-retrieval theory. Our generalization of these metrics uses different notions of the set of relevant databases to define different query semantics. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/94/10/CS-TN-94-10.pdf %R CS-TN-94-11 %Z Mon, 19 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T On Exact and Approximate Cut Covers of Graphs %A Motwani, Rajeev %A Naor, Joseph %D September 1994 %X We consider the minimum cut cover problem for a simple, undirected graph G(V,E): find a minimum-cardinality family of cuts C in G such that each edge e in E belongs to at least one cut c in C. The cardinality of the minimum cut cover of G is denoted by c(G). The motivation for this problem comes from testing of electronic component boards. Loulou showed that the cardinality of a minimum cut cover in the complete graph on n vertices is precisely the ceiling of log n. However, determining the minimum cut cover of an arbitrary graph was posed as an open problem by Loulou. In this note we settle this open problem by showing that the cut cover problem is closely related to the graph coloring problem, thereby also obtaining a simple proof of Loulou's main result. We show that the problem is NP-complete in general, and moreover, the approximation version of this problem remains NP-complete. Some other observations are made, all of which follow as a consequence of the close connection to graph coloring.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/94/11/CS-TN-94-11.pdf %R CS-TN-94-12 %Z Mon, 10 Oct 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Cross-Validated C4.5: Using Error Estimation for Automatic Parameter Selection %A John, George H. %D October 1994 %X Machine learning algorithms for supervised learning are in wide use. An important issue in the use of these algorithms is how to set their parameters. While the default parameter values may be appropriate for a wide variety of tasks, they are not necessarily optimal for a given task. In this paper, we investigate the use of cross-validation to select parameters for the C4.5 decision tree learning algorithm. Experimental results on five datasets show that when cross-validation is applied to selecting an important parameter for C4.5, the accuracy of the induced trees on independent test sets is generally higher than the accuracy when using the default parameter value. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/94/12/CS-TN-94-12.pdf %R CS-TN-94-13 %Z Tue, 18 Oct 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Formalizing Context (Expanded Notes) %A McCarthy, John %A Buvac, Sasa %D October 1994 %X These notes discuss formalizing contexts as first-class objects. The basic relation is Ist(c,p). It asserts that the proposition p is true in the context c. The most important formulas relate the propositions true in different contexts. Introducing contexts as formal objects will permit axiomatizations in limited contexts to be expanded to transcend the original limitations. This seems necessary to provide AI programs using logic with certain capabilities that human fact representation and human reasoning possess. Fully implementing transcendence seems to require further extensions to mathematical logic, i.e., beyond the nonmonotonic inference methods first invented in AI and now studied as a new domain of logic. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/94/13/CS-TN-94-13.pdf %R CS-TN-94-14 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Generalized Projections: A Powerful Query-Optimization Technique %A Harinarayan, Venky %A Gupta, Ashish %D November 1994 %X In this paper we introduce generalized projections (GPs). GPs capture aggregations, groupbys, conventional projection with duplicate elimination (Distinct), and duplicate-preserving projections. We develop a technique for pushing GPs down the query trees of select-project-join queries that may use aggregations like Max, Sum, etc. and that use arbitrary functions in their selection conditions. Our technique pushes aggregation computation, duplicate elimination, and function computation down to the lowest levels of a query tree. The technique also creates aggregations in queries that did not use aggregation to begin with. Our technique is important since applying aggregations early in query processing can provide significant performance improvements. In addition to their value in query optimization, generalized projections unify set and duplicate semantics, and help us better understand aggregations.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/94/14/CS-TN-94-14.pdf %R CS-TN-94-15 %Z Thu, 08 Dec 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Reasoning Theories: Towards an Architecture for Open Mechanized Reasoning Systems %A Giunchiglia, Fausto %A Pecchiari, Paolo %A Talcott, Carolyn %D December 1994 %X Our ultimate goal is to provide a framework and a methodology which will allow users, and not only system developers, to construct complex reasoning systems by composing existing modules, or to add new modules to existing systems, in a ``plug and play'' manner. These modules and systems might be based on different logics; have different domain models; use different vocabularies and data structures; use different reasoning strategies; and have different interaction capabilities. This paper makes two main contributions towards our goal. First, it proposes a general architecture for a class of reasoning modules and systems called Open Mechanized Reasoning Systems (OMRSs). An OMRS has three components: a reasoning theory component which is the counterpart of the logical notion of formal system, a control component which consists of a set of inference strategies, and an interaction component which provides an OMRS with the capability of interacting with other systems, including OMRSs and human users. Second, it develops the theory underlying the reasoning theory component. This development is motivated by an analysis of state-of-the-art systems. The resulting theory is then validated by using it to describe the integration of the linear arithmetic module into the simplification process of the Boyer-Moore system, NQTHM. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/94/15/CS-TN-94-15.pdf %R CS-TN-95-16 %Z Mon, 13 Mar 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Meaning of Negative Premises in Transition System Specifications II %A Glabbeek, R.J. van %D February 1995 %X This paper reviews several methods to associate transition relations to transition system specifications with negative premises in Plotkin's structural operational style. Besides a formal comparison in terms of generality and relative consistency, the methods are also evaluated on their taste in determining which specifications are meaningful and which are not. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/95/16/CS-TN-95-16.pdf %R CS-TN-95-17 %Z Mon, 13 Mar 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Ntyft/ntyxt Rules Reduce to Ntree Rules %A Glabbeek, R.J. van %D February 1995 %X Groote and Vaandrager introduced the tyft/tyxt format for Transition System Specifications (TSSs), and established that for each TSS in this format that is well-founded, the bisimulation equivalence it induces is a congruence. In this paper, we construct for each TSS in tyft/tyxt format an equivalent TSS that consists of tree rules only. As a corollary we can give an affirmative answer to an open question, namely whether the well-foundedness condition in the congruence theorem for tyft/tyxt can be dropped. These results extend to tyft/tyxt with negative premises and predicates.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/95/17/CS-TN-95-17.pdf %R CS-TN-95-18 %Z Mon, 13 Mar 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Effective models of polymorphism, subtyping and recursion %A Mitchell, John %A Viswanathan, Ramesh %D March 1995 %X We develop a class of models of polymorphism, subtyping and recursion based on a combination of traditional recursion theory and simple domain theory. A significant property of our primary model is that types are coded by natural numbers using any index of their supremum operator. This leads to a distinctive view of polymorphic functions that has many of the usual parametricity properties. It also gives a distinctive but entirely coherent interpretation of subtyping. An alternate construction points out some peculiarities of computability theory based on natural number codings. Specifically, the polymorphic fixed point is computable by a single algorithm at all types when we construct the model over untyped call-by-value lambda terms, but not when we use Godel numbers for computable functions. This is consistent with trends away from natural numbers in the field of abstract recursion theory. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/95/18/CS-TN-95-18.pdf %R CS-TN-95-19 %Z Mon, 20 Mar 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Fast Approximation Algorithm for Minimum Cost Multicommodity Flow %A Kamath, Anil %A Palmon, Omri %A Plotkin, Serge %D March 1995 %X The minimum-cost multicommodity flow problem is one of the classical optimization problems that arises in a variety of contexts. Applications range from finding optimal ways to route information through communication networks to VLSI layout. In this paper, we describe an efficient deterministic approximation algorithm, which, given that there exists a multicommodity flow of cost $B$ that satisfies all the demands, produces a flow of cost at most $(1+\delta)B$ that satisfies a $(1-\epsilon)$-fraction of each demand. For constant $\delta$ and $\epsilon$, our algorithm runs in $O^*(kmn^2)$ time, which is an improvement over the previously fastest (deterministic) approximation algorithm for this problem due to Plotkin, Shmoys, and Tardos, which runs in $O^*(k^2m^2)$ time. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/95/19/CS-TN-95-19.pdf %R CS-TN-95-20 %Z Mon, 20 Mar 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Interior Point Algorithms for Exact and Approximate Solution of Multicommodity Flow Problems %A Kamath, Anil %A Palmon, Omri %D March 1995 %X In this paper, we present a new interior-point-based polynomial algorithm for the multicommodity flow problem and its variants. Unlike all previously known interior point algorithms for multicommodity flow, which have the same complexity for approximate and exact solutions, our algorithm improves the running time in the approximate case by a polynomial factor. For many cases, the exact bounds are better as well. Instead of using the conventional linear programming formulation for the multicommodity flow problem, we model it as a quadratic optimization problem which is solved using interior-point techniques. This formulation allows us to exploit the underlying structure of the problem and to solve it efficiently. The algorithm is also shown to have improved stability properties. The improved complexity results extend to minimum cost multicommodity flow, concurrent flow and generalized flow problems.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/95/20/CS-TN-95-20.pdf %R CS-TN-95-21 %Z Tue, 02 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies %A Gravano, Luis %A Garcia-Molina, Hector %D April 1995 %X As large numbers of text databases have become available on the Internet, it is getting harder to locate the right sources for given queries. In this paper we present gGlOSS, a generalized Glossary-Of-Servers Server, that keeps statistics on the available databases to estimate which databases are potentially the most useful for a given query. gGlOSS extends our previous work, which focused on databases using the boolean model of document retrieval, to cover databases using the more sophisticated vector-space retrieval model. We evaluate our new techniques using real-user queries and 53 databases. Finally, we further generalize our approach by showing how to build a hierarchy of gGlOSS brokers. The top level of the hierarchy is so small it could be widely replicated, even at end-user workstations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/95/21/CS-TN-95-21.pdf %R CS-TN-95-22 %Z Wed, 09 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Combining Register Allocation and Instruction Scheduling %A Motwani, Rajeev %A Palem, Krishna V. %A Sarkar, Vivek %A Reyen, Salem %D August 1995 %X We formulate combined register allocation and instruction scheduling within a basic block as a single optimization problem, with an objective cost function that more directly captures the primary measure of interest in code optimization --- the completion time of the last instruction. We show that although a simple instance of the combined problem is NP-hard, the combined problem is much easier to solve approximately than graph coloring, which is a common formulation used for the register allocation phase in phase-ordered solutions. Using our framework, we devise a simple and effective heuristic algorithm for the combined problem. This algorithm is called the (alpha,beta)-Combined Heuristic; parameters alpha and beta provide relative weights for controlling register pressure and instruction parallelism considerations in the combined heuristic. Preliminary experiments indicate that the combined heuristic yields improvements in the range of 16-21% compared to the phase-ordered solutions when the input graphs contain a balanced amount of register pressure and instruction-level parallelism. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/95/22/CS-TN-95-22.pdf %R CS-TN-95-23 %Z Thu, 10 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Dynamic Maintenance of Kinematic Structures %A Halperin, Dan %A Latombe, Jean-Claude %A Motwani, Rajeev %D August 1995 %X We study the following dynamic data structure problem. Given a collection of rigid bodies moving in three-dimensional space and hinged together in a kinematic structure, our goal is to maintain a data structure that describes certain geometric features of these bodies, and to efficiently update it as the bodies move. This data structure problem seems to be fundamental, and it comes up in a variety of applications such as conformational search in molecular biology, simulation of hyper-redundant robots, collision detection and computer animation. In this note we present preliminary results on a few variants of the problem.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/95/23/CS-TN-95-23.pdf %R CS-TN-95-24 %Z Mon, 09 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Approximation Algorithms for $k$-Delivery TSP %A Chalasani, Prasad %A Motwani, Rajeev %D August 1995 %X We provide O(1) approximation algorithms for the following NP-hard problem called k-Delivery TSP: We have at our disposal a truck of capacity k, there are n depots and n customers at various locations in some metric space, and there is exactly one item at each depot (all items are identical). We want to find an optimal tour using the truck to deliver one item to each customer. Our algorithms run in time polynomial in both n and k. The 1-Delivery problem is one of finding an optimal tour that alternately visits depots and customers. For this case we use matroid intersection to obtain a polynomial-time 2-approximation algorithm, improving upon a factor 2.5 algorithm of Anily and Hassin. Using this approximation combined with certain lower-bounding arguments, we show a factor 11.5 approximation to the optimal k-Delivery tour. For the infinite-k case we show a factor 2 approximation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/95/24/CS-TN-95-24.pdf %R CS-TN-95-25 %Z Thu, 05 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Complexity Measures for Assembly Sequences %A Goldwasser, Michael %A Latombe, Jean-Claude %A Motwani, Rajeev %D October 1995 %X Our work examines various complexity measures for two-handed assembly sequences. Many present assembly sequencers take a description of a product and output a valid assembly sequence. For many products there exists an exponentially large set of valid sequences, and a natural goal is to use automated systems to attempt to select wisely from the choices. Since assembly sequencing is a preprocessing phase for a long and expensive manufacturing process, any work towards finding a ``better'' assembly plan is of great value when it comes time to assemble the physical product in mass quantities. We take a step in this direction by introducing a formal framework for studying the optimization of several complexity measures. This framework focuses on the combinatorial aspect of the family of valid assembly sequences, while temporarily separating out the specific geometric assumptions inherent to the problem. With an exponential number of possibilities, finding the true optimal-cost solution seems hard. In fact, in the most general case, our results suggest that even finding an approximate solution is hard. Future work is directed towards using this model to study how the original geometric assumptions can be reintroduced to prove stronger approximation results. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/95/25/CS-TN-95-25.pdf %R CS-TN-95-26 %Z Thu, 05 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Mediation and Software Maintenance %A Wiederhold, Gio %D October 1995 %X This paper reports on recent work and directions in modern software architectures and their formal models with respect to software maintenance. Related earlier work, now entering practice, provides automatic creation of object structures for customer applications using such models and their algebra, and we will summarize that work. Our focus on maintenance is intended to attack the most costly and frustrating aspect of dealing with large-scale software systems: keeping them up-to-date and responsive to user needs in changing environments.
We introduce the concept of domain-specific mediators to partition the maintenance effort. Mediators are autonomous modules which create information objects out of source data. These modules are placed in an intermediate layer, bridging clients and servers, and contain the knowledge required to establish and maintain services in a coherent domain. A mediated architecture can reduce the cost growth of maintenance to a near-linear function of system size, whereas current system architectures have quadratic factors. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/95/26/CS-TN-95-26.pdf %R CS-TN-95-27 %Z Fri, 27 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Comparing Very Large Database Snapshots %A Labio, Wilburt Juan %A Garcia-Molina, Hector %D May 1995 %X Detecting and extracting modifications from information sources is an integral part of data warehousing. For unsophisticated sources, in practice it is often necessary to infer modifications by periodically comparing snapshots of data from the source. Although this snapshot differential problem is closely related to traditional joins and outerjoins, there are significant differences, which lead to simple new algorithms. In particular, we present algorithms that perform (possibly lossy) compression of records. We also present a window algorithm that works very well if the snapshots are not "very different". The algorithms are studied via analysis and an implementation of two of them; the results illustrate the potential gains achievable with the new algorithms. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/95/27/CS-TN-95-27.pdf %R CS-TN-96-28 %Z Tue, 09 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Common Framework for Steerability, Motion Estimation and Invariant Feature Detection %A Hel-Or, Yacov %A Teo, Patrick C. %D January 1996 %X Many problems in computer vision and pattern recognition involve groups of transformations. In particular, motion estimation, steerable filter design and invariant feature detection are often formulated with respect to a particular transformation group. Traditionally, these problems have been investigated independently. From a theoretical point of view, however, the issues they address are similar. In this paper, we examine these common issues and propose a theoretical framework within which they can be discussed in concert. This framework is based on constructing a more natural representation of the image for a given transformation group. Within this framework, many existing techniques of motion estimation, steerable filter design and invariant feature detection appear as special cases. Furthermore, several new results are direct consequences of this framework. First, a canonical decomposition of all filters that can be steered with respect to any one-parameter group and any multi-parameter Abelian group is proposed. Filters steerable under various subgroups of the affine group are also tabulated. Second, two approximation techniques are suggested to deal with filters that cannot be steered exactly. Approximating steerable filters can also be used for motion estimation. Third, within this framework, invariant features can easily be constructed using traditional techniques for computing point invariance.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/96/28/CS-TN-96-28.pdf %R CS-TN-96-30 %Z Tue, 16 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Development of Type Systems for Object-Oriented Languages %A Fisher, Kathleen %A Mitchell, John C. %D January 1996 %X This paper, which is partly tutorial in nature, summarizes some basic research goals in the study and development of typed object-oriented programming languages. These include both immediate repairs to problems with existing languages and the long-term development of more flexible and expressive, yet type-safe, approaches to program organization and design. The technical part of the paper is a summary and comparison of three object models from the literature. We conclude by discussing approaches to selected research problems, including changes in the type of a method from superclass to subclass and the use of types that give information about the implementations as well as the interfaces of objects. Such implementation types seem essential for adequate typing of binary operations on objects, for example. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/96/30/CS-TN-96-30.pdf %R CS-TN-96-31 %Z Tue, 16 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Classes = Objects + Data Abstraction %A Fisher, Kathleen %A Mitchell, John C. %D January 1996 %X We describe a type-theoretic foundation for object systems that include ``interface types'' and ``implementation types.'' Our approach begins with a basic object calculus that provides a notion of object, method lookup, and object extension (an object-based form of inheritance). In this calculus, the type of an object gives its interface, as a set of methods and their types, but does not imply any implementation properties. We extend this object calculus with a higher-order form of data abstraction that allows us to declare supertypes of an abstract type and a list of methods guaranteed not to be present. This results in a flexible framework for studying and improving practical programming languages where the type of an object gives certain implementation guarantees, such as would be needed to statically determine the offset of a method or safely implement binary operations without exposing the internal representation of objects. We prove type soundness for the entire language using operational semantics and an analysis of typing derivations. One insight that is an immediate consequence of our analysis is a principled, type-theoretic explanation (for the first time, as far as we know) of the link between subtyping and inheritance in C++, Eiffel and related languages. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/96/31/CS-TN-96-31.pdf %R CS-TN-96-29 %Z Tue, 16 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Optimization Complexity of Constraint Satisfaction Problems %A Khanna, Sanjeev %A Sudan, Madhu %D December 1995 %X In 1978, Schaefer considered a subclass of languages in NP and proved a ``dichotomy theorem'' for this class. The subclass considered consisted of problems expressible as ``constraint satisfaction problems'', and the ``dichotomy theorem'' showed that every language in this class is either in P, or is NP-hard. This result is in sharp contrast to a result of Ladner, which shows that such a dichotomy does not hold for NP, unless NP=P. We consider the optimization version of the dichotomy question and show an analog of Schaefer's result for this case.
More specifically, we consider the optimization version of ``constraint satisfaction problems'' and show that every optimization problem in this class is either solvable exactly in P, or is MAX SNP-hard, and hence not approximable to within some constant factor in polynomial time, unless NP=P. This result does not follow directly from Schaefer's result. In particular, the set of problems that turn out to be hard in this case is quite different from the set of languages shown to be hard by Schaefer's result. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/96/29/CS-TN-96-29.pdf %R CS-TN-96-32 %Z Mon, 04 Mar 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Design of Multi-Parameter Steerable Functions Using Cascade Basis Reduction %A Teo, Patrick C. %A Hel-Or, Yacov %D March 1996 %X A new cascade basis reduction method of computing the optimal least-squares set of basis functions steering a given function is presented. The method combines the Lie group-theoretic and the singular value decomposition approaches in such a way that their respective strengths complement each other. Since the Lie group-theoretic approach is used, the set of basis and steering functions computed can be expressed analytically. Because the singular value decomposition method is used, this set of basis and steering functions is optimal in the least-squares sense. Furthermore, the computational complexity in designing basis functions for transformation groups with large numbers of parameters is significantly reduced. The efficiency of the cascade basis reduction method is demonstrated by designing a set of basis functions that steers a Gabor function under the four-parameter linear transformation group. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/96/32/CS-TN-96-32.pdf %R CS-TN-96-34 %Z Wed, 15 May 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Fast Estimation of Diameter and Shortest Paths (without Matrix Multiplication) %A Aingworth, Donald %A Chekuri, Chandra %A Indyk, Piotr %A Motwani, Rajeev %D May 1996 %X In the recent past, there has been considerable progress in devising algorithms for the all-pairs shortest paths problem running in time significantly smaller than the obvious time bound of O(n^3). Unfortunately, all the new algorithms are based on fast matrix multiplication algorithms that are notoriously impractical. Our work is motivated by the goal of devising purely combinatorial algorithms that match these improved running times. Our results come close to achieving this goal, in that we present algorithms with a small additive error in the length of the paths obtained. Our algorithms are easy to implement, have the desired property of being combinatorial in nature, and the hidden constants in the running time bound are fairly small. Our main result is an algorithm which solves the all-pairs shortest paths problem in unweighted, undirected graphs with an additive error of 2 in time O(n^{2.5} sqrt{log n}). This algorithm returns actual paths and not just the distances. In addition, we give more efficient algorithms with running time O(n^{1.5} sqrt{k log n} + n^2 log^2 n) for the case where we are only required to determine shortest paths between k specified pairs of vertices rather than all pairs of vertices.
The starting point for all our results is an O(m sqrt{n log n}) algorithm for distinguishing between graphs of diameter 2 and 4, and this is later extended to obtaining a ratio 2/3 approximation to the diameter in time O(m sqrt{n log n} + n^2 log n). Unlike in the case of all-pairs shortest paths, our results for approximate diameter computation can be extended to the case of directed graphs with arbitrary positive real weights on the edges. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/96/34/CS-TN-96-34.pdf %R CS-TN-96-33 %Z Mon, 15 Apr 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Computational Group-Theoretic Approach to Steerable Functions %A Teo, Patrick C. %A Hel-Or, Yacov %D April 1996 %X We present a computational, group-theoretic approach to steerable functions. The approach is group-theoretic in that the treatment involves continuous transformation groups to which elementary Lie group theory may be applied. The approach is computational in that the theory is constructive and leads directly to a procedural implementation. For functions that are steerable with $n$ basis functions under a $k$-parameter group, the procedure is efficient in that at most $nk+1$ iterations of the procedure are needed to compute all the basis functions. Furthermore, the procedure is guaranteed to return the minimum number of basis functions. If the function is not steerable, a numerical implementation of the procedure could be used to compute basis functions that approximately steer the function over a range of parameters. Examples of both applications are described. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/96/33/CS-TN-96-33.pdf %R CS-TN-96-35 %Z Mon, 10 Jun 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Calculus for Concurrent Objects %A DiBlasio, Paolo %A Fisher, Kathleen %D June 1996 %X This paper presents an imperative and concurrent extension of the functional object-oriented calculus described in [FHM94]. It belongs to the family of so-called prototype-based object-oriented languages, in which objects are created from existing ones via the inheritance primitives of object extension and method override. Concurrency is introduced through the identification of objects and processes. To our knowledge, the resulting calculus is the first concurrent object calculus to be studied. We define an operational semantics for the calculus via a transition relation between configurations, which represent snapshots of the run-time system. Our static analysis includes a type inference system, which statically detects message-not-understood errors, and an effect system, which guarantees that synchronization code, specified via guards, is side-effect free. We present a subject reduction theorem, modified to account for imperative and concurrent features, and type and effect soundness theorems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/96/35/CS-TN-96-35.pdf %R CS-TN-96-36 %Z Thu, 13 Jun 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Efficient Snapshot Differential Algorithms for Data Warehousing %A Garcia-Molina, Hector %A Labio, Wilburt Juan %D June 1996 %X Detecting and extracting modifications from information sources is an integral part of data warehousing. For unsophisticated sources, in practice it is often necessary to infer modifications by periodically comparing snapshots of data from the source.
Although this {\em snapshot differential} problem is closely related to traditional joins and outerjoins, there are significant differences, which lead to simple new algorithms. In particular, we present algorithms that perform (possibly lossy) compression of records. We also present a {\em window} algorithm that works very well if the snapshots are not ``very different.'' The algorithms are studied via analysis and an implementation of two of them; the results illustrate the potential gains achievable with the new algorithms. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/96/36/CS-TN-96-36.pdf %R CS-TN-96-37 %Z Wed, 09 Oct 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T An Improved Lower Bound for Load Balancing of Tasks with Unknown Duration %A Plotkin, Serge %A Ma, Yuan %D October 1996 %X Suppose there are n servers and a sequence of tasks, each of which arrives in an on-line fashion and can be handled by a subset of the servers. The level of the service required by a task is known upon arrival, but the duration of the service is unknown. The on-line load balancing problem is to assign each task to an appropriate server so that the maximum load on the servers is minimized. The best known lower bound on the competitive ratio for this problem was sqrt(n). However, the argument used to prove this lower bound relied on a sequence of tasks with exponential duration, and therefore this lower bound does not preclude an algorithm with a competitive ratio that is polylogarithmic in T, the maximum task duration. In this paper we prove a lower bound of sqrt(T), thereby proving that a competitive ratio that is polylogarithmic in T is impossible. This should be compared to the analogous case for known-duration tasks, where it is possible to achieve a competitive ratio that is logarithmic in T. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/96/37/CS-TN-96-37.pdf %R CS-TN-96-38 %Z Tue, 17 Dec 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Intractability of Assembly Sequencing: Unit Disks in the Plane %A Goldwasser, Michael %A Motwani, Rajeev %D December 1996 %X We consider the problem of removing a given disk from a collection of unit disks in the plane. At each step, we allow a disk to be removed by a collision-free translation to infinity, and the goal is to access a given disk using as few steps as possible. Recently there has been a focus on optimizing assembly sequences over various cost measures, though with very limited algorithmic success. We explain this lack of success, proving strong inapproximability results in this simple geometric setting. These inapproximability results, to the best of our knowledge, are the strongest hardness results known for any purely combinatorial problem in a geometric setting. As a stepping stone, we study the approximability of scheduling with AND/OR precedence constraints. The Disks problem can be formulated as a scheduling problem where the order of removals is to be scheduled. Before scheduling a disk to be removed, a path must be cleared, and so we get precedence constraints on the tasks; however, the form of such constraints differs from traditional scheduling in that there is a choice of which path to clear.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/96/38/CS-TN-96-38.pdf %R CS-TN-96-39 %Z Wed, 18 Dec 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Complexity Measures for Assembly Sequences %A Goldwasser, Michael %A Motwani, Rajeev %D December 1996 %X Our work examines various complexity measures for two-handed assembly sequences. Although there has been a great deal of algorithmic success in finding feasible assembly sequences, there has been very little success towards optimizing the costs of sequences. We attempt to explain this lack of progress by proving the inherent difficulty in finding optimal, or even near-optimal, assembly sequences. We begin by introducing a formal framework for studying the optimization of several complexity measures. We consider a variety of different settings and natural cost measures for assembly sequences. We then define a graph-theoretic problem which is a generalization of assembly sequencing. For our virtual assembly sequencing problem we are able to use techniques common to the theory of approximability to prove the hardness of finding even near-optimal sequences for most cost measures in our generalized framework. Of course, hardness results in our generalized framework do not immediately carry over to the original geometric problems. We continue by realizing several of these hardness results in rather simple geometric settings, proving the difficulty of some of the original problems. These inapproximability results, to the best of our knowledge, are the strongest hardness results known for a purely combinatorial problem in a geometric setting. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/96/39/CS-TN-96-39.pdf %R CS-TN-97-40 %Z Tue, 28 Jan 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Content Ratings, and Other Third-Party Value-Added Information: Defining an Enabling Platform %A Roscheisen, Martin %A Winograd, Terry %A Paepcke, Andreas %D January 1997 %X This paper describes the ComMentor annotation architecture and its uses, with a specific emphasis on the content rating application. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/40/CS-TN-97-40.pdf %R CS-TN-97-41 %Z Mon, 03 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Reducing Initial Latency in a Multimedia Storage System %A Chang, Edward %A Garcia-Molina, Hector %D February 1997 %X A multimedia server delivers presentations (e.g., videos, movies, games), providing high bandwidth and continuous real-time delivery. In this paper we present techniques for reducing the initial latency of presentations, i.e., for reducing the time between the arrival of a request and the start of the presentation. Traditionally, initial latency has not received much attention. This is because one major application of multimedia servers is ``movies on demand,'' where a delay of a few minutes before a new multi-hour movie starts is acceptable. However, latency reduction is important in interactive applications such as playing of video games and browsing of multimedia documents. Latency reduction is also crucial for improving access performance for media data in a multimedia database system. Various latency reduction schemes are proposed and analyzed, and their performance compared. We show that our techniques can significantly reduce (almost eliminate in some cases) initial latency without adversely affecting throughput.
Moreover, a novel on-disk partial data replication scheme that we propose proves to be far more cost-effective than previous attempts at reducing initial latency. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/41/CS-TN-97-41.pdf %R CS-TN-97-42 %Z Mon, 03 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T From User Access Patterns to Dynamic Hypertext Linking %A Yan, Tak Woon %A Jacobsen, Matthew %A Garcia-Molina, Hector %A Dayal, Umeshwar %D February 1997 %X This paper describes an approach for automatically classifying visitors of a web site according to their access patterns. User access logs are examined to discover clusters of users that exhibit similar information needs; e.g., users that access similar pages. This may result in a better understanding of how users visit the site, and lead to an improved organization of the hypertext documents for navigational convenience. More interestingly, based on what categories an individual user falls into, we can dynamically suggest links for him to navigate. In this paper, we describe the overall design of a system that implements these ideas, and elaborate on the preprocessing, clustering, and dynamic link suggestion tasks. We present some experimental results generated by analyzing the access log of a web site. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/42/CS-TN-97-42.pdf %R CS-TN-97-43 %Z Wed, 05 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Report on the May 18-19 1995 IITA Digital Libraries Workshop: Final Draft for Participant Review, August 4, 1995 %A Lynch, Clifford %A Garcia-Molina, Hector %D February 1997 %X This report summarizes the outcomes of a workshop on Digital Libraries held under the auspices of the US Government's Information Infrastructure Technology and Applications (IITA) Working Group in Reston, Virginia on May 18-19, 1995. The objective of the workshop was to refine the research agenda for digital libraries with specific emphasis on scaling and interoperability and the infrastructure needed to enable this research agenda. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/43/CS-TN-97-43.pdf %R CS-TN-97-44 %Z Wed, 05 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Addressing Heterogeneity in the Networked Information Environment %A Baldonado, Michelle Q. Wang %A Cousins, Steve B. %D February 1997 %X Several ongoing Stanford University Digital Library projects address the issue of heterogeneity in networked information environments. A networked information environment has the following components: users, information repositories, information services, and payment mechanisms. This paper describes three of the heterogeneity-focused Stanford projects -- InfoBus, REACH, and DLITE. The InfoBus project is at the protocol level, while the REACH and DLITE projects are both at the conceptual model level. The InfoBus project provides the infrastructure necessary for accessing heterogeneous services and utilizing heterogeneous payment mechanisms. The REACH project sets forth a uniform conceptual model for finding information in networked information repositories. The DLITE project presents a general task-based strategy for building user interfaces to heterogeneous networked information services.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/44/CS-TN-97-44.pdf %R CS-TN-97-45 %Z Wed, 05 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Information Needs in Technical Work Settings and Their Implications for the Design of Computer Tools %A Paepcke, Andreas %D February 1997 %X We interviewed information workers in multiple technical areas of a large, diverse company, and we describe some of the unsatisfied information needs we observed during our study. Two clusters of issues are described. The first covers how loosely coupled work groups use and share information. We show the need to structure information for multiple, partly unanticipated uses. We show how the construction of information compounds helps users accomplish some of this restructuring, and we explain how structuring flexibility is also required because of temperamental differences among users. The second cluster of issues revolves around collections of tightly coupled work groups. We show that information shared within such groups differs from information shared across group boundaries. We present the barriers to sharing which we saw operating both within groups and outside, and we explain the function of resource and contact broker that evolved in the settings we examined. For each of these issues we propose implications for information tool design. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/45/CS-TN-97-45.pdf %R CS-TN-97-46 %Z Wed, 05 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Proposal for Basing our Protocols on a General Information Exchange Level %A Winograd, Terry %D February 1997 %X In order to build our protocols in a way that will provide for long-term growth and extensibility, we should define a standard level on top of the CORBA/ILU level to deal with the management of information across objects on different servers. This level is based on a generalization of the protocols we have been working on. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/46/CS-TN-97-46.pdf %R CS-TN-97-47 %Z Wed, 05 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Modes of Information Integration %A Winograd, Terry %D February 1997 %X In order to better understand the different approaches to digital libraries, I compared a number of existing and proposed systems and developed a taxonomy that can be used in identifying the different tradeoffs they make in the overall design space. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/47/CS-TN-97-47.pdf %R CS-TN-97-48 %Z Wed, 05 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Conceptual Models for Comparison of Digital Library Systems and Approaches %A Winograd, Terry %D February 1997 %X This document is a working paper that grew out of the discussions at the Digital Libraries joint projects meeting in Washington on Nov. 8-9, 1994. It is intended as a first rough cut at a conceptual framework for understanding the significant differences among systems and ideas, so that we can better decide where to work for interoperability and where to take complementary approaches.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/48/CS-TN-97-48.pdf %R CS-TN-97-49 %Z Wed, 05 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Why You Won't Be Buying and Selling Information Yourself %A Winograd, Terry %D February 1997 %X A large part of the economics of electronic publishing of library materials will be based on site licensing, not on per-use fees. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/49/CS-TN-97-49.pdf %R CS-TN-97-50 %Z Wed, 05 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Lightweight Objects for the Digital Library %A Winograd, Terry %D February 1997 %X We have been looking at the potential for integrating Xerox's GAIA system into the INFObus architecture. Ramana suggested we look at Xerox/Novell's Document Enhanced Networking (DEN) specification, which incorporates some of the GAIA ideas in a product. I went through the spec and had some realizations about what we are trying to do with the INFObus that I thought would be generally useful. The latter part of this message is a proposal for our own architecture. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/50/CS-TN-97-50.pdf %R CS-TN-97-51 %Z Wed, 05 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Proxy Is Where It's At! %A Winograd, Terry %D February 1997 %X Soon, more than 90% of access to the Internet will be through commercial proxies. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/51/CS-TN-97-51.pdf %R CS-TN-97-52 %Z Wed, 05 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T An Adaptive Agent for Automated Web Browsing %A Balabanovic, Marko %A Shoham, Yoav %A Yun, Yeogirl %D February 1997 %X The current exponential growth of the Internet precipitates a need for new tools to help people cope with the volume of information. To complement recent work on creating searchable indexes of the World-Wide Web and systems for filtering incoming e-mail and Usenet news articles, we describe a system which learns to browse the Internet on behalf of a user. Every day it presents a selection of interesting Web pages. The user evaluates each page, and given this feedback the system adapts and attempts to produce better pages the following day. After demonstrating that our system is able to learn a model of a user with a single well-defined interest, we present an initial experiment where over the course of 24 days the output of our system was compared to both randomly-selected and human-selected pages. It consistently performed better than the random pages, and was better than the human-selected pages half of the time. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/52/CS-TN-97-52.pdf %R CS-TN-97-53 %Z Wed, 05 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Communication Agreement Framework of Access Control %A Roscheisen, Martin %A Winograd, Terry %D February 1997 %X We introduce a framework of access control which shifts the emphasis from the participants to their relationship. The framework is based on a communication model in which participants negotiate the mutually agreed-upon boundary conditions of their relationship in compact "communication pacts," called "commpacts." Commpacts can be seen as a third fundamental type next to access-control lists (ACLs) and capabilities.
We argue that in current networked environments characterized by multiple authorities and "trusted proxies," this model provides an encapsulation for interdependent authorization policies, which reduces the negotiation complexity of general (user- and content-dependent) distributed access control and provides a clear user-conceptual metaphor; it also generalizes work in electronic contracting and embeds naturally into the existing legal and institutional infrastructure. The framework is intended to provide a language enabling a social mechanism of coordinated expectation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/53/CS-TN-97-53.pdf %R CS-TN-97-54 %Z Wed, 05 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Combining CORBA and the World-Wide Web in the Stanford Digital Library Project %A Paepcke, Andreas %A Hassan, Scott %D February 1997 %X Describes in 1.5 pages how SIDL combines CORBA and the WWW. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/54/CS-TN-97-54.pdf %R CS-TN-97-55 %Z Mon, 24 Mar 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Performance Evaluation of Centralized and Distributed Index Schemes for a Page Server OODBMS %A Basu, Julie %A Keller, Arthur M. %A Poess, Meikel %D March 1997 %X Recent work on client-server data-shipping OODBs has demonstrated the usefulness of local data caching at client sites. However, none of the studies has investigated index-related performance issues in particular. References to index pages arise from associative queries and from updates on indexed attributes, often making indexes the most heavily used "hot spots" in a database. System performance is therefore quite sensitive to the index management scheme. This paper examines the effects of index caching, and investigates two schemes, one centralized and the other distributed, for index page management in a page server OODB. In the centralized scheme, index pages are not allowed to be cached at client sites; thus, communication with the central server is required for all index-based queries and index updates. The distributed index management scheme supports inter-transaction caching of index pages at client sites, and enforces a distributed index consistency control protocol similar to that of data pages. We study via simulation the performance of these two index management schemes under several different workloads and contention profiles, and identify scenarios where each of the two schemes performs better than the other. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/55/CS-TN-97-55.pdf %R CS-TN-97-56 %Z Mon, 24 Mar 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Approximation Algorithms for Directed Steiner Tree Problems %A Charikar, Moses %A Chekuri, Chandra %A Goel, Ashish %A Guha, Sudipto %D March 1997 %X We obtain the first non-trivial approximation algorithms for the Steiner Tree problem and the Generalized Steiner Tree problem in general directed graphs. Essentially no approximation algorithms were known for these problems. For the Directed Steiner Tree problem, we design a family of algorithms which achieve an approximation ratio of O(k^\epsilon) in time O(kn^{1/\epsilon}) for any fixed \epsilon > 0, where k is the number of terminals to be connected. For the Directed Generalized Steiner Tree Problem, we give an algorithm which achieves an approximation ratio of O(k^{2/3}\log^{1/3} k), where k is the number of pairs to be connected.
Related problems including the Group Steiner tree problem, the Node Weighted Steiner tree problem and several others can be reduced in an approximation preserving fashion to the problems we solve, giving the first non-trivial approximations to those as well. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/56/CS-TN-97-56.pdf %R CS-TN-97-57 %Z Tue, 15 Jul 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Axiomatizing Flat Iteration %A Glabbeek, R.J. van %D April 1997 %X Flat iteration is a variation on the original binary version of the Kleene star operation P*Q, obtained by restricting the first argument to be a sum of atomic actions. It generalizes prefix iteration, in which the first argument is a single action. Complete finite equational axiomatizations are given for five notions of bisimulation congruence over basic CCS with flat iteration, viz. strong congruence, branching congruence, eta-congruence, delay congruence and weak congruence. Such axiomatizations were already known for prefix iteration and are known not to exist for general iteration. The use of flat iteration has two main advantages over prefix iteration: 1. The current axiomatizations generalize to full CCS, whereas the prefix iteration approach does not allow an elimination theorem for an asynchronous parallel composition operator. 2. The greater expressiveness of flat iteration allows for much shorter completeness proofs. In the setting of prefix iteration, the most convenient way to obtain the completeness theorems for eta-, delay, and weak congruence was by reduction to the completeness theorem for branching congruence. In the case of weak congruence this turned out to be much simpler than the only direct proof found. In the setting of flat iteration on the other hand, the completeness theorems for delay and weak (but not eta-) congruence can equally well be obtained by reduction to the one for strong congruence, without using branching congruence as an intermediate step. Moreover, the completeness results for prefix iteration can be retrieved from those for flat iteration, thus obtaining a second indirect approach for proving completeness for delay and weak congruence in the setting of prefix iteration. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/57/CS-TN-97-57.pdf %R CS-TN-97-60 %Z Tue, 07 Oct 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Efficient Linear Re-rendering for Interactive Lighting Design %A Teo, Patrick C. %A Simoncelli, Eero P. %A Heeger, David J. %D October 1997 %X We present a framework for interactive lighting design based on linear re-rendering. The rendering operation is linear with respect to light sources, assuming a fixed scene and camera geometry. This linearity means that a scene may be interactively re-rendered via linear combination of a set of basis images, each rendered under a particular basis light. We focus on choosing and designing a suitable set of basis lights. We provide examples of bases that allow 1) interactive adjustment of a spotlight direction, 2) interactive adjustment of the position of an area light, and 3) a combination in which light sources are adjusted in both position and direction. We discuss a method for reducing the size of the basis using principal components analysis in the image domain. 
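To make the basis-image idea of CS-TN-97-60 above concrete: once the basis images are rendered offline, re-rendering under a new light configuration is just a weighted sum. A minimal sketch in Python, assuming NumPy; the array shapes and weights are hypothetical stand-ins, not the authors' implementation:

import numpy as np

def rerender(basis_images, weights):
    # Rendering is linear in the lights for a fixed scene and camera, so a
    # new lighting configuration is a weighted sum of pre-rendered images.
    # basis_images: (n_lights, H, W, 3); weights: (n_lights,)
    return np.tensordot(weights, basis_images, axes=1)

basis = np.random.rand(16, 480, 640, 3)   # stand-ins for offline renders
weights = np.zeros(16)
weights[3] = 1.5                          # interactively brighten one basis light
frame = rerender(basis, weights)          # (480, 640, 3) re-rendered image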
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/60/CS-TN-97-60.pdf %R CS-TN-97-61 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Performance Analysis of an Associative Caching Scheme for Client-Server Databases %A Basu, Julie %A Poess, Meikel %A Keller, Arthur M. %D September 1997 %X This paper presents a detailed performance study of the associative caching scheme proposed in "A Predicate-based Caching Scheme for Client-Server Database Architectures," The VLDB Journal, Jan 1996. A client cache dynamically loads query results in the course of transaction execution, and formulates a description of its current contents. Predicate-based reasoning is used on the cache description to examine and maintain the cache. The benefits of the scheme include local evaluation of associative queries, at the cost of maintaining the cached query results through update notifications from the server. In this paper, we investigate through detailed simulation the behavior of this caching scheme for a client-server database under different workloads and contention profiles. An optimized version of our basic caching scheme is also proposed and studied. We examine both read-only and update transactions, with the effect of updates on the caching performance as our primary focus. Using an extended version of a standard database benchmark, we identify scenarios where these caching schemes improve the system performance and scalability, as compared to systems without client-side caching. Our results demonstrate that associative caching can be beneficial even for moderately high update activity. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/61/CS-TN-97-61.pdf %R CS-TN-97-59 %Z Thu, 18 Sep 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Stability of Networks and Protocols in the Adversarial Queueing Model for Packet Routing %A Goel, Ashish %D September 1997 %X The adversarial queueing theory model for packet routing was suggested by Borodin et al. We give a complete and simple characterization of all networks that are universally stable in this model. We show that a specific greedy protocol, SIS (Shortest In System), is stable against a large class of stochastic adversaries. New applications such as multicast packet scheduling and job scheduling with precedence constraints are suggested for the adversarial model. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/59/CS-TN-97-59.pdf %R CS-TN-97-58 %Z Mon, 18 Aug 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Precedence Constrained Scheduling to Minimize Weighted Completion Time on a Single Machine. %A Chekuri, Chandra %A Motwani, Rajeev %D August 1997 %X We consider the problem of scheduling a set of jobs on a single machine with the objective of minimizing weighted (average) completion time. The problem is NP-hard when there are precedence constraints between jobs [12], and we provide a simple and efficient combinatorial 2-approximation algorithm. In contrast to our work, earlier approximation algorithms [9] achieving the same ratio are based on solving a linear programming relaxation of the problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/97/58/CS-TN-97-58.pdf %R CS-TN-98-62 %Z Tue, 21 Apr 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Type System for Object Initialization in the Java Bytecode Language %A Freund, Stephen N. %A Mitchell, John C.
%D April 1998 %X In the standard Java implementation, a Java language program is compiled to Java bytecode. This bytecode may be sent across the network to another site, where it is then interpreted by the Java Virtual Machine. Since bytecode may be written by hand, or corrupted during network transmission, the Java Virtual Machine contains a bytecode verifier that performs a number of consistency checks before code is interpreted. As illustrated by previous attacks on the Java Virtual Machine, these tests, which include type correctness, are critical for system security. In order to analyze existing bytecode verifiers and to understand the properties that should be verified, we develop a precise specification of statically-correct Java bytecode, in the form of a type system. Our focus in this paper is a subset of the bytecode language dealing with object creation and initialization. For this subset, we prove that for every Java bytecode program that satisfies our typing constraints, every object is initialized before it is used. The type system is easily combined with a previous system developed by Stata and Abadi for bytecode subroutines. Our analysis of subroutines and object initialization reveals a previously unpublished bug in the Sun JDK bytecode verifier. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/62/CS-TN-98-62.pdf %R CS-TN-98-63 %Z Fri, 15 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Collaborative value filtering on the Web %A Rodriguez-Mula, Gerard %A Garcia-Molina, Hector %A Paepcke, Andreas %D May 1998 %X Today's Internet search engines help users locate information based on the textual similarity of a query and potential documents. Given the large number of documents available, the user often finds too many documents, and even if the textual similarity is high, in many cases the matching documents are not relevant or of interest. Our goal is to explore other ways to decide if documents are "of value" to the user, i.e., to perform what we call "value filtering." In particular, we would like to capture access information that may tell us, within limits of privacy concerns, which user groups are accessing what data, and how frequently. This information can then guide users, for example, helping identify information that is popular, or that may have helped others before. This is a type of collaborative filtering or community-based navigation. Access information can either be gathered by the servers that provide the information, or by the clients themselves. Tracing accesses at servers is simple, but often information providers are not willing to share this information. We therefore are exploring client-side gathering. Companies like Alexa are currently using client gathering in the large. We are studying client gathering at a much smaller scale, where a small community of users with shared interest collectively track their information accesses. For this, we have developed a proxy system called the Knowledge Sharing System (KSS) that monitors the behavior of a community of users. Through this system we hope to: 1. Develop mechanisms for sharing browsing expertise among a community of users; and 2. Better understand the access patterns of a group of people with common interests, and develop good schemes for sharing this information.
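At its core, the client-side gathering that CS-TN-98-63 describes reduces to keeping shared access counts for a community. A minimal sketch with hypothetical names (this is not the KSS implementation):

from collections import Counter, defaultdict

class AccessLog:
    # Toy community tracker: group -> Counter of URL access frequencies.
    def __init__(self):
        self.counts = defaultdict(Counter)
    def record(self, group, url):
        self.counts[group][url] += 1
    def popular(self, group, n=10):
        # Pages most visited by this community: candidates "of value".
        return self.counts[group].most_common(n)

log = AccessLog()
log.record("dlib-readers", "http://www-diglib.stanford.edu/")
print(log.popular("dlib-readers"))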
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/63/CS-TN-98-63.pdf %R CS-TN-98-64 %Z Fri, 15 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Standard Textual Interchange Format for the Object Exchange Model (OEM) %A Goldman, Roy %A Chawathe, Sudarshan %A Crespo, Arturo %A McHugh, Jason %D May 1998 %X The Object Exchange Model (OEM) serves as the basic data model in numerous projects of the Stanford University Database Group, including Tsimmis, Lore and C. This document first defines and explains the model, and then it describes a syntax for textually encoding OEM. By adopting this syntax as a standard across all of our OEM projects, we hope to encourage interoperability and also to provide a consistent view of OEM to interested parties outside Stanford. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/64/CS-TN-98-64.pdf %R CS-TN-98-65 %Z Fri, 15 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Responsive Interaction for a Large Web Application: The Meteor Shower Architecture in the WebWriter II Editor %A Crespo, Arturo %A Chang, Bay-Wei %A Bier, Eric A. %D May 1998 %X Traditional server-based web applications allow access to server-hosted resources, but often exhibit poor responsiveness due to server load and network delays. Client-side web applications, on the other hand, provide excellent interactivity at the expense of limited access to server resources. The WebWriter II Editor, a direct manipulation HTML editor that runs in a web browser, uses both server-side and client-side processing in order to achieve the advantages of both. In particular, this editor downloads the document data structure to the browser and performs all operations locally. The user interface is based on HTML frames and includes individual frames for previewing the document and displaying general and specific control panels. All editing is done by JavaScript code residing in roughly twenty HTML pages that are downloaded into these frames as needed. Such a client-server architecture, based on frames, client-side data structures, and multiple JavaScript-enhanced HTML pages, appears promising for a wide variety of applications. This paper describes this architecture, the Meteor Shower Application Architecture, and its use in the WebWriter II Editor. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/65/CS-TN-98-65.pdf %R CS-TN-98-66 %Z Fri, 15 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Archival Storage for Digital Libraries %A Crespo, Arturo %A Garcia-Molina, Hector %D May 1998 %X We propose an architecture for Digital Library Repositories that assures long-term archival storage of digital objects. The architecture is formed by a federation of independent but collaborating sites, each managing a collection of digital objects. The architecture is based on the following key components: use of signatures as object handles, no deletions of digital objects, functional layering of services, the presence of an awareness service in all layers, and use of disposable auxiliary structures. Long-term persistence of digital objects is achieved by creating replicas at several sites.
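The use of signatures as object handles in CS-TN-98-66 is in essence content addressing: the handle is computed from the object's bytes, so independent sites derive the same name for the same object and can detect corruption on fetch. A minimal sketch, with SHA-256 standing in for whatever signature function the repository actually uses:

import hashlib

store = {}  # handle -> object bytes; objects are never deleted

def deposit(data: bytes) -> str:
    handle = hashlib.sha256(data).hexdigest()
    store.setdefault(handle, data)   # idempotent; no deletions by design
    return handle

def fetch(handle: str) -> bytes:
    data = store[handle]
    assert hashlib.sha256(data).hexdigest() == handle  # detect corruption
    return data

h = deposit(b"a digital object")
assert fetch(h) == b"a digital object"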
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/66/CS-TN-98-66.pdf %R CS-TN-98-67 %Z Fri, 15 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T A GUI-Based Version of the SenseMaker Interface for Information Exploration %A Baldonado, Michelle Q Wang %A Winograd, Terry %D May 1998 %X SenseMaker is an interface for information exploration. The original HTML version of the interface relied on tables for display and forms for interaction. The new Java version is GUI-based. This video illustrates the new SenseMaker interface by presenting a hypothetical scenario of a user carrying out an information-exploration task. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/67/CS-TN-98-67.pdf %R CS-TN-98-68 %Z Fri, 15 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Presenting HTML Structure in Audio: User Satisfaction with Audio Hypertext %A James, Frankie %D May 1998 %X This paper discusses the results of a 2 by 4 mixed design experiment testing various ways of presenting HTML structures in audio. Four interface styles were tested: (1) one speaker, minimal sound effects, (2) one speaker, many sound effects, (3) many speakers, minimal sound effects, and (4) many speakers, many sound effects. The results obtained were both specific to the interfaces used (i.e., that the use of three different speakers to present heading levels was confusing) and more general (for example, natural sounds are more distinguishable and easier to remember than tones). A short discussion of typical HTML usage is also presented. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/68/CS-TN-98-68.pdf %R CS-TN-98-69 %Z Fri, 15 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Distinguishability vs. Distraction in Audio HTML Interfaces %A James, Frankie %D May 1998 %X In this paper, we present the findings and conclusions from a user study on audio interfaces. In the experiment we discuss, we studied a framework for choosing sounds for audio interfaces by comparing a prototype interface against two existing audio browsers. Our findings indicate that our initial framework, which was described as a separation between recognizable and non-recognizable sounds, could be better interpreted in the context of the distinguishability and distraction level of various types of sounds. We propose a new definition of how a sound can be called distracting and how to avoid this when creating audio interfaces. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/69/CS-TN-98-69.pdf %R CS-TN-98-70 %Z Mon, 18 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Merging Ranks from Heterogeneous Internet Sources %A Garcia-Molina, Hector %A Gravano, Luis %D May 1998 %X Many sources on the Internet and elsewhere rank the objects in query results according to how well these objects match the original query. For example, a real-estate agent might rank the available houses according to how well they match the user's preferred location and price. In this environment, ``meta-brokers'' usually query multiple autonomous, heterogeneous sources that might use varying result-ranking strategies. A crucial problem that a meta-broker then faces is extracting from the underlying sources the top objects for a user query according to the meta-broker's ranking function. This problem is challenging because these top objects might not be ranked high by the sources where they appear.
In this paper we discuss strategies for solving this ``meta-ranking'' problem. In particular, we present a condition that a source must satisfy so that a meta-broker can extract the top objects for a query from the source without examining its entire contents. Not only is this condition necessary but it is also sufficient, and we show an efficient algorithm to extract the top objects from sources that satisfy the given condition. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/70/CS-TN-98-70.pdf %R CS-TN-98-71 %Z Mon, 18 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Stanford Digital Library Interoperability Protocol %A Hassan, Scott %A Paepcke, Andreas %D May 1998 %X Description of Stanford's interoperability protocol for interacting with search related proxy objects on the InfoBus. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/71/CS-TN-98-71.pdf %R CS-TN-98-72 %Z Mon, 18 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Stanford InfoBus and Its Service Layers: Augmenting the Internet with Higher-Level Information Management Protocols %A Roscheisen, Martin %A Baldonado, Michelle %A Chang, Kevin %A Gravano, Luis %A Ketchpel, Steven %A Paepcke, Andreas %D May 1998 %X The Stanford InfoBus is a prototype infrastructure developed as part of the Stanford Digital Libraries Project to extend the current Internet protocols with a suite of higher-level information management protocols. This paper surveys the five service layers provided by the Stanford InfoBus: protocols for managing items and collections (DLIOP), metadata (SMA), search (STARTS), payment (UPAI), and rights and obligations (FIRM). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/72/CS-TN-98-72.pdf %R CS-TN-98-73 %Z Mon, 18 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Proposal for I**3 Client Server Protocol %A Garcia-Molina, Hector %A Paepcke, Andreas %D May 1998 %X This document proposes a CORBA-based protocol for submitting queries to servers and for obtaining the results. It is a subset of the Stanford Digital Library Interoperability protocol (DLIOP). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/73/CS-TN-98-73.pdf %R CS-TN-98-74 %Z Mon, 18 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Predicate Rewriting for Translating Boolean Queries in a Heterogeneous Information System %A Chang, Chen-Chuan K. %A Garcia-Molina, Hector %A Paepcke, Andreas %D May 1998 %X Searching over heterogeneous information sources is difficult in part because of the non-uniform query languages. Our approach is to allow users to compose Boolean queries in one rich front-end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a filter query to yield the correct final results. In this paper we introduce the architecture and associated mechanism for query translation. In particular, we discuss techniques for rewriting predicates in Boolean queries into native subsuming forms, which is a basis of translating complex queries. In addition, we present experimental results for evaluating the cost of post-filtering. We also discuss the drawbacks of this approach and cases when it may not be effective. We have implemented prototype versions of these mechanisms and demonstrated them on heterogeneous Boolean systems.
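The translate-then-filter mechanism in CS-TN-98-74 can be sketched in a few lines: send the source a broader query it can support, then restore exact semantics locally. The source, documents, and predicates below are hypothetical:

class WordSource:
    # Stand-in for a Boolean service supporting only single-word title search.
    def __init__(self, docs):
        self.docs = docs
    def search(self, word):
        return [d for d in self.docs if word in d["title"].lower()]

def search_with_filter(source, native_word, exact_pred):
    candidates = source.search(native_word)          # subsuming native query
    return [d for d in candidates if exact_pred(d)]  # local post-filter

docs = [{"title": "Digital Library Interoperability"},
        {"title": "Digital Signal Processing"}]
src = WordSource(docs)
phrase = lambda d: "digital library" in d["title"].lower()  # not supported natively
print(search_with_filter(src, "digital", phrase))  # returns only the first document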
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/74/CS-TN-98-74.pdf %R CS-TN-98-75 %Z Mon, 18 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T An Extensible Constructor Tool for the Rapid, Interactive Design of Query Synthesizers %A Baldonado, Michelle %A Katz, Seth %A Paepcke, Andreas %A Chang, Chen-Chuan K. %A Garcia-Molina, Hector %A Winograd, Terry %D May 1998 %X We describe an extensible constructor tool that helps information experts (e.g., librarians) create specialized query synthesizers for heterogeneous digital-library environments. A query synthesizer provides a graphical user interface in which a digital-library patron can specify a high-level, fielded, multi-source query. Furthermore, a query synthesizer interacts with a query translator and an attribute translator to transform high-level queries into sets of source-specific queries. We discuss how the constructor can facilitate discovery of available attributes (e.g., title), collation of schemas from different sources, selection of input widgets for a synthesizer (e.g., a text box or a drop-down list widget to support input of controlled vocabulary), and other design aspects. We also describe a prototype constructor we implemented, based on the Stanford InfoBus and metadata architecture. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/75/CS-TN-98-75.pdf %R CS-TN-98-76 %Z Mon, 18 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Interoperability for Digital Libraries Worldwide %A Paepcke, Andreas %A Chang, Chen-Chuan K. %A Garcia-Molina, Hector %A Winograd, Terry %D May 1998 %X Discusses the history and current directions of interoperability in different parts of computing systems relevant to Digital Libraries. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/76/CS-TN-98-76.pdf %R CS-TN-98-78 %Z Tue, 07 Jul 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Probabilistic Poly-time Framework for Protocol Analysis %A Lincoln, P. %A Mitchell, J. %A Mitchell, M. %A Scedrov, A. %D April 3, 1998 %X We develop a framework for analyzing security protocols in which protocol adversaries may be arbitrary probabilistic polynomial-time processes. In this framework, protocols are written in a restricted form of pi-calculus and security may be expressed as a form of observational equivalence, a standard relation from programming language theory that involves quantifying over possible environments that might interact with the protocol. Using an asymptotic notion of probabilistic equivalence, we relate observational equivalence to polynomial-time statistical tests and discuss some example protocols to illustrate the potential strengths of our approach. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/78/CS-TN-98-78.pdf %R CS-TN-98-77 %Z Tue, 07 Jul 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Linguistic Characterization of Bounded Oracle Computation and Probabilistic Polynomial Time %A Mitchell, J. %A Mitchell, M. %A Scedrov, A. %D May 4, 1998 %X We present a higher-order functional notation for polynomial-time computation with an arbitrary 0,1-valued oracle. This provides a linguistic characterization for classes such as NP and BPP, as well as a notation for probabilistic polynomial-time functions. The language is derived from Hofmann's adaptation of Bellantoni-Cook safe recursion, extended to oracle computation via work derived from that of Kapron and Cook.
Like Hofmann's language, ours is an applied version of typed lambda calculus with complexity bounds enforced by a type system. The type system uses a modal operator to distinguish between two types of numerical expressions, only one of which is allowed in recursion indices. The proof that the language captures precisely oracle polynomial time is model-theoretic, using adaptations of various techniques from category theory. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/77/CS-TN-98-77.pdf %R CS-TN-98-79 %Z Wed, 22 Jul 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T 2D BubbleUp: Managing Parallel Disks for Media Servers %A Chang, Edward %A Garcia-Molina, Hector %A Li, Chen %D July 1998 %X In this study we present a scheme called two-dimensional BubbleUp (2DB) for managing parallel disks in a multimedia server. Its goal is to reduce initial latency for interactive multimedia applications, while balancing disk loads to maintain high throughput. The 2DB scheme consists of a data placement and a request scheduling policy. The data placement policy replicates frequently accessed data and places them cyclically throughout the disks. The request scheduling policy attempts to maintain free ``service slots'' in the immediate future. These slots can then be used to quickly service newly arrived requests. Through examples and simulation, we show that our scheme significantly reduces initial latency and maintains throughput comparable to that of the traditional schemes. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/79/CS-TN-98-79.pdf %R CS-TN-98-80 %Z Wed, 22 Jul 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T MEDIC: A Memory & Disk Cache for Multimedia Clients %A Chang, Edward %A Garcia-Molina, Hector %D July 1998 %X In this paper we propose an integrated memory and disk cache for a multimedia client. The cache cushions the multimedia decoder from input rate fluctuations and mismatches, and because data can be cached to disk, the acceptable fluctuations can be very large. This gives the media server much greater flexibility for load balancing, and lets the client operate efficiently when the network rate is much larger or smaller than the media display rate. We analyze the memory requirements for this cache, and analytically derive safe values for its control parameters. Using a realistic case study, we study the interaction between memory size, peak input rate, and disk performance, and show that a relatively modest amount of main memory can support a wide range of scenarios. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/80/CS-TN-98-80.pdf %R CS-TN-98-81 %Z Thu, 23 Jul 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T How to build a DLITE component %A Cousins, Steve B. %D July 1998 %X This paper describes how to build a DLITE component. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/81/CS-TN-98-81.pdf %R CS-TN-98-82 %Z Thu, 23 Jul 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Minimizing Memory Requirements in Media Servers %A Chang, Edward %A Chen, Yi-Yin %D July 1998 %X Poor memory management policies lead to lower throughput and excessive memory requirements. This problem is aggravated in multimedia databases by the large volume and real-time data requirements. This study explores the temporal and spatial relationships among concurrent media streams. 
Specifically, we propose adding proper delays to space out IOs in a media server to give more room for buffer sharing among streams. Memory requirements can be reduced by trading time for space. We present and prove theorems that state the optimal IO schedules for reducing memory requirements for two cases: streams with the same required display rate and different display rates. We also show how the theorems can be put in practice to improve system performance. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/82/CS-TN-98-82.pdf %R CS-TN-98-83 %Z Thu, 23 Jul 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Stanford DLITE User Study %A Mortensen, Mark %D July 1998 %X User tests were conducted on the DLITE digital workspace. These consisted of observed use of the DLITE system, followed by an interview with the test administrator. The tests themselves were carried out both remotely over a network, and locally in the digital libraries lab on subjects with moderate computer knowledge. Initial tests resulted in system failures that caused DLITE to crash or become totally unusable. In subsequent tests, users noted a number of areas of DLITE that caused confusion, in particular the instantiation of queries, the purpose and functioning of the graphics in the upper-left corner of objects, and the obscuring of objects in the workspace when dragging large components. Given the reactions of users in the post-test interview, these problems do not appear to be flaws in the design of DLITE, but implementation errors not intrinsic to the model upon which the functionality is based. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/83/CS-TN-98-83.pdf %R CS-TN-98-84 %Z Thu, 23 Jul 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Lessons from Developing Audio HTML Interfaces %A James, Frankie %D July 1998 %X In this paper, we discuss our previous research on the establishment of guidelines and principles for choosing sounds to use in an audio interface to HTML, called the AHA framework. These principles, along with issues related to the target audience such as user tasks, goals, and interests are factors that can help us to choose specific sounds for the interface. We conclude by describing scenarios of two potential users and the interfaces that would seem to be appropriate for them. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/84/CS-TN-98-84.pdf %R CS-TN-98-85 %Z Thu, 23 Jul 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Conjunctive Constraint Mapping for Data Translation %A Chang, Chen-Chuan K. %A Garcia-Molina, Hector %D July 1998 %X In this paper we present a mechanism for translating information in heterogeneous digital library environments. We model information as a set of conjunctive constraints that are satisfied by real-world objects (e.g., documents, their metadata). Through application of semantic rules and value transformation functions, constraints are mapped into ones understood and supported in another context. Our machinery can also deal with hierarchically structured information. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/85/CS-TN-98-85.pdf %R CS-TN-98-86 %Z Mon, 14 Sep 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Earth Mover's Distance as a Metric for Image Retrieval %A Rubner, Yossi %A Tomasi, Carlo %A Guibas, Leonidas J. %D September 1998 %X We introduce a metric between two distributions that we call the Earth Mover's Distance (EMD).
The EMD is based on the minimal cost that must be paid to transform one distribution into the other, in a precise sense. We show that the EMD has attractive properties for content-based image retrieval. The most important one, as we show, is that it matches perceptual similarity better than other distances used for image retrieval. The EMD is based on a solution to the transportation problem from linear optimization, for which efficient algorithms are available, and also allows naturally for partial matching. It is more robust than histogram matching techniques, in that it can operate on variable-length representations of the distributions that avoid quantization and other binning problems typical of histograms. When used to compare distributions with the same overall mass, the EMD is a true metric. In this paper we focus on applications to color and texture, and we compare the retrieval performance of the EMD with that of other distances. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/86/CS-TN-98-86.pdf %R CS-TN-98-87 %Z Mon, 14 Dec 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Scheduling Algebra %A Glabbeek, R.J. van %A Rittgen, P. %D December 1998 %X The goal of this paper is to develop an algebraic theory of process scheduling. We specify a syntax for denoting processes composed of actions with given durations. Subsequently, we propose axioms for transforming any specification term of a scheduling problem into a term of all valid schedules. Here a schedule is a process in which all (implementational) choices (e.g. precise timing) are resolved. In particular, we axiomatize an operator restricting attention to the efficient schedules. These schedules turn out to be representable as trees, because in an efficient schedule actions start only at time zero or when a resource is released, i.e. upon termination of the action binding a required resource. All further delay would be useless. Nevertheless, we do not consider resource constraints explicitly here. We show that a normal form exists for every term of the algebra and establish soundness of our axiom system with respect to a schedule semantics, as well as completeness for efficient processes. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/98/87/CS-TN-98-87.pdf %R CS-TN-99-88 %Z Mon, 12 Jul 99 00:00:00 GMT %I Stanford University, Department of Computer Science %T Truth Revelation in Rapid, Approximately Efficient Combinatorial Auctions %A Lehmann, Daniel %A O'Callaghan, Liadan Ita %A Shoham, Yoav %D July 1999 %X Some important classical mechanisms considered in Microeconomics and Game Theory require the solution of a difficult optimization problem. This is true of mechanisms for combinatorial auctions, which have in recent years assumed practical importance, and in particular of the gold standard for combinatorial auctions, the Generalized Vickrey Auction (GVA). Traditional analysis of these mechanisms - in particular, their truth revelation properties - assumes that the optimization problems are solved precisely. In reality, these optimization problems can usually be solved only in an approximate fashion. We investigate the impact on such mechanisms of replacing exact solutions by approximate ones. Specifically, we look at a particular greedy optimization method, which has empirically been shown to perform well. We show that the GVA payment scheme does not provide for a truth revealing mechanism. We introduce another scheme that does guarantee truthfulness for a restricted class of players. 
We demonstrate the latter property by identifying sufficient conditions for a combinatorial auction to be truth-revealing, conditions which have applicability beyond the specific auction studied here. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/99/88/CS-TN-99-88.pdf %R CS-TN-99-89 %Z Wed, 28 Jul 99 00:00:00 GMT %I Stanford University, Department of Computer Science %T Extending Greedy Multicast Routing to Delay Sensitive Applications %A Goel, Ashish %A Munagala, Kamesh %D July 1999 %X Given a weighted undirected graph G(V,E) and a subset R of V, a Steiner tree is a subtree of G that contains each vertex in R. We present an online algorithm for finding a Steiner tree that simultaneously approximates the shortest path tree and the minimum weight Steiner tree, when the vertices in the set R are revealed in an online fashion. This problem arises naturally while trying to construct source-based multicast trees of low cost and good delay. The cost of the tree we construct is within an O(log |R|) factor of the optimal cost, and the path length from the root to any terminal is at most O(1) times the shortest path length. The algorithm needs to perform at most one reroute for each node in the tree. Our algorithm extends the results of Khuller et al. and Awerbuch et al., who looked at the offline problem. We conduct extensive simulations to compare the performance of our algorithm (in terms of cost and delay) with that of two popular multicast routing strategies: shortest path trees and the online greedy Steiner tree algorithm. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/99/89/CS-TN-99-89.pdf %R CS-TN-99-90 %Z Fri, 20 Aug 99 00:00:00 GMT %I Stanford University, Department of Computer Science %T Simulation of Iterative Matching for Combined Input and Output Queueing %A Pichai, Srinivasan %A Mudulodu, Sriram %D August 1999 %X Since its introduction, the Stable Marriage problem has been a subject of interest in mathematics and computer science. Recently, the associated matching algorithm has found application in the area of switch scheduling algorithms for high performance switches. Input and output ports of the switch compute their preference lists based on expected departure times for an ideal output queued switch. The stable matching computed by the Gale-Shapley algorithm for this set of preferences determines the configuration of the interconnection fabric. The nature of the stable matching enables the emulation of an output-queued switch with combined input and output queueing using a speedup factor of 2. However, it is important to compute the stable match efficiently for high performance. Hence, parallel iterative versions of the algorithm have been proposed. In this report we investigate the convergence time of the parallel stable matching algorithm. The definition of the preference lists imposes special constraints on the problem and this reduces the worst case complexity of the algorithm. Simulations have shown that convergence time for the average case is also considerably lower than for the general version of the algorithm. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/99/90/CS-TN-99-90.pdf %R CS-TN-00-92 %Z Mon, 28 Feb 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Cost-Distance: Two Metric Network Design %A Meyerson, Adam %A Munagala, Kamesh %A Plotkin, Serge %D February 2000 %X We present the Cost-Distance problem: finding a Steiner tree which optimizes the sum of edge costs along one metric and the sum of source-sink distances along an unrelated second metric.
We give the first known O(log k) randomized approximation scheme for Cost-Distance, where k is the number of sources. We reduce many common network design problems to Cost-Distance, obtaining (in some cases) the first known logarithmic approximation for them. These problems include single-sink buy-at-bulk with variable pipe types between different sets of nodes, and facility location with buy-at-bulk type costs on edges. Our algorithm is also the algorithm of choice for several previous network design problems, due to its ease of implementation and fast running time. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/00/92/CS-TN-00-92.pdf %R CS-TN-00-95 %Z Mon, 15 May 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Facility Location with Demand Dependent Costs and Generalized Clustering %A Guha, Sudipto %A Meyerson, Adam %A Munagala, Kamesh %D May 2000 %X We solve a variant of the facility location problem in which the cost of a facility depends on, and more specifically decreases with, the demand served. We show an application of this problem to generalized clustering problems that do not penalize large clusters. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/00/95/CS-TN-00-95.pdf %R CS-TN-00-96 %Z Mon, 05 Jun 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Improved Combinatorial Algorithms for Single Sink Edge Installation Problems. %A Guha, Sudipto %A Meyerson, Adam %A Munagala, Kamesh %D June 2000 %X We present the first constant approximation to the single sink buy-at-bulk network design problem, where we have to design a network by buying pipes of different costs and capacities per unit length to route demands at a set of sources to a single sink. The distances in the underlying network form a metric. This result improves the previous bound of log |S|, where S is the set of sources. We also present an improved constant approximation to the related Access Network Design problem. Our algorithms are randomized and fully combinatorial. They can be derandomized easily at the cost of a constant factor loss in the approximation ratio. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/00/96/CS-TN-00-96.pdf %R CS-TN-00-94 %Z Mon, 15 May 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Web Caching using Access Statistics %A Meyerson, Adam %A Munagala, Kamesh %A Plotkin, Serge %D May 2000 %X We present the problem of caching web pages under the assumption that each user has a fixed, known demand vector for the pages. Such demands could be computed using access statistics. We wish to place web pages in the caches in order to optimize the latency from user to page, under the constraints that each cache has limited memory and can support a limited total number of requests. When C caches are present with fixed locations, we present a constant factor approximation to the latency while exceeding capacity constraints by log C. We improve this result to a constant factor provided no replication of web pages is allowed. We present a constant factor approximation where the goal is to minimize the maximum latency. We also consider the case where we can place our own caches in the network for a cost, and produce a constant approximation to the sum of cache cost plus weighted average latency. Finally, we extend our results to incorporate page update latency, temporal variation in request rates, and economies of scale in cache costs.
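The objective that CS-TN-00-94 optimizes is easy to state in code, which also clarifies what the capacity constraints act on; the paper's algorithms themselves (not shown here) are approximation algorithms, and all names and data below are hypothetical:

def placement_cost(demand, latency, assignment):
    # Total demand-weighted latency of a cache placement.
    # demand[u][p]: user u's request rate for page p
    # latency[u][c]: latency from user u to cache c
    # assignment[p]: cache holding page p (no replication, for simplicity)
    return sum(rate * latency[u][assignment[p]]
               for u, pages in demand.items()
               for p, rate in pages.items())

demand = {"u1": {"pageA": 5, "pageB": 1}, "u2": {"pageA": 2}}
latency = {"u1": {"c1": 1, "c2": 4}, "u2": {"c1": 3, "c2": 1}}
print(placement_cost(demand, latency, {"pageA": "c1", "pageB": "c2"}))  # 5 + 4 + 6 = 15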
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/00/94/CS-TN-00-94.pdf %R CS-TN-00-93 %Z Mon, 15 May 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Hierarchical Placement and Network Design Problems. %A Guha, Sudipto %A Meyerson, Adam %A Munagala, Kamesh %D May 2000 %X In this paper, we give the first constant-approximations for a number of layered network design problems. We begin by modeling hierarchical caching, where caches are placed in layers and each layer satisfies a fixed percentage of the demand (bounded miss rates). We present a constant approximation to the minimum total cost of placing the caches and routing demand through the layers. We extend this model to cover more general layered caching scenarios, giving the first constant approximation to the well studied multi-level facility location problem. We consider a facility location variant, the Load Balanced Facility Location problem in which every demand is served by a unique facility and each open facility must serve at least a certain amount of demand. By combining Load Balanced Facility Location with our results on hierarchical caching, we give the first constant approximation for the Access Network Design problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/00/93/CS-TN-00-93.pdf %R CS-TN-00-97 %Z Fri, 28 Jul 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Hardware Support for Tamper-Resistant and Copy-Resistant Software %A Boneh, Dan %A Lie, David %A Lincoln, Pat %A Mitchell, John %A Mitchell, Mark %D July 2000 %X Although there have been many attempts to develop code transformations that yield tamper-resistant software, no reliable software-only methods are known. Motivated by numerous potential applications, we investigate a prototype hardware mechanism that supports software tamper-resistance with an atomic decrypt-and-execute operation. Our hardware architecture uses a novel combination of standard architectural units. As usual, security has its costs. In this design, the most difficult security tradeoffs involve testability and performance. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tn/00/97/CS-TN-00-97.pdf %R CS-TR-92-1432 %Z Thu, 28 Oct 93 00:00:00 GMT %I Stanford University, Department of Computer Science %T Overview of multidatabase transaction management %A Breitbart, Yuri %A Garcia-Molina, Hector %A Silberschatz, Avi %D October 1993 %X A multidatabase system (MDBS) is a facility that allows users access to data located in multiple autonomous database management systems (DBMSs). In such a system, global transactions are executed under the control of the MDBS. Independently, local transactions are executed under the control of the local DBMSs. Each local DBMS integrated by the MDBS may employ a different transaction management scheme. In addition, each local DBMS has complete control over all transactions (global and local) executing at its site, including the ability to abort at any point any of the transactions executing at its site. Typically, no design or internal DBMS structure changes are allowed in order to accommodate the MDBS. Furthermore, the local DBMSs may not be aware of each other, and, as a consequence, cannot coordinate their actions. Thus, traditional techniques for ensuring transaction atomicity and consistency in homogeneous distributed database systems may not be appropriate for an MDBS environment. 
The objective of this paper is to provide a brief review of the most current work in the area of multidatabase transaction management. We first define the problem and argue that the multidatabase research will become increasingly important in the coming years. We then outline basic research issues in multidatabase transaction management and review recent results in the area. We conclude the paper with a discussion of open problems and practical implications of this research. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/92/1432/CS-TR-92-1432.pdf %R CS-TR-92-1431 %Z Fri, 22 Oct 93 00:00:00 GMT %I Stanford University, Department of Computer Science %T Aggressive transmissions over redundant paths for time critical messages %A Kao, Ben %A Garcia-Molina, Hector %A Barbara, Daniel %D October 1993 %X Fault tolerant computer systems have redundant paths connecting their components. Given these paths, it is possible to use aggressive techniques to reduce the average value and variability of the response time for critical messages. One technique is to send a copy of a packet over an alternate path before it is known if the first copy failed or was delayed. A second technique is to split a single stream of packets over multiple paths. We analyze both approaches and show that they can provide significant improvements over conventional, conservative mechanisms. (A minimal sketch of the first technique appears after this block of records.) %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/92/1431/CS-TR-92-1431.pdf %R CS-TR-92-1435 %Z Tue, 19 Oct 93 00:00:00 GMT %I Stanford University, Department of Computer Science %T Lecture notes on approximation algorithms: Volume I %A Motwani, Rajeev %D October 1993 %X These lecture notes are based on the course CS351 (Dept. of Computer Science, Stanford University) offered during the academic year 1991-92. The notes below correspond to the first half of the course. The second half consists of topics such as MAX SNP, cliques, and colorings, as well as more specialized material covering topics such as geometric problems, Steiner trees and multicommodity flows. The second half is being revised to incorporate the implications of recent results in approximation algorithms and the complexity of approximation problems. Please let me know if you would like to be on the mailing list for the second half. Comments, criticisms and corrections are welcome; please send them by electronic mail to rajeev@cs.Stanford.edu. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/92/1435/CS-TR-92-1435.pdf %R CS-TR-92-1426 %Z Wed, 03 Nov 93 00:00:00 GMT %I Stanford University, Department of Computer Science %T Proceedings of the ACM SIGPLAN Workshop on Continuations CW92 %A Danvy, Olivier (ed.) %A Talcott, Carolyn (ed.) %D November 1993 %X The notion of continuation is ubiquitous in many different areas of computer science, including logic, constructive mathematics, programming languages, and programming. This workshop aims at providing a forum for discussion of: new results and work in progress; work aimed at a better understanding of the nature of continuations; applications of continuations, and the relation of continuations to other areas of logic and computer science. This technical report serves as informal proceedings for CW92. It consists of submitted manuscripts bound together according to the program order.
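The duplicate-transmission idea of CS-TR-92-1431 above can be sketched as a race between two copies of a packet: send on both paths at once and keep whichever arrives first, never waiting to learn whether one copy failed or was delayed. The delays below are synthetic stand-ins for real network paths:

import asyncio, random

async def send_copy(base_delay):
    # One copy of the packet traversing one path, with random queueing delay.
    await asyncio.sleep(base_delay + random.expovariate(100.0))
    return base_delay

async def aggressive_send(primary, alternate):
    tasks = [asyncio.create_task(send_copy(primary)),
             asyncio.create_task(send_copy(alternate))]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()                # the slower copy is simply discarded
    return done.pop().result()

print(asyncio.run(aggressive_send(0.010, 0.012)))  # base delay of the winning path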
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/92/1426/CS-TR-92-1426.pdf %R CS-TR-92-1423 %Z Fri, 05 Nov 93 00:00:00 GMT %I Stanford University, Department of Computer Science %T Time-lapse snapshots %A Dwork, Cynthia %A Herlihy, Maurice %A Plotkin, Serge A. %A Waarts, Orli %D November 1993 %X A snapshot scan algorithm takes an "instantaneous" picture of a region of shared memory that may be updated by concurrent processes. Many complex shared memory algorithms can be greatly simplified by structuring them around the snapshot scan abstraction. Unfortunately, the substantial decrease in conceptual complexity is quite often counterbalanced by an increase in computational complexity. In this paper, we introduce the notion of a weak snapshot scan, a slightly weaker primitive that has a more efficient implementation. We propose the following methodology for using this abstraction: first, design and verify an algorithm using the more powerful snapshot scan, and second, replace the more powerful but less efficient snapshot with the weaker but more efficient snapshot, and show that the weaker abstraction nevertheless suffices to ensure the correctness of the enclosing algorithm. We give two examples of algorithms whose performance can be enhanced while retaining a simple modular structure: bounded concurrent timestamping, and bounded randomized consensus. The resulting timestamping protocol is the fastest known bounded concurrent timestamping protocol. The resulting randomized consensus protocol matches the computational complexity of the best known protocol that uses only bounded values. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/92/1423/CS-TR-92-1423.pdf %R CS-TR-92-1419 %Z Fri, 05 Nov 93 00:00:00 GMT %I Stanford University, Department of Computer Science %T Fast approximation algorithms for fractional packing and covering problems %A Plotkin, Serge A. %A Shmoys, David B. %A Tardos, Eva %D November 1993 %X This paper presents fast algorithms that find approximate solutions for a general class of problems, which we call fractional packing and covering problems. The only previously known algorithms for solving these problems are based on general linear programming techniques. The techniques developed in this paper greatly outperform the general methods in many applications, and are extensions of a method previously applied to find approximate solutions to multicommodity flow problems. Our algorithm is a Lagrangean relaxation technique; an important aspect of our results is that we obtain a theoretical analysis of the running time of a Lagrangean relaxation-based algorithm. We give several applications of our algorithms. The new approach yields several orders of magnitude of improvement over the best previously known running times for the scheduling of unrelated parallel machines in both the preemptive and the non-preemptive models, for the job shop problem, for the cutting-stock problem, and for the minimum cost multicommodity flow problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/92/1419/CS-TR-92-1419.pdf %R CS-TR-92-1401 %Z Mon, 22 Aug 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T The performance impact of data reuse in parallel dense Cholesky factorization %A Rothberg, Edward %A Gupta, Anoop %D January 1992 %X This paper explores performance issues for several prominent approaches to parallel dense Cholesky factorization.
The primary focus is on issues that arise when blocking techniques are integrated into parallel factorization approaches to improve data reuse in the memory hierarchy. We first consider panel-oriented approaches, where sets of contiguous columns are manipulated as single units. These methods represent natural extensions of the column-oriented methods that have been widely used previously. On machines with memory hierarchies, panel-oriented methods significantly increase the achieved performance over column-oriented methods. However, we find that panel-oriented methods do not expose enough concurrency for problems that one might reasonably expect to solve on moderately parallel machines, thus significantly limiting their performance. We then explore block-oriented approaches, where square submatrices are manipulated instead of sets of columns. These methods greatly increase the amount of available concurrency, thus alleviating the problems encountered with panel-oriented methods. However, a number of issues, including scheduling choices and block-placement issues, complicate their implementation. We discuss these issues and consider approaches that solve the resulting problems. The resulting block-oriented implementation yields high processor utilization levels over a wide range of problem sizes. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/92/1401/CS-TR-92-1401.pdf %R CS-TR-92-1412 %Z Thu, 25 Aug 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Toward agent programs with circuit semantics %A Nilsson, Nils J. %D January 1992 %X New ideas are presented for computing and organizing actions for autonomous agents in dynamic environments - environments in which the agent's current situation cannot always be accurately discerned and in which the effects of actions cannot always be reliably predicted. The notion of "circuit semantics" for programs based on "teleo-reactive trees" is introduced. Program execution builds a combinational circuit which receives sensory inputs and controls actions. These formalisms embody a high degree of inherent conditionality and thus yield programs that are suitably reactive to their environments. At the same time, the actions computed by the programs are guided by the overall goals of the agent. The paper also speculates about how programs using these ideas could be automatically generated by artificial intelligence planning systems and adapted by learning methods. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/92/1412/CS-TR-92-1412.pdf %R CS-TR-92-1441 %Z Sun, 28 Aug 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Motion planning in stereotaxic radiosurgery %A Schweikard, Achim %A Adler, John R. %A Latombe, Jean-Claude %D September 1992 %X Stereotaxic radiosurgery is a procedure which uses a beam of radiation as an ablative surgical instrument to destroy brain tumors. The beam is produced by a linear accelerator which is moved by a jointed mechanism. Radiation is concentrated by crossfiring at the tumor from multiple directions and the amount of energy deposited in normal brain tissues is reduced. Because access to the tumor is obstructed along some directions by critical regions (e.g., brainstem, optic nerves) and most tumors are not shaped like spheres, planning the path of the beam is often difficult and time-consuming.
This paper describes a computer-based planner developed to assist the surgeon in generating a satisfactory path, given the spatial distribution of the brain tissues obtained with medical imaging. Experimental results with the implemented planner are presented, including a comparison with manually generated paths. According to these results, automatic planning significantly improves energy deposition. It can also shorten the overall treatment, hence reducing the patient's pain and allowing the radiosurgery equipment to be used for more patients. Stereotaxic radiosurgery is an example of so-called "bloodless surgery". Computer-based planning techniques are expected to facilitate further development of this safer, less painful, and more cost-effective type of surgery. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/92/1441/CS-TR-92-1441.pdf %R CS-TR-92-1446 %Z Sun, 28 Aug 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Independent updates and incremental agreement in replicated databases %A Ceri, Stefano %A Houtsma, Maurice A. W. %A Keller, Arthur M. %A Samarati, Pierangela %D October 1992 %X Update propagation and transaction atomicity are major obstacles to the development of replicated databases. Many practical applications, such as automated teller machine (ATM) networks, flight reservation, and part inventory control, do not really require these properties. In this paper we present an approach for incrementally updating a distributed, replicated database without requiring multi-site atomic commit protocols. We prove that the mechanism is correct, as it asymptotically performs all the updates on all the copies. Our approach has two important characteristics: it is progressive and non-blocking. Progressive means that the transaction's coordinator always commits, possibly together with a group of other sites. The update is later propagated asynchronously to the remaining sites. Non-blocking means that each site can take unilateral decisions at each step of the algorithm. Sites which cannot commit updates are brought to the same final state by means of a reconciliation mechanism. This mechanism uses the history logs, which are stored locally at each site, to bring sites to agreement. It requires a small auxiliary data structure, called a reception vector, to keep track of the time until which the other sites are guaranteed to be up-to-date. Several optimizations to the basic mechanism are also discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/92/1446/CS-TR-92-1446.pdf %R CS-TR-92-1452 %Z Sun, 28 Aug 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Deadline assignment in a distributed soft real-time system %A Kao, Ben %A Garcia-Molina, Hector %D October 1992 %X In a distributed environment, tasks often have processing demands on multiple different sites. A distributed task is usually divided up into several subtasks, each one to be executed at some site in order. In a real-time system, an overall deadline is usually specified by an application designer indicating when a distributed task is to be finished. However, the problem of how a global deadline is automatically translated to the deadline of each individual subtask has not been well studied. This paper examines (through simulations) four strategies for subtask deadline assignment in a distributed soft real-time environment.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/92/1452/CS-TR-92-1452.pdf %R CS-TR-93-1491 %Z Tue, 19 Oct 93 00:00:00 GMT %I Stanford University, Department of Computer Science %T Subtask Deadline Assignment for Complex Distributed Soft Real-Time Tasks %A Kao, Ben %A Garcia-Molina, Hector %D October 1993 %X Complex distributed tasks often involve parallel execution of subtasks at different nodes. To meet the deadline of a global task, all of its parallel subtasks have to be finished on time. Compared to a local task (which involves execution at only one node), a global task may have a much harder time making its deadline because it is fairly likely that at least one of its subtasks runs into an overloaded node. Another problem with complex distributed tasks occurs when a global task consists of a number of serially executing subtasks. In this case, we have the problem of dividing up the end-to-end deadline of the global task and assigning the pieces to the intermediate subtasks. In this paper, we study both of these problems. Different algorithms for assigning deadlines to subtasks are presented and evaluated. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/93/1491/CS-TR-93-1491.pdf %R CS-TR-93-1494 %Z Wed, 08 Dec 93 00:00:00 GMT %I Stanford University, Department of Computer Science %T Index Structures for Information Filtering Under the Vector Space Model %A Yan, Tak W. %A Garcia-Molina, Hector %D December 1993 %X With the ever-increasing volume of information generated, users of information systems are facing an information overload. It is desirable to support information filtering as a complement to traditional retrieval mechanisms. The number of users, and thus profiles (representing users' long-term interests), handled by an information filtering system is potentially huge, and the system has to process a constant stream of incoming information in a timely fashion. The efficiency of the filtering process is thus an important issue. In this paper, we study what data structures and algorithms can be used to efficiently perform large-scale information filtering under the vector space model, a retrieval model established as being effective. We apply the idea of the standard inverted index to index user profiles. We devise an alternative to the standard inverted index, in which, instead of indexing every term in a profile, we select only the significant ones to index. We evaluate their performance and show that the indexing methods require orders of magnitude fewer I/Os to process a document than when no index is used. We also show that the proposed alternative performs better in terms of I/O and CPU processing time in many cases. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/93/1494/CS-TR-93-1494.pdf %R CS-TR-93-1499 %Z Thu, 27 Jan 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Sandwich Theorem %A Knuth, Donald E. %D January 1994 %X This report contains expository notes about a function vartheta(G) that is popularly known as the Lovasz number of a graph G. There are many ways to define vartheta(G), and the surprising variety of different characterizations indicates in itself that vartheta(G) should be interesting. But the most interesting property of vartheta(G) is probably the fact that it can be computed efficiently, although it lies "sandwiched" between other classic graph numbers whose computation is NP-hard.
I have tried to make these notes self-contained so that they might serve as an elementary introduction to the growing literature on Lovasz's fascinating function. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/93/1499/CS-TR-93-1499.pdf %R CS-TR-94-1500 %Z Thu, 27 Jan 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T 1993 Publications Summary for the Stanford Database Group %A Siroker, Marianne %D January 1994 %X This Technical Report contains the first page of papers written by members of the Stanford Database Group during 1993. Readers interested in the full papers can fetch electronic copies via FTP. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1500/CS-TR-94-1500.pdf %R CS-TR-94-1501 %Z Mon, 28 Feb 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Deriving Properties of Belief Update from Theories of Action %A Val, Alvaro del %A Shoham, Yoav %D February 1994 %X We present an approach to database update as a form of nonmonotonic temporal reasoning, the main idea of which is the (circumscriptive) minimization of changes with respect to a set of facts declared ``persistent by default.'' The focus of the paper is on the relation between this approach and the update semantics recently proposed by Katsuno and Mendelzon. Our contribution in this regard is twofold: - We prove a representation theorem for KM semantics in terms of a restricted subfamily of the operators defined by our construction. - We show how the KM semantics can be generalized by relaxing our construction in a number of ways, each justified in certain intuitive circumstances and each corresponding to one specific postulate. It follows that there are reasonable update operators outside the KM family. Our approach is not dependent for its plausibility on this connection with KM semantics. Rather, it provides a relatively rich and flexible framework in which the frame and ramification problems can be solved in a systematic way by reasoning about default persistence of facts. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1501/CS-TR-94-1501.pdf %R CS-TR-94-1502 %Z Mon, 28 Feb 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Natural Language Parsing as Statistical Pattern Recognition %A Magerman, David M. %D February 1994 %X Traditional natural language parsers are based on rewrite rule systems developed in an arduous, time-consuming manner by grammarians. A majority of the grammarian's efforts are devoted to the disambiguation process, first hypothesizing rules which dictate constituent categories and relationships among words in ambiguous sentences, and then seeking exceptions and corrections to these rules. In this work, I propose an automatic method for acquiring a statistical parser from a set of parsed sentences which takes advantage of some initial linguistic input, but avoids the pitfalls of the iterative and seemingly endless grammar development process. Based on distributionally-derived and linguistically-based features of language, this parser acquires a set of statistical decision trees which assign a probability distribution on the space of parse trees given the input sentence. By basing the disambiguation criteria selection on entropy reduction rather than human intuition, this parser development method is able to consider more sentences than a human grammarian can when making individual disambiguation rules.
In experiments, the decision tree parser significantly outperforms a grammarian's rule-based parser, achieving an accuracy rate of 78% compared to the rule-based parser's 69%. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1502/CS-TR-94-1502.pdf %R CS-TR-94-1503 %Z Mon, 28 Feb 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Deciding whether to plan to react %A Dabija, Vlad G. %D February 1994 %X Intelligent agents that operate in real-world real-time environments have limited resources. An agent must take these limitations into account when deciding which of two control modes - planning versus reaction - should control its behavior in a given situation. The main goal of this thesis is to develop a framework that allows a resource-bounded agent to decide at planning time which control mode to adopt for anticipated possible run-time contingencies. Using our framework, the agent: (a) analyzes a complete (conditional) plan for achieving a particular goal; (b) decides which of the anticipated contingencies require and allow for preparation of reactive responses at planning time; and (c) enhances the plan with prepared reactions for critical contingencies, while maintaining the size of the plan, the planning and response times, and the use of all other critical resources of the agent within task-specific limits. For a given contingency, the decision to plan or react is based on the characteristics of the contingency, the associated reactive response, and the situation itself. Contingencies that may occur in the same situation compete for reactive response preparation because of the agent's limited resources. The thesis also proposes a knowledge representation formalism to facilitate the acquisition and maintenance of knowledge involved in this decision process. We also show how the proposed framework can be adapted for the problem of deciding, for a given contingency, whether to prepare a special branch in the conditional plan under development or to leave the contingency for opportunistic treatment at execution time. We make a theoretical analysis of the properties of our framework and then demonstrate them experimentally. We also show experimentally that this framework can simulate several different styles of human reactive behaviors described in the literature and, therefore, can be useful as a basis for describing and contrasting such behaviors. Finally we demonstrate that the framework can be applied in a challenging real domain. That is: (a) the knowledge and data needed for the decision making within our framework exist and can be acquired from experts, and (b) the behavior of an agent that uses our framework improves according to response time, reliability and resource utilization criteria. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1503/CS-TR-94-1503.pdf %R CS-TR-94-1504 %Z Mon, 28 Feb 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T An Algebraic Approach to Rule Analysis in Expert Database Systems %A Baralis, Elena %A Widom, Jennifer %D February 1994 %X Expert database systems extend the functionality of conventional database systems by providing a facility for creating and automatically executing Condition-Action rules. While Condition-Action rules in database systems are very powerful, they also can be very difficult to program, due to the unstructured and unpredictable nature of rule processing. 
We provide methods for static analysis of Condition-Action rules; our methods determine whether a given rule set is guaranteed to terminate, and whether rule execution is confluent (has a guaranteed unique final state). Our methods are based on previous methods for analyzing rules in active database systems. We improve considerably on the previous methods by providing analysis criteria that are much less conservative: our methods often determine that a rule set will terminate or is confluent when previous methods could not. Our improved analysis is based on a ``propagation'' algorithm, which uses a formal approach based on an extended relational algebra to accurately determine when the action of one rule can affect the condition of another. Our algebraic approach yields methods that are applicable to a broad class of expert database rule languages. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1504/CS-TR-94-1504.pdf %R CS-TR-94-1505 %Z Fri, 04 Mar 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Using a Position History-Based Protocol for Distributed Object Visualization %A Singhal, Sandeep K. %A Cheriton, David R. %D February 1994 %X Users of distributed virtual reality applications interact with users located across the network. Similarly, distributed object visualization systems store dynamic data at one host and render it in real-time at other hosts. Because data in both systems is animated and exhibits unpredictable behavior, providing up-to-date information about remote objects is expensive. Remote hosts must instead apply extrapolation between successive update packets to render the object's true animated behavior. This paper describes and analyzes a ``position history-based'' protocol in which hosts apply several recent position updates to track the position of remote objects. The history-based approach offers smooth, accurate visualizations of remote objects while providing a scalable solution. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1505/CS-TR-94-1505.pdf %R CS-TR-94-1506 %Z Mon, 28 Feb 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Optimized Memory-Based Messaging: Leveraging the Memory System for High-Performance Communication %A Cheriton, David R. %A Kutter, Robert A. %D February 1994 %X Memory-based messaging, passing messages between programs using shared memory, is a recognized technique for efficient communication that takes advantage of memory system performance. However, the conventional operating system support for this approach is inefficient, especially for large-scale multiprocessor interconnects, and is too complex to effectively support in hardware. This paper describes hardware and software optimizations for memory-based messaging that efficiently exploit the mechanisms of the memory system to provide superior communication performance. We describe the overall model of optimized memory-based messaging, its implementation in an operating system kernel and hardware support for this approach in a scalable multiprocessor architecture. The optimizations include address-valued signals, message-oriented memory consistency and automatic signaling on write. Performance evaluations show these extensions provide a three-to-five-fold improvement in communication performance over a comparable software-only implementation. 
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1506/CS-TR-94-1506.pdf %R CS-TR-94-1507 %Z Wed, 02 Mar 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Bibliography Department of Computer Science Technical Reports, 1963-1993 %A Mashack, Thea %D March 1994 %X This Bibliography lists all the reports published by the Department of Computer Science from 1963 through 1993. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1507/CS-TR-94-1507.pdf %R CS-TR-94-1508 %Z Tue, 22 Mar 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Inverse Kinematics of a Human Arm %A Kondo, Koichi %D March 1994 %X This paper describes a new inverse kinematics algorithm for a human arm. Potential applications of this algorithm include computer-aided design and concurrent engineering from the viewpoint of human factors. For example, it may be used to evaluate a new design in terms of its usability and to automatically generate instruction videos. The inverse kinematics algorithm is based on a sensorimotor transformation model developed in recent neurophysiological experiments. This method can be applied to both static arm postures and human manipulation motions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1508/CS-TR-94-1508.pdf %R CS-TR-94-1509 %Z Tue, 22 Mar 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Global Price Updates Help %A Goldberg, Andrew V. %A Kennedy, Robert %D March 1994 %X Periodic global updates of dual variables have been shown to yield a substantial speed advantage in implementations of push-relabel algorithms for the maximum flow and minimum cost flow problems. In this paper, we show that in the context of the bipartite matching and assignment problems, global updates yield a theoretical improvement as well. For bipartite matching, a push-relabel algorithm that matches the best known bound when global updates are used achieves a bound worse by a factor of the square root of n without them. A similar result holds for the assignment problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1509/CS-TR-94-1509.pdf %R CS-TR-94-1510 %Z Tue, 22 Mar 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Key Objects in Garbage Collection %A Hayes, Barry %D March 1994 %X When the cost of global garbage collection in a system grows large, the system can be redesigned to use generational collection. The newly-created objects usually have a much shorter half-life than average, and by concentrating the collector's efforts on them a large fraction of the garbage can be collected at a tiny fraction of the cost. The objects that survive generational collection may still become garbage, and the current practice is to perform occasional global garbage collections to purge these objects from the system, and again, the cost of doing these collections may become prohibitive when the volume of memory increases. Previous research has noted that the objects that survive generational collection often are born, promoted, and collected in large clusters. In this dissertation I show that carefully selected semantically or structurally important key objects can be drawn from the clusters and collected separately; when a key object becomes unreachable, the collector can take this as a hint to collect the cluster from which the key was drawn. To gauge the effectiveness of key objects, their use was simulated in ParcPlace's Objectworks\Smalltalk system.
The objects selected as keys were those that, as young objects, had pointers to them stored into old objects. The collector attempts to create a cluster for each key by gathering together all of the objects reachable from that key and from no previous key. Using this simple heuristic for key objects, the collector finds between 41% and 92% of the clustered garbage in a suite of simple test programs. Except for one program in the suite, about 95% of the time these key objects direct the collector to a cluster that is garbage. The exception should be heeded in improving the heuristics. In a replay of an interactive session, key object collection finds 59% of the clustered garbage and 66% of the suggested targets are indeed garbage. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1510/CS-TR-94-1510.pdf %R CS-TR-94-1511 %Z Tue, 19 Apr 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Co-Learning and the Evolution of Social Activity %A Shoham, Yoav %A Tennenholtz, Moshe %D April 1994 %X We introduce the notion of co-learning, which refers to a process in which several agents simultaneously try to adapt to one another's behavior so as to produce desirable global system properties. Of particular interest are two specific co-learning settings, which relate to the emergence of conventions and the evolution of cooperation in societies, respectively. We define a basic co-learning rule, called Highest Cumulative Reward (HCR), and show that it gives rise to quite nontrivial system dynamics. In general, we are interested in the eventual convergence of the co-learning system to desirable states, as well as in the efficiency with which this convergence is attained. Our results on eventual convergence are analytic; the results on efficiency properties include analytic lower bounds as well as empirical upper bounds derived from rigorous computer simulations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1511/CS-TR-94-1511.pdf %R CS-TR-94-1512 %Z Tue, 19 Apr 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Abstraction Planning in Real Time %A Washington, Richard %D April 1994 %X When a planning agent works in a complex, real-world domain, it is unable to plan for and store all possible contingencies and problem situations ahead of time. The agent needs to be able to fall back on an ability to construct plans at run time under time constraints. This thesis presents a method for planning at run time that incrementally builds up plans at multiple levels of abstraction. The plans are continually updated by information from the world, allowing the planner to adjust its plan to a changing world during the planning process. All the information is represented over intervals of time, allowing the planner to reason about durations, deadlines, and delays within its plan. In addition to the method, the thesis presents a formal model of the planning process and uses the model to investigate planning strategies. The method has been implemented, and experiments have been run to validate the overall approach and the theoretical model. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1512/CS-TR-94-1512.pdf %R CS-TR-94-1513 %Z Tue, 03 May 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Construction of Normative Decision Models Using Abstract Graph Grammars %A Egar, John W. %D May 1994 %X This dissertation addresses automated assistance for decision analysis in medicine.
In particular, I have investigated graph grammars as a representation for encoding how decision-theoretic models can be constructed from an unordered list of concerns. The modeling system that I have used requires a standard vocabulary to generate decision models; the models generated are qualitative, and require subsequent assessment of probabilities and utility values. This research has focused on the modeling of the qualitative structure of problems, given a standard vocabulary and given that subsequent assessment of probabilities and utilities is possible. The usefulness of the graph-grammar representation depends on the graph-grammar formalism's ability to describe a broad spectrum of qualitative decision models, on its ability to maintain a high quality in the models it generates, and on its clarity in describing topological constraints to researchers who design and maintain the actual grammar. I have found that graph grammars can be used to automatically generate decision models that are comparable to those produced by decision analysts. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1513/CS-TR-94-1513.pdf %R CS-TR-94-1514 %Z Wed, 11 May 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Load Balancing Using Time Series Analysis for Soft Real Time Systems with Statistically Periodic Loads %A Hailperin, Max %D May 1994 %X This thesis provides design and analysis of techniques for global load balancing on ensemble architectures running soft-real-time object-oriented applications with statistically periodic loads. It focuses on estimating the instantaneous average load over all the processing elements. The major contribution is the use of explicit stochastic process models for both the loading and the averaging itself. These models are exploited via statistical time-series analysis and Bayesian inference to provide improved average load estimates, and thus to facilitate global load balancing. This thesis explains the distributed algorithms used and provides some optimality results. It also describes the algorithms' implementation and gives performance results from simulation. These results show that our techniques allow more accurate estimation of the global system loading, resulting in fewer object migrations than local methods. Our method is shown to provide superior performance, relative not only to static load-balancing schemes but also to many adaptive load-balancing methods. Results from a preliminary analysis of another system and from simulation with a synthetic load provide some evidence of more general applicability. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1514/CS-TR-94-1514.pdf %R CS-TR-94-1515 %Z Mon, 06 Jun 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Retrieving Semantically Distant Analogies %A Wolverton, Michael %D June 1994 %X Techniques that have traditionally been useful for retrieving same-domain analogies from small single-use knowledge bases, such as spreading activation and indexing on selected features, are inadequate for retrieving cross-domain analogies from large multi-use knowledge bases. Blind or near-blind search techniques like spreading activation will be overwhelmed by combinatorial explosion as the search goes deeper into the KB. And indexing a large multi-use KB on salient features is impractical, largely because a feature that may be useful for retrieval in one task may be useless for another task.
This thesis describes Knowledge-Directed Spreading Activation (KDSA), a method for retrieving analogies in a large semantic network. KDSA uses task-specific knowledge to guide a spreading activation search to a case or concept in memory that meets a desired similarity condition. The thesis also describes a specific instantiation of this method for the task of innovative design. KDSA has been validated in two ways. First, a theoretical model of knowledge base search demonstrates that KDSA is tractable for retrieving semantically distant analogies under a wide range of knowledge base configurations. Second, an implemented system that uses KDSA to find analogies for innovative design shows that the method is able to retrieve semantically distant analogies for a real task. Experiments with that system show trends as the knowledge base size grows that suggest the theoretical model's prediction of large knowledge base tractability is accurate. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1515/CS-TR-94-1515.pdf %R CS-TR-94-1516 %Z Mon, 08 Aug 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Framework for Reasoning Precisely with Vague Concepts %A Goyal, Nita %D May 1994 %X Many knowledge-based systems need to represent vague concepts such as ``old'' and ``tall''. The practical approach of representing vague concepts as precise intervals over numbers (e.g., ``old'' as the interval [70,110]) is well-accepted in Artificial Intelligence. However, there have been no systematic procedures, but only ad hoc methods, to delimit the boundaries of intervals representing the vague predicates. A key observation is that the vague concepts and their interval boundaries are constrained by the underlying domain knowledge. Therefore, any systematic approach to assigning interval boundaries must take the domain knowledge into account. Hence, in the dissertation, we present a framework to represent the domain knowledge and exploit it to reason about the interval boundaries via a query language. This framework comprises a constraint language to represent logical constraints on vague concepts, as well as numerical constraints on the interval boundaries; a query language to request information about the interval boundaries; and an algorithm to answer the queries. The algorithm preprocesses the constraints by extracting the numerical information from the logical constraints and combines it with the given numerical constraints. We have implemented the framework and applied it to the medical domain to illustrate its usefulness. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1516/CS-TR-94-1516.pdf %R CS-TR-94-1517 %Z Tue, 09 Aug 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Reactive, Generative and Stratified Models of Probabilistic Processes %A Glabbeek, Rob J. van %A Smolka, Scott A. %A Steffen, Bernhard %D July 1994 %X We introduce three models of probabilistic processes, namely, reactive, generative and stratified. These models are investigated within the context of PCCS, an extension of Milner's SCCS in which each summand of a process summation expression is guarded by a probability and the sum of these probabilities is 1. For each model we present a structural operational semantics of PCCS and a notion of bisimulation equivalence which we prove to be a congruence.
We also show that the models form a hierarchy: the reactive model is derivable from the generative model by abstraction from the relative probabilities of different actions, and the generative model is derivable from the stratified model by abstraction from the purely probabilistic branching structure. Moreover, the classical nonprobabilistic model is derivable from each of these models by abstraction from all probabilities. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1517/CS-TR-94-1517.pdf %R CS-TR-94-1518 %Z Fri, 12 Aug 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T STeP: The Stanford Temporal Prover %A Manna, Zohar %A Anuchitanukul, Anuchit %A Bjorner, Nikolaj %A Browne, Anca %A Chang, Edward %A Colon, Michael %A de Alfaro, Luca %A Devarajan, Harish %A Sipma, Henny %A Uribe, Tomas %D June 1994 %X We describe the Stanford Temporal Prover (STeP), a system being developed to support the computer-aided formal verification of concurrent and reactive systems based on temporal specifications. Unlike systems based on model-checking, STeP is not restricted to finite-state systems. It combines model checking and deductive methods to allow the verification of a broad class of systems, including programs with infinite data domains, N-process programs, and N-component circuit designs, for arbitrary N. In short, STeP has been designed with the objective of combining the expressiveness of deductive methods with the simplicity of model checking. The verification process is for the most part automatic. User interaction occurs mostly at the highest, most intuitive level, primarily through a graphical proof language of verification diagrams. Efficient simplification methods, decision procedures, and invariant generation techniques are then invoked automatically to prove resulting first-order verification conditions with minimal assistance. We describe the performance of the system when applied to several examples, including the N-process dining philosophers program, Szymanski's N-process mutual exclusion algorithm, and a distributed N-way arbiter circuit. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1518/CS-TR-94-1518.pdf %R CS-TR-94-1519 %Z Fri, 26 Aug 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces %A Kavraki, Lydia %A Svestka, Petr %A Latombe, Jean-Claude %A Overmars, Mark %D August 1994 %X A new motion planning method for robots in static workspaces is presented. This method proceeds according to two phases: a learning phase and a query phase. In the learning phase, a probabilistic roadmap is constructed and stored as a graph whose nodes correspond to collision-free configurations and edges to feasible paths between these configurations. These paths are computed using a simple and fast local planner. In the query phase, any given start and goal configurations of the robot are connected to two nodes of the roadmap; the roadmap is then searched for a path joining these two nodes. The method is general and easy to implement. It can be applied to virtually any type of holonomic robot. It requires selecting certain parameters (e.g., the duration of the learning phase) whose values depend on the considered scenes, that is, the robots and their workspaces. But these values turn out to be relatively easy to choose. Increased efficiency can also be achieved by tailoring some components of the method (e.g., the local planner) to the considered robots.
In this paper, the method is applied to planar articulated robots with many degrees of freedom. Experimental results show that path planning can be done in a fraction of a second on a contemporary workstation (approximately 150 MIPS), after learning for relatively short periods of time (a few dozen seconds). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1519/CS-TR-94-1519.pdf %R CS-TR-94-1520 %Z Thu, 08 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Adaptive Optimization for SELF: Reconciling High Performance with Exploratory Programming %A Holzle, Urs %D August 1994 %X Crossing abstraction boundaries often incurs a substantial run-time overhead in the form of frequent procedure calls. Thus, pervasive use of abstraction, while desirable from a design standpoint, may lead to very inefficient programs. Aggressively optimizing compilers can reduce this overhead but conflict with interactive programming environments because they introduce long compilation pauses and often preclude source-level debugging. Thus, programmers are caught on the horns of two dilemmas: they have to choose between abstraction and efficiency, and between responsive programming environments and efficiency. This dissertation shows how to reconcile these seemingly contradictory goals. Four new techniques work together to achieve this: - Type feedback achieves high performance by allowing the compiler to inline message sends based on information extracted from the runtime system. - Adaptive optimization achieves high responsiveness without sacrificing performance by using a fast compiler to generate initial code while automatically recompiling heavily used program parts with an optimizing compiler. - Dynamic deoptimization allows source-level debugging of optimized code by transparently recreating non-optimized code as needed. - Polymorphic inline caching speeds up message dispatch and, more significantly, collects concrete type information for the compiler. With better performance yet good interactive behavior, these techniques reconcile exploratory programming, ubiquitous abstraction, and high performance. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1520/CS-TR-94-1520.pdf %R CS-TR-94-1521 %Z Thu, 08 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Chu Spaces: A Model for Concurrency %A Gupta, Vineet %D August 1994 %X A Chu space is a binary relation between two sets. In this thesis we show that Chu spaces form a non-interleaving model of concurrency which extends event structures while endowing them with an algebraic structure whose natural logic is linear logic. We provide several equivalent definitions of Chu spaces, including two pictorial representations. Chu spaces represent processes as automata or schedules, and Chu duality gives a simple way of converting between schedules and automata. We show that Chu spaces can represent various concurrency concepts like conflict, temporal precedence and internal and external choice, and they distinguish between causing and enabling events. We present a process algebra for Chu spaces including the standard combinators like parallel composition, sequential composition, choice, interaction, restriction, and show that the various operational identities between these hold for Chu spaces. The solution of recursive domain equations is possible for most of these operations, giving us an expressive specification and programming language.
We define a history preserving equivalence between Chu spaces, and show that it preserves the causal structure of a process. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1521/CS-TR-94-1521.pdf %R CS-TR-94-1523 %Z Thu, 08 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T On Implementing Push-Relabel Method for the Maximum Flow Problem %A Cherkassky, Boris V. %A Goldberg, Andrew V. %D September 1994 %X We study efficient implementations of the push-relabel method for the maximum flow problem. The resulting codes are faster than the previous codes, and much faster on some problem families. The speedup is due to the combination of heuristics used in our implementation. We also exhibit a family of problems for which all known methods seem to have almost quadratic time growth rate. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1523/CS-TR-94-1523.pdf %R CS-TR-94-1524 %Z Thu, 08 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Continuous Verification by Discrete Reasoning %A de Alfaro, Luca %A Manna, Zohar %D September 1994 %X Two semantics are commonly used for the behavior of real-time and hybrid systems: a discrete semantics, in which the temporal evolution is represented as a sequence of snapshots describing the state of the system at certain times, and a continuous semantics, in which the temporal evolution is represented by a series of time intervals, and therefore corresponds more closely to the physical reality. Powerful verification rules are known for temporal logic formulas based on the discrete semantics. This paper shows how to transfer the verification techniques of the discrete semantics to the continuous one. We show that if a temporal logic formula has the property of finite variability, its validity in the discrete semantics implies its validity in the continuous one. This leads to a verification method based on three components: verification rules for the discrete semantics, axioms about time, and some temporal reasoning to bring the results together. This approach enables the verification of properties of real-time and hybrid systems with respect to the continuous semantics. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1524/CS-TR-94-1524.pdf %R CS-TR-94-1525 %Z Mon, 12 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Differential BDDs %A Anuchitanukul, Anuchit %A Manna, Zohar %D September 1994 %X In this paper, we introduce a class of Binary Decision Diagrams (BDDs) which we call Differential BDDs (DBDDs), and two transformations over DBDDs, called Push-up and Delta transformations. In DBDDs and their derived classes, such as Push-up DBDDs or Delta DBDDs, in addition to the ordinary node-sharing in normal Ordered Binary Decision Diagrams (OBDDs), some isomorphic substructures are collapsed together, forming an even more compact representation of boolean functions. The elimination of isomorphic substructures coincides with the repetitive occurrences of the same or similar small components in many applications of BDDs, such as in the representation of hardware circuits. The reduction in the number of nodes, from OBDDs to DBDDs, is potentially exponential, while boolean manipulations on DBDDs remain efficient.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1525/CS-TR-94-1525.pdf %R CS-TR-94-1526 %Z Fri, 16 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Combining Experiential and Theoretical Knowledge in the Domain of Semiconductor Manufacturing %A Mohammed, John Llewelyn %D September 1994 %X Semiconductor Manufacturing is characterized by complexity and continual, rapid change. These characteristics reduce the effectiveness of traditional diagnostic expert systems: the knowledge represented cannot adapt to changes in the manufacturing plan because the dependence of the knowledge on the plan is not explicitly represented. It is impractical to manually encode all the dependencies in a complex plan. We address this problem in two ways. First, we employ model-based techniques to encode theoretical knowledge, so that symbolic simulation of a new manufacturing plan can automatically glean diagnostic information. Our representation is sufficiently detailed to capture the plan's inherent causal dependencies, yet sufficiently abstract to make symbolic simulation practical. This theoretical knowledge can adapt to changes in the manufacturing plan. However, the expressiveness and tractability of our representational machinery limit the range of phenomena that we can represent. Second, we describe Generic Rules, which combine the expressiveness of heuristic rules with the robustness of theoretical models. Generic Rules are general patterns for heuristic rules, associated with model-based restrictions on the situations in which the patterns can be instantiated to form rules for new contexts. In this way, theoretical knowledge is employed to encode the dependence of heuristic knowledge on the manufacturing plan. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1526/CS-TR-94-1526.pdf %R CS-TR-94-1527 %Z Wed, 12 Oct 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T From Knowledge to Belief %A Koller, Daphne %D October 1994 %X When acting in the real world, an intelligent agent must make decisions under uncertainty. The standard solution requires it to assign degrees of belief to the relevant assertions. These should be based on the agent's knowledge. For example, a doctor deciding on the treatment for a patient should use information about that patient, statistical correlations between symptoms and diseases, default rules, and more. The random-worlds method induces degrees of belief from very rich knowledge bases, expressed in a language that augments first-order logic with statistical statements and default rules (interpreted as qualitative statistics). The method is based on the principle of indifference, treating all possible worlds as equally likely. It naturally derives important patterns of reasoning such as specificity, inheritance, indifference to irrelevant information, and a default assumption of independence. Its expressive power and intuitive semantics allow it to deal well with examples that are too complex for most other reasoning systems. We use techniques from finite model theory to analyze the computational aspects of random worlds. The problem of computing degrees of belief is undecidable in general. However, for unary knowledge bases, a tight connection to the principle of maximum entropy often allows us to compute degrees of belief. 
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1527/CS-TR-94-1527.pdf %R CS-TR-94-1528 %Z Thu, 27 Oct 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Architecture-Altering Operations for Evolving the Architecture of a Multi-Part Program in Genetic Programming %A Koza, John R. %D October 1994 %X Previous work described a way to evolutionarily select the architecture of a multi-part computer program from among preexisting alternatives in the population while concurrently solving a problem during a run of genetic programming. This report describes six new architecture-altering operations that provide a way to evolve the architecture of a multi-part program in the sense of actually changing the architecture of programs dynamically during the run. The new architecture-altering operations are motivated by the naturally occurring operation of gene duplication as described in Susumu Ohno's provocative 1970 book Evolution by Means of Gene Duplication as well as the naturally occurring operation of gene deletion. The six new architecture-altering operations are branch duplication, argument duplication, branch creation, argument creation, branch deletion and argument deletion. A connection is made between genetic programming and other techniques of automated problem solving by interpreting the architecture-altering operations as providing an automated way to specialize and generalize programs. The report demonstrates that a hierarchical architecture can be evolved to solve an illustrative symbolic regression problem using the architecture-altering operations. Future work will study the amount of additional computational effort required to employ the architecture-altering operations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1528/CS-TR-94-1528.pdf %R CS-TR-94-1529 %Z Mon, 31 Oct 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T A knowledge-based method for temporal abstraction of clinical data %A Shahar, Yuval %D October 1994 %X This dissertation describes a domain-independent method specific to the task of abstracting higher-level concepts from time-stamped data. The framework includes a model of time, parameters, events and contexts. I applied my framework to several domains of medicine. My goal is to create, from time-stamped patient data, interval-based temporal abstractions such as "severe anemia for 3 weeks in the context of administering AZT." The knowledge-based temporal-abstraction method decomposes the task of abstracting higher-level abstractions from input data into five subtasks. These subtasks are solved by five domain-independent temporal-abstraction mechanisms. The temporal-abstraction mechanisms depend on four domain-specific knowledge types. I implemented the knowledge-based temporal-abstraction method in the RESUME system. RESUME accepts input and returns output at all levels of abstraction; accepts input out of temporal order, modifying a view of the past or of the present, as necessary; generates context-sensitive, controlled output; and maintains several possible concurrent interpretations of the data. I evaluated RESUME in the domains of protocol-based care, monitoring of children's growth, and therapy of diabetes. A formal specification of a domain's temporal-abstraction knowledge supports acquisition, maintenance, reuse, and sharing of that knowledge.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1529/CS-TR-94-1529.pdf %R CS-TR-94-1530 %Z Wed, 09 Nov 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T On Computing Multi-Arm Manipulation Trajectories %A Koga, Yoshihito %D October 1994 %X This dissertation considers the manipulation task planning problem of automatically generating the trajectories for several cooperating robot arms to manipulate a movable object to a goal location among obstacles. The planner must reason that the robots may need to change their grasp of the object to complete the task, for example, by passing it from one arm to another. Furthermore, the computed velocities and accelerations of the arms must satisfy the limits of the actuators. Past work strongly suggests that solving this problem in a rigorous fashion is intractable. We address this problem in a practical two-phase approach. In step one, using a heuristic, we compute a collision-free path for the robots and the movable object. For the case of multiple robot arms with many degrees of freedom, this step may fail to find the desired path even though it exists. Despite this limitation, experimental results of the implemented planner (for solving step one) show that it is efficient and reliable; for example, the planner is able to find complex manipulation motions for a system with seventy-eight degrees of freedom. In step two, we then find the time-parameterization of the path such that the dynamic constraints on the robot are satisfied. In fact, we find the time-optimal solution for the given path. We show simulation results for various complex examples. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1530/CS-TR-94-1530.pdf %R CS-TR-94-1531 %Z Thu, 08 Dec 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T On-Line Manipulation Planning for Two Robot Arms in a Dynamic Environment %A Li, Tsai-Yen %A Latombe, Jean-Claude %D December 1994 %X In a constantly changing and partially unpredictable environment, robot motion planning must be on-line. The planner receives a continuous flow of information about occurring events and generates new plans, while previously planned motions are being executed. This paper describes an on-line planner for two cooperating arms whose task is to grab parts of various types on a conveyor belt and transfer them to their respective goals while avoiding collision with obstacles. Parts arrive on the belt in random order, at any time. Both goals and obstacles may be dynamically changed. This scenario is typical of manufacturing cells serving machine-tools, assembling products, or packaging objects. The proposed approach breaks the overall planning problem into subproblems, each involving a low-dimensional configuration or configuration-time space, and orchestrates very fast primitives solving these subproblems. The resulting planner has been implemented and extensively tested in a simulated environment, as well as with a real dual-arm system. Its competitiveness has been evaluated against an oracle making (almost) the best decision at any one time; the results show that the planner compares extremely well. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1531/CS-TR-94-1531.pdf %R CS-TR-94-1533 %Z Thu, 08 Dec 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Randomized Query Processing in Robot Motion Planning %A Kavraki, L. %A Latombe, J-C. %A Motwani, R. %A Raghavan, P.
%D December 1994 %X The subject of this paper is the analysis of a randomized preprocessing scheme that has been used for query processing in robot motion planning. The attractiveness of the scheme stems from its general applicability to virtually any motion-planning problem, and its empirically observed success. In this paper we initiate a theoretical basis for explaining this empirical success. Under a simple assumption about the configuration space, we show that it is possible to perform a preprocessing step following which queries can be answered quickly. En route, we pose and give solutions to related problems on graph connectivity in the evasiveness model, and art-gallery theorems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1533/CS-TR-94-1533.pdf %R CS-TR-94-1522 %Z Thu, 08 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Compositional Verification of Reactive and Real-time Systems %A Chang, Edward %D December 1993 %X This thesis presents a compositional methodology for the verification of reactive and real-time systems. The correctness of a given system is established from the correctness of the system's components, each of which may be treated as a system itself and further reduced. When no further reduction is possible or desirable, global techniques for verification may be used to verify the bottom-level components. Transition modules are introduced as a suitable compositional model of computation. Various composition operations are defined on transition modules, including parallel composition, sequential composition, and iteration. A restricted assumption-guarantee style of specification is advocated, wherein the environment assumption is stated as a restriction on the environment's next-state relation. Compositional proof rules are provided in accordance with the safety-progress hierarchy of temporal properties. The compositional framework is then extended naturally to real-time transition modules and discrete-time metric temporal logic. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1522/CS-TR-94-1522.pdf %R CS-TR-94-1532 %Z Thu, 08 Dec 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Planning the Collision-Free Paths of an Actively Flexible Manipulator %A Banon, Jose %D December 1994 %X Most robot manipulators consist of a small sequence of rigid links connected by articulated joints. However, robot dexterity is considerably enhanced when the number of joints is large or infinite. Additional joints make it possible to manipulate objects in cluttered environments where non-redundant robots are useless. In this paper we consider a simulated actively flexible manipulator (AFM), i.e. a manipulator whose flexibility can be directly controlled by its actuators. We propose an efficient method for planning the collision-free paths of an AFM in a three-dimensional workspace. We implemented this method on a graphic workstation and experimented with it on several examples. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/94/1532/CS-TR-94-1532.pdf %R CS-TR-91-1350 %Z Thu, 01 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T A programming and problem solving seminar. %A Chang, Edward %A Phillips, Steven J. %A Ullman, Jeffrey D. %D February 1991 %X This report contains transcripts of the classroom discussions of Stanford's Computer Science problem solving course for Ph.D. students, CS304, during Winter quarter 1990, and the first CS204 class for undergraduates, in the Spring of 1990. 
The problems, and the solutions offered by the classes, span a large range of ideas in computer science. Since they constitute a study both of programming and research paradigms, and of the problem solving process, these notes may be of interest to students of computer science, as well as computer science educators. The present report is the ninth in a series of such transcripts, continuing the tradition established in STAN-CS-77-606 (Michael J. Clancy, 1977), STAN-CS-79-707 (Chris Van Wyk, 1979), STAN-CS-81-863 (Allan A. Miller, 1981), STAN-CS-83-989 (Joseph S. Weening, 1983), STAN-CS-83-990 (John D. Hobby, 1983), STAN-CS-85-1055 (Ramsey W. Haddad, 1985), STAN-CS-87-1154 (Tomas G. Rokicki, 1987), and STAN-CS-89-1269 (Kenneth A. Ross, 1989). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1350/CS-TR-91-1350.pdf %R CS-TR-91-1351 %Z Thu, 01 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Sequence vs. pipeline parallel multiple joins in Paradata %A Zhu, Liping %A Keller, Arthur M. %A Wiederhold, Gio %D February 1991 %X In this report we analyze and compare hash-join based parallel multi-join algorithms for sequenced and pipelined processing. The BBN Butterfly machine serves as the host for the performance analysis. The sequenced algorithm handles the multiple join operations in a conventional sequenced manner, except that it distributes the work load of each operation among all processors. The pipelined algorithms handle the different join operations in parallel, by dividing the processors into several groups, with the data flowing through these groups. The detailed timing tests revealed bus/memory contention that grows linearly with the number of processors. The existence of this contention leads to an optimal region for the number of processors for fixed join operands. We present the analytical and experimental formulae for both algorithms, which incorporate this contention. We discuss how to find an optimal point, and give heuristics for choosing the best processor partition in pipelined processing. The study shows that the pipelined algorithms produce the first joined result sooner than the sequenced algorithm and need less memory to store the intermediate result. The sequenced algorithm, on the other hand, takes less time to finish all the join operations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1351/CS-TR-91-1351.pdf %R CS-TR-91-1359 %Z Thu, 01 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T The benefits of relaxing punctuality %A Alur, Rajeev %A Feder, Tomas %A Henzinger, Thomas A. %D May 1991 %X The most natural, compositional way of modeling real-time systems uses a dense domain for time. The satisfiability of real-time constraints that are capable of expressing punctuality in this model is, however, known to be undecidable. We introduce a temporal language that can constrain the time difference between events only with finite (yet arbitrary) precision and show the resulting logic to be EXPSPACE-complete. This result allows us to develop an algorithm for the verification of timing properties of real-time systems with a dense semantics. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1359/CS-TR-91-1359.pdf %R CS-TR-91-1360 %Z Thu, 01 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Sooner is safer than later. %A Henzinger, Thomas A.
%D May 1991 %X It has been repeatedly observed that the standard safety-liveness classification of properties of reactive systems does not fit real-time properties. This is because the implicit "liveness" of time shifts the spectrum towards the safety side. While, for example, response--that "something good" will happen, eventually--is a classical liveness property, bounded response--that "something good" will happen soon, within a certain amount of time--has many characteristics of safety. We account for this phenomenon formally by defining safety and liveness relative to a given condition, such as the progress of time. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1360/CS-TR-91-1360.pdf %R CS-TR-91-1369 %Z Fri, 02 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Approximating matchings in parallel %A Fischer, Ted %A Goldberg, Andrew V. %A Plotkin, Serge %D June 1991 %X We show that for any constant k > 0, a matching with cardinality at least 1 - 1/(k+1) times the maximum can be computed in NC. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1369/CS-TR-91-1369.pdf %R CS-TR-91-1370 %Z Fri, 02 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T An NQTHM mechanization of "An Exercise in the Verification of Multi-Process Programs" %A Nagayama, Misao %A Talcott, Carolyn %D June 1991 %X This report presents a formal verification of the local correctness of a mutex algorithm using the Boyer-Moore theorem prover. The formalization follows closely an informal proof of Manna and Pnueli. The proof method of Manna and Pnueli is to first extract from the program a set of states and an induced transition system. One then proves suitable invariants. There are two variants of the proof. In the first (atomic) variant, compound tests involving quantification over a finite set are viewed as atomic operations. In the second (molecular) variant, this assumption is removed, making the details of the transitions and proof somewhat more complicated. The original Manna-Pnueli proof was formulated in terms of finite sets. This led to a concise and elegant informal proof, but one that is not easy to mechanize in the Boyer-Moore logic. In the mechanized version we use a dual isomorphic representation of program states based on finite sequences. Our approach was to outline the formal proof of each invariant, making explicit the case analyses, assumptions and properties of operations used. The outline served as our guide in developing the formal proof. The resulting sequence of events follows the informal plan quite closely. The main difficulties encountered were in discovering the precise form of the lemmas and hints necessary to guide the theorem prover. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1370/CS-TR-91-1370.pdf %R CS-TR-91-1374 %Z Fri, 02 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Polynomial dual network simplex algorithms %A Orlin, James B. %A Plotkin, Serge A. %A Tardos, Eva %D August 1991 %X We show how to use polynomial and strongly polynomial capacity scaling algorithms for the transshipment problem to design a polynomial dual network simplex pivot rule. Our best pivoting strategy leads to an O(m^2 log n) bound on the number of pivots, where n and m denote the number of nodes and arcs in the input network. If the demands are integral and at most B, we also give an O(m(m + n log n) min(log nB, m log n))-time implementation of a strategy that requires somewhat more pivots.
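For a rough sense of scale (an illustrative calculation of ours, not taken from the report): for a network with $n = 10^3$ nodes and $m = 10^4$ arcs, the pivot bound gives on the order of

    m^2 \log n = (10^4)^2 \cdot \log_2 10^3 \approx 10^8 \times 10 = 10^9

pivots, polynomial in the input size, in contrast to the exponential worst cases known for many classical pivot rules.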
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1374/CS-TR-91-1374.pdf %R CS-TR-91-1375 %Z Fri, 02 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Fast approximation algorithms for multicommodity flow problems %A Leighton, Tom %A Makedon, Fillia %A Plotkin, Serge %A Stein, Clifford %A Tardos, Eva %A Tragoudas, Spyros %D August 1991 %X In this paper, we describe the first polynomial-time combinatorial algorithms for approximately solving the multicommodity flow problem. Our algorithms are significantly faster than the best previously known algorithms, which were based on linear programming. For a k-commodity multicommodity flow problem, the running time of our randomized algorithm is (up to log factors) the same as the time needed to solve k single-commodity flow problems, thus giving the surprising result that approximately computing a k-commodity maximum-flow is not much harder than computing about k single-commodity maximum-flows in isolation. Given any multicommodity flow problem as input, our algorithm is guaranteed to provide a feasible solution to a modified flow problem in which all capacities are increased by a (1 + epsilon)-factor, or to provide a proof that there is no feasible solution to the original problem. We also describe faster approximation algorithms for multicommodity flow problems with a special structure, such as those that arise in the "sparsest cut" problems and the uniform concurrent flow problems if k <= the square root of m. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1375/CS-TR-91-1375.pdf %R CS-TR-91-1377 %Z Fri, 02 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T An evaluation of left-looking, right-looking and multifrontal approaches to sparse Cholesky factorization on hierarchical memory machines %A Rothberg, Edward %A Gupta, Anoop %D August 1991 %X In this paper we present a comprehensive analysis of the performance of a variety of sparse Cholesky factorization methods on hierarchical-memory machines. We investigate methods that vary along two different axes. Along the first axis, we consider three different high-level approaches to sparse factorization: left-looking, right-looking, and multifrontal. Along the second axis, we consider the implementation of each of these high-level approaches using different sets of primitives. The primitives vary based on the structures they manipulate. One important structure in sparse Cholesky factorization is a single column of the matrix. We first consider primitives that manipulate single columns. These are the most commonly used primitives for expressing the sparse Cholesky computation. Another important structure is the supernode, a set of columns with identical non-zero structures. We consider sets of primitives that exploit the supernodal structure of the matrix to varying degrees. We find that primitives that manipulate larger structures greatly increase the amount of exploitable data reuse, thus leading to dramatically higher performance on hierarchical-memory machines. We observe performance increases of two to three times when comparing methods based on primitives that make extensive use of the supernodal structure to methods based on primitives that manipulate columns. We also find that the overall approach (left-looking, right-looking, or multifrontal) is less important for performance than the particular set of primitives used to implement the approach.
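To make the column primitives concrete (a minimal dense Python/NumPy sketch of ours, not code from the report; the names cdiv and cmod follow common usage in this literature), the left-looking approach builds each column from previously computed columns; a supernodal variant would replace the inner cmod loop with one dense block update per group of columns sharing a nonzero structure:

    import numpy as np

    def left_looking_cholesky(A):
        # Dense left-looking Cholesky phrased in terms of the two column
        # primitives: cmod(j, k) updates column j using column k, and
        # cdiv(j) scales column j by its diagonal.
        n = A.shape[0]
        L = np.zeros_like(A, dtype=float)
        for j in range(n):
            col = A[j:, j].astype(float).copy()
            for k in range(j):                 # cmod(j, k)
                if L[j, k] != 0.0:             # sparsity: zero entries contribute nothing
                    col -= L[j, k] * L[j:, k]
            L[j, j] = np.sqrt(col[0])          # cdiv(j)
            L[j+1:, j] = col[1:] / L[j, j]
        return L

    A = np.array([[4., 2., 2.], [2., 5., 3.], [2., 3., 6.]])
    L = left_looking_cholesky(A)
    assert np.allclose(L @ L.T, A)

The data-reuse point in the abstract corresponds to the fact that a supernodal block update touches the updating columns once per block rather than once per cmod call.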
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1377/CS-TR-91-1377.pdf %R CS-TR-91-1381 %Z Fri, 02 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Implementing hypertext database relationships through aggregations and exceptions %A Hara, Yoshinori %A Keller, Arthur M. %A Rathmann, Peter K. %A Wiederhold, Gio %D September 1991 %X In order to combine hypertext with database facilities, we show how to extract an effective storage structure from given instance relationships. The schema of the structure recognizes clusters and exceptions. Extracting high-level structures is useful for providing a high performance browsing environment as well as efficient physical database design, especially when handling large amounts of data. This paper focuses on a clustering method, ACE, which generates aggregations and exceptions from the original graph structure in order to capture high level relationships. The problem of minimizing the cost function is NP-complete. We use a heuristic approach based on an extended Kernighan-Lin algorithm. We demonstrate our method on a hypertext application and on a standard random graph, compared with its analytical model. The reductions in main-memory storage, relative to the input database size, were 77.2% and 12.3%, respectively. The method is also useful for organizing secondary storage for efficient retrieval. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1381/CS-TR-91-1381.pdf %R CS-TR-91-1383 %Z Fri, 02 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Temporal proof methodologies for real-time systems %A Henzinger, Thomas A. %A Manna, Zohar %A Pnueli, Amir %D September 1991 %X We extend the specification language of temporal logic, the corresponding verification framework, and the underlying computational model to deal with real-time properties of reactive systems. The abstract notion of timed transition systems generalizes traditional transition systems conservatively: qualitative fairness requirements are replaced (and superseded) by quantitative lower-bound and upper-bound timing constraints on transitions. This framework can model real-time systems that communicate either through shared variables or by message passing and real-time issues such as time-outs, process priorities (interrupts), and process scheduling. We exhibit two styles for the specification of real-time systems. While the first approach uses bounded versions of temporal operators, the second approach allows explicit references to time through a special clock variable. Corresponding to the two styles of specification, we present and compare two fundamentally different proof methodologies for the verification of timing requirements that are expressed in these styles. For the bounded-operator style, we provide a set of proof rules for establishing bounded-invariance and bounded-response properties of timed transition systems. This approach generalizes the standard temporal proof rules for verifying invariance and response properties conservatively. For the explicit-clock style, we exploit the observation that every time-bounded property is a safety property and use the standard temporal proof rules for establishing safety properties.
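For orientation (illustrative formulas of ours in common notation, not quoted from the report), the requirement "every p is followed by q within 5 time units" reads roughly as follows in the two styles:

    % bounded-operator style: the time bound annotates the temporal operator
    \Box\,(p \rightarrow \Diamond_{\le 5}\, q)

    % explicit-clock style: the clock variable T is referenced directly
    \Box\,\forall t\,\big((p \wedge T = t) \rightarrow \Diamond\,(q \wedge T \le t + 5)\big)

The observation that every time-bounded property is a safety property is visible in the second form: any violation is exhibited by a finite prefix in which p occurred, the clock passed t + 5, and q never occurred.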
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1383/CS-TR-91-1383.pdf %R CS-TR-91-1387 %Z Fri, 02 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Assembling polyhedra with single translations %A Wilson, Randall %A Schweikard, Achim %D October 1991 %X The problem of partitioning an assembly of polyhedral objects into two subassemblies that can be separated arises in assembly planning. We describe an algorithm to compute the set of all translations separating two polyhedra with n vertices in O(n^4) steps and show that this is optimal. Given an assembly of k polyhedra with a total of n vertices, an extension of this algorithm identifies a valid translation and removable subassembly in O(k^2 n^4) steps if one exists. Based on the second algorithm, a polynomial-time method for finding a complete assembly sequence consisting of single translations is derived. An implementation incorporates several changes to achieve better average-case performance; experimental results obtained for composite objects consisting of isothetic polyhedra are described. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1387/CS-TR-91-1387.pdf %R CS-TR-91-1389 %Z Fri, 02 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T The AGENT0 manual %A Torrance, Mark C. %A Viola, Paul A. %D April 1991 %X This document describes an implementation of AOP, an interpreter for programs written in a language called AGENT0. AGENT0 is a first stab at a programming language for the paradigm of Agent-Oriented Programming. It is currently under development at Stanford under the direction of Yoav Shoham. This implementation is the work of Paul A. Viola of MIT and Mark C. Torrance of Stanford. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1389/CS-TR-91-1389.pdf %R CS-TR-91-1391 %Z Fri, 02 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T A logic for perception and belief %A Shoham, Yoav %A del Val, Alvaro %D September 1991 %X We present a modal logic for reasoning about perception and belief, captured respectively by the operators P and B. The B operator is the standard belief operator used in recent years, and the P operator is similarly defined. The contribution of the paper is twofold. First, in terms of P we provide a definition of perceptual indistinguishability, such as arises out of limited visual acuity. The definition is concise, intuitive (we find), and avoids traditional paradoxes. Second, we explore the bimodal B--P system. We argue that the relationship between the two modalities varies among settings: The agent may or may not have confidence in its perception, may or may not be accurate in it, and so on. We therefore define a number of agent types corresponding to these various assumptions, and for each such agent type we provide a sound and complete axiomatization of the B--P system. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1391/CS-TR-91-1391.pdf %R CS-TR-91-1392 %Z Fri, 02 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T A classification of update methods for replicated databases %A Ceri, Stefano %A Houtsma, Maurice A. W. %A Keller, Arthur M. %A Samarati, Pierangela %D October 1991 %X In this paper we present a classification of the methods for updating replicated databases. The main contribution of this paper is to present the various methods in the context of a structured taxonomy, which accommodates very heterogeneous methods.
Classes of update methods are presented through their general properties, such as the invariants that hold for them. Methods are reviewed both in their normal and abnormal behaviour (e.g., after a network partition). We show that several methods presented in the literature, sometimes in independent papers with no cross-reference, are indeed very much related, for instance because they share the same basic technique. We also show in what sense they diverge from the basic technique. This classification can serve as a basis for choosing the method that is most suitable to a specific application. It can also be used as a guideline to researchers who aim at developing new mechanisms. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1392/CS-TR-91-1392.pdf %R CS-TR-91-1394 %Z Fri, 02 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Application-controlled physical memory using external page-cache management %A Harty, Kieran %A Cheriton, David R. %D October 1991 %X Next generation computer systems will have gigabytes of physical memory and processors in the 100 MIPS range or higher. Contrary to some conjectures, this trend requires more sophisticated memory management support for memory-bound computations such as scientific simulations and systems such as large-scale database systems, even though memory management for most programs will be less of a concern. We describe the design, implementation and evaluation of a virtual memory system that provides application control of physical memory using external page-cache management. In this approach, a sophisticated application is able to monitor and control the amount of physical memory it has available for execution, the exact contents of this memory, and the scheduling and nature of page-in and page-out using the abstraction of a physical page cache provided by the kernel. We claim that this approach can significantly improve performance for many memory-bound applications while reducing kernel complexity, yet does not complicate other applications or reduce their performance. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/91/1394/CS-TR-91-1394.pdf %R CS-TR-89-1267 %Z Tue, 27 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T A really temporal logic. %A Alur, Rajeev %A Henzinger, Thomas A. %D July 1989 %X We introduce a real-time temporal logic for the specification of reactive systems. The novel feature of our logic, TPTL, is the adoption of temporal operators as quantifiers over time variables; every modality binds a variable to the time(s) it refers to. TPTL is demonstrated to be both a natural specification language and a suitable formalism for verification and synthesis. We present a tableau-based decision procedure and model-checking algorithm for TPTL. Several generalizations of TPTL are shown to be highly undecidable. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1267/CS-TR-89-1267.pdf %R CS-TR-89-1248 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Efficiency of the network simplex algorithm for the maximum flow problem %A Goldberg, Andrew V. %A Grigoriadis, Michael D. %A Tarjan, Robert E. %D February 1989 %X Goldfarb and Hao have proposed a network simplex algorithm that will solve a maximum flow problem on an n-vertex, m-arc network in at most nm pivots and O(n^2 m) time.
In this paper we describe how to implement their algorithm to run in O(nm log n) time by using an extension of the dynamic tree data structure of Sleator and Tarjan. This bound is less than a logarithmic factor larger than that of any other known algorithm for the problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1248/CS-TR-89-1248.pdf %R CS-TR-89-1250 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A sound and complete axiomatization of operational equivalence between programs with memory %A Mason, Ian %A Talcott, Carolyn %D March 1989 %X In this paper we present a formal system for deriving assertions about programs with memory. The assertions we consider are of the following three forms: (i) e diverges (i.e. fails to reduce to a value), written $e\uparrow$; (ii) $e_0$ and $e_1$ reduce to the same value and have exactly the same effect on memory, written $e_0 \simeq e_1$; and (iii) $e_0$ and $e_1$ reduce to the same value and have the same effect on memory up to production of garbage (are strongly isomorphic), written $e_0 \cong e_1$. The $e$ and $e_j$ are expressions of a first-order Scheme- or Lisp-like language with the data operations atom, eq, car, cdr, cons, setcar, setcdr, the control primitives let and if, and recursive definition of function symbols. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1250/CS-TR-89-1250.pdf %R CS-TR-89-1255 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T METAFONTware %A Knuth, Donald E. %A Rokicki, Tomas G. %A Samuel, Arthur L. %D May 1989 %X This report contains the complete WEB documentation for four utility programs that are often used in conjunction with METAFONT: GFtype, GFtoPK, GFtoDVI, and MFT. This report is analogous to TeXware, published in 1986 (STAN-CS-86-1097). METAFONTware completes the set. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1255/CS-TR-89-1255.pdf %R CS-TR-89-1259 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Interior-point methods in parallel computation %A Goldberg, Andrew V. %A Plotkin, Serge A. %A Shmoys, David B. %A Tardos, Eva %D May 1989 %X In this paper we use interior-point methods for linear programming, developed in the context of sequential computation, to obtain a parallel algorithm for the bipartite matching problem. Our algorithm runs in $\tilde{O}(\sqrt{m})$ time. Our results extend to the weighted bipartite matching problem and to the zero-one minimum-cost flow problem, yielding $\tilde{O}(\sqrt{m} \log C)$ algorithms. This improves previous bounds on these problems and illustrates the importance of interior-point methods in the context of parallel algorithm design. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1259/CS-TR-89-1259.pdf %R CS-TR-89-1261 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Pipelined parallel computations, and sorting on a pipelined hypercube. %A Mayr, Ernst W. %A Plaxton, C. Greg %D May 1989 %X This paper brings together a number of previously known techniques in order to obtain practical and efficient implementations of the prefix operation for the complete binary tree, hypercube and shuffle exchange families of networks. For each of these networks, we also provide a "pipelined" scheme for performing k prefix operations in O(k + log p) time on p processors. This implies a similar pipelining result for the "data distribution" operation of Ullman [16].
The data distribution primitive leads to a simplified implementation of the optimal merging algorithm of Varman and Doshi, which runs on a pipelined model of the hypercube [17]. Finally, a pipelined version of the multi-way merge sort of Nassimi and Sahni [10], running on the pipelined hypercube model, is described. Given p processors and n < p log p values to be sorted, the running time of the pipelined algorithm is O(log^2 p / log((p log p)/n)). Note that for the interesting case n = p this yields a running time of O(log^2 p / log log p), which is asymptotically faster than Batcher's bitonic sort [3]. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1261/CS-TR-89-1261.pdf %R CS-TR-89-1264 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Chebyshev polynomials are not always optimal %A Fischer, Bernd %A Freund, Roland %D June 1989 %X We are concerned with the problem of finding, among all polynomials of degree at most n normalized to be 1 at c, the one with minimal uniform norm on Epsilon. Here, Epsilon is a given ellipse with both foci on the real axis and c is a given real point not contained in Epsilon. Problems of this type arise in certain iterative matrix computations, and, in this context, it is generally believed and widely referenced that suitably normalized Chebyshev polynomials are optimal for such constrained approximation problems. In this note, we show that this is not true in general. Moreover, we derive sufficient conditions which guarantee that Chebyshev polynomials are optimal. Also, some numerical examples are presented. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1264/CS-TR-89-1264.pdf %R CS-TR-89-1266 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Multi-level shared caching techniques for scalability in VMP-MC %A Cheriton, David R. %A Goosen, Hendrik A. %A Boyle, Patrick D. %D May 1989 %X The problem of building a scalable shared memory multiprocessor can be reduced to that of building a scalable memory hierarchy, assuming interprocessor communication is handled by the memory system. In this paper, we describe the VMP-MC design, a distributed parallel multi-computer based on the VMP multiprocessor design, that is intended to provide a set of building blocks for configuring machines from one to several thousand processors. VMP-MC uses a memory hierarchy based on shared caches, ranging from on-chip caches to board-level caches connected by busses to, at the bottom, a high-speed fiber optic ring. In addition to describing the building block components of this architecture, we identify the key performance issues associated with the design and provide performance evaluation of these issues using trace-driven simulation and measurements from the VMP. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1266/CS-TR-89-1266.pdf %R CS-TR-89-1268 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Addition machines %A Floyd, Robert W. %A Knuth, Donald E. %D July 1989 %X An addition machine is a computing device with a finite number of registers, limited to the following six types of operations: read x {input to register x}; x <-- y {copy register y to register x}; x <-- x + y {add register y to register x}; x <-- x - y {subtract register y from register x}; if x >= y {compare register x to register y}; write x {output from register x}. The register contents are assumed to belong to a given set A, which is an additive subgroup of the real numbers.
If A is the set of all integers, we say the device is an integer addition machine; if A is the set of all real numbers, we say the device is a real addition machine. We will consider how efficiently an integer addition machine can do operations such as multiplication, division, greatest common divisor, exponentiation, and sorting. We will also show that any addition machine with at least six registers can compute the ternary operation $x \lfloor y/z \rfloor$ with reasonable efficiency, given x, y, z in A with z not equal to 0. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1268/CS-TR-89-1268.pdf %R CS-TR-89-1269 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A programming and problem solving seminar %A Ross, Kenneth A. %A Knuth, Donald E. %D July 1989 %X This report contains edited transcripts of the discussions held in Stanford's Computer Science problem solving course, CS304, during winter quarter 1989. Since the topics span a large range of ideas in computer science, and since most of the important research paradigms and programming paradigms were touched on during the discussions, these notes may be of interest to graduate students of computer science at other universities, as well as to their professors and to professional people in the "real world." The present report is the eighth in a series of such transcripts, continuing the tradition established in STAN-CS-77-606 (Michael J. Clancy, 1977), STAN-CS-79-707 (Chris Van Wyk, 1979), STAN-CS-81-863 (Allan A. Miller, 1981), STAN-CS-83-989 (Joseph S. Weening, 1983), STAN-CS-83-990 (John D. Hobby, 1983), STAN-CS-85-1055 (Ramsey W. Haddad, 1985) and STAN-CS-87-1154 (Tomas G. Rokicki, 1987). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1269/CS-TR-89-1269.pdf %R CS-TR-89-1273 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Sirpent[TM]: a high-performance internetworking approach %A Cheriton, David R. %D July 1989 %X A clear target for computer communication technology is to support a high-performance global internetwork. Current internetworking approaches use either concatenated virtual circuits, as in X.75, or a "universal" internetwork datagram, as in the DoD Internet IP protocol and the ISO connectionless network protocol (CLNP). Both approaches have significant disadvantages. This paper describes Sirpent[TM] (Source Internetwork Routing Protocol with Extended Network Transfer), a new approach to an internetwork architecture that makes source routing the basis for interconnection, rather than an option as in IP. Its benefits include simple switching with low per-packet processing and delay, support for accounting and congestion control, and scalability to a global internetwork. It also supports flexible, user-controlled routing such as required for security, policy-based routing and real-time applications. We also propose a specific internetwork protocol, called VIPER[TM], as a realization of the Sirpent approach. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1273/CS-TR-89-1273.pdf %R CS-TR-89-1275 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A new approach to stable matching problems %A Subramanian, Ashok %D August 1989 %X We show that Stable Matching problems are the same as problems about stable configurations of X-networks.
Consequences include easy proofs of old theorems, a new simple algorithm for finding a stable matching, an understanding of the difference between Stable Marriage and Stable Roommates, NP-completeness of Three-party Stable Marriage, CC-completeness of several Stable Matching problems, and a fast parallel reduction from the Stable Marriage problem to the Assignment problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1275/CS-TR-89-1275.pdf %R CS-TR-89-1276 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the network complexity of selection %A Plaxton, C. Greg %D August 1989 %X The selection problem is to determine the kth largest out of a given set of n keys, and its sequential complexity is well known to be linear. Thus, given a p processor parallel machine, it is natural to ask whether or not an O(n/p) selection algorithm can be devised for that machine. For the EREW PRAM, Vishkin has exhibited a straightforward selection algorithm that achieves optimal speedup for n = Omega(p log p log log p) [18]. For the network model, the sorting result of Leighton [12] and the token distribution result of Peleg and Upfal [13] together imply that Vishkin's algorithm can be adapted to run in the same asymptotic time bound on a certain class of bounded degree expander networks. On the other hand, none of the network families currently of practical interest have sufficient expansion to permit an efficient implementation of Vishkin's algorithm. The main result of this paper is an Omega((n/p) log log p + log p) lower bound for selection on any network that satisfies a particular low expansion property. The class of networks satisfying this property includes all of the common network families such as the tree, multi-dimensional mesh, hypercube, butterfly and shuffle exchange. When n/p is sufficiently large (for example, greater than log^2 p on the butterfly, hypercube and shuffle exchange), this result is matched by the upper bound presented in [14]. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1276/CS-TR-89-1276.pdf %R CS-TR-89-1278 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The complexity of circuit value and network stability %A Mayr, Ernst W. %A Subramanian, Ashok %D August 1989 %X We develop a method for non-trivially restricting fanout in a circuit. We study the complexity of the Circuit Value problem and a new problem, Network Stability, when fanout is limited. This leads to new classes of problems within P. We conjecture that the new classes are different from P and incomparable to NC. One of these classes, CC, contains several natural complete problems, including Circuit Value for comparator circuits, Lex-first Maximal Matching, and problems related to Stable Marriage and Stable Roommates. When fanout is appropriately limited, we get positive results: a parallel algorithm for Circuit Value that runs in time about the square root of the number of gates, a linear-time sequential algorithm for Network Stability, and logspace reductions between Circuit Value and Network Stability. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1278/CS-TR-89-1278.pdf %R CS-TR-89-1280 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Sticky bits and universality of consensus %A Plotkin, Serge A. %D August 1989 %X In this paper we consider implementation of atomic wait-free objects in the context of a shared-memory multiprocessor.
We introduce a new primitive object, the "Sticky-Bit", and show its universality by proving that any safe implementation of a sequential object can be transformed into a wait-free atomic one using only Sticky Bits and safe registers. The Sticky Bit may be viewed as a memory-oriented version of consensus. In particular, the results of this paper imply "universality of consensus" in the sense that given an algorithm to achieve n-processor consensus, we can transform any safe implementation of a sequential object into a wait-free atomic one using a polynomial number of additional safe bits. The presented results also imply that the Read-Modify-Write (RMW) hierarchy "collapses". More precisely, we show that although an object that supports a 1-bit atomic wait-free RMW is strictly more powerful than a safe register and an object that supports a 3-valued atomic wait-free RMW is strictly more powerful than a 1-bit RMW, the 3-valued RMW is universal in the sense that any RMW can be atomically implemented from a 3-valued atomic RMW in a wait-free fashion. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1280/CS-TR-89-1280.pdf %R CS-TR-89-1281 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Load balancing on the hypercube and shuffle-exchange %A Plaxton, C. Greg %D August 1989 %X Maintaining a balanced load is of fundamental importance on any parallel computer, since a strongly imbalanced load often leads to low processor utilization. This paper considers two load balancing operations: Balance and MultiBalance. The Balance operation corresponds to the token distribution problem considered by Peleg and Upfal [9] for certain expander networks. The MultiBalance operation balances several populations of distinct token types simultaneously. Efficient implementations of these operations will be given for the hypercube and shuffle-exchange, along with tight or near-tight lower bounds. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1281/CS-TR-89-1281.pdf %R CS-TR-89-1286 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Fast sparse matrix factorization on modern workstations %A Rothberg, Edward %A Gupta, Anoop %D October 1989 %X The performance of workstation-class machines has experienced a dramatic increase in the recent past. Relatively inexpensive machines which offer 14 MIPS and 2 MFLOPS performance are now available, and machines with even higher performance are not far off. One important characteristic of these machines is that they rely on a small amount of high-speed cache memory for their high performance. In this paper, we consider the problem of Cholesky factorization of a large sparse positive definite system of equations on a high performance workstation. We find that the major factor limiting performance is the cost of moving data between memory and the processor. We use two techniques to address this limitation; we decrease the number of memory references and we improve cache behavior to decrease the cost of each reference. When run on benchmarks from the Harwell-Boeing Sparse Matrix Collection, the resulting factorization code is almost three times as fast as SPARSPAK on a DECStation 3100. We believe that the issues brought up in this paper will play an important role in the effective use of high performance workstations on large numerical problems.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1286/CS-TR-89-1286.pdf %R CS-TR-89-1290 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Reading list for the Qualifying Examination in Artificial Intelligence %A Myers, Karen %A Subramanian, Devika %A Zabih, Ramin %D November 1989 %X This report contains the reading list for the Qualifying Examination in Artificial Intelligence. Areas covered include search, representation, reasoning, planning and problem solving, learning, expert systems, vision, robotics, natural language, perspectives and AI programming. An extensive bibliography is also provided. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1290/CS-TR-89-1290.pdf %R CS-TR-89-1296 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Completing the temporal picture %A Manna, Zohar %A Pnueli, Amir %D December 1989 %X The paper presents a relatively complete proof system for proving the validity of temporal properties of reactive programs. The presented proof system improves on previous temporal systems, such as [MP83a] and [MP83b], in that it reduces the validity of program properties to pure assertional reasoning, not involving additional temporal reasoning. The proof system is based on the classification of temporal properties according to the Borel hierarchy, providing an appropriate proof rule for each of the main classes, such as safety, response, and progress properties. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1296/CS-TR-89-1296.pdf %R CS-TR-89-1288 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Programming and proving with function and control abstractions %A Talcott, Carolyn %D October 1989 %X Rum is an intensional semantic theory of function and control abstractions as computation primitives. It is a mathematical foundation for understanding and improving current practice in symbolic (Lisp-style) computation. The theory provides, in a single context, a variety of semantics ranging from structures and rules for carrying out computations to an interpretation as functions on the computation domain. Properties of powerful programming tools such as functions as values, streams, aspects of object oriented programming, escape mechanisms, and coroutines can be represented naturally. In addition a wide variety of operations on programs can be treated including program transformations which introduce function and control abstractions, compiling morphisms that transform control abstractions into function abstractions, and operations that transform intensional properties of programs into extensional properties. The theory goes beyond a theory of functions computed by programs, providing tools for treating both intensional and extensional properties of programs. This provides operations on programs with meanings to transform as well as meanings to preserve. Applications of this theory include expressing and proving properties of particular programs and of classes of programs and studying mathematical properties of computation mechanisms. Additional applications are the design and implementation of interactive computation systems and the mechanization of reasoning about computation. These notes are based on lectures given at the Western Institute of Computer Science summer program, 31 July - 1 August 1986.
Here we focus on programming and proving with function and control abstractions and present a variety of example programs, properties, and techniques for proving these properties. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1288/CS-TR-89-1288.pdf %R CS-TR-89-1244 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Software Performance on Nonlinear Least-Squares Problems %A Fraley, Christina %D January 1989 %X This paper presents numerical results for a large and varied set of problems using software that is widely available and has undergone extensive testing. The algorithms implemented in this software include Newton-based linesearch and trust-region methods for unconstrained optimization, as well as Gauss-Newton, Levenberg-Marquardt, and special quasi-Newton methods for nonlinear least squares. Rather than give a critical assessment of the software itself, our original purpose was to use the best available software to compare the underlying algorithms, to identify classes of problems for each method on which the performance is either very good or very poor, and to provide benchmarks for future work in nonlinear least squares and unconstrained optimization. The variability in the results made it impossible to meet either of the first two goals; however, the results are significant as a step toward explaining why these aims are so difficult to accomplish. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/89/1244/CS-TR-89-1244.pdf %R CS-TR-90-1298 %Z Thu, 22 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Leases: an efficient fault-tolerant mechanism for distributed file cache consistency. %A Gray, Cary G. %A Cheriton, David R. %D January 1990 %X Caching introduces the overhead and complexity of ensuring consistency, reducing some of its performance benefits. In a distributed system, caching must deal with the additional complications of communication and host failures. Leases are proposed as a time-based mechanism that provides efficient consistent access to cached data in distributed systems. Non-Byzantine failures affect performance, not correctness, with their effect minimized by short leases. An analytic model and an evaluation for file access in the V system show that leases of short duration provide good performance. The impact of leases on performance grows more significant in systems of larger scale and higher processor performance. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1298/CS-TR-90-1298.pdf %R CS-TR-90-1304 %Z Tue, 06 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T A model of object-identities and values %A Matsushima, Toshiyuki %A Wiederhold, Gio %D February 1990 %X An algebraic formalization of the object-oriented data model is proposed. The formalism reveals that the semantics of the object-oriented model consists of two portions. One is expressed by an algebraic construct, which has essentially a value-oriented semantics. The other is expressed by object-identities, which characterize the essential difference between the object-oriented model and value-oriented models, such as the relational model and the logical database model. These two portions are integrated by a simple commutativity of modeling functions. The formalism includes the expression of integrity constraints in its construct, which provides the natural integration of the logical database model and the object-oriented database model.
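As a toy illustration of the identity/value split at the heart of this abstract (our example; the paper's formalization is algebraic, not code):

    # Two objects with equal values but distinct identities. A purely
    # value-oriented model (e.g., relational) cannot distinguish a from b;
    # an object-oriented model can, via object identity.
    a = {"part": "widget", "price": 10}
    b = {"part": "widget", "price": 10}

    print(a == b)       # True:  equal as values
    print(a is b)       # False: distinct object identities
    a["price"] = 12     # mutating a leaves b untouched
    print(b["price"])   # still 10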
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1304/CS-TR-90-1304.pdf %R CS-TR-90-1305 %Z Tue, 06 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T A comparative evaluation of nodal and supernodal parallel sparse matrix factorization: detailed simulation results %A Rothberg, Edward %A Gupta, Anoop %D February 1990 %X In this paper we consider the problem of factoring a large sparse system of equations on a modestly parallel shared-memory multiprocessor with a non-trivial memory hierarchy. Using detailed multiprocessor simulation, we study the behavior of the parallel sparse factorization scheme developed at the Oak Ridge National Laboratory. We then extend the Oak Ridge scheme to incorporate the notion of supernodal elimination. We present detailed analyses of the sources of performance degradation for each of these schemes. We measure the impact of interprocessor communication costs, processor load imbalance, overheads introduced in order to distribute work, and cache behavior on overall parallel performance. For the three benchmark matrices which we study, we find that the supernodal scheme gives a factor of 1.7 to 2.7 performance advantage for 8 processors and a factor of 0.9 to 1.6 for 32 processors. The supernodal scheme exhibits higher performance due mainly to the fact that it executes many fewer memory operations and produces fewer cache misses. However, the natural task grain size for the supernodal scheme is much larger than that of the Oak Ridge scheme, making effective distribution of work more difficult, especially when the number of processors is large. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1305/CS-TR-90-1305.pdf %R CS-TR-90-1307 %Z Tue, 06 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Real-time logics: complexity and expressiveness %A Alur, Rajeev %A Henzinger, Thomas A. %D March 1990 %X The theory of the natural numbers with linear order and monadic predicates underlies propositional linear temporal logic. To study temporal logics for real-time systems, we combine this classical theory of infinite state sequences with a theory of time, via a monotonic function that maps every state to its time. The resulting theory of timed state sequences is shown to be decidable, albeit nonelementary, and its expressive power is characterized by omega-regular sets. Several more expressive variants are proved to be highly undecidable. This framework allows us to classify a wide variety of real-time logics according to their complexity and expressiveness. In fact, it follows that most formalisms proposed in the literature cannot be decided. We are, however, able to identify two elementary real-time temporal logics as expressively complete fragments of the theory of timed state sequences, and give tableau-based decision procedures. Consequently, these two formalisms are well-suited for the specification and verification of real-time systems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1307/CS-TR-90-1307.pdf %R CS-TR-90-1312 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T A validation structure based theory of plan modification and reuse %A Kambhampati, Subbarao %A Hendler, James A.
%D June 1990 %X A framework for the flexible and conservative modification of plans enables a planner to modify its plans in response to incremental changes in their specifications, to reuse its existing plans in new problem situations, and to efficiently replan in response to execution time failures. We present a theory of plan modification applicable to hierarchical nonlinear planning. Our theory utilizes the validation structure of stored plans to yield a flexible and conservative plan modification framework. The validation structure, which constitutes a hierarchical explanation of correctness of the plan with respect to the planner's own knowledge of the domain, is annotated on the plan as a by-product of initial planning. Plan modification is formalized as a process of removing inconsistencies in the validation structure of a plan when it is being reused in a new (changed) planning situation. The repair of these inconsistencies involves removing unnecessary parts of the plan and adding new non-primitive tasks to the plan to establish missing or failing validations. The resultant partially reduced plan (with a consistent validation structure) is sent to the planner for complete reduction. We discuss the development of this theory in the PRIAR system, present an empirical evaluation of this theory, and characterize its completeness, coverage, efficiency and limitations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1312/CS-TR-90-1312.pdf %R CS-TR-90-1313 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Book review: Potokovye Algoritmy (Flow Algorithms) by G. M. Adel'son-Vel'ski, E. A. Dinic, and A. V. Karzanov. %A Goldberg, Andrew V. %A Gusfield, Dan %D June 1990 %X This is a review of the book "Flow Algorithms" by Adel'son-Vel'ski, Dinic, and Karzanov, well-known researchers in the area of algorithm design and analysis. This remarkable book, published in 1975, is written in Russian and has never been translated into English. What is remarkable about the book is that it describes many major results obtained in the Soviet Union (and originally published in papers by 1976) that were independently discovered later (and in some cases much later) in the West. The book also contains some minor results that we believe are still unknown in the West. The book is well-written and a pleasure to read, at least for someone fluent in Russian. Although the book is fifteen years old and we believe that all the major results contained in it are known in the West by now, the book is still of great historical importance. Hence a complete review is in order. [from the Introduction] %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1313/CS-TR-90-1313.pdf %R CS-TR-90-1314 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Genetic programming: a paradigm for genetically breeding populations of computer programs to solve problems %A Koza, John R. %D June 1990 %X Many seemingly different problems in artificial intelligence, symbolic processing, and machine learning can be viewed as requiring discovery of a computer program that produces some desired output for particular inputs. When viewed in this way, the process of solving these problems becomes equivalent to searching a space of possible computer programs for a most fit individual computer program. The new "genetic programming" paradigm described herein provides a way to search for this most fit individual computer program. 
In this new "genetic programming" paradigm, populations of computer programs are genetically bred using the Darwinian principle of survival of the fittest and using a genetic crossover (recombination) operator appropriate for genetically mating computer programs. In this paper, the process of formulating and solving problems using this new paradigm is illustrated using examples from various areas. Examples come from the areas of machine learning of a function; planning; sequence induction; function function identification (including symbolic regression, empirical discovery, "data to function" symbolic integration, "data to function" symbolic differentiation); solving equations, including differential equations, integral equations, and functional equations); concept formation; automatic programming; pattern recognition, time-optimal control; playing differential pursuer-evader games; neural network design; and finding a game-playing strategyfor a discrete game in extensive form. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1314/CS-TR-90-1314.pdf %R CS-TR-90-1318 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Techniques for improving the performance of sparse matrix factorization on multiprocessor workstations %A Rothberg, Edward %A Gupta, Anoop %D June 1990 %X In this paper we look at the problem of factoring large sparse systems of equations on high-performance multiprocessor workstations. While these multiprocessor workstations are capable of very high peak floating point computation rates, most existing sparse factorization codes achieve only a small fraction of this potential. A major limiting factor is the cost of memory accesses performed during the factorization. ln this paper, we describe a parallel factorization code which utilizes the supernodal structure of the matrix to reduce the number of memory references. We also propose enhancements that significantly reduce the overall cache miss rate. The result is greatly increased factorization performance. We present experimental results from executions of our codes on the Silicon Graphics 4D/380 multiprocessor. Using eight processors, we find that the supernodal parallel code achieves a computation rate of approximately 40 MFLOPS when factoring a range of benchmark matrices. This is more than twice as fast as the parallel nodal code developed at the Oak Ridge National Laboratory running on the SGI 4D/380. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1318/CS-TR-90-1318.pdf %R CS-TR-90-1321 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Tools and rules for the practicing verifier %A Manna, Z ohar %A Pnueli, Amir %D July 1990 %X The paper presents a minimal proof theory which is adequate for proving the main important temporal properties of reactive programs. The properties we consider consist of the classes of invariance, response, and precedence properties. For each of these classes we present a small set of rules that is complete for verifying properties belonging to this class. We illustrate the application of these rules by analyzing and verifying the properties of a new algorithm for mutual exclusion. 
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1321/CS-TR-90-1321.pdf %R CS-TR-90-1323 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Protograms %A Mozes, Eyal %A Shoham, Yoav %D July 1990 %X Motivated largely by tasks that require control of complex processes in a dynamic environment, we introduce a new computational construct called a protogram. A protogram is a program specifying an abstract course of action, a course that allows for a range of specific actions, from which a choice is made through interaction with other protograms. We discuss the intuition behind the notion, and then explore some of the details involved in implementing it. Specifically, we (a) describe a general scheme of protogram interaction, (b) describe a protogram interpreter that has been implemented, dealing with some special cases, (c) describe three applications of the protogram interpreter, one in data processing and two in robotics (both currently only implemented as simulations), (d) describe some more general possible implementations of a protogram interpreter, and (e) discuss how protograms can be useful for the Gofer project. We also briefly discuss the origins of protograms in psychology and linguistics, compare protograms to blackboard and subsumption architectures, and discuss directions for future research. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1323/CS-TR-90-1323.pdf %R CS-TR-90-1324 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the complexity of monotonic inheritance with roles %A Guerreiro, Ramiro A. de T. %A Hemerly, S. %A Shoham, Yoav %D July 1990 %X We investigate the complexity of reasoning with monotonic inheritance hierarchies that contain, besides ISA edges, also ROLE (or FUNCTION) edges. A ROLE edge is an edge labelled with a name such as spouse of or brother of. We call such networks ISAR networks. Given a network with n vertices and m edges, we consider two problems: ($P_1$) determining whether the network implies an isa relation between two particular nodes, and ($P_2$) determining all isa relations implied by the network. As is well known, without ROLE edges the time complexity of $P_1$ is $O(m)$, and the time complexity of $P_2$ is $O(n^3)$. Unfortunately, the results do not extend naturally to ISAR networks, except in a very restricted case. For general ISAR networks we first give a polynomial algorithm by an easy reduction to propositional Horn theory. As the degree of the polynomial is quite high ($O(mn^4)$ for $P_1$, $O(mn^6)$ for $P_2$), we then develop a more direct algorithm. For both $P_1$ and $P_2$ its complexity is $O(n^3 + m^2)$. Actually, a finer analysis of the algorithm reveals a complexity of $O(nr \log r + n^2 r + n^3)$, where r is the number of different ROLE labels. One corollary is that if we fix the number of ROLE labels, the complexity of our algorithm drops back to $O(n^3)$. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1324/CS-TR-90-1324.pdf %R CS-TR-90-1329 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T An interleaving model for real time. %A Henzinger, Thomas A. %A Manna, Zohar %A Pnueli, Amir %D September 1990 %X The interleaving model is both adequate and sufficiently abstract to allow for the practical specification and verification of many properties of concurrent systems.
We incorporate real time into this model by defining the abstract notion of a real-time transition system as a conservative extension of traditional transition systems: qualitative fairness requirements are replaced (and superseded) by quantitative lower-bound and upper-bound real-time requirements for transitions. We present proof rules to establish lower and upper real-time bounds for response properties of real-time transition systems. This proof system can be used to verify bounded-invariance and bounded-response properties, such as timely termination of shared-variables multi-process systems, whose semantics is defined in terms of real-time transition systems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1329/CS-TR-90-1329.pdf %R CS-TR-90-1330 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Parallel ICCG on a hierarchical memory multiprocessor - addressing the triangular solve bottleneck %A Rothberg, Edward %A Gupta, Anoop %D October 1990 %X The incomplete Cholesky conjugate gradient (ICCG) algorithm is a commonly used iterative method for solving large sparse systems of equations. In this paper, we study the parallel solution of sparse triangular systems of equations, the most difficult aspect of implementing the ICCG method on a multiprocessor. We focus on shared-memory multiprocessor architectures with deep memory hierarchies. On such architectures we find that previously proposed parallelization approaches result in little or no speedup. The reason is that these approaches cause significant increases in the amount of memory system traffic as compared to a sequential approach. Increases of as much as a factor of 10 on four processors were observed. In this paper we propose new techniques for limiting these increases, including data remappings to increase spatial locality, new processor synchronization techniques to decrease the use of auxiliary data structures, and data partitioning techniques to reduce the amount of interprocessor communication. With these techniques, memory system traffic is reduced to as little as one sixth of its previous volume. The resulting speedups are greatly improved as well, although they are still much less than linear. We discuss the factors that limit further speedups. We present both simulation results and results of experiments on an SGI 4D/340 multiprocessor. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1330/CS-TR-90-1330.pdf %R CS-TR-90-1337 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T A simplifier for untyped lambda expressions %A Galbiati, Louis %A Talcott, Carolyn %D October 1990 %X Many applicative programming languages are based on the call-by-value lambda calculus. For these languages tools such as compilers, partial evaluators, and other transformation systems often make use of rewriting systems that incorporate some form of beta reduction. For purposes of automatic rewriting it is important to develop extensions of beta-value reduction and to develop methods for guaranteeing termination. This paper describes an extension of beta-value reduction and a method based on abstract interpretation for controlling rewriting to guarantee termination.
The main innovations are (1) the use of rearrangement rules in combination with beta-value reduction to increase the power of the rewriting system and (2) the definition of a non-standard interpretation of expressions, the generates relation, as a basis for designing terminating strategies for rewriting. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1337/CS-TR-90-1337.pdf %R CS-TR-90-1340 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Programming in QLisp %A Mason, Ian A. %A Pehoushek, Joseph D. %A Talcott, Carolyn L. %A Weening, Joseph S. %D October 1990 %X Qlisp is an extension of Common Lisp to support parallel programming. It was initially designed by John McCarthy and Richard Gabriel in 1984. Since then it has been under development both at Stanford University and Lucid, Inc. and has been implemented on several commercial shared-memory parallel computers. Qlisp is a queue-based, shared-memory, multi-processing language. This report is a tutorial introduction to the Stanford dialect of Qlisp. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1340/CS-TR-90-1340.pdf %R CS-TR-90-1342 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Modeling concurrency with geometry %A Pratt, Vaughan %D November 1990 %X The phenomena of branching time and true or noninterleaving concurrency find their respective homes in automata and schedules. But these two models of computation are formally equivalent via Birkhoff duality, an equivalence we expound on here in tutorial detail. So why should these phenomena prefer one over the other? We identify dimension as the culprit: 1-dimensional automata are skeletons permitting only interleaving concurrency, whereas true n-fold concurrency resides in transitions of dimension n. The truly concurrent automaton dual to a schedule is not a skeletal distributive lattice but a solid one! We introduce true nondeterminism and define it as monoidal homotopy; from this perspective nondeterminism in ordinary automata arises from forking and joining creating nontrivial homotopy. The automaton dual to a poset schedule is simply connected whereas that dual to an event structure schedule need not be, according to monoidal homotopy though not to group homotopy. We conclude with a formal definition of higher dimensional automaton as an n-complex or n-category, whose two essential axioms are associativity of concatenation within dimension and an interchange principle between dimensions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1342/CS-TR-90-1342.pdf %R CS-TR-90-1343 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Action logic and pure induction %A Pratt, Vaughan %D November 1990 %X In Floyd-Hoare logic, programs are dynamic while assertions are static (hold at states). In action logic the two notions become one, with programs viewed as on-the-fly assertions whose truth is evaluated along intervals instead of at states. Action logic is an equational theory ACT conservatively extending the equational theory REG of regular expressions with operations preimplication a --> b (had a then b) and postimplication b <-- a (b if-ever a). Unlike REG, ACT is finitely based, makes $a^*$ reflexive transitive closure, and has an equivalent Hilbert system. The crucial axiom is that of pure induction, $(a --> a)^* = a --> a$.
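As an aside on the action logic abstract above: the ASCII formula typesets more readably in standard notation. The block below is only a sketch in LaTeX; the residual reading of the two implications is our interpretive gloss on the parenthetical readings given in the abstract ("had a then b", "b if-ever a"), not a statement taken from the report itself.

```latex
% Pure induction in action logic, typeset from the ASCII formula above.
\[
  (a \rightarrow a)^{*} \;=\; a \rightarrow a
\]
% Interpretive gloss (an assumption, in the spirit of residuation): a -> b is
% the largest x with a;x <= b, and b <- a the largest x with x;a <= b.
\[
  a \rightarrow b \;=\; \max\{x : a\,;x \le b\}, \qquad
  b \leftarrow a \;=\; \max\{x : x\,;a \le b\}
\]
```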
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1343/CS-TR-90-1343.pdf %R CS-TR-90-1344 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T ParaDiGM: a highly scalable shared-memory multi-computer architecture %A Cheriton, David R. %A Goosen, Hendrik A. %A Boyle, Patrick D. %D November 1990 %X ParaDiGM is a highly scalable shared-memory multi-computer architecture. It is being developed to demonstrate the feasibility of building a relatively low-cost shared-memory parallel computer that scales to large configurations, and yet provides sequential programs with performance comparable to a high-end microprocessor. A key problem is building a scalable memory hierarchy. In this paper we describe the ParaDiGM architecture, highlighting the innovations of our approach and presenting results of our evaluation of the design. We envision that scalable shared-memory multiprocessors like ParaDiGM will soon become the dominant form of parallel processing, even for very large-scale computation, providing a uniform platform for parallel programming systems and applications. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1344/CS-TR-90-1344.pdf %R CS-TR-90-1345 %Z Wed, 14 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Nonholonomic motion planning versus controllability via the multibody car system example %A Laumond, Jean-Paul %D December 1990 %X A multibody car system is a non-nilpotent, non-regular, triangularizable and well-controllable system. One goal of the current paper is to prove this obscure assertion. But its main goal is to explain and illuminate what it means. Motion planning is an already old and classical problem in Robotics. A few years ago a new instance of this problem appeared in the literature: motion planning for nonholonomic systems. While useful tools in motion planning come from Computer Science and Mathematics (Computational Geometry, Real Algebraic Geometry), nonholonomic motion planning needs some Control Theory and more Mathematics (Differential Geometry). First of all, this paper tries to give a computational reading of the tools from Differential Geometric Control Theory required by planning. Then it shows that the presence of obstacles in the real world of a real robot challenges Mathematics with some difficult questions which are topological in nature, and have been solved only recently, within the framework of Sub-Riemannian Geometry. This presentation is based upon a reading of works recently developed by (1) Murray and Sastry, (2) Lafferiere and Sussmann, and (3) Bellaiche, Jacobs and Laumond. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/90/1345/CS-TR-90-1345.pdf %R CS-TR-95-1534 %Z Mon, 23 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Partial Information Based Integrity Constraint Checking %A Gupta, Ashish %D January 1995 %X Integrity constraints are useful for specifying consistent states of a database, especially in distributed database systems where data may be under the control of multiple database managers. Constraints need to be checked when the underlying database is updated. Integrity constraint checking in a distributed environment may involve a distributed transaction and the expenses associated with it: two-phase commit protocols, distributed concurrency control, network communication costs, and multiple interface layers if the databases are heterogeneous.
The information used for constraint checking may include the contents of base relations, constraint specifications, updates to the databases, schema restrictions, stored aggregates, etc. We propose using only a subset of the information potentially available for constraint checking. Thus, only data that is local to a site may be used for constraint checking, thereby avoiding distributed transactions. The approach is also useful in centralized systems, because subsets that are relatively inexpensive to access may be used for constraint checking. We discuss constraint checking for the following three subsets of the aforementioned information. 1. Constraint Subsumption: How to check one constraint C using a set of other constraint specifications {C0,...,Cn} and no data, and the knowledge that the constraints in set {C0,...,Cn} hold in the database? 2. Irrelevant Updates: How to check a constraint C using the database update, a set of other constraints {C0,...,Cn}, and the knowledge that the constraints {C,C0,...,Cn} all hold before the update? 3. Local Checking: How to check a constraint C using the database update, the contents of the updated relation, a set of other constraints {C0,...,Cn}, and the knowledge that the constraints {C,C0,...,Cn} all hold before the update? Local checking is the main focus and the main contribution of this thesis. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1534/CS-TR-95-1534.pdf %R CS-TR-95-1535 %Z Mon, 23 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Random Networks in Configuration Space for Fast Path Planning %A Kavraki, Lydia E. %D January 1995 %X In the main part of this dissertation we present a new path planning method which computes collision-free paths for robots of virtually any type moving among stationary obstacles. This method proceeds in two phases: a preprocessing phase and a query phase. In the preprocessing phase, a probabilistic network is constructed and stored as a graph whose nodes correspond to collision-free configurations and edges to feasible paths between these configurations. In the query phase, any given start and goal configurations of the robot are connected to two nodes of the network; the network is then searched for a path joining these two nodes. We apply our method to articulated robots with many degrees of freedom. Experimental results show that path planning can be done in a fraction of a second on a contemporary workstation ($\approx$ 150 MIPS), after relatively short preprocessing times (a few dozen to a few hundred seconds). In the second part of this dissertation, we present a new method that uses the Fast Fourier Transform to compute the obstacle map required by certain path planning algorithms. In the final part of this dissertation, we consider a problem from assembly planning. In assembly planning we are interested in generating feasible sequences of motions that construct a mechanical product from its individual parts. We prove that the monotone assembly partitioning problem in the plane is NP-complete. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1535/CS-TR-95-1535.pdf %R CS-TR-95-1536 %Z Mon, 23 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Locomotion With A Unit-Modular Reconfigurable Robot %A Yim, Mark %D January 1995 %X A unit-modular robot is a robot that is composed of modules that are all identical. Here we study the design and control of unit-modular dynamically reconfigurable robots.
This is based upon the design and construction of a robot called Polypod. We further choose statically stable locomotion as the task domain to evaluate the design and control strategy. The result is the creation of many unique locomotion modes. To gain insight into the capabilities of robots like Polypod we examine locomotion in general by building a functional taxonomy of locomotion. We show that Polypod is capable of generating all classes of statically stable locomotion, a feature unique to Polypod. Next, we propose methods to evaluate vehicles under different operating conditions, such as different terrains. We then evaluate and compare each mode of locomotion on Polypod. This study leads to interesting insights into the general characteristics of the corresponding classes of locomotion. Finally, since more modules are expected to increase robot capability, it is important to examine the limit to the number of modules that can be put together in a useful form. We answer this question by investigating the issues of structural stability, actuator strength, computation and control requirements. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1536/CS-TR-95-1536.pdf %R CS-TR-95-1537 %Z Mon, 23 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Real-Time Modification of Collision-Free Paths %A Quinlan, Sean %D January 1995 %X The modification of collision-free paths is proposed as the basis for a new framework to close the gap between global path planning and real-time sensor-based robot control. A physically-based model of a flexible string-like object, called an elastic band, is used to determine the modification of a path. The initial shape of the elastic is the free path generated by a planner. Subjected to artificial forces, the elastic band deforms in real time to a short and smooth path that maintains clearance from the obstacles. The elastic continues to deform as changes in the environment are detected by sensors, enabling the robot to accommodate uncertainties and react to unexpected and moving obstacles. While providing a tight connection between the robot and its environment, the elastic band preserves the global nature of the planned path. The greater part of this thesis deals with the design and implementation of elastic bands, with emphasis on achieving real-time performance even for robots with many degrees of freedom. To achieve these goals, we propose the concept of bubbles of free-space---a region of free-space around a given configuration of the robot generated from distance information. We also develop a novel algorithm for efficiently computing the distance between non-convex objects and a real-time algorithm for calculating a discrete approximation to the time-optimal parameterization of a path. These various developments are combined in a system that demonstrates the elastic band framework for a Puma 560 manipulator. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1537/CS-TR-95-1537.pdf %R CS-TR-95-1538 %Z Fri, 27 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T 1994 Publications Summary of the Stanford Database Group %A Hammer, Joachim %D January 1995 %X This technical report contains the first four pages of papers written by members of the Stanford Database Group during 1994. We believe that the first four pages convey the main ideas behind each paper better than a simple title and abstract do.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1538/CS-TR-95-1538.pdf %R CS-TR-95-1539 %Z Tue, 31 Jan 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Reasoning About Uncertainty in Robot Motion Planning %A Lazanas, Anthony %D August 1994 %X In this thesis, we investigate the effects of uncertainty on the difficulty of robot motion planning, and we study the tradeoff between physical and computational complexity. We present a formulation of the general problem of robot motion planning with uncertainty from which a complete, correct, polynomial planner can be derived. The key idea is the existence of reduced uncertainty regions in the workspace (landmark regions). Planning is performed using the preimage backchaining method. We extend the standard definition of a ``nondirectional preimage'' to the case where a motion command depends on an arbitrary number of control parameters. The resulting multi-dimensional preimage can be represented with a polynomial number of 2-D slices, each computed for a critical combination of values of the parameters. We present implemented algorithms for one parameter (the commanded direction of motion) and for two parameters (the commanded direction of motion and the directional uncertainty). Experimentation with the algorithm using a real mobile robot has been successful. By engineering the workspace, we have been able to satisfy all the assumptions of our planning model. As a result, the robot has been able to operate for long periods of time with no failures. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1539/CS-TR-95-1539.pdf %R CS-TR-95-1542 %Z Fri, 10 Feb 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Parallel Genetic Programming on a Network of Transputers %A Koza, John R. %A Andre, David %D January 1995 %X This report describes the parallel implementation of genetic programming in the C programming language using a PC 486 type computer (running Windows) acting as a host and a network of transputers acting as processing nodes. Using this approach, researchers of genetic algorithms and genetic programming can acquire computing power that is intermediate between the power of currently available workstations and that of supercomputers, at a cost that is intermediate between the two. A comparison is made of the computational effort required to solve the problem of symbolic regression of the Boolean even-5-parity function with different migration rates. Genetic programming required the least computational effort with an 8% migration rate. Moreover, this computational effort was less than that required for solving the problem with a serial computer and a panmictic population of the same size. That is, apart from the nearly linear speed-up in executing a fixed amount of code inherent in the parallel implementation of genetic programming, parallelization delivered more than linear speed-up in solving the problem using genetic programming. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1542/CS-TR-95-1542.pdf %R CS-TR-95-1543 %Z Tue, 14 Feb 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Stereo Without Search %A Tomasi, Carlo %A Manduchi, Roberto %D February 1995 %X Search is not inherent in the correspondence problem.
We propose a representation of images, called intrinsic curves, that combines the ideas of associative storage of images with connectedness of the representation: intrinsic curves are the paths that a set of local image descriptors trace as an image scanline is traversed from left to right. Curves become surfaces when full images are considered instead of scanlines. Because only the path in the space of descriptors is used for matching, intrinsic curves lose track of space, and are invariant with respect to disparity under ideal circumstances. Establishing stereo correspondences then becomes a trivial lookup problem. We also show how to use intrinsic curves to match real images in the presence of noise, brightness bias, contrast fluctuations, and moderate geometric distortion, and we show how intrinsic curves can be used to deal with image ambiguity and occlusions. We carry out experiments on single-scanline matching to prove the feasibility of the approach and illustrate its main features. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1543/CS-TR-95-1543.pdf %R CS-TR-95-1546 %Z Fri, 17 Mar 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Symbolic Approximations for Verifying Real-Time Systems %A Wong-Toi, Howard %D December 1994 %X Real-time systems are appearing in more and more applications where their proper operation is critical, e.g. transport controllers and medical equipment. However, they are extremely difficult to design correctly. One approach to this problem is the use of formal description techniques and automatic verification. Unfortunately, automatic verification suffers from the state-explosion problem even without considering timing information. This thesis proposes a state-based approximation scheme as a heuristic for efficient yet accurate verification. We first describe a generic iterative approximation algorithm for checking safety properties of a transition system. Successively more accurate approximations of the reachable states are generated until the specification is proved either satisfied or violated. The algorithm automatically decides where the analysis needs to be more exact, and uses state partitioning to force the approximations to converge towards a solution. The method is complete for finite-state systems. The algorithm is applied to systems with hard real-time bounds. State approximations are performed over both timing information and control information. We also approximate the system's transition structure. Case studies include some timing properties of the MAC sublayer of the Ethernet protocol, the tick-tock service protocol, and a timing-based communication protocol where the sender's and receiver's clocks advance at variable rates. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1546/CS-TR-95-1546.pdf %R CS-TR-95-1540 %Z Fri, 03 Feb 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Model-Matching and Individuation for Model-Based Diagnosis %A Murdock, Janet L. %D January 1995 %X In model-based systems that reason about the physical world, models are attached to portions of the physical system. To make model-based systems more extensible and re-usable, this thesis explores automating model-matching. Models address particular individuals, portions of the physical world identified as separate entities. If the set of models is not fixed, one cannot carve the physical system into a fixed set of individuals.
Our goals are to develop methods for matching and individuating and to identify characteristics of physical equipment and models required by those methods. Our approach is to identify a set of characteristics, build a system that uses them, and test re-usability and extensibility. If the system correctly defines individuals and matches models, even when a model calls for individuals not previously defined, then we can conclude that we have identified some subset of the characteristics required. The system matches models to a series of equipment descriptions, simulating re-use. We also add a number of models, extending the system, and have it match the new models. Our investigation shows that the required characteristics are the 3-dimensional space and how that space is filled by functional components, phases, materials, and parameters. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1540/CS-TR-95-1540.pdf %R CS-TR-95-1541 %Z Tue, 07 Feb 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Random Sampling in Graph Optimization Problems %A Karger, David R. %D February 1995 %X The representative random sample is a central concept of statistics. It is often possible to gather a great deal of information about a large population by examining a small sample randomly drawn from it. This approach has obvious advantages in reducing the investigator's work, both in gathering and in analyzing the data. We apply the concept of a representative sample to combinatorial optimization. Our focus is optimization problems on undirected graphs. Highlights of our results include: The first (randomized) linear time minimum spanning tree algorithm; A (randomized) minimum cut algorithm with running time roughly O(n^2) as compared to previous roughly O(n^3) time bounds, as well as the first algorithm for finding all approximately minimal cuts and multiway cuts; An efficient parallelization of the minimum cut algorithm, providing the first parallel (RNC) algorithm for minimum cuts; A derandomization finding minimum cut in NC; Provably accurate approximations to network reliability; Very fast approximation algorithms for minimum cuts, s-t minimum cuts, and maximum flows; Significantly improved polynomial-time approximation bounds for network design problems; For coloring 3-colorable graphs, improvements in the approximation bounds from O(n^{3/8}) to O(n^{1/4}); An analysis of random sampling in matroids. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1541/CS-TR-95-1541.pdf %R CS-TR-95-1544 %Z Tue, 14 Feb 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On Diameter Verification and Boolean Matrix Multiplication. %A Basch, Julien %A Khanna, Sanjeev %A Motwani, Rajeev %D February 1995 %X We present a practical algorithm that verifies whether a graph has diameter 2 in time O(n^{3} / log^{2} n). A slight adaptation of this algorithm yields a boolean matrix multiplication algorithm which runs in the same time bound, thereby allowing us to compute transitive closure and to verify whether a graph has diameter $d$, for any constant $d$, in O(n^{3} / log^{2} n) time. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1544/CS-TR-95-1544.pdf %R CS-TR-95-1545 %Z Tue, 14 Feb 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Approximation Algorithms for the Largest Common Subtree Problem. %A Khanna, Sanjeev %A Motwani, Rajeev %A Yao, Frances F.
%D February 1995 %X The largest common subtree problem is to find a largest subtree which occurs as a common subgraph in a given collection of trees. We show that in the case of bounded-degree trees, we can achieve an approximation ratio of O((n log log n) / log^{2} n). In the case of trees with unbounded-degree nodes, we give an algorithm with approximation ratio O((n (log log n)^{2}) / log^{2} n) when the trees are unlabeled. An approximation ratio of O((n (log log n)^{2}) / log^{2} n) is also achieved for the case of labeled unbounded-degree trees, provided the number of distinct labels is O(log^{O(1)} n). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1545/CS-TR-95-1545.pdf %R CS-TR-95-1547 %Z Fri, 05 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Sharp, Reliable Predictions using Supervised Mixture Models %A Roy, H. Scott %D March 1995 %X This dissertation develops a new way to make probabilistic predictions from a database of examples. The method looks for regions in the data where different predictions are appropriate, and it naturally extends clustering algorithms that have been used with great success in exploratory data analysis. In probabilistic terms, the new method looks at the same models as before, but it only evaluates them for the conditional probability they assign to a single feature rather than the joint probability they assign to all features. A good model is therefore forced to classify the data in a way that is useful for a single, desired prediction, rather than just identifying the strongest overall pattern in the data. The results of this dissertation extend the clean, Bayesian approach of the unsupervised AutoClass system to the supervised learning problems common in everyday practice. Highlights include clear probabilistic semantics, prediction and use of discrete, categorical, and continuous data, priors that avoid the overfitting problem, an explicit noise model to identify unreliable predictions, and the ability to handle missing data. A computer implementation, MultiClass, validates the ideas with performance that exceeds neural nets, decision trees, and other current supervised machine learning systems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1547/CS-TR-95-1547.pdf %R CS-TR-95-1549 %Z Thu, 11 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Dynamic Selection of Models %A Rutledge, Geoffrey William %D March 1995 %X This dissertation develops an approach to high-stakes, model-based decision making under scarce computation resources, bringing together concepts and techniques from the disciplines of decision analysis, statistics, artificial intelligence, and simulation. A method is developed and implemented to solve a time-critical decision problem in the domain of critical-care medicine. This method selects models that balance prediction accuracy against the need for rapid action. Under a computation-time constraint, the optimal model for a model-based control application is a model that maximizes the tradeoff of model benefit (a measure of how accurately the model predicts the effects of alternative control settings) and model cost (a measure of the length of the model-induced computation delay). This work describes a real-time algorithm that selects, from a graph of models (GoM), a model that is accurate and that is computable within a time constraint.
The DSM algorithm is a metalevel reasoning strategy that relies on a dynamic-selection-of-models (DSM) metric to guide the search through a GoM that is organized according to the simplifying assumptions of the models. The DSM metric balances an estimate of the probability that a model will achieve the required prediction accuracy and the cost of the expected model-induced computation delay. The DSM algorithm provides an approach to automated reasoning about complex systems that applies at any level of computation-resource or computation-time constraint. The DSM algorithm is implemented in Konan, a program that performs dynamic selection of patient-specific models from a GoM of quantitative physiologic models. Konan selects models that allow a model-based control application (a ventilator-management advisor) to make real-time decisions for the control settings of a mechanical ventilator. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1549/CS-TR-95-1549.pdf %R CS-TR-95-1550 %Z Wed, 24 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Theory and Design of a Hybrid Pattern Recognition System %A Drakopoulos, John A. %D May 1995 %X Pattern recognition methods can be divided into four different categories: statistical or probabilistic, structural, possibilistic or fuzzy, and neural methods. A formal analysis shows that there is, in general, a computational complexity versus representational power trade-off between probabilistic and possibilistic or fuzzy set measures. Furthermore, sigmoidal theory shows that fuzzy set membership can be represented effectively by sigmoidal functions. Those results, together with the formalization of sigmoidal functions and, subsequently, multi-sigmoidal functions and neural networks, led to the development of a hybrid pattern recognition system called tFPR. tFPR is a hybrid fuzzy, neural, and structural pattern recognition system that uses fuzzy sets to represent multi-variate pattern classes that can be either static or dynamic, depending on time or some other parameter space. The membership functions of the fuzzy sets that represent pattern classes are modeled in three different ways. Simple sigmoidal configurations are used for simple patterns, a structural pattern recognition method is used for dynamic patterns, and multi-sigmoidal neural networks are used for pattern classes for which it is difficult to obtain a formal definition. Although efficiency is a very important consideration in tFPR, the main issues are knowledge acquisition and knowledge representation (in terms of pattern class descriptions). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1550/CS-TR-95-1550.pdf %R CS-TR-95-1548 %Z Wed, 10 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Routing and Admission Control in General Topology Networks %A Gawlick, Rainer %A Kamath, Anil %A Plotkin, Serge %A Ramakrishnan, K. G. %D May 1995 %X Emerging high-speed Broadband Integrated Services Digital Networks (B-ISDN) will carry traffic for services -- such as video-on-demand and video teleconferencing -- that require resource reservation along the path on which the traffic is sent. As a result, such networks will need effective {\em admission control} algorithms. The simplest approach is to use greedy admission control; in other words, accept every resource request that can be physically accommodated.
However, in the context of symmetric loss networks (networks with a complete graph topology), non-greedy admission control has been shown to be more effective than greedy admission control. This paper suggests a new {\em non-greedy} routing and admission control algorithm for {\em general topology} networks. In contrast to previous algorithms, our algorithm does not require advance knowledge of the traffic patterns. Our algorithm combines key ideas from a recently developed theoretical algorithm with a stochastic analysis developed in the context of reservation-based algorithms. We evaluate the performance of our algorithm using extensive simulations on an existing commercial network topology and on variants of that topology. The simulations show that our algorithm outperforms greedy admission control over a broad range of network environments. The simulations also illuminate some important characteristics of our algorithm. For example, we characterize the importance of the implicit routing effects of the admission control part of our algorithm. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1548/CS-TR-95-1548.pdf %R CS-TR-95-1552 %Z Mon, 17 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Embedded Teaching of Reinforcement Learners %A Brafman, Ronen I. %A Tennenholtz, Moshe %D June 1995 %X Knowledge plays an important role in an agent's ability to perform well in its environment. Teaching can be used to improve an agent's performance by enhancing its knowledge. We propose a specific model of teaching, which we call embedded teaching. An embedded teacher is an agent situated with a less knowledgeable ``student'' in a common environment. The teacher's goal is to lead the student to adopt a particular desired behavior. The teacher's ability to teach is affected by the dynamics of the common environment and may be limited by a restricted repertoire of actions or uncertainty about the outcome of actions; we explicitly represent these limitations as part of our model. In this paper, we address a number of theoretical issues including the characterization of a challenging embedded teaching domain and the computation of optimal teaching policies. We then incorporate these ideas in a series of experiments designed to evaluate our ability to teach two types of reinforcement learners. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1552/CS-TR-95-1552.pdf %R CS-TR-95-1553 %Z Wed, 19 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Modeling techniques and algorithms for probabilistic model-based diagnosis and repair %A Srinivas, Sampath %D July 1995 %X Model-based diagnosis centers on the use of a behavioral model of a system to infer diagnoses of anomalous behavior. For model-based diagnosis techniques to become practical, some serious problems in the modeling of uncertainty and in the tractability of uncertainty management have to be addressed. These questions include: How can we tractably generate diagnoses in large systems? Where do the prior probabilities of component failure come from when modeling a system? How do we tractably compute low-cost repair strategies? How can we do diagnosis even if only partial descriptions of device operation are available? This dissertation seeks to bring model-based diagnosis closer to being a viable technology by addressing these problems. We develop a set of tractable algorithms and modeling techniques that address each of the problems introduced above. 
Our approach synthesizes the techniques used in model-based diagnosis and techniques from the field of Bayesian networks. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1553/CS-TR-95-1553.pdf %R CS-TR-95-1554 %Z Mon, 24 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Computer Science Technical Report (CS-TR) Project: Considerations from the Library Perspective. %A Lasher, Rebecca %A Reich, Vicky %A Anderson, Greg %D July 1995 %X In 1992 the Advanced Research Projects Agency (ARPA) funded a three-year grant to investigate questions related to large-scale, distributed digital libraries. The award focused research on Computer Science Technical Reports (CS-TR) and was granted to the Corporation for National Research Initiatives (CNRI) and five research universities. The ensuing collaborative research has focused on a broad spectrum of technical, social, and legal issues, and has encompassed all aspects of a very large, heterogeneous distributed digital library environment: acquisition, storage, organization, search, retrieval, display, use and intellectual property. The initial corpus of this digital library is a coherent digital collection of CS-TRs created at the five participating universities: Carnegie Mellon, Cornell, MIT, Stanford, and the University of California at Berkeley. The Corporation for National Research Initiatives serves as a collaborator and agent for the project. This technical report summarizes the accomplishments and collaborative efforts of the CS-TR project from a librarian's perspective; to do this we address the following questions: 1. Why do librarians and computer scientists make good research partners? 2. What has been learned? 3. What new questions have been articulated? 4. How can the accomplishments be moved into a service environment? 5. What actions and activities might follow from this effort? %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1554/CS-TR-95-1554.pdf %R CS-TR-95-1551 %Z Mon, 10 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Two Methods for Checking Formulas of Temporal Logic %A McGuire, Hugh W. %D June 1995 %X This dissertation presents two methods for determining satisfiability or validity of formulas of Discrete Metric Annotated Linear Temporal Logic. This logic is convenient for representing and verifying properties of reactive and concurrent systems, including software and electronic circuits. The first method presented here is an algorithm for automatically deciding whether any given propositional temporal formula is satisfiable. This new algorithm efficiently extends the classical `semantic tableau' algorithm to formulas with temporal operators which refer to the past or are metric. Then, whereas classical proofs of correctness for such algorithms are existential, the proof here is constructive; it shows that for any given formula being checked, any model of the formula is embedded in the tableau. The second method presented in this dissertation is a deduction-calculus for determining the validity of predicate temporal formulas. This new deduction-calculus employs a refined, conservative version of classical approaches involving translation from temporal forms to first-order expressions with time reified. Here, quantifications are elided, and addition is used instead of classical complicated combinations of comparisons.
This scheme facilitates integration of powerful techniques such as associative-commutative unification and a Presburger decision-algorithm. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1551/CS-TR-95-1551.pdf %R CS-TR-95-1556 %Z Tue, 12 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Solving Unweighted and Weighted Bipartite Matching Problems in Theory and Practice %A Kennedy, J. Robert, Jr. %D August 1995 %X The push-relabel method has been shown to be efficient for solving maximum flow and minimum cost flow problems in practice, and periodic global updates of dual variables have played an important role in the best implementations. Nevertheless, global updates had not been known to yield any theoretical improvement in running time. In this work, we study techniques for implementing push-relabel algorithms to solve bipartite matching and assignment problems. We show that global updates yield a theoretical improvement in the bipartite matching and assignment contexts, and we develop a suite of efficient cost-scaling push-relabel implementations to solve assignment problems. For bipartite matching, we show that a push-relabel algorithm using global updates matches the best time bound known (roughly the number of edges times the square root of the number of nodes --- better for dense graphs) and performs worse by a factor of the square root of the number of nodes without the updates. We present a similar result for the assignment problem, for which an algorithm that assumes integer costs has running time asymptotically dominated by the number of edges times the number of nodes times a scaling factor logarithmic in the number of nodes and the largest magnitude of an edge cost in the problem. The bound we obtain matches the best cost-scaling bound known. We develop cost-scaling push-relabel implementations that take advantage of the assignment problem's special structure, and compare our codes against the best codes from the literature. The results show that the push-relabel method is very promising for practical use. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1556/CS-TR-95-1556.pdf %R CS-TR-95-1555 %Z Mon, 11 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Real-time Database Experiences in a Network Management Application %A Kiriha, Yoshiaki %D September 1995 %X This report discusses our experiences with real-time databases in the context of a network management system, in particular a MIB (Management Information Base) implementation. We propose an active and real-time MIB (ART-MIB) architecture that utilizes a real-time database system. The ART-MIB contains a variety of modules, such as a transaction manager, task manager, and resource manager. Among the functionalities provided by ART-MIB, we focus on transaction scheduling within a memory-based real-time database system. For the developed ART-MIB prototype, we have evaluated two typical real-time transaction scheduling algorithms: earliest deadline first (EDF) and highest value first (HVF). The main results of our performance comparison show that EDF outperforms HVF under a low load; however, HVF outperforms EDF in an overload situation. Furthermore, we validated that the performance crossover point depends closely on the size of the scheduler queue.
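The EDF/HVF comparison in the ART-MIB abstract above is easy to make concrete. The following is a minimal sketch, not code from the report: it contrasts the two priority rules on a toy transaction queue, with all names and fields invented for illustration.

```python
# Sketch of the two scheduling policies compared in the ART-MIB abstract:
# earliest deadline first (EDF) picks the pending transaction with the
# smallest deadline; highest value first (HVF) picks the one whose value
# is largest. Transaction fields here are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Txn:
    name: str
    deadline: float  # absolute deadline (e.g., seconds from now)
    value: float     # value gained if the transaction commits in time

def pick_edf(queue):
    return min(queue, key=lambda t: t.deadline)

def pick_hvf(queue):
    return max(queue, key=lambda t: t.value)

queue = [Txn("poll", 5.0, 1.0), Txn("alarm", 2.0, 3.0), Txn("report", 9.0, 8.0)]
print(pick_edf(queue).name)  # alarm  -- tightest deadline
print(pick_hvf(queue).name)  # report -- highest value
```

One intuition consistent with the abstract's finding: under light load EDF's deadline ordering loses little value, while under overload HVF's willingness to sacrifice late, low-value work pays off.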
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1555/CS-TR-95-1555.pdf %R CS-TR-95-1557 %Z Wed, 11 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Hierarchical Models of Synchronous Circuits for Formal Verification and Substitution %A Wolf, Elizabeth Susan %D October 1995 %X We develop a mathematical model of synchronous sequential circuits that supports both formal hierarchical verification and substitution. We have implemented and proved the correctness of automatic decision procedures for both of these applications using these models. For hierarchical verification, we model synchronous circuit specifications and implementations uniformly. Each of these descriptions provides both a behavioral and a structural view of the circuit or specification being modeled. We compare the behavior of a circuit model to a requirements specification in order to determine whether the circuit is an acceptable implementation of the specification. Our structural view of a circuit provides the capability to plug in one circuit component in place of another. We derive a requirements specification for the acceptable replacement components, in terms of the desired behavior of the full circuit. We also support nondeterministic specifications, which capture the minimum requirements of a circuit. Previous formalisms have relied on syntactic methods for distinguishing apparent from actual unlatched feedback loops in hierarchical hardware designs. However, these methods are not applicable to nondeterministic models. Our model of the behavior of a synchronous circuit within a single clock cycle provides a semantic method to identify cyclic dependencies even in the presence of nondeterminism. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1557/CS-TR-95-1557.pdf %R CS-TR-95-1558 %Z Mon, 04 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Designing an Academic Firewall: Policy, Practice and Experience With SURF %A Greenwald, Michael B. %A Singhal, Sandeep K. %A Stone, Jonathan R. %A Cheriton, David R. %D December 1995 %X Corporate network firewalls are well-understood and are becoming commonplace. These firewalls establish a security perimeter that aims to block (or heavily restrict) both incoming and outgoing network communication. We argue that these firewalls are neither effective nor appropriate for academic or corporate research environments needing to maintain information security while still supporting the free exchange of ideas. In this paper, we present the Stanford University Research Firewall (SURF), a network firewall design that is suitable for a research environment. While still protecting information and computing resources behind the firewall, this firewall is less restrictive of outward information flow than the traditional model; can be easily deployed; and can give internal users the illusion of unrestricted e-mail, anonymous FTP, and WWW connectivity to the greater Internet. Our experience demonstrates that an adequate firewall for a research environment can be constructed for minimal cost using off-the-shelf software and hardware components. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1558/CS-TR-95-1558.pdf %R CS-TR-95-1559 %Z Fri, 12 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the number of equilibrium placements of mass distributions in elliptic potential fields %A Kavraki, Lydia E. 
%D December 1995 %X Recent papers have demonstrated the use of force fields for mechanical part orientation. The force field is realized on a plane on which the part is placed. The forces exerted on the part's contact surface translate and rotate the part to an equilibrium orientation. Part manipulation by force fields is very attractive since it requires no sensing. We describe force fields that result from elliptic potentials and induce only two stable equilibrium orientations for most parts. The proposed fields represent a considerable improvement over previously developed force fields, which produced O(n) equilibria for polygonal parts with n vertices. The successful realization of these force fields could significantly affect part manipulation in industrial automation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1559/CS-TR-95-1559.pdf %R CS-TR-95-1560 %Z Fri, 12 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Wrappers for Performance Enhancements and Oblivious Decision Graphs. %A Kohavi, Ron %D September 1995 %X In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs). For accuracy estimation, we investigate cross-validation and the .632 bootstrap. We show examples where they fail and conduct a large-scale study comparing them. We conclude that repeated runs of five-fold cross-validation give a good tradeoff between bias and variance for the problem of model selection used in later chapters. We define the wrapper approach and use it for feature subset selection and parameter tuning. We relate definitions of feature relevancy to the set of optimal features, which is defined with respect to both a concept and an induction algorithm. The wrapper approach requires a search space, operators, a search engine, and an evaluation function. We investigate all of them in detail and introduce compound operators for feature subset selection. Finally, we abstract the search problem into search with probabilistic estimates. We introduce decision tables with a default majority rule (DTMs) to test the conjecture that feature subset selection is a very powerful bias. The accuracy of induced DTMs is surprisingly high, and we conclude that this bias is extremely important for many real-world datasets. We show that the resulting decision tables are very small and can be succinctly displayed. We study properties of oblivious read-once decision graphs (OODGs) and show that they do not suffer from some inherent limitations of decision trees. We describe a general framework for constructing OODGs bottom-up and specialize it using the wrapper approach. We show that the graphs produced use fewer features than C4.5, the state-of-the-art decision tree induction algorithm, and are usually easier for humans to comprehend.
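The accuracy-estimation procedure the dissertation above settles on, repeated runs of five-fold cross-validation, is simple to sketch. The code below is illustrative only and not from the thesis; the trivial majority-class "inducer" is a stand-in assumption for whatever learning algorithm is being evaluated.

```python
# Minimal sketch of repeated k-fold cross-validation for accuracy estimation.
# The "inducer" here just predicts the majority class of the training folds,
# a deliberately trivial stand-in for a real learning algorithm.
import random

def repeated_kfold_accuracy(data, k=5, repeats=3, seed=0):
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [ex for j in range(k) if j != i for ex in folds[j]]
            labels = [label for _, label in train]
            majority = max(set(labels), key=labels.count)  # "training" step
            hits = sum(label == majority for _, label in test)
            accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)

# Toy dataset: (features, label) pairs; labels are True for multiples of 3.
data = [((x,), x % 3 == 0) for x in range(60)]
print(round(repeated_kfold_accuracy(data), 3))  # roughly 0.667 on this data
```

Averaging over several shuffles is what trades a little extra computation for lower variance in the estimate, the bias/variance tradeoff the abstract refers to.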
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1560/CS-TR-95-1560.pdf %R CS-TR-95-1561 %Z Tue, 16 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Techniques for Efficient Formal Verification Using Binary Decision Diagrams %A Hu, Alan John %D December 1995 %X The appeal of automatic formal verification is that it's automatic -- minimal human labor and expertise should be needed to get useful results and counterexamples. BDD (binary decision diagram)-based approaches have promised to allow automatic verification of complex, real systems. For large classes of problems, however (including many distributed protocols, multiprocessor systems, and network architectures), this promise has yet to be fulfilled. Indeed, the few successes have required extensive time and effort from sophisticated researchers in the field. This thesis identifies several common obstacles to BDD-based automatic formal verification and proposes techniques to overcome them by avoiding building certain problematic BDDs needed in the standard approaches and by exploiting automatically generated and user-supplied don't-care information. Several examples illustrate the effectiveness of the new techniques in enlarging the envelope of problems that can routinely be verified automatically. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1561/CS-TR-95-1561.pdf %R CS-TR-95-1562 %Z Tue, 16 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T STeP: The Stanford Temporal Prover (Educational Release) User's Manual %A Bjorner, Nikolaj %A Browne, Anca %A Chang, Eddie %A Colon, Michael %A Kapur, Arjun %A Manna, Zohar %A Sipma, Henny B. %A Uribe, Tomas E. %D November 1995 %X The STeP (Stanford Temporal Prover) system supports the computer-aided verification of reactive and real-time systems. It combines deductive methods with algorithmic techniques to allow the verification of a broad class of systems, including infinite-state systems and parameterized N-process programs. STeP provides the visual language of verification diagrams that allow the user to construct proofs hierarchically, starting from a high-level proof sketch. The availability of automatically generated bottom-up and top-down invariants and an integrated suite of decision procedures allow most verification conditions to be checked without user intervention. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/95/1562/CS-TR-95-1562.pdf %R CS-TR-87-1142 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Heuristic Refinement for Spatial Constraint Satisfaction Problems %A Brinkley, J. %A Buchanan, B. %A Altman, R. %A Duncan, B. %A Cornelius, C. %D January 1987 %X The problem of arranging a set of physical objects according to a set of constraints is formulated as a geometric constraint satisfaction problem (GCSP), in which the variables are the objects, the possible locations of the objects are the possible values for the variables, and the constraints are geometric constraints between objects. A GCSP is a type of multidimensional constraint satisfaction problem in which the number of objects and/or the number of possible locations per object is too large to permit direct solution by backtrack search. A method is described for reducing these numbers by refinement along two dimensions. The number of objects is reduced by refinement of the structure, representing a group of objects as a single abstract object before considering each object individually.
The abstraction used depends on domain-specific knowledge. The number of locations per object is reduced by applying node and arc consistency algorithms to refine the accessible volume of each object. Heuristics are employed to control the order of operations (and hence to affect the efficiency of search) but not to change the correctness in the sense that no solutions that would be found by backtrack search are eliminated. Application of the method to the problem of protein structure determination is described. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1142/CS-TR-87-1142.pdf %R CS-TR-87-1144 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Considerations for Multiprocessor Topologies %A Byrd, Gregory %A Delagi, Bruce %D January 1987 %X Choosing a multiprocessor interconnection topology may depend on high-level considerations, such as the intended application domain and the expected number of processors. It certainly depends on low-level implementation details, such as packaging and communications protocols. We first use rough measures of cost and performance to characterize several topologies. We then examine how implementation details can affect the realizable performance of a topology. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1144/CS-TR-87-1144.pdf %R CS-TR-87-1146 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Point-to-Point Multicast Communications Protocol %A Byrd, Gregory %A Nakano, Russell %A Delagi, Bruce %D February 1987 %X Many network topologies have been proposed for connecting a large number of processor-memory pairs in a high-performance multiprocessor system. In terms of performance, however, the communications protocol decisions may be as crucial as topology. This paper describes a protocol to support point-to-point interprocessor communications with multicast. Dynamic, cut-through routing with local flow control is used to provide a high-throughput, low-latency communications path between processors. In addition, multicast transmissions are available, in which copies of a packet are sent to multiple destinations using common resources as much as possible. Special packet terminators and selective buffering are introduced to avoid deadlock during multicasts. A simulated implementation of the protocol is also described. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1146/CS-TR-87-1146.pdf %R CS-TR-87-1147 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Layered Environment for Reasoning about Action %A Hayes-Roth, B. %A Garvey, A. %A Johnson, M. V. %A Hewett, M. %D November 1986 %X An intelligent system reasons about -- controls, explains, learns about -- its actions, thereby improving its efforts to achieve goals and function in its environment. In order to perform effectively, a system must have knowledge of the actions it can perform, the events and states that occur, and the relationships among instances of those actions, events and states. We represent such knowledge in a hierarchy of knowledge abstractions and impose uniform standards of knowledge content and representation on modules within each hierarchical level. We refer to the evolving set of such modules as the BB* environment.
To illustrate, we describe selected elements of BB*: * the foundational BB1 architecture * the ACCORD framework for solving arrangement problems by means of an assembly method * two applications of BB1-ACCORD, the PROTEAN system for modeling protein structures and the SIGHTPLAN system for designing construction-site layouts * two hypothetical multifaceted systems that integrate ACCORD, PROTEAN and SIGHTPLAN with other possible BB* frameworks and applications. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1147/CS-TR-87-1147.pdf %R CS-TR-87-1148 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An Instrumented Architectural Simulation System %A Delagi, B. %A Saraiya, N. %A Nishimura, S. %A Byrd, G. %D January 1987 %X Simulation of systems at an architectural level can offer an effective way to study critical design choices if 1. the performance of the simulator is adequate to examine designs executing significant code bodies -- not just toy problems or small application fragments, 2. the details of the simulation include the critical details of the design, 3. the view of the design presented by the simulator instrumentation leads to useful insights on the problems with the design, and 4. there is enough flexibility in the simulation system so that the asking of unplanned questions is not suppressed by the weight of the mechanics involved in making changes either in the design or its measurement. A simulation system with these goals is described together with the approach to its implementation. Its application to the study of a particular class of multiprocessor hardware system architectures is illustrated. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1148/CS-TR-87-1148.pdf %R CS-TR-87-1149 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Proceedings from the Nineteenth Annual Meeting of the Stanford Computer Forum %A Millen, K. Mac %A Diaz-Barriga, A. %A Tajnai, C. %D February 1987 %X Operating for almost two decades, the Stanford Computer Forum is a cooperative venture of the Computer Science Department and the Computer Systems Laboratory (a laboratory operated jointly by the Computer Science and Electrical Engineering Departments). CSD and CSL are internationally recognized for their excellence; their faculty members, research staff, and students are widely known for leadership in developing new ideas and trends in the organization, design and use of computers. They are in the forefront of applying research results to a wide range of applications. The Forum holds an annual meeting in February to which three representatives of each member company are invited. The meeting lasts two days and features technical sessions at which timely computer research at Stanford is described by advanced graduate students and faculty members. There are opportunities for informal discussions to complement the presentations. This report includes information on the Forum, the program, abstracts of the talks and viewgraphs used in the presentations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1149/CS-TR-87-1149.pdf %R CS-TR-87-1153 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Optimum Grip of a Polygon %A Markenscoff, Xanthippi %A Papadimitriou, Christos %D April 1987 %X It has been shown by Baker, Fortune and Grosse that any two-dimensional polygonal object can be prehended stably with three fingers, so that its weight (along the third dimension) is balanced.
In this paper we also show that form closure of a polygonal object can be achieved by four fingers (previous proofs were not complete). We formulate and solve the problem of finding the optimum stable grip or form closure of any given polygon. For stable grip it is most natural to minimize the forces needed to balance through friction the object's weight along the third dimension. For form closure, we minimize the worst-case forces needed to balance any unit force acting on the center of gravity of the object. The mathematical techniques used in the two instances are an interesting mix of optimization and Euclidean geometry. Our results lead to algorithms for the efficient computation of the optimum grip in each case. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1153/CS-TR-87-1153.pdf %R CS-TR-87-1154 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Programming and Problem-Solving Seminar %A Rokicki, T. G. %A Knuth, D. E. %D April 1987 %X This report contains edited transcripts of the discussions held in Stanford's course CS204, Problem Seminar, during winter quarter 1987. Since the topics span a large range of ideas in computer science, and since most of the important research paradigms and programming paradigms were touched on during the discussions, these notes may be of interest to graduate students of computer science at other universities, as well as to their professors and to professional people in the "real world." The present report is the seventh in a series of such transcripts, continuing the tradition established in STAN-CS-77-606 (Michael J. Clancy, 1977), STAN-CS-79-707 (Chris Van Wyk, 1979), STAN-CS-81-863 (Allan A. Miller, 1981), STAN-CS-83-989 (Joseph S. Weening, 1983), STAN-CS-83-990 (John D. Hobby, 1983), and STAN-CS-85-1055 (Ramsey W. Haddad, 1985). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1154/CS-TR-87-1154.pdf %R CS-TR-87-1155 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Experiments in Automatic Theorem Proving %A Bellin, G. %A Ketonen, J. %D December 1986 %X The experiments described in this report are proofs in EKL of properties of different LISP programs operating on different representations of the same mathematical structures -- finite permutations. EKL is an interactive proof checker based upon the language of higher order logic, higher order unification and a decision procedure for a fragment of first order logic. The following questions are asked: What representations of mathematical structure and facts are better suited for formalization and also applicable to several interesting situations? What methods and strategies will make it possible to prove automatically an extensive body of mathematical knowledge? Can higher order logic be conveniently applied in the proof of elementary facts? The fact (*) that finite permutations form a group is proved from the axioms of arithmetic and elementary set theory, via the "Pigeon Hole Principle" (PHP). Permutations are represented (1) as association lists and (2) as lists of numbers. In representation (2), operations on permutations are represented (2.1) using predicates and (2.2) using functions. Proofs of (*) using the different representations are compared. The results and conclusions include the following. Methods to control the rewriting process and to replace logic inference by high order rewriting are presented. PHP is formulated as a second order statement which is then easily applied to (1) and (2).
This demonstrates the value of abstract, higher order formulation of facts for application in different contexts. A case is given in which representation of properties of programs by predicates may be more convenient than by functions. Evidence is given that convenient organization of proofs into lemmata is essential for large-scale computer-aided theorem proving. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1155/CS-TR-87-1155.pdf %R CS-TR-87-1156 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Dynamic Tree Expression Problem %A Mayr, Ernst W. %D May 1987 %X Presented is a uniform method for obtaining efficient parallel algorithms for a rather large class of problems. The method is based on a logic programming model, and it derives its efficiency from fast parallel routines for the evaluation of expression trees. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1156/CS-TR-87-1156.pdf %R CS-TR-87-1157 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Network Implementation of the DTEP Algorithm %A Mayr, E. W. %A Plaxton, C. G. %D May 1987 %X The dynamic tree expression problem (DTEP) was defined in [Ma87]. In this paper, efficient implementations of the DTEP algorithm are developed for the hypercube, butterfly, perfect shuffle and multidimensional mesh of trees families of networks. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1157/CS-TR-87-1157.pdf %R CS-TR-87-1159 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Muir: A Tool for Language Design %A Winograd, Terry A. %D May 1987 %X Muir is a language design environment, intended for use in creating and experimenting with languages such as programming languages, specification languages, grammar formalisms, and logical notations. It provides facilities for a language designer to create a language specification, which controls the behavior of generic language manipulating tools typically found in a language-specific environment, such as structure editors, interactive interfaces, storage management and attribute analysis. It is oriented towards use with evolving languages, providing for mixed structures (combining different versions), semi-automated updating of structures from one language version to another, and incremental language specification. A new hierarchical grammar formalism serves as the framework for language specification, with multiple presentation formalisms and a unified interactive environment based on an extended notion of edit operations. A prototype version is operating and has been tested on a small number of languages. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1159/CS-TR-87-1159.pdf %R CS-TR-87-1160 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Strategic Computing Research and the Universities %A Winograd, Terry A. %D March 1987 %X The Strategic Computing Initiative offers the potential of new research funds for university computer science departments. As with all funds, they bring benefits and can have unwanted strings attached. In the case of military funding, the web of attached strings can be subtle and confusing. The goal of this paper is to delineate some of these entanglements and perhaps provide some guidance for loosening and eliminating them.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1160/CS-TR-87-1160.pdf %R CS-TR-87-1166 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Parallel Execution of OPS5 in QLISP %A Okuno, H. G. %A Gupta, A. %D June 1987 %X Production systems (or rule-based systems) are widely used for the development of expert systems. To speed up the execution of production systems, a number of different approaches are being taken, a majority of them being based on the use of parallelism. In this paper, we explore the issues involved in the parallel implementation of OPS5 (a widely used production-system language) in QLISP (a parallel dialect of Lisp proposed by John McCarthy and Richard Gabriel). This paper shows that QLISP can easily encode most sources of parallelism in OPS5 that have been previously discussed in the literature. This is significant because the OPS5 interpreter is the first large program to be encoded in QLISP, and as a result, this is the first practical demonstration of the expressive power of QLISP. The paper also lists the most commonly used QLISP constructs in the parallel implementation (and the contexts in which they are used), which serve as a hint to the QLISP implementor about what to optimize. Also discussed is the exploitation of speculative parallelism in RHS-evaluation for OPS5. This has not been previously discussed in the literature. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1166/CS-TR-87-1166.pdf %R CS-TR-87-1168 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Representing Control Knowledge as Abstract Tasks and Metarules %A Clancey, W. J. %A Bock, C. %D April 1985 %X A poorly designed knowledge base can be as cryptic as an arbitrary program and just as difficult to maintain. Representing inference procedures abstractly, separately from domain facts and relations, makes the design more transparent and explainable. The combination of abstract procedures and a relational language for organizing domain knowledge provides a generic framework for constructing knowledge bases for related problems in other domains and also provides a useful starting point for studying the nature of strategies. In HERACLES, inference procedures are represented as abstract metarules, expressed in a form of the predicate calculus, organized and controlled as rule sets. A compiler converts the rules into Lisp code and allows domain relations to be encoded as arbitrary data structures for efficiency. Examples are given of the explanation and teaching capabilities afforded by this representation. Different perspectives for understanding HERACLES' inference procedure and how it defines knowledge bases are discussed in some detail. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1168/CS-TR-87-1168.pdf %R CS-TR-87-1170 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Viewing Knowledge Bases as Qualitative Models %A Clancey, William J. %D May 1986 %X The concept of a qualitative model provides a unifying perspective for understanding how expert systems differ from conventional programs. Knowledge bases contain qualitative models of systems in the world, that is, primarily non-numeric descriptions that provide a basis for explaining and predicting behavior and formulating action plans.
The prevalent view that a qualitative model must be a simulation, to the exclusion of prototypic and behavioral descriptions, has fragmented our field, so that we have failed to usefully synthesize what we have learned about modeling processes. For example, our ideas about "scoring functions" and "causal network traversal," developed apart from a modeling perspective, have obscured the inherent explanatory nature of diagnosis. While knowledge engineering has greatly benefited from the study of human experts as a means of informing model construction, overemphasis on modeling the expert's knowledge has detracted from the primary objective of modeling a system in the world. Placing AI squarely in the evolutionary line of teleologic and topologic modeling, this talk argues that the study of network representations has established a foundation for a science and engineering of qualitative models. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1170/CS-TR-87-1170.pdf %R CS-TR-87-1173 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Review of Winograd and Flores' Understanding Computers and Cognition %A Clancey, William J. %D July 1986 %X AI researchers and cognitive scientists commonly believe that thinking involves manipulating representations. Thinking involves search, inference, and making choices. This is how we model reasoning, and what goes on in the brain is similar. Winograd and Flores present a radically different view. They claim that our knowledge is not represented in the brain at all, but rather consists of an unformalized shared background, from which we articulate representations in order to cope with new situations. In contrast, computer programs contain only pre-selected objects and properties, and there is no basis for moving beyond this initial formalization when breakdown occurs. Winograd and Flores provide convincing arguments with examples familiar to most AI researchers. However, they significantly understate the role of representation in mediating intelligent behavior, specifically in the process of reflection, when representations are generated prior to physical action. Furthermore, they do not consider the practical benefits of expert systems and the extent of what can be accomplished. Nevertheless, the book is crisp and stimulating. It should make AI researchers more cautious about what they are doing, more aware of the nature of formalization, and more open to alternative views. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1173/CS-TR-87-1173.pdf %R CS-TR-87-1174 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Intelligent Tutoring Systems: A Tutorial Survey %A Clancey, William J. %D September 1986 %X This survey of Intelligent Tutoring Systems is based on a tutorial originally presented by John Seely Brown, Richard R. Burton (Xerox - PARC, USA) and William J. Clancey at the National Conference on AI (AAAI) in Austin, TX in August, 1984. The survey describes the components of tutoring systems, different teaching scenarios, and their relation to a theory of instruction.
The underlying pedagogical approach is to make latent knowledge manifest, which the research accomplishes by different forms of qualitative modeling: simulating physical processes; simulating expert problem-solving, including strategies for monitoring and controlling problem solving (metacognition); modeling the plans behind procedural behavior; and forcing articulation of model inconsistencies through the Socratic method of instruction. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1174/CS-TR-87-1174.pdf %R CS-TR-87-1177 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Log Files: An Extended File Service Exploiting Write-Once Storage %A Finlayson, R. S. %A Cheriton, D. R. %D August 1987 %X A log service provides efficient storage and retrieval of data that is written sequentially (append-only) and not subsequently modified. Application programs and subsystems use log services for recovery, to record security audit trails, and for performance monitoring. Ideally, a log service should accommodate very large, long-lived logs, and provide efficient retrieval and low space overhead. In this paper, we describe the design and implementation of the Clio log service. Clio provides the abstraction of log files: readable, append-only files that are accessed in the same way as conventional files. The underlying storage medium is required only to be append-only; more general types of write access are not necessary. We show how log files can be implemented efficiently and robustly on top of such storage media -- in particular, write-once optical disk. In addition, we describe a general application software storage architecture that makes use of log files. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1177/CS-TR-87-1177.pdf %R CS-TR-87-1175 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Using and Evaluating Differential Modeling in Intelligent Tutoring and Apprentice Learning Systems %A Wilkins, D. C. %D January 1987 %X A powerful approach to debugging and refining the knowledge structures of a problem solving agent is to differentially model the actions of the agent against a gold standard. This paper proposes a framework for exploring the inherent limitations of such an approach when a problem solver is differentially modeled against an expert system. A procedure is described for determining a performance upper bound for debugging via differential modeling, called the synthetic agent method. The synthetic agent method systematically explores the space of near miss training instances and expresses the limits of debugging in terms of the knowledge representation and control language constructs of the expert system. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1175/CS-TR-87-1175.pdf %R CS-TR-87-1178 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Dynamic, Cut-Through Communications Protocol with Multicast %A Byrd, G. T. %A Nakano, R. %A Delagi, B. A. %D September 1987 %X This paper describes a protocol to support point-to-point interprocessor communications with multicast. Dynamic, cut-through routing with local flow control is used to provide a high-throughput, low-latency communications path between processors. In addition, multicast transmissions are available, in which copies of a packet are sent to multiple destinations using common resources as much as possible.
Special packet terminators and selective buffering are introduced to avoid deadlock during multicasts. A simulated implementation of the protocol is also described. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1178/CS-TR-87-1178.pdf %R CS-TR-87-1180 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Bibliography; Department of Computer Science Technical Reports, 1963-1988 %A Na, Taleen M. %D January 1988 %X This report lists, in chronological order, all reports published by the Stanford Computer Science Department (CSD) since 1963. Each report is identified by CSD number, author's name, title, number of pages, and date. If a given report is available from the department at the time of the Bibliography's printing, its price is listed. For convenience, an author index, ordering information, codes, and alternative sources are also included. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1180/CS-TR-87-1180.pdf %R CS-TR-87-1181 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On Debugging Rule Sets When Reasoning Under Uncertainty %A Wilkins, D. C. %A Buchanan, B. G. %D May 1987 %X Heuristic inference rules with a measure of strength less than certainty have an unusual property: better individual rules do not necessarily lead to a better overall rule set. All less-than-certain rules contribute evidence towards erroneous conclusions for some problem instances, and the distribution of these erroneous conclusions over the instances is not necessarily related to individual rule quality. This has important consequences for automatic machine learning of rules, since rule selection is usually based on measures of quality of individual rules. In this paper, we explain why the most obvious and intuitively reasonable solution to this problem, incremental modification and deletion of rules responsible for wrong conclusions a la Teiresias, is not always appropriate. In our experience, it usually fails to converge to an optimal set of rules. Given a set of heuristic rules, we explain why the best rule set should be considered to be the element of the power set of rules that yields a global minimum error with respect to generating erroneous positive and negative conclusions. This selection process is modeled as a bipartite graph minimization problem and shown to be NP-complete. A solution method is described, the Antidote Algorithm, that performs a model-directed search of the rule space. On an example from medical diagnosis, the Antidote Algorithm significantly reduced the number of misdiagnoses when applied to a rule set generated from 104 training instances. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1181/CS-TR-87-1181.pdf %R CS-TR-87-1182 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Knowledge Base Refinement by Monitoring Abstract Control Knowledge %A Wilkins, D. C. %A Buchanan, B. G. %D August 1987 %X An explicit representation of the problem solving method of an expert system shell as abstract control knowledge provides a powerful foundation for learning. This paper describes the abstract control knowledge of the Heracles expert system shell for heuristic classification problems, and describes how the Odysseus apprenticeship learning program uses this representation to automate "end-game" knowledge acquisition. Particular emphasis is given to showing how abstract control knowledge facilitates the use of underlying domain theories by a learning program.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1182/CS-TR-87-1182.pdf %R CS-TR-87-1183 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Knowledge Engineer as Student: Metacognitive bases for asking good questions %A Clancey, W. J. %D January 1987 %X Knowledge engineers are efficient, active learners. They systematically approach domains and acquire knowledge to solve routine, practical problems. By modeling their methods, we may develop a basis for teaching other students how to direct their own learning. In particular, a knowledge engineer is good at detecting gaps in a knowledge base and asking focused questions to improve an expert system's performance. This ability stems from domain-general knowledge about: problem-solving procedures, the categorization of routine problem-solving knowledge, and domain and task differences. This paper studies these different forms of metaknowledge, and illustrates their incorporation in an intelligent tutoring system. A model of learning is presented that describes how the knowledge engineer detects problem-solving failures and tracks them back to gaps in domain knowledge, which are then reformulated as questions to ask a teacher. We describe how this model of active learning is being developed and tested in a knowledge acquisition program for an expert system. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1183/CS-TR-87-1183.pdf %R CS-TR-87-1184 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Firmware Approach to Fast Lisp Interpreter %A Okuno, H. %A Osato, N. %A Takeuchi, I. %D September 1987 %X The approach of speeding up a Lisp interpreter by implementing it in firmware seems promising. A microcoded Lisp interpreter shows good performance for very simple benchmarks, while it often fails to provide good performance for larger benchmarks and applications unless speedup techniques are devised for it. This was the case for the TAO/ELIS system. This paper describes various techniques devised for the TAO/ELIS system in order to speed up the interpreter of the TAO language implemented on the ELIS Lisp machine. The techniques include data type dispatch, variable access, function call and so on. TAO is not only upward compatible with Common Lisp, but also incorporates logic programming, object-oriented programming and Fortran/C-like programming into Lisp programming. TAO also provides concurrent programming and supports multiple users (up to eight users). The TAO interpreter for these programming paradigms is coded fully in microcode. In spite of its rich functionality, the speed of interpreted code in TAO is comparable to that of compiled code on commercial Lisp machines. Furthermore, the speeds of interpreted code for the same program written in the various programming paradigms of TAO do not differ much. This speed balance is very important for the user. Another outstanding feature of the TAO/ELIS system is its firmware development environment. The Micro Assembler and Linker are written in TAO, which enables the user to use the capability of TAO in microcode. Since the debugging tools are also written in mini-Lisp, many new tools were developed in parallel with the debugging of the microcode. This high-level approach to firmware development environments is very important for high productivity of development.
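The data-type dispatch technique mentioned in the abstract above is easy to illustrate outside microcode: the interpreter's inner loop replaces a cascade of type tests with a single table lookup keyed on the object's type. The toy Python sketch below is only a schematic analogue of what the report describes for TAO/ELIS firmware; all names in it are hypothetical.

    # Toy analogue of data-type dispatch in an interpreter inner loop.
    # HANDLERS maps a datum's type to its evaluation routine, so lisp_eval
    # performs one table lookup instead of a chain of type tests.

    def eval_number(expr, env):
        return expr                      # numbers are self-evaluating

    def eval_symbol(expr, env):
        return env[expr]                 # symbols name values in the environment

    def eval_list(expr, env):
        fn, *args = expr                 # (operator arg1 arg2 ...)
        return lisp_eval(fn, env)(*[lisp_eval(a, env) for a in args])

    HANDLERS = {int: eval_number, float: eval_number,
                str: eval_symbol, list: eval_list}

    def lisp_eval(expr, env):
        return HANDLERS[type(expr)](expr, env)   # the single dispatch point

    env = {"+": lambda a, b: a + b}
    print(lisp_eval(["+", 1, ["+", 2, 3]], env))  # prints 6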
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1184/CS-TR-87-1184.pdf %R CS-TR-87-1185 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Blazenet: A Photonic Implementable Wide-Area Network %A Haas, Z. %A Cheriton, D. R. %D December 1987 %X High-performance wide-area networks are required to interconnect clusters of computers connected by local area and metropolitan area networks. Optical fiber technology provides long distance channels in the multi-gigabit per second range. The challenge is to provide switching nodes that handle these data rates with minimum delay, and at a reasonable cost. In this paper, we describe a packet switching network, christened Blazenet, that provides low delay and has minimal memory requirements. It can be extended to support multicast and priority delivery. Such a network can revolutionize the opportunities for distributed command and control, information and resource sharing, real-time conferencing, and wide-area parallel computation, to mention but a few applications. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1185/CS-TR-87-1185.pdf %R CS-TR-87-1186 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Hierarchy of Temporal Properties %A Manna, Zohar %A Pnueli, Amir %D October 1987 %X We propose a classification of temporal properties into a hierarchy which refines the known safety-liveness classification of properties. The new classification recognizes the classes of safety, guarantee, persistence, fairness, and hyper-fairness. The classification suggested here is based on the different ways a property of finite computations can be extended into a property of infinite computations. For properties that are expressible by temporal logic and predicate automata, we provide a syntactic characterization of the formulae and automata that specify properties in the different classes. We consider the verification of properties over a given program, and provide a unique proof principle for each class. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1186/CS-TR-87-1186.pdf %R CS-TR-87-1188 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Experiments with a Knowledge-Based System on a Multiprocessor %A Nakano, Russell %A Minami, Masafumi %D October 1987 %X This paper documents the results we obtained and the lessons we learned in the design, implementation, and execution of a simulated real-time application on a simulated parallel processor. Specifically, our parallel program ran 100 times faster on a 100-processor multiprocessor. The machine architecture is a distributed-memory multiprocessor. The target machine consists of 10 to 1000 processors, but because of simulator limitations, we ran simulations of machines consisting of 1 to 100 processors. Each processor is a computer with its own local memory, executing an independent instruction stream. There is no global shared memory; all processes communicate by message passing. The target programming environment, called Lamina, encourages a programming style that stresses performance gains through problem decomposition, allowing many processors to be brought to bear on a problem. The key is to distribute the processing load over replicated objects, and to increase throughput by building pipelined sequences of objects that handle stages of problem solving. We focused on a knowledge-based application that simulates real-time understanding of radar tracks, called Airtrac.
This paper describes a portion of the Airtrac application implemented in Lamina and a set of experiments that we performed. We confirmed the following hypotheses: 1) Performance of our concurrent program improves with additional processors, and thereby attains a significant level of speedup. 2) Correctness of our concurrent program can be maintained despite a high degree of problem decomposition and highly overloaded input data conditions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1188/CS-TR-87-1188.pdf %R CS-TR-87-1189 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Instrumented Architectural Simulation %A Delagi, Bruce A. %A Saraiya, Nakul %A Nishimura, Sayuri %A Byrd, Greg %D November 1987 %X Simulation of systems at an architectural level can offer an effective way to study critical design choices if (1) the performance of the simulator is adequate to examine designs executing significant code bodies -- not just toy problems or small application fragments, (2) the details of the simulation include the critical details of the design, (3) the view of the design presented by the simulator instrumentation leads to useful insights on the problems with the design, and (4) there is enough flexibility in the simulation system so that the asking of unplanned questions is not suppressed by the weight of the mechanics involved in making changes either in the design or its measurement. A simulation system with these goals is described together with the approach to its implementation. Its application to the study of a particular class of multiprocessor hardware system architectures is illustrated. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/87/1189/CS-TR-87-1189.pdf %R CS-TR-88-1195 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Lower Bound for Radio Broadcast %A Bar-Noy, A. %A Linial, N. %A Peleg, D. %D February 1988 %X A radio network is a synchronous network of processors that communicate by transmitting messages to their neighbors, where a processor receives a message in a given step if and only if it is silent in this step and precisely one of its neighbors transmits. In this paper we prove the existence of a family of radius-2 networks on n vertices for which any broadcast schedule requires at least Omega((log n / log log n)^2) rounds of transmissions. This almost matches an upper bound of O(log^2 n) rounds for networks of radius 2 proved earlier by Bar-Yehuda, Goldreich, and Itai. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1195/CS-TR-88-1195.pdf %R CS-TR-88-1196 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Motion Planning with Uncertainty: The Preimage Backchaining Approach %A Latombe, Jean-Claude %D March 1988 %X This paper addresses the problem of planning robot motions in the presence of uncertainty. It explores an approach to this problem, known as the preimage backchaining approach. Basically, a preimage is a region in space, such that if the robot executes a certain motion command from within this region, it is guaranteed to attain a target and to terminate in it. Preimage backchaining consists of reasoning backward from a given goal region, by computing preimages of the goal, and then recursively preimages of the preimages, until some preimages include the initial region where it is known at planning time that the robot will be before executing the motion plan.
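The backchaining loop described in the abstract above is simple once a preimage operator is available; the difficulty lives in computing preimages under control and sensing uncertainty. The following toy Python sketch is a hypothetical one-dimensional illustration, not Latombe's general formulation: a region is an interval, each command moves the robot by a chosen distance with error bounded by EPS, and the preimage of an interval therefore shifts it back and shrinks it by EPS on each side.

    # Toy one-dimensional preimage backchaining. A region is an interval
    # (lo, hi); the command "move by d" has control error up to +/- EPS, so
    # the preimage of a target interval is the set of start points from which
    # the command is guaranteed to terminate inside the target.

    EPS = 0.1

    def preimage(d, region):
        lo, hi = region
        return (lo - d + EPS, hi - d - EPS)

    def contains(region, sub):
        return region[0] <= sub[0] and sub[1] <= region[1]

    def backchain(goal, start, commands, max_depth=5):
        """Backward breadth-first search from the goal region; returns a
        guaranteed command sequence, or None if none is found in depth."""
        frontier = [(goal, [])]
        for _ in range(max_depth):
            successors = []
            for region, plan in frontier:
                for d in commands:
                    pre = preimage(d, region)
                    if pre[0] >= pre[1]:        # empty preimage: dead end
                        continue
                    if contains(pre, start):    # sequence is guaranteed to work
                        return [d] + plan
                    successors.append((pre, [d] + plan))
            frontier = successors
        return None

    # Two moves of 1.1 reach the goal interval from anywhere in the start region.
    print(backchain(goal=(2.0, 2.6), start=(0.0, 0.1), commands=[1.1, 0.5]))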
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1196/CS-TR-88-1196.pdf %R CS-TR-88-1197 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The VMP Multiprocessor: Initial Experience, Refinements and Performance Evaluation %A Cheriton, D. R. %A Gupta, A. %A Boyle, P. D. %A Goosen, H. A. %D March 1988 %X VMP is an experimental multiprocessor being developed at Stanford University, suitable for high-performance workstations and server machines. Its primary novelty lies in the use of software management of the per-processor caches and the design decisions in the cache and bus that make this approach feasible. The design and some uniprocessor trace-driven simulations indicating its performance have been reported previously. In this paper, we present our initial experience with the VMP design based on a running prototype as well as various refinements to the design. Performance evaluation is based both on measurement of actual execution as well as trace-driven simulation of multiprocessor executions from the Mach operating system. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1197/CS-TR-88-1197.pdf %R CS-TR-88-1199 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Projections of Vector Addition System Reachability Sets are Semilinear %A Buning, H. K. %A Lettman, T. %A Mayr, E. W. %D March 1988 %X The reachability sets of Vector Addition Systems of dimension six or more can be non-semilinear. This may be one reason why the inclusion problem (as well as the equality problem) for reachability sets of vector addition systems in general is undecidable, even though the reachability problem itself is known to be decidable. We show that any one-dimensional projection of the reachability set of an arbitrary vector addition system is semilinear, and hence, "simple". %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1199/CS-TR-88-1199.pdf %R CS-TR-88-1200 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Parallel Approximation Algorithms for Bin Packing %A Anderson, R. J. %A Mayr, E. W. %A Warmuth, M. K. %D March 1988 %X We study the parallel complexity of polynomial heuristics for the bin packing problem. We show that some well-known (and simple) methods like first-fit-decreasing are P-complete, and it is hence very unlikely that they can be efficiently parallelized. On the other hand, we exhibit an optimal NC algorithm that achieves the same performance bound as does FFD. Finally, we discuss parallelization of polynomial approximation algorithms for bin packing based on discretization. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1200/CS-TR-88-1200.pdf %R CS-TR-88-1203 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the Semantics of Temporal Logic Programming %A Baudinet, Marianne %D June 1988 %X Recently, several researchers have suggested directly exploiting in a programming language temporal logic's ability to describe changing worlds. The resulting languages are quite diverse. They are based on different subsets of temporal logic and use a variety of execution mechanisms. So far, little attention has been paid to the formal semantics of these languages. In this paper, we study the semantics of an instance of temporal logic programming, namely, the TEMPLOG language defined by Abadi and Manna. We first give declarative semantics for TEMPLOG, in model-theoretic and in fixpoint terms.
Then, we study its operational semantics and prove soundness and completeness theorems for the temporal-resolution proof method underlying its execution mechanism. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1203/CS-TR-88-1203.pdf %R CS-TR-88-1206 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Parallel Lisp Simulator %A Weening, Joseph S. %D May 1988 %X CSIM is a simulator for parallel Lisp, based on a continuation passing interpreter. It models a shared-memory multiprocessor executing programs written in Common Lisp, extended with several primitives for creating and controlling processes. This paper describes the structure of the simulator, measures its performance, and gives an example of its use with a parallel Lisp program. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1206/CS-TR-88-1206.pdf %R CS-TR-88-1208 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Toetjes %A Feder, Tomas %D June 1988 %X A number is secretly chosen from the interval [0, 1], and n players try to guess this number. When the secret number is revealed, the player with the closest guess wins. We describe an optimal strategy for a version of this game. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1208/CS-TR-88-1208.pdf %R CS-TR-88-1209 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Combinatorial Algorithms for the Generalized Circulation Problem %A Goldberg, A. V. %A Plotkin, S. A. %A Tardos, E. %D June 1988 %X We consider a generalization of the maximum flow problem in which the amounts of flow entering and leaving an arc are linearly related. More precisely, if x(e) units of flow enter an arc e, x(e) * gamma(e) units arrive at the other end. For instance, nodes of the graph can correspond to different currencies, with the multipliers being the exchange rates. We require conservation of flow at every node except a given source node. The goal is to maximize the amount of flow excess at the source. This problem is a special case of linear programming, and therefore can be solved in polynomial time. In this paper we present the first polynomial time combinatorial algorithms for this problem. The algorithms are simple and intuitive. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1209/CS-TR-88-1209.pdf %R CS-TR-88-1211 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Sublinear-Time Parallel Algorithms %A Goldberg, A. V. %A Plotkin, S. A. %A Vaidya, P. M. %D June 1988 %X This paper presents the first sublinear-time deterministic parallel algorithms for bipartite matching and several related problems, including maximal node-disjoint paths, depth-first search, and flows in zero-one networks. Our results are based on a better understanding of the combinatorial structure of the above problems, which leads to new algorithmic techniques. In particular, we show how to use maximal matching to extend, in parallel, a current set of node-disjoint paths and how to take advantage of the parallelism that arises when a large number of nodes are "active" during an execution of a push/relabel network flow algorithm. We also show how to apply our techniques to design parallel algorithms for the weighted versions of the above problems.
In particular, we present sublinear-time deterministic parallel algorithms for finding a minimum-weight bipartite matching and for finding a minimum-cost flow in a network with zero-one capacities, if the weights are polynomially bounded integers. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1211/CS-TR-88-1211.pdf %R CS-TR-88-1210 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T String-Functional Semantics for Formal Verification of Synchronous Circuits %A Bronstein, Alexandre %A Talcott, Carolyn L. %D June 1988 %X A new functional semantics is proposed for synchronous circuits, as a basis for reasoning formally about that class of hardware systems. Technically, we define an extensional semantics with monotonic length-preserving functions on finite strings, and an intensional semantics based on functionals on those functions. As support for the semantics we prove the equivalence of the extensional semantics with a simple operational semantics, as well as a characterization of circuits which obey the "every loop is clocked" design rule. Also, we develop the foundations in complete detail both to increase confidence in the theory, and as a prerequisite to its future mechanization. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1210/CS-TR-88-1210.pdf %R CS-TR-88-1214 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Multicast Routing in Internetworks and Extended LANs %A Deering, Stephen E. %D June 1988 %X Multicasting is used within local-area networks to make distributed applications more robust and more efficient. The growing need to distribute applications across multiple, interconnected networks, and the increasing availability of high-performance, high-capacity switching nodes and networks, lead us to consider providing LAN-style multicasting across an internetwork. In this paper, we propose extensions to two common internetwork routing algorithms -- distance-vector routing and link-state routing -- to support low-delay datagram multicasting. We also suggest modifications to the single-spanning-tree routing algorithm, commonly used by link-layer bridges, to reduce the costs of multicasting in large extended LANs. Finally, we show how different link-layer and network-layer multicast routing algorithms can be combined hierarchically to support multicasting across large, heterogeneous internetworks. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1214/CS-TR-88-1214.pdf %R CS-TR-88-1218 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Square Meshes are not always Optimal %A Bar-Noy, Amotz %A Peleg, David %D August 1988 %X In this paper we consider mesh connected computers with multiple buses, providing broadcast facilities along rows and columns. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1218/CS-TR-88-1218.pdf %R CS-TR-88-1225 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Parallel Approximation Algorithms %A Mayr, Ernst W. %D September 1988 %X Many problems of great practical importance are hard to solve computationally, at least if exact solutions are required. We survey a number of (NP- or P-complete) problems for which fast parallel approximation algorithms are known: the 0-1 knapsack problem, bin packing, the minimal makespan problem, the list scheduling problem, greedy scheduling, and the high density subgraph problem.
Algorithms for these problems are presented highlighting the underlying techniques and principles, and several types of parallel approximation schemes are exhibited. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1225/CS-TR-88-1225.pdf %R CS-TR-88-1226 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Making Intelligent Systems Adaptive %A Hayes-Roth, Barbara %D October 1988 %X Contemporary intelligent systems are isolated problem-solvers. They accept particular classes of problems, reason about them, perhaps request additional information, and eventually produce solutions. By contrast, human beings and other intelligent animals continuously adapt to the demands and opportunities presented by a dynamic environment. Adaptation plays a critical role in everyday behaviors, such as conducting a conversation, as well as in sophisticated professional behaviors, such as monitoring critically ill medical patients. To make intelligent systems similarly adaptive, we must augment their reasoning capabilities with capabilities for perception and action. Equally important, we must endow them with an attentional mechanism to allocate their limited computational resources among competing perceptions, actions, and cognitions, in real time. In this paper, we discuss functional objectives for "adaptive intelligent systems," an architecture designed to achieve those objectives, and our continuing study of both objectives and architecture in the context of particular tasks. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1226/CS-TR-88-1226.pdf %R CS-TR-88-1227 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Finding Minimum-Cost Flows by Double-Scaling %A Ahuja, R. K. %A Goldberg, A. V. %A Orlin, J. B. %A Tarjan, R. E. %D October 1988 %X Several researchers have recently developed new techniques that give fast algorithms for the minimum-cost flow problem. In this paper we combine several of these techniques to yield an algorithm running in O(nm log log U log(nC)) time on networks with n vertices, m edges, maximum arc capacity U, and maximum arc cost magnitude C. The major techniques used are the capacity-scaling approach of Edmonds and Karp, the excess-scaling approach of Ahuja and Orlin, the cost-scaling approach of Goldberg and Tarjan, and the dynamic tree data structure of Sleator and Tarjan. For nonsparse graphs with large maximum arc capacity, we obtain a similar but slightly better bound. We also obtain a slightly better bound for the (noncapacitated) transportation problem. In addition, we discuss a capacity-bounding approach to the minimum-cost flow problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1227/CS-TR-88-1227.pdf %R CS-TR-88-1228 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Parallel Algorithm for Finding a Blocking Flow in an Acyclic Network %A Goldberg, A. V. %A Tarjan, R. E. %D November 1988 %X We propose a simple parallel algorithm for finding a blocking flow in an acyclic network. On an n-vertex, m-arc network, our algorithm runs in O(n log n) time and O(nm) space using an m-processor EREW PRAM. A consequence of our algorithm is an O(n^2 (log n) log(nC))-time, O(nm)-space, m-processor algorithm for the minimum-cost circulation problem, on a network with integer arc capacities of magnitude at most C.
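For contrast with the PRAM algorithm in the abstract above, the same subproblem has a standard sequential solution: a blocking flow leaves every source-sink path with at least one saturated arc. The Python sketch below (a toy assumed graph format, not the paper's parallel algorithm) finds one by repeated depth-first search, pruning arcs that are saturated or can no longer reach the sink.

    # Sequential sketch of a blocking flow in an acyclic network. The graph
    # format is an assumption: cap[u][v] > 0 gives the capacity of arc
    # u -> v; cap is consumed (arcs are deleted) as the search proceeds.

    def blocking_flow(cap, s, t):
        flow = {u: dict.fromkeys(arcs, 0) for u, arcs in cap.items()}

        def dfs(u, pushed):
            if u == t:
                return pushed
            for v in list(cap.get(u, {})):
                residual = cap[u][v] - flow[u][v]
                if residual > 0:
                    got = dfs(v, min(pushed, residual))
                    if got > 0:
                        flow[u][v] += got
                        return got
                # Arc is saturated or cannot reach t: prune it so no later
                # search retries it (this keeps the total work bounded).
                del cap[u][v]
            return 0

        while dfs(s, float('inf')) > 0:
            pass
        return flow

    cap = {'s': {'a': 2, 'b': 1}, 'a': {'t': 1}, 'b': {'t': 2}}
    print(blocking_flow(cap, 's', 't'))   # saturates arcs a->t and s->b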
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1228/CS-TR-88-1228.pdf %R CS-TR-88-1229 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Distributing Intelligence within an Individual %A Hayes-Roth, B. %A Hewett, M. %A Washington, R. %A Hewett, R. %A Seiver, A. %D November 1988 %X Distributed artificial intelligence (DAI) refers to systems in which decentralized, cooperative agents work synergistically to perform a task. Alternative specifications of DAI resemble particular biological or social systems, such as teams, contract nets, or societies. Our DAI model resembles a single individual comprising multiple loosely coupled agents for perception, action, and cognition functions. We demonstrate the DAI individual in the Guardian system for intensive-care monitoring and argue that it is more appropriate than the prevalent team model for a large class of similar applications. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1229/CS-TR-88-1229.pdf %R CS-TR-88-1230 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Specification and Verification of Concurrent Programs by For-All Automata %A Manna, Zohar %A Pnueli, Amir %D November 1988 %X For-all automata are non-deterministic finite-state automata over infinite sequences. They differ from conventional automata in that a sequence is accepted if all runs of the automaton over the sequence are accepting. These automata are suggested as a formalism for the specification and verification of temporal properties of concurrent programs. It is shown that they are as expressive as extended temporal logic (ETL), and, in some cases, provide a more compact representation of properties than temporal logic. A structured diagram notation is suggested for the graphical representation of these automata. A single sound and complete proof rule is presented for proving that all computations of a program have the property specified by a for-all automaton. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1230/CS-TR-88-1230.pdf %R CS-TR-88-1233 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Procedural Semantics for Well Founded Negation in Logic Programs %A Ross, Kenneth A. %D November 1988 %X We introduce global SLS-resolution, a procedural semantics for well-founded negation as defined by Van Gelder, Ross and Schlipf. Global SLS-resolution extends Przymusinski's SLS-resolution, and may be applied to all programs, whether locally stratified or not. Global SLS-resolution is defined in terms of global trees, a new data structure representing the dependence of goals on derived negative subgoals. We prove that global SLS-resolution is sound with respect to the well-founded semantics, and complete for non-floundering queries. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1233/CS-TR-88-1233.pdf %R CS-TR-88-1234 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Average Number of Stable Matchings %A Pittel, Boris %D December 1988 %X The probable behavior of an instance of size n of the stable marriage problem, chosen uniformly at random, is studied.
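As background for the stable marriage abstract above: the classical Gale-Shapley proposal algorithm produces one stable matching for any instance, while the report asks how many stable matchings a random instance has on average. The following minimal Python sketch is the standard textbook algorithm, not taken from the report.

    # Textbook Gale-Shapley proposal algorithm: produces the man-optimal
    # stable matching. Preference lists are assumed complete (permutations
    # of the opposite side), which guarantees termination.

    def gale_shapley(men_prefs, women_prefs):
        rank = {w: {m: r for r, m in enumerate(prefs)}
                for w, prefs in women_prefs.items()}
        next_choice = dict.fromkeys(men_prefs, 0)   # next index to propose to
        engaged = {}                                # woman -> man
        free = list(men_prefs)
        while free:
            m = free.pop()
            w = men_prefs[m][next_choice[m]]
            next_choice[m] += 1
            if w not in engaged:
                engaged[w] = m                      # w accepts her first proposal
            elif rank[w][m] < rank[w][engaged[w]]:
                free.append(engaged[w])             # w trades up; old partner freed
                engaged[w] = m
            else:
                free.append(m)                      # w rejects m; he proposes again
        return engaged

    men = {"a": ["x", "y"], "b": ["y", "x"]}
    women = {"x": ["a", "b"], "y": ["b", "a"]}
    print(gale_shapley(men, women))   # {'y': 'b', 'x': 'a'}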
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1234/CS-TR-88-1234.pdf %R CS-TR-88-1236 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Time for Action: On the Relation between Time, Knowledge, and Action %A Shoham, Yoav %D December 1988 %X We consider the role played by the concept of action in AI. We first briefly summarize the advantages and limitations of past approaches to taking the concept as primitive, as embodied in the situation calculus and dynamic logic. We also briefly summarize the alternative, namely adopting a temporal framework, and point out its complementary advantages and limitations. We then propose a framework that retains the advantages of both viewpoints, and that ties the notion of action closely to that of knowledge. Specifically, we propose starting with the notion of time lines, and defining the notion of action as the ability to make certain choices among sets of time lines. Our definitions shed new light on the connection between time, action, knowledge and ignorance, choice-making, feasibility, and simultaneous reasoning about the same events at different levels of detail. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1236/CS-TR-88-1236.pdf %R CS-TR-88-1237 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Belief as Defeasible Knowledge %A Shoham, Yoav %A Moses, Yoram %D December 1988 %X We investigate the relation between the notions of knowledge and belief. Contrary to the well-known slogan about knowledge being "justified, true belief," we propose that belief be viewed as defeasible knowledge. Specifically, we offer a definition of belief as knowledge-relative-to-assumptions, and tie the definition to the notion of nonmonotonicity. Our definition has several advantages. First, it is short. Second, we do not need to add anything to the logic of knowledge: the right properties of belief fall out of the definition and the properties of knowledge. Third, the connection between knowledge and belief is derived from one fundamental principle, which is more enlightening than a collection of arbitrary-seeming axioms relating the two notions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1237/CS-TR-88-1237.pdf %R CS-TR-88-1239 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Sorting, Minimal Feedback Sets and Hamilton Paths in Tournaments %A Bar-Noy, Amotz %A Naor, Joseph %D December 1988 %X We present a general method for translating sorting by comparisons algorithms to algorithms that compute a Hamilton path in a tournament. The translation is based on the relation between minimal feedback sets and Hamilton paths in tournaments. We prove that there is a one-to-one correspondence between the set of minimal feedback sets and the set of Hamilton paths. In the comparison model, all the tradeoffs for sorting between the number of processors and the number of rounds hold when a Hamilton path is computed. For the CRCW model, with O(n) processors, we show the following: (i) two paths in a tournament can be merged in O(log log n) time (Valiant's algorithm); (ii) a Hamilton path can be computed in O(log n) time (Cole's algorithm). This improves a previous algorithm for computing a Hamilton path.
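The sorting-to-Hamilton-path translation in the abstract above can be made concrete with the classic binary-insertion construction: because a tournament orients every pair of vertices, a new vertex can always be inserted into an existing path by binary search, exactly as in insertion sorting. The Python sketch below illustrates the general method (it is not the authors' CRCW construction); beats(u, v), the arc-direction predicate, is an assumed O(1) oracle.

    # Binary insertion sort adapted to a tournament. beats(u, v) is True iff
    # the arc between u and v points from u to v; since every pair is
    # oriented, an insertion point for each new vertex always exists.

    def hamilton_path(vertices, beats):
        path = []
        for v in vertices:
            if not path:
                path.append(v)
            elif beats(v, path[0]):
                path.insert(0, v)                  # v -> head: prepend
            elif beats(path[-1], v):
                path.append(v)                     # tail -> v: append
            else:
                # Invariant: path[lo] -> v and v -> path[hi].
                lo, hi = 0, len(path) - 1
                while hi - lo > 1:
                    mid = (lo + hi) // 2
                    if beats(path[mid], v):
                        lo = mid
                    else:
                        hi = mid
                path.insert(hi, v)                 # path[lo] -> v -> path[hi]
        return path

    # A transitive tournament (arc from smaller to larger) yields sorted order.
    print(hamilton_path([3, 1, 4, 2], lambda u, v: u < v))   # [1, 2, 3, 4]

Each insertion costs O(log k) comparisons, mirroring how a sorting-by-comparisons algorithm carries over to Hamilton path construction.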
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1239/CS-TR-88-1239.pdf %R CS-TR-88-1240 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On Separating the EREW and CREW PRAM Models %A Gafni, E. %A Naor, J. %A Ragde, P. %D December 1988 %X In [6], Snir proposed the Selection Problem (searching in a sorted table) to show that the CREW PRAM is strictly more powerful than the EREW PRAM. This problem defines a partial function, that is, one that is defined only on a restricted set of inputs. Recognizing whether an arbitrary input belongs to this restricted set is hard for both CREW and EREW PRAMs. The existence of a total function that exhibits the power of the CREW model over the EREW model was an open problem. Here we solve this problem by generalizing the Selection Problem to a Decision Tree problem which is defined on a full domain and to which Snir's lower bound applies. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/88/1240/CS-TR-88-1240.pdf %R CS-TR-85-1035 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T RESIDUE: a deductive approach to design synthesis %A Finger, J. J. %A Genesereth, Michael R. %D January 1985 %X We present a new approach to deductive design synthesis, the Residue Approach, in which designs are represented as sets of constraints. Previous approaches, such as PROLOG [18] or the work of Manna and Waldinger [11], express designs as bindings on single terms. We give a complete and sound procedure for finding sets of propositions constituting a legal design. The size of the search space of the procedure and the advantages and disadvantages of the Residue Approach are analysed. In particular we show how Residue can avoid backtracking caused by making design decisions of overly coarse granularity. In contrast, it is awkward for the single term approaches to do the same. In addition we give a rule for constraint propagation in deductive synthesis, and show its use in pruning the design space. Finally, Residue is related to other work, in particular, to Default Logic [16] and to Assumption-Based Truth Maintenance [1]. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1035/CS-TR-85-1035.pdf %R CS-TR-85-1036 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Learning control heuristics in BB1 %A Hayes-Roth, Barbara %A Hewett, Michael %D January 1985 %X BB1, a blackboard system building architecture, ameliorates the knowledge acquisition bottleneck with generic knowledge sources that learn control heuristics. Some learning knowledge sources replace the knowledge engineer, interacting directly with domain experts. Others operate autonomously. The paper presents a trace from the illustrative knowledge source, Understand-Preference, running in PROTEAN, a blackboard system for elucidating protein structure. Understand-Preference is triggered when a domain expert overrides one of BB1's scheduling recommendations. It identifies and encodes the heuristic underlying the expert's scheduling decision. The trace illustrates how learning knowledge sources exploit BB1's rich representation of domain and control knowledge, actions, and results. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1036/CS-TR-85-1036.pdf %R CS-TR-85-1037 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Expressiveness and language choice %A MacKinlay, Jock %A Genesereth, Michael R.
%D January 1985 %X Specialized languages are often more appropriate than general languages for expressing certain information. However, specialized languages must be chosen carefully because they do not allow all sets of facts to be stated. This paper considers the problems associated with choosing among specialized languages. Methods are presented for determining that a set of facts is expressible in a language, for identifying when additional facts are stated accidentally, and for choosing among languages that can express a set of facts. This research is being used to build a system that automatically chooses an appropriate graphical language to present a given set of facts. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1037/CS-TR-85-1037.pdf %R CS-TR-85-1038 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Uniform hashing is optimal %A Yao, Andrew C. %D January 1985 %X It was conjectured by J. Ullman that uniform hashing is optimal in its expected retrieval cost among all open-address hashing schemes (JACM 19 (1972), 569-575). In this paper we show that, for any open-address hashing scheme, the expected cost of retrieving a record from a large table which is alpha-fraction full is at least (1/alpha) log(1/(1 - alpha)) + o(1). This proves Ullman's conjecture to be true in the asymptotic sense. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1038/CS-TR-85-1038.pdf %R CS-TR-85-1043 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Constructing a perfect matching is in Random NC %A Karp, Richard M. %A Upfal, Eli %A Wigderson, Avi %D March 1985 %X We show that the problem of constructing a perfect matching in a graph is in the complexity class Random NC: i.e., the problem is solvable in polylog time by a randomized parallel algorithm using a polynomial-bounded number of processors. We also show that several related problems lie in Random NC. These include: (i) Constructing a perfect matching of maximum weight in a graph whose edge weights are given in unary notation; (ii) Constructing a maximum-cardinality matching; (iii) Constructing a matching covering a set of vertices of maximum weight in a graph whose vertex weights are given in binary; (iv) Constructing a maximum s-t flow in a directed graph whose edge weights are given in unary. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1043/CS-TR-85-1043.pdf %R CS-TR-85-1047 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Smooth, easy to compute interpolating splines %A Hobby, John D. %D January 1985 %X We present a system of interpolating splines with first and approximate second order geometric continuity. The curves are easily computed in linear time by solving a system of linear equations without the need to resort to any kind of successive approximation scheme. Emphasis is placed on the need to find aesthetically pleasing curves in a wide range of circumstances; favorable results are obtained even when the knots are very unequally spaced or widely separated. The curves are invariant under scaling, rotation, and reflection, and the effects of a local change fall off exponentially as one moves away from the disturbed knot. Approximate second order continuity is achieved by using a linear "mock curvature" function in place of the actual endpoint curvature for each spline segment and choosing tangent directions at knots so as to equalize these.
This avoids extraneous solutions and other forms of undesirable behavior without seriously compromising the quality of the results. The actual spline segments can come from any family of curves whose endpoint curvatures can be suitably approximated, but we propose a specific family of parametric cubics. There is freedom to allow tangent directions and "tension" parameters to be specified at knots, and special "curl" parameters may be given for additional control near the endpoints of open curves. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1047/CS-TR-85-1047.pdf %R CS-TR-85-1048 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Some constructions for order-theoretic models of concurrency %A Pratt, Vaughan %D March 1985 %X We give "tight" and "loose" constructions suitable for specifying processes represented as sets of pomsets (partially ordered multisets). The tight construction is suitable for specifying "primitive" processes; it introduces the dual notions of concurrence and orthocurrence. The loose construction specifies a process in terms of a net of communicating subprocesses; it introduces the notion of a utilization embedding a process in a net. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1048/CS-TR-85-1048.pdf %R CS-TR-85-1049 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The pomset model of parallel processes: unifying the temporal and the spatial %A Pratt, Vaughan %D January 1985 %X No abstract. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1049/CS-TR-85-1049.pdf %R CS-TR-85-1050 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Fast sequential algorithms to find shuffle-minimizing and shortest paths in a shuffle-exchange network %A Hershberger, John %A Mayr, Ernst %D May 1985 %X This paper analyzes the problem of finding shortest paths and shuffle-minimizing paths in an n-node shuffle-exchange network, where n = $2^m$. Such paths have the properties needed by the Valiant-Brebner permutation routing algorithm, unlike the trivial (m - 1)-shuffle paths usually used for shuffle-exchange routing. The Valiant-Brebner algorithm requires n simultaneous route computations, one for each packet to be routed, which can be done in parallel. We give fast sequential algorithms for both problems we consider. Restricting the shortest path problem to allow only paths that use fewer than m shuffles provides intuition applicable to the general problem. Linear-time pattern matching techniques solve part of the restricted problem; as a consequence, a path using fewest shuffles can be found in O(m) time, which is optimal up to a constant factor. The shortest path problem is equivalent to the problem of finding the Hamming distances between a bitstring and all shifted instances of another. An application of the fast Fourier transform solves this problem and the shortest path problem in O(m log m) time. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1050/CS-TR-85-1050.pdf %R CS-TR-85-1051 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Special relations in automated deduction %A Manna, Zohar %A Waldinger, Richard %D May 1985 %X Two deduction rules are introduced to give streamlined treatment to relations of special importance in an automated theorem-proving system.
These rules, the relation replacement and relation matching rules, generalize to an arbitrary binary relation the paramodulation and E-resolution rules, respectively, for equality, and may operate within a nonclausal or clausal system. The new rules depend on an extension of the notion of polarity to apply to subterms as well as to subsentences, with respect to a given binary relation. The rules allow us to eliminate troublesome axioms, such as transitivity and monotonicity, from the system; proofs are shorter and more comprehensible, and the search space is correspondingly deflated. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1051/CS-TR-85-1051.pdf %R CS-TR-85-1053 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Transaction classification to survive a network partition %A Apers, Peter M. G. %A Wiederhold, Gio %D August 1984 %X When comparing centralized and distributed databases, one of the advantages of distributed databases is said to be the greater availability of the data. Availability is defined as having access to the stored data for update and retrieval, even when some distributed sites are down due to hardware failures. We investigate the functioning of a distributed database whose underlying computer network may fail. A classification of transactions is given to allow an implementation of different levels of operability. Some transactions can be guaranteed to commit in spite of a network partition, while others have to wait until the state of potential transactions in the other partitions is also known. An algorithm is given to compute a classification. Based on histories of transactions kept in the different partitions, a merge of histories is computed, generating the new values for some data items when communication is re-established. The algorithm to compute the merge of the histories makes use of a knowledge base containing knowledge about the transactions, to decide whether to merge, delete, or delay a transaction. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1053/CS-TR-85-1053.pdf %R CS-TR-85-1055 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Programming and Problem-Solving Seminar %A Haddad, Ramsey W. %A Knuth, Donald E. %D June 1985 %X This report contains edited transcripts of the discussions held in Stanford's course CS204, Problem Seminar, during winter quarter 1985. Since the topics span a large range of ideas in computer science, and since most of the important research paradigms and programming paradigms were touched on during the discussions, these notes may be of interest to graduate students of computer science at other universities, as well as to their professors and to professional people in the "real world." The present report is the sixth in a series of such transcripts, continuing the tradition established in STAN-CS-77-606 (Michael J. Clancy, 1977), STAN-CS-79-707 (Chris Van Wyk, 1979), STAN-CS-81-863 (Allan A. Miller, 1981), STAN-CS-83-989 (Joseph S. Weening, 1983), STAN-CS-83-990 (John D. Hobby, 1983). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1055/CS-TR-85-1055.pdf %R CS-TR-85-1056 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Nonclausal temporal deduction %A Abadi, Martin %A Manna, Zohar %D June 1985 %X We present a proof system for propositional temporal logic. This system is based on nonclausal resolution; proofs are natural and generally short.
Its extension to first-order temporal logic is considered. Two variants of the system are described. The first one is for a logic with $\Box$ ("always"), $\Diamond$ ("sometime"), and $\bigcirc$ ("next"). The second variant is an extension of the first one to a logic with the additional operators U ("until") and P ("precedes"). Each of these variants is proved complete. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1056/CS-TR-85-1056.pdf %R CS-TR-85-1058 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Host groups: a multicast extension for datagram internetworks %A Cheriton, David R. %A Deering, Stephen E. %D July 1985 %X The extensive use of local networks is beginning to drive requirements for internetwork facilities that connect these local networks. In particular, the availability of multicast addressing in many local networks and its use by sophisticated distributed applications motivates providing multicast across internetworks. In this paper, we propose a model of service for multicast in an internetwork, describe how this service can be used, and describe aspects of its implementation, including how it would fit into one existing internetwork architecture, namely the US DoD Internet Architecture. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1058/CS-TR-85-1058.pdf %R CS-TR-85-1062 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Computer Science comprehensive examinations, 1981/82-1984/85 %A Keller, Arthur M. %D August 1985 %X This report is a collection of the eight comprehensive examinations from Winter 1982 through Spring 1985 prepared by the faculty and students of Stanford's Computer Science Department together with solutions to the problems posed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1062/CS-TR-85-1062.pdf %R CS-TR-85-1065 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Review of Sowa's "Conceptual Structures" %A Clancey, William J. %D March 1985 %X "Conceptual Structures" is a bold, provocative synthesis of logic, linguistics, and Artificial Intelligence research. At the very least, Sowa has provided a clean, well-grounded notation for knowledge representation that many researchers will want to emulate and build upon. At its best, Sowa's notation and proofs hint at what a future Principia Mathematica of knowledge and reasoning may look like. No other AI text achieves so much in breadth, style, and mathematical precision. This is a book that everyone in AI and cognitive science should know about, and that experienced researchers will profit from studying in some detail. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1065/CS-TR-85-1065.pdf %R CS-TR-85-1066 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Heuristic classification %A Clancey, William J. %D June 1985 %X A broad range of well-structured problems--embracing forms of diagnosis, catalog selection, and skeletal planning--are solved in "expert systems" by the method of heuristic classification. These programs have a characteristic inference structure that systematically relates data to a pre-enumerated set of solutions by abstraction, heuristic association, and refinement. In contrast with previous descriptions of classification reasoning, particularly in psychology, this analysis emphasizes the role of a heuristic in routine problem solving as a non-hierarchical, direct association between concepts. 
In contrast with other descriptions of expert systems, this analysis specifies the knowledge needed to solve a problem, independent of its representation in a particular computer language. The heuristic classification problem-solving model provides a useful framework for characterizing kinds of problems, for designing representation tools, and for understanding non-classification (constructive) problem-solving methods. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1066/CS-TR-85-1066.pdf %R CS-TR-85-1067 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Acquiring, representing, and evaluating a competence model of diagnostic strategy %A Clancey, William J. %D August 1985 %X NEOMYCIN is a computer program that models one physician's diagnostic reasoning within a limited area of medicine. NEOMYCIN's diagnostic procedure is represented in a well-structured way, separately from the domain knowledge it operates upon. We are testing the hypothesis that such a procedure can be used to simulate both expert problem-solving behavior and a good teacher's explanations of reasoning. The model is acquired by protocol analysis, using a framework that separates an expert's causal explanations of evidence from his descriptions of knowledge relations and strategies. The model is represented by a procedural network of goals and rules that are stated in terms of the effect the problem solver is trying to have on his evolving model of the world. The model is evaluated for sufficiency by testing it in different settings requiring expertise, such as providing advice and teaching. The model is evaluated for plausibility by arguing that the constraints implicit in the diagnostic procedure are imposed by the task domain and human computational capability. This paper discusses NEOMYCIN's diagnostic procedure in detail, viewing it as a memory aid, as a set of operators, as proceduralized constraints, and as a grammar. This study provides new perspectives on the nature of "knowledge compilation" and how an expert-teacher's explanations relate to a working program. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1067/CS-TR-85-1067.pdf %R CS-TR-85-1068 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T GUIDON-WATCH: a graphic interface for viewing a knowledge-based system %A Richer, Mark H. %A Clancey, William J. %D August 1985 %X This paper describes GUIDON-WATCH, a graphic interface that uses multiple windows and a mouse to allow a student to browse a knowledge base and view reasoning processes during diagnostic problem solving. Methods are presented for providing multiple views of hierarchical structures, overlaying results of a search process on top of static structures to make the strategy visible, and graphically expressing evidence relations between findings and hypotheses. This work demonstrates the advantages of stating a diagnostic search procedure in a well-structured, rule-based language, separate from domain knowledge. A number of issues in software design are also considered, including the automatic management of a multiple-window display. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1068/CS-TR-85-1068.pdf %R CS-TR-85-1072 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Programming and Problem-Solving Seminar %A Mayr, Ernst W. %A Anderson, Richard J. %A Hochschild, Peter H. 
%D October 1985 %X This report contains edited transcripts of the discussions held in Stanford's course CS204, Problem Seminar, during winter quarter 1984. The course topics consisted of five problems coming from different areas of computer science. The problems were discussed in class and solved and programmed by the students working in teams. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1072/CS-TR-85-1072.pdf %R CS-TR-85-1074 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Designing new typefaces with Metafont %A Southall, Richard %D September 1985 %X The report discusses issues associated with the symbolic design of new typefaces using programming languages such as Metafont. A consistent terminology for the subject area is presented. A schema for type production systems is described that lays stress on the importance of communication between the designer of a new typeface and the producer of the fonts that embody it. The methods used for the design of printers' type from the sixteenth century to the present day are surveyed in the context of this schema. The differences in the designer's task in symbolic and graphic design modes are discussed. A new typeface design made with Metafont is presented, and the usefulness of Metafont as a tool for making new designs is considered. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1074/CS-TR-85-1074.pdf %R CS-TR-85-1075 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Expert systems: working systems and the research literature %A Buchanan, Bruce G. %D October 1985 %X Expert systems are the subject of considerable interest among persons in AI research or applications. There is no single definition of an expert system, and thus no precisely defined set of programs or set of literature references that represent work on expert systems. This report provides (a) a characterization of what an expert system is, (b) a list of expert systems in routine use or field testing, and (c) a list of relevant references in the AI research literature. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1075/CS-TR-85-1075.pdf %R CS-TR-85-1076 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Some approaches to knowledge acquisition %A Buchanan, Bruce G. %D July 1985 %X Knowledge acquisition is not a single, monolithic problem for AI. There are many ways to approach the topic in order to understand issues and design useful tools for constructing knowledge-based systems. Several of those approaches are being explored in the Knowledge Systems Laboratory (KSL) at Stanford. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1076/CS-TR-85-1076.pdf %R CS-TR-85-1079 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Two processor scheduling is in NC %A Helmbold, David %A Mayr, Ernst %D October 1985 %X We present a parallel algorithm for the two processor scheduling problem. This algorithm constructs an optimal schedule for unit execution time task systems with arbitrary precedence constraints using a polynomial number of processors and running in time polylog in the size of the input. Whereas previous parallel solutions for the problem made extensive use of randomization, our algorithm is completely deterministic and based on an interesting decomposition technique. It is also of independent relevance for two more reasons.
It provides another example of the apparent difference in complexity between decision and search problems in the context of fast parallel computation, and it gives an NC-algorithm for the matching problem in certain restricted cases. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1079/CS-TR-85-1079.pdf %R CS-TR-85-1084 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Taliesin: a distributed bulletin board system %A Edighoffer, Judy L. %A Lantz, Keith A. %D September 1985 %X This paper describes a computer bulletin board facility intended to support replicated bulletin boards on a network that may frequently be in a state of partition. The two major design issues covered are the choice of a name space and the choice of replication algorithms. The impact of the name space on communication costs is explained. A special purpose replication algorithm that provides high availability and response despite network partition is introduced. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1084/CS-TR-85-1084.pdf %R CS-TR-85-1086 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Towards a universal directory service %A Lantz, Keith A. %A Edighoffer, Judy L. %A Hitson, Bruce L. %D August 1985 %X Directory services and name servers have been discussed and implemented for a number of distributed systems. Most have been tightly interwoven with the particular distributed systems of which they are a part; a few are more general in nature. In this paper we survey recent work in this area and discuss the advantages and disadvantages of a number of approaches. From this, we are able to extract some fundamental requirements of a naming system capable of handling a wide variety of object types in a heterogeneous environment. We outline how these requirements can be met in a universal directory service. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1086/CS-TR-85-1086.pdf %R CS-TR-85-1087 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Preemptable remote execution facilities for the V-system %A Theimer, Marvin M. %A Lantz, Keith A. %A Cheriton, David R. %D September 1985 %X A remote execution facility allows a user of a workstation-based distributed system to offload programs onto idle workstations, thereby providing the user with access to computational resources beyond that provided by his personal workstation. In this paper, we describe the design and performance of the remote execution facility in the V distributed system, as well as several implementation issues of interest. In particular, we focus on network transparency of the execution environment, preemption and migration of remotely executed programs, and avoidance of residual dependencies on the original host. We argue that preemptable remote execution allows idle workstations to be used as a "pool of processors" without interfering with use by their owners and without significant overhead for the normal execution of programs. In general, we conclude that the cost of providing preemption is modest compared to providing a similar amount of computation service by dedicated "computation engines". %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1087/CS-TR-85-1087.pdf %R CS-TR-85-1080 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The compleat guide to MRS %A Russell, Stuart %D June 1985 %X MRS is a logic programming system with extensive meta-level facilities.
As such it can be used to implement virtually all kinds of artificial intelligence applications in a wide variety of architectures. This guide is intended to be a comprehensive text and reference for MRS. It also attempts to explain the foundations of the logic programming approach from the ground up, and it is hoped that it will thus provide access, even for the uninitiated, to all the benefits of AI methods. The only prerequisites for understanding MRS are a passing acquaintance with LISP and an open mind. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/85/1080/CS-TR-85-1080.pdf %R CS-TR-86-1085 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Bibliography of Computer Science reports, 1963-1986 %A Berg, Kathryn A. %A Marashian, Taleen %D June 1986 %X This report lists, in chronological order, all reports published by the Stanford Computer Science Department since 1963. Each report is identified by a Computer Science number, author's name, title, National Technical Information Service (NTIS) retrieval number (i.e., AD-XXXXXX), date, and number of pages. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1085/CS-TR-86-1085.pdf %R CS-TR-86-1093 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A general reading list for artificial intelligence %A Subramanian, Devika %A Buchanan, Bruce G. %D December 1985 %X This reading list is based on the syllabus for the course CS229b offered in Winter 1985. This course was an intensive 10 week survey intended as preparation for the 1984-85 qualifying examination in Artificial Intelligence at Stanford University. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1093/CS-TR-86-1093.pdf %R CS-TR-86-1094 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Expert systems: working systems and the research literature %A Buchanan, Bruce G. %D December 1985 %X Many expert systems have moved out of development laboratories into field test and routine use. About sixty such systems are listed. Academic research laboratories are contributing manpower to fuel the commercial development of AI. But the quantity of AI research may decline as a result unless the applied systems are experimented with and analyzed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1094/CS-TR-86-1094.pdf %R CS-TR-86-1095 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A torture test for METAFONT %A Knuth, Donald E. %D January 1986 %X Programs that claim to be implementations of METAFONT84 are supposed to be able to process the test routine contained in this report, producing the outputs contained in this report. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1095/CS-TR-86-1095.pdf %R CS-TR-86-1096 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A model-theoretic approach to updating logical databases %A Wilkins, Marianne Winslett %D January 1986 %X We show that it is natural to extend the concept of database updates to encompass databases with incomplete information. Our approach embeds the incomplete database and the updates in the language of first-order logic, which we believe has strong advantages over relational tables and traditional data manipulation languages in the incomplete information situation. We present semantics for our update operators, and also provide an efficient algorithm to perform the operations.
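The update semantics in the Wilkins abstract above can be made concrete with a possible-worlds reading: an incomplete database stands for the set of complete databases consistent with what is known, and an update is applied to each such world independently. The Python sketch below is purely illustrative, under that assumption; the toy representation and function names are hypothetical and do not come from the report.

    # Hypothetical toy representation: an incomplete database is a set of
    # possible worlds, each world a frozenset of ground atoms.
    def insert(worlds, atom):
        # After inserting a ground atom, it holds in every possible world.
        return {frozenset(w | {atom}) for w in worlds}

    def delete(worlds, atom):
        # After deleting a ground atom, it holds in no possible world.
        return {frozenset(w - {atom}) for w in worlds}

    # We know p(a); we are unsure whether q(b) holds.
    db = {frozenset({"p(a)"}), frozenset({"p(a)", "q(b)"})}
    db = insert(db, "r(c)")
    db = delete(db, "q(b)")
    assert all("r(c)" in w and "q(b)" not in w for w in db)
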
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1096/CS-TR-86-1096.pdf %R CS-TR-86-1097 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T TEXware. %A Knuth, Donald E. %D April 1986 %X This report documents four TEX utility programs: The POOLtype processor (Version 2, July 1983), The TFtoPL processor (Version 2.5, September 1985), The PLtoTF processor (Version 2.3, August 1985), and The DVItype processor (Version 2.8, August 1984). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1097/CS-TR-86-1097.pdf %R CS-TR-86-1100 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Modal theorem proving %A Abadi, Martin %A Manna, Zohar %D May 1986 %X We describe resolution proof systems for several modal logics. First we present the propositional versions of the systems and prove their completeness. The first-order resolution rule for classical logic is then modified to handle quantifiers directly. This new resolution rule enables us to extend our propositional systems to complete first-order systems. The systems for the different modal logics are closely related. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1100/CS-TR-86-1100.pdf %R CS-TR-86-1102 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Data independent recursion in deductive databases %A Naughton, Jeffrey F. %D February 1986 %X Some recursive definitions in deductive database systems can be replaced by equivalent nonrecursive definitions. In this paper we give a linear-time algorithm that detects many such definitions, and specify a useful subset of recursive definitions for which the algorithm is complete. It is unlikely that our algorithm can be extended significantly, as recent results by Gaifman [5] and Vardi [19] show that the general problem is undecidable. We consider two types of initialization of the recursively defined relation: arbitrary initialization, and initialization by a given nonrecursive rule. This extends earlier work by Minker and Nicolas [10], and by Ioannidis [7], and is related to bounded tableau results by Sagiv [14]. Even if there is no equivalent nonrecursive definition, a modification of our algorithm can be used to optimize a recursive definition and improve the efficiency of the compiled evaluation algorithms proposed in Henschen and Naqvi [6] and in Bancilhon et al. [3]. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1102/CS-TR-86-1102.pdf %R CS-TR-86-1104 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T CS229b: a survey of AI classnotes for Winter 84-85 %A Subramanian, Devika %D April 1986 %X These are the compiled classnotes for the course CS229b offered in Winter 1985. This course was an intensive 10 week survey intended as preparation for the 1984-85 qualifying examination in Artificial Intelligence at Stanford University. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1104/CS-TR-86-1104.pdf %R CS-TR-86-1105 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Software-controlled caches in the VMP multiprocessor %A Cheriton, David R. %A Slavenburg, Gert A. %A Boyle, Patrick D. %D March 1986 %X VMP is an experimental multiprocessor that follows the familiar basic design of multiple processors, each with a cache, connected by a shared bus to global memory.
Each processor has a synchronous, virtually addressed, single master connection to its cache, providing very high memory bandwidth. An unusually large cache page size and fast sequential memory copy hardware make it feasible for cache misses to be handled in software, analogously to the handling of virtual memory page faults. Hardware support for cache consistency is limited to a simple state machine that monitors the bus and interrupts the processor when a cache consistency action is required. In this paper, we show how the VMP design provides the high memory bandwidth required by modern high-performance processors with a minimum of hardware complexity and cost. We also describe simple solutions to the consistency problems associated with virtually addressed caches. Simulation results indicate that the design achieves good performance providing data contention is not excessive. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1105/CS-TR-86-1105.pdf %R CS-TR-86-1106 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A timely resolution %A Abadi, Martin %A Manna, Zohar %D April 1986 %X We present a novel proof system R for First-order (Linear) Temporal Logic. This system extends our Propositional Temporal Logic proof system ([AM]). The system R is based on nonclausal resolution; proofs are natural and generally short. Special quantifier rules, unification techniques, and a resolution rule are introduced. We relate R to other proof systems for First-order Temporal Logic and discuss completeness issues. The system R should be useful as a tool for such tasks as verification of concurrent programs and reasoning about hardware devices. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1106/CS-TR-86-1106.pdf %R CS-TR-86-1109 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A proof editor for propositional temporal logic %A Casley, Ross %D May 1986 %X This report describes PTL, a program to assist in constructing proofs in propositional logic extended by the operators $\Box$ ("always"), $\Diamond$ ("eventually") and $\bigcirc$ ("at the next step"). This is called propositional temporal logic and is one of two systems of logic presented by Abadi and Manna in [1]. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1109/CS-TR-86-1109.pdf %R CS-TR-86-1114 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Optimizing function-free recursive inference rules %A Naughton, Jeffrey F. %D May 1986 %X Recursive inference rules arise in recursive definitions in logic programming systems and in database systems with recursive query languages. Let D be a recursive definition of a relation t. We say that D is minimal if for any predicate p in a recursive rule in D, p must appear in a recursive rule in any definition of t. We show that testing for minimality is in general undecidable. However, we do present an efficient algorithm for a useful class of recursive rules, and show how to use it to transform a recursive definition to a minimal recursive definition. Evaluating the optimized definition will avoid redundant computation without the overhead of caching intermediate results and run-time checking for duplicate goals.
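The kind of redundancy targeted in the Naughton abstract above can be seen in a transitive-closure definition that adds the quadratic rule t(X,Y) :- t(X,Z), t(Z,Y) on top of the usual linear rule t(X,Y) :- e(X,Z), t(Z,Y): the extra rule changes the work done but not the relation computed, so a minimality test would let an optimizer drop it. The Python sketch below (a hypothetical encoding, not the paper's algorithm) checks this by naive bottom-up evaluation.

    def closure(edges, use_redundant_rule):
        t = set(edges)  # rule 1: t(X,Y) :- e(X,Y)
        while True:
            # rule 2 (linear): t(X,Y) :- e(X,Z), t(Z,Y)
            new = {(x, w) for (x, y) in edges for (z, w) in t if y == z}
            if use_redundant_rule:
                # rule 3 (redundant): t(X,Y) :- t(X,Z), t(Z,Y)
                new |= {(x, w) for (x, y) in t for (z, w) in t if y == z}
            if new <= t:
                return t
            t |= new

    edges = {(1, 2), (2, 3), (3, 4)}
    assert closure(edges, False) == closure(edges, True)
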
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1114/CS-TR-86-1114.pdf %R CS-TR-86-1115 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The heuristic refinement method for deriving solution structures of proteins %A Buchanan, Bruce G. %A Hayes-Roth, Barbara %A Lichtarge, Olivier %A Altman, Russ %A Brinkley, James %A Hewett, Michael %A Cornelius, Craig %A Duncan, Bruce %A Jardetzky, Oleg %D March 1986 %X A new method is presented for determining structures of proteins in solution. The method uses constraints inferred from analytic data to successively refine both the locations for parts of the structure and the levels of detail for describing those parts. A computer program, called PROTEAN, which encodes this method, has been partially implemented and was used to derive structures for the lac-repressor headpiece from experimental data. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1115/CS-TR-86-1115.pdf %R CS-TR-86-1116 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Inductive knowledge acquisition for rule-based expert systems %A Fu, Li-Min %A Buchanan, Bruce G. %D October 1985 %X The RL program was developed to construct knowledge bases automatically in rule-based expert systems, primarily in MYCIN-like evidence-gathering systems where there is uncertainty about data as well as the strength of inference, and where rules are chained together or combined to infer complex hypotheses. This program comprises three subprograms: (1) a subprogram that learns confirming rules, which employs a heuristic search commencing with the most general hypothesis; (2) a subprogram that learns rules containing intermediate concepts, which exploits the old partial knowledge or defines new intermediate concepts, based on heuristics; (3) a subprogram that learns disconfirming rules, which is based on the expert's heuristics to formulate disconfirming rules. RL's validity has been demonstrated with a performance program that diagnoses the causes of jaundice. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1116/CS-TR-86-1116.pdf %R CS-TR-86-1117 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An Empirical Study of Distributed Application Performance %A Lantz, Keith %A Nowicki, William %A Theimer, Marvin %D October 1985 %X A major reason for the rarity of distributed applications, despite the proliferation of networks, is the sensitivity of their performance to various aspects of the network environment. We demonstrate that distributed applications can run faster than local ones, using common hardware. We also show that the primary factors affecting performance are, in approximate order of importance: speed of the user's workstation, speed of the remote host (if any), and the high-level (above the transport level) protocols used. In particular, the use of batching, pipelining, and structure in high-level protocols reduces the degradation often experienced between networks of different bandwidth. Less significant, but still noticeable improvements result from proper design and implementation of underlying transport protocols. Ultimately, with proper application of these techniques, network bandwidth is rendered virtually insignificant.
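The claim in the Lantz, Nowicki and Theimer abstract that protocol structure can outweigh raw bandwidth is easy to see in a back-of-the-envelope latency model: when each request costs one network round trip, round-trip latency dominates, and batching several requests per round trip recovers most of the loss. The Python fragment below is such a model; the numbers are invented for illustration and are not measurements from the paper.

    import math

    def total_time(n, rtt, per_request, batch):
        # Each batch of requests costs one round trip, plus the
        # per-request processing work for all n requests.
        return math.ceil(n / batch) * rtt + n * per_request

    # 1000 requests, 50 ms round-trip time, 1 ms of server work each:
    print(total_time(1000, 0.050, 0.001, batch=1))    # ~51 s unbatched
    print(total_time(1000, 0.050, 0.001, batch=100))  # ~1.5 s batched
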
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1117/CS-TR-86-1117.pdf %R CS-TR-86-1118 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Applications of Parallel Scheduling to Perfect Graphs %A Helmbold, David %A Mayr, Ernst %D June 1986 %X We combine a parallel algorithm for the two processor scheduling problem, which runs in polylog time on a polynomial number of processors, with an algorithm to find transitive orientations of graphs where they exist. Both algorithms together solve the maximum clique problem and the minimum coloring problem for comparability graphs, and the maximum matching problem for co-comparability graphs. These parallel algorithms can also be used to identify permutation graphs and interval graphs, important subclasses of perfect graphs. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1118/CS-TR-86-1118.pdf %R CS-TR-86-1119 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Simulation of an Ultracomputer with Several 'Hot Spots' %A Rosenblum, David S. %A Mayr, Ernst W. %D June 1986 %X This report describes the design and results of a time-driven simulation of an Ultracomputer-like multiprocessor in the presence of several "hot spots," or memory modules which are frequent targets of requests. Such hot spots exist during execution of parallel programs in which the several threads of control synchronize through manipulation of a small number of shared variables. The simulated system comprises N processing elements (PEs) and N shared memory modules connected by an N x N buffered, packet-switched Omega network. The simulator was designed to accept a wide variety of system configurations to enable observation of many different characteristics of the system behavior. We present the results of four experiments: (1) General simulation of several 16-PE configurations, (2) General simulation of several 512-PE configurations, (3) Determination of critical queue lengths as a function of request rate (512 PEs) and (4) Determination of the effect of hot spot spacing on system performance (512 PEs). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1119/CS-TR-86-1119.pdf %R CS-TR-86-1123 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Blackboard Systems %A Nii, H. Penny %D June 1986 %X The first blackboard system was the HEARSAY-II speech understanding system that evolved between 1971 and 1976. Subsequently, many systems have been built that have similar system organizations and run-time behavior. The objectives of this document are: (1) to define what is meant by "blackboard systems," and (2) to show the richness and diversity of blackboard system designs. The article begins with a discussion of the underlying concept behind all blackboard systems, the blackboard model of problem solving. In order to bridge the gap between a model and working systems, the blackboard framework, an extension of the basic blackboard model, is introduced, including a detailed description of the model's components and their behavior. A model does not come into existence on its own and is usually an abstraction of many examples. In section 2, the history of ideas is traced and the designs of some application systems that helped shape the blackboard model are detailed. We then describe and contrast existing blackboard systems. Blackboard systems can generally be divided into two categories: application and skeletal systems.
In application systems the blackboard system components are integrated with the domain knowledge required to solve the problem at hand. Skeletal systems are devoid of domain knowledge, and, as the name implies, consist of the essential system components from which application systems can be built by the addition of knowledge and the specification of control (i.e. meta-knowledge). Application systems will be discussed in Section 3, and skeletal systems will be discussed elsewhere. In Section 3.6, we summarize the features of the application systems and in Section 4 present the author's perspective on the utility of the blackboard approach to problem solving and knowledge engineering. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1123/CS-TR-86-1123.pdf %R CS-TR-86-1124 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Efficient Matching Algorithms for the SOAR/OPS5 Production System %A Scales, Daniel J. %D June 1986 %X SOAR is a problem-solving and learning program intended to exhibit intelligent behavior. SOAR uses a modified form of the OPS5 production system for storage of and access to long-term knowledge. As with most programs which use production systems, the match phase of SOAR's production system dominates all other SOAR processing. This paper describes the results of an investigation of various ways of speeding up the matching process in SOAR through additions and changes to the OPS5 matching algorithm. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1124/CS-TR-86-1124.pdf %R CS-TR-86-1125 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The CAOS System %A Schoen, Eric %D March 1986 %X The CAOS system is a framework designed to facilitate the development of highly concurrent real-time signal interpretation applications. It explores the potential of multiprocessor architectures to improve the performance of expert systems in the domain of signal interpretation. CAOS is implemented in Lisp on a (simulated) collection of processor-memory sites, linked by a high-speed communications subsystem. The "virtual machine" on which it depends provides remote evaluation and packet-based message exchange between processes, using virtual circuits known as streams. To this presentation layer, CAOS adds (1) a flexible process scheduler, and (2) an object-centered notion of agents, dynamically-instantiable entities which model interpreted signal features. This report documents the principal ideas, programming model, and implementation of CAOS. A model of real-time signal interpretation, based on replicated "abstraction" pipelines, is presented. For some applications, this model offers a means by which large numbers of processors may be utilized without introducing synchronization-necessitated software bottlenecks. The report concludes with a description of the performance of a large CAOS application over various sizes of multiprocessor configurations. Lessons about problem decomposition grain size, global problem solving control strategy, and appropriate service provided to CAOS by the underlying architecture are discussed.
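The replicated "abstraction" pipelines mentioned in the CAOS abstract above can be caricatured in a few lines: each interpretation stage is a pool of interchangeable workers, and successive signal samples are dealt round-robin across the replicas so that no single stage serializes the pipeline. The Python sketch below is a deliberately simplified illustration; the names and the scheduling policy are invented, not CAOS's.

    from itertools import cycle

    def replicated_stage(fn, replicas):
        # Round-robin dispatch over identical copies of one stage;
        # each call returns (replica_used, result).
        turn = cycle(range(replicas))
        return lambda x: (next(turn), fn(x))

    smooth = replicated_stage(lambda x: x / 2.0, replicas=3)
    detect = replicated_stage(lambda x: x > 1.0, replicas=2)

    for sample in [0.5, 3.0, 2.2, 0.1]:
        i, s = smooth(sample)
        j, feature = detect(s)
        print(f"sample {sample}: smooth[{i}] -> detect[{j}] feature={feature}")
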
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1125/CS-TR-86-1125.pdf %R CS-TR-86-1126 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T CAREL: A Visible Distributed Lisp %A Davies, Byron %D March 1986 %X CAREL is a Lisp implementation designed to be a high-level interactive systems programming language for a distributed-memory multiprocessor. CAREL insulates the user from the machine language of the multiprocessor architecture, but still makes it possible for the user to specify explicitly the assignment of tasks to processors in the multiprocessor network. CAREL has been implemented to run on a TI Explorer Lisp machine using Stanford's CARE multiprocessor simulator. CAREL is more than a language: real-time graphical displays provided by the CARE simulator make CAREL a novel graphical programming environment for distributed computing. CAREL enables the user to create programs interactively and then watch them run on a network of simulated processors. As a CAREL program executes, the CARE simulator graphically displays the activity of the processors and the transmission of data through the network. Using this capability, CAREL has demonstrated its utility as an educational tool for multiprocessor computing. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1126/CS-TR-86-1126.pdf %R CS-TR-86-1129 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Beta Operations: Efficient Implementation of a Primitive Parallel Operation %A Cohn, Evan R. %A Haddad, Ramsey W. %D August 1986 %X We consider the primitive parallel operation of the Connection Machine, the Beta Operation. Let the input size of the problem be N and the output size M. We show how to perform the Beta Operation on an N-node hypercube in $O(\log N + \log^2 M)$ time. For a $\sqrt{N} \times \sqrt{M}$ mesh-of-trees, we require $O(\log N + \sqrt{M})$ time. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1129/CS-TR-86-1129.pdf %R CS-TR-86-1131 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Processor Renaming in Asynchronous Environments %A Bar-Noy, Amotz %A Peleg, David %D September 1986 %X Fischer, Lynch and Paterson proved that in a completely asynchronous system "weak agreement" cannot be achieved even in the presence of a single "benign" fault. Following the direction proposed in Attiya, Bar-Noy, Dolev and Koller (Aug 1986), we demonstrate the interesting fact that some weaker forms of processor cooperation are still achievable in such a situation, and in fact, even in the presence of up to t < n/2 such faulty processors. In particular, we show that n processors, each having a distinct name taken from an unbounded ordered domain, can individually choose new distinct names from a space of size n + t (where n is an obvious lower bound). In case the new names are required also to preserve the original order, we give an algorithm in which the space of new names is of size ${2^t}(n - t + 1) - 1$, which is tight. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1131/CS-TR-86-1131.pdf %R CS-TR-86-1132 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Optimizing Datalog Programs %A Sagiv, Yehoshua %D March 1986 %X Datalog programs, i.e., Prolog programs without function symbols, are considered. It is assumed that a variable appearing in the head of a rule must also appear in the body of the rule.
The input of a program is a set of ground atoms (which are given in addition to the program's rules) and, therefore, can be viewed as an assignment of relations to some of the program's predicates. Two programs are equivalent if they produce the same result for all possible assignments of relations to the extensional predicates (i.e., the predicates that do not appear as heads of rules). Two programs are uniformly equivalent if they produce the same result for all possible assignments of initial relations to all the predicates (i.e., both extensional and intensional). The equivalence problem for Datalog programs is known to be undecidable. It is shown that uniform equivalence is decidable, and an algorithm is given for minimizing a Datalog program under equivalence. A technique for removing parts of a program that are redundant under equivalence (but not under uniform equivalence) is developed. A procedure for testing uniform equivalence is also developed for the case in which the database satisfies some constraints. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1132/CS-TR-86-1132.pdf %R CS-TR-86-1134 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T UIO: A Uniform I/O System Interface for Distributed Systems %A Cheriton, David R. %D November 1986 %X A uniform I/O interface allows programs to be written relatively independent of specific I/O services and yet work with a wide variety of the I/O services available in a distributed environment. Ideally, the interface provides this uniform access without excessive complexity in the interface or loss of performance. However, a uniform interface does not arise from careful design of individual system interfaces alone; it requires explicit definition. In this paper, we describe the UIO (uniform I/O) system interface that has been used for the past five years in the V distributed operating system, focusing on the key design issues. This interface provides several extensions beyond the I/O interface of UNIX, including support for record I/O, locking, atomic transactions and replication, as well as attributes that indicate whether optional semantics and operations are available. We also describe our experience in using and implementing this interface with a variety of different I/O services plus the performance of both local and network I/O. We conclude that the UIO interface provides a uniform I/O system interface with significant functionality, wide applicability and no significant performance penalty. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1134/CS-TR-86-1134.pdf %R CS-TR-86-1136 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An Experiment in Knowledge-based Signal Understanding Using Parallel Architectures %A Brown, Harold %A Schoen, Eric %A Delagi, Bruce %D October 1986 %X This report documents an experiment investigating the potential of a parallel computing architecture to enhance the performance of a knowledge-based signal understanding system. The experiment consisted of implementing and evaluating an application encoded in a parallel programming extension of Lisp and executing on a simulated multiprocessor system. The chosen application for the experiment was a knowledge-based system for interpreting pre-processed, passively acquired radar emissions from aircraft. The application was implemented in an experimental concurrent, asynchronous object-oriented framework.
This framework, in turn, relied on the services provided by the underlying hardware system. The hardware system for the experiment was a simulation of various sized grids of processors with inter-processor communication via message-passing. The experiment investigated the effects of various high-level control strategies on the quality of the problem solution, the speedup of the overall system performance as a function of the number of processors in the grid, and some of the issues in implementing and debugging a knowledge-based system on a message-passing multiprocessor system. In this report we describe the software and (simulated) hardware components of the experiment and present the qualitative and quantitative experimental results. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1136/CS-TR-86-1136.pdf %R CS-TR-86-1137 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Leaf File Access Protocol %A Mogul, Jeffrey %D December 1986 %X Personal computers are superior to timesharing systems in many ways, but they are inferior in this respect: they make it harder for users to share files. A local area network provides a substrate upon which file sharing can be built; one must also have a protocol for sharing files. This report describes Leaf, one of the first protocols to allow remote access to files. Leaf is a remote file access protocol rather than a file transfer protocol. Unlike a file transfer protocol, which must create a complete copy of a file, a file access protocol provides random access directly to the file itself. This promotes sharing because it allows simultaneous access to a file by several remote users, and because it avoids the creation of new copies and the associated consistency-maintenance problem. The protocol described in this report is nearly obsolete. It is interesting for historical reasons, primarily because it was perhaps the first non-proprietary remote file access protocol actually implemented, and also because it serves as a case study in practical protocol design. The specification of Leaf is included as an appendix; it has not been widely available outside of Stanford. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1137/CS-TR-86-1137.pdf %R CS-TR-86-1139 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Local Shape from Specularity %A Healey, Glenn %A Binford, Thomas O. %D June 1986 %X We show that highlights in images of objects with specularly reflecting surfaces provide significant information about the surfaces which generate them. A brief survey is given of specular reflectance models which have been used in computer vision and graphics. For our work, we adopt the Torrance-Sparrow specular model which, unlike most previous models, considers the underlying physics of specular reflection from rough surfaces. From this model we derive powerful relationships between the properties of a specular feature in an image and local properties of the corresponding surface. We show how this analysis can be used for both prediction and interpretation in a vision system. A shape from specularity system has been implemented to test our approach. The performance of the system is demonstrated by careful experiments with specularly reflecting objects. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1139/CS-TR-86-1139.pdf %R CS-TR-86-1130 %Z Mon, 01 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On Detecting Edges %A Nalwa, Vishvjit S. 
%A Binford, Thomas O. %D March 1986 %X An edge in an image corresponds to a discontinuity in the intensity surface of the underlying scene. It can be approximated by a piecewise straight curve composed of edgels, i.e., short, linear edge-elements, each characterized by a direction and a position. The approach to edgel-detection here is to fit a series of one-dimensional surfaces to each window (kernel of the operator) and accept the surface-description which is adequate in the least squares sense and has the fewest parameters. (A one-dimensional surface is one which is constant along some direction.) The tanh is an adequate basis for the step-edge and its combinations are adequate for the roof-edge and the line-edge. The proposed method of step-edgel detection is robust with respect to noise; for (step-size/${\sigma}_{noise}$) >= 2.5, it has subpixel position localization (${\sigma}_{position}$ < 1/3) and an angular localization better than $10^\circ$; further, it is designed to be insensitive to smooth shading. These results are demonstrated by some simple analysis, statistical data and edgel-images. Also included is a comparison, of performance on a real image, with a typical operator (Difference-of-Gaussians). The results indicate that the proposed operator is superior with respect to detection, localization and resolution. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/86/1130/CS-TR-86-1130.pdf %R CS-TR-84-1004 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A computational theory of higher brain function %A Goldschlager, Leslie M. %D April 1984 %X The higher functions of the brain are believed to occur in the cortex. This region of the brain is modelled as a memory surface which performs both storage and computation. Concepts are modelled as patterns of activity on the memory surface, and the model explains how these patterns interact with one another to give the computations which the brain performs. The method of interaction can explain the formation of abstract concepts, association of ideas and train of thought. It is shown that creativity, self, consciousness and free will are explainable within the same framework. A theory of sleep is presented which is consistent with the model. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1004/CS-TR-84-1004.pdf %R CS-TR-84-1005 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Adequate proof principles for invariance and liveness properties of concurrent programs %A Manna, Zohar %A Pnueli, Amir %D May 1984 %X This paper presents proof principles for establishing invariance and liveness properties of concurrent programs. Invariance properties are established by systematically checking that they are preserved by every atomic instruction in the program. The methods for establishing liveness properties are based on 'well-founded assertions' and are applicable to both "just" and "fair" computations. These methods do not assume a decrease of the rank at each computation step. It is sufficient that there exists one process which decreases the rank when activated. Fairness then ensures that the program will eventually attain its goal. In the finite state case such proofs can be represented by diagrams. Several examples are given.
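The invariance principle in the Manna-Pnueli abstract above (check that every atomic instruction preserves the assertion) can be mimicked mechanically on a finite-state toy program. The Python sketch below enumerates every atomic step from every state satisfying a strengthened, inductive invariant for an invented two-process lock protocol; the program and all names are hypothetical, not the paper's formalism.

    from itertools import product

    # Toy program: state is (lock, pc1, pc2); lock is 0 (free), 1 (held
    # by P1), 2 (held by P2); each pc is 0 (trying) or 1 (critical).
    def steps(state):
        lock, pc1, pc2 = state
        if pc1 == 0 and lock == 0: yield (1, 1, pc2)  # P1 acquires
        if pc1 == 1:               yield (0, 0, pc2)  # P1 releases
        if pc2 == 0 and lock == 0: yield (2, pc1, 1)  # P2 acquires
        if pc2 == 1:               yield (0, pc1, 0)  # P2 releases

    # Inductive invariant: a process is in its critical section only
    # while it holds the lock. Mutual exclusion follows immediately.
    def inv(state):
        lock, pc1, pc2 = state
        return (pc1 == 0 or lock == 1) and (pc2 == 0 or lock == 2)

    assert inv((0, 0, 0))  # holds initially
    for state in product((0, 1, 2), (0, 1), (0, 1)):
        if inv(state):
            assert all(inv(s) for s in steps(state))  # preserved by each step
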
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1005/CS-TR-84-1005.pdf %R CS-TR-84-1006 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T EKL - an interactive proof checker user's reference manual %A Ketonen, Jussi %A Weening, Joseph S. %D June 1984 %X EKL is an interactive proof checker and constructor. Its main goal is to facilitate the checking of mathematical proofs. Some of the special features of EKL are: * The language of EKL can be extended all the way to finite-order predicate logic with typed lambda-calculus. * Several proofs can be handled at the same time. * Metatheoretic reasoning allows formal extensions of the capabilities of EKL. * EKL is a programmable system. The MACLISP language is available to the user, and LISP functions can be written to create input to EKL, thereby allowing expression of proofs in an arbitrary input language. This document is a reference manual for EKL. Each of the sections discusses a major part of the language, beginning with an overview of that area, and proceeding to a detailed discussion of available features. To gain an acquaintance with EKL, it is recommended that you read only the introductory part of each section. EKL may be used both at the Stanford Artificial Intelligence Laboratory (SAIL) computer system, and on DEC TOPS-20 systems that support MACLISP. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1006/CS-TR-84-1006.pdf %R CS-TR-84-1007 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Queue-based multi-processing Lisp %A Gabriel, Richard P. %A McCarthy, John %D June 1984 %X This report presents a dialect of Lisp, called QLAMBDA, which supports multi-processing. Along with the definition of the dialect, the report presents programming examples and performance studies of some programs written in QLAMBDA. Unlike other proposed multi-processing Lisps, QLAMBDA provides only a few very powerful and intuitive primitives rather than a number of parallel variants of familiar constructs. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1007/CS-TR-84-1007.pdf %R CS-TR-84-1009 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Complexity of a top-down capture rule %A Sagiv, Yehoshua %A Ullman, Jeffrey D. %D July 1984 %X Capture rules were introduced in [U] as a method for planning the evaluation of a query expressed in first-order logic. We examine a capture rule that is substantiated by a simple top-down implementation of restricted Horn clause logic. A necessary and sufficient condition for the top-down algorithm to converge is shown. It is proved that, provided there is a bound on the number of arguments of predicates, the test can be performed in polynomial time; however, if the arity of predicates is made part of the input, then the problem of deciding whether the top-down algorithm converges is NP-hard. We then consider relaxation of some of our constraints on the form of the logic, showing that success of the top-down algorithm can still be tested in polynomial time if the number of arguments is limited and in exponential time if not. 
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1009/CS-TR-84-1009.pdf %R CS-TR-84-1012 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T TABLOG: the deductive-tableau programming language %A Malachi, Yonathan %A Manna, Zohar %A Waldinger, Richard %D June 1984 %X TABLOG (Tableau Logic Programming Language) is a language based on first-order predicate logic with equality that combines functional and logic programming. TABLOG incorporates advantages of LISP and PROLOG. A program in TABLOG is a list of formulas in a first-order logic (including equality, negation, and equivalence) that is more general and more expressive than PROLOG's Horn clauses. Whereas PROLOG programs must be relational, TABLOG programs may define either relations or functions. While LISP programs yield results of a computation by returning a single output value, TABLOG programs can be relations and can produce several results simultaneously through their arguments. TABLOG employs the Manna-Waldinger deductive-tableau proof system as an interpreter in the same way that PROLOG uses a resolution-based proof system. Unification is used by TABLOG to match a call with a line in the program and to bind arguments. The basic rules of deduction used for computing are nonclausal resolution and rewriting by means of equality and equivalence. A pilot interpreter for the language has been implemented. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1012/CS-TR-84-1012.pdf %R CS-TR-84-1014 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A P-complete problem and approximations to it %A Anderson, Richard %A Mayr, Ernst W. %D September 1984 %X The P-complete problem that we will consider is the High Degree Subgraph Problem. This problem is: given a graph G = (V,E) and an integer k, find the maximum induced subgraph of G that has all nodes of degree at least k. After showing that this problem is P-complete, we will discuss two approaches to finding approximate solutions to it in NC. We will give a variant of the problem that is also P-complete that can be approximated to within a factor of c in NC, for any c < 1/2, but cannot be approximated by a factor of better than 1/2 unless P = NC. We will also give an algorithm that finds a subgraph with moderately high minimum degree. This algorithm exhibits an interesting relationship between its performance and the time it takes. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1014/CS-TR-84-1014.pdf %R CS-TR-84-1018 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Classification problem solving %A Clancey, William J. %D July 1984 %X A broad range of heuristic programs--embracing forms of diagnosis, catalog selection, and skeletal planning--accomplish a kind of well-structured problem solving called classification. These programs have a characteristic inference structure that systematically relates data to a pre-enumerated set of solutions by abstraction, heuristic association, and refinement. This level of description specifies the knowledge needed to solve a problem, independent of its representation in a particular computer language. The classification problem-solving model provides a useful framework for recognizing and representing similar problems, for designing representation tools, and for understanding the problem-solving methods used by non-classification programs.
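The three-step inference structure named in this abstract (data abstraction, heuristic association, refinement against a pre-enumerated solution set) is easy to exhibit in miniature. A toy sketch; the rules, thresholds and vocabulary below are invented for illustration and are not drawn from the report.

    # Minimal heuristic-classification skeleton: abstract the raw data,
    # heuristically associate the abstractions with a solution class,
    # then refine within a pre-enumerated set.  All rules are invented.

    def abstract_data(findings):
        """Data abstraction: raw measurements -> qualitative abstractions."""
        abstractions = set()
        if findings["temp_c"] >= 38.0:
            abstractions.add("febrile")
        if findings["wbc"] > 11000:
            abstractions.add("elevated-wbc")
        return abstractions

    HEURISTIC_MATCH = {frozenset(["febrile", "elevated-wbc"]): "infection"}
    REFINEMENTS = {"infection": ["bacterial", "viral"]}   # pre-enumerated

    def classify(findings):
        abstractions = abstract_data(findings)
        solution_class = HEURISTIC_MATCH.get(frozenset(abstractions))
        return solution_class, REFINEMENTS.get(solution_class, [])

    print(classify({"temp_c": 38.6, "wbc": 13500}))
    # -> ('infection', ['bacterial', 'viral'])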
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1018/CS-TR-84-1018.pdf %R CS-TR-84-1023 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A method for managing evidential reasoning in a hierarchical hypothesis space %A Gordon, Jean %A Shortliffe, Edward H. %D September 1984 %X No abstract. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1023/CS-TR-84-1023.pdf %R CS-TR-84-1024 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T How to share memory in a distributed system %A Upfal, Eli %A Wigderson, Avi %D October 1984 %X We study the power of shared-memory in models of parallel computation. We describe a novel distributed data structure that eliminates the need for shared memory without significantly increasing the run time of the parallel computation. More specifically, we show how a complete network of processors can deterministically simulate one PRAM step in $O(\log n\,(\log\log n)^2)$ time, when both models use n processors, and the size of the PRAM's shared memory is polynomial in n. (The best previously known upper bound was the trivial $O(n)$.) We also establish that this upper bound is nearly optimal. We prove that an on-line simulation of T PRAM steps by a complete network of processors requires $\Omega(T \log n/\log\log n)$ time. A simple consequence of the upper bound is that an Ultracomputer (the only currently feasible general-purpose parallel machine) can simulate one step of a PRAM (the most convenient parallel model to program) in $O((\log n \log\log n)^2)$ steps. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1024/CS-TR-84-1024.pdf %R CS-TR-84-1025 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Fast scheduling algorithms on parallel computers %A Helmbold, David %A Mayr, Ernst %D November 1984 %X With the introduction of parallel processing, scheduling problems have generated great interest. Although there are good sequential algorithms for many scheduling problems, there are few fast parallel scheduling algorithms. In this paper we present several good scheduling algorithms that run on EREW PRAMs. For the unit time execution case, we have algorithms that will schedule n jobs with intree or outtree precedence constraints in $O(\log n)$ time. The intree algorithm requires $n^3$ processors, and the outtree algorithm requires $n^4$ processors. Another type of scheduling problem is list scheduling, where a list of n jobs with integer execution times is to be scheduled in list order. We show that the general list scheduling problem on two identical processors is polynomial-time complete, and therefore is not likely to have a fast parallel algorithm. However, when the length of the (binary representation of the) execution times is bounded by $O(\log^c n)$ there is an NC algorithm using $n^4$ processors. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1025/CS-TR-84-1025.pdf %R CS-TR-84-1027 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A torture test for TEX %A Knuth, Donald E. %D November 1984 %X Programs that claim to be implementations of TEX82 are supposed to be able to process the test routine contained in this report, producing the outputs contained in this report.
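A note on the arithmetic behind the Ultracomputer consequence stated in CS-TR-84-1024 above (the composition is our reading, assuming an $O(\log n)$ slowdown for emulating one step of the complete network on an Ultracomputer): $O(\log n) \cdot O(\log n\,(\log\log n)^2) = O(\log^2 n\,(\log\log n)^2) = O((\log n \log\log n)^2)$ steps per PRAM step, which is exactly the bound quoted there.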
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1027/CS-TR-84-1027.pdf %R CS-TR-84-1028 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Parallel graph algorithms %A Hochschild, Peter H. %A Mayr, Ernst W. %A Siegel, Alan R. %D December 1984 %X This paper presents new paradigms to solve efficiently a variety of graph problems on parallel machines. These paradigms make it possible to discover and exploit the "parallelism" inherent in many classical graph problems. We abandon attempts to force sequential algorithms into parallel environments, for such attempts usually result in transforming a good uniprocessor algorithm into a hopelessly greedy parallel algorithm. We show that by employing more local computation and mild redundancy, a variety of problems can be solved in a resource- and time-efficient manner on a variety of architectures. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1028/CS-TR-84-1028.pdf %R CS-TR-84-1032 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Solving the Prisoner's Dilemma %A Genesereth, Michael R. %A Ginsberg, Matthew L. %A Rosenschein, Jeffrey S. %D November 1984 %X A framework is proposed for analyzing various types of rational interaction. We consider a variety of restrictions of participants' moves; each leads to a different characterization of rational behavior. Under an assumption of "common rationality," it is proven that participants will cooperate, rather than defect, in the Prisoner's Dilemma. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1032/CS-TR-84-1032.pdf %R CS-TR-84-1034 %Z Sat, 27 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T BB1: an architecture for blackboard systems that control, explain, and learn about their own behavior %A Hayes-Roth, Barbara %D December 1984 %X BB1 implements a domain-independent "blackboard control architecture" for AI systems that control, explain, and learn about their own problem-solving behavior. A BB1 system comprises: a user-defined domain blackboard, a pre-defined control blackboard, user-defined domain and control knowledge sources, a few generic control knowledge sources, and a pre-defined basic control loop. The architecture's run-time user interface provides capabilities for: displaying the blackboard, knowledge sources, and pending knowledge source actions, recommending an action for execution, explaining a recommendation, accepting a user's override, executing a designated action, and running without user intervention. BB1 supports a variety of control behavior ranging from execution of pre-defined control procedures to dynamic construction and modification of complex control plans during problem solving. It explains problem-solving actions by showing their roles in the underlying control plan. It learns new control heuristics from experience, applies them within the current problem-solving session, and uses them to construct new control plans in subsequent sessions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1034/CS-TR-84-1034.pdf %R CS-TR-84-1003 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Parallelism and greedy algorithms %A Anderson, Richard %A Mayr, Ernst %D April 1984 %X A number of greedy algorithms are examined and are shown to be probably inherently sequential.
Greedy algorithms are presented for finding a maximal path, for finding a maximal set of disjoint paths in a layered dag, and for finding the largest induced subgraph of a graph that has all vertices of degree at least k. It is shown that for all of these algorithms, the problem of determining if a given node is in the solution set of the algorithm is P-complete. This means that it is unlikely that these sequential algorithms can be sped up significantly using parallelism. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/84/1003/CS-TR-84-1003.pdf %R CS-TR-83-962 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Bibliography of Stanford Computer Science reports, 1963-1983 %A Berg, Kathryn A. %D March 1983 %X This report lists, in chronological order, all reports published by the Stanford Computer Science Department since 1963. Each report is identified by a Computer Science number, author's name, title, National Technical Information Service (NTIS) retrieval number (i.e., AD-XXXXXX), date, and number of pages. If an NTIS number is not given, it means that the report is probably not available from NTIS. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/962/CS-TR-83-962.pdf %R CS-TR-83-963 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A hardware semantics based on temporal intervals %A Halpern, Joseph %A Manna, Zohar %A Moszkowski, Ben %D March 1983 %X We present an interval-based temporal logic that permits the rigorous specification of a variety of hardware components and facilitates describing properties such as correctness of implementation. Conceptual levels of circuit operation ranging from detailed quantitative timing and signal propagation up to functional behavior are integrated in a unified way. After giving some motivation for reasoning about hardware, we present the propositional and first-order syntax and semantics of the temporal logic. In addition we illustrate techniques for describing signal transitions as well as for formally specifying and comparing a number of delay models. Throughout the discussion, the formalism provides a means for examining such concepts as device equivalence and internal states. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/963/CS-TR-83-963.pdf %R CS-TR-83-964 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Proving precedence properties: the temporal way %A Manna, Zohar %A Pnueli, Amir %D April 1983 %X This paper explores the three important classes of temporal properties of concurrent programs: invariance, liveness and precedence. It presents the first methodological approach to the precedence properties, while providing a review of the invariance and liveness properties. The approach is based on the 'unless' operator, which is a weak version of the 'until' operator. For each class of properties, we present a single complete proof principle. Finally, we show that the properties of each class are decidable over finite state programs. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/964/CS-TR-83-964.pdf %R CS-TR-83-965 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An approach to type design and text composition in Indian scripts %A Ghosh, Pijush K. %D April 1983 %X The knowledge of letters exerts a dual enchantment. When it uncovers the relationships between a series of arbitrary symbols and the sounds of speech, it fills us with joy.
For others the visible expression of the letters, their graphical forms, their history and their development become fascinating. The advent of digital information technology has opened new vistas in the concept of letter forms. Unfortunately the graphics industry in India has remained almost unaffected by these technological advances, especially in the field of type design and text composition. This report strives to demonstrate how to use various tools and techniques, so that the new technology can cope with the plurality of Indian scripts. To start with, all you need to know is the basic shapes of the letters of the Roman alphabet and the sounds they represent. With this slender thread of knowledge an enjoyable study of letter design and text composition in Indian scripts can begin. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/965/CS-TR-83-965.pdf %R CS-TR-83-966 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A formal approach to lettershape description for type design %A Ghosh, Pijush K. %A Bigelow, Charles A. %D May 1983 %X This report is designed to explore some analytic means of specifying lettershapes. Computer representation and analysis of lettershape have made use of two diametrically different approaches, one representing a shape by its boundary, the other by its skeleton or medial axis. Generally speaking, the boundary representation is conceptually simpler for the designer, but the skeletal representation provides more insight into the "piecedness" of the shape. Donald Knuth's METAFONT is one of the sophisticated lettering design systems which has basically adopted the medial axis approach. Moreover, the METAFONT system has introduced the idea of metafont-description of a letter, i.e., to give a rigorous definition of the shape of a letter in such a way that many styles are obtained from a single definition by changing only a few user-defined parameters. That is why we have considered the METAFONT system as our starting point and have shown how we can arrive at the definition of a formal language for specifying lettershapes. We have also introduced a simple mathematical model for decomposing a letter into its constituent elements. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/966/CS-TR-83-966.pdf %R CS-TR-83-967 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Verification of concurrent programs: a temporal proof system %A Manna, Zohar %A Pnueli, Amir %D June 1983 %X A proof system based on temporal logic is presented for proving properties of concurrent programs based on the shared-variables computation model. The system consists of three parts: the general uninterpreted part, the domain dependent part and the program dependent part. In the general part we give a complete proof system for first-order temporal logic with detailed proofs of useful theorems. This logic enables reasoning about general time sequences. The domain dependent part characterizes the special properties of the domain over which the program operates. The program dependent part introduces program axioms which restrict the time sequences considered to be execution sequences of a given program. The utility of the full system is demonstrated by proving invariance, liveness and precedence properties of several concurrent programs. Derived proof principles for these classes of properties are obtained and lead to a compact representation of proofs.
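For readers tracking the three property classes that recur in these temporal-logic reports, here is one representative formula of each kind in standard notation (our illustration; $C_1$, $C_2$, request and granted are placeholder assertions): invariance (safety), $\Box\,\lnot(C_1 \land C_2)$ -- mutual exclusion holds at all times; liveness (eventuality), $\Box(request \rightarrow \Diamond granted)$ -- every request is eventually granted; precedence, "$p$ unless $q$" -- $p$ continues to hold up to the first occurrence of $q$, the weak variant of 'until' that does not require $q$ ever to occur.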
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/967/CS-TR-83-967.pdf %R CS-TR-83-969 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Reasoning in interval temporal logic %A Moszkowski, Ben %A Manna, Zohar %D July 1983 %X Predicate logic is a powerful and general descriptive formalism with a long history of development. However, since the logic's underlying semantics have no notion of time, statements such as "I increases by 2" cannot be directly expressed. We discuss interval temporal logic (ITL), a formalism that augments standard predicate logic with operators for time-dependent concepts. Our earlier work used ITL to specify and reason about hardware. In this paper we show how ITL can also directly capture various control structures found in conventional programming languages. Constructs are given for treating assignment, iteration, sequential and parallel computations and scoping. The techniques used permit specification and reasoning about such algorithms as concurrent Quicksort. We compare ITL with the logic-based programming languages Lucid and Prolog. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/969/CS-TR-83-969.pdf %R CS-TR-83-971 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Letterform design systems %A Ruggles, Lynn %D April 1983 %X The design of letterforms requires a skilled hand, an eye for fine detail and an understanding of the letterforms themselves. This work has traditionally been done by experienced artisans, but in the last fifteen years there have been attempts to integrate the design process with the use of computers in order to create digital type forms. The use of design systems for the creation of these digital forms has led to an analysis of the way type designs are created by type designers. Their methods have been integrated into a variety of systems for creating digital forms. This paper describes these design systems and discusses the relevant issues for the success of the systems that exist and are used today. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/971/CS-TR-83-971.pdf %R CS-TR-83-972 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Experience with a regular expression compiler %A Karlin, Anna R. %A Trickey, Howard W. %A Ullman, Jeffrey D. %D June 1983 %X The language of regular expressions is a useful one for specifying certain sequential processes at a very high level. They allow easy modification of designs for circuits, like controllers, that are described by patterns of events they must recognize and the responses they must make to those patterns. This paper discusses the compilation of such expressions into reasonably compact layouts. The translation of regular expressions into nondeterministic automata by two different methods is discussed, along with the advantages of each method. A major part of the compilation problem is selection of good state codes for the nondeterministic automata; one successful strategy is explained in the paper. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/972/CS-TR-83-972.pdf %R CS-TR-83-973 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The distributed V kernel and its performance for diskless workstations %A Cheriton, David R. %A Zwaenepoel, Willy %D July 1983 %X The distributed V kernel is a message-oriented kernel that provides uniform local and network interprocess communication.
It is primarily being used in an environment of diskless workstations connected by a high-speed local network to a set of file servers. We describe a performance evaluation of the kernel, with particular emphasis on the cost of network file access. Our results show that over a local network: 1. Diskless workstations can access remote files with minimal performance penalty. 2. The V message facility can be used to access remote files at comparable cost to any well-tuned specialized file access protocol. We conclude that it is feasible to build a distributed system with all network communication using the V message facility even when most of the network nodes have no secondary storage. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/973/CS-TR-83-973.pdf %R CS-TR-83-974 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Chinese Meta-Font %A Hobby, John %A Guoan, Gu %D July 1983 %X METAFONT is Donald E. Knuth's system for alphabet design. The system allows an entire family of fonts or "meta-fonts" to be specified precisely and mathematically so that it can be produced in different sizes and styles for different raster devices. We present a new technique for defining Chinese characters hierarchically with METAFONT. We define METAFONT subroutines for commonly used portions of strokes and then combine some of these into routines for drawing complete strokes. Parameters describe the skeletons of the strokes and the stroke routines are carefully designed to transform themselves appropriately. This allows us to handle all of the basic strokes with only 14 different routines. The stroke routines in turn are used to build up groups of strokes and radicals. Special routines for positioning control points ensure that the strokes will join properly in a variety of different styles. The radical routines are parameterized to allow them to be placed at different locations in the typeface and to allow for adjusting their size and shape. Key points are positioned relative to the bounding box for the radical, and the special positioning routines find other points that must be passed to the stroke routines. We use this method to design high quality Song style characters. Global parameters control the style, and we show how these can be used to create Song and Long Song from the same designs. Other settings can produce other familiar styles or even new styles. We show how it is possible to create completely different styles, such as Bold style, merely by substituting different stroke routines. The global parameters can be used to augment simple scaling by altering stroke width and other details to account for changes in size. We can adjust stroke widths to help even out the overall darkness of the characters. We also show how it is possible to experiment with new ideas such as adjusting character widths individually. While many of our characters are based on existing designs, the stroke routines facilitate the design of new characters without the need to refer to detailed drawings. The skeletal parameters and special positioning routines make it easy to position the strokes properly. In our previous paper, in contrast to this, we parameterized the strokes according to their boundaries and copied an existing design. The previous approach made it very difficult to create different styles with the same METAFONT program.
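The hierarchy this abstract describes (stroke routines parameterized by skeleton points and global style parameters, radicals that place key points relative to a bounding box) can be sketched schematically. The routine names, parameters and the two-stroke "radical" below are invented for illustration and do not correspond to the report's METAFONT code.

    # Toy pipeline in the spirit of the hierarchical scheme: stroke
    # routines take skeleton points plus style parameters; radical
    # routines place key points relative to a bounding box.  All names
    # and parameters are invented for illustration.

    def horizontal_stroke(start, end, width, style):
        # a real routine would emit outline curves; we return a description
        return ("hstroke", start, end, width, style)

    def box_point(bbox, fx, fy):
        """Key point at fractional position (fx, fy) in bbox = (x0, y0, x1, y1)."""
        x0, y0, x1, y1 = bbox
        return (x0 + fx * (x1 - x0), y0 + fy * (y1 - y0))

    def radical_two_bars(bbox, params):
        """A made-up two-stroke radical; global params control style and weight."""
        w, s = params["stroke_width"], params["style"]
        return [
            horizontal_stroke(box_point(bbox, 0.1, 0.7), box_point(bbox, 0.9, 0.7), w, s),
            horizontal_stroke(box_point(bbox, 0.1, 0.3), box_point(bbox, 0.9, 0.3), w, s),
        ]

    # The same radical at a new location/size and in a different style:
    print(radical_two_bars((0, 0, 100, 100), {"stroke_width": 6, "style": "song"}))
    print(radical_two_bars((40, 10, 90, 60), {"stroke_width": 9, "style": "bold"}))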
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/974/CS-TR-83-974.pdf %R CS-TR-83-980 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The WEB system of structured documentation %A Knuth, Donald E. %D September 1983 %X This memo describes how to write programs in the WEB language (Version 2.3, September 1983); and it also includes the full WEB documentation for WEAVE and TANGLE, the programs that read WEB input and produce TEX and PASCAL output, respectively. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/980/CS-TR-83-980.pdf %R CS-TR-83-985 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T First grade TEX: a beginner's TEX manual %A Samuel, Arthur L. %D November 1983 %X This is an introductory ready-reference TEX82 manual for the beginner who would like to do First Grade TEX work. Only the most basic features of the TEX system are discussed in detail. Other features are summarized in an appendix and references are given to the more complete documentation available elsewhere. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/985/CS-TR-83-985.pdf %R CS-TR-83-989 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A programming and problem-solving seminar %A Knuth, Donald E. %A Weening, Joseph S. %D December 1983 %X This report contains edited transcripts of the discussions held in Stanford's course CS 204, Problem Seminar, during autumn quarter 1981. Since the topics span a large range of ideas in computer science, and since most of the important research paradigms and programming paradigms were touched on during the discussions, these notes may be of interest to graduate students of computer science at other universities, as well as to their professors and to professional people in the "real world." The present report is the fourth in a series of such transcripts, continuing the tradition established in CS606 (Michael J. Clancy, 1977), CS707 (Chris Van Wyk, 1979), and CS863 (Allan A. Miller, 1981). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/989/CS-TR-83-989.pdf %R CS-TR-83-990 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A programming and problem-solving seminar %A Hobby, John D. %A Knuth, Donald E. %D December 1983 %X This report contains edited transcripts of the discussions held in Stanford's course CS204, Problem Seminar, during autumn quarter 1982. Since the topics span a large range of ideas in computer science, and since most of the important research paradigms and programming paradigms were touched on during the discussions, these notes may be of interest to graduate students of computer science at other universities, as well as to their professors and to professional people in the "real world." The present report is the fifth in a series of such transcripts, continuing the tradition established in STAN-CS-77-606 (Michael J. Clancy, 1977), STAN-CS-79-707 (Chris Van Wyk, 1979), STAN-CS-81-863 (Allan A. Miller, 1981), STAN-CS-83-989 (Joseph S. Weening, 1983). 
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/990/CS-TR-83-990.pdf %R CS-TR-83-991 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Parallel algorithms for arithmetics, irreducibility and factoring of GFq-polynomials %A Morgensteren, Moshe %A Shamir, Eli %D December 1983 %X A new algorithm for testing irreducibility of polynomials over finite fields without gcd computations makes it possible to devise efficient parallel algorithms for polynomial factorization. We also study the probability that a random polynomial over a finite field has no factors of small degree. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/991/CS-TR-83-991.pdf %R CS-TR-83-992 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The language of an interactive proof checker %A Ketonen, Jussi %A Weening, Joseph S. %D December 1983 %X We describe the underlying language for EKL, an interactive theorem-proving system currently under development at the Stanford Artificial Intelligence Laboratory. Some of the reasons for its development as well as its mathematical properties are discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/992/CS-TR-83-992.pdf %R CS-TR-83-994 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Sorting by recursive partitioning %A Chapiro, Daniel M. %D December 1983 %X We present a new O(n lg lg n) time sort algorithm that is more robust than O(n) distribution sorting algorithms. The algorithm uses a recursive partition-concatenate approach, partitioning each set into a variable number of subsets using information gathered dynamically during execution. Sequences are partitioned using statistical information computed during the sort for each sequence. Space complexity is O(n) and is independent from the order and distribution of the data. If the data is originally in a list, only O($\sqrt{n}$) extra space is necessary. The algorithm is insensitive to the initial ordering of the data, and it is much less sensitive to the distribution of the values of the sorting keys than distribution sorting algorithms. Its worst-case time is O(n lg lg n) across all distributions that satisfy a new "fractalness" criterion. This condition, which is sufficient but not necessary, is satisfied by any set with bounded length keys and bounded repetition of each key. If this condition is not satisfied, its worst case performance degrades gracefully to O(n lg n). In practice, this occurs when the density of the distribution over $\Omega(n)$ of the keys is a fractal curve (for sets of numbers whose values are bounded), or when the distribution has very heavy tails with arbitrarily long keys (for sets of numbers whose precision is bounded). In some preliminary tests, it was faster than Quicksort for sets of more than 150 elements. The algorithm is practical, works basically "in place", can be easily implemented and is particularly well suited both for parallel processing and for external sorting. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/994/CS-TR-83-994.pdf %R CS-TR-83-995 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The advantages of abstract control knowledge in expert system design %A Clancey, William J. %D November 1983 %X A poorly designed knowledge base can be as cryptic as an arbitrary program and just as difficult to maintain.
Representing control knowledge abstractly, separately from domain facts and relations, makes the design more transparent and explainable. A body of abstract control knowledge provides a generic framework for constructing knowledge bases for related problems in other domains and also provides a useful starting point for studying the nature of strategies. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/995/CS-TR-83-995.pdf %R CS-TR-83-996 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Strategic explanations for a diagnostic consultation system %A Hasling, Diane Warner %A Clancey, William J. %A Rennels, Glenn %D November 1983 %X This paper examines the problem of automatic explanation of reasoning, especially as it relates to expert systems. By explanation we mean the ability of a program to discuss what it is doing in some understandable way. We first present a general framework in which to view explanation and review some of the research done in this area. We then focus on the explanation system for NEOMYCIN, a medical consultation program. A consultation program interactively helps a user to solve a problem. Our goal is to have NEOMYCIN explain its problem-solving strategies. An explanation of strategy describes the plan the program is using to reach a solution. Such an explanation is usually concrete, referring to aspects of the current problem situation. Abstract explanations articulate a general principle, which can be applied in different situations; such explanations are useful in teaching and in explaining by analogy. We describe the aspects of NEOMYCIN that make abstract strategic explanations possible--the representation of strategic knowledge explicitly and separately from domain knowledge--and demonstrate how this representation can be used to generate explanations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/996/CS-TR-83-996.pdf %R CS-TR-83-945 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Perseus: retrospective on a portable operating system %A Zwaenepoel, Willy %A Lantz, Keith A. %D February 1983 %X We describe the operating system Perseus, developed as part of a study into the issues of computer communications and their impact on operating system and programming language design. Perseus was designed to be portable by virtue of its kernel-based structure and its implementation in Pascal. In particular, machine-dependent code is limited to the kernel and most operating system functions are provided by server processes, running in user mode. Perseus was designed to evolve into a distributed operating system by virtue of its interprocess communication facilities, based on message-passing. This paper presents an overview of the system and gives an assessment of how far it satisfied its original goals. Specifically, we evaluate its interprocess communication facilities and kernel-based structure, followed by a discussion of portability. We close with a brief history of the project, pointing out major milestones and stumbling blocks along the way.
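The structure attributed to Perseus here -- a small machine-dependent kernel whose job is message passing, with system services as ordinary user-mode server processes -- is the classic pattern; below is a minimal sketch of the request/reply flow, with invented names and Python queues standing in for the kernel primitives.

    # Minimal sketch of kernel-mediated message passing: servers run as
    # ordinary processes and all service requests travel as messages.
    # Queues stand in for the kernel's send/receive/reply primitives;
    # everything here is invented for illustration.
    import threading, queue

    requests = queue.Queue()   # "kernel" channel to the file server

    def file_server():         # a user-mode server process
        while True:
            op, arg, reply = requests.get()
            if op == "shutdown":
                break
            reply.put(f"served {op}({arg})")   # reply message to the client

    def client(path):
        reply = queue.Queue(maxsize=1)
        requests.put(("read", path, reply))    # send
        return reply.get()                     # block on receive (RPC-style)

    server = threading.Thread(target=file_server)
    server.start()
    print(client("/etc/motd"))                 # served read(/etc/motd)
    requests.put(("shutdown", None, None))
    server.join()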
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/83/945/CS-TR-83-945.pdf %R CS-TR-82-998 %Z Mon, 29 May 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Knowledge engineering: a daily activity on a hospital ward %A Mulsant, Benoit %A Servan-Schreiber, David %D September 1983 %X Two common barriers against the development and diffusion of Expert Systems in Medicine are the difficulty of design and the low level of acceptance. This paper reports on an original experiment which suggests potential solutions to these issues: the task of Knowledge Engineering is performed by medical students and residents on a hospital ward using a sophisticated Knowledge Acquisition System, EMYCIN. The Knowledge Engineering sessions are analysed in detail and a structured method is proposed. A transcript of a sample run of the resulting program is presented along with an evaluation of its performance, acceptance, educational potential and amount of endeavour required. The impact of the Knowledge Engineering process itself is then assessed from both the residents' and the medical students' standpoints. Finally, the possibility of generalizing the experiment is examined. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/998/CS-TR-82-998.pdf %R CS-TR-82-892 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An algorithm for reducing acyclic hypergraphs %A Kuper, Gabriel M. %D January 1982 %X This report is a description of an algorithm to compute efficiently the Graham reduction of an acyclic hypergraph with sacred nodes. To apply the algorithm we must already have a tree representation of the hypergraph, and therefore it is useful when we have a fixed hypergraph and wish to compute Graham reductions many times, as we do in the System/U query interpretation algorithm. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/892/CS-TR-82-892.pdf %R CS-TR-82-895 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T GLISP users' manual %A Novak, Gordon S., Jr. %D January 1982 %X GLISP is a high-level, LISP-based language which is compiled into LISP. GLISP provides a powerful abstract datatype facility, allowing description and use of both LISP objects and objects in A.I. representation languages. GLISP language features include PASCAL-like control structures, infix expressions with operators which facilitate list manipulation, and reference to objects in PASCAL-like or English-like syntax. English-like definite reference to features of objects which are in the current computational context is allowed; definite references are understood and compiled relative to a knowledge base of object descriptions. Object-centered programming is supported; GLISP can substantially improve runtime performance of object-centered programs by optimized compilation of references to objects. This manual describes the GLISP language and use of GLISP within INTERLISP. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/895/CS-TR-82-895.pdf %R CS-TR-82-903 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Coloring maps and the Kowalski doctrine %A McCarthy, John %D April 1982 %X It is attractive to regard an algorithm as composed of the logic determining what the results are and the control determining how the result is obtained.
Logic programmers like to regard programming as controlled deduction, and there have been several proposals for controlling the deduction expressed by a Prolog program and not always using Prolog's normal backtracking algorithm. The present note discusses a map coloring program proposed by Pereira and Porto and two coloring algorithms that can be regarded as control applied to its logic. However, the control mechanisms required go far beyond those that have been contemplated in the Prolog literature. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/903/CS-TR-82-903.pdf %R CS-TR-82-908 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Neomycin: reconfiguring a rule-based expert system for application to teaching %A Clancey, William J. %A Letsinger, Reed %D May 1982 %X NEOMYCIN is a medical consultation system in which MYCIN's knowledge base is reorganized and extended for use in GUIDON, a teaching program. The new system constitutes a psychological model for doing diagnosis designed to provide a basis for interpreting student behavior and teaching diagnostic strategy. The model separates out kinds of knowledge that are procedurally embedded in MYCIN's rules and so inaccessible to the teaching program. The key idea is to represent explicitly and separately a domain-independent diagnostic strategy in the form of meta-rules, knowledge about the structure of the problem space, causal and data/hypothesis rules and world facts. As a psychological model, NEOMYCIN captures the forward-directed, "compiled associations" mode of reasoning that characterizes expert behavior. Collection and interpretation of data are focused by the "differential" or working memory of hypotheses. Moreover, the knowledge base is broadened so that GUIDON can teach a student when to consider a specific infectious disease and what competing hypotheses to consider, essentially the knowledge a human would need in order to use the MYCIN consultation system properly. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/908/CS-TR-82-908.pdf %R CS-TR-82-909 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Plan recognition strategies in student modeling: prediction and description %A London, Bob %A Clancey, William J. %D May 1982 %X This paper describes the student modeler of the GUIDON2 tutor, which understands plans by a dual search strategy. It first produces multiple predictions of student behavior by a model-driven simulation of the expert. Focused, data-driven searches then explain incongruities. By supplementing each other, these methods lead to an efficient and robust plan understander for a complex domain. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/909/CS-TR-82-909.pdf %R CS-TR-82-910 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Exploration of Teaching and Problem-Solving Strategies, 1979-1982 %A Clancey, William J. %A Buchanan, Bruce %D May 1982 %X This is the final report for Contract N-00014-79-C-0302, covering the period of 15 March 1979 through 14 March 1982. The goal of the project was to develop methods for representing teaching and problem-solving knowledge in computer-based tutorial systems. One focus of the work was formulation of principles for managing a case method tutorial dialogue; the other major focus was investigation of the use of a production rule representation for the subject material of a tutorial program.
The main theme pursued by this research is that representing teaching and problem-solving knowledge separately and explicitly enhances the ability to build, modify and test complex tutorial programs. Two major computer programs were constructed. One was the tutorial program, GUIDON, which uses a set of explicit "discourse procedures" for carrying on a case method dialogue with a student. GUIDON uses the original MYCIN knowledge base as subject material, and as such, was an experiment in exploring the ways in which production rules can be used in tutoring. GUIDON's teaching knowledge is separate from and compatible with any knowledge base that is encoded in MYCIN's rule language. Demonstrations of GUIDON were given for two medical and one engineering application. Thus, the generality of this kind of system goes beyond being able to teach about any problem in a "case library"--it also allows teaching expertise to be transferred and tested in multiple problem domains. The second major program is the consultation program, NEOMYCIN. This is a second generation system in which MYCIN's knowledge has been reconfigured to make explicit distinctions that are important for teaching. Unlike MYCIN, the program uses the hypothesis-oriented approach and predominantly forward-directed reasoning. As such, NEOMYCIN is consistent with and extends psychological models of diagnostic problem-solving. The program differs from other knowledge-based AI systems in that reasoning is completely controlled by a set of explicit meta-rules. These meta-rules are domain independent and constitute the diagnostic procedure to be taught to students: the tasks of diagnosis and heuristics for attending to and confirming relevant diagnostic hypotheses. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/910/CS-TR-82-910.pdf %R CS-TR-82-911 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Bibliography of Stanford Computer Science reports, 1963-1982 %A Roberts, Barbara J. %A Marashian, Irris %D May 1982 %X This report lists, in chronological order, all reports published by the Stanford Computer Science Department since 1963. Each report is identified by a Computer Science number, author's name, title, National Technical Information Service (NTIS) retrieval number (i.e., AD-XXXXXX), date, and number of pages. If the NTIS number is not given, it means that the report is probably not available from NTIS. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/911/CS-TR-82-911.pdf %R CS-TR-82-912 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The implication and finite implication problems for typed template dependencies %A Vardi, Moshe Y. %D May 1982 %X The class of typed template dependencies is a class of data dependencies that includes embedded multivalued and join dependencies. We show that the implication and the finite implication problems for this class are unsolvable. An immediate corollary is that this class has no formal system for finite implication. We also show how to construct a finite set of typed template dependencies whose implication and finite implication problems are unsolvable. The class of projected join dependencies is a proper subclass of the above class, and it slightly generalizes embedded join dependencies. It is shown that the implication and the finite implication problems for this class are also unsolvable.
An immediate corollary is that this class has no universe-bounded formal system for either implication or finite implication. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/912/CS-TR-82-912.pdf %R CS-TR-82-914 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Using string matching to compress Chinese characters %A Guoan, Gu %A Hobby, John %D May 1982 %X A new method for font compression is introduced and compared to existing methods. A very compact representation is achieved by using a variant of McCreight's string matching algorithm to compress the bounding contour. Results from an actual implementation are given showing the improvement over other methods and how this varies with resolution and character complexity. Compression ratios of up to 150 are achieved for Chinese characters. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/914/CS-TR-82-914.pdf %R CS-TR-82-915 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Verification of concurrent programs: proving eventualities by well-founded ranking %A Manna, Zohar %A Pnueli, Amir %D May 1982 %X In this paper, one of a series on verification of concurrent programs, we present proof methods for establishing eventuality and until properties. The methods are based on well-founded ranking and are applicable to both "just" and "fair" computations. These methods do not assume a decrease of the rank at each computation step. It is sufficient that there exists one process which decreases the rank when activated. Fairness then ensures that the program will eventually attain its goal. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/915/CS-TR-82-915.pdf %R CS-TR-82-922 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An approach to verifying completeness and consistency in a rule-based expert system %A Suwa, Motoi %A Scott, A. Carlisle %A Shortliffe, Edward H. %D June 1982 %X We describe a program for verifying that a set of rules in an expert system comprehensively spans the knowledge of a specialized domain. The program has been devised and tested within the context of the ONCOCIN System, a rule-based consultant for clinical oncology. The stylized format of ONCOCIN's rules has allowed the automatic detection of a number of common errors as the knowledge base has been developed. This capability suggests a general mechanism for correcting many problems with knowledge base completeness and consistency before they can cause performance errors. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/922/CS-TR-82-922.pdf %R CS-TR-82-923 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Explanatory power for medical expert systems: studies in the representation of causal relationships for clinical consultations %A Wallis, Jerold W. %A Shortliffe, Edward H. %D July 1982 %X This paper reports on experiments designed to identify and implement mechanisms for enhancing the explanation capabilities of reasoning programs for medical consultation. The goals of an explanation system are discussed, as is the additional knowledge needed to meet these goals in a medical domain. We have focussed on the generation of explanations that are appropriate for different types of system users. This task requires a knowledge of what is complex and what is important; it is further strengthened by a classification of the associations or causal mechanisms inherent in the inference rules.
A causal representation can also be used to aid in refining a comprehensive knowledge base so that the reasoning and explanations are more adequate. We describe a prototype system which reasons from causal inference rules and generates explanations that are appropriate for the user. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/923/CS-TR-82-923.pdf %R CS-TR-82-926 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Principles of rule-based expert systems %A Buchanan, Bruce G. %A Duda, Richard O. %D August 1982 %X Rule-based expert systems are surveyed. The most important considerations are representation and inference. Rule-based systems make strong assumptions about the representation of knowledge as conditional sentences and about the control of inference in one of three ways. The problem of reasoning with incomplete or inexact information is also discussed, as are several other issues regarding the design of expert systems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/926/CS-TR-82-926.pdf %R CS-TR-82-927 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Combining state machines and regular expressions for automatic synthesis of VLSI circuits %A Ullman, Jeffrey D. %D September 1982 %X We discuss a system for translating regular expressions into logic equations or PLA's, with particular attention to how we can obtain both the benefits of regular expressions and state machines as input languages. An extended example of the method is given, and the results of our approach are compared with hand design; in this example we use less than twice the area of a hand-designed, machine optimized PLA. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/927/CS-TR-82-927.pdf %R CS-TR-82-928 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Automated ambulatory medical record systems in the U.S. %A Kuhn, Ingeborg M. %A Wiederhold, Gio %A Rodnick, Jonathan E. %A Ramsey-Klee, Diane M. %A Benett, Sanford %A Beck, Donald D. %D August 1982 %X This report presents an overview of the developments in Automated Ambulatory Medical Record Systems (AAMRS) from 1975 to the present. A summary of findings from a 1975 state-of-the-art review is presented along with the current findings of a follow-up study of a selected number of the AAMRS operating today. The studies revealed that effective automated medical record systems have been developed for ambulatory care settings and that they are now in the process of being transferred to other sites or users, either privately or as a commercial product. Since 1975 there have been no significant advances in system design. However, progress has been substantial in terms of achieving production goals. Even though a variety of systems are commercially available, there is a continuing need for research and development to improve the effectiveness of the systems in use today. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/928/CS-TR-82-928.pdf %R CS-TR-82-931 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T PUFF: an expert system for interpretation of pulmonary function data %A Aikins, Janice S. %A Kunz, John C. %A Shortliffe, Edward H. %A Fallat, Robert J. %D September 1982 %X The application of Artificial Intelligence techniques to real-world problems has produced promising research results, but seldom has a system become a useful tool in its domain of expertise.
Notable exceptions are the DENDRAL and MOLGEN systems. This paper describes PUFF, a program that interprets lung function test data and has become a working tool in the pulmonary physiology lab of a large hospital. Elements of the problem that paved the way for its success are examined, as are significant limitations of the solution that warrant further study. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/931/CS-TR-82-931.pdf %R CS-TR-82-932 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Expert systems research: modeling the medical decision making process %A Shortliffe, Edward H. %A Fagan, Lawrence M. %D September 1982 %X During the quarter century since the birth of the branch of computer science known as artificial intelligence (AI), much of the research has focused on developing symbolic models of human inference. In the last decade several related AI research themes have come together to form what is now known as "expert systems research." In this paper we review AI and expert systems to acquaint the reader with the field and to suggest ways in which this research will eventually be applied to advanced medical monitoring. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/932/CS-TR-82-932.pdf %R CS-TR-82-933 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An algorithmic method for studying percolation clusters %A Klein, Shmuel T. %A Shamir, Eli %D September 1982 %X In percolation theory one studies configurations, based on some infinite lattice, where the sites of the lattice are randomly made F (full) with probability p or E (empty) with probability 1-p. For p > $p_c$, the set of configurations which contain an infinite cluster (a connectivity component) has probability 1. Using an algorithmic method and a rearrangement lemma for Bernoulli sequences, we compute the boundary-to-body quotient of infinite clusters and prove it has the definite value (1-p)/p with probability 1. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/933/CS-TR-82-933.pdf %R CS-TR-82-947 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Modelling degrees of item interest for a general database query system %A Rowe, Neil C. %D April 1982 %X Many databases support decision-making. Often this means choices between alternatives according to partly subjective or conflicting criteria. Database query languages are generally designed for precise, logical specification of the data of interest, and tend to be awkward in the aforementioned circumstances. Information retrieval research suggests several solutions, but there are obstacles to generalizing these ideas to most databases. To address this problem we propose a methodology for automatically deriving and monitoring "degrees of interest" among alternatives for a user of a database system. This includes (a) a decision theory model of the value of information to the user, and (b) inference mechanisms, based in part on ideas from artificial intelligence, that can tune the model to observed user behavior. This theory has important applications to improving efficiency and cooperativeness of the interface between a decision-maker and a database system. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/947/CS-TR-82-947.pdf %R CS-TR-82-949 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The r-Stirling numbers %A Broder, Andrei Z.
%D December 1982 %X The r-Stirling numbers of the first and second kind count restricted permutations and respectively restricted partitions, the restriction being that the first r elements must be in distinct cycles and respectively distinct subsets. The combinatorial and algebraic properties of these numbers, which in most cases generalize similar properties of the regular Stirling numbers, are explored starting from the above definition. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/949/CS-TR-82-949.pdf %R CS-TR-82-950 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Learning physical description from functional definitions, examples and precedents %A Winston, Patrick H. %A Binford, Thomas O. %A Katz, Boris %A Lowry, Michael %D January 1983 %X It is too hard to tell vision systems what things look like. It is easier to talk about purpose and what things are for. Consequently, we want vision systems to use functional descriptions to identify things, when necessary, and we want them to learn physical descriptions for themselves, when possible. This paper describes a theory that explains how to make such systems work. The theory is a synthesis of two sets of ideas: ideas about learning from precedents and exercises developed at MIT and ideas about physical description developed at Stanford. The strength of the synthesis is illustrated by way of representative experiments. All of these experiments have been performed with an implemented system. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/950/CS-TR-82-950.pdf %R CS-TR-82-951 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Five paradigm shifts in programming language design and their realization in Viron, a dataflow programming environment %A Pratt, Vaughan %D December 1982 %X We describe five paradigm shifts in programming language design, some old and some relatively new, namely Effect to Entity, Serial to Parallel, Partition Types to Predicate Types, Computable to Definable, and Syntactic Consistency to Semantic Consistency. We argue for the adoption of each. We exhibit a programming language, Viron, that capitalizes on these shifts. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/951/CS-TR-82-951.pdf %R CS-TR-82-953 %Z Thu, 01 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Partial bibliography of work on expert systems %A Buchanan, Bruce G. %D December 1982 %X Since 1971 many publications on expert systems have appeared in conference proceedings and the technical literature. Over 200 titles are listed in the bibliography. Many relevant publications are omitted because they overlap publications on the list; others should be called to my attention. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/82/953/CS-TR-82-953.pdf %R CS-TR-81-836 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Verification of concurrent programs, Part I: The temporal framework %A Manna, Zohar %A Pnueli, Amir %D June 1981 %X This is the first in a series of reports describing the application of temporal logic to the specification and verification of concurrent programs. We first introduce temporal logic as a tool for reasoning about sequences of states. Models of concurrent programs based both on transition graphs and on linear-text representations are presented and the notions of concurrent and fair executions are defined.
The general temporal language is then specialized to reason aboaut those execution sequences that are fair computations of a concurrent program. Subsequently, the language is used to describe properties of concurrent programs. The set of interesting properties is classified into invariance (safety), eventuality (liveness), and precedence (until) properties. Among the properties studied are: partial correctness, global invariance, clean behavior, mutual exclusion, absence of deadlock, termination, total correctness, intermittent assertions, accessibility, responsiveness, safe liveness, absence of unsolicited response, fair responsiveness, and precedence. In the following reports of this series, we will use the temporal formalism to develop proof methodologies for proving the properties discussed here. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/836/CS-TR-81-836.pdf %R CS-TR-81-837 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Research on expert systems %A Buchanan, Bruce G. %D March 1981 %X All AI programs are essentially reasoning programs. And, to the extent that they reason well about a problem area, all exhibit some expertise at problem solving. Programs that solve the Tower of Hanoi puzzle, for example, reason about the goal state and the initial state in order to find "expert-level" solutions. Unlike other programs, however, the claims about expert systems are related to questions of usefulness and understandability as well as performance. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/837/CS-TR-81-837.pdf %R CS-TR-81-838 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Dynamic program building %A Brown, Peter %D February 1981 %X This report argues that programs are better regarded as dynamic running objects rather than as static textual ones. The concept of dynamic building, whereby a program is constructed as it runs, is described. The report then describes the Build system, which is an implementation of dynamic building for an interactive algebraic programming language. Dynamic building aids the locating of run-time errors, and is especially valuable in environments where programs are relatively short but run-time errors are frequent and/or costly. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/838/CS-TR-81-838.pdf %R CS-TR-81-839 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Short WAITS %A Samuel, Arthur L. %D February 1981 %X This is an introductory manual describing the SU-AI timesharing system that is available primarily for sponsored research in the Computer Science Department. The present manual is written for the beginner and the user interested primarily in the message handling capability as well as for the experienced computer user and programmer who either is unfamiliar with the SU-AI computer or who uses it infrequently. References are made to the available hard-copy manuals and to the extensive on-line document files where more complete information can be obtained. 
The principal advantages of this system are: 1) The availability of a large repertoire of useful system features; 2) The large memory; 3) The large file storage system; 4) The ease with which one can access other computers via the ARPA net; 5) The file transfer facilities via the EFTP program and the ETHERNET; 6) The XGP and the DOVER printers and the large collections of fonts available for them; and 7) The fast and convenient E editor with its macro facilities. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/839/CS-TR-81-839.pdf %R CS-TR-81-846 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Byzantine Generals strike again %A Dolev, Danny %D March 1981 %X Can unanimity be achieved in an unreliable distributed system? This problem was named "The Byzantine Generals Problem," by Lamport, Pease and Shostak [1980]. The results obtained in the present paper prove that unanimity is achievable in any distributed system if and only if the number of faulty processors in the system is: 1) less than one third of the total number of processors; and 2) less than one half of the connectivity of the system's network. In cases where unanimity is achievable, algorithms to obtain it are given. This result forms a complete characterization of networks in light of the Byzantine Problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/846/CS-TR-81-846.pdf %R CS-TR-81-847 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The optimal locking problem in a directed acyclic graph %A Korth, Henry F. %D March 1981 %X We assume a multiple granularity database locking scheme similar to that of Gray, et al. [197S] in which a rooted directed acyclic graph is used to represent the levels of granularity. We prove that even if it is known in advance exactly what database references the transaction will make, it is NP-complete to find the optimal locking strategy for the transaction. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/847/CS-TR-81-847.pdf %R CS-TR-81-848 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the problem of inputting Chinese characters %A Tang, Chih-sung %D April 1981 %X If Chinese-speaking society is to make the best use of computers, it is important to develop an easy, quick, and convenient way to input Chinese characters together with other conventional characters. Many people have tried to approach this problem by designing special typewriters for Chinese character input, but such methods have serious deficiencies and they do not take advantage of the fact that the input process is just part of a larger system in which a powerful computer lies behind the keyboard. The purpose of this note is to clarify the problem and to illustrate a promising solution based entirely on a standard ASCII keyboard. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/848/CS-TR-81-848.pdf %R CS-TR-81-849 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Experiments on the Knee Criterion in a multiprogrammed computer system %A Nishigaki, Tohru %D March 1981 %X Although the effectiveness of the Knee Criterion as a virtual memory management strategy is widely accepted, it has been impossible to take advantage of it in a practical system, because little information is available about the program behavior of executing jobs. 
A new memory management technique to achieve the Knee Criterion in a multiprogrammed virtual memory system is developed. The technique, termed the Optimum Working-set Estimator (OWE), abstracts the programs' behavior from their past histories by exponential smoothing, and modifies their working set window sizes in order to attain the Knee Criterion. The OWE method was implemented and investigated. Measurements demonstrate its ability to control a variety of jobs. Furthermore, the results also reveal that the throughput improvement is possible in a space-squeezing environment. This technique is expected to increase the efficiency of multiprogrammed virtual memory systems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/849/CS-TR-81-849.pdf %R CS-TR-81-851 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Binding in information processing %A Wiederhold, Gio %D May 1981 %X The concept of binding, as used in programming systems, is analyzed and defined in a number of contexts. The attributes of variables to be bound and the phases of binding are enumerated. The definition is then broadened to cover general issues in information systems. Its applicability is demonstrated in a wide range of system design and implementation issues. A number of Database Management Systems are categorized according to the terms defined. A first-order quantitative model is developed and compared with current practice. The concepts and the model are considered helpful when used as a tool for the global design phase of large information systems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/851/CS-TR-81-851.pdf %R CS-TR-81-854 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the security of public key protocols %A Dolev, Danny %A Yao, Andrew C. %D May 1981 %X Recently, the use of public key encryption to provide secure network communication has received considerable attention. Such public key systems are usually effective against passive eavesdroppers, who merely tap the lines and try to decipher the message. It has been pointed out, however, that an improperly designed protocol could be vulnerable to an active saboteur, one who may impersonate another user or alter the message being transmitted. In this paper we formulate several models in which the security of protocols can be discussed precisely. Algorithms and characterizations that can be used to determine protocol security in these models will be given. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/854/CS-TR-81-854.pdf %R CS-TR-81-863 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A programming and problem-solving seminar %A Knuth, Donald E. %A Miller, Allan A. %D June 1981 %X This report contains a record of the autumn 1980 session of CS 204, a problem-solving and programming seminar taught at Stanford that is primarily intended for first-year Ph.D. students. The seminar covers a large range of topics, research paradigms, and programming paradigms in computer science, so these notes will be of interest to graduate students, professors, and professional computer scientists. 
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/863/CS-TR-81-863.pdf %R CS-TR-81-865 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Toward a unified logical basis for programming languages %A Tang, Chih-sung %D June 1981 %X In recent years, more and more computer scientists have been paying attention to temporal logic, since there are many properties of programs that can be described only by bringing the time parameter into consideration. But existing temporal logic languages, such as Lucid, in spite of their mathematical elegance, are still far from practical. I believe that a practical temporal-logic language, once it came into being, would have a wide spectrum of applications. XYZ/E is a temporal-logic language. Like other logic languages, it is a logic system as well as a programming language. But unlike them, it can express all conventional data structures and control structures, nondeterminate or concurrent programs, even programs with branching-time order. We find that the difficulties met in other logic languages often stem from the fact that they try to deal with these structures at a higher level. XYZ/E adopts another approach. We divide the language into two forms: the internal form and the external form. The former is lower level, while the latter is higher. Just as any logic system contains rules of abbreviation, so also in XYZ/E there are rules of abbreviation to transform the internal form into the external form, and vice versa. These two forms can be considered to be different representations of the same thing. We find that this approach can ameliorate many problems of formalization. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/865/CS-TR-81-865.pdf %R CS-TR-81-867 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T ADAM - an Ada based language for multi-processing %A Luckham, David C. %A Larsen, Howard J. %A Stevenson, David R. %A Henke, Friedrich W. von %D July 1981 %X Adam is an experimental language derived from Ada. It was developed to facilitate study of issues in Ada implementation. The two primary objectives which motivated the development of Adam were: to program supervisory packages for multitask scheduling, and to formulate algorithms for compilation of Ada tasking. Adam is a subset of the sequential program constructs of Ada combined with a set of parallel processing constructs which are lower level than Ada tasking. In addition, Adam places strong restrictions on sharing of global objects between processes. Import declarations and propagate declarations are included. A compiler has been implemented in Maclisp on a DEC PDP-10. It produces assembly code for a PDP-10. It supports separate compilation, generics, exceptions, and parallel processes. Algorithms translating Ada tasking into Adam parallel processing have been developed and implemented. An experimental compiler for most of the final Ada language design, including task types and task rendezvous constructs, based on the Adam compiler, is presently available on PDP-10's. This compiler uses a procedure call implementation of task rendezvous, but will be used to develop and study alternate implementations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/867/CS-TR-81-867.pdf %R CS-TR-81-868 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Last Whole Errata Catalog %A Knuth, Donald E.
%D July 1981 %X This list supplements previous errata published in Stanford reports CS551 (1976) and CS712 (1979). It includes the first corrections and changes to the second edition of volume two (published January, 1981) as well as to the most recent printings of volumes one and three (first published in 1975). In addition to the errors listed here, about half of the occurrences of 'which' in volumes one and three should be changed to 'that'. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/868/CS-TR-81-868.pdf %R CS-TR-81-869 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Computer Science comprehensive examinations, 1978/79-1980/81 %A Tajnai, Carolyn E. %D August 1981 %X The Stanford Computer Science Comprehensive Examination was conceived Spring Quarter 1971/72 and since then has been given winter and spring quarters each year. The 'Comp' serves several purposes in the department. There are no course requirements in the Ph.D. and the Ph.D. Minor programs, and only one (CS293, Computer Laboratory) in the Master's program. Therefore, the 'Comp' fulfills the breadth and depth requirements. The Ph.D. Minor and Master's student must pass at the Master's level to be eligible for the degree. For the Ph.D. student it serves as a "Rite of Passage"; the exam must be passed at the Ph.D. level by the end of six quarters of full-time study (excluding summers) for the student to continue in the program. This report is a collection of comprehensive examinations from Winter Quarter 1978/79 through Spring Quarter 1980/81. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/869/CS-TR-81-869.pdf %R CS-TR-81-871 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Good layouts for pattern recognizers %A Trickey, Howard W. %D August 1981 %X A system to lay out custom circuits that recognize regular languages can be a useful VLSI design automation tool. This paper describes the algorithms used in an implementation of a regular expression compiler. Layouts that use a network of programmable logic arrays (PLA's) have smaller areas than those of some other methods, but there are the problems of partitioning the circuit and then placing the individual PLA's. Regular expressions have a structure which allows a novel solution to these problems: dynamic programming can be used to find layouts which are in some sense optimal. Various search pruning heuristics have been used to increase the speed of the compiler, and the experience with these is reported in the conclusions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/871/CS-TR-81-871.pdf %R CS-TR-81-875 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Computation of matrix chain products: Part I, Part II %A Hu, T. C. %A Shing, M. T. %D September 1981 %X This paper considers the computation of matrix chain products of the form $M_1 \times M_2 \times \cdots \times M_{n-1}$. If the matrices are of different dimensions, the order in which the product is computed affects the number of operations. An optimum order is an order which minimizes the total number of operations. Some theorems about an optimum order of computing the matrices are presented in part I. Based on these theorems, an O(n log n) algorithm for finding an optimum order is presented in part II.
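As a concrete point of reference for the optimum-order problem above, the textbook dynamic program solves it in $O(n^3)$ time; the O(n log n) algorithm of part II is substantially more involved. The following minimal Python sketch implements only that baseline, and its function name and zero-based indexing are illustrative assumptions, not taken from the paper.

    def matrix_chain_cost(dims):
        # Matrix M_i has shape dims[i] x dims[i+1], for i = 0 .. n-1.
        # cost[i][j] = minimum scalar multiplications for M_i x ... x M_j.
        n = len(dims) - 1
        cost = [[0] * n for _ in range(n)]
        for length in range(2, n + 1):          # length of the subchain
            for i in range(n - length + 1):
                j = i + length - 1
                cost[i][j] = min(
                    cost[i][k] + cost[k + 1][j]
                    + dims[i] * dims[k + 1] * dims[j + 1]
                    for k in range(i, j)        # split point of the product
                )
        return cost[0][n - 1]

    # Example: a (10x30)(30x5)(5x60) chain needs 4500 multiplications
    # when parenthesized as (M_0 M_1) M_2.
    assert matrix_chain_cost([10, 30, 5, 60]) == 4500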
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/875/CS-TR-81-875.pdf %R CS-TR-81-876 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On linear area embedding of planar graphs %A Dolev, Danny %A Trickey, Howard W. %D September 1981 %X Planar embedding with minimal area of graphs on an integer grid is one of the major issues in VLSI. Valiant [1981] gave an algorithm to construct a planar embedding for trees in linear area; he also proved that there are planar graphs that require quadratic area. We give an algorithm to embed outerplanar graphs in linear area. We extend this algorithm to work for every planar graph that has the following property: for every vertex there exists a path of length less than K to the exterior face, where K is a constant. Finally, finding a minimal embedding area is shown to be NP-complete for forests, and hence for more general types of graphs. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/876/CS-TR-81-876.pdf %R CS-TR-81-879 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Interlisp-VAX: a report %A Masinter, Larry M. %D August 1981 %X This report documents the results of a study to evaluate the feasibility of implementing the Interlisp language to run on the DEC VAX computer. Specific goals of the study were to: 1) assess the technical status of the on-going implementation project at USC-ISI; 2) estimate the expected performance of Interlisp on the VAX family of machines as compared to Interlisp-10, other Lisp systems for the VAX, and other Interlisp implementations where performance data were available; and 3) identify serious obstacles and alternatives to the timely completion of an effective Interlisp-VAX system. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/879/CS-TR-81-879.pdf %R CS-TR-81-880 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Well structured parallel programs are not easier to schedule %A Mayr, Ernst W. %D September 1981 %X The scheduling problem for unit time task systems with arbitrary precedence constraints is known to be NP-complete. We show that the same is true even if the precedence constraints are restricted to certain subclasses which make the corresponding parallel programs more structured. Among these classes are those derived from hierarchic cobegin-coend programming constructs, level graph forests, and the parallel or serial composition of an out-tree and an in-tree. In each case, the completeness proof depends heavily on the number of processors being part of the problem instances. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/880/CS-TR-81-880.pdf %R CS-TR-81-883 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On program transformations for abstract data types and concurrency %A Pepper, P. %D October 1981 %X We study transformation rules for a particular class of abstract data types, namely types that are representable by recursive mode declarations. The transformations are tailored to the development of efficient tree traversal and they allow for concurrency. The techniques are exemplified by an implementation of concurrent insertion and deletion in 2-3-trees. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/883/CS-TR-81-883.pdf %R CS-TR-81-887 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Finding the convex hull of a simple polygon %A Graham, Ronald L.
%A Yao, Frances %D November 1981 %X It is well known that the convex hull of a set of n points in the (Euclidean) plane can be found by an algorithm having worst-case complexity O(n log n). In this note we give a short linear algorithm for finding the convex hull in the case that the (ordered) set of points form the vertices of a simple (i.e., non-self-intersecting) polygon. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/887/CS-TR-81-887.pdf %R CS-TR-81-889 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T AL users' manual %A Mujtaba, Shahid %A Goldman, Ron %D December 1981 %X AL is a high-level programming language for manipulator control useful in industrial assembly research. This document describes the current state of the AL system now in operation at the Stanford Artificial Intelligence Laboratory, and teaches the reader how to use it. The system consists of the AL compiler and runtime system and the source code interpreter, POINTY, which facilitates specifying representation of parts, and interactive execution of AL statements. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/889/CS-TR-81-889.pdf %R CS-TR-81-894 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Methodology for building an intelligent tutoring system %A Clancey, William J. %D October 1981 %X Over the past 6 years we have been developing a computer program to teach medical diagnosis. Our research synthesizes and extends results in artificial intelligence (AI), medicine, and cognitive psychology. This paper describes the progression of the research, and explains how theories from these fields are combined in a computational model. The general problem has been to develop an "intelligent tutoring system" by adapting the MYCIN "expert system." This conversion requires a deeper understanding of the nature of expertise and explanation than originally required for developing MYCIN, and a concomitant shift in perspective from simple performance goals to attaining psychological validity in the program's reasoning process. Others have written extensively about the relation of artificial intelligence to cognitive science (e.g., [Pylyshyn, 1978] [Boden, 1977]). Our purpose here is not to repeat those arguments, but to present a case study which will provide a common point for further discussion. To this end, to help evaluate the state of cognitive science, we will outline our methodology and survey what resources and viewpoints have helped our research. We will also discuss pitfalls that other AI-oriented cognitive scientists may encounter. Finally, we will present some questions coming out of our work which might suggest possible collaboration with other fields of research. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/894/CS-TR-81-894.pdf %R CS-TR-81-896 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The epistemology of a rule-based expert system: a framework for explanation %A Clancey, William J. %D November 1981 %X Production rules are a popular representation for encoding heuristic knowledge in programs for scientific and medical problem solving. However, experience with one of these programs, MYCIN, indicates that the representation has serious limitations: people other than the original rule authors find it difficult to modify the rule set, and the rules are unsuitable for use in other settings, such as for application to teaching.
These problems are rooted in fundamental limitations in MYCIN's original rule representation: the view that expert knowledge can be encoded as a uniform, weakly-structured set of if/then associations is found to be wanting. To illustrate these problems, this paper examines MYCIN's rules from the perspective of a teacher trying to justify them and to convey a problem-solving approach. We discover that individual rules play different roles, have different kinds of justifications, and are constructed using different rationales for the ordering and choice of premise clauses. This design knowledge, consisting of structural and strategic concepts which lie outside the representation, is shown to be procedurally embedded in the rules. Moreover, because the data/hypothesis associations are themselves a proceduralized form of underlying disease models, they can only be supported by appealing to this deeper level of knowledge. Making explicit this structural, strategic and support knowledge enhances the ability to understand and modify the system. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/896/CS-TR-81-896.pdf %R CS-TR-81-898 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Separability as a physical database design methodology %A Whang, Kyu-Young %A Wiederhold, Gio %A Sagalowicz, Daniel %D October 1981 %X A theoretical approach to the optimal design of large multifile physical databases is presented. The design algorithm is based on the theory that, given a set of join methods that satisfy a certain property called "separability," the problem of optimal assignment of access structures to the whole database can be reduced to the subproblem of optimizing individual relations independently of one another. Coupling factors are defined to represent all the interactions among the relations. This approach not only reduces the complexity of the problem significantly, but also provides a better understanding of underlying mechanisms. A closed noniterative formula is introduced for estimating the number of block accesses in a database organization, and the error analyzed. This formula, an approximation of Yao's exact formula, has a maximum error of 3.7%, and significantly reduces the computation time by eliminating the iterative loop. It also achieves a much higher accuracy than an approximation proposed by Cardenas. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/81/898/CS-TR-81-898.pdf %R CS-TR-80-779 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Problematic features of programming languages: a situational-calculus approach %A Manna, Zohar %A Waldinger, Richard J. %D September 1980 %X Certain features of programming languages, such as data structure operations and procedure call mechanisms, have been found to resist formalization by classical techniques. An alternate approach is presented, based on a "situational calculus," which makes explicit reference to the states of a computation. For each state, a distinction is drawn between an expression, its value, and the location of the value. Within this conceptual framework, the features of a programming language can be described axiomatically. Programs in the language can then be synthesized, executed, verified, or transformed by performing deductions in this axiomatic system. Properties of entire classes of programs, and of programming languages, can also be expressed and proved in this way. The approach is amenable to machine implementation.
In a situational-calculus formalism it is possible to model precisely many "problematic" features of programming languages, including operations on such data structures as arrays, pointers, lists, and records, and such procedure call mechanisms as call-by-reference, call-by-value, and call-by-name. No particular obstacle is presented by aliasing between variables, by declarations, or by recursive procedures. The paper is divided into three parts, focusing respectively on the assignment statement, on data structure operations, and on procedure call mechanisms. In this first part, we introduce the conceptual framework to be applied throughout and present the axiomatic definition of the assignment statement. If suitable restrictions on the programming language are imposed, the well-known Hoare assignment axiom can then be proved as a theorem. However, our definition can also describe the assignment statement of unrestricted programming languages, for which the Hoare axiom does not hold. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/779/CS-TR-80-779.pdf %R CS-TR-80-780 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Computer Modern family of typefaces %A Knuth, Donald E. %D January 1980 %X This report gives machine-independent definitions of all the styles of type planned for use in future editions of "The Art of Computer Programming." Its main purpose is to provide a detailed example of a complete family of font definitions using METAFONT, so that people who want new symbols for their own books and papers will understand how to incorporate them easily. The fonts are intended to have the same spirit as those used in earlier editions of "The Art of Computer Programming," but each character has been redesigned and defined in the METAFONT idiom. It is hoped that some readers will be inspired to make similar definitions of other important families of fonts. The bulk of this report consists of about 400 short METAFONT programs for the various symbols needed, and as such it is pretty boring, but there are some nice illustrations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/780/CS-TR-80-780.pdf %R CS-TR-80-785 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Equations and rewrite rules: a survey %A Huet, Gerard %A Oppen, Derek C. %D January 1980 %X Equations occur frequently in mathematics, logic and computer science. In this paper, we survey the main results concerning equations, and the methods available for reasoning about them and computing with them. The survey is self-contained and unified, using traditional abstract algebra. Reasoning about equations may involve deciding if an equation follows from a given set of equations (axioms), or if an equation is true in a given theory. When used in this manner, equations state properties that hold between objects. Equations may also be used as definitions; this use is well known in computer science: programs written in applicative languages, abstract interpreter definitions, and algebraic data type definitions are clearly of this nature. When these equations are regarded as oriented "rewrite rules," we may actually use them to compute. In addition to covering these topics, we discuss the problem of "solving" equations (the "unification" problem), the problem of proving termination of sets of rewrite rules, and the decidability and complexity of word problems and of combinations of equational theories.
We restrict ourselves to first-order equations, and do not treat equations which define non-terminating computations or recent work on rewrite rules applied to equational congruence classes. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/785/CS-TR-80-785.pdf %R CS-TR-80-786 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Algorithms in modern mathematics and computer science %A Knuth, Donald E. %D January 1980 %X The life and work of the ninth century scientist al-Khwarizmi, "the father of algebra and algorithms," is surveyed briefly. Then a random sampling technique is used in an attempt to better understand the kinds of thinking that good mathematicians and computer scientists do and to analyze whether such thinking is significantly "algorithmic" in nature. (This is the text of a talk given at the opening session of a symposium on "Algorithms in Modern Mathematics and Computer Science" held in Urgench, Khorezm Oblast', Uzbek S.S.R., during the week of September 16-22, 1979.) %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/786/CS-TR-80-786.pdf %R CS-TR-80-788 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Circumscription - a form of non-monotonic reasoning %A McCarthy, John %D February 1980 %X Humans and intelligent computer programs must often jump to the conclusion that the objects they can determine to have certain properties or relations are the only objects that do. Circumscription formalizes such conjectural reasoning. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/788/CS-TR-80-788.pdf %R CS-TR-80-789 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T ADA exceptions: specification and proof techniques %A Luckham, David C. %A Polak, Wolfgang %D February 1980 %X A method of documenting exception propagation and handling in Ada programs is proposed. Exception propagation declarations are introduced as a new component of Ada specifications. This permits documentation of those exceptions that can be propagated by a subprogram. Exception handlers are documented by entry assertions. Axioms and proof rules for Ada exceptions are given. These rules are simple extensions of previous rules for Pascal and define an axiomatic semantics of Ada exceptions. As a result, Ada programs specified according to the method can be analysed by formal proof techniques for consistency with their specifications, even if they employ exception propagation and handling to achieve required results (i.e. non-error situations). Example verifications are given. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/789/CS-TR-80-789.pdf %R CS-TR-80-790 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Databases in healthcare %A Wiederhold, Gio %D March 1980 %X This report defines database design and implementation technology as applicable to healthcare. The relationship of technology to various healthcare settings is explored, and the effect on healthcare costs, quality, and access is evaluated. A summary of relevant development directions is included. Detailed examples of 5 typical clinical applications (public health, clinical trials, clinical research, ambulatory care, and hospitals) are appended. There is an extended bibliography.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/790/CS-TR-80-790.pdf %R CS-TR-80-792 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T MAINSAIL implementation overview %A Wilcox, Clark R. %A Dageforde, Mary L. %A Jirak, Gregory A. %D March 1980 %X The MAINSAIL programming language and the supporting implementations have been developed over the past five years as an integrated approach to a viable machine-independent system suitable for the development of large, portable programs. Particular emphasis has been placed on minimizing the effort involved in moving the system to a new machine and/or operating system. For this reason, almost all of the compiler and runtime support is written in MAINSAIL, and is utilized in each implementation without alteration. This use of a high-level language to support its own implementation has proved to be a significant advantage in terms of documentation and maintenance, without unduly affecting the execution speed. This paper gives an overview of the compiler and runtime implementation strategies, and indicates what an implementation requires for the machine-dependent and operating-system-dependent parts. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/792/CS-TR-80-792.pdf %R CS-TR-80-794 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Recent developments in the complexity of combinatorial algorithms %A Tarjan, Robert Endre %D June 1980 %X The last three years have witnessed several major advances in the area of combinatorial algorithms. These include improved algorithms for matrix multiplication and maximum network flow, a polynomial-time algorithm for linear programming, and steps toward a polynomial-time algorithm for graph isomorphism. This paper surveys these results and suggests directions for future research. Included is a discussion of recent work by the author and his students on dynamic dictionaries, network flow problems, and related questions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/794/CS-TR-80-794.pdf %R CS-TR-80-796 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Essential E %A Samuel, Arthur L. %D March 1980 %X This is an introductory manual describing the display-oriented text editor E that is available on the Stanford A.I. Laboratory PDP-10 computer. The present manual is intended to be used as an aid for the beginner as well as for experienced computer users who either are unfamiliar with the E editor or use it infrequently. Reference is made to the two on-line manuals that help the beginner to get started and that provide a complete description of the editor for the experienced user. E is commonly used for writing computer programs and for preparing reports and memoranda. It is not a document editor, although it does provide some facilities for getting a document into a pleasing format. The primary emphasis is that of speed, both in terms of the number of key strokes required of the user and in terms of the demands made on the computer system. At the same time, E is easy to learn and it offers a large range of facilities that are not available on many editors. 
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/796/CS-TR-80-796.pdf %R CS-TR-80-797 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Read-only transactions in a distributed database %A Garcia-Molina, Hector %A Wiederhold, Gio %D April 1980 %X A read-only transaction or query is a transaction which does not modify any data. Read-only transactions could be processed with general transaction processing algorithms, but in many cases it is more efficient to process read-only transactions with special algorithms which take advantage of the knowledge that the transaction only reads. This paper defines the various consistency and currency requirements that read-only transactions may have. The processing of the different classes of read-only transactions in a distributed database is discussed. The concept of R insularity is introduced to characterize both the read-only and update algorithms. Several simple update and read-only transaction processing algorithms are presented to illustrate how the query requirements and the update algorithms affect the read-only transaction processing algorithms. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/797/CS-TR-80-797.pdf %R CS-TR-80-799 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Multidimensional additive spline approximation %A Friedman, Jerome H. %A Grosse, Eric %A Stuetzle, Werner %D May 1980 %X We describe an adaptive procedure that approximates a function of many variables by a sum of (univariate) spline functions $s_m$ of selected linear combinations $a_m \cdot x$ of the coordinates: $\theta(x) = \sum_{1\le m\le M} s_m(a_m \cdot x)$. The procedure is nonlinear in that not only the spline coefficients but also the linear combinations are optimized for the particular problem. The sample need not lie on a regular grid, and the approximation is affine invariant, smooth, and lends itself to graphical interpretation. Function values, derivatives, and integrals are cheap to evaluate. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/799/CS-TR-80-799.pdf %R CS-TR-80-807 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Path-regular graphs %A Matula, David W. %A Dolev, Danny %D June 1980 %X A graph is vertex-[edge-]path-regular if a list of shortest paths, allowing multiple copies of paths, exists where every pair of vertices are the endvertices of the same number of paths and each vertex [edge] occurs in the same number of paths of the list. The dependencies and independencies between the various path-regularity, regularity of degree, and symmetry properties are investigated. We show that every connected vertex-[edge-]symmetric graph is vertex-[edge-]path-regular, but not conversely. We show that the product of any two vertex-path-regular graphs is vertex-path-regular but not conversely, and the iterated product $G \times G \times \cdots \times G$ is edge-path-regular if and only if G is edge-path-regular. An interpretation of path-regular graphs is given regarding the efficient design of concurrent communication networks. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/807/CS-TR-80-807.pdf %R CS-TR-80-808 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Final report: Basic Research in Artificial Intelligence and Foundations of Programming %A McCarthy, John %A Binford, Thomas O. %A Luckham, David C. %A Manna, Zohar %A Weyhrauch, Richard W.
%A Earnest, Les %D May 1980 %X Recent research results are reviewed in the areas of formal reasoning, mathematical theory of computation, program verification, and image understanding. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/808/CS-TR-80-808.pdf %R CS-TR-80-811 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An extended semantic definition of Pascal for proving the absence of common runtime errors %A German, Steven M. %D June 1980 %X We present an axiomatic definition of Pascal which is the logical basis of the Runcheck system, a working verifier for proving the absence of runtime errors such as arithmetic overflow, array subscripting out of range, and accessing an uninitialized variable. Such errors cannot be detected at compile time by most compilers. Because the occurrence of a runtime error may depend on the values of data supplied to a program, techniques for assuring the absence of errors must be based on program specifications. Runcheck accepts Pascal programs documented with assertions, and proves that the specifications are consistent with the program and that no runtime errors can occur. Our axiomatic definition is similar to Hoare's axiom system, but it takes into account certain restrictions that have not been considered in previous definitions. For instance, our definition accurately models uninitialized variables, and requires a variable to have a well defined value before it can be accessed. The logical problems of introducing the concept of uninitialized variables are discussed. Our definition of expression evaluation deals more fully with function calls than previous axiomatic definitions. Some generalizations of our semantics are presented, including a new method for verifying programs with procedure and function parameters. Our semantics can be easily adapted to similar languages, such as ADA. One of the main potential problems for the user of a verifier is the need to write detailed, repetitious assertions. We develop some simple logical properties of our definition which are exploited by Runcheck to reduce the need for such detailed assertions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/811/CS-TR-80-811.pdf %R CS-TR-80-821 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Semiantichains and unichain coverings in direct products of partial orders %A West, Douglas B. %A Tovey, Craig A. %D September 1980 %X We conjecture a generalization of Dilworth's theorem to direct products of partial orders. In particular, we conjecture that the largest "semiantichain" and the smallest "unichain covering" have the same size. We consider a special class of semiantichains and unichain coverings and determine when equality holds for them. This conjecture implies the existence of k-saturated partitions. A stronger conjecture, for which we also prove a special case, implies the Greene-Kleitman result on simultaneous k and (k + 1)-saturated partitions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/821/CS-TR-80-821.pdf %R CS-TR-80-824 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T LCCD, a language for Chinese character design %A Mei, Tung Yun %D October 1980 %X LCCD is a computer system able to produce aesthetically pleasing Chinese characters for use on raster-oriented printing devices.
It is analogous to METAFONT, in that the user writes a little program that explains how to draw each character; but it uses different types of simulated 'pens' that are more appropriate to the Chinese idiom, and it includes special scaling features so that a complex character can easily be built up from simpler ones, in an interactive manner. This report contains a user's manual for LCCD, together with many illustrative examples and a discussion of the algorithms used. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/824/CS-TR-80-824.pdf %R CS-TR-80-826 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A database approach to communication in VLSI design %A Wiederhold, Gio %A Beetem, Anne %A Short, Garrett %D October 1980 %X This paper describes recent and planned work at Stanford in applying database technology to the problems of VLSI design. In particular, it addresses the issue of communication within a design's different representations and hierarchical levels in a multiple designer environment. We demonstrate the heretofore questioned utility of using commercial database systems, at least while developing a versatile, flexible, and generally efficient model and its associated communication paths. Completed work and results from initial work using DEC DBMS-20 are presented, including macro expansion within the database, and signalling of changes to higher structural levels. Considerable discussion regarding overall philosophy for continued work is also included. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/826/CS-TR-80-826.pdf %R CS-TR-80-827 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the parallel computation for the knapsack problem %A Yao, Andrew Chi-Chih %D November 1980 %X We are interested in the complexity of solving the knapsack problem with n input real numbers on a parallel computer with real arithmetic and branching operations. A processor-time tradeoff constraint is derived; in particular, it is shown that an exponential number of processors have to be used if the problem is to be solved in time $t \le \sqrt{n}/2$. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/827/CS-TR-80-827.pdf %R CS-TR-80-829 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The dinner table problem %A Aspvall, Bengt %A Liang, Frank M. %D December 1980 %X This report contains two papers inspired by the "dinner table problem": If n people are seated randomly around a circular table for two meals, what is the probability that no two people sit together at both meals? We show that this probability approaches $e^{-2}$ as $n \rightarrow \infty$, and also give a closed form. We then observe that in many similar problems on permutations with restricted position, the number of permutations satisfying a given number of properties is approximately Poisson distributed. We generalize our asymptotic argument to prove such a limit theorem, and mention applications to the problems of derangements, menages, and the asymptotic number of Latin rectangles. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/829/CS-TR-80-829.pdf %R CS-TR-80-830 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Two linear-time algorithms for five-coloring a planar graph %A Matula, David W. %A Shiloach, Yossi %A Tarjan, Robert E.
%D November 1980 %X A "sequential processing" algorithm using bicolor interchange that five-colors an n vertex planar graph in $O(n^2)$ time was given by Matula, Marble, and Isaacson [1972]. Lipton and Miller used a "batch processing" algorithm with bicolor interchange for the same problem and achieved an improved O(n log n) time bound [1978]. In this paper we use graph contraction arguments instead of bicolor interchange and improve both the sequential processing and batch processing methods to obtain five-coloring algorithms that operate in O(n) time. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/830/CS-TR-80-830.pdf %R CS-TR-80-832 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Scheduling wide graphs %A Dolev, Danny %D December 1980 %X The problem of scheduling a partially ordered set of unit length tasks on m identical processors is known to be NP-complete. There are efficient algorithms for only a few special cases of this problem. In this paper we explore the relations between the structure of the precedence graph (the partial order) and optimal schedules. We prove that in finding an optimal schedule for certain systems it suffices to consider at each step high roots which belong to at most the m-1 highest components of the precedence graph. This result reduces the number of cases we have to check during the construction of an optimal schedule. Our method may lead to the development of linear scheduling algorithms for many practical cases and to better bounds for complex algorithms. In particular, in the case the precedence graph contains only inforest and outforest components, this result leads to efficient algorithms for obtaining an optimal schedule on two or three processors. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/832/CS-TR-80-832.pdf %R CS-TR-80-850 %Z Thu, 08 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Performing remote operations efficiently on a local computer network %A Spector, Alfred Z. %D December 1980 %X This paper presents a communication model for local networks, whereby processes execute generalized remote references that cause operations to be performed by remote processes. This remote reference/remote operation model provides a taxonomy of primitives that (1) are naturally useful in many applications and (2) can be efficiently implemented. The motivation for this work is our desire to develop systems architectures for local network based multiprocessors that support distributed applications requiring frequent interprocessor communication. After a section containing a brief overview, Section 2 of this paper discusses the remote reference/remote operation model. In it, we derive a set of remote reference types that can be supported by a communication system carefully integrated with the local network interface. The third section exemplifies a communication system that provides one remote reference type. These references (i.e., remote load, store, compare-and-swap, enqueue, and dequeue) take about 150 microseconds, or 50 average instruction times, to perform on Xerox Alto computers connected by a 2.94 megabit Ethernet. The last section summarizes this work and proposes a complete implementation resulting in a highly efficient communication system.
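To make the remote reference taxonomy just described concrete, here is a minimal Python sketch of the server side of such a scheme, dispatching the five reference types named in the abstract (load, store, compare-and-swap, enqueue, dequeue). The class and method names are hypothetical; the implementation described in the paper is integrated with the Alto's Ethernet interface rather than structured this way.

    from collections import deque

    class RemoteOperationServer:
        # Hypothetical stand-in for a process that executes remote
        # references on behalf of processes elsewhere on the network.
        def __init__(self):
            self.cells = {}    # addressable memory words
            self.queues = {}   # addressable shared queues

        def perform(self, op, addr, *args):
            # Each request names an operation and a target address;
            # the return value models the reply sent to the requester.
            if op == "load":
                return self.cells.get(addr)
            if op == "store":
                self.cells[addr] = args[0]
                return None
            if op == "compare_and_swap":
                old, new = args
                current = self.cells.get(addr)
                if current == old:
                    self.cells[addr] = new
                return current
            if op == "enqueue":
                self.queues.setdefault(addr, deque()).append(args[0])
                return None
            if op == "dequeue":
                q = self.queues.get(addr)
                return q.popleft() if q else None
            raise ValueError("unknown remote operation: %s" % op)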
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/850/CS-TR-80-850.pdf %R CS-TR-80-768 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Causal nets or what is a deterministic computation %A Gacs, Peter %A Levin, Leonid A. %D October 1980 %X We introduce the concept of causal nets - it can be considered as the most general and elementary concept of the history of a deterministic computation (sequential or parallel). Causality and locality are distinguished as the only important properties of nets representing such records. Different types of complexities of computations correspond to different geometrical characteristics of the corresponding causal nets - which have the advantage of being finite objects. Synchrony becomes a relative notion. Nets can have symmetries; therefore it will make sense to ask what can be computed from arbitrary symmetric inputs. Here, we obtain a complete group-theoretical characterization of the kind of symmetries that can be allowed in parallel computations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/80/768/CS-TR-80-768.pdf %R CS-TR-78-709 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Design and analysis of a data structure for representing sorted lists %A Brown, Mark R. %A Tarjan, Robert E. %D December 1978 %X In this paper we explore the use of 2-3 trees to represent sorted lists. We analyze the worst-case cost of sequences of insertions and deletions in 2-3 trees under each of the following three assumptions: (i) only insertions are performed; (ii) only deletions are performed; (iii) deletions occur only at the small end of the list and insertions occur only away from the small end. Our analysis leads to a data structure for representing sorted lists when the access pattern exhibits a (perhaps time-varying) locality of reference. This structure has many of the properties of the representation proposed by Guibas, McCreight, Plass, and Roberts [1977], but it is substantially simpler and may be practical for lists of moderate size. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/709/CS-TR-78-709.pdf %R CS-TR-78-649 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T DENDRAL and Meta-DENDRAL: their applications dimension %A Buchanan, Bruce G. %A Feigenbaum, Edward A. %D February 1978 %X The DENDRAL and Meta-DENDRAL programs assist chemists with data interpretation problems. The design of each program is described in the context of the chemical inference problems the program solves. Some chemical results produced by the programs are mentioned. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/649/CS-TR-78-649.pdf %R CS-TR-78-651 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Proving termination and multiset orderings %A Dershowitz, Nachum %A Manna, Zohar %D March 1978 %X A common tool for proving the termination of programs is the well-founded set, a set ordered in such a way as to admit no infinite descending sequences. The basic approach is to find a termination function that maps the elements of the program into some well-founded set, such that the value of the termination function is continually reduced throughout the computation. All too often, the termination functions required are difficult to find and are of a complexity out of proportion to the program under consideration.
However, by providing more sophisticated well-founded sets, the corresponding termination functions can be simplified. Given a well-founded set S, we consider multisets over S, "sets" that admit multiple occurrences of elements taken from S. We define an ordering on all finite multisets over S that is induced by the given ordering on S. This multiset ordering is shown to be well-founded. The value of the multiset ordering is that it permits the use of relatively simple and intuitive termination functions in otherwise difficult termination proofs. In particular, we apply the multiset ordering to provide simple proofs of the termination of production systems, programs defined in terms of sets of rewriting rules. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/651/CS-TR-78-651.pdf %R CS-TR-78-652 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Simplification by cooperating decision procedures %A Nelson, Charles Gregory %A Oppen, Derek C. %D April 1978 %X We describe a simplifier for use in program manipulation and verification. The simplifier finds a normal form for any expression over the language consisting of individual variables, the usual boolean connectives, equality, the conditional function cond (denoting if-then-else), the numerals, the arithmetic functions and predicates +, - and $\leq$, the LISP constants, functions and predicates nil, car, cdr, cons and atom, the functions store and select for storing into and selecting from arrays, and uninterpreted function symbols. Individual variables range over the union of the reals, the set of arrays, LISP list structure and the booleans true and false. The simplifier is complete; that is, it simplifies every valid formula to true. Thus it is also a decision procedure for the quantifier-free theory of reals, arrays and list structure under the above functions and predicates. The organization of the simplifier is based on a method for combining decision procedures for several theories into a single decision procedure for a theory combining the original theories. More precisely, given a set S of functions and predicates over a fixed domain, a satisfiability program for S is a program which determines the satisfiability of conjunctions of literals (signed atomic formulas) whose predicate and function symbols are in S. We give a general procedure for combining satisfiability programs for sets S and T into a single satisfiability program for S $\cup$ T, given certain conditions on S and T. The simplifier described in this paper is currently used in the Stanford Pascal Verifier. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/652/CS-TR-78-652.pdf %R CS-TR-78-653 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Multi-terminal 0-1 flow %A Shiloach, Yossi %D April 1978 %X Given an undirected 0-1 flow network with n vertices and m edges, we present an O($n^2$(m+n)) algorithm which generates all ($n\choose 2$) maximal flows between all the pairs of vertices. Since O($n^2$(m+n)) is also the size of the output, this algorithm is optimal up to a constant factor. 
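For orientation, the maximum flow between a single pair of vertices in an undirected 0-1 network can be computed by repeated breadth-first augmentation, as in the Python sketch below. This naive per-pair routine is only a building block, not the paper's O($n^2$(m+n)) all-pairs algorithm; the function name and graph representation are assumptions made for the example.

    from collections import deque

    def unit_max_flow(adj, s, t):
        # adj: {vertex: list of neighbors}, symmetric (undirected graph).
        # Every undirected edge has capacity 1; residual capacity is
        # tracked separately for each direction of each edge.
        cap = {(u, v): 1 for u in adj for v in adj[u]}
        flow = 0
        while True:
            parent = {s: None}
            queue = deque([s])
            while queue and t not in parent:    # BFS for an augmenting path
                u = queue.popleft()
                for v in adj[u]:
                    if v not in parent and cap[(u, v)] > 0:
                        parent[v] = u
                        queue.append(v)
            if t not in parent:
                return flow                     # no augmenting path remains
            v = t
            while parent[v] is not None:        # push one unit along the path
                u = parent[v]
                cap[(u, v)] -= 1
                cap[(v, u)] += 1
                v = u
            flow += 1

    # Example: in a 4-cycle a-b-c-d-a there are two edge-disjoint
    # paths from a to c, so the maximum 0-1 flow is 2.
    ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["a", "c"]}
    assert unit_max_flow(ring, "a", "c") == 2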
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/653/CS-TR-78-653.pdf %R CS-TR-78-654 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The two paths problem is polynomial %A Shiloach, Yossi %D April 1978 %X Given an undirected graph G = (V,E) and vertices $s_1$,$t_1$;$s_2$,$t_2$, the problem is to determine whether or not G admits two vertex disjoint paths $P_1$ and $P_2$, connecting $s_1$ with $t_1$ and $s_2$ with $t_2$ respectively. This problem is solved by an O($n\cdot m$) algorithm (n = |V|, m = |E|). An important by-product of the paper is a theorem that states that if G is 4-connected and non-planar, then such paths $P_1$ and $P_2$ exist for any choice of $s_1$, $s_2$, $t_1$, and $t_2$ (as was conjectured by Watkins [1968]). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/654/CS-TR-78-654.pdf %R CS-TR-78-655 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On accuracy and unconditional stability of linear multistep methods for second order differential equations %A Dahlquist, Germund %D April 1978 %X Linear multistep methods for the solution of the equation y" = f(t,y) are studied by means of the test equation y" = -$\omega^2$y, with $\omega$ real. It is shown that the order of accuracy cannot exceed 2 for an unconditionally stable method. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/655/CS-TR-78-655.pdf %R CS-TR-78-657 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the model theory of knowledge %A McCarthy, John %A Sato, Masahiko %A Hayashi, Takeshi %A Igarashi, Shigeru %D April 1978 %X Another language for expressing "knowing that" is given together with axioms and rules of inference and a Kripke type semantics. The formalism is extended to time-dependent knowledge. Completeness and decidability theorems are given. The problem of the wise men with spots on their foreheads and the problem of the unfaithful wives are expressed in the formalism and solved. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/657/CS-TR-78-657.pdf %R CS-TR-78-661 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Variations of a pebble game on graphs %A Gilbert, John R. %A Tarjan, Robert Endre %D September 1978 %X We examine two variations of a one-person pebble game played on directed graphs, which has been studied as a model of register allocation. The black-white pebble game of Cook and Sethi is shown to require as many pebbles in the worst case as the normal pebble game, to within a constant factor. For another version of the pebble game, the problem of deciding whether a given number of pebbles is sufficient for a given graph is shown to be complete in polynomial space. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/661/CS-TR-78-661.pdf %R CS-TR-78-662 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T New algorithms in bin packing %A Yao, Andrew Chi-Chih %D September 1978 %X In the bin-packing problem a list L of n numbers is to be packed into unit-capacity bins. For any algorithm S, let r(S) be the maximum ratio S(L)/$L^*$ for large $L^*$, where S(L) denotes the number of bins used by S and $L^*$ denotes the minimum number needed. In this paper we give an on-line O(n log n)-time algorithm RFF with r(RFF) = 5/3, and an off-line polynomial-time algorithm RFFD with r(RFFD) = (11/9)-$\epsilon$ for some fixed $\epsilon$ > 0.
These are strictly better, respectively, than two prominent algorithms -- the First-Fit (FF) which is on-line with r(FF) = 17/10, and the First-Fit-Decreasing (FFD) with r(FFD) = 11/9. Furthermore, it is shown that any on-line algorithm S must have r(S) $\geq$ 3/2. We also discuss the question "how well can an O(n)-time algorithm perform?", showing that, in the generalized d-dimensional bin-packing, any O(n)-time algorithm S must have r(S) $\geq$ d. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/662/CS-TR-78-662.pdf %R CS-TR-78-665 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T SCALD: Structured Computer-Aided Logic Design %A McWilliams, Thomas M. %A Widdoes, Lawrence C., Jr. %D March 1978 %X SCALD, a graphics-based hierarchical digital logic design system, is described and an example of its use is given. SCALD provides a total computer-aided design environment which inputs a high-level description of a digital system, and produces output for computer-aided manufacture of the system. SCALD has been used in the design of an operational, 15-MIPS, 5500-chip ECL-10k processor. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/665/CS-TR-78-665.pdf %R CS-TR-78-666 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The SCALD physical design subsystem %A McWilliams, Thomas M. %A Widdoes, Lawrence C., Jr. %D March 1978 %X The SCALD physical design subsystem is described. SCALD supports the automatic construction of ECL-10k logic on wire wrap cards from the output of a hierarchical design system. Results of its use in the design of an operational 15-MIPS 5500-chip processor are presented and discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/666/CS-TR-78-666.pdf %R CS-TR-78-668 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T BAOBAB, a parser for a rule-based system using a semantic grammar %A Bonnet, Alain %D September 1978 %X Until a knowledge-based system is able to learn by itself, it must acquire new knowledge and new heuristics from human experts. This is traditionally done with the aid of a computer programmer acting as intermediary. The direct transfer of knowledge from an expert to the system requires a natural-language processor capable of handling a substantial subset of English. The development of such a natural-language processor is a long-term goal of automating knowledge acquisition; facilitating the interface between the expert and the system is a first step toward this goal. This paper describes BAOBAB, a program designed and implemented for MYCIN (Shortliffe 1974), a medical consultation system for infectious disease diagnosis and therapy selection. BAOBAB is concerned with the problem of parsing - recognizing natural language sentences and encoding them into MYCIN's internal representation. For this purpose, it uses a semantic grammar in which the non-terminal symbols denote semantic categories (e.g., infections and symptoms), or conceptual categories which are common tools of knowledge representation in artificial intelligence (e.g., attributes, objects, values and predicate functions). This differs from a syntactic grammar in which non-terminal symbols are syntactic elements such as nouns or verbs.
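To make the semantic-grammar idea of CS-TR-78-668 above concrete, here is a minimal sketch in Python; the grammar, the category names (FINDING, ORGANISM, SITE), and the vocabulary are hypothetical stand-ins, not BAOBAB's actual grammar or MYCIN's internal representation.

    # Toy semantic grammar: non-terminals are semantic categories,
    # not syntactic ones such as noun phrase or verb.
    SEMANTIC_GRAMMAR = {
        "FINDING":  [["ORGANISM", "grew", "from", "SITE"]],
        "ORGANISM": [["e.coli"], ["pseudomonas"]],
        "SITE":     [["blood"], ["urine"]],
    }

    def parse(category, words):
        """Parse a word list as an instance of a semantic category.
        Returns a dict of category bindings, or None if nothing matches."""
        for production in SEMANTIC_GRAMMAR[category]:
            bindings = {}
            i = 0
            matched = True
            for symbol in production:
                if symbol in SEMANTIC_GRAMMAR:
                    # Semantic non-terminal; in this toy, each spans one word.
                    if i < len(words) and parse(symbol, words[i:i + 1]) is not None:
                        bindings[symbol] = words[i]
                        i += 1
                    else:
                        matched = False
                        break
                elif i < len(words) and words[i] == symbol:
                    i += 1          # literal word in the production
                else:
                    matched = False
                    break
            if matched and i == len(words):
                return bindings
        return None

    print(parse("FINDING", "e.coli grew from blood".split()))
    # -> {'ORGANISM': 'e.coli', 'SITE': 'blood'}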
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/668/CS-TR-78-668.pdf %R CS-TR-78-670 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Information bounds are weak in the shortest distance problem %A Graham, Ronald L. %A Yao, Andrew C. %A Yao, F. Frances %D September 1978 %X In the all-pair shortest distance problem, one computes the matrix D = ($d_{ij}$) where $d_{ij}$ is the minimum weighted length of any path from vertex i to vertex j in a directed complete graph with a weight on each edge. In all the known algorithms, a shortest path $p_{ij}$ achieving $d_{ij}$ is also implicitly computed. In fact, $\log_3$ f(n) is an information-theoretic lower bound where f(n) is the total number of distinct patterns ($p_{ij}$) for n-vertex graphs. As f(n) potentially can be as large as $2^{n^3}$, one might hope that a non-trivial lower bound can be derived this way in the decision tree model. We study the characterization and enumeration of realizable patterns, and show that f(n) $\leq C^{n^2}$. Thus no lower bound greater than C$n^2$ can be derived from this approach. We prove as a corollary that the Triangular polyhedron $T^{(n)}$, defined in $E^{(n\choose 2)}$ by $d_{ij} \geq 0$ and the triangle inequalities $d_{ij} + d_{jk} \geq d_{ik}$, has at most $C^{n^2}$ faces of all dimensions, thus resolving an open question in a similar information bound approach to the shortest distance problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/670/CS-TR-78-670.pdf %R CS-TR-78-673 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A numerical library and its support %A Chan, Tony F. %A Coughran, William M., Jr. %A Grosse, Eric H. %A Heath, Michael T. %D November 1978 %X Reflecting on four years of numerical consulting at the Stanford Linear Accelerator Center, we point out solved and outstanding problems in selecting and installing mathematical software, helping users, maintaining the library and monitoring its use, and managing the consulting operation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/673/CS-TR-78-673.pdf %R CS-TR-78-674 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Finite element approximation and iterative solution of a class of mildly non-linear elliptic equations %A Chan, Tony F. %A Glowinski, Roland %D November 1978 %X We describe in this report the numerical analysis of a particular class of nonlinear Dirichlet problems. We consider an equivalent variational inequality formulation on which the problems of existence, uniqueness and approximation are easier to discuss. We prove in particular the convergence of an approximation by piecewise linear finite elements. Finally, we describe and compare several iterative methods for solving the approximate problems and particularly some new algorithms of augmented Lagrangian type, which contain as special cases some well-known alternating direction methods. Numerical results are presented. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/674/CS-TR-78-674.pdf %R CS-TR-78-678 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Reasoning about recursively defined data structures %A Oppen, Derek C. %D July 1978 %X A decision algorithm is given for the quantifier-free theory of recursively defined data structures which, for a conjunction of length n, decides its satisfiability in time linear in n.
The first-order theory of recursively defined data structures, in particular the first-order theory of LISP list structure (the theory of CONS, CAR and CDR), is shown to be decidable but not elementary recursive. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/678/CS-TR-78-678.pdf %R CS-TR-78-679 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Steplength algorithms for minimizing a class of nondifferentiable functions %A Murray, Walter %A Overton, Michael L. %D November 1978 %X Four steplength algorithms are presented for minimizing a class of nondifferentiable functions which includes functions arising from $\ell_1$ and $\ell_\infty$ approximation problems and penalty functions arising from constrained optimization problems. Two algorithms are given for the case when derivatives are available wherever they exist and two for the case when they are not available. We take the view that although a simple steplength algorithm may be all that is required to meet convergence criteria for the overall algorithm, from the point of view of efficiency it is important that the step achieve as large a reduction in the function value as possible, given a certain limit on the effort to be expended. The algorithms include the facility for varying this limit, producing anything from an algorithm requiring a single function evaluation to one doing an exact linear search. They are based on univariate minimization algorithms which we present first. These are normally at least quadratically convergent when derivatives are used and superlinearly convergent otherwise, regardless of whether or not the function is differentiable at the minimum. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/679/CS-TR-78-679.pdf %R CS-TR-78-680 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Bibliography of Stanford Computer Science reports, 1963-1978 %A Stanley, Connie J. %D November 1978 %X This report lists, in chronological order, all reports published by the Stanford Computer Science Department since 1963. Each report is identified by Computer Science number, author's name, title, National Technical Information Service (NTIS) retrieval number, date, and number of pages. Complete listings of Theses, Artificial Intelligence Memos, and Heuristic Programming Reports are given in the Appendix. Also, for the first time, each report has been marked as to its availability for ordering and the cost if applicable. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/680/CS-TR-78-680.pdf %R CS-TR-78-683 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Storing a sparse table %A Tarjan, Robert Endre %D December 1978 %X The problem of storing and searching large sparse tables arises in compiling and in other areas of computer science. The standard technique for storing such tables is hashing, but hashing has poor worst-case performance. We consider good worst-case methods for storing a table of n entries, each an integer between 0 and N-1. For dynamic tables, in which look-ups and table additions are intermixed, the use of a trie requires O(kn) storage and allows O($\log_k$(N/n)) worst-case access time, where k is an arbitrary parameter. For static tables, in which the entire table is constructed before any look-ups are made, we propose a method which requires O(n $\log^{(\ell)}$ n) storage and allows O($\ell \log_n N$) access time, where $\ell$ is an arbitrary parameter.
Choosing $\ell$ = $\log^* n$ gives a method with O(n) storage and O(($\log^* n$)($\log_n N$)) access time. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/683/CS-TR-78-683.pdf %R CS-TR-78-684 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The matrix inverse eigenvalue problem for periodic Jacobi matrices %A Boley, Daniel L. %A Golub, Gene H. %D December 1978 %X A stable numerical algorithm is presented for generating a periodic Jacobi matrix from two sets of eigenvalues and the product of the off-diagonal elements of the matrix. The algorithm requires a simple generalization of the Lanczos algorithm. It is shown that the matrix is not unique, but the algorithm will generate all possible solutions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/684/CS-TR-78-684.pdf %R CS-TR-78-687 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Prolegomena to a theory of formal reasoning %A Weyhrauch, Richard W. %D December 1978 %X This paper is an introduction to the mechanization of a theory of reasoning. Currently, formal systems are out of favor with the AI community. The aim of this paper is to explain how formal systems can be used in AI by explaining how traditional ideas of logic can be mechanized in a practical way. The paper presents several new ideas. Each of these is illustrated by giving simple examples of how this idea is mechanized in the reasoning system FOL. That is, this is not just theory but there is an existing running implementation of these ideas. In this paper: 1) we show how to mechanize the notion of model using the idea of a simulation structure and explain why this is particularly important to AI, 2) we show how to mechanize the notion of satisfaction, 3) we present a very general evaluator for first order expressions, which subsumes PROLOG and which we propose as a natural way of thinking about logic programming, 4) we show how to formalize metatheory, 5) we describe reflection principles, which connect theories to their metatheories in a way new to AI, 6) we show how these ideas can be used to dynamically extend the strength of FOL by "implementing" subsidiary deduction rules, and how this in turn can be extended to provide a method of describing and proving theorems about heuristics for using these rules, 7) we discuss one notion of what it could mean for a computer to learn and give an example, 8) we describe a new kind of formal system that has the property that it can reason about its own properties, 9) we give examples of all of the above. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/687/CS-TR-78-687.pdf %R CS-TR-78-689 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An $n^{\log n}$ algorithm for the two-variable-per-constraint linear programming satisfiability problem %A Nelson, Charles Gregory %D November 1978 %X A simple algorithm is described which determines the satisfiability over the reals of a conjunction of linear inequalities, none of which contains more than two variables. In the worst case the algorithm requires time O($mn^{\lceil \log_2 n \rceil + 3}$), where n is the number of variables and m the number of inequalities. Several considerations suggest that the algorithm may be useful in practice: it is simple to implement, it is fast for some important special cases, and if the inequalities are satisfiable it provides valuable information about their solution set.
The algorithm is particularly suited to applications in mechanical program verification. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/689/CS-TR-78-689.pdf %R CS-TR-78-690 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A deductive approach to program synthesis %A Manna, Zohar %A Waldinger, Richard J. %D November 1978 %X Program synthesis is the systematic derivation of a program from a given specification. A deductive approach to program synthesis is presented for the construction of recursive programs. This approach regards program synthesis as a theorem-proving task and relies on a theorem-proving method that combines the features of transformation rules, unification, and mathematical induction within a single framework. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/690/CS-TR-78-690.pdf %R CS-TR-78-693 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A class of solutions to the gossip problem %A West, Douglas B. %D November 1978 %X We characterize and count optimal solutions to the gossip problem in which no one hears his own information. That is, we consider graphs with n vertices where the edges have a linear ordering such that an increasing path exists from each vertex to every other, but there is no increasing path from any vertex to itself. Such graphs exist only when n is even, in which case the minimum number of edges is 2n-4, as in the original gossip problem. We characterize optimal solutions of this sort (NOHO-graphs) using a correspondence with a set of permutations and binary sequences. This correspondence enables us to count these solutions and several subclasses of solutions. The numbers of solutions in each class are simple powers of 2 and 3, with exponents determined by n. We also show constructively that NOHO-graphs are planar and Hamiltonian, and we mention applications to related problems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/693/CS-TR-78-693.pdf %R CS-TR-78-694 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Computer science at Stanford, 1977-1978 %A King, Jonathan J. %D November 1978 %X This is a review of research and teaching in the Stanford Computer Science Department during the 1977-1978 academic year. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/694/CS-TR-78-694.pdf %R CS-TR-78-699 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T SACON: a knowledge-based consultant for structural analysis %A Bennett, James %A Creary, Lewis %A Engelmore, Robert S. %A Melosh, Robert %D September 1978 %X In this report we describe an application of artificial intelligence (AI) methods to structural analysis. We describe the development and (partial) implementation of an "automated consultant" to advise non-expert engineers in the use of a general-purpose structural analysis program. The analysis program numerically simulates the behavior of a physical structure subjected to various mechanical loading conditions. The automated consultant, called SACON (Structural Analysis CONsultant), is based on a version of the MYCIN program [Shortliffe, 1974], originally developed to advise physicians in the diagnosis and treatment of infectious diseases. The domain-specific knowledge in MYCIN is represented as situation-action rules, and is kept independent of the "inference engine" that uses the rules.
By substituting structural engineering knowledge for the medical knowledge, the program was converted easily from the domain of infectious diseases to the domain of structural analysis. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/699/CS-TR-78-699.pdf %R CS-TR-78-702 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An O($n\cdot I \log^2 I$) maximum-flow algorithm %A Shiloach, Yossi %D December 1978 %X We present in this paper a new algorithm to find a maximum flow in a flow-network which has n vertices and m edges in time O($n\cdot I \log^2 I$), where I = m+n is the input size (up to a constant factor). This result improves the previous upper bound of Z. Galil [1978] which was O($I^{7/3}$) in the worst case. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/702/CS-TR-78-702.pdf %R CS-TR-78-663 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Software restyling in graphics and programming languages %A Grosse, Eric H. %D September 1978 %X The value of large software products can be cheaply increased by adding restyled interfaces that attract new users. As examples of this approach, a set of graphics primitives and a language precompiler for scientific computation are described. These two systems include a general user-defined coordinate system instead of numerous system settings, indention to specify block structure, a modified indexing convention for array parameters, a syntax for n-and-a-half-times-'round loops, and engineering format for real constants; most of all, they strive to be as small as possible. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/663/CS-TR-78-663.pdf %R CS-TR-78-697 %Z Thu, 22 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the linear least squares problem with a quadratic constraint %A Gander, Walter %D November 1978 %X In this paper we present the theory and practical computational aspects of the linear least squares problem with a quadratic constraint. New theorems characterizing properties of the solutions are given and extended for the problem of minimizing a general quadratic function subject to a quadratic constraint. For two important regularization methods we formulate dual equations which proved to be very useful for applications to the smoothing of data. The resulting algorithm is a numerically stable version of an algorithm proposed by Rutishauser. We show also how to choose a third order iteration method to solve the secular equations. However, we are still far from a foolproof, machine-independent algorithm. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/78/697/CS-TR-78-697.pdf %R CS-TR-79-703 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A polynomial time algorithm for solving systems of linear inequalities with two variables per inequality %A Aspvall, Bengt %A Shiloach, Yossi %D January 1979 %X We present a constructive algorithm for solving systems of linear inequalities (LI) with at most two variables per inequality. The algorithm is polynomial in the size of the input. The LI problem is of importance in complexity theory since it is polynomial time equivalent to linear programming. The subclass of LI treated in this paper is also of practical interest in mechanical verification systems, and we believe that the ideas presented can be extended to the general LI problem.
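The subclass of LI treated in CS-TR-79-703 above contains, as a still more special case, the classical difference constraints $x_i - x_j \leq c$, which reduce to a shortest-path computation. The Python sketch below checks satisfiability of that special case by Bellman-Ford; it is illustrative only and is not the paper's algorithm, which handles arbitrary coefficients on the two variables.

    def difference_constraints_satisfiable(n, constraints):
        """constraints: list of (i, j, c) meaning x_i - x_j <= c, 0 <= i, j < n.
        Returns a satisfying assignment as a list of floats, or None."""
        # Virtual source n with 0-weight edges to every variable, so every
        # vertex is reachable; each constraint x_i - x_j <= c becomes an
        # edge j -> i of weight c.
        edges = [(n, v, 0.0) for v in range(n)]
        edges += [(j, i, float(c)) for (i, j, c) in constraints]
        dist = [float("inf")] * (n + 1)
        dist[n] = 0.0
        for _ in range(n):                    # |V| - 1 = n relaxation passes
            for u, v, w in edges:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
        for u, v, w in edges:                 # further relaxation => negative cycle
            if dist[u] + w < dist[v]:
                return None                   # system is unsatisfiable
        return dist[:n]

    # x0 - x1 <= 1, x1 - x2 <= -2, x2 - x0 <= 2: satisfiable.
    print(difference_constraints_satisfiable(3, [(0, 1, 1), (1, 2, -2), (2, 0, 2)]))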
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/703/CS-TR-79-703.pdf %R CS-TR-79-704 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A survey of the state of software for partial differential equations %A Sweet, Roland A. %D January 1979 %X This paper surveys the state of general purpose software for the solution of partial differential equations. A discussion of the purported capabilities of twenty-one programs is presented. No testing of the routines was performed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/704/CS-TR-79-704.pdf %R CS-TR-79-706 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Graph 2-isomorphism is NP-complete %A Yao, F. Frances %D January 1979 %X Two graphs G and G' are said to be k-isomorphic if their edge sets can be partitioned into E(G) = $E_1 \cup E_2 \cup ... \cup E_k$ and E(G') = ${E'}_1 \cup {E'}_2 \cup ... \cup {E'}_k$ such that as graphs, $E_i$ and ${E'}_i$ are isomorphic for $1 \leq i \leq k$. In this note we show that it is NP-complete to decide whether two graphs are 2-isomorphic. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/706/CS-TR-79-706.pdf %R CS-TR-79-707 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A programming and problem-solving seminar %A Van Wyk, Christopher J. %A Knuth, Donald E. %D January 1979 %X This report contains edited transcripts of the discussions held in Stanford's course CS 204, Problem Seminar, during autumn quarter 1978. Since the topics span a large range of ideas in computer science, and since most of the important research paradigms and programming paradigms came up during the discussions, these notes may be of interest to graduate students of computer science at other universities, as well as to their professors and to professional people in the "real world." %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/707/CS-TR-79-707.pdf %R CS-TR-79-708 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An analysis of a memory allocation scheme for implementing stacks %A Yao, Andrew C. %D January 1979 %X Consider the implementation of two stacks by letting them grow towards each other in a table of size m. Suppose a random sequence of insertions and deletions is executed, with each instruction having a fixed probability p (0 < p < 1/2) to be a deletion. Let $A_p(m)$ denote the expected value of max{x,y}, where x and y are the stack heights when the table first becomes full. We shall prove that, as $m \rightarrow \infty$, $A_p(m) = m/2 + \sqrt{m/(2 \pi (1-2p))} + O((\log m)/\sqrt{m})$. This gives a solution to an open problem in Knuth [The Art of Computer Programming, Vol. 1, Exercise 2.2.2-13]. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/708/CS-TR-79-708.pdf %R CS-TR-79-710 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Numerical computation of the Schwarz-Christoffel transformation %A Trefethen, Lloyd N. %D March 1979 %X A program is described which computes Schwarz-Christoffel transformations that map the unit disk conformally onto the interior of a bounded or unbounded polygon in the complex plane. The inverse map is also computed. The computational problem is approached by setting up a nonlinear system of equations whose unknowns are essentially the "accessory parameters" $z_k$. This system is then solved with a packaged subroutine.
New features of this work include the evaluation of integrals within the disk rather than along the boundary, making possible the treatment of unbounded polygons; the use of a compound form of Gauss-Jacobi quadrature to evaluate the Schwarz-Christoffel integral, making possible high accuracy at reasonable cost; and the elimination of constraints in the nonlinear system by a simple change of variables. Schwarz-Christoffel transformations may be applied to solve the Laplace and Poisson equations and related problems in two-dimensional domains with irregular or unbounded (but not curved or multiply connected) geometries. Computational examples are presented. The time required to solve the mapping problem is roughly proportional to $N^3$, where N is the number of vertices of the polygon. A typical set of computations to 8-place accuracy with $N \leq 10$ takes 1 to 10 seconds on an IBM 370/168. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/710/CS-TR-79-710.pdf %R CS-TR-79-712 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The errata of computer programming %A Knuth, Donald E. %D January 1979 %X This report lists all corrections and changes of Volumes 1 and 3 of "The Art of Computer Programming," as of January 5, 1979. This updates the previous list in report CS551, May 1976. The second edition of Volume 2 has been delayed two years due to the fact that it was completely revised and put into the TEX typesetting language; since publication of this new edition is not far off, no changes to Volume 2 are listed here. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/712/CS-TR-79-712.pdf %R CS-TR-79-714 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T PCFORT: a Fortran-to-Pcode translator %A Castaneda, Fernando %A Chow, Frederick C. %A Nye, Peter %A Sleator, Daniel D. %A Wiederhold, Gio %D January 1979 %X PCFORT is a compiler for the FORTRAN language designed to fit as a building block into a PASCAL-oriented environment. It forms part of the programming systems being developed for the S-1 multiprocessor. It is written in PASCAL, and generates P-code, an intermediate language used by transportable PASCAL compilers to represent the program in a simple form. P-code is either compiled or interpreted depending upon the objectives of the programming system. A PASCAL-written FORTRAN compiler provides a bridge between the FORTRAN and PASCAL communities. The implementation allows PASCAL and FORTRAN generated code to be combined into one program. The FORTRAN language supported here is FORTRAN to the full 1966 standard, extended with those features commonly expected by available large scientific programs. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/714/CS-TR-79-714.pdf %R CS-TR-79-715 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T S-1 architecture manual %A Hailpern, Brent T. %A Hitson, Bruce L. %D January 1979 %X This manual provides a complete description of the instruction-set architecture of the S-1 Uniprocessor (Mark IIA), exclusive of vector operations. It is assumed that the reader has a general knowledge of computer architecture. The manual was designed to be both a detailed introduction to the S-1 and an architecture reference manual. Also included are user manuals for the FASM Assembler and the S-1 Formal Description Syntax.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/715/CS-TR-79-715.pdf %R CS-TR-79-716 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A framework for control in production systems %A Georgeff, Michael P. %D January 1979 %X A formal model for representing control in production systems is defined. The formalism allows control to be directly specified independently of the conflict resolution scheme, and thus allows the issues of control and nondeterminism to be treated separately. Unlike previous approaches, it allows control to be examined within a uniform and consistent framework. It is shown that the formalism provides a basis for implementing control constructs which, unlike existing schemes, retain all the properties desired of a knowledge based system --- modularity, flexibility, extensibility and explanatory capacity. Most importantly, it is shown that these properties are not a function of the lack of control constraints, but of the type of information allowed to establish these constraints. Within the formalism it is also possible to provide a meaningful notion of the power of control constructs. This enables the types of control required in production systems to be examined and the capacity of various schemes to meet these requirements to be determined. Schemes for improving system efficiency and resolving nondeterminism are examined, and devices for representing such meta-level knowledge are described. In particular, the objectification of control information is shown to provide a better paradigm for problem solving and for talking about problem solving. It is also shown that the notion of control provides a basis for a theory of transformation of production systems, and that this provides a uniform and consistent approach to problems involving subgoal protection. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/716/CS-TR-79-716.pdf %R CS-TR-79-718 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T AL users' manual %A Mujtaba, Mohamed Shahid %A Goldman, Ron %D January 1979 %X This document describes the current state of the AL system now in operation at the Stanford Artificial Intelligence Laboratory, and teaches the reader how to use it. The system consists of AL, a high-level programming language for manipulator control useful in industrial assembly research; POINTY, an interactive system for specifying representation of parts; and ALAID, an interactive debugger for AL. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/718/CS-TR-79-718.pdf %R CS-TR-79-719 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Extrapolation of asymptotic expansions by a modified Aitken $\delta^2$-formula %A Bjorstad, Petter %A Dahlquist, Germund %A Grosse, Eric H. %D March 1979 %X A modified Aitken formula permits iterated extrapolations to efficiently estimate $s_\infty$ from $s_n$ when an asymptotic expansion $s_n = s_\infty + n^{-k} (c_0 + c_1 n^{-1} + c_2 n^{-2} + ... )$ holds for some (unknown) coefficients $c_j$. We study the truncation and irregular error and compare the method with other forms of extrapolation. 
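For orientation, the classical Aitken $\delta^2$ formula that CS-TR-79-719 above modifies can be iterated as in the Python sketch below; the report's modified formula and its treatment of truncation and irregular error are not reproduced here.

    def aitken(s):
        """One pass of the classical Aitken delta-squared transformation:
        s'_n = s_{n+2} - (s_{n+2} - s_{n+1})**2 / (s_{n+2} - 2*s_{n+1} + s_n)."""
        out = []
        for a, b, c in zip(s, s[1:], s[2:]):
            denom = c - 2.0 * b + a
            out.append(c if denom == 0.0 else c - (c - b) ** 2 / denom)
        return out

    # Example: partial sums of the Leibniz series 1 - 1/3 + 1/5 - ... -> pi/4.
    s, total = [], 0.0
    for n in range(12):
        total += (-1) ** n / (2.0 * n + 1.0)
        s.append(total)
    while len(s) >= 3:       # iterate the extrapolation
        s = aitken(s)
    print(s[-1])             # approximately pi/4 = 0.7853981...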
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/719/CS-TR-79-719.pdf %R CS-TR-79-720 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On grid optimization for boundary value problems %A Glowinski, Roland %D February 1979 %X We discuss in this report the numerical procedures which can be used to obtain the optimal grid when solving by a finite element method a model boundary value problem of elliptic type modelling the potential flow of an incompressible inviscid fluid. Results of numerical experiments are presented. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/720/CS-TR-79-720.pdf %R CS-TR-79-721 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On fault-tolerant networks for sorting %A Yao, Andrew C. %A Yao, F. Frances %D February 1979 %X The study of constructing reliable systems from unreliable components goes back to the work of von Neumann, and of Moore and Shannon. The present paper studies the use of redundancy to enhance reliability for sorting and related networks built from unreliable comparators. Two models of fault-tolerant networks are discussed. The first model patterns after the concept of error-correcting codes in information theory, and the other follows the stochastic criterion used by von Neumann and Moore-Shannon. It is shown, for example, that an additional k(2n-3) comparators are sufficient to render a sorting network reliable, provided that no more than k of its comparators may be faulty. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/721/CS-TR-79-721.pdf %R CS-TR-79-722 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A structural model for database systems %A Wiederhold, Gio %A El-Masri, Ramez A. %D February 1979 %X This report presents a model to be used for database design. Because our motivation extends to providing guidance for the structured implementation of a database, we call our model the 'Structural Model.' We derive the design using criteria of correctness, relevance, and performance from semantic and operational specifications obtained from multiple sources. These sources typically correspond to prospective users or user groups of the database. The integration of such specifications is a central issue in the development of an integrated structural database model. The structural model is used for the design of the logical structures that represent a real-world situation. However, it is not meant to represent all possible real-world semantics, but a subset of the semantics which are important in database modelling. The model uses relations as building blocks, and hence can be considered as an extension of Codd's relational model [1970]. The main extensions to the relational model are the explicit representation of logical connections between relations, the inclusion of insertion-deletion constraints in the model itself, and the separation of relations into several structural types. Connections between relations are used to represent existence dependencies of tuples in different relations. These existence dependencies are important for the definition of semantics of relationships between classes of real-world entities. The connections between relations are used to specify these existence dependencies, and to ensure that they remain valid when the database is updated. 
Hence, connections implicitly define a basic, limited set of integrity constraints on the database, those that identify and maintain existence dependencies among tuples from different relations. Consequently, the rules for the maintenance of the structural integrity of the model under insertion and deletion of tuples are easy to specify. Structural relation types are used to specify how each relation may be connected to other relations in the model. Relations are classified into five types: primary relations, referenced relations, nest relations, association relations, and lexicon relations. The motivation behind the choice of these relation types is discussed, as is their use in data model design. A methodology for combining multiple, overlapping data models - also called user views in the literature - is associated with the structural model. The database model, or conceptual schema, which represents the integrated database, may thus be derived from the individual data models of the users. We believe that the structural model can be used to represent the data relationships within the conceptual schema of the ANSI/SPARC DBMS model since it can support database submodels, also called external schema, and maintain the integrity of the submodels with respect to the integrity constraints expressible in the structural model. We then briefly discuss the use of the structural model in database design and implementation. The structural model provides a tool to deal effectively with the complexity of large, real-world databases. We begin this report with a very short review of existing database models. In Chapter 2, we state the purpose of the model, and in Chapter 3 we describe the structural model, first informally and then using a formal framework based on extensions of the relational model. Chapter 4 defines the representations we use, and Chapter 5 covers the integration of data models that represent the different user specifications into an integrated database model. Formal descriptions and examples of the prevalent cases are given. The work is then placed into context first relative to other work (Chapter 6) and then briefly within our methodology for database design (Chapter 7). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/722/CS-TR-79-722.pdf %R CS-TR-79-726 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An analysis of (h,k,l)-shellsort %A Yao, Andrew Chi-Chih %D March 1979 %X One classical sorting algorithm, whose performance in many cases remains unanalyzed, is Shellsort. Let $\vec{h}$ be a t-component vector of positive integers. An $\vec{h}$-Shellsort will sort any given n elements in t passes, by means of comparisons and exchanges of elements. Let $S_j$($\vec{h}$;n) denote the average number of element exchanges in the j-th pass, assuming that all the n! initial orderings are equally likely. In this paper we derive asymptotic formulas of $S_j$($\vec{h}$;n) for any fixed $\vec{h}$ = (h,k,l), making use of a new combinatorial interpretation of $S_3$. For the special case $\vec{h}$ = (3,2,1), the analysis is further sharpened to yield exact expressions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/726/CS-TR-79-726.pdf %R CS-TR-79-728 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Union-member algorithms for non-disjoint sets %A Shiloach, Yossi %D January 1979 %X In this paper we deal with the following problem.
We are given a finite set U = {$u_1$,...,$u_n$} and a set $\mathcal{S}$ = {$S_1$,...,$S_m$} of subsets of U. We are also given m-1 UNION instructions that have the form UNION($S_i$,$S_j$) and mean "add the set $S_i \cup S_j$ to the collection and delete $S_i$ and $S_j$." Interspersed among the UNIONs are MEMBER(i,j) questions that mean "does $u_i$ belong to $S_j$?" We present two algorithms that exhibit the trade-off among the three interesting parameters of this problem, which are: 1. Time required to answer one membership question. 2. Time required to perform the m-1 UNIONs altogether. 3. Space. We also give an application of these algorithms to the problem of 5-coloring of planar graphs. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/728/CS-TR-79-728.pdf %R CS-TR-79-729 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A unified approach to path problems %A Tarjan, Robert Endre %D April 1979 %X We describe a general method for solving path problems on directed graphs. Such path problems include finding shortest paths, solving sparse systems of linear equations, and carrying out global flow analysis of computer programs. Our method consists of two steps. First, we construct a collection of regular expressions representing sets of paths in the graph. This can be done by using any standard algorithm, such as Gaussian or Gauss-Jordan elimination. Next, we apply a natural mapping from regular expressions into the given problem domain. We exhibit the mappings required to find shortest paths, solve sparse systems of linear equations, and carry out global flow analysis. Our results provide a general-purpose algorithm for solving any path problem, and show that the problem of constructing path expressions is in some sense the most general path problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/729/CS-TR-79-729.pdf %R CS-TR-79-730 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Qualifying examinations in computer science, 1965-1978 %A Liang, Frank M. %D April 1979 %X Since 1965, the Stanford Computer Science Department has periodically given "qualifying examinations" as one of the requirements of its graduate program. These examinations are given in each of six subareas of computer science: Programming Languages and Systems, Artificial Intelligence, Numerical Analysis, Computer Design, Theory of Computation, and Analysis of Algorithms. This report presents the questions from these examinations, and also the associated reading lists.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/731/CS-TR-79-731.pdf %R CS-TR-79-732 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Notes on introductory combinatorics %A Woods, Donald R. %D April 1979 %X In the spring of 1978, Professors George Polya and Robert Tarjan teamed up to teach CS 150 - Introduction to Combinatorics. This report consists primarily of the class notes and other handouts produced by the author as teaching assistant for the course. Among the topics covered are elementary subjects such as combinations and permutations, mathematical tools such as generating functions and Polya's Theory of Counting, and analyses of specific problems such as Ramsey Theory, matchings, and Hamiltonian and Eulerian paths. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/732/CS-TR-79-732.pdf %R CS-TR-79-733 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A lower bound to finding convex hulls %A Yao, Andrew Chi-Chih %D April 1979 %X Given a set S of n distinct points {($x_i$,$y_i$) | 0 $\leq$ i < n}, the convex hull problem is to determine the vertices of the convex hull H(S). All the known algorithms for solving this problem have a worst-case running time of cn log n or higher, and employ only quadratic tests, i.e., tests of the form f($x_0$, $y_0$, $x_1$, $y_1$,...,$x_{n-1}$, $y_{n-1}$): 0 with f being any polynomial of degree not exceeding 2. In this paper, we show that any algorithm in the quadratic decision-tree model must make cn log n tests for some input. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/733/CS-TR-79-733.pdf %R CS-TR-79-734 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Fast algorithms for solving path problems %A Tarjan, Robert Endre %D April 1979 %X Let G = (V,E) be a directed graph with a distinguished source vertex s. The single-source path expression problem is to find, for each vertex v, a regular expression P(s,v) which represents the set of all paths in G from s to v. A solution to this problem can be used to solve shortest path problems, solve sparse systems of linear equations, and carry out global flow analysis. We describe a method to compute path expressions by dividing G into components, computing path expressions on the components by Gaussian elimination, and combining the solutions. This method requires O(m $\alpha$(m,n)) time on a reducible flow graph, where n is the number of vertices in G, m is the number of edges in G, and $\alpha$ is a functional inverse of Ackermann's function. The method makes use of an algorithm for evaluating functions defined on paths in trees. A simplified version of the algorithm, which runs in O(m log n) time on reducible flow graphs, is quite easy to implement and efficient in practice. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/734/CS-TR-79-734.pdf %R CS-TR-79-735 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Kronecker's canonical form and the QZ algorithm %A Wilkinson, James Hardy %D April 1979 %X In the QZ algorithm the eigenvalues of Ax = $\lambda$Bx are computed via a reduction to the form $\tilde{A}$x = $\lambda \tilde{B}$x where $\tilde{A}$ and $\tilde{B}$ are upper triangular. The eigenvalues are given by ${\lambda}_i$ = $a_{ii}$/$b_{ii}$. 
It is shown that when the pencil $\tilde{A}$ - $\lambda \tilde{B}$ is singular or nearly singular a value of ${\lambda}_i$ may have no significance even when $\tilde{a}_{ii}$ and $\tilde{b}_{ii}$ are of full size. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/735/CS-TR-79-735.pdf %R CS-TR-79-736 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Note on the practical significance of the Drazin inverse %A Wilkinson, James Hardy %D April 1979 %X The solution of the differential system B$\dot{x}$ = Ax + f where A and B are n x n matrices and A - $\lambda$B is not a singular pencil may be expressed in terms of the Drazin inverse. It is shown that there is a simple reduced form for the pencil A - $\lambda$B which is adequate for the determination of the general solution and that although the Drazin inverse could be determined efficiently from this reduced form it is inadvisable to do so. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/736/CS-TR-79-736.pdf %R CS-TR-79-737 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the average-case complexity of selecting the k-th best %A Yao, Andrew C. %A Yao, F. Frances %D April 1979 %X Let ${\bar{V}}_k$(n) be the minimum average number of pairwise comparisons needed to find the k-th largest of n numbers (k $\geq$ 2), assuming that all n! orderings are equally likely. D. W. Matula proved that, for some absolute constant c, ${\bar{V}}_k$(n)-n $\leq$ c k log log n as n $\rightarrow \infty$. In the present paper, we show that there exists an absolute constant c' > 0 such that ${\bar{V}}_k$(n)-n $\geq$ c' k log log n as n $\rightarrow \infty$, proving a conjecture of Matula. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/737/CS-TR-79-737.pdf %R CS-TR-79-738 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Computations related to G-stability of linear multistep methods %A LeVeque, Randall J. %A Dahlquist, Germund %A Andree, Dan %D May 1979 %X In Dahlquist's recent proof of the equivalence of A-stability and G-stability, an algorithm was presented for calculating a G-stability matrix for any A-stable linear multistep method. Such matrices, and various quantities computable from them, are useful in many aspects of the study of the stability of a given method. For example, information may be gained as to the shape of the stability region, or the rate of growth of unstable solutions. We present a summary of the relevant theory and the results of some numerical calculations performed for several backward differentiation, Adams-Bashforth, and Adams-Moulton methods of low order.
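CS-TR-79-738 above concerns quantities, such as the shape of the stability region, computable from a G-stability matrix. A much simpler standard device for visualizing a stability region is the boundary locus $h\lambda = \rho(e^{i\theta})/\sigma(e^{i\theta})$; the Python sketch below traces it for the two-step Adams-Bashforth method ($\rho(z) = z^2 - z$, $\sigma(z) = (3z-1)/2$) and is illustrative only, not the report's G-stability computation.

    import cmath

    # Boundary locus of a linear multistep method with characteristic
    # polynomials rho and sigma: the boundary of the stability region is
    # traced by h*lambda = rho(z) / sigma(z) for z on the unit circle.
    def boundary_locus(rho, sigma, samples=8):
        points = []
        for k in range(samples):
            z = cmath.exp(2j * cmath.pi * k / samples)
            points.append(rho(z) / sigma(z))
        return points

    # Two-step Adams-Bashforth: rho(z) = z^2 - z, sigma(z) = (3z - 1)/2.
    rho = lambda z: z * z - z
    sigma = lambda z: (3.0 * z - 1.0) / 2.0
    for hl in boundary_locus(rho, sigma):
        print(f"{hl.real:+.4f} {hl.imag:+.4f}")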
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/739/CS-TR-79-739.pdf %R CS-TR-79-740 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The logic of aliasing %A Cartwright, Robert %A Oppen, Derek C. %D September 1979 %X We give a new version of Hoare's logic which correctly handles programs with aliased variables. The central proof rules of the logic (procedure call and assignment) are proved sound and complete. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/740/CS-TR-79-740.pdf %R CS-TR-79-748 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Fast algorithms for solving Toeplitz systems of equations and finding rational Hermite interpolants %A Yun, David Y. Y. %D July 1979 %X We present a new algorithm that reduces the computation for solving a Toeplitz system to O(n $\log^2$ n) and automatically resolves all degenerate cases of the past. Our fundamental results show that all rational Hermite interpolants, including Padé approximants, which are intimately related to this solution process, can be computed fast by an Euclidean algorithm. In this report we bring out all these relationships with mathematical justifications and mention important applications including decoding BCH codes. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/748/CS-TR-79-748.pdf %R CS-TR-79-753 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Should tables be sorted? %A Yao, Andrew Chi-Chih %D July 1979 %X We examine optimality questions in the following information retrieval problem: Given a set S of n keys, store them so that queries of the form "Is x $\in$ S?" can be answered quickly. It is shown that, in a rather general model including all the commonly-used schemes, $\lceil$ lg(n+1) $\rceil$ probes to the table are needed in the worst case, provided the key space is sufficiently large. The effects of smaller key space and arbitrary encoding are also explored. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/753/CS-TR-79-753.pdf %R CS-TR-79-759 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Schema-shift strategies to understanding structured texts in natural language %A Bonnet, Alain %D August 1979 %X This report presents BAOBAB-2, a computer program built upon MYCIN [Shortliffe, 1974] that is used for understanding medical summaries describing the status of patients. Due both to the conventional way physicians present medical problems in these summaries and the constrained nature of medical jargon, these texts have a very strong structure. BAOBAB-2 takes advantage of this structure by using a model of this organization as a set of related schemas that facilitate the interpretation of these texts. Structures of the schemas and their relation to the surface structure are described. Issues relating to selection and use of these schemas by the program during interpretation of the summaries are discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/759/CS-TR-79-759.pdf %R CS-TR-79-760 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Some monotonicity properties of partial orders %A Graham, Ronald L. %A Yao, Andrew C. %A Yao, F. Frances %D September 1979 %X A fundamental quantity which arises in the sorting of n numbers $a_1$, $a_2$,..., $a_n$ is Pr($a_i$ < $a_j$ | P), the probability that $a_i$ < $a_j$ assuming that all linear extensions of the partial order P are equally likely.
In this paper we establish various properties of Pr($a_i$ < $a_j$ | P) and related quantities. In particular, it is shown that Pr($a_i$ < $b_j$ | P') $\geq$ Pr($a_i$ < $b_j$ | P), if the partial order P consists of two disjoint linearly ordered sets A = {$a_1$ < $a_2$ < ... < $a_m$}, B = {$b_1$ < $b_2$ < ... < $b_n$} and P' = P $\cup$ {any relations of the form $a_k$ < $b_l$}. These inequalities have applications in determining the complexity of certain sorting-like computations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/760/CS-TR-79-760.pdf %R CS-TR-79-761 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Gossiping without duplicate transmissions %A West, Douglas B. %D August 1979 %X n people have distinct bits of information, which they communicate via telephone calls in which they transmit everything they know. We require that no one ever hear the same piece of information twice. In the case 4 divides n, n $\geq$ 8, we provide a construction that transmits all information using only 9n/4-6 calls. Previous constructions used 1/2 n log n calls. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/761/CS-TR-79-761.pdf %R CS-TR-79-762 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T METAFONT: a system for alphabet design %A Knuth, Donald E. %D September 1979 %X This is the user's manual for METAFONT, a companion to the TEX typesetting system. The system makes it fairly easy to define high quality fonts of type in a machine-independent manner; a user writes "programs" in a new language developed for this purpose. By varying parameters of a design, an unlimited number of typefaces can be obtained from a single set of programs. The manual also sketches the algorithms used by the system to draw the character shapes. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/762/CS-TR-79-762.pdf %R CS-TR-79-763 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A symmetric chain decomposition of L(4,n) %A West, Douglas B. %D August 1979 %X L(m,n) is the set of integer m-tuples ($a_1$,...,$a_m$) with $0\leq a_1 \leq ...\leq a_m \leq n$, ordered by $\underline{a} \leq \underline{b}$ when $a_i\leq b_i$ for all i. R. Stanley conjectured that L(m,n) is a symmetric chain order for all (m,n). We verify this by construction for m = 4. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/763/CS-TR-79-763.pdf %R CS-TR-79-764 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the time-space tradeoff for sorting with linear queries %A Yao, Andrew Chi-Chih %D August 1979 %X Extending a result of Borodin, et al., we show that any branching program using linear queries "$\sum_i \lambda_i x_i : c$" to sort n numbers $x_1$,$x_2$,...,$x_n$ must satisfy the time-space tradeoff relation TS = $\Omega(n^2)$. The same relation is also shown to be true for branching programs that use queries "min R = ?" where R is any subset of {$x_1$,$x_2$,...,$x_n$}. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/764/CS-TR-79-764.pdf %R CS-TR-79-765 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Relation between the complexity and the probability of large numbers %A Gacs, Peter %D September 1979 %X H(x), the negative logarithm of the a priori probability M(x), is Levin's variant of Kolmogorov's complexity of a natural number x.
Let $\alpha (n)$ be the minimum complexity of a number larger than n, s(n) the logarithm of the a priori probability of obtaining a number larger than n. It was known that $s(n) \leq \alpha(n) \leq s(n) + H(\lceil s(n) \rceil)$. We show that the second estimate is in some sense sharp. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/765/CS-TR-79-765.pdf %R CS-TR-79-767 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On Stewart's singular value decomposition for partitioned orthogonal matrices %A Van Loan, Charles %D September 1979 %X A variant of the singular value decomposition for orthogonal matrices due to G. W. Stewart is discussed. It is shown to be useful in the analysis of (a) the total least squares problem, (b) the Golub-Klema-Stewart subset selection algorithm, and (c) the algebraic Riccati equation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/767/CS-TR-79-767.pdf %R CS-TR-79-770 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Pretty printing %A Oppen, Derek C. %D October 1979 %X An algorithm for pretty printing is given. For an input stream of length n and an output device with margin width m, the algorithm requires time O(n) and space O(m). The algorithm is described in terms of two parallel processes; the first scans the input stream to determine the space required to print logical blocks of tokens; the second uses this information to decide where to break lines of text; the two processes communicate by means of a buffer of size O(m). The algorithm does not wait for the entire stream to be input, but begins printing as soon as it has received a linefull of input. The algorithm is easily implemented. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/770/CS-TR-79-770.pdf %R CS-TR-79-773 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Updating formulae and a pairwise algorithm for computing sample variances %A Chan, Tony F. %A Golub, Gene H. %A LeVeque, Randall J. %D November 1979 %X A general formula is presented for computing the sample variance for a sample of size m + n given the means and variances for two subsamples of sizes m and n. This formula is used in the construction of a pairwise algorithm for computing the variance. Other applications are discussed as well, including the use of updating formulae in a parallel computing environment. We present numerical results and rounding error analyses for several numerical schemes. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf %R CS-TR-79-774 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Large scale geodetic least squares adjustment by dissection and orthogonal decomposition %A Golub, Gene H. %A Plemmons, Robert J. %D November 1979 %X Very large scale matrix problems currently arise in the context of accurately computing the coordinates of points on the surface of the earth. Here geodesists adjust the approximate values of these coordinates by computing least squares solutions to large sparse systems of equations which result from relating the coordinates to certain observations such as distances or angles between points. The purpose of this paper is to suggest an alternative to the formation and solution of the normal equations for these least squares adjustment problems. 
In particular, it is shown how a block-orthogonal decomposition method can be used in conjunction with a nested dissection scheme to produce an algorithm for solving such problems which combines efficient data management with numerical stability. As an indication of the magnitude that these least squares adjustment problems can sometimes attain, the forthcoming readjustment of the North American Datum in 1983 by the National Geodetic Survey is discussed. Here it becomes necessary to linearize and solve an overdetermined system of approximately 6,000,000 equations in 400,000 unknowns - a truly large-scale matrix problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/774/CS-TR-79-774.pdf %R CS-TR-79-775 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The analysis of sequential experiments with feedback to subjects %A Diaconis, Persi %A Graham, Ronald L. %D November 1979 %X A problem arising in taste testing, medical, and parapsychology experiments can be modeled as follows. A deck of n cards contains $c_i$ cards labeled i, $1 \leq i \leq r$. A subject guesses at the cards sequentially. After each guess the subject is told the card just guessed (or at least whether the guess was correct). We determine the optimal and worst case strategies for subjects and the distribution of the number of correct guesses under these strategies. We show how to use skill scoring to evaluate such experiments in a way which (asymptotically) does not depend on the subject's strategy. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/775/CS-TR-79-775.pdf %R CS-TR-79-777 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On constant weight codes and harmonious graphs %A Graham, Ronald L. %A Sloane, Neil J. A. %D December 1979 %X Very recently a new method has been developed for finding lower bounds on the maximum number of codewords possible in a code of minimum distance d and length n. This method has led in turn to a number of interesting questions in graph theory and additive number theory. In this brief survey we summarize some of these developments. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/777/CS-TR-79-777.pdf %R CS-TR-79-778 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A hierarchical associative architecture for the parallel evaluation of relational algebraic database primitives %A Shaw, David Elliot %D October 1979 %X Algorithms are described and analyzed for the efficient evaluation of the primitive operators of a relational algebra on a proposed non-von Neumann machine based on a hierarchy of associative storage devices. This architecture permits an O(log n) decrease in time complexity over the best known evaluation methods on a conventional computer system, without the use of redundant storage, and using currently available and potentially competitive technology. In many cases of practical import, the proposed architecture may also permit a significant improvement (by a factor roughly proportional to the capacity of the primary associative storage device) over the performance of previously implemented or proposed database machine architectures based on associative secondary storage devices. 
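For readers unfamiliar with the primitives in question, they can be stated as short sequential programs; the Python sketch below renders two of them, selection and natural join, using an illustrative list-of-dicts representation that is ours rather than the report's (the report evaluates these operators on associative hardware).

    # Sequential renderings of two relational-algebra primitives.
    # Representation and names are illustrative only.

    def select(relation, predicate):
        """sigma_p(R): keep the tuples satisfying the predicate."""
        return [t for t in relation if predicate(t)]

    def natural_join(r, s):
        """R join S: combine tuples agreeing on all shared attributes."""
        shared = set(r[0]) & set(s[0]) if r and s else set()
        index = {}
        for t in s:  # hash one relation on the shared attributes
            index.setdefault(tuple(t[a] for a in shared), []).append(t)
        return [{**t, **u}
                for t in r
                for u in index.get(tuple(t[a] for a in shared), [])]

    emps = [{"dept": 1, "name": "a"}, {"dept": 2, "name": "b"}]
    depts = [{"dept": 1, "floor": 3}]
    print(natural_join(select(emps, lambda t: t["dept"] == 1), depts))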
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/778/CS-TR-79-778.pdf %R CS-TR-79-781 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Exploring the use of domain knowledge for query processing efficiency %A King, Jonathan J. %D December 1979 %X An approach to query optimization is described that draws on two sources of knowledge: real world constraints on the values for the application domain served by the database; and knowledge about the current structure of the database and the cost of available retrieval processes. Real world knowledge is embodied in rules that are much like semantic integrity rules. The approach, called "query rephrasing", is to generate semantic equivalents of user queries that cost less to process than the original queries. The operation of a prototype system based on this approach is discussed in the context of simple queries which restrict a single file. The need for heuristics to limit the generation of equivalent queries is also discussed, and a method using "constraint thresholds" derived from a model of the retrieval process is proposed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/781/CS-TR-79-781.pdf %R CS-TR-79-816 %Z Mon, 19 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Automating the study of clinical hypotheses on a time-oriented data base: the RX Project %A Blum, Robert L. %D November 1979 %X The existence of large chronic disease data bases offers the possibility of studying hypotheses of major medical importance. An objective of the RX Project is to assist a clinical researcher with the tasks of experimental design and statistical analysis. A major component of RX is a knowledge base of medicine and statistics, organized as a frame-based, taxonomic tree. RX determines confounding variables, study design, and analytic techniques. It then gathers data, analyzes it, and interprets results. The American Rheumatism Association Medical Information System is used. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/816/CS-TR-79-816.pdf %R CS-TR-77-588 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On computing the singular value decomposition %A Chan, Tony Fan C. %D February 1977 %X The most well-known and widely-used algorithm for computing the Singular Value Decomposition (SVD) of an m x n rectangular matrix A nowadays is the Golub-Reinsch algorithm [1971]. In this paper, it is shown that by (1) first triangularizing the matrix A by Householder transformations before bidiagonalizing it, and (2) accumulating some left transformations on an n x n array instead of on an m x n array, the resulting algorithm is often more efficient than the Golub-Reinsch algorithm, especially for matrices with considerably more rows than columns (m >> n), such as in least squares applications. The two algorithms are compared in terms of operation counts, and computational experiments that have been carried out verify the theoretical comparisons. The modified algorithm is more efficient even when m is only slightly greater than n, and in some cases can achieve as much as 50% savings when m >> n. If accumulation of left transformations is desired, then $n^2$ extra storage locations are required (relatively small if m >> n), but otherwise no extra storage is required. The modified algorithm uses only orthogonal transformations and is therefore numerically stable. 
In the Appendix, we give the Fortran code of a hybrid method which automatically selects the more efficient of the two algorithms to use depending upon the input values for m and n. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/588/CS-TR-77-588.pdf %R CS-TR-77-589 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A knowledge-based system for the interpretation of protein x-ray crystallographic data %A Engelmore, Robert S. %A Nii, H. Penny %D February 1977 %X The broad goal of this project is to develop intelligent computational systems to infer the three-dimensional structures of proteins from x-ray crystallographic data. The computational systems under development use both formal and judgmental knowledge from experts to select appropriate procedures and to constrain the space of plausible protein structures. The hypothesis generating and testing procedures operate upon a variety of representations of the data, and work with several different descriptions of the structure being inferred. The system consists of a number of independent but cooperating knowledge sources which propose, augment and verify a solution to the problem as it is incrementally generated. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/589/CS-TR-77-589.pdf %R CS-TR-77-593 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Explanation capabilities of production-based consultation systems %A Scott, A. Carlisle %A Clancey, William J. %A Davis, Randall %A Shortliffe, Edward H. %D February 1977 %X A computer program that models an expert in a given domain is more likely to be accepted by experts in that domain, and by non-experts seeking its advice, if the system can explain its actions. An explanation capability not only adds to the system's credibility, but also enables the non-expert user to learn from it. Furthermore, clear explanations allow an expert to check the system's "reasoning", possibly discovering the need for refinements and additions to the system's knowledge base. In a developing system, an explanation capability can be used as a debugging aid to verify that additions to the system are working as they should. This paper discusses the general characteristics of explanation systems: what types of explanations they should be able to give, what types of knowledge will be needed in order to give these explanations, and how this knowledge might be organized. The explanation facility in MYCIN is discussed as an illustration of how the various problems might be approached. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/593/CS-TR-77-593.pdf %R CS-TR-77-596 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A review of knowledge based problem solving as a basis for a genetics experiment designing system %A Stefik, Mark J. %A Martin, Nancy %D February 1977 %X It is generally accepted that problem solving systems require a wealth of domain specific knowledge for effective performance in complex domains. This report takes the view that all domain specific knowledge should be expressed in a knowledge base. With this in mind, the ideas and techniques from problem solving and knowledge base research are reviewed and outstanding problems are identified. 
Finally, a task domain is characterized in terms of objects, actions, and control/strategy knowledge and suggestions are made for creating a uniform knowledge base management system to be used for knowledge acquisition, problem solving, and explanation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/596/CS-TR-77-596.pdf %R CS-TR-77-597 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Model-directed learning of production rules %A Buchanan, Bruce G. %A Mitchell, Tom M. %D March 1977 %X The Meta-DENDRAL program is described in general terms that are intended to clarify the similarities to, and differences from, other learning programs. Its approach of model-directed heuristic search through a complex space of possible rules appears well suited to many induction tasks. The use of a strong model of the domain to direct the rule search has been demonstrated for rule formation in two areas of chemistry. The high performance of programs which use the generated rules attests to the success of this learning strategy. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/597/CS-TR-77-597.pdf %R CS-TR-77-602 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The numerically stable reconstruction of a Jacobi matrix from spectral data %A Boor, Carl de %A Golub, Gene H. %D March 1977 %X We show how to construct, from certain spectral data, a discrete inner product for which the associated sequence of monic orthogonal polynomials coincides with the sequence of appropriately normalized characteristic polynomials of the left principal submatrices of the Jacobi matrix. The generation of these orthogonal polynomials via their three term recurrence relation, as popularized by Forsythe, then provides a stable means of computing the entries of the Jacobi matrix. The resulting algorithm might be of help in the approximate solution of inverse eigenvalue problems for Sturm-Liouville equations. Our construction provides, incidentally, very simple proofs of known results concerning existence and uniqueness of a Jacobi matrix satisfying given spectral data and its continuous dependence on that data. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/602/CS-TR-77-602.pdf %R CS-TR-77-603 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Reference machines require non-linear time to maintain disjoint sets %A Tarjan, Robert Endre %D March 1977 %X This paper describes a machine model intended to be useful in deriving realistic complexity bounds for tasks requiring list processing. As an example of the use of the model, the paper shows that any such machine requires non-linear time in the worst case to compute unions of disjoint sets on-line. All set union algorithms known to me are instances of the model and are thus subject to the derived bound. One of the known algorithms achieves the bound to within a constant factor. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/603/CS-TR-77-603.pdf %R CS-TR-77-604 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Control of the dissipativity of Lax-Wendroff type methods for first order systems of hyperbolic equations %A Chan, Tony Fan C. %A Oliger, Joseph %D March 1977 %X Lax-Wendroff methods for hyperbolic systems have two characteristics which are sometimes troublesome. 
They are sometimes too dissipative -- they may smooth the solution excessively -- and their dissipative behavior does not affect all modes of the solution equally. Both of these difficulties can be remedied by adding properly chosen accretive terms. We develop modifications of the Lax-Wendroff method which equilibrate the dissipativity over the fundamental modes of the solution and allow the magnitude of the dissipation to be controlled. We show that these methods are stable for the mixed initial boundary value problem and develop analogous formulations for the two-step Lax-Wendroff and MacCormack methods. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/604/CS-TR-77-604.pdf %R CS-TR-77-605 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A model for learning systems %A Smith, Reid G. %A Mitchell, Tom M. %A Chestek, Richard A. %A Buchanan, Bruce G. %D March 1977 %X A model for learning systems is presented, and representative AI, pattern recognition, and control systems are discussed in terms of its framework. The model details the functional components felt to be essential for any learning system, independent of the techniques used for its construction, and the specific environment in which it operates. These components are performance element, instance selector, critic, learning element, blackboard, and world model. Consideration of learning system design leads naturally to the concept of a layered system, each layer operating at a different level of abstraction. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/605/CS-TR-77-605.pdf %R CS-TR-77-606 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A programming and problem-solving seminar %A Clancy, Michael J. %A Knuth, Donald E. %D April 1977 %X This report contains edited transcripts of the discussions held in Stanford's course CS 204, Problem Seminar, during autumn quarter 1976. Since the topics span a large range of ideas in computer science, and since most of the important research paradigms and programming paradigms came up during the discussions, the notes may be of use to graduate students of computer science at other universities, as well as to their professors and to professional people in the "real world". %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/606/CS-TR-77-606.pdf %R CS-TR-77-607 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Specifications and proofs for abstract data types in concurrent programs %A Owicki, Susan S. %D April 1977 %X Shared abstract data types, such as queues and buffers, are useful tools for building well-structured concurrent programs. This paper presents a method for specifying shared types in a way that simplifies concurrent program verification. The specifications describe the operations of the shared type in terms of their effect on variables of the process invoking the operation. This makes it possible to verify the processes independently, reducing the complexity of the proof. The key to defining such specifications is the concept of a private variable: a variable which is part of a shared object but belongs to just one process. Shared types can be implemented using an extended form of monitors; proof rules are given for verifying that a monitor correctly implements its specifications. Finally, it is shown how concurrent programs can be verified using the specifications of their shared types. 
The specification and proof techniques are illustrated with a number of examples involving a shared bounded buffer. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/607/CS-TR-77-607.pdf %R CS-TR-77-609 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Complexity of combinatorial algorithms %A Tarjan, Robert Endre %D April 1977 %X This paper examines recent work on the complexity of combinatorial algorithms, highlighting the aims of the work, the mathematical tools used, and the important results. Included are sections discussing ways to measure the complexity of an algorithm, methods for proving that certain problems are very hard to solve, tools useful in the design of good algorithms, and recent improvements in algorithms for solving ten representative problems. The final section suggests some directions for future research. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/609/CS-TR-77-609.pdf %R CS-TR-77-611 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The logic of computer programming %A Manna, Zohar %A Waldinger, Richard J. %D August 1977 %X Techniques derived from mathematical logic promise to provide an alternative to the conventional methodology for constructing, debugging, and optimizing computer programs. Ultimately, these techniques are intended to lead to the automation of many of the facets of the programming process. This paper provides a unified tutorial exposition of the logical techniques, illustrating each with examples. The strengths and limitations of each technique as a practical programming aid are assessed and attempts to implement these methods in experimental systems are discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/611/CS-TR-77-611.pdf %R CS-TR-77-614 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The convergence of functions to fixedpoints of recursive definitions %A Manna, Zohar %A Shamir, Adi %D May 1977 %X The classical method for constructing the least fixedpoint of a recursive definition is to generate a sequence of functions whose initial element is the totally undefined function and which converges to the desired least fixedpoint. This method, due to Kleene, cannot be generalized to allow the construction of other fixedpoints. In this paper we present an alternate definition of convergence and a new fixedpoint access method of generating sequences of functions for a given recursive definition. The initial function of the sequence can be an arbitrary function, and the sequence will always converge to a fixedpoint that is "close" to the initial function. This defines a monotonic mapping from the set of partial functions onto the set of all fixedpoints of the given recursive definition. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/614/CS-TR-77-614.pdf %R CS-TR-77-615 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Numerical methods for the first biharmonic equation and for the two-dimensional Stokes problem %A Glowinski, Roland %A Pironneau, Olivier %D May 1977 %X We describe in this report various methods, iterative and "almost direct," for solving the first biharmonic problem on general two-dimensional domains once the continuous problem has been approximated by an appropriate mixed finite element method. 
Using the approach described in this report we recover some well known methods for solving the first biharmonic equation as a system of coupled harmonic equations, but some of the methods discussed here are completely new, including a conjugate gradient type algorithm. In the last part of this report we discuss the extension of the above methods to the numerical solution of the two dimensional Stokes problem in p-connected domains (p $\geq$ 1) through the stream function-vorticity formulation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/615/CS-TR-77-615.pdf %R CS-TR-77-616 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Stability of the Fourier method %A Kreiss, Heinz-Otto %A Oliger, Joseph %D August 1977 %X In this paper we develop a stability theory for the Fourier (or pseudo-spectral) method for linear hyperbolic and parabolic partial differential equations with variable coefficients. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/616/CS-TR-77-616.pdf %R CS-TR-77-618 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A production system for automatic deduction %A Nilsson, Nils J. %D August 1977 %X A new predicate calculus deduction system based on production rules is proposed. The system combines several developments in Artificial Intelligence and Automatic Theorem Proving research including the use of domain-specific inference rules and separate mechanisms for forward and backward reasoning. It has a clean separation between the data base, the production rules, and the control system. Goals and subgoals are maintained in an AND/OR tree to represent assertions. The production rules modify these structures until they "connect" in a fashion that proves the goal theorem. Unlike some previous systems that used production rules, ours is not limited to rules in Horn Clause form. Unlike previous PLANNER-like systems, ours can handle the full range of predicate calculus expressions including those with quantified variables, disjunctions and negations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/618/CS-TR-77-618.pdf %R CS-TR-77-619 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Time-space trade-offs in a pebble game %A Paul, Wolfgang J. %A Tarjan, Robert Endre %D July 1977 %X A certain pebble game on graphs has been studied in various contexts as a model for the time and space requirements of computations. In this note it is shown that there exists a family of directed acyclic graphs $G_n$ and constants $c_1$, $c_2$, $c_3$ such that (1) $G_n$ has n nodes and each node in $G_n$ has indegree at most 2. (2) Each graph $G_n$ can be pebbled with $c_1\sqrt{n}$ pebbles in n moves. (3) Each graph $G_n$ can also be pebbled with $c_2\sqrt{n}$ pebbles, $c_2$ < $c_1$, but every strategy which achieves this has at least $2^{c_3\sqrt{n}}$ moves. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/619/CS-TR-77-619.pdf %R CS-TR-77-621 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The art of artificial intelligence: I. Themes and case studies of knowledge engineering %A Feigenbaum, Edward A. %D August 1977 %X The knowledge engineer practices the art of bringing the principles and tools of AI research to bear on difficult applications problems requiring experts' knowledge for their solution. 
The technical issues of acquiring this knowledge, representing it, and using it appropriately to construct and explain lines-of-reasoning, are important problems in the design of knowledge-based systems. Various systems that have achieved expert level performance in scientific and medical inference illuminate the art of knowledge engineering and its parent science, Artificial Intelligence. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/621/CS-TR-77-621.pdf %R CS-TR-77-624 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Recent research in computer science. %A McCarthy, John %A Binford, Thomas O. %A Green, Cordell C. %A Luckham, David C. %A Manna, Zohar %A Winograd, Terry A. %A Earnest, Lester D. %D June 1977 %X This report summarizes recent accomplishments in six related areas: (1) basic AI research and formal reasoning, (2) image understanding, (3) mathematical theory of computation, (4) program verification, (5) natural language understanding, and (6) knowledge based programming. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/624/CS-TR-77-624.pdf %R CS-TR-77-625 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A fast merging algorithm %A Brown, Mark R. %A Tarjan, Robert Endre %D August 1977 %X We give an algorithm which merges sorted lists represented as balanced binary trees. If the lists have lengths m and n (m $\leq$ n), then the merging procedure runs in O(m log(n/m)) steps, which is the same order as the lower bound on all comparison-based algorithms for this problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/625/CS-TR-77-625.pdf %R CS-TR-77-626 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the loop switching addressing problem %A Yao, Andrew Chi-Chih %D October 1977 %X The following graph addressing problem was studied by Graham and Pollak in devising a routing scheme for Pierce's Loop Switching Network. Let G be a graph with n vertices. It is desired to assign to each vertex $v_i$ an address in $\{0,1,*\}^\ell$, such that the Hamming distance between the addresses of any two vertices agrees with their distance in G. Let N(G) be the minimum length $\ell$ for which an assignment is possible. It was shown by Graham and Pollak that $N(G) \leq m_G(n-1)$, where $m_G$ is the diameter of G. In the present paper, we shall prove that $N(G) \leq 1.09(\lg m_G)n + 8n$ by an explicit construction. This shows in particular that any graph has an addressing scheme of length O(n log n). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/626/CS-TR-77-626.pdf %R CS-TR-77-627 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A separator theorem for planar graphs %A Lipton, Richard J. %A Tarjan, Robert Endre %D October 1977 %X Let G be any n-vertex planar graph. We prove that the vertices of G can be partitioned into three sets A,B,C such that no edge joins a vertex in A with a vertex in B, neither A nor B contains more than 2n/3 vertices, and C contains no more than $2\sqrt{2}\sqrt{n}$ vertices. We exhibit an algorithm which finds such a partition A,B,C in O(n) time. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/627/CS-TR-77-627.pdf %R CS-TR-77-628 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Applications of a planar separator theorem %A Lipton, Richard J. 
%A Tarjan, Robert Endre %D October 1977 %X Any n-vertex planar graph has the property that it can be divided into components of roughly equal size by removing only O($\sqrt{n}$) vertices. This separator theorem, in combination with a divide-and-conquer strategy, leads to many new complexity results for planar graph problems. This paper describes some of these results. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/628/CS-TR-77-628.pdf %R CS-TR-77-629 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The complexity of pattern matching for a random string %A Yao, Andrew Chi-Chih %D October 1977 %X We study the average-case complexity of finding all occurrences of a given pattern $\alpha$ in an input text string. Over an alphabet of q symbols, let c($\alpha$,n) be the minimum average number of characters that need to be examined in a random text string of length n. We prove that, for large m, almost all patterns $\alpha$ of length m satisfy c($\alpha$,n) = $\Theta(\lceil \log_q((n-m)/\ln m + 2) \rceil)$ if $m \leq n \leq 2m$, and c($\alpha$,n) = $\Theta((\lceil \log_q m \rceil / m) n)$ if n > 2m. This in particular confirms a conjecture raised in a recent paper by Knuth, Morris, and Pratt [1977]. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/629/CS-TR-77-629.pdf %R CS-TR-77-631 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Inference rules for program annotation %A Dershowitz, Nachum %A Manna, Zohar %D October 1977 %X Methods are presented whereby an Algol-like program, given together with its specifications, can be documented automatically. The program is incrementally annotated with invariant relationships that hold between program variables at intermediate points in the program and explain the actual workings of the program regardless of whether the program is correct. Thus this documentation can be used for proving the correctness of the program or may serve as an aid in the debugging of an incorrect program. The annotation techniques are formulated as Hoare-like inference rules which derive invariants from the assignment statements, from the control structure of the program, or, heuristically, from suggested invariants. The application of these rules is demonstrated by two examples which have run on an experimental implementation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/631/CS-TR-77-631.pdf %R CS-TR-77-634 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A new proof of global convergence for the tridiagonal QL algorithm %A Hoffmann, Walter %A Parlett, Beresford N. %D October 1977 %X By exploiting the relation of the QL algorithm to inverse iteration we obtain a proof of global convergence which is more conceptual and less computational than previous analyses. The proof uses a new, but simple, error estimate for the first step of inverse iteration. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/634/CS-TR-77-634.pdf %R CS-TR-77-635 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A block Lanczos method to compute the singular values and corresponding singular vectors of a matrix %A Golub, Gene H. %A Luk, Franklin T. %A Overton, Michael L. %D October 1977 %X We present a block Lanczos method to compute the largest singular values and corresponding left and right singular vectors of a large sparse matrix. 
Our algorithm does not transform the matrix A but accesses it only through a user-supplied routine which computes AX or $A^t$X for a given matrix X. This paper also includes a thorough discussion of the various ways to compute the singular value decomposition of a banded upper triangular matrix; this problem arises as a subproblem to be solved during the block Lanczos procedure. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/635/CS-TR-77-635.pdf %R CS-TR-77-636 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T $C^m$ convergence of trigonometric interpolants %A Bube, Kenneth P. %D October 1977 %X For m $\geq$ 0, we obtain sharp estimates of the uniform accuracy of the m-th derivative of the n-point trigonometric interpolant of a function for two classes of periodic functions on R. As a corollary, the n-point interpolant of a function in $C^k$ uniformly approximates the function to order o($n^{1/2-k}$), improving the recent estimate of O($n^{1-k}$). These results remain valid if we replace the trigonometric interpolant by its K-th partial sum, replacing n by K in the estimates. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/636/CS-TR-77-636.pdf %R CS-TR-77-637 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the gap structure of sequences of points on a circle %A Ramshaw, Lyle H. %D November 1977 %X Considerable mathematical effort has gone into studying sequences of points in the interval (0,1) which are evenly distributed, in the sense that certain intervals contain roughly the correct percentages of the first n points. This paper explores the related notion in which a sequence is evenly distributed if its first n points split a given circle into intervals which are roughly equal in length, regardless of their relative positions. The sequence $x_k$ = ($\log_2$(2k-1) mod 1) was introduced in this context by de Bruijn and Erdős. We will see that the gap structure of this sequence is uniquely optimal in a certain sense, and optimal under a wide class of measures. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/637/CS-TR-77-637.pdf %R CS-TR-77-638 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A generalized conjugate gradient algorithm for solving a class of quadratic programming problems %A O'Leary, Dianne Prost %D December 1977 %X In this paper we apply matrix splitting techniques and a conjugate gradient algorithm to the problem of minimizing a convex quadratic form subject to upper and lower bounds on the variables. This method exploits sparsity structure in the matrix of the quadratic form. Choices of the splitting operator are discussed and convergence results are established. We present the results of numerical experiments showing the effectiveness of the algorithm on free boundary problems for elliptic partial differential equations, and we give comparisons with other algorithms. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/638/CS-TR-77-638.pdf %R CS-TR-77-639 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On program synthesis knowledge %A Green, Cordell C. %A Barstow, David R. %D November 1977 %X This paper presents a body of program synthesis knowledge dealing with array operations, space reutilization, the divide and conquer paradigm, conversion from recursive paradigms to iterative paradigms, and ordered set enumerations. 
Such knowledge can be used for the synthesis of efficient and in-place sorts including quicksort, mergesort, sinking sort, and bubble sort, as well as other ordered set operations such as set union, element removal, and element addition. The knowledge is explicated to a level of detail such that it is possible to codify this knowledge as a set of program synthesis rules for use by a computer-based synthesis system. The use and content of this set of programming rules is illustrated herein by the methodical synthesis of bubble sort, sinking sort, quicksort, and mergesort. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/639/CS-TR-77-639.pdf %R CS-TR-77-640 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Structured programming with recursion %A Manna, Zohar %A Waldinger, Richard J. %D January 1978 %X No abstract available. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/640/CS-TR-77-640.pdf %R CS-TR-77-642 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On constructing minimum spanning trees in k-dimensional spaces and related problems %A Yao, Andrew Chi-Chih %D December 1977 %X The problem of finding a minimum spanning tree connecting n points in a k-dimensional space is discussed under three common distance metrics -- Euclidean, rectilinear, and $L_\infty$. By employing a subroutine that solves the post office problem, we show that, for fixed k $\geq$ 3, such a minimum spanning tree can be found in time O($n^{2-a(k)} {(log n)}^{1-a(k)}$), where a(k) = $2^{-(k+1)}$. The bound can be improved to O(${(n log n)}^{1.8}$) for points in the 3-dimensional Euclidean space. We also obtain o($n^2$) algorithms for finding a farthest pair in a set of n points and for other related problems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/642/CS-TR-77-642.pdf %R CS-TR-77-645 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Generalized nested dissection %A Lipton, Richard J. %A Rose, Donald J. %A Tarjan, Robert Endre %D December 1977 %X J. A. George has discovered a method, called nested dissection, for solving a system of linear equations defined on an n = k $\times$ k square grid in O(n log n) space and O($n^{3/2}$) time. We generalize this method without degrading the time and space bounds so that it applies to any system of equations defined on a planar or almost-planar graph. Such systems arise in the solution of two-dimensional finite element problems. Our method uses the fact that planar graphs have good separators. More generally, we show that sparse Gaussian elimination is efficient for any class of graphs which have good separators, and conversely that graphs without good separators (including almost all sparse graphs) are not amenable to sparse Gaussian elimination. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/645/CS-TR-77-645.pdf %R CS-TR-77-646 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Fast decision algorithms based on congruence closure %A Nelson, Charles Gregory %A Oppen, Derek C. %D February 1978 %X We define the notion of the 'congruence closure' of a relation on a graph and give a simple algorithm for computing it. We then give decision procedures for the quantifier-free theory of equality and the quantifier-free theory of LISP list structure, both based on this algorithm. 
The procedures are fast enough to be practical in mechanical theorem proving: each procedure determines the satisfiability of a conjunction of length n of literals in time O($n^2$). We also show that if the axiomatization of the theory of list structure is changed slightly, the problem of determining the satisfiability of a conjunction of literals becomes NP-complete. We have implemented the decision procedures in our simplifier for the Stanford Pascal Verifier. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/646/CS-TR-77-646.pdf %R CS-TR-77-647 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A lower bound to palindrome recognition by probabilistic Turing machines %A Yao, Andrew Chi-Chih %D December 1977 %X We call attention to the problem of proving lower bounds on probabilistic Turing machine computations. It is shown that any probabilistic Turing machine recognizing the language L = {w$\phi$w | w $\in \{0,1\}^*$} with error $\lambda$ < 1/2 must take $\Omega$(n log n) time. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/647/CS-TR-77-647.pdf %R CS-TR-77-432 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A users manual for FOL. %A Weyhrauch, Richard W. %D July 1977 %X This manual explains how to use the proof checker FOL, and supersedes all previous manuals. FOL checks proofs of a natural deduction style formulation of first order functional calculus with equality augmented in the following ways: (i) it is a many-sorted first-order logic in which a partial order over the sorts may be specified; (ii) conditional expressions are allowed for forming terms; (iii) axiom schemata with predicate and function parameters are allowed; (iv) purely propositional deductions can be made in a single step; (v) a partial model of the language can be built in a LISP environment and some deductions can be made by direct computation in this model; (vi) there is a limited ability to make metamathematical arguments; (vii) there are many operational conveniences. A major goal of FOL is to create an environment where formal proofs can be carefully examined with the eventual aim of designing practical tools for manipulating proofs in pure mathematics and about the correctness of programs. This includes checking proofs generated by other programs. FOL is also a research tool in modeling common-sense reasoning including reasoning about knowledge and belief. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/77/432/CS-TR-77-432.pdf %R CS-TR-76-533 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A generalized conjugate gradient method for the numerical solution of elliptic partial differential equations %A Concus, Paul %A Golub, Gene H. %A O'Leary, Dianne Prost %D January 1976 %X We consider a generalized conjugate gradient method for solving sparse, symmetric, positive-definite systems of linear equations, principally those arising from the discretization of boundary value problems for elliptic partial differential equations. The method is based on splitting off from the original coefficient matrix a symmetric, positive-definite one that corresponds to a more easily solvable system of equations, and then accelerating the associated iteration using conjugate gradients. Optimality and convergence properties are presented, and the relation to other methods is discussed. 
Several splittings for which the method seems particularly effective are also discussed, and for some, numerical examples are given. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/533/CS-TR-76-533.pdf %R CS-TR-76-535 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A generalized conjugate gradient method for nonsymmetric systems of linear equations %A Concus, Paul %A Golub, Gene H. %D January 1976 %X We consider a generalized conjugate gradient method for solving systems of linear equations having nonsymmetric coefficient matrices with positive-definite symmetric part. The method is based on splitting the matrix into its symmetric and skew-symmetric parts, and then accelerating the associated iteration using conjugate gradients, which simplifies in this case, as only one of the two usual parameters is required. The method is most effective for cases in which the symmetric part of the matrix corresponds to an easily solvable system of equations. Convergence properties are discussed, as well as an application to the numerical solution of elliptic partial differential equations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/535/CS-TR-76-535.pdf %R CS-TR-76-540 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Addition chains with multiplicative cost %A Graham, Ronald L. %A Yao, Andrew Chi-Chih %A Yao, F. Frances %D January 1976 %X If each step in an addition chain is assigned a cost equal to the product of the numbers added at that step, "binary" addition chains are shown to minimize total cost. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/540/CS-TR-76-540.pdf %R CS-TR-76-542 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The theoretical aspects of the optimal fixedpoint %A Manna, Zohar %A Shamir, Adi %D March 1976 %X In this paper we define a new type of fixedpoint of recursive definitions and investigate some of its properties. This optimal fixedpoint (which always uniquely exists) contains, in some sense, the maximal amount of "interesting" information which can be extracted from the recursive definition, and it may be strictly more defined than the program's least fixedpoint. This fixedpoint can be the basis for assigning a new semantics to recursive programs. This is a modified and extended version of part 1 of a paper presented at the Symposium on Theory of Computing, Albuquerque, New Mexico (May 1975). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/542/CS-TR-76-542.pdf %R CS-TR-76-543 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Optimal polyphase sorting %A Zave, Derek A. %D March 1976 %X A read-forward polyphase merge algorithm is described which performs the polyphase merge starting from an arbitrary string distribution. This algorithm minimizes the volume of information moved. Since this volume is easily computed, it is possible to construct dispersion algorithms which anticipate the merge algorithm. Two such dispersion techniques are described. The first algorithm requires that the number of strings to be dispersed be known in advance; this algorithm is optimal. The second algorithm makes no such requirement, but is not always optimal. In addition, performance estimates are derived and both algorithms are shown to be asymptotically optimal. 
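The perfect string distributions that such dispersion algorithms anticipate follow a generalized Fibonacci pattern and can be generated by running merge phases backwards from the final single-run state; the Python sketch below does this for an arbitrary number of tapes (a sketch of the classical construction only, with variable names of our choosing, not the report's algorithm).

    # Generate perfect polyphase run distributions by reversing merge
    # phases, starting from the final state of one sorted run.  For
    # three tapes the run totals follow the Fibonacci numbers.
    def perfect_distributions(tapes, levels):
        dist = [1] + [0] * (tapes - 1)
        out = [tuple(dist)]
        for _ in range(levels):
            j = dist.index(max(dist))     # output tape of the last phase
            m = dist[j]                   # number of runs it produced
            dist = [c + m for c in dist]  # un-merge: inputs regain m runs
            dist[j] = 0                   # the output tape was empty before
            out.append(tuple(dist))
        return out

    for d in perfect_distributions(3, 6):
        print(d, sum(d))   # totals: 1, 2, 3, 5, 8, 13, 21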
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/543/CS-TR-76-543.pdf %R CS-TR-76-544 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Removing trivial assignments from programs %A Mont-Reynaud, Bernard %D March 1976 %X An assignment X $\leftarrow$ Y in a program is "trivial" when both X and Y are simple program variables. The paper describes a transformation which removes all such assignments from a program P, producing a program P' which executes faster than P but usually has a larger size. The number of variables used by P' is also minimized. Worst-case analysis of the transformation algorithm leads to nonpolynomial bounds. Such inefficiency, however, does not arise in typical situations, and the technique appears to be of interest for practical compiler optimization. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/544/CS-TR-76-544.pdf %R CS-TR-76-545 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Space bounds for a game on graphs %A Paul, Wolfgang J. %A Tarjan, Robert Endre %A Celoni, James R. %D March 1976 %X We study a one-person game played by placing pebbles, according to certain rules, on the vertices of a directed graph. In [J. Hopcroft, W. Paul, and L. Valiant, "On time versus space and related problems," Proc. 16th Annual Symp. on Foundations of Computer Science (1975), pp.57-64] it was shown that for each graph with n vertices and maximum in-degree d, there is a pebbling strategy which requires at most c(d) n/log n pebbles. Here we show that this bound is tight to within a constant factor. We also analyze a variety of pebbling algorithms, including one which achieves the O(n/log n) bound. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/545/CS-TR-76-545.pdf %R CS-TR-76-547 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Iterative algorithms for global flow analysis %A Tarjan, Robert Endre %D March 1976 %X This paper studies iterative methods for the global flow analysis of computer programs. We define a hierarchy of global flow problem classes, each solvable by an appropriate generalization of the "node listing" method of Kennedy. We show that each of these generalized methods is optimum, among all iterative algorithms, for solving problems within its class. We give lower bounds on the time required by iterative algorithms for each of the problem classes. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/547/CS-TR-76-547.pdf %R CS-TR-76-549 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Automatic program verification V: verification-oriented proof rules for arrays, records and pointers %A Luckham, David C. %A Suzuki, Norihisa %D March 1976 %X A practical method is presented for automating in a uniform way the verification of Pascal programs that operate on the standard Pascal data structures ARRAY, RECORD, and POINTER. New assertion language primitives are introduced for describing computational effects of operations on these data structures. Axioms defining the semantics of the new primitives are given. Proof rules for standard Pascal operations on pointer variables are then defined in terms of the extended assertion language. Similar rules for records and arrays are special cases. An extensible axiomatic rule for the Pascal memory allocation operation, NEW, is also given. These rules have been implemented in the Stanford Pascal program verifier. 
Examples illustrating the verification of programs which operate on list structures implemented with pointers and records are discussed. These include programs with side-effects. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/549/CS-TR-76-549.pdf %R CS-TR-76-550 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Finding a maximum independent set %A Tarjan, Robert Endre %A Trojanowski, Anthony E. %D June 1976 %X We present an algorithm which finds a maximum independent set in an n-vertex graph in O($2^{n/3}$) time. The algorithm can thus handle graphs roughly three times as large as could be analyzed using a naive algorithm. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/550/CS-TR-76-550.pdf %R CS-TR-76-551 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The State of the Art of Computer Programming %A Knuth, Donald E. %D June 1976 %X This report lists all corrections and changes to volumes 1 and 3 of "The Art of Computer Programming," as of May 14, 1976. The changes apply to the most recent printings of both volumes (February and March, 1975); if you have an earlier printing there have been many other changes not indicated here. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/551/CS-TR-76-551.pdf %R CS-TR-76-553 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Complexity of monotone networks for computing conjunctions %A Tarjan, Robert Endre %D June 1976 %X Let $F_1$, $F_2$,..., $F_m$ be a set of Boolean functions of the form $F_i = \wedge \{x \mid x \in X_i\}$, where $\wedge$ denotes conjunction and each $X_i$ is a subset of a set X of n Boolean variables. We study the size of monotone Boolean networks for computing such sets of functions. We exhibit anomalous sets of conjunctions whose smallest monotone networks contain disjunctions. We show that if |$F_i$| is sufficiently small for all i, such anomalies cannot happen. We exhibit sets of m conjunctions in n unknowns which require $c_2 m \alpha(m,n)$ binary conjunctions, where $\alpha$(m,n) is a very slowly growing function related to a functional inverse of Ackermann's function. This class of examples shows that an algorithm given in [STAN-CS-75-512] for computing functions defined on paths in trees is optimum to within a constant factor. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/553/CS-TR-76-553.pdf %R CS-TR-76-555 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Monte Carlo simulation of tolerancing in discrete parts manufacturing and assembly %A Grossman, David D. %D May 1976 %X The assembly of discrete parts is strongly affected by imprecise components, imperfect fixtures and tools, and inexact measurements. It is often necessary to design higher precision into the manufacturing and assembly process than is functionally needed in the final product. Production engineers must trade off between alternative ways of selecting individual tolerances in order to achieve minimum cost while preserving product integrity. This paper describes a comprehensive Monte Carlo method for systematically analysing the stochastic implications of tolerancing and related forms of imprecision. The method is illustrated by four examples, one of which is chosen from the field of assembly by computer controlled manipulators. 
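The core of such a Monte Carlo tolerance analysis can be conveyed in a few lines; the Python sketch below estimates the defect rate of a hypothetical shaft-in-hole fit whose nominal dimensions, tolerances, and clearance spec are invented for illustration and do not come from the report.

    import random

    def clearance_defect_rate(trials=100_000):
        # Tolerance stack-up for a hypothetical two-part fit: dimensions
        # are drawn from normal distributions modeling process spread,
        # and we count assemblies whose clearance falls below spec.
        failures = 0
        for _ in range(trials):
            hole = random.gauss(10.00, 0.02)   # nominal 10.00, sigma 0.02
            shaft = random.gauss(9.95, 0.02)   # nominal 9.95, sigma 0.02
            if hole - shaft < 0.01:            # minimum usable clearance
                failures += 1
        return failures / trials

    print(clearance_defect_rate())   # estimated fraction of bad assemblies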
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/555/CS-TR-76-555.pdf %R CS-TR-76-558 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Is "sometime" sometimes better than "always"? Intermittent assertions in proving program correctness %A Manna, Zohar %A Waldinger, Richard J. %D March 1977 %X This paper explores a technique for proving the correctness and termination of programs simultaneously. This approach, which we call the intermittent-assertion method, involves documenting the program with assertions that must be true at some time when control passes through the corresponding point, but that need not be true every time. The method, introduced by Burstall, promises to provide a valuable complement to the more conventional methods. We first introduce the intermittent-assertion method with a number of examples of correctness and termination proofs. Some of these proofs are markedly simpler than their conventional counterparts. On the other hand, we show that a proof of correctness or termination by any of the conventional techniques can be rephrased directly as a proof using intermittent assertions. Finally, we show how the intermittent assertion method can be applied to prove the validity of program transformations and the correctness of continuously operating programs. This is a revised and simplified version of a previous paper with the same title (AIM-281, June 1976). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/558/CS-TR-76-558.pdf %R CS-TR-76-559 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Rank degeneracy and least squares problems %A Golub, Gene H. %A Klema, Virginia C. %A Stewart, Gilbert W. %D August 1976 %X This paper is concerned with least squares problems when the least squares matrix A is near a matrix that is not of full rank. A definition of numerical rank is given. It is shown that under certain conditions when A has numerical rank r there is a distinguished r dimensional subspace of the column space of A that is insensitive to how it is approximated by r independent columns of A. The consequences of this fact for the least squares problem are examined. Algorithms are described for approximating the stable part of the column space of A. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/559/CS-TR-76-559.pdf %R CS-TR-76-561 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Mathematical Programming Language -- user's guide %A Woods, Donald R. %D August 1976 %X Mathematical Programming Language (MPL) is a programming language specifically designed for the implementation of mathematical software and, in particular, experimental mathematical programming software. In the past there has been a wide gulf between the applied mathematicians who design mathematical algorithms (but often have little appreciation of the fine points of computing) and the professional programmer, who may have little or no understanding of the mathematics of the problem he is programming. The result is that a vast number of mathematical algorithms have been devised and published, with only a small fraction being actually implemented and experimentally compared on selected representative problems. MPL is designed to be as close as possible to the terminology used by the mathematician while retaining as far as possible programming sophistications which make for good software systems. The result is a programming language which (hopefully!) 
allows the writing of clear, concise, easily read programs, especially by persons who are not professional programmers. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/561/CS-TR-76-561.pdf %R CS-TR-76-568 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Exploratory study of computer integrated assembly systems. Progress report 3, covering the period December 1, 1975 to July 31, 1976 %A Binford, Thomas O. %A Grossman, David D. %A Liu, C. Richard %A Bolles, Robert C. %A Finkel, Raphael A. %A Mujtaba, M. Shahid %A Roderick, Michael D. %A Shimano, Bruce E. %A Taylor, Russell H. %A Goldman, Ronald H. %A Jarvis, J. Pitts, III %A Scheinman, Victor D. %A Gafford, Thomas A. %D August 1976 %X The Computer Integrated Assembly Systems project is concerned with developing the software technology of programmable assembly devices, including computer controlled manipulators and vision systems. A complete hardware system has been implemented that includes manipulators with tactile sensors and TV cameras, tools, fixtures, and auxiliary devices, a dedicated minicomputer, and a time-shared large computer equipped with graphic display terminals. An advanced software system called AL has been developed that can be used to program assembly applications. Research currently underway includes refinement of AL, development of improved languages and interactive programming techniques for assembly and vision, extension of computer vision to areas which are currently infeasible, geometric modeling of objects and constraints, assembly simulation, control algorithms, and adaptive methods of calibration. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/568/CS-TR-76-568.pdf %R CS-TR-76-569 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Calculation of interpolating natural spline functions using de Boor's package for calculating with B-splines %A Herriot, John G. %D October 1976 %X A FORTRAN subroutine is described for finding interpolating natural splines of odd degree for an arbitrary set of data points. The subroutine makes use of several of the subroutines in de Boor's package for calculating with B-splines. An Algol W translation of the interpolating natural spline subroutine and of the required subroutines of the de Boor package is also given. Timing tests and accuracy tests for the routines are described. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/569/CS-TR-76-569.pdf %R CS-TR-76-572 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An FOL primer %A Filman, Robert E. %A Weyhrauch, Richard W. %D September 1976 %X This primer is an introduction to FOL, an interactive proof checker for first order logic. Its examples can be used to learn the FOL system, or read independently for a flavor of our style of interactive proof checking. Several example proofs are presented, successively increasing in the complexity of the FOL commands employed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/572/CS-TR-76-572.pdf %R CS-TR-76-573 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The stationary p-tree forest %A Jonassen, Arne T. %D October 1976 %X This paper contains a theoretical analysis of the conditions of a priority queue strategy after an infinite number of alternating insert/remove steps. Expected insertion time, expected length, etc. are found.
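The interpolating-natural-spline computation of CS-TR-76-569 above can be sketched for the cubic case with a modern library (scipy here stands in for de Boor's FORTRAN package; the data points are hypothetical):

    # Natural cubic spline interpolation: zero second derivative at both ends.
    import numpy as np
    from scipy.interpolate import CubicSpline

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.sin(x)                                  # hypothetical data points
    spline = CubicSpline(x, y, bc_type='natural')  # natural end conditions
    print(spline(2.5), np.sin(2.5))                # interpolant vs. true value

The report treats natural splines of arbitrary odd degree; the cubic case shown is the most common instance.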
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/573/CS-TR-76-573.pdf %R CS-TR-76-574 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T SAIL %A Reiser, John F. %D August 1976 %X Sail is a high-level programming language for the PDP-10 computer. It includes an extended ALGOL 60 compiler and a companion set of execution-time routines. In addition to ALGOL, the language features: (1) flexible linking to hand-coded machine language algorithms, (2) complete access to the PDP-10 I/O facilities, (3) a complete system of compile-time arithmetic and logic as well as a flexible macro system, (4) a high-level debugger, (5) records and references, (6) sets and lists, (7) an associative data structure, (8) independent processes, (9) procedure variables, (10) user modifiable error handling, (11) backtracking, and (12) interrupt facilities. This manual describes the Sail language and the execution-time routines for the typical Sail user: a non-novice programmer with some knowledge of ALGOL. It lies somewhere between being a tutorial and a reference manual. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/574/CS-TR-76-574.pdf %R CS-TR-76-575 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T SAIL tutorial %A Smith, Nancy W. %D October 1976 %X This tutorial is designed for a beginning user of Sail, an ALGOL-like language for the PDP-10. The first part covers the basic statements and expressions of the language; remaining topics include macros, records, conditional compilation, and input/output. Detailed examples of Sail programming are included throughout, and only a minimum of programming background is assumed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/575/CS-TR-76-575.pdf %R CS-TR-76-578 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Theoretical and practical aspects of some initial-boundary value problems in fluid dynamics %A Oliger, Joseph %A Sundstroem, Arne %D November 1976 %X Initial-boundary value problems for several systems of partial differential equations from fluid dynamics are discussed. Both rigid wall and open boundary problems are treated. Boundary conditions are formulated and shown to yield well-posed problems for the Eulerian equations for gas dynamics, the shallow-water equations, and linearized constant coefficient versions of the incompressible, anelastic equations. The "primitive" hydrostatic meteorological equations are shown to be ill-posed with any specification of local, pointwise boundary conditions. Analysis of simplified versions of this system illustrates the mechanism responsible for ill-posedness. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/578/CS-TR-76-578.pdf %R CS-TR-76-579 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The A0 inversion model of program paging behavior %A Baskett, Forest %A Rafii, Abbas %D November 1976 %X When the parameters of a simple stochastic model of the memory referencing behavior of computer programs are carefully selected, the model is able to mimic the paging behavior of a set of actual programs. The mimicry is successful using several different page replacement algorithms and a wide range of real memory sizes in a virtual memory environment. The model is based on the independent reference model with a new procedure for determining the page reference probabilities, the parameters of the model.
We call the result the A0 inversion independent reference model. Since the fault rate (or miss ratio) is one aspect of program behavior that the model is able to capture for many different memory sizes, the model should be especially useful for evaluating multilevel memory organizations based on newly emerging memory technologies. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/579/CS-TR-76-579.pdf %R CS-TR-76-580 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Towards a procedural understanding of semantics %A Winograd, Terry A. %D November 1976 %X The term "procedural semantics" has been used in a variety of ways, not all compatible, and not all comprehensible. In this paper, I have chosen to apply the term to a broad paradigm for studying semantics (and in fact, all of linguistics). This paradigm has developed in a context of writing computer programs which use natural language, but it is not a theory of computer programs or programming techniques. It is "procedural" because it looks at the underlying structure of language as fundamentally shaped by the nature of processes for language production and comprehension. It is based on the belief that there is a level of explanation at which there are significant similarities between the psychological processes of human language use and the computational processes in computer programs we can construct and study. Its goal is to develop a body of theory at this level. This approach necessitates abandoning or modifying several currently accepted doctrines, including the way in which distinctions have been drawn between "semantics" and "pragmatics" and between "performance" and "competence". The paper has three major sections. It first lays out the paradigm assumptions which guide the enterprise, and elaborates a model of cognitive processing and language use. It then illustrates how some specific semantic problems might be approached from a procedural perspective, and contrasts the procedural approach with formal structural and truth conditional approaches. Finally, it discusses the goals of linguistic theory and the nature of the linguistic explanation. Much of what is presented here is a speculation about the nature of a paradigm yet to be developed. This paper is an attempt to be evocative rather than definitive; to convey intuitions rather than to formulate crucial arguments which justify this approach over others. It will be successful if it suggests some ways of looking at language which lead to further understanding. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/580/CS-TR-76-580.pdf %R CS-TR-76-581 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An overview of KRL, a Knowledge Representation Language %A Bobrow, Daniel G. %A Winograd, Terry A. %D November 1976 %X This paper describes KRL, a Knowledge Representation Language designed for use in understander systems. It outlines both the general concepts which underlie our research and the details of KRL-0, an experimental implementation of some of these concepts. KRL is an attempt to integrate procedural knowledge with a broad base of declarative forms. These forms provide a variety of ways to express the logical structure of the knowledge, in order to give flexibility in associating procedures (for memory and reasoning) with specific pieces of knowledge, and to control the relative accessibility of different facts and descriptions. 
The formalism for declarative knowledge is based on structured conceptual objects with associated descriptions. These objects form a network of memory units with several different sorts of linkages, each having well-specified implications for the retrieval process. Procedures can be associated directly with the internal structure of a conceptual object. This procedural attachment allows the steps for a particular operation to be determined by characteristics of the specific entities involved. The control structure of KRL is based on the belief that the next generation of intelligent programs will integrate data-directed and goal-directed processing by using multi-processing. It provides for a priority-ordered multi-process agenda with explicit (user-provided) strategies for scheduling and resource allocation. It provides procedure directories which operate along with process frameworks to allow procedural parameterization of the fundamental system processes for building, comparing, and retrieving memory structures. Future development of KRL will include integrating procedure definition with the descriptive formalism. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/581/CS-TR-76-581.pdf %R CS-TR-76-583 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Determining the stability number of a graph %A Chvatal, Vaclav %D December 1976 %X We formalize certain rules for deriving upper bounds on the stability number of a graph. The resulting system is powerful enough to (i) encompass the algorithms of Tarjan's type and (ii) provide very short proofs on graphs for which the stability number equals the clique-covering number. However, our main result shows that for almost all graphs with a (sufficiently large) linear number of edges, proofs within our system must have at least exponential length. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/583/CS-TR-76-583.pdf %R CS-TR-76-585 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Numerical solution of nonlinear elliptic partial differential equations by a generalized conjugate gradient method %A Concus, Paul %A Golub, Gene H. %A O'Leary, Dianne Prost %D December 1976 %X We have previously studied a generalized conjugate gradient method for solving sparse positive-definite systems of linear equations arising from the discretization of elliptic partial-differential boundary-value problems. Here, extensions to the nonlinear case are considered. We split the original discretized operator into the sum of two operators, one of which corresponds to a more easily solvable system of equations, and accelerate the associated iteration based on this splitting by (nonlinear) conjugate gradients. The behavior of the method is illustrated for the minimal surface equation with splittings corresponding to nonlinear SSOR, to approximate factorization of the Jacobian matrix, and to elliptic operators suitable for use with fast direct methods. The results of numerical experiments are given as well for a mildly nonlinear example, for which, in the corresponding linear case, the finite termination property of the conjugate gradient algorithm is crucial.
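The acceleration step in CS-TR-76-585 above applies nonlinear conjugate gradients to the iteration induced by an operator splitting. A minimal sketch of plain nonlinear conjugate gradients (Fletcher-Reeves with a backtracking line search; the splitting and preconditioning of the report are omitted, and the quadratic test problem is hypothetical):

    import numpy as np

    def nonlinear_cg(f, grad, x, iters=50):
        g = grad(x)
        d = -g
        for _ in range(iters):
            if np.linalg.norm(g) < 1e-12:        # already at a stationary point
                break
            if g.dot(d) >= 0:                    # safeguard: restart with steepest descent
                d = -g
            t = 1.0                              # Armijo backtracking line search
            while f(x + t * d) > f(x) + 1e-4 * t * g.dot(d):
                t *= 0.5
            x = x + t * d
            g_new = grad(x)
            beta = g_new.dot(g_new) / g.dot(g)   # Fletcher-Reeves coefficient
            d = -g_new + beta * d
            g = g_new
        return x

    # Hypothetical test problem: minimize f(x) = x'Ax/2 - b'x, i.e. solve Ax = b.
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, 1.0])
    f = lambda x: 0.5 * x.dot(A @ x) - b.dot(x)
    grad = lambda x: A @ x - b
    print(nonlinear_cg(f, grad, np.zeros(2)), np.linalg.solve(A, b))

On a quadratic this reduces to (preconditioned) linear conjugate gradients, which is the finite-termination case the abstract mentions.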
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/585/CS-TR-76-585.pdf %R CS-TR-76-586 %Z Tue, 04 Jul 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The evolution of programs: a system for automatic program modification %A Dershowitz, Nachum %A Manna, Zohar %D December 1976 %X An attempt is made to formulate techniques of program modification, whereby a program that achieves one result can be transformed into a new program that uses the same principles to achieve a different goal. For example, a program that uses the binary search paradigm to calculate the square-root of a number may be modified to divide two numbers in a similar manner, or vice versa. Program debugging is considered as a special case of modification: if a program computes wrong results, it must be modified to achieve the intended results. The application of abstract program schemata to concrete problems is also viewed from the perspective of modification techniques. We have embedded this approach in a running implementation; our methods are illustrated with several examples that have been performed by it. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/586/CS-TR-76-586.pdf %R CS-TR-76-405 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Stanford Computer Science Department research report. %A Davis, Randall %A Wright, Margaret H. %D January 1976 %X This collection of reports is divided into two sections. The first contains the research summaries for individual faculty members and research associates in the Computer Science Department. Two professors from Electrical Engineering are included as "Affiliated Faculty" because their interests are closely related to those of the Department. The second section gives an overview of the activities of research groups in the Department. "Group" here is taken to imply many different things, including people related by various degrees of intellectual interests, physical proximity, or funding considerations. We have tried to describe any group whose scope of interest is greater than that of one person. The list of recent publications for each is not intended to be comprehensive, but rather to give a feeling for the range of topics considered. This collection of reports has been assembled to provide a reasonably comprehensive review of research activities in the Department. We hope that it will be widely useful -- in particular, students in the Department may find it helpful in discovering interesting projects and possible thesis topics. We expect also that it will be of interest to many other people, both within and outside the Department. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/76/405/CS-TR-76-405.pdf %R CS-TR-74-404 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A catalog of quadri/trivalent graphs. %A Sridharan, Natesa S. %D January 1974 %X In a previous report [1973] a method for computer generation of quadri/trivalent "vertex-graphs" was presented in detail. This report is a catalog of 13 classes of graphs generated by using this method. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/404/CS-TR-74-404.pdf %R CS-TR-74-405 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Stanford Computer Science Department research report. %A Davis, Randall %A Wright, Margaret H. %D January 1974 %X This collection of reports is divided into two sections.
The first contains the research summaries for individual faculty members and research associates in the Computer Science Department. Two professors from Electrical Engineering are included as "Affiliated Faculty" because their interests are closely related to those of the Department, while Professors George Dantzig and Roger Schank do not appear because they were on leave and unavailable when the summaries were prepared. The second section gives an overview of the activities of research groups in the Department. "Group" here is taken to imply many different things, including people related by various degrees of intellectual interests, physical proximity, or funding considerations. We have tried to describe any group whose scope of interest is greater than that of one person. The list of recent publications for each is not intended to be comprehensive, but rather to give a feeling for the range of topics considered. This collection of reports has been assembled to provide a reasonably comprehensive review of research activities in the Department. We hope that it will be widely useful -- in particular, students in the Department may find it helpful in discovering interesting projects and possible thesis topics. We expect also that it will be of interest to many other people, both within and outside the Department. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/405/CS-TR-74-405.pdf %R CS-TR-74-406 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Memory model for a robot. %A Perkins, W. A. %D January 1974 %X A memory model for a robot has been designed and tested in a simple toy-block world for which it has shown clarity, efficiency, and generality. In a constrained pseudo-English one can ask the program to manipulate objects and query it about the present, past, and possible future states of its world. The program has a good understanding of its world and gives intelligent answers in reasonably good English. Past and hypothetical states of the world are handled by changing the state of the world in an imaginary context. Procedures interrogate and modify two global databases, one which contains the present representation of the world and another which contains the past history of events, conversations, etc. The program has the ability to create, destroy, and even resurrect objects in its world. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/406/CS-TR-74-406.pdf %R CS-TR-74-407 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T FAIL. %A Wright, F. H. G., II %A Gorin, Ralph E. %D April 1974 %X This is a reference manual for FAIL, a fast, one-pass assembler for PDP-10 and PDP-6 machine language. FAIL statements, pseudo-operations, macros, and conditional assembly features are described. Although FAIL uses substantially more main memory than MACRO-10, it assembles typical programs about five times faster. FAIL assembles the entire Stanford time-sharing operating system (two million characters) in less than four minutes of CPU time on a KA-10 processor. FAIL permits an ALGOL-type block structure which provides a way of localizing the usage of some symbols to certain parts of the program, such that the same symbol name can be used to mean different things in different blocks.
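The block-structured symbol facility described for FAIL above amounts to a stack of symbol tables searched innermost-first. A toy sketch of that discipline (illustrative only, not the assembler's actual code):

    # ALGOL-style block scoping: a stack of dictionaries, innermost wins.
    class SymbolTable:
        def __init__(self):
            self.scopes = [{}]              # outermost block

        def begin_block(self):
            self.scopes.append({})

        def end_block(self):
            self.scopes.pop()               # locals vanish with their block

        def define(self, name, value):
            self.scopes[-1][name] = value   # bind in the innermost block

        def lookup(self, name):
            for scope in reversed(self.scopes):
                if name in scope:
                    return scope[name]
            raise KeyError(name)

    syms = SymbolTable()
    syms.define('X', 1)
    syms.begin_block(); syms.define('X', 2)
    print(syms.lookup('X'))                 # 2: inner X shadows outer X
    syms.end_block()
    print(syms.lookup('X'))                 # 1: outer meaning restored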
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/407/CS-TR-74-407.pdf %R CS-TR-74-409 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Final report: the first ten years of artificial intelligence research at Stanford. %A Earnest, Lester D. %A McCarthy, John %A Feigenbaum, Edward A. %A Lederberg, Joshua %D July 1973 %X The first ten years of research in artificial intelligence and related fields at Stanford University have yielded significant results in computer vision and control of manipulators, speech recognition, heuristic programming, representation theory, mathematical theory of computation, and modeling of organic chemical processes. This report summarizes the accomplishments and provides bibliographies in each research area. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/409/CS-TR-74-409.pdf %R CS-TR-74-411 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T After Leibniz...: discussions on philosophy and artificial intelligence. %A Anderson, D. Bruce %A Binford, Thomas O. %A Thomas, Arthur J. %A Weyhrauch, Richard W. %A Wilks, Yorick A. %D March 1974 %X This is an edited transcript of informal conversations which we have had over recent months, in which we looked at some of the issues which seem to arise when artificial intelligence and philosophy meet. Our aim was to see what might be some of the fundamental principles of attempts to build intelligent machines. The major topics covered are the relationship of AI and philosophy and what help they might be to each other: the mechanisms of natural inference and deduction; the question of what kind of theory of meaning would be involved in a successful natural language understanding program, and the nature of models in AI research. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/411/CS-TR-74-411.pdf %R CS-TR-74-414 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T GEOMED - a geometric editor. %A Baumgart, Bruce G. %D May 1974 %X GEOMED is a system for doing 3-D geometric modeling; used from a keyboard, it is an interactive drawing program; used as a package of SAIL or LISP accessible subroutines, it is a graphics language. With GEOMED, arbitrary polyhedra can be constructed, moved about and viewed in perspective with hidden lines eliminated. In addition to polyhedra, camera and image models are provided so that simulators relevant to computer vision, problem solving, and animation may be constructed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/414/CS-TR-74-414.pdf %R CS-TR-74-417 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Some thoughts on proving clean termination of programs. %A Sites, Richard L. %D May 1974 %X Proof of clean termination is a useful sub-goal in the process of proving that a program is totally correct. Clean termination means that the program terminates (no infinite loops) and that it does so normally, without any execution-time semantic errors (integer overflow, use of undefined variables, subscript out of range, etc.). In contrast to proofs of correctness, proof of clean termination requires no extensive annotation of a program by a human user, but the proof says nothing about the results calculated by the program, just that whatever it does, it terminates cleanly. Two example proofs are given, of previously published programs: TREESORT3 by Robert Floyd, and SELECT by Ronald L. Rivest and Robert Floyd. 
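The clean-termination checks discussed in CS-TR-74-417 above can be made concrete with runtime assertions: a nonnegative variant that strictly decreases each iteration rules out infinite loops, and explicit range checks rule out subscript errors. A hypothetical illustration (the report's examples are TREESORT3 and SELECT, not this search):

    def binary_search(a, key):
        lo, hi = 0, len(a)
        while lo < hi:
            variant_before = hi - lo
            mid = (lo + hi) // 2
            assert 0 <= mid < len(a)                # no subscript error
            if a[mid] < key:
                lo = mid + 1
            else:
                hi = mid
            assert 0 <= hi - lo < variant_before    # variant decreases: loop terminates
        return lo

    print(binary_search([1, 3, 5, 7], 5))           # 2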
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/417/CS-TR-74-417.pdf %R CS-TR-74-420 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Partially self-checking circuits and their use in performing logical operations. %A Wakerly, John F. %D August 1973 %X A new class of circuits called partially self-checking circuits is described. These circuits have one mode of operation called secure mode in which they have the properties of totally self-checking circuits; that is, every fault is tested during normal operation and no fault can cause an undetected error. They also have an insecure mode of operation with the property that any fault which affects a result in insecure mode is tested by some input in secure mode; however, undetected errors may occur in insecure mode. One application of these circuits is in the arithmetic and logic unit of a computer with data encoded in an error-detecting code. While there is no code simpler than duplication which detects single errors in logical operations such as AND and OR, it is shown that there exist partially self-checking networks to perform these operations. A commercially available MSI chip, the 74181 4-bit ALU, can be used in a partially self-checking network to perform arithmetic and logical operations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/420/CS-TR-74-420.pdf %R CS-TR-74-423 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Asymptotic representation of the average number of active modules in an n-way interleaved memory. %A Rao, Gururaj S. %D April 1974 %X In an n-way interleaved memory the effective bandwidth depends on the average number of concurrently active modules. Using a model for the memory which does not permit queueing on busy modules and which assumes an infinite stream of calls on the modules, where the elements in the stream occur with equal probability, the average number is a combinatorial quantity. Hellerman has previously approximated this quantity by $n^{0.56}$. We show in this paper that the average number is asymptotically equal to $\sqrt{\frac{\pi n}{2}} - \frac{1}{3}$. The method is due to Knuth and expresses the combinatorial quantity in terms of the incomplete gamma function and its derivatives. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/423/CS-TR-74-423.pdf %R CS-TR-74-431 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Pattern-matching rules for the recognition of natural language dialogue expressions. %A Colby, Kenneth Mark %A Parkison, Roger C. %A Faught, William S. %D June 1974 %X Man-machine dialogues using everyday conversational English present problems for computer processing of natural language. Grammar-based parsers which perform a word-by-word, parts-of-speech analysis are too fragile to operate satisfactorily in real time interviews allowing unrestricted English. In constructing a simulation of paranoid thought processes, we designed an algorithm capable of handling the linguistic expressions used by interviewers in teletyped diagnostic psychiatric interviews. The algorithm uses pattern-matching rules which attempt to characterize the input expressions by progressively transforming them into patterns which match, completely or fuzzily, abstract stored patterns. The power of this approach lies in its ability to ignore recognized and unrecognized words and still grasp the meaning of the message.
The methods utilized are general and could serve any "host" system which takes natural language input. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/431/CS-TR-74-431.pdf %R CS-TR-74-433 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On automating the construction of programs. %A Buchanan, Jack R. %A Luckham, David C. %D May 1974 %X An experimental system for automatically generating certain simple kinds of programs is described. The programs constructed are expressed in a subset of ALGOL containing assignments, function calls, conditional statements, while loops, and non-recursive procedure calls. The input is an environment of primitive programs and programming methods specified in a language currently used to define the semantics of the output programming language. The system has been used to generate programs for symbolic manipulation, robot control, everyday planning, and computing arithmetical functions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/433/CS-TR-74-433.pdf %R CS-TR-74-435 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Balanced computer systems. %A Price, Thomas G. %D April 1974 %X We use the central server model to extend Buzen's results on balance and bottlenecks. We develop two measures which appear to be useful for evaluating and improving computer system performance. The first measure, called the balance index, is useful for balancing requests to the peripheral processors. The second quantity, called the sensitivity index, indicates which processing rates have the most effect on overall system performance. We define the capacity of a central server model as the maximum throughput as we vary the peripheral processor probabilities. We show that the reciprocal of the CPU utilization is a convex function of the peripheral processor probabilities and that a necessary and sufficient condition for the peripheral processor probabilities to achieve capacity is that the balance indexes are equal for all peripheral processors. We give a method to calculate capacity using classical optimization techniques. Finally, we consider the problem of balancing the processing rates of the processors. Two conditions for "balance" are derived. The first condition maximizes our uncertainty about the next state of the system. This condition has several desirable properties concerning throughput, utilizations, overlap, and resistance to changes in job mix. The second condition is based on obtaining the most throughput for a given cost. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/435/CS-TR-74-435.pdf %R CS-TR-74-436 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Natural language understanding systems within the AI paradigm: a survey and some comparisons. %A Wilks, Yorick A. %D December 1974 %X The paper surveys the major projects on the understanding of natural language that fall within what may now be called the artificial intelligence paradigm for natural language systems. Some space is devoted to arguing that the paradigm is now a reality and different in significant respects from the generative paradigm of present day linguistics. The comparisons between systems center around questions of the relative perspicuity of procedural and static representations; the advantages and disadvantages of developing systems over a period of time.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/436/CS-TR-74-436.pdf %R CS-TR-74-439 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the solution of large, structured linear complementarity problems: III. %A Cottle, Richard W. %A Golub, Gene H. %A Sacher, Richard S. %D August 1974 %X This paper addresses the problem of solving a class of specially-structured linear complementarity problems of potentially very large size. An efficient method which couples a modification of the block successive overrelaxation technique and several techniques discussed by the authors in previous papers is proposed. Problems of the type considered arise, for example, in solving approximations to both the free boundary problem for finite-length journal bearings and percolation problems in porous dams by numerical methods. These applications and our computational experience with the method are presented here. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/439/CS-TR-74-439.pdf %R CS-TR-74-442 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Estimating the efficiency of backtrack programs. %A Knuth, Donald E. %D August 1974 %X One of the chief difficulties associated with the so-called backtracking technique for combinatorial problems has been our inability to predict the efficiency of a given algorithm, or to compare the efficiencies of different approaches, without actually writing and running the programs. This paper presents a simple method which produces reasonable estimates for most applications, requiring only a modest amount of hand calculation. The method should prove to be of considerable utility in connection with D. H. Lehmer's branch-and-bound approach to combinatorial optimization. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/442/CS-TR-74-442.pdf %R CS-TR-74-444 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Progress report on program-understanding systems. %A Green, C. Cordell %A Waldinger, Richard J. %A Barstow, David R. %A Elschlager, Robert A. %A Lenat, Douglas B. %A McCune, Brian P. %A Shaw, David E. %A Steinberg, Louis I. %D August 1974 %X This progress report covers the first year and one half of work by our automatic-programming research group at the Stanford Artificial Intelligence Laboratory. Major emphasis has been placed on methods of program specification, codification of programming knowledge, and implementation of pilot systems for program writing and understanding. List processing has been used as the general problem domain for this work. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/444/CS-TR-74-444.pdf %R CS-TR-74-446 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T LCFsmall: an implementation of LCF. %A Aiello, Luigia %A Weyhrauch, Richard W. %D August 1974 %X This is a report on a computer program implementing a simplified version of LCF. It is written (with minor exceptions) entirely in pure LISP and has none of the user oriented features of the implementation described by Milner. We attempt to represent directly in code the metamathematical notions necessary to describe LCF. We hope that the code is simple enough and the metamathematics is clear enough so that properties of this particular program (e.g. its correctness) can eventually be proved. The program is reproduced in full. 
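The estimation method of CS-TR-74-442 above can be sketched directly: follow one random path down the backtrack tree, multiplying the branching factors met along the way; the running sum is an unbiased estimate of the tree's node count, and averaging several probes sharpens it. A small illustration on the n-queens search tree (the choice of problem is hypothetical; the estimator is Knuth's):

    import random

    def children(partial, n):
        # columns safe for the next row, given queens already placed
        return [c for c in range(n)
                if all(c != q and abs(c - q) != len(partial) - r
                       for r, q in enumerate(partial))]

    def probe(n):
        partial, nodes, weight = [], 1, 1
        while len(partial) < n:
            cs = children(partial, n)
            if not cs:
                break                     # dead end: this probe stops
            weight *= len(cs)             # branching factor at this level
            nodes += weight               # estimated nodes one level down
            partial.append(random.choice(cs))
        return nodes

    n, trials = 6, 10_000
    print(sum(probe(n) for _ in range(trials)) / trials)  # ~ size of the 6-queens tree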
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/446/CS-TR-74-446.pdf %R CS-TR-74-447 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The semantics of PASCAL in LCF. %A Aiello, Luigia %A Aiello, Mario %A Weyhrauch, Richard W. %D August 1974 %X We define a semantics for the arithmetic part of PASCAL by giving it an interpretation in LCF, a language based on the typed $\lambda$-calculus. Programs are represented in terms of their abstract syntax. We show sample proofs, using LCF, of some general properties of PASCAL and the correctness of some particular programs. A program implementing the McCarthy Airline reservation system is proved correct. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/447/CS-TR-74-447.pdf %R CS-TR-74-455 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Edge-disjoint spanning trees, dominators, and depth-first search. %A Tarjan, Robert Endre %D September 1974 %X This paper presents an algorithm for finding two edge-disjoint spanning trees rooted at a fixed vertex of a directed graph. The algorithm uses depth-first search, an efficient method for computing disjoint set unions, and an efficient method for computing dominators. It requires O(V log V + E) time and O(V + E) space to analyze a graph with V vertices and E edges. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/455/CS-TR-74-455.pdf %R CS-TR-74-456 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T AL, a programming system for automation. %A Finkel, Raphael A. %A Taylor, Russell H. %A Bolles, Robert C. %A Paul, Richard P. %A Feldman, Jerome A. %D November 1974 %X AL is a high-level programming system for specification of manipulatory tasks such as assembly of an object from parts. AL includes an ALGOL-like source language, a translator for converting programs into runnable code, and a runtime system for controlling manipulators and other devices. The system includes advanced features for describing individual motions of manipulators, for using sensory information, and for describing assembly algorithms in terms of common domain-specific primitives. This document describes the design of AL, which is currently being implemented as a successor to the Stanford WAVE system. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/456/CS-TR-74-456.pdf %R CS-TR-74-457 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Ten criticisms of PARRY. %A Colby, Kenneth Mark %D September 1974 %X Some major criticisms of a computer simulation of paranoid processes (PARRY) are reviewed and discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/457/CS-TR-74-457.pdf %R CS-TR-74-460 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Random insertion into a priority queue structure. %A Porter, Thomas %A Simon, Istvan %D October 1974 %X The average number of levels that a new element moves up when inserted into a heap is investigated. Two probabilistic models, under which such an average might be computed, are proposed. A "lemma of conservation of ignorance" is formulated and used in the derivation of an exact formula for the average in one of these models. It is shown that this average is bounded by a constant and its asymptotic behavior is discussed. Numerical data for the second model is also provided and analyzed.
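The quantity analyzed in CS-TR-74-460 above is easy to measure empirically. A sketch under one simple model (heaps built by repeated random insertions of uniform keys; the report's two probabilistic models need not coincide with this one):

    import random

    def sift_up_levels(heap, key):
        # insert key into a binary min-heap, returning levels moved up
        heap.append(key)
        i, levels = len(heap) - 1, 0
        while i > 0 and heap[(i - 1) // 2] > heap[i]:
            parent = (i - 1) // 2
            heap[parent], heap[i] = heap[i], heap[parent]
            i, levels = parent, levels + 1
        return levels

    def average_levels(n=255, trials=2_000):
        total = 0
        for _ in range(trials):
            heap = []
            for _ in range(n):            # grow a random heap by insertions
                sift_up_levels(heap, random.random())
            total += sift_up_levels(heap, random.random())
        return total / trials

    print(average_levels())               # the average stays bounded as n grows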
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/460/CS-TR-74-460.pdf %R CS-TR-74-462 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A fast, feature-driven stereo depth program. %A Pingle, Karl K. %A Thomas, Arthur J. %D May 1975 %X In this paper we describe a fast, feature-driven program for extracting depth information from stereoscopic sets of digitized TV images. This is achieved by two means: in the simplest case, by statistically correlating variable-sized windows on the basis of visual texture, and in the more complex case by pre-processing the images to extract significant visual features such as corners, and then using these features to control the correlation process. The program runs on the PDP-10 but uses a PDP-11/45 and an SPS-41 Signal Processing Computer as subsidiary processors. The use of the two small, fast machines for the performance of simple but often-repeated computations effects an increase in speed sufficient to allow us to think of using this program as a fast 3-dimensional segmentation method, preparatory to more complex image processing. It is also intended for use in visual feedback tasks involved in hand-eye coordination and automated assembly. The current program is able to calculate the three-dimensional positions of 10 points to within 5 millimeters, using 5 seconds of computation for extracting features, 1 second per image for correlation, and 0.1 second for the depth calculation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/462/CS-TR-74-462.pdf %R CS-TR-74-466 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Recent research in artificial intelligence, heuristic programming, and network protocols. %A Earnest, Lester D. %A McCarthy, John %A Feigenbaum, Edward A. %A Lederberg, Joshua %A Cerf, Vinton G. %D July 1974 %X This is a progress report for ARPA-sponsored research projects in computer science for the period July 1973 to July 1974. Accomplishments are reported in artificial intelligence (especially heuristic programming, robotics, theorem proving, automatic programming, and natural language understanding), mathematical theory of computation, and protocol development for computer communication networks. References to recent publications are provided for each topic. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/466/CS-TR-74-466.pdf %R CS-TR-74-467 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Checking proofs in the metamathematics of first order logic. %A Aiello, Mario %A Weyhrauch, Richard W. %D August 1974 %X This is a report on some of the first experiments of any size carried out using the new first order proof checker FOL. We present two different first order axiomatizations of the metamathematics of the logic which FOL itself checks and show several proofs using each one. The difference between the axiomatizations is that one defines the metamathematics in a many sorted logic, the other does not. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/467/CS-TR-74-467.pdf %R CS-TR-74-468 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A combinatorial base for some optimal matroid intersection algorithms. %A Krogdahl, Stein %D November 1974 %X E. Lawler has given an algorithm for finding maximum weight intersections for a pair of matroids, using linear programming concepts and constructions to prove its correctness. 
In this paper another theoretical base for this algorithm is given which depends only on the basic properties of matroids, and which involves no linear programming concepts. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/468/CS-TR-74-468.pdf %R CS-TR-74-469 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Molecular structure elucidation III. %A Brown, Harold %D December 1974 %X A computer implemented algorithm to solve the following graph theoretical problem is presented: given the empirical formula for a molecule and one or more non-overlapping substructural fragments of the molecule, determine all the distinct molecular structures based on the formula and containing the fragments. That is, given a degree sequence of labeled nodes and one or more connected multigraphs, determine a representative set of the isomorphism classes of the connected multigraphs based on the degree sequence and containing the given multigraphs as non-overlapping subgraphs. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/469/CS-TR-74-469.pdf %R CS-TR-74-470 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Stable sorting and merging with optimal space and time bounds. %A Trabb-Pardo, Luis I. %D December 1974 %X This work introduces two algorithms for stable merging and stable sorting of files. The algorithms have optimal worst case time bounds, the merge is linear and the sort is of order n log n. Extra storage requirements are also optimal, since both algorithms make use of a fixed number of pointers. Files are handled only by means of the primitives exchange and comparison of records and basic pointer transformations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/470/CS-TR-74-470.pdf %R CS-TR-74-471 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The interaction of inferences, affects, and intentions, in a model of paranoia. %A Faught, William S. %A Colby, Kenneth Mark %A Parkison, Roger C. %D December 1974 %X The analysis of natural language input into its underlying semantic content is but one of the tasks necessary for a system (human or non-human) to use natural language. Responding to natural language input requires performing a number of tasks: 1) deriving facts about the input and the situation in which it was spoken; 2) attending to the system's needs, desires, and interests; 3) choosing intentions to fulfill these interests; 4) deriving and executing actions from these intentions. We describe a series of processes in a model of paranoia which performs these tasks. We also describe the modifications made by the paranoid processes to the normal processes. A computer program has been constructed to test this theory. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/471/CS-TR-74-471.pdf %R CS-TR-74-472 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Stanford automatic photogrammetry research. %A Quam, Lynn H. %A Hannah, Marsha Jo %D December 1974 %X This report documents the feasibility study done at Stanford University's Artificial Intelligence Laboratory on the problem of computer automated aerial/orbital photogrammetry. The techniques investigated were based on correlation matching of small areas in digitized pairs of stereo images taken from high altitude or planetary orbit, with the objective of deriving a 3-dimensional model for the surface of a planet.
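The correlation matching at the heart of CS-TR-74-472 above can be sketched in a few lines: slide a small window from one image along the corresponding row of the other and keep the horizontal shift with the highest normalized cross-correlation. The synthetic image pair below is hypothetical (a pure horizontal shift), standing in for real stereo imagery:

    import numpy as np

    def best_disparity(left, right, row, col, half=3, max_disp=10):
        w = left[row-half:row+half+1, col-half:col+half+1].astype(float)
        w = (w - w.mean()) / (w.std() + 1e-9)       # zero-mean, unit-variance window
        best, best_score = 0, -np.inf
        for d in range(max_disp + 1):
            c = col - d
            if c - half < 0:
                break
            v = right[row-half:row+half+1, c-half:c+half+1].astype(float)
            v = (v - v.mean()) / (v.std() + 1e-9)
            score = (w * v).mean()                  # normalized cross-correlation
            if score > best_score:
                best, best_score = d, score
        return best                                 # disparity, inversely related to depth

    rng = np.random.default_rng(0)
    right = rng.random((40, 40))
    left = np.roll(right, 4, axis=1)                # synthetic pair: true disparity 4
    print(best_disparity(left, right, row=20, col=20))   # -> 4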
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/472/CS-TR-74-472.pdf %R CS-TR-74-473 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Automatic program verification II: verifying programs by algebraic and logical reduction. %A Suzuki, Norihisa %D December 1974 %X Methods for verifying programs written in a higher level programming language are devised and implemented. The system can verify programs written in a subset of PASCAL, which may have data structures and control structures such as WHILE, REPEAT, FOR, PROCEDURE, FUNCTION and COROUTINE. The process of creation of verification conditions is an extension of the work done by Igarashi, London and Luckham which is based on the deductive theory by Hoare. Verification conditions are proved using specialized simplification and proof techniques, which consist of an arithmetic simplifier, equality replacement rules, a fast algorithm for simplifying formulas using propositional truth value evaluation, and a depth first proof search process. The deduction mechanism used in this prover is based on a Gentzen-type formal system. Several sorting programs including Floyd's TREESORT3 and Hoare's FIND are verified. It is shown that the resulting array is not only well-ordered but also a permutation of the input array. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/473/CS-TR-74-473.pdf %R CS-TR-74-474 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Automatic program verification III: a methodology for verifying programs. %A von Henke, Friedrich W. %A Luckham, David C. %D December 1974 %X The paper investigates methods for applying an on-line interactive verification system designed to prove properties of PASCAL programs. The methodology is intended to provide techniques for developing a debugged and verified version starting from a program, that (a) is possibly unfinished in some respects, (b) may not satisfy the given specifications, e.g., may contain bugs, (c) may have incomplete documentation, (d) may be written in non-standard ways, e.g., may depend on user-defined data structures. The methodology involves (i) interactive application of a verification condition generator, an algebraic simplifier and a theorem-prover; (ii) techniques for describing data structures, type constraints, and properties of programs and subprograms (i.e. lower level procedures); (iii) the use of (abstract) data types in structuring programs and proofs. Within each unit (i.e. segment of a problem), the interactive use is aimed at reducing verification conditions to manageable proportions so that the non-trivial factors may be analysed. Analysis of verification conditions attempts to localize errors in the program logic, to extend assertions inside the program, to spotlight additional assumptions on program subfunctions (beyond those already specified by the programmer), and to generate appropriate lemmas that allow a verification to be completed. Methods for structuring correctness proofs are discussed that are similar to those of "structured programming". A detailed case study of a pattern matching algorithm illustrating the various aspects of the methodology (including the role played by the user) is given. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/74/474/CS-TR-74-474.pdf %R CS-TR-75-476 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A hypothetical dialogue exhibiting a knowledge base for a program-understanding system.
%A Green, C. Cordell %A Barstow, David R. %D January 1975 %X A hypothetical dialogue with a fictitious program-understanding system is presented. In the interactive dialogue the computer carries out a detailed synthesis of a simple insertion sort program for linked lists. The content, length and complexity of the dialogue reflect the underlying programming knowledge which would be required for a system to accomplish this task. The nature of the knowledge is discussed and the codification of such programming knowledge is suggested as a major research area in the development of program-understanding systems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/476/CS-TR-75-476.pdf %R CS-TR-75-477 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Longest common subsequences of two random sequences. %A Chvatal, Vaclav %A Sankoff, David %D January 1975 %X Given two random k-ary sequences of length n, what is f(n,k), the expected length of their longest common subsequence? This problem arises in the study of molecular evolution. We calculate f(n,k) for all k, where n $\leq$ 5, and f(n,2) where n $\leq$ 10. We study the limiting behavior of $n^{-1}$f(n,k) and derive upper and lower bounds on these limits for all k. Finally we estimate by Monte-Carlo methods f(100,k), f(1000,2) and f(5000,2). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/477/CS-TR-75-477.pdf %R CS-TR-75-478 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Ill-conditioned eigensystems and the computation of the Jordan canonical form. %A Golub, Gene H. %A Wilkinson, James H. %D February 1975 %X The solution of the complete eigenvalue problem for a non-normal matrix A presents severe practical difficulties when A is defective or close to a defective matrix. However, in the presence of rounding errors one cannot even determine whether or not a matrix is defective. Several of the more stable methods for computing the Jordan canonical form are discussed together with the alternative approach of computing well-defined bases (usually orthogonal) of the relevant invariant subspaces. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/478/CS-TR-75-478.pdf %R CS-TR-75-479 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Error bounds in the approximation of eigenvalues of differential and integral operators. %A Chatelin, Francois %A Lemordant, J. %D February 1975 %X Various methods of approximating the eigenvalues and invariant subspaces of nonself-adjoint differential and integral operators are unified in a general theory. Error bounds are given, from which most of the error bounds in the literature can be derived. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/479/CS-TR-75-479.pdf %R CS-TR-75-481 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Hybrid difference methods for the initial boundary-value problem for hyperbolic equations. %A Oliger, Joseph E. %D February 1975 %X The use of lower order approximations in the neighborhood of boundaries coupled with higher order interior approximations is examined for the mixed initial boundary-value problem for hyperbolic partial differential equations. Uniform error can be maintained using smaller grid intervals with the lower order approximations near the boundaries.
Stability results are presented for approximations to the initial boundary-value problem for the model equation $u_t + cu_x = 0$ which are fourth order in space and second order in time in the interior and second order in both space and time near the boundaries. These results are generalized to a class of methods of this type for hyperbolic systems. Computational results are presented and comparisons are made with other methods. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/481/CS-TR-75-481.pdf %R CS-TR-75-483 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On packing squares with equal squares. %A Erdoes, Paul %A Graham, Ronald L. %D March 1975 %X The following problem arises in connection with certain multi-dimensional stock cutting problems: How many non-overlapping open unit squares may be packed into a large square of side $\alpha$? Of course, if $\alpha$ is a positive integer, it is trivial to see that unit squares can be successfully packed. However, if $\alpha$ is not an integer, the problem becomes much more complicated. Intuitively, one feels that for $\alpha$ = N + 1/100, say (where N is an integer), one should pack $N^2$ unit squares in the obvious way and surrender the uncovered border area (which is about $\alpha$/50) as unusable waste. After all, how could it help to place the unit squares at all sorts of various skew angles? In this note, we show how it helps. In particular, we prove that we can always keep the amount of uncovered area down to at most proportional to ${\alpha}^{7/11}$, which for large $\alpha$ is much less than the linear waste produced by the "natural" packing above. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/483/CS-TR-75-483.pdf %R CS-TR-75-484 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On subgraph number independence in trees. %A Graham, Ronald L. %A Szemeredi, Endre %D March 1975 %X For finite graphs F and G, let $N_F$(G) denote the number of occurrences of F in G, i.e., the number of subgraphs of G which are isomorphic to F. If ${\cal F}$ and ${\cal G}$ are families of graphs, it is natural to ask whether or not the quantities $N_F$(G), $F \in {\cal F}$, are linearly independent when G is restricted to ${\cal G}$. For example, if ${\cal F}$ = {$K_1$,$K_2$} (where $K_n$ denotes the complete graph on n vertices) and ${\cal G}$ is the family of all (finite) $\underline{trees}$ then of course $N_{K_{1}}$(T) - $N_{K_{2}}$(T) = 1 for all $T \in {\cal G}$. Slightly less trivially, if ${\cal F}$ = {$S_n$ : n = 1,2,3,...} (where $S_n$ denotes the $\underline{star}$ on n edges) and ${\cal G}$ again is the family of all trees then $\sum_{n=1}^{\infty} {(-1)}^{n+1} N_{S_n}(T) = 1$ for all $T \in {\cal G}$. It will be proved that such a linear dependence can $\underline{never}$ occur if ${\cal F}$ is finite, no $F \in {\cal F}$ has an isolated point and ${\cal G}$ contains all trees. This result has important applications in recent work of L. Lovasz and one of the authors. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/484/CS-TR-75-484.pdf %R CS-TR-75-485 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On multiplicative representations of integers. %A Erdoes, Paul %A Szemeredi, Endre %D March 1975 %X In 1969 it was shown by P. Erdoes that if 0 < $a_1$ < $a_2$ < ...
< $a_k \leq x$ is a sequence of integers for which the products $a_i a_j$ are all distinct then the maximum possible value of k satisfies $\pi$(x) + $c_2$ $x^{3/4}$/${(log x)}^{3/2}$ < max k < $\pi$(x) + $c_1$ $x^{3/4}$/$(log x)^{3/2}$ where $\pi$(x) denotes the number of primes not exceeding x and $c_1$ and $c_2$ are absolute constants. In this paper we will be concerned with similar results of the following type. Suppose 0 < $a_1$ < ... < $a_k \leq x$, 0 < $b_1$ < ... < $b_{\ell} \leq x$ are sequences of integers. Let g(n) denote the number of representations of n in the form $a_i b_j$. Then we prove: (i) If g(n) $\leq$ 1 for all n then for some constant $c_3$, k$\ell$ < $c_3 x^2$/log x. (ii) For every c there is an f(c) so that if g(n) $\leq$ c for all n then for some constant $c_4$, k$\ell$ < $c_4 x^2$/log x ${(log log x)}^{f(c)}$. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/485/CS-TR-75-485.pdf %R CS-TR-75-486 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Eigenproblems for matrices associated with periodic boundary conditions. %A Bjorck, Ake %A Golub, Gene H. %D March 1975 %X A survey of algorithms for solving the eigenproblem for a class of matrices of nearly tridiagonal form is given. These matrices arise from eigenvalue problems for differential equations where the solution is subject to periodic boundary conditions. Algorithms both for computing selected eigenvalues and eigenvectors and for solving the complete eigenvalue problem are discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/486/CS-TR-75-486.pdf %R CS-TR-75-488 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On complete subgraphs of r-chromatic graphs. %A Bollobas, Bela %A Erdoes, Paul %A Szemeredi, Endre %D April 1975 %X Denote by G(p,q) a graph of p vertices and q edges. $K_r$ = G(r, ${r \choose 2}$) is the complete graph with r vertices and $K_r$(t) is the complete r-chromatic (i.e., r-partite) graph with t vertices in each color class. $G_r$(n) denotes an r-chromatic graph, and $\delta$(G) is the minimal degree of a vertex of graph G. Furthermore denote by $f_r$(n) the smallest integer so that every $G_r$(n) with $\delta(G_r(n)) > f_r(n)$ contains a $K_r$. It is easy to see that $\lim_{n \rightarrow \infty} f_r(n)/n = c_r$ exists. We show that $c_4 \geq$ 2 + 1/9 and $c_r \geq$ r-2 + 1/2 - $\frac{1}{2(r-2)}$ for r > 4. We prove that if $\delta(G_3(n)) \geq n+t$ then G contains at least $t^3$ triangles but does not have to contain more than 4$t^3$ of them. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/488/CS-TR-75-488.pdf %R CS-TR-75-489 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Regular partitions of graphs. %A Szemeredi, Endre %D April 1975 %X A crucial lemma in recent work of the author (showing that k-term arithmetic progression-free sets of integers must have density zero) stated (approximately) that any large bipartite graph can be decomposed into relatively few "nearly regular" bipartite subgraphs. In this note we generalize this result to arbitrary graphs, at the same time strengthening and simplifying the original bipartite result. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/489/CS-TR-75-489.pdf %R CS-TR-75-490 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Numerical experiments with the spectral test. %A Gosper, R.
William %D May 1975 %X Following Marsaglia and Dieter, the spectral test for linear congruential random number generators is developed from the grid or lattice point model rather than the Fourier transform model. Several modifications to the published algorithms were tried. One of these refinements, which uses results from lesser dimensions to compute higher dimensional ones, was found to decrease the computation time substantially. A change in the definition of the spectral test is proposed in the section entitled "A Question of Independence." %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/490/CS-TR-75-490.pdf %R CS-TR-75-493 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Describing automata in terms of languages associated with their peripheral devices. %A Kurki-Suonio, Reino %D May 1975 %X A unified approach is presented to deal with automata having different kinds of peripheral devices. This approach is applied to pushdown automata and Turing machines, leading to elementary proofs of several well-known theorems concerning transductions, relationship between pushdown automata and context-free languages, as well as homomorphic characterization and undecidability questions. In general, this approach leads to homomorphic characterization of language families generated by a single language by finite transduction. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/493/CS-TR-75-493.pdf %R CS-TR-75-500 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Towards better structured definitions of programming languages. %A Kurki-Suonio, Reino %D September 1975 %X The use of abstract syntax and a behavioral model is discussed from the viewpoint of structuring the complexity in definitions of programming languages. A formalism for abstract syntax is presented which reflects the possibility of having one defining occurrence and an arbitrary number of applied occurrences of objects. Attributes can be associated with such a syntax for restricting the set of objects generated, and for defining character string representations and semantic interpretations for the objects. A system of co-operating automata, described by another abstract syntax, is proposed as a behavioral model for semantic definition. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/500/CS-TR-75-500.pdf %R CS-TR-75-501 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Procedural events as software interrupts. %A Pettersen, Odd %D June 1975 %X The paper deals with procedural events, providing a basis for synchronization and scheduling, particularly applied to real-time program systems of multiple parallel activities ("multi-task"). There is a great need for convenient scheduling mechanisms for minicomputer systems as used in process control, but so far mechanisms somewhat similar to those proposed here are found only in PL/I among the generally known high-level languages. PL/I, however, is not very common on computers of this size. Also, the mechanisms in PL/I seem more restricted, as compared to those proposed here. A new type of boolean program variable, the EVENTMARK, is proposed. Eventmarks represent events of any kind that may occur within a computational process and are believed to give very efficient and convenient activation and scheduling of program modules in a real-time system.
An eventmark is declared similar to a procedure, and the proposed feature could easily be amended as an extension to existing languages, as well as incorporated in future language designs. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/501/CS-TR-75-501.pdf %R CS-TR-75-502 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Synchronization of concurrent processes. %A Pettersen, Odd %D July 1975 %X The paper gives an overview of commonly used synchronization primitives and literature, and presents a new form of primitive expressing conditional critical regions. A new solution is presented to the problem of "readers and writers", utilizing the proposed synchronization primitive. The solution is simpler and shorter than other known algorithms. The first sections of the paper give a tutorial introduction into established methods, in order to provide a suitable background for the remaining parts. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/502/CS-TR-75-502.pdf %R CS-TR-75-503 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The macro processing system STAGE2: transfer of comments to the generated text. %A Pettersen, Odd %D July 1975 %X This paper is a short description of a small extension of STAGE2, providing possibilities to copy comments etc. from the source text to the generated text. The description presupposes familiarity with the STAGE2 system: its purpose, use and descriptions. Only section 3 of this paper requires knowledge of the internal structures and working of the system, and that section is unnecessary for the plain use of the described feature. The extension, if not used, is completely invisible to the user: No rules, as described in the original literature, are changed. A user, unaware of the extension, will see no difference from the original version. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/503/CS-TR-75-503.pdf %R CS-TR-75-504 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On sparse graphs with dense long paths. %A Erdoes, Paul %A Graham, Ronald L. %A Szemeredi, Endre %D September 1975 %X The following problem was raised by H.-J. Stoss in connection with certain questions related to the complexity of Boolean functions. An acyclic directed graph G is said to have property P(m,n) if for any set X of m vertices of G, there is a directed path of length n in G which does not intersect X. Let f(m,n) denote the minimum number of edges a graph with property P(m,n) can have. The problem is to estimate f(m,n). For the remainder of the paper, we shall restrict ourselves to the case m = n. We shall prove (1) $c_1$n log n/log log n < f(n,n) < $c_2$n log n (where $c_1$,$c_2$,..., will hereafter denote suitable positive constants). In fact, the graph we construct in order to establish the upper bound on f(n,n) in (1) will have just $c_3$n vertices. In this case the upper bound in (1) is essentially best possible since it will also be shown that for $c_4$ sufficiently large, every graph on $c_4$n vertices having property P(n,n) must have at least $c_5$n log n edges. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/504/CS-TR-75-504.pdf %R CS-TR-75-505 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Some linear programming aspects of combinatorics.
%A Chvatal, Vaclav %D September 1975 %X This is the text of a lecture given at the Conference on Algebraic Aspects of Combinatorics at the University of Toronto in January 1975. The lecture was expository, aimed at an audience with no previous knowledge of linear programming. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/505/CS-TR-75-505.pdf %R CS-TR-75-506 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Operational reasoning and denotational semantics. %A Gordon, Michael J. C. %D August 1975 %X "Obviously true" properties of programs can be hard to prove when meanings are specified with a denotational semantics. One cause of this is that such a semantics usually abstracts away from the running process - thus properties which are obvious when one thinks about this lose the basis of their obviousness in the absence of it. To enable process-based intuitions to be used in constructing proofs one can associate with the semantics an abstract interpreter so that reasoning about the semantics can be done by reasoning about computations on the interpreter. This technique is used to prove several facts about a semantics of pure LISP. First a denotational semantics and an abstract interpreter are described. Then it is shown that the denotation of any LISP form is correctly computed by the interpreter. This is used to justify an inference rule - called "LISP-induction" - which formalises induction on the size of computations on the interpreter. Finally LISP-induction is used to prove a number of results. In particular it is shown that the function eval is correct relative to the semantics - i.e. that it denotes a mapping which maps forms (coded as S-expressions) on to their correct values. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/506/CS-TR-75-506.pdf %R CS-TR-75-507 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Towards a semantic theory of dynamic binding. %A Gordon, Michael J. C. %D August 1975 %X The results in this paper contribute to the formulation of a semantic theory of dynamic binding (fluid variables). The axioms and theorems are language independent in that they don't talk about programs - i.e., syntactic objects - but just about elements in certain domains. Firstly the equivalence (in the circumstances where it's true) of "tying a knot" through the environment (elaborated in the paper) and taking a least fixed point is shown. This is central in proving the correctness of LISP "eval" type interpreters. Secondly the relation which must hold between two environments if a program is to have the same meaning in both is established. It is shown how the theory can be applied to LISP to yield previously known facts. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/507/CS-TR-75-507.pdf %R CS-TR-75-508 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On computing the transitive closure of a relation. %A Eve, James %D September 1975 %X An algorithm is presented for computing the transitive closure of an arbitrary relation which is based upon a variant of Tarjan's algorithm [1972] for finding the strongly connected components of a directed graph. This variant leads to a more compact statement of Tarjan's algorithm. If V is the number of vertices in the directed graph representing the relation then the worst case behavior of the proposed algorithm involves O($V^3$) operations.
In this respect it is inferior to existing algorithms which require O($V^3$/log V) and O($V^{{log}_2 7}$ log V) operations respectively. The best case behavior involves only O($V^2$) operations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/508/CS-TR-75-508.pdf %R CS-TR-75-509 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Finding the maximal incidence matrix of a large graph. %A Overton, Michael L. %A Proskurowski, Andrzej %D September 1975 %X This paper deals with the computation of two canonical representations of a graph. A computer program is presented which searches for "the maximal incidence matrix" of a large connected graph without multiple edges or self-loops. The use of appropriate algorithms and data structures is discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/509/CS-TR-75-509.pdf %R CS-TR-75-511 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Software implementation of a new method of combinatorial hashing. %A Dubost, Pierre %A Trousse, Jean-Michel %D September 1975 %X This is a study of the software implementation of a new method of searching with retrieval on secondary keys. A new family of partial match file designs is presented, the 'worst case' is determined, a detailed algorithm and program are given and the average execution time is studied. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/511/CS-TR-75-511.pdf %R CS-TR-75-512 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Applications of path compression on balanced trees. %A Tarjan, Robert Endre %D August 1975 %X We devise a method for computing functions defined on paths in trees. The method is based on tree manipulation techniques first used for efficiently representing equivalence relations. It has an almost-linear running time. We apply the method to give O(m $\alpha$(m,n)) algorithms for two problems. A. Verifying a minimum spanning tree in an undirected graph (best previous bound: O(m log log n) ). B. Finding dominators in a directed graph (best previous bound: O(n log n + m) ). Here n is the number of vertices and m the number of edges in the problem graph, and $\alpha$(m,n) is a very slowly growing function which is related to a functional inverse of Ackermann's function. The method is also useful for solving, in O(m $\alpha$(m,n)) time, certain kinds of pathfinding problems on reducible graphs. Such problems occur in global flow analysis of computer programs and in other contexts. A companion paper will discuss this application. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/512/CS-TR-75-512.pdf %R CS-TR-75-513 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A survey of techniques for fixed radius near neighbor searching. %A Bentley, Jon Louis %D August 1975 %X This paper is a survey of techniques used for searching in a multidimensional space. Though we consider specifically the problem of searching for fixed radius near neighbors (that is, all points within a fixed distance of a given point), the structures presented here are applicable to many different search problems in multidimensional spaces. The orientation of this paper is practical; no theoretical results are presented. Many areas open for further research are mentioned. 
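To make the cell (grid) technique surveyed in CS-TR-75-513 concrete: bucket the points into cells whose side equals the search radius, and a fixed-radius query then only has to examine the 3^k cells surrounding the query point. The following Python sketch is illustrative only; the names and parameters are not taken from the report.

    import math
    from collections import defaultdict
    from itertools import product

    def build_grid(points, r):
        # Bucket each k-dimensional point into a cell of side r.
        grid = defaultdict(list)
        for p in points:
            grid[tuple(int(math.floor(c / r)) for c in p)].append(p)
        return grid

    def near_neighbors(grid, q, r):
        # Report all stored points within distance r of the query point q
        # by scanning only the 3^k cells adjacent to q's home cell.
        k = len(q)
        home = tuple(int(math.floor(c / r)) for c in q)
        found = []
        for offset in product((-1, 0, 1), repeat=k):
            cell = tuple(h + o for h, o in zip(home, offset))
            for p in grid.get(cell, []):
                if math.dist(p, q) <= r:
                    found.append(p)
        return found

    pts = [(0.1, 0.2), (0.5, 0.5), (2.0, 2.0)]
    g = build_grid(pts, r=1.0)
    print(near_neighbors(g, (0.4, 0.4), r=1.0))   # [(0.1, 0.2), (0.5, 0.5)]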
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/513/CS-TR-75-513.pdf %R CS-TR-75-514 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A microprogram control unit based on a tree memory. %A Tokura, Nobuki %D August 1975 %X A modularized control unit for microprocessors is proposed that implements ancestor tree programs. This leads to a reduction of storage required for address information. The basic architecture is extended to paged tree memory to enhance the memory space usage. Finally, the concept of an ancestor tree with shared subtrees is introduced, and the existence of an efficient algorithm to find sharable subtrees is shown. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/514/CS-TR-75-514.pdf %R CS-TR-75-517 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Distances in orientations of graphs. %A Chvatal, Vaclav %A Thomassen, Carsten %D August 1975 %X We prove that there is a function h(k) such that every undirected graph G admits an orientation H with the following property: if an edge uv belongs to a cycle of length k in G, then uv or vu belongs to a directed cycle of length at most h(k) in H. Next, we show that every undirected bridgeless graph of radius r admits an orientation of radius at most $r^2$+r, and this bound is best possible. We consider the same problem with radius replaced by diameter. Finally, we show that the problem of deciding whether an undirected graph admits an orientation of diameter (resp. radius) two belongs to a class of problems called NP-hard. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/517/CS-TR-75-517.pdf %R CS-TR-75-518 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Aggregation of inequalities in integer programming. %A Chvatal, Vaclav %A Hammer, Peter L. %D August 1975 %X Given an m $\times$ n zero-one matrix $A$ we ask whether there is a single linear inequality $ax \leq b$ whose zero-one solutions are precisely the zero-one solutions of $Ax \leq e$. We develop an algorithm for answering this question in O(m$n^2$) steps and investigate other related problems. Our results may be interpreted in terms of graph theory and threshold logic. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/518/CS-TR-75-518.pdf %R CS-TR-75-520 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the representation of data structures in LCF with applications to program generation. %A von Henke, Friedrich W. %D September 1975 %X In this paper we discuss techniques of exploiting the obvious relationship between program structure and data structure for program generation. We develop methods of program specification that are derived from a representation of recursive data structures in the Logic for Computable Functions (LCF). As a step towards a formal problem specification language we define definitional extensions of LCF. These include a calculus for (computable) homogeneous sets and restricted quantification. Concepts that are obtained by interpreting data types as algebras are used to derive function definition schemes from an LCF term representing a data structure; they also lead to techniques for the simplification of expressions in the extended language. The specification methods are illustrated with a detailed example.
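The idea in CS-TR-75-520 of interpreting a data type as an algebra and deriving function definition schemes from it can be suggested in miniature: each constructor of a recursive type contributes one argument to a generic "fold", and particular functions are obtained by instantiating those arguments. The sketch below is a loose analogy in Python, not LCF.

    from dataclasses import dataclass
    from typing import Callable, Union

    @dataclass
    class Leaf:                 # one constructor of the recursive type
        value: int

    @dataclass
    class Node:                 # the other constructor, two recursive positions
        left: 'Tree'
        right: 'Tree'

    Tree = Union[Leaf, Node]

    def fold(t: Tree, on_leaf: Callable, on_node: Callable):
        # The definition scheme derived from the type: one function per
        # constructor, applied with the recursive positions already folded.
        if isinstance(t, Leaf):
            return on_leaf(t.value)
        return on_node(fold(t.left, on_leaf, on_node),
                       fold(t.right, on_leaf, on_node))

    t = Node(Leaf(1), Node(Leaf(2), Leaf(3)))
    print(fold(t, lambda v: v, lambda a, b: a + b))   # sum of leaves: 6
    print(fold(t, lambda v: 1, lambda a, b: a + b))   # number of leaves: 3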
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/520/CS-TR-75-520.pdf %R CS-TR-75-521 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Depth perception in stereo computer vision. %A Thompson, Clark %D October 1975 %X This report describes a stereo vision approach to depth perception; the author has built upon a set of programs that decompose the problem in the following way: 1) Production of a camera model: the position and orientation of the cameras in 3-space. 2) Generation of matching point-pairs: loci of corresponding features in the two pictures. 3) Computation of the point in 3-space for each point-pair. 4) Presentation of the resultant depth information. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/521/CS-TR-75-521.pdf %R CS-TR-75-522 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Automatic program verification IV: proof of termination within a weak logic of programs. %A Luckham, David C. %A Suzuki, Norihisa %D October 1975 %X A weak logic of programs is a formal system in which statements that mean "the program halts" cannot be expressed. In order to prove termination, we would usually have to use a stronger logical system. In this paper we show how we can prove termination of both iterative and recursive programs within a weak logic by adding pieces of code and placing restrictions on loop invariants and entry conditions. Thus, most of the existing verifiers which are based on a weak logic of programs can be used to prove termination of programs without any modification. We give examples of proofs of termination and of accurate bounds on computation time that were obtained using the Stanford Pascal program verifier. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/522/CS-TR-75-522.pdf %R CS-TR-75-523 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T BAIL: a debugger for SAIL. %A Reiser, John F. %D October 1975 %X BAIL is a debugging aid for SAIL programs, where SAIL is an extended dialect of ALGOL60 which runs on the PDP-10 computer. BAIL consists of a breakpoint package and an expression interpreter which allow the user to stop his program at selected points, examine and change the values of variables, and evaluate general SAIL expressions. In addition, BAIL can display text from the source file corresponding to the current location in the program. In many respects BAIL is like DDT or RAID, except that BAIL is oriented towards SAIL and knows about SAIL data types, primitive operations, and procedure implementation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/523/CS-TR-75-523.pdf %R CS-TR-75-526 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Graph theory and Gaussian elimination. %A Tarjan, Robert Endre %D November 1975 %X This paper surveys graph-theoretic ideas which apply to the problem of solving a sparse system of linear equations by Gaussian elimination. Included are a discussion of bandwidth, profile, and general sparse elimination schemes, and of two graph-theoretic partitioning methods. Algorithms based on these ideas are presented. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/526/CS-TR-75-526.pdf %R CS-TR-75-527 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Center for Reliable Computing: current research. %A McCluskey, Edward J. %A Wakerly, John F. %A Ogus, Roy C.
%D October 1975 %X This report summarizes the research work which has been performed, and is currently active in the Center for Reliable Computing in the Digital Systems Laboratory, Stanford University. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/527/CS-TR-75-527.pdf %R CS-TR-75-528 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Solving path problems on directed graphs. %A Tarjan, Robert Endre %D October 1975 %X This paper considers path problems on directed graphs which are solvable by a method similar to Gaussian elimination. The paper gives an axiom system for such problems which is a weakening of Salomaa's axioms for a regular algebra. The paper presents a general solution method which requires O($n^3$) time for dense graphs with n vertices and considerably less time for sparse graphs. The paper also presents a decomposition method which solves a path problem by breaking it into subproblems, solving each subproblem by elimination, and combining the solutions. This method is a generalization of the "reducibility" notion of data flow analysis, and is a kind of single-element "tearing". Efficiently implemented, the method requires O(m $\alpha$(m,n)) time plus time to solve the subproblems, for problem graphs with n vertices and m edges. Here $\alpha$(m,n) is a very slowly growing function which is a functional inverse of Ackermann's function. The paper considers variants of the axiom system for which the solution methods still work, and presents several applications including solving simultaneous linear equations and analyzing control flow in computer programs. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/528/CS-TR-75-528.pdf %R CS-TR-75-530 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An adaptive finite difference solver for nonlinear two point boundary problems with mild boundary layers. %A Lentini, M. %A Pereyra, Victor %D November 1975 %X A variable order variable step finite difference algorithm for approximately solving m-dimensional systems of the form y' = f(t,y), t $\in$ [a,b] subject to the nonlinear boundary conditions g(y(a),y(b)) = 0 is presented. A program, PASVAR, implementing these ideas has been written and the results on several test runs are presented together with comparisons with other methods. The main features of the new procedure are: a) Its ability to produce very precise global error estimates, which in turn allow a very fine control between desired tolerance and actual output precision. b) Non-uniform meshes allow an economical and accurate treatment of boundary layers and other sharp changes in the solutions. c) The combination of automatic variable order (via deferred corrections) and automatic (adaptive) mesh selection produces, as in the case of initial value problem solvers, a versatile, robust, and efficient algorithm. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/530/CS-TR-75-530.pdf %R CS-TR-75-531 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Algorithmic aspects of vertex elimination on directed graphs. %A Rose, Donald J. %A Tarjan, Robert Endre %D November 1975 %X We consider a graph-theoretic elimination process which is related to performing Gaussian elimination on sparse systems of linear equations.
We give efficient algorithms to: (1) calculate the fill-in produced by any elimination ordering; (2) find a perfect elimination ordering if one exists; and (3) find a minimal elimination ordering. We also show that problems (1) and (2) are at least as time-consuming as testing whether a directed graph is transitive, and that the problem of finding a minimum ordering is NP-complete. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/531/CS-TR-75-531.pdf %R CS-TR-75-532 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Bibliography of Computer Science Department technical reports. %A Jacobs, Patricia E. %D November 1975 %X This report lists, in chronological order, all reports from the Stanford Computer Science series (STAN-CS-xx-xxx), Artificial Intelligence Memos (AIM), Digital Systems Laboratory Technical reports (TR) and Technical Notes (TN), plus Stanford Linear Accelerator Center publications (SLACP) and reports (SLACR). Also, for the first time, we have provided an author index for these reports (at the end of the report listings). The bibliography issued in October of 1973 is hereby brought up to date. Each report is identified by title, author's name, National Technical Information Service (NTIS) retrieval number, date, number of pages and the computer science areas treated. Subsequent journal publication (when known) is also indicated. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/532/CS-TR-75-532.pdf %R CS-TR-75-536 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Interactive generation of object models with a manipulator. %A Grossman, David D. %A Taylor, Russell H. %D December 1975 %X Manipulator programs in a high level language consist of manipulation procedures and object model declarations. As higher level languages are developed, the procedures will shrink while the declarations will grow. This trend makes it desirable to develop means for automating the generation of these declarations. A system is proposed which would permit users to specify certain object models interactively, using the manipulator itself as a measuring tool in three dimensions. A preliminary version of the system has been tested. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/536/CS-TR-75-536.pdf %R CS-TR-75-537 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Verification Vision within a programmable assembly system: an introductory discussion. %A Bolles, Robert C. %D December 1975 %X This paper defines a class of visual feedback tasks called "Verification Vision" which includes a significant portion of the feedback tasks required within a programmable assembly system. It characterizes a set of general-purpose capabilities which, if implemented, would provide a user with a system in which to write programs to perform such tasks. Example tasks and protocols are used to motivate these semantic capabilities. Of particular importance are the tools required to extract as much information as possible from planning and/or training sessions. Four different levels of verification systems are discussed. They range from a straightforward interactive system which could handle a subset of the verification vision tasks, to a completely automatic system which could plan its own strategies and handle the total range of verification tasks. Several unsolved problems in the area are discussed.
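The fill-in of problem (1) in CS-TR-75-531 can be computed by directly simulating the elimination process; the rough sketch below does this for the symmetric (undirected) case, and is quadratic per vertex rather than anything like the report's efficient algorithms.

    def fill_in(n, edges, order):
        # Eliminating a vertex joins all pairs of its not-yet-eliminated
        # neighbors; the edges so added are the fill-in of the ordering.
        adj = {v: set() for v in range(n)}
        for u, v in edges:
            adj[u].add(v); adj[v].add(u)
        fill, eliminated = set(), set()
        for v in order:
            nbrs = [w for w in adj[v] if w not in eliminated]
            for i in range(len(nbrs)):
                for j in range(i + 1, len(nbrs)):
                    a, b = nbrs[i], nbrs[j]
                    if b not in adj[a]:
                        adj[a].add(b); adj[b].add(a)
                        fill.add((min(a, b), max(a, b)))
            eliminated.add(v)
        return fill

    # On the 4-cycle 0-1-2-3, eliminating 0 first joins its neighbors 1 and 3.
    print(fill_in(4, [(0, 1), (1, 2), (2, 3), (3, 0)], [0, 1, 2, 3]))  # {(1, 3)}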
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/537/CS-TR-75-537.pdf %R CS-TR-75-539 %Z Wed, 23 Aug 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A new approach to recursive programs. %A Manna, Zohar %A Shamir, Adi %D December 1975 %X In this paper we critically evaluate the classical least-fixed-point approach towards recursive programs. We suggest a new approach which extracts the maximal amount of valuable information embedded in the programs. The presentation is informal, with emphasis on examples. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/539/CS-TR-75-539.pdf %R CS-TR-75-482 %Z Wed, 10 Jun 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T An Algorithm for Finding Best Matches in Logarithmic Expected Time %A Friedman, Jerome %A Bentley, Jon Louis %A Finkel, Raphael Ari %D July 1976 %X An algorithm and data structure are presented for searching a file containing N records, each described by k real valued keys, for the m closest matches or nearest neighbors to a given query record. The computation required to organize the file is proportional to kN log N. The expected number of records examined in each search is independent of the file size. The expected computation to perform each search is proportional to log N. Empirical evidence suggests that except for very small files, this algorithm is considerably faster than other methods. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/75/482/CS-TR-75-482.pdf %R CS-TR-73-330 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Axioms and theorems for integers, lists and finite sets in LCF. %A Newey, Malcolm C. %D January 1973 %X LCF (Logic for Computable Functions) is being promoted as a formal language suitable for the discussion of various problems in the Mathematical Theory of Computation (MTC). To this end, several examples of MTC problems have been formalised and proofs have been exhibited using the LCF proof-checker. However, in these examples, there has been a certain amount of ad-hoc-ery in the proofs; namely, many mathematical theorems have been assumed without proof and no axiomatisation of the mathematical domains involved was given. This paper describes a suitable mathematical environment for future LCF experiments and its axiomatic basis. The environment developed, deemed appropriate for such experiments, consists of a large body of theorems from the areas of integer arithmetic, list manipulation and finite set theory. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/330/CS-TR-73-330.pdf %R CS-TR-73-331 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The computing time of the Euclidean algorithm. %A Collins, George E. %D January 1973 %X The maximum, minimum and average computing times of the classical Euclidean algorithm for the greatest common divisor of two integers are derived, to within codominance, as functions of the lengths of the two inputs and the output. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/331/CS-TR-73-331.pdf %R CS-TR-73-332 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Models of LCF. %A Milner, Robin %D January 1973 %X LCF is a deductive system for computable functions proposed by D. Scott in 1969 in an unpublished memorandum. The purpose of the present paper is to demonstrate the soundness of the system with respect to certain models, which are partially ordered domains of continuous functions.
This demonstration was supplied by Scott in his memorandum; the present paper is merely intended to make this work more accessible. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/332/CS-TR-73-332.pdf %R CS-TR-73-333 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the power of programming features. %A Chandra, Ashok K. %A Manna, Zohar %D January 1973 %X We consider the power of several programming features such as counters, pushdown stacks, queues, arrays, recursion and equality. In this study program schemas are used as the model for computation. The relations between the powers of these features are completely described by a comparison diagram. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/333/CS-TR-73-333.pdf %R CS-TR-73-334 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T URAND: a universal random number generator. %A Malcolm, Michael A. %A Moler, Cleve B. %D January 1973 %X A subroutine for generating uniformly-distributed floating-point numbers in the interval [0,1) is presented in ANSI standard Fortran. The subroutine, URAND, is designed to be relatively machine independent. URAND has undergone minimal testing on various machines and is thought to work properly on any machine having binary integer number representation, integer multiplication modulo m and integer addition either modulo m or yielding at least ${log}_2$ (m) significant bits, where m is some integral power of 2. Upon the first call of URAND, the value of m is automatically determined and appropriate constants for a linear congruential generator are computed following the suggestions of D. E. Knuth, volume 2. URAND is guaranteed to have a full-length cycle. Readers are invited to apply their favorite statistical tests to URAND, using any binary machine, and report the results to the authors. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/334/CS-TR-73-334.pdf %R CS-TR-73-335 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Computation of the stationary distribution of an infinite Markov matrix. %A Golub, Gene H. %A Seneta, Eugene %D January 1973 %X An algorithm is presented for computing the unique stationary distribution of an infinite stochastic matrix possessing at least one column whose elements are bounded away from zero. Elementwise convergence rate is discussed by means of two examples. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/335/CS-TR-73-335.pdf %R CS-TR-73-337 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Aesthetics systems. %A Gips, James %A Stiny, George %D January 1973 %X The formal structure of aesthetics systems is defined. Aesthetics systems provide for the essential tasks of interpretation and evaluation in aesthetic analysis. Kolmogorov's formulation of information theory is applicable. An aesthetics system for a class of non-representational, geometric paintings and its application to three actual paintings is described in the Appendix. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/337/CS-TR-73-337.pdf %R CS-TR-73-338 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A finite basis theorem revisited. %A Klarner, David A. %D February 1973 %X Let S denote a set of k-dimensional boxes each having integral sides. Let $\Gamma$(S) denote the set of all boxes which can be filled completely with translates of elements of S.
It is shown here that S contains a finite subset B such that $\Gamma$(B) = $\Gamma$(S). This result was proved for k = 1,2 in an earlier paper, but the proof for k > 2 contained an error. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/338/CS-TR-73-338.pdf %R CS-TR-73-339 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Computation of the limited information maximum likelihood estimator. %A Dent, Warren T. %A Golub, Gene H. %D February 1973 %X Computation of the Limited Information Maximum Likelihood Estimator (LIMLE) of the set of coefficients in a single equation of a system of interdependent relations is sufficiently complicated to detract from other potentially interesting properties. Although for finite samples the LIMLE has no moments, asymptotically it remains normally distributed and retains other properties associated with maximum likelihood. The most extensive application of the estimator has been made in the Brookings studies. We believe that current methods of estimation are clumsy, and present a numerically stable estimation schema based on Householder transformations and the singular value decomposition. The analysis permits a convenient demonstration of equivalence with the Two Stage Least Squares Estimator (TSLSE) in the instance of just identification. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/339/CS-TR-73-339.pdf %R CS-TR-73-340 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Notes on a problem involving permutations as subsequences. %A Newey, Malcolm C. %D March 1973 %X The problem (attributed to R. M. Karp by Knuth) is to describe the sequences of minimum length which contain, as subsequences, all the permutations of an alphabet of n symbols. This paper catalogs some of the easy observations on the problem and proves that the minimum lengths for n=5, n=6 & n=7 are 19, 28 and 39 respectively. Also presented is a construction which yields (for n>2) many appropriate sequences of length $n^2$-2n+4 so giving an upper bound on length of minimum strings which matches exactly all known values. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/340/CS-TR-73-340.pdf %R CS-TR-73-341 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A heuristic approach to program verification. %A Katz, Shmuel M. %A Manna, Zohar %D March 1973 %X We present various heuristic techniques for use in proving the correctness of computer programs. The techniques are designed to obtain automatically the "inductive assertions" attached to the loops of the program which previously required human "understanding" of the program's performance. We distinguish between two general approaches: one in which we obtain the inductive assertion by analyzing predicates which are known to be true at the entrances and exits of the loop ($\underline{top-down}$ approach), and another in which we generate the inductive assertion directly from the statements of the loop ($\underline{bottom-up}$ approach). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/341/CS-TR-73-341.pdf %R CS-TR-73-342 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Matroid partitioning. %A Knuth, Donald E. %D March 1973 %X This report discusses a modified version of Edmonds's algorithm for partitioning of a set into subsets independent in various given matroids.
If ${\cal M}_1$,...,${\cal M}_k$ are matroids defined on a finite set E, the algorithm yields a simple necessary and sufficient condition for whether or not the elements of E can be colored with k colors such that (i) all elements of color j are independent in ${\cal M}_j$, and (ii) the number of elements of color j lies between given limits, $n_j \leq \| E_j \| \leq {n'}_j$. The algorithm either finds such a coloring or it finds a proof that none exists, after making at most $n^3$ + $n^2$k tests of independence in the given matroids, where n is the number of elements in E. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/342/CS-TR-73-342.pdf %R CS-TR-73-344 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The fourteen primitive actions and their inferences. %A Schank, Roger C. %D March 1973 %X In order to represent the conceptual information underlying a natural language sentence, a conceptual structure has been established that uses the basic actor-action-object framework. It was the intent that these structures have only one representation for one meaning, regardless of the semantic form of the sentence being represented. Actions were reduced to their basic parts so as to effect this. It was found that only fourteen basic actions were needed as building blocks by which all verbs can be represented. Each of these actions has a set of actions or states which can be inferred when they are present. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/344/CS-TR-73-344.pdf %R CS-TR-73-345 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The minimum root separation of a polynomial. %A Collins, George E. %A Horowitz, Ellis %D April 1973 %X The minimum root separation of a complex polynomial A is defined as the minimum of the distances between distinct roots of A. For polynomials with Gaussian integer coefficients and no multiple roots, three lower bounds are derived for the root separation. In each case the bound is a function of the degree, n, of A and the sum, d, of the absolute values of the coefficients of A. The notion of a semi-norm for a commutative ring is defined, and it is shown how any semi-norm can be extended to polynomial rings and matrix rings, obtaining a very general analogue of Hadamard's determinant theorem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/345/CS-TR-73-345.pdf %R CS-TR-73-347 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Multidimensional analysis in evaluating a simulation of paranoid thought. %A Colby, Kenneth Mark %A Hilf, Franklin Dennis %D May 1973 %X The limitations of Turing's Test as an evaluation procedure are reviewed. More valuable are tests which ask expert judges to make ratings along multiple dimensions essential to the model. In this way the model's weaknesses become clarified and the model builder learns where the model must be improved. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/347/CS-TR-73-347.pdf %R CS-TR-73-346 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The rationale for computer based treatment of language difficulties in nonspeaking autistic children. %A Colby, Kenneth Mark %D March 1973 %X The principles underlying a computer-based treatment method for language acquisition in nonspeaking autistic children are described. The main principle involves encouragement of exploratory learning with minimum adult interference. 
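The quantity bounded in CS-TR-73-345 is simple to evaluate numerically, which makes the definition concrete; the sketch below merely computes the minimum root separation of a sample polynomial and says nothing about the report's lower bounds.

    import numpy as np
    from itertools import combinations

    def min_root_separation(coeffs):
        # Minimum distance between distinct roots of the polynomial whose
        # coefficients are given from the highest degree down.
        roots = np.roots(coeffs)
        return min(abs(a - b) for a, b in combinations(roots, 2))

    # x^2 - 3x + 2 = (x - 1)(x - 2): the roots are distance 1 apart.
    print(min_root_separation([1, -3, 2]))   # approximately 1.0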
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/346/CS-TR-73-346.pdf %R CS-TR-73-348 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T High order finite difference solution of differential equations. %A Pereyra, Victor %D April 1973 %X These seminar notes give a detailed treatment of finite difference approximations to smooth nonlinear two-point boundary value problems for second order differential equations. Consistency, stability, convergence, and asymptotic expansions are discussed. Most results are stated in such a way as to indicate extensions to more general problems. Successive extrapolations and deferred corrections are described and their implementations are explored thoroughly. A very general deferred correction generator is developed and it is employed in the implementation of a variable order, variable (uniform) step method. Complete FORTRAN programs and extensive numerical experiments and comparisons are included together with a set of 48 references. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/348/CS-TR-73-348.pdf %R CS-TR-73-349 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Two papers on the selection problem: Time Bounds for Selection [by Manuel Blum, Robert W. Floyd, Vaughan Pratt, Ronald L. Rivest, and Robert E. Tarjan] and Expected Time Bounds for Selection [by Robert W. Floyd and Ronald L. Rivest]. %A Blum, Manuel %A Floyd, Robert W. %A Pratt, Vaughan R. %A Rivest, Ronald L. %A Tarjan, Robert Endre %D April 1973 %X (1) The number of comparisons required to select the i-th smallest of n numbers is shown to be at most a linear function of n by analysis of a new selection algorithm -- PICK. Specifically, no more than 5.4305 n comparisons are ever required. This bound is improved for extreme values of i, and a new lower bound on the requisite number of comparisons is also proved. (2) A new selection algorithm is presented which is shown to be very efficient on the average, both theoretically and practically. The number of comparisons used to select the i-th smallest of n numbers is n + min(i,n-i) + o(n). A lower bound within 9% of the above formula is also derived. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/349/CS-TR-73-349.pdf %R CS-TR-73-350 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An almost-optimal algorithm for the assembly line scheduling problem. %A Kaufman, Marc T. %D January 1973 %X This paper considers a solution to the multiprocessor scheduling problem for the case where the ordering relation between tasks can be represented as a tree. Assume that we have n identical processors, and a number of tasks to perform. Each task $T_i$ requires an amount of time ${\mu}_i$ to complete, 0 < ${\mu}_i \leq$ k, so that k is an upper bound on task length. Tasks are indivisible, so that a processor once assigned must remain assigned until the task completes (no preemption). Then the "longest path" scheduling method is almost-optimal in the following sense: Let $\omega$ be the total time required to process all of the tasks by the "longest path" algorithm. Let ${\omega}_o$ be the minimal time in which all of the tasks can be processed. Let ${\omega}_p$ be the minimal time to process all of the tasks if arbitrary preemption of processors is allowed. Then: ${\omega}_p \leq {\omega}_o \leq \omega \leq {\omega}_p$ + k - k/n, where n is the number of processors available to any of the algorithms.
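The "longest path" rule of CS-TR-73-350 always runs the ready tasks that head the longest remaining chains. The sketch below specializes it to unit-time tasks on an in-tree (a parent runs after its children); the encoding and names are invented for illustration, and the report's bound $\omega \leq {\omega}_p$ + k - k/n concerns general task lengths.

    def longest_path_schedule(parent, n):
        # parent[t] is the task that must run after t (None for the root).
        # Each step runs at most n ready tasks, preferring those whose
        # chain to the root is longest.
        tasks = list(parent)
        level = {}
        def depth(t):
            if t not in level:
                level[t] = 0 if parent[t] is None else 1 + depth(parent[t])
            return level[t]
        pending, done, schedule = set(tasks), set(), []
        while pending:
            ready = [t for t in pending
                     if all(c in done for c in tasks if parent[c] == t)]
            step = sorted(ready, key=depth, reverse=True)[:n]
            schedule.append(sorted(step))
            done |= set(step)
            pending -= set(step)
        return schedule

    # Tasks 4 and 5 precede 1; tasks 1 and 2 precede the root 3.
    print(longest_path_schedule({1: 3, 2: 3, 3: None, 4: 1, 5: 1}, n=2))
    # [[4, 5], [1, 2], [3]] -- three steps, which is optimal here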
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/350/CS-TR-73-350.pdf %R CS-TR-73-351 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Performance of an I/O channel with multiple paging drums (digest edition). %A Fuller, Samuel H. %D August 1972 %X For rotating storage units, a paging drum organization is known to offer substantially better response time to I/O requests than is a more conventional (file) organization [Abate and Dubner, 1969; Fuller and Baskett, 1972]. When several, asynchronous paging drums are attached to a single I/O channel, however, much of the gain in response time due to the paging organization is lost; this article investigates the reasons for this loss in performance. A model of an I/O channel with multiple paging drums is presented and we embed into the model a Markov chain that closely approximates the behavior of the I/O channel. The analysis then leads to the moment generating function of sector queue size and the Laplace-Stieltjes transform of the waiting time. A significant observation is that the expected waiting time for an I/O request to a drum can be divided into two terms: one independent of the load of I/O requests to the drum and another that monotonically increases with increasing load. Moreover, the load varying term of the waiting time is nearly proportional to (2 - 1/k) where k is the number of drums connected to the I/O channel. The validity of the Markov chain approximation is examined in several cases by a comparison of the analytic results to the actual performance of an I/O channel with several paging drums. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/351/CS-TR-73-351.pdf %R CS-TR-73-352 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The expected difference between the SLTF and MTPT drum scheduling disciplines (digest edition). %A Fuller, Samuel H. %D August 1972 %X This report is a sequel to an earlier report [Fuller, 1971] that develops a minimal-total-processing-time (MTPT) drum scheduling algorithm. A quantitative comparison between MTPT schedules and shortest-latency-time-first (SLTF) schedules, commonly acknowledged as good schedules for drum-like storage units, is presented here. The analysis develops an analogy to random walks and proves several asymptotic properties of collections of records on drums. These properties are specialized to the MTPT and SLTF algorithms and it is shown that for sufficiently large sets of records, the expected processing time of a SLTF schedule is longer than a MTPT schedule by the expected record length. The results of a simulation study are also presented to show the difference in MTPT and SLTF schedules for small sets of records and for situations not covered in the analytic discussion. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/352/CS-TR-73-352.pdf %R CS-TR-73-353 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Random arrivals and MTPT disk scheduling disciplines. %A Fuller, Samuel H. %D August 1972 %X This article investigates the application of minimal-total-processing-time (MTPT) scheduling disciplines to rotating storage units when random arrival of requests is allowed. Fixed-head drum and moving-head disk storage units are considered and particular emphasis is placed on the relative merits of the MTPT scheduling discipline with respect to the shortest-latency-time-first (SLTF) scheduling discipline.
The data presented are the results of simulation studies. Situations are discovered in which the MTPT discipline is superior to the SLTF discipline, and situations are also discovered in which the opposite is true. An implementation of the MTPT scheduling algorithm is presented and the computational requirements of the algorithm are discussed. It is shown that the sorting procedure is the most time consuming phase of the algorithm. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/353/CS-TR-73-353.pdf %R CS-TR-73-354 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The number of SDR's in certain regular systems. %A Klarner, David A. %D April 1973 %X Let ($a_1$,...,$a_k$) = $\bar{a}$ denote a vector of numbers, and let C($\bar{a}$,n) denote the n $\times$ n cyclic matrix having ($a_1$,...,$a_k$,0,...,0) as its first row. It is shown that the sequences (det C($\bar{a}$,n): n = k,k+1,...) and (per C($\bar{a}$,n): n = k,k+1,...) satisfy linear homogeneous difference equations with constant coefficients. The permanent, per C, of a matrix C is defined like the determinant except that one forgets about ${(-1)}^{sign \pi}$ where $\pi$ is a permutation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/354/CS-TR-73-354.pdf %R CS-TR-73-355 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An analysis of central processor scheduling in multiprogrammed computer systems (digest edition). %A Price, Thomas G. %D October 1972 %X A simple finite source model is used to gain insight into the effect of central processor scheduling in multiprogrammed computer systems. CPU utilization is chosen as the measure of performance and this decision is discussed. A relation between CPU utilization and flow time is developed. It is shown that the shortest-remaining-processing-time discipline maximizes both CPU utilization and I/O utilization for the queueing model M/G/1/N. An exact analysis of processor utilization using shortest-remaining-processing-time scheduling for systems with two jobs is given and it is observed that the processor utilization is independent of the form of the processing time distribution. The effect of the CPU processing time distribution on performance is discussed. For first-come-first-served scheduling, it is shown that distributions with the same mean and variance can yield significantly different processor utilizations and that utilization may or may not significantly decrease with increasing variance. The results are used to compare several scheduling disciplines of practical interest. An approximate expression for CPU utilization using shortest-remaining-processing-time scheduling in systems with N jobs is given. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/355/CS-TR-73-355.pdf %R CS-TR-73-356 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T MLISP2. %A Smith, David Canfield %A Enea, Horace J. %D May 1973 %X MLISP2 is a high-level programming language based on LISP. Features: 1. The notation of MLISP. 2. Extensibility---the ability to extend the language and to define new languages. 3. Pattern matching---the ability to match input against context free or sensitive patterns. 4. Backtracking--the ability to set decision points, manipulate contexts and backtrack. 
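The decision points and backtracking that CS-TR-73-356 builds into MLISP2 can be imitated in miniature with a generator-based pattern matcher: each segment variable is a decision point, and exhausting one binding backtracks to try the next. This is only a Python analogy, not MLISP2 syntax.

    def match(pattern, tokens):
        # '?name' pattern elements are segment variables that may bind to
        # any run of tokens; other elements must match literally.
        def go(p, t, env):
            if not p:
                if not t:
                    yield dict(env)
                return
            head, rest = p[0], p[1:]
            if isinstance(head, str) and head.startswith('?'):
                for cut in range(len(t) + 1):          # a decision point
                    env[head] = t[:cut]
                    yield from go(rest, t[cut:], env)  # backtrack on failure
                env.pop(head, None)
            elif t and head == t[0]:
                yield from go(rest, t[1:], env)
        return go(list(pattern), list(tokens), {})

    for env in match(['the', '?adj', 'dog'], ['the', 'big', 'brown', 'dog']):
        print(env)   # {'?adj': ['big', 'brown']}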
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/356/CS-TR-73-356.pdf %R CS-TR-73-357 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A conceptually based sentence paraphraser. %A Goldman, Neil M. %A Riesbeck, Christopher K. %D May 1973 %X This report describes a system of programs which performs natural language processing based on an underlying language free (conceptual) representation of meaning. This system is used to produce sentence paraphrases which demonstrate a form of understanding with respect to a given context. Particular emphasis has been placed on the major subtasks of language analysis (mapping natural language into conceptual structures) and language generation (mapping conceptual structures into natural language), and on the interaction between these processes and a conceptual memory model. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/357/CS-TR-73-357.pdf %R CS-TR-73-358 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Inference and the computer understanding of natural language. %A Schank, Roger C. %A Rieger, Charles J., III %D May 1973 %X The notion of computer understanding of natural language is examined relative to inference mechanisms designed to function in a language-free deep conceptual base (Conceptual Dependency). The conceptual analysis of a natural language sentence into this conceptual base, and the nature of the memory which stores and operates upon these conceptual structures are described from both theoretical and practical standpoints. The various types of inferences which can be made during and after the conceptual analysis of a sentence are defined, and a functioning program which performs these inference tasks is described. Actual computer output is included. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/358/CS-TR-73-358.pdf %R CS-TR-73-360 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Open, closed, and mixed networks of queues with different classes of customers. %A Muntz, Richard R. %A Baskett, Forest, III %D August 1972 %X We derive the joint equilibrium distribution of queue sizes in a network of queues containing N service centers and R classes of customers. The equilibrium state probabilities have the general form: P(S) = Cd(S) $f_1$($x_1$)$f_2$($x_2$)...$f_N$($x_N$) where S is the state of the system, $x_i$ is the configuration of customers at the ith service center, d(S) is a function of the state of the model, $f_i$ is a function that depends on the type of the ith service center, and C is a normalizing constant. We consider four types of service centers to model central processors, data channels, terminals, and routing delays. The queueing disciplines associated with these service centers include first-come-first-served, processor sharing, no queueing, and last-come-first-served. Each customer belongs to a single class of customers while awaiting or receiving service at a service center but may change classes and service centers according to fixed probabilities at the completion of a service request. For open networks we consider state dependent arrival processes. Closed networks are those with no arrivals. A network may be closed with respect to some classes of customers and open with respect to other classes of customers.
At three of the four types of service centers, the service times of customers are governed by probability distributions having rational Laplace transforms, different classes of customers having different distributions. At first-come-first-served type service centers the service time distribution must be identical and exponential for all classes of customers. Many of the network results of Jackson on arrival and service rate dependencies, of Posner and Bernholtz on different classes of customers, and of Chandy on different types of service centers are combined and extended in this paper. The results become special cases of the model presented here. An example shows how different classes of customers can affect models of computer systems. Finally, we show that an equivalent model encompassing all of the results involves only classes of customers with identical exponentially distributed service times. All of the other structure of the first model can be absorbed into the fixed probabilities governing the change of class and change of service center of each class of customers. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/360/CS-TR-73-360.pdf %R CS-TR-73-361 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An algorithm for the construction of the graphs of organic molecules. %A Brown, Harold %A Masinter, Larry M. %D May 1973 %X A description and a formal proof of an efficient computer implemented algorithm for the construction of graphs is presented. This algorithm, which is part of a program for the automated analysis of organic compounds, constructs all of the non-isomorphic, connected multi-graphs based on a given degree sequence of nodes and which arise from a relatively small "catalog" of certain canonical graphs. For the graphs of the more common organic molecules, a catalog of most of the canonical graphs is known, and the algorithm can produce all of the distinct valence isomers of these organic molecules. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/361/CS-TR-73-361.pdf %R CS-TR-73-364 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Estimation of probability density using signature tables for applications to pattern recognition. %A Thosar, Ravindra B. %D May 1973 %X The signature table training method consists of cumulative evaluation of a function (such as a probability density) at pre-assigned co-ordinate values of input parameters to the table. The training is conditional: based on a binary valued "learning" input to a table which is compared to the label attached to each training sample. Interpretation of an unknown sample vector is then equivalent to a table look-up, i.e. extraction of the function value stored at the proper co-ordinates. Such a technique is very useful when a large number of samples must be interpreted as in the case of speech recognition and the time required for the training as well as for the recognition is at a premium. However, this method is limited by prohibitive storage requirements, even for a moderate number of parameters, when their relative independence cannot be assumed. This report investigates the conditions under which the higher dimensional probability density function can be decomposed so that the density estimate is obtained by a hierarchy of signature tables with consequent reduction in the storage requirement. Practical utility of the theoretical results obtained in the report is demonstrated by a vowel recognition experiment.
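A toy version of the signature-table training just described in CS-TR-73-364: training accumulates, at quantized co-ordinates of the input parameters, how often the training label was positive, and interpretation is a single table look-up. The bin width and names below are invented for illustration.

    from collections import defaultdict

    class SignatureTable:
        def __init__(self, bin_width):
            self.bin_width = bin_width
            self.counts = defaultdict(lambda: [0, 0])   # cell -> [positive, total]

        def cell(self, params):
            # Pre-assigned co-ordinates: quantize each input parameter.
            return tuple(int(p // self.bin_width) for p in params)

        def train(self, params, label):
            c = self.counts[self.cell(params)]
            c[0] += 1 if label else 0
            c[1] += 1

        def lookup(self, params):
            # Interpretation of an unknown sample is a table look-up.
            pos, total = self.counts[self.cell(params)]
            return pos / total if total else 0.0

    table = SignatureTable(bin_width=0.5)
    for sample, label in [((0.1, 0.2), True), ((0.2, 0.1), True), ((0.9, 0.9), False)]:
        table.train(sample, label)
    print(table.lookup((0.15, 0.15)))   # 1.0: both samples in this cell were True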
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/364/CS-TR-73-364.pdf %R CS-TR-73-365 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Automatic program verification I: a logical basis and its implementation. %A Igarashi, Shigeru %A London, Ralph L. %A Luckham, David C. %D May 1973 %X Defining the semantics of programming languages by axioms and rules of inference yields a deduction system within which proofs may be given that programs satisfy specifications. The deduction system herein is shown to be consistent and also deduction complete with respect to Hoare's system. A subgoaler for the deduction system is described whose input is a significant subset of Pascal programs plus inductive assertions. The output is a set of verification conditions or lemmas to be proved. Several non-trivial arithmetic and sorting programs have been shown to satisfy specifications by using an interactive theorem prover to automatically generate proofs of the verification conditions. Additional components for a more powerful verification system are under construction. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/365/CS-TR-73-365.pdf %R CS-TR-73-368 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The goals of linguistic theory revisited. %A Schank, Roger C. %A Wilks, Yorick A. %D May 1973 %X We examine the original goals of generative linguistic theory. We suggest that these goals were well defined but misguided with respect to their avoidance of the problem of modelling performance. With developments such as Generative Semantics, it is no longer clear that the goals are clearly defined. We argue that it is vital for linguistics to concern itself with the procedures that humans use in language. We then introduce a number of basic human competencies in the field of language understanding, such as understanding in context and the use of inferential information, and argue that the modelling of these aspects of language understanding requires procedures of a sort that cannot be easily accommodated within the dominant paradigm. In particular, we argue that the procedures that will be required in these cases ought to be linguistic, and that the simple-minded importation of techniques from logic may create a linguistics in which there cannot be procedures of the required sort. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/368/CS-TR-73-368.pdf %R CS-TR-73-369 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The development of conceptual structures in children. %A Schank, Roger C. %D May 1973 %X Previous papers by the author have hypothesized that it is possible to represent the meaning of natural language sentences using a framework which has only fourteen primitive ACTs. This paper addresses the problem of when and how these ACTs might be learned by children. The speech of a child of age 2 is examined for possible knowledge of the primitive ACTs as well as the conceptual relations underlying language. It is shown that there is evidence that the conceptual structures underlying language are probably complete by age 2. Next a child is studied from birth to age 1. The emergence of the primitive ACTs and the conceptual relations is traced. The hypothesis is made that the structures that underlie and are necessary for language are present by age 1.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/369/CS-TR-73-369.pdf %R CS-TR-73-371 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A review of "Structured Programming". %A Knuth, Donald E. %D June 1973 %X The recent book $\underline{Structured Programming}$ by O. J. Dahl, E. W. Dijkstra, and C. A. R. Hoare promises to have a significant impact on computer science. This report contains a detailed review of the topics treated in that book, in the form of three informal "open letters" to the three authors. It is hoped that circulation of these letters to a wider audience at this time will help to promote useful discussion of the important issues. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/371/CS-TR-73-371.pdf %R CS-TR-73-373 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T SAIL user manual. %A VanLehn, Kurt A. %D July 1973 %X SAIL is a high-level programming language for the PDP-10 computer. It includes an extended ALGOL 60 compiler and a companion set of execution-time routines. In addition to ALGOL, the language features: (1) flexible linking to hand-coded machine language algorithms, (2) complete access to the PDP-10 I/O facilities, (3) a complete system of compile-time arithmetic and logic as well as a flexible macro system, (4) user modifiable error handling, (5) backtracking, and (6) interrupt facilities. Furthermore, a subset of the SAIL language, called LEAP, provides facilities for (1) sets and lists, (2) an associative data structure, (3) independent processes, and (4) procedure variables. The LEAP subset of SAIL is an extension of the LEAP language, which was designed by J. Feldman and P. Rovner, and implemented on Lincoln Laboratory's TX-2 (see [Feldman & Rovner, "An Algol-Based Associative Language," Communications of the ACM, v.12, no. 8 (Aug. 1969), pp.439-449]). The extensions to LEAP are partially described in "Recent Developments in SAIL" (see [Feldman et al., Proceedings of the AFIPS Fall Joint Computer Conference, 1972, pp. 1193-1202]). This manual describes the SAIL language and the execution-time routines for the typical SAIL user: a non-novice programmer with some knowledge of ALGOL. It lies somewhere between being a tutorial and a reference manual. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/373/CS-TR-73-373.pdf %R CS-TR-73-376 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Lower estimates for the error of best uniform approximation. %A Meinardus, Guenter %A Taylor, Gerald D. %D July 1973 %X In this paper the lower bounds of de La Vallee Poussin and Remes for the error of best uniform approximation from a linear subspace are generalized to give analogous estimates based on k points, k = 1,...,n. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/376/CS-TR-73-376.pdf %R CS-TR-73-378 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The optimum comb method of pitch period analysis of continuous digitized speech. %A Moorer, James Anderson %D July 1973 %X A new method of tracking the fundamental frequency of voiced speech is described. The method is shown to be of accuracy similar to that of the Cepstrum technique. Since the method involves only additions and no multiplications, it is shown to be faster than the SIFT algorithm.
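The addition-only claim in CS-TR-73-378 above can be made concrete with a generic comb-style period estimator (a sketch in the same spirit, not Moorer's optimum comb): each candidate period is scored by how well the waveform lines up with itself under that shift, using only sums of absolute differences.

def comb_pitch_period(x, min_p, max_p):
    # Generic comb-style pitch period estimate: score each candidate
    # period p by summing |x[n] - x[n+p]| (additions only, plus one
    # divide per candidate for normalization) and pick the best alignment.
    assert len(x) > 2 * max_p, "window should cover at least two max periods"
    best_p, best_score = min_p, float("inf")
    for p in range(min_p, max_p + 1):
        score = sum(abs(x[n] - x[n + p]) for n in range(len(x) - p))
        score /= len(x) - p
        if score < best_score:
            best_p, best_score = p, score
    return best_p  # period in samples; fundamental frequency = rate / p

As with any comb method, a perfectly periodic input also aligns at multiples of the true period, so the search range and the strict inequality above matter.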
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/378/CS-TR-73-378.pdf %R CS-TR-73-382 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Axiomatic approach to total correctness of programs. %A Manna, Zohar %A Pnueli, Amir %D July 1973 %X We present here an axiomatic approach which enables one to prove by formal methods that his program is "totally correct" (i.e., it terminates and is logically correct -- does what it is supposed to do). The approach is similar to Hoare's approach for proving that a program is "partially correct" (i.e., that whenever it terminates it produces correct results). Our extension to Hoare's method lies in the possibility of proving correctness $\underline{and}$ termination at once, and in the enlarged scope of properties that can be proved by it. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/382/CS-TR-73-382.pdf %R CS-TR-73-383 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Natural language inference. %A Wilks, Yorick A. %D August 1973 %X The paper describes the way in which a Preference Semantics system for natural language analysis and generation tackles a difficult class of anaphoric inference problems (finding the correct referent for an English pronoun in context): those requiring either analysis (conceptual) knowledge of a complex sort, or requiring weak inductive knowledge of the course of events in the real world. The method employed converts all available knowledge to a canonical template form and endeavors to create chains of non-deductive inferences from the unknowns to the possible referents. Its method of selecting among possible chains of inferences is consistent with the overall principle of "semantic preference" used to set up the original meaning representation, of which these anaphoric inference procedures are a manipulation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/383/CS-TR-73-383.pdf %R CS-TR-73-384 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The generation of French from a semantic representation. %A Herskovits, Annette %D August 1973 %X The report contains first a brief description of Preference Semantics, a system of representation and analysis of the meaning structure of natural language. The analysis algorithm which transforms phrases into semantic items called templates has been considered in detail elsewhere, so this report concentrates on the second phase of analysis, which binds templates together into a higher level semantic block corresponding to an English paragraph, and which, in operation, interlocks with the French generation procedure. During this phase, the semantic relations between templates are extracted, pronoun references are resolved, and those word disambiguations are done that require the context of a whole paragraph. These tasks require items called PARAPLATES which are attached to keywords such as prepositions, subjunctions and relative pronouns. The system chooses the representation which maximises a carefully defined "semantic density." A system for the generation of French sentences is then described, based on the recursive evaluation of procedural generation patterns called STEREOTYPES. The stereotypes are semantically context sensitive, are attached to each sense of English words and keywords and are carried into the representation by the analysis procedure.
The representation of the meaning of words, and the versatility of the stereotype format, allow for fine meaning distinctions to appear in the French, and for the construction of French differing radically from the English original. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/384/CS-TR-73-384.pdf %R CS-TR-73-385 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Recognition of continuous speech: segmentation and classification using signature table adaptation. %A Thosar, Ravindra B. %D September 1973 %X This report explores the possibility of using a set of features for segmentation and recognition of continuous speech. The features are not necessarily "distinctive" or minimal, in the sense that they do not divide the phonemes into mutually exclusive subsets, and can have high redundancy. This concept of feature can thus avoid a priori binding between the phoneme categories to be recognized and the set of features defined in a particular system. An adaptive technique is used to find the probability of the presence of a feature. Each feature is treated independently of other features. An unknown utterance is thus represented by a feature graph with associated probabilities. It is hoped that such a representation would be valuable for a hypothesize-test paradigm as opposed to one which operates on a linear symbolic input. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/385/CS-TR-73-385.pdf %R CS-TR-73-386 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A corner finder for visual feedback. %A Perkins, W. A. %A Binford, Thomas O. %D August 1973 %X In visual-feedback work, a model of an object and its approximate location are often known, and it is only necessary to determine its location and orientation more accurately. The purpose of the program described herein is to provide such information for the case in which the model is an edge or corner. Given a model of a line or a corner with two or three edges, the program searches a TV window of arbitrary size looking for one or all corners which match the model. A model-driven program directs the search. It calls on another program to find all lines inside the window. Then it looks at these lines and eliminates lines which cannot match any of the model lines. It next calls on a program to form vertices and then checks for a matching vertex. If this simple procedure fails, the model-driver has two backup procedures. First it works with the lines that it has and tries to form a matching vertex (corner). If this fails, it matches parts of the model with vertices and lines that are present and then takes a careful look in a small region in which it expects to find a missing line. The program often finds weak contrast edges in this manner. Lines are found by a global method after the entire window has been scanned with the Hueckel edge operator. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/386/CS-TR-73-386.pdf %R CS-TR-73-387 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Analysis of behavior of chemical molecules: rule formation on non-homogeneous classes of objects. %A Buchanan, Bruce G. %A Sridharan, Natesa S. %D August 1973 %X An information processing model of some important aspects of inductive reasoning is presented within the context of one scientific discipline.
Given a collection of experimental (mass spectrometry) data from several chemical molecules, the computer program described here separates the molecules into "well-behaved" subclasses and selects from the space of all explanatory processes the "characteristic" processes for each subclass. The definitions of "well-behaved" and "characteristic" embody several heuristics which are discussed. Some results of the program are discussed which have been useful to chemists and which lend credibility to this approach. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/387/CS-TR-73-387.pdf %R CS-TR-73-388 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Interconnections for parallel memories to unscramble p-ordered vectors. %A Swanson, Roger C. %D May 1973 %X Several methods are being considered for storing arrays in a parallel memory system so that various useful partitions of an array can be fetched from the memory with a single access. Some of these methods fetch vectors in an order scrambled from that required for a computation. This paper considers the problem of unscrambling such vectors when the vectors belong to a class called p-ordered vectors and the memory system consists of a prime number of modules. Pairs of interconnections are described that can unscramble p-ordered vectors in a number of steps that grows as the square root of the number of memories. Lower and upper bounds are given for the number of steps to unscramble the worst case vector. The upper bound calculation that is derived also provides an upper bound on the minimum diameter of a star polygon with a fixed number of nodes and two interconnections. An algorithm is given that has produced optimal pairs of interconnections for all sizes of memory that have been tried. The algorithm appears to find optimal pairs for all memory sizes, but no proof has yet been found. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/388/CS-TR-73-388.pdf %R CS-TR-73-390 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A construction for the inverse of a Turing machine. %A Gips, James %D August 1973 %X A direct construction for the inverse of a Turing machine is presented. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/390/CS-TR-73-390.pdf %R CS-TR-73-391 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Search strategies for the task of organic chemical synthesis. %A Sridharan, Natesa S. %D October 1973 %X A computer program has been written that successfully discovers syntheses for complex organic chemical molecules. The definition of the search space and strategies for heuristic search are described in this paper. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/391/CS-TR-73-391.pdf %R CS-TR-73-392 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Sorting and Searching - errata and addenda. %A Knuth, Donald E. %D October 1973 %X This report lists all the typographical errors in $\underline{The Art of Computer Programming}$, Volume 3, that are presently known to its author. Several recent developments and references to the literature, which will be incorporated in the second printing, are also included in an attempt to keep the book up-to-date. Several dozen corrections to the second (1971) printing of volume two are also included.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/392/CS-TR-73-392.pdf %R CS-TR-73-394 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Parallel programming: an axiomatic approach. %A Hoare, C. A. R. %D October 1973 %X This paper develops some ideas expounded in [C.A.R. Hoare. "Towards a Theory of Parallel Programming," in $\underline{Operating Systems Techniques}$, ed. C.A.R. Hoare and R.H. Perrott. Academic Press. 1972]. It distinguishes a number of ways of using parallelism, including disjoint processes, competition, cooperation, communication and "colluding". In each case an axiomatic proof rule is given. Some light is thrown on traps or ON conditions. Warning: the program structuring methods described here are not suitable for the construction of operating systems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/394/CS-TR-73-394.pdf %R CS-TR-73-396 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The use of sensory feedback in a programmable assembly system. %A Bolles, Robert C. %A Paul, Richard P. %D October 1973 %X This article describes an experimental, automated assembly system which uses sensory feedback to control an electro-mechanical arm and TV camera. Visual, tactile, and force feedback are used to improve positional information, guide manipulations, and perform inspections. The system has two phases: a 'planning' phase in which the computer is programmed to assemble some object, and a 'working' phase in which the computer controls the arm and TV camera in actually performing the assembly. The working phase is designed to be run on a mini-computer. The system has been used to assemble a water pump, consisting of a base, gasket, top, and six screws. This example is used to explain how the sensory data is incorporated into the control system. A movie showing the pump assembly is available from the Stanford Artificial Intelligence Laboratory. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/396/CS-TR-73-396.pdf %R CS-TR-73-398 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Image contouring and comparing. %A Baumgart, Bruce G. %D October 1973 %X A contour image representation is stated and an algorithm for converting a set of digital television images into this representation is explained. The algorithm consists of five steps: digital image thresholding, binary image contouring, polygon nesting, polygon smoothing, and polygon comparing. An implementation of the algorithm is the main routine of a program called CRE; auxiliary routines provide cart and turn table control, TV camera input, image display, and xerox printer output. A serendip application of CRE to type font construction is explained. Details about the intended application of CRE to the perception of physical objects will appear in sequels to this paper. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/398/CS-TR-73-398.pdf %R CS-TR-73-401 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Monitors: an operating system structuring concept. %A Hoare, C. A. R. %D November 1973 %X This paper develops Brinch-Hansen's concept of a monitor as a method of structuring an operating system. It introduces a form of synchronization, describes a possible method of implementation in terms of semaphores, and gives a suitable proof rule.
Illustrative examples include a single resource scheduler, a bounded buffer, an alarm clock, a buffer pool, a disc head optimizer, and a version of the problem of readers and writers. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/401/CS-TR-73-401.pdf %R CS-TR-73-403 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Hints on programming language design. %A Hoare, C. A. R. %D December 1973 %X This paper (based on a keynote address presented at the SIGACT/SIGPLAN Symposium on Principles of Programming Languages, Boston, October 1-3, 1973) presents the view that a programming language is a tool which should assist the programmer in the most difficult aspects of his art, namely program design, documentation, and debugging. It discusses the objective criteria for evaluating a language design, and illustrates them by application to language features of both high level languages and machine code programming. It concludes with an annotated reading list, recommended for all intending language designers. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/403/CS-TR-73-403.pdf %R CS-TR-73-379 %Z Mon, 25 Sep 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The heterodyne filter as a tool for analysis of transient waveforms. %A Moorer, James Anderson %D July 1973 %X A method of analysis of transient waveforms is discussed. Its properties and limitations are presented in the context of musical tones. The method is shown to be useful when the risetimes of the partials of the tone are not too short. An extension to inharmonic partials and polyphonic musical sound is discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/73/379/CS-TR-73-379.pdf %R CS-TR-72-252 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Large-scale linear programming using the Cholesky factorization. %A Saunders, Michael A. %D January 1972 %X A variation of the revised simplex method is proposed for solving the standard linear programming problem. The method is derived from an algorithm recently proposed by Gill and Murray, and is based upon the orthogonal factorization B = LQ or, equivalently, upon the Cholesky factorization ${BB}^T = {LL}^T$ where B is the usual square basis, L is lower triangular and Q is orthogonal. We wish to retain the favorable numerical properties of the orthogonal factorization, while extending the work of Gill and Murray to the case of linear programs which are both large and sparse. The principal property exploited is that the Cholesky factor L depends only on $\underline{which}$ variables are in the basis, and not upon the $\underline{order}$ in which they happen to enter. A preliminary ordering of the rows of the full data matrix therefore promises to ensure that L will remain sparse throughout the iterations of the simplex method. An initial (in-core) version of the algorithm has been implemented in Algol W on the IBM 360/91 and tested on several medium-scale problems from industry (up to 930 constraints). While performance has not been especially good on problems of high density, the method does appear to be efficient on problems which are very sparse, and on structured problems which have either generalized upper bounding, block-angular, or staircase form.
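The property CS-TR-72-252 above exploits, that the Cholesky factor L depends only on which variables are in the basis and not on the order in which they entered, follows from the fact that permuting the columns of B leaves ${BB}^T$ unchanged, and it is easy to confirm numerically. A small illustrative check (not from the report), in Python with NumPy:

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))   # a dense toy stand-in for a simplex basis
Bp = B[:, rng.permutation(6)]     # same basis columns, entered in another order

# Column permutation is right-multiplication by a permutation matrix P,
# and P P^T = I, so Bp Bp^T = B B^T and the Cholesky factor is unchanged.
L1 = np.linalg.cholesky(B @ B.T)
L2 = np.linalg.cholesky(Bp @ Bp.T)
print(np.allclose(L1, L2))        # True

This order-independence is what lets a single preliminary ordering of the rows keep L sparse throughout the simplex iterations.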
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/252/CS-TR-72-252.pdf %R CS-TR-72-253 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Total complexity and the inference of best programs. %A Feldman, Jerome A. %A Shields, Paul C. %D April 1972 %X Axioms for a total complexity measure for abstract programs are presented. Essentially, they require that total complexity be an unbounded increasing function of the Blum time and size measures. Algorithms for finding the best program on a finite domain are presented, and their limiting behaviour for infinite domains described. For total complexity, there are important senses in which a machine $\underline{can}$ find the best program for a large class of functions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/253/CS-TR-72-253.pdf %R CS-TR-72-254 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Von Neumann's comparison method for random sampling from the normal and other distributions. %A Forsythe, George E. %D January 1972 %X The author presents a generalization he worked out in 1950 of von Neumann's method of generating random samples from the exponential distribution by comparisons of uniform random numbers on (0,1). It is shown how to generate samples from any distribution whose probability density function is piecewise both absolutely continuous and monotonic on ($-\infty$,$\infty$). A special case delivers normal deviates at an average cost of only 4.036 uniform deviates each. This seems more efficient than the Center-Tail method of Dieter and Ahrens, which uses a related, but different, method of generalizing the von Neumann idea to the normal distribution. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/254/CS-TR-72-254.pdf %R CS-TR-72-255 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Automatic programming. %A Feldman, Jerome A. %D February 1972 %X The revival of interest in Automatic Programming is considered. The research is divided into direct efforts and theoretical developments and the successes and prospects of each are described. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/255/CS-TR-72-255.pdf %R CS-TR-72-256 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Edmonds polyhedra and weakly hamiltonian graphs. %A Chvatal, Vaclav %D January 1972 %X Jack Edmonds developed a new way of looking at extremal combinatorial problems and applied his technique with great success to the problems of the maximal-weight degree-constrained subgraphs. Professor C. St. J. A. Nash-Williams suggested using Edmonds' approach in the context of hamiltonian graphs. In the present paper, we determine a new set of inequalities (the "comb inequalities") which are satisfied by the characteristic functions of hamiltonian circuits but are not explicit in the straightforward integer programming formulation. A direct application of the linear programming duality theorem then leads to a new necessary condition for the existence of hamiltonian circuits; this condition appears to be stronger than the previously known ones. Relating linear programming to hamiltonian circuits, the present paper can also be seen as a continuation of the work of Dantzig, Fulkerson and Johnson on the travelling salesman problem.
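The comparison method that CS-TR-72-254 above generalizes is compact enough to state in code. A sketch in Python of von Neumann's original exponential sampler (the report's generalization to other densities and its normal-deviate special case are not reproduced here): uniforms are drawn while they keep decreasing; an odd-length run accepts the first deviate, an even-length run rejects it and shifts the result by one.

import random

def exponential_von_neumann(rng=random):
    # Sample from Exp(1) using only uniform deviates and comparisons.
    # A decreasing run starting at u has odd length with probability
    # exp(-u), so accepting on odd runs gives density e^-u on (0,1);
    # each rejection adds 1, extending the density to (0, infinity).
    shift = 0
    while True:
        first = u = rng.random()
        run_length = 1
        while True:
            v = rng.random()
            if v > u:              # the decreasing run has ended
                break
            u = v
            run_length += 1
        if run_length % 2 == 1:    # odd run: accept
            return shift + first
        shift += 1                 # even run: reject, try the next interval

The abstract's figure of 4.036 uniform deviates per normal sample gives a sense of the cost of the generalized version.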
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/256/CS-TR-72-256.pdf %R CS-TR-72-257 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On "PASCAL," code generation, and the CDC 6000 computer. %A Wirth, Niklaus %D February 1972 %X "PASCAL" is a general purpose programming language with characteristics similar to ALGOL 60, but with an enriched set of program- and data structuring facilities. It has been implemented on the CDC 6000 computer. This paper discusses selected topics of code generation, in particular the selection of instruction sequences to represent simple operations on arithmetic, Boolean, and powerset operands. Methods to implement recursive procedures are briefly described, and it is hinted that the more sophisticated solutions are not necessarily also the best. The CDC 6000 architecture appears as a frequent source of pitfalls and nuisances, and its main trouble spots are scrutinized and discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/257/CS-TR-72-257.pdf %R CS-TR-72-258 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Some basic machine algorithms for integral order computations. %A Brown, Harold %D February 1972 %X Three machine implemented algorithms for computing with integral orders are described. The algorithms are: 1. For an integral order R given in terms of its left regular representation relative to any basis, compute the nil radical J(R) and a left regular representation of R/J(R). 2. For a semisimple order R given in terms of its left regular representation relative to any basis, compute a new basis for R and the associated left regular representation of R such that the first basis element of the transformed basis is an integral multiple of the identity element in $Q \otimes R$. 3. Relative to any fixed Z-basis for R, compute a unique canonical form for any given finitely generated Z-submodule of $Q \otimes R$ described in terms of that basis. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/258/CS-TR-72-258.pdf %R CS-TR-72-261 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The differentiation of pseudoinverses and nonlinear least squares problems whose variables separate. %A Golub, Gene H. %A Pereyra, Victor %D February 1972 %X For given data $(t_i, y_i)$, $i = 1, \ldots, m$, we consider the least squares fit of nonlinear models of the form $F(a, \alpha; t) = \sum_{j=1}^{n} g_j(a) \varphi_j(\alpha; t)$, $a \in R^s$, $\alpha \in R^k$. For this purpose we study the minimization of the nonlinear functional $r(a, \alpha) = \sum_{i=1}^{m} (y_i - F(a, \alpha; t_i))^2$. It is shown that by defining the matrix $\{\Phi(\alpha)\}_{i,j} = \varphi_j(\alpha; t_i)$, and the modified functional $r_2(\alpha) = \| y - \Phi(\alpha) \Phi^+(\alpha) y \|_2^2$, it is possible to optimize first with respect to the parameters $\alpha$, and then to obtain, a posteriori, the optimal parameters $\hat{a}$.
The matrix $\Phi^+(\alpha)$ is the Moore-Penrose generalized inverse of $\Phi(\alpha)$, and we develop formulas for its Frechet derivative under the hypothesis that $\Phi(\alpha)$ is of constant (though not necessarily full) rank. From these formulas we readily obtain the derivatives of the orthogonal projectors associated with $\Phi(\alpha)$, and also that of the functional $r_2(\alpha)$. Detailed algorithms are presented which make extensive use of well-known reliable linear least squares techniques, and numerical results and comparisons are given. These results are generalizations of those of H. D. Scolnik [1971]. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/261/CS-TR-72-261.pdf %R CS-TR-72-263 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A procedure for improving the upper bound for the number of n-ominoes. %A Klarner, David A. %A Rivest, Ronald L. %D February 1972 %X An n-omino is a plane figure composed of n unit squares joined together along their edges. Every n-omino is generated by joining the edge of a unit square to the edge of a unit square in some (n-1)-omino so that the new square does not overlap any squares. Let t(n) denote the number of n-ominoes; then it is known that the sequence $({t(n)}^{1/n} : n = 1,2,\ldots)$ increases to a limit $\Theta$, and 3.72 < $\Theta$ < 6.75. A procedure exists for computing an increasing sequence of numbers bounded above by $\Theta$. (Chandra recently showed that the limit of this sequence is $\Theta$.) In the present work we give a procedure for computing a sequence of numbers bounded below by $\Theta$. Whether or not the limit of this sequence is $\Theta$ remains an open question. By computing the first ten terms of our sequence, we have shown that $\Theta$ < 4.65. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/263/CS-TR-72-263.pdf %R CS-TR-72-264 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An artificial intelligence approach to machine translation. %A Wilks, Yorick A. %D February 1972 %X The paper describes a system of semantic analysis and generation, programmed in LISP 1.5 and designed to pass from paragraph length input in English to French via an interlingual representation. A wide class of English input forms will be covered, but the vocabulary will initially be restricted to one of a few hundred words. With this subset working, and during the current year (71-72), it is also hoped to map the interlingual representation onto some predicate calculus notation so as to make possible the answering of very simple questions about the translated matter. The specification of the translation system itself is complete, and its main points of interest that distinguish it from other systems are: i) It translates phrase by phrase -- with facilities for reordering phrases and establishing essential semantic connectivities between them -- by mapping complex semantic structures of "message" onto each phrase. These constitute the interlingual representation to be translated. This matching is done without the explicit use of a conventional syntax analysis, by taking as the appropriate matched structure the "most dense" of the alternative structures derived. This method has been found highly successful in earlier versions of this analysis system. ii) The French output strings are generated without the explicit use of a generative grammar.
That is done by means of STEREOTYPES: strings of French words, and functions evaluating to French words, which are attached to English word senses in the dictionary and built into the interlingual representation by the analysis routines. The generation program thus receives an interlingual representation that already contains both French output and implicit procedures for assembling the output, since the stereotypes are in effect recursive procedures specifying the content and production of the output word strings. Thus the generation program at no time consults a word dictionary or inventory of grammar rules. It is claimed that the system of notation and translation described is a convenient one for expressing and handling the items of semantic information that are ESSENTIAL to any effective MT system. I discuss in some detail the semantic information needed to ensure the correct choice of output prepositions in French, a vital matter inadequately treated by virtually all previous formalisms and projects. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/264/CS-TR-72-264.pdf %R CS-TR-72-265 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Primitive concepts underlying verbs of thought. %A Schank, Roger C. %A Goldman, Neil M. %A Rieger, Charles J. %A Riesbeck, Christopher K. %D February 1972 %X In order to create conceptual structures that will uniquely and unambiguously represent the meaning of an utterance, it is necessary to establish 'primitive' underlying actions and states into which verbs can be mapped. This paper presents analyses of the most common mental verbs in terms of such primitive actions and states. In order to represent the way people speak about their mental processes, it was necessary to add to the usual ideas of memory structure the notion of Immediate Memory. It is then argued that there are only three primitive mental ACTs. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/265/CS-TR-72-265.pdf %R CS-TR-72-267 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Mathematical Programming Language: an appraisal based on practical experiments. %A Bonzon, Pierre E. %D March 1972 %X The newly proposed Mathematical Programming Language is approached from the user's point of view. To demonstrate its facility of use, three programs are presented which solve large scale linear programming problems with the generalized upper-bounding structure. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/267/CS-TR-72-267.pdf %R CS-TR-72-268 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Degrees and matchings. %A Chvatal, Vaclav %D March 1972 %X Let n, b, d be positive integers. D. Hanson proposed to evaluate f(n,b,d), the largest possible number of edges in a graph with n vertices having no vertex of degree greater than d and no set of more than b independent edges. Using the alternating path method, he found partial results in this direction. We complete Hanson's work; our proof technique has a linear programming flavor and uses Berge's matching formula. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/268/CS-TR-72-268.pdf %R CS-TR-72-269 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Arithmetic properties of certain recursively defined sets. %A Klarner, David A.
%A Rado, Richard %D March 1972 %X Let R denote a set of linear operations defined on the set P of positive integers; for example, a typical element of R has the form $\rho(x_1, \ldots, x_r) = m_0 + m_1 x_1 + \ldots + m_r x_r$ where $m_0, \ldots, m_r$ denote certain integers. Given a set A of positive integers, there is a smallest set of positive integers, denoted <R : A>, which contains A as a subset and is closed under every operation in R. The set <R : A> can be constructed recursively as follows: Let $A_0$ = A, and define $A_{k+1} = A_k \cup \{\rho (\bar{a}): \rho \in R,\bar{a}\in A_k \times \ldots \times A_k\}$ (k = 0,1,$\ldots$); then it can be shown that <R : A> = $A_0 \cup A_1 \cup \ldots$. The sets sometimes have an elegant form; for example, the set <2x + 3y: 1> consists of all positive numbers congruent to 1 or 5 modulo 12. The objective is to give an arithmetic characterization of the elements of a set <R : A>, and this paper is a report on progress made on this problem last year. Many of the questions left open here have since been resolved by one of us (Klarner). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/269/CS-TR-72-269.pdf %R CS-TR-72-270 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Lanczos algorithm for the symmetric Ax = $\lambda$Bx problem. %A Golub, Gene H. %A Underwood, Richard R. %A Wilkinson, James H. %D March 1972 %X The problem of computing the eigensystem of Ax = $\lambda$Bx when A and B are symmetric and B is positive definite is considered. A generalization of the Lanczos algorithm for reducing the problem to a symmetric tridiagonal eigenproblem is given. A numerically stable variant of the algorithm is described. The new algorithm depends heavily upon the computation of elementary Hermitian matrices. An ALGOL W procedure and a numerical example are also given. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/270/CS-TR-72-270.pdf %R CS-TR-72-272 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Fixpoint approach to the theory of computation. %A Manna, Zohar %A Vuillemin, Jean %D March 1972 %X Following the fixpoint theory of Scott, we propose to define the semantics of computer programs in terms of the least fixpoints of recursive programs. This allows one not only to justify all existing verification techniques, but also to extend them to handle various properties of computer programs, including correctness, termination and equivalence, in a uniform manner. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/272/CS-TR-72-272.pdf %R CS-TR-72-273 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Chromatic automorphisms of graphs. %A Chvatal, Vaclav %A Sichler, Jiri %D March 1972 %X The coloring group and the full automorphism group of an n-chromatic graph are independent if and only if n is an integer $\geq$ 3. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/273/CS-TR-72-273.pdf %R CS-TR-72-274 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Linear combinations of sets of consecutive integers. %A Klarner, David A. %A Rado, Richard %D March 1972 %X Let k-1, $m_1, \ldots, m_k$ denote non-negative integers, and suppose the greatest common divisor of $m_1, \ldots, m_k$ is 1. We show that if $S_1, \ldots, S_k$ are sufficiently long blocks of consecutive integers, then the set $m_1 S_1 + \ldots + m_k S_k$ contains a sizable block of consecutive integers.
For example, if m and n are relatively prime natural numbers, and u, U, v, V are integers with U-u $\geq$ n-1, V-v $\geq$ m-1, then the set m{u,u+1, $\ldots$,U} + n{v,v+1, $\ldots$,V} contains the set {mu + nv - $\sigma$(m,n), $\ldots$, mU + nV - $\sigma$(m,n)} where $\sigma$(m,n) = (m-1)(n-1) is the largest number such that $\sigma$(m,n)-1 cannot be expressed in the form mx + ny with x and y non-negative integers. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/274/CS-TR-72-274.pdf %R CS-TR-72-275 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Sets generated by iteration of a linear operation. %A Klarner, David A. %D March 1972 %X This note is a continuation of the paper "Arithmetic properties of certain recursively defined sets," written in collaboration with Richard Rado. Here the sets under consideration are those having the form S = <$m_1 x_1 + \ldots + m_r x_r : 1$> where $m_1,\ldots ,m_r$ are given natural numbers with greatest common divisor 1. The set S is the smallest set of natural numbers which contains 1 and is closed under the operation $m_1 x_1 + \ldots + m_r x_r$. Also, S can be constructed by iterating the operation $m_1 x_1 + \ldots + m_r x_r$ over the set {1}. For example, <2x + 3y: 1> = {1, 5, 13, 17, 25, $\ldots$} = (1 + 12N) $\cup$ (5 + 12N) where N = {0,1,2,$\ldots$}. It is shown in this note that S contains an infinite arithmetic progression for all natural numbers r-1, $m_1,\ldots ,m_r$. Furthermore, if ($m_1,\ldots ,m_r$) = ($m_1\ldots m_r, m_1 + \ldots + m_r$) = 1, then S is a per-set; that is, S is a finite union of infinite arithmetic progressions. In particular, this implies <mx + ny: 1> is a per-set for all pairs {m,n} of relatively prime natural numbers. It is an open question whether S is a per-set when ($m_1,\ldots ,m_r$) = 1, but ($m_1\ldots m_r, m_1 + \ldots + m_r$) > 1. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/275/CS-TR-72-275.pdf %R CS-TR-72-278 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Use of fast direct methods for the efficient numerical solution of nonseparable elliptic equations. %A Concus, Paul %A Golub, Gene H. %D April 1972 %X We study an iterative technique for the numerical solution of strongly elliptic equations of divergence form in two dimensions with Dirichlet boundary conditions on a rectangle. The technique is based on the repeated solution by a fast direct method of a discrete Helmholtz equation on a uniform rectangular mesh. The problem is suitably scaled before iteration, and Chebyshev acceleration is applied to improve convergence. We show that convergence can be exceedingly rapid and independent of mesh size for smooth coefficients. Extensions to other boundary conditions, other equations, and irregular mesh spacings are discussed, and the performance of the technique is illustrated with numerical examples. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/278/CS-TR-72-278.pdf %R CS-TR-72-279 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Topics in optimization. %A Osborne, Michael R. %D April 1972 %X These notes are based on a course of lectures given at Stanford, and cover three major topics relevant to optimization theory. First an introduction is given to those results in mathematical programming which appear to be most important for the development and analysis of practical algorithms. Next unconstrained optimization problems are considered.
The main emphasis is on that subclass of descent methods which (a) requires the evaluation of first derivatives of the objective function, and (b) has a family connection with the conjugate direction methods. Numerical results obtained using a program based on this material are discussed in an Appendix. In the third section, penalty and barrier function methods for mathematical programming problems are studied in some detail, and possible methods for accelerating their convergence indicated. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/279/CS-TR-72-279.pdf %R CS-TR-72-281 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Computer interactive picture processing. %A Quam, Lynn H. %A Liebes, Sidney, Jr. %A Tucker, Robert B. %A Hannah, Marsha Jo %A Eross, Botond G. %D April 1972 %X This report describes work done in image processing using an interactive computer system. Techniques for image differencing are described and examples using images returned from Mars by the Mariner Nine spacecraft are shown. Also described are techniques for stereo image processing. Stereo processing for both conventional camera systems and the Viking 1975 Lander camera system is reviewed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/281/CS-TR-72-281.pdf %R CS-TR-72-282 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Efficient compilation of linear recursive programs. %A Chandra, Ashok K. %D April 1972 %X We consider the class of linear recursive programs. A linear recursive program is a set of procedures where each procedure can make at most one recursive call. The conventional stack implementation of recursion requires time and space both proportional to n, the depth of recursion. It is shown that in order to implement linear recursion so as to execute in time n one doesn't need space proportional to n: $n^\epsilon$ for arbitrarily small $\epsilon$ will do. It is also known that with constant space one can implement linear recursion in time $n^2$. We show that one can do much better: $n^{1+\epsilon}$ for arbitrarily small $\epsilon$. We also describe an algorithm that lies between these two: it takes time n log(n) and space log(n). It is shown that several problems are closely related to the linear recursion problem, for example, the problem of reversing an input tape given a finite automaton with several one-way heads. By casting all these problems into a canonical form, efficient solutions are obtained simultaneously for all. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/282/CS-TR-72-282.pdf %R CS-TR-72-284 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Edmonds polyhedra and a hierarchy of combinatorial problems. %A Chvatal, Vaclav %D May 1972 %X Let S be a set of linear inequalities that determine a bounded polyhedron P. The closure of S is the smallest set of inequalities that contains S and is closed under two operations: (i) taking linear combinations of inequalities, (ii) replacing an inequality $\sum a_j x_j \leq a_0$, where $a_1, a_2, \ldots, a_n$ are integers, by the inequality $\sum a_j x_j \leq a$ with $a \geq [a_0]$. Obviously, if integers $x_1, x_2, \ldots, x_n$ satisfy all the inequalities in S then they satisfy also all the inequalities in the closure of S. Conversely, let $\sum c_j x_j \leq c_0$ hold for all choices of integers $x_1, x_2, \ldots, x_n$ that satisfy all the inequalities in S.
Then we prove that $\sum c_j x_j \leq c_0$ belongs to the closure of S. To each integer linear programming problem, we assign a nonnegative integer, called its rank. (The rank is the minimum number of iterations of the operation (ii) that are required in order to eliminate the integrality constraint.) We prove that there is no upper bound on the rank of problems arising from the search for largest independent sets in graphs. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/284/CS-TR-72-284.pdf %R CS-TR-72-286 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the solution of Moser's problem in four dimensions, and related issues. A collection of two papers: On the solution of Moser's problem in four dimensions and Independent permutations as related to a problem of Moser and a theorem of Polya. %A Chandra, Ashok K. %D May 1972 %X The problem of finding the largest set of nodes in a d-cube of side 3 such that no three nodes are collinear was proposed by Moser. Small values of d (viz., $d \leq 3$) resulted in elegant symmetric solutions. It is shown that this does not remain the case in 4 dimensions where at most 43 nodes can be chosen, and these must not include the center node. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/286/CS-TR-72-286.pdf %R CS-TR-72-288 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Logic for Computable Functions: description of a machine implementation. %A Milner, Robin %D May 1972 %X This paper is primarily a user's manual for LCF, a proof-checking program for a logic of computable functions proposed by Dana Scott in 1969 but unpublished by him. We use the name LCF also for the logic itself, which is presented at the start of the paper. The proof-checking program is designed to allow the user interactively to generate formal proofs about computable functions and functionals over a variety of domains, including those of interest to the computer scientist - for example, integers, lists and computer programs and their semantics. The user's task is alleviated by two features: a subgoaling facility and a powerful simplification mechanism. Applications include proofs of program correctness and in particular of compiler correctness; these applications are not discussed herein, but are illustrated in the papers referenced in this introduction. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/288/CS-TR-72-288.pdf %R CS-TR-72-289 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Lakoff on linguistics and natural logic. %A Wilks, Yorick A. %D June 1972 %X The paper examines and criticises Lakoff's notions of a natural logic and of a generative semantics described in terms of logic. I argue that the relationship of these notions to logic as normally understood is unclear, but I suggest, in the course of the paper, a number of possible interpretations of his thesis of generative semantics. I argue further that on these interpretations the thesis (of Generative Semantics) is false, unless it be taken as a mere notational variant of Chomskyan theory. I argue, too, that Lakoff's work may provide a service in that it constitutes a reductio ad absurdum of the derivational paradigm of modern linguistics; and shows, inadvertently, that only a system with the ability to reconsider its own inferences can do the job that Lakoff sets up for linguistic enquiry -- that is to say, only an "artificial intelligence" system.
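Operation (ii) in CS-TR-72-284 above is what is now called Chvatal-Gomory rounding, and a textbook one-step instance (not taken from the report) shows how it strengthens the independent-set formulation the abstract closes with. For the triangle graph, the edge inequalities $x_1 + x_2 \leq 1$, $x_1 + x_3 \leq 1$, $x_2 + x_3 \leq 1$ admit the fractional point $x_1 = x_2 = x_3 = 1/2$. Combining them with multipliers 1/2 each gives $x_1 + x_2 + x_3 \leq 3/2$, whose left-hand coefficients are integers, so operation (ii) replaces the right-hand side by $[3/2] = 1$. The resulting inequality $x_1 + x_2 + x_3 \leq 1$ has rank one and already cuts off the fractional point; the paper's theorem is that for largest-independent-set problems in general, no fixed number of such rounds suffices.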
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/289/CS-TR-72-289.pdf %R CS-TR-72-290 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Adverbs and belief. %A Schank, Roger C. %D June 1972 %X The treatment of a certain class of adverbs in conceptual representation is given. Certain adverbs are shown to be representative of complex belief structures. These adverbs serve as pointers that explain where the sentence that they modify belongs in a belief structure. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/290/CS-TR-72-290.pdf %R CS-TR-72-291 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Some combinatorial lemmas. %A Knuth, Donald E. %D June 1972 %X This report consists of several short papers which are completely independent of each other: 1. "Wheels Within Wheels." Every finite strongly connected digraph is either a single point or a set of n smaller strongly connected digraphs joined by an oriented cycle of length n. This result is proved in somewhat stronger form, and two applications are given. 2. "An Experiment in Optimal Sorting." An unsuccessful attempt to sort 13 or 14 elements in fewer comparisons than the Ford-Johnson algorithm is described. (Coauthor: E.B. Kaehler.) 3. "Permutations With Nonnegative Partial Sums." A sequence of s positive and t negative real numbers, whose sum is zero, can be arranged in at least (s+t-1)! and at most (s+t)!/(max(s,t)+1) < 2(s+t-1)! ways such that the partial sums $x_1 + \ldots + x_j$ are nonnegative for $1 \leq j \leq s+t$. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/291/CS-TR-72-291.pdf %R CS-TR-72-292 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Selected combinatorial research problems. %A Chvatal, Vaclav %A Klarner, David A. %A Knuth, Donald E. %D June 1972 %X Thirty-seven research problems are described, covering a wide range of combinatorial topics. Unlike Hilbert's problems, most of these are not especially famous and they might be "do-able" in the next few years. (Problems 1-16 were contributed by Klarner, 17-26 by Chvatal, 27-37 by Knuth. All cash awards are Chvatal's responsibility.) %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/292/CS-TR-72-292.pdf %R CS-TR-72-299 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Semantic categories of nominals for conceptual dependency analysis of natural language. %A Russell, Sylvia Weber %D July 1972 %X A system for the semantic categorization of conceptual objects (nominals) is provided. The system is intended to aid computer understanding of natural language. Specific implementations for "noun-pairs" and prepositional phrases are offered. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/299/CS-TR-72-299.pdf %R CS-TR-72-300 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Counterexample to a conjecture of Fujii, Kasami and Ninomiya. %A Kaufman, Marc T. %D June 1972 %X In a recent paper [1], Fujii, Kasami and Ninomiya presented a procedure for the optimal scheduling of a system of unit length tasks represented as a directed acyclic graph on two identical processors. The authors conjecture that the algorithm can be extended to the case where more than two processors are employed. This note presents a counterexample to that conjecture. [1] Fujii, M., T. Kasami and K. Ninomiya, "Optimal Sequencing of Two Equivalent Processors," SIAM J. Appl. Math., Vol.
17, No. 4, July 1969, pp. 784-789. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/300/CS-TR-72-300.pdf %R CS-TR-72-301 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Product form of the Cholesky factorization for large-scale linear programming. %A Saunders, Michael A. %D August 1972 %X A variation of Gill and Murray's version of the revised simplex algorithm is proposed, using the Cholesky factorization ${BB}^T = {LDL}^T$ where B is the usual basis, D is diagonal and L is unit lower triangular. It is shown that during change of basis L may be updated in product form. As with standard methods using the product form of inverse, this allows use of sequential storage devices for accumulating updates to L. In addition, the favorable numerical properties of Gill and Murray's algorithm are retained. Close attention is given to efficient out-of-core implementation. In the case of large-scale block-angular problems, the updates to L will remain very sparse for all iterations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/301/CS-TR-72-301.pdf %R CS-TR-72-304 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Richardson's non-stationary matrix iterative procedure. %A Anderssen, Robert S. %A Golub, Gene H. %D August 1972 %X Because of its simplicity, Richardson's non-stationary iterative scheme is a potentially powerful method for the solution of (linear) operator equations. However, its general application has more or less been blocked by (a) the problem of constructing polynomials, which deviate least from zero on the spectrum of the given operator, and which are required for the determination of the iteration parameters of the non-stationary method, and (b) the instability of this scheme with respect to rounding error effects. Recently, these difficulties were examined in two Russian papers. In the first, Lebedev [1969] constructed polynomials which deviate least from zero on a set of subintervals of the real axis which contains the spectrum of the given operator. In the second, Lebedev and Finogenov [1971] gave an ordering for the iteration parameters of the non-stationary Richardson scheme which makes it a stable numerical process. Translations of these two papers appear as Appendices 1 and 2, respectively, in this report. The body of the report represents an examination of the properties of Richardson's non-stationary scheme and the pertinence of the two mentioned papers along with the results of numerical experimentation testing the actual implementation of the procedures given in them. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/304/CS-TR-72-304.pdf %R CS-TR-72-306 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A bibliography on computer graphics. %A Pollack, Bary W. %D August 1972 %X This bibliography includes the most important works describing the software aspects of generative computer graphics. As such it will be of most usefulness to researchers, system designers and programmers whose interests and responsibilities include the development of software systems for interactive graphical input/output. The bibliography does include a short section on hardware systems. Image analysis, pattern recognition and picture processing and related fields are rather poorly represented here.
The interested researcher is referred to journals in this field and to the reports of Azriel Rosenfeld, University of Maryland, which include excellent bibliographic references. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/306/CS-TR-72-306.pdf %R CS-TR-72-307 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Hadamard transform for speech wave analysis. %A Tanaka, Hozumi %D August 1972 %X Two methods of speech wave analysis using the Hadamard transform are discussed. The first method is a direct application of the Hadamard transform for speech waves. The reason this method yields poor results is discussed. The second method is the application of the Hadamard transform to a log-magnitude frequency spectrum. After the application of the Fourier transform the Hadamard transform is applied to detect a pitch period or to get a smoothed spectrum. This method shows some positive aspects of the Hadamard transform for the analysis of a speech wave with regard to the reduction of processing time required for smoothing, but at the cost of precision. A formant tracking program for voiced speech is implemented by using this method and an edge-following technique used in scene analysis. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/307/CS-TR-72-307.pdf %R CS-TR-72-308 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Recent developments in SAIL, an ALGOL-based language for artificial intelligence. %A Feldman, Jerome A. %A Low, James R. %A Swinehart, Daniel C. %A Taylor, Russell H. %D November 1972 %X New features added to SAIL, an ALGOL-based language for the PDP-10, are discussed. The features include: procedure variables; multiple processes; coroutines; a limited form of backtracking; an event mechanism for inter-process communication; and matching procedures, a new way of searching the LEAP associative data base. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/308/CS-TR-72-308.pdf %R CS-TR-72-310 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Anomalies in scheduling unit-time tasks. %A Kaufman, Marc T. %D June 1972 %X In this paper we examine the problem of scheduling a set of tasks on a system with a number of identical processors. Several timing anomalies are known to exist for the general case, in which the execution time can increase when inter-task constraints are removed or processors are added. It is shown that these anomalies also exist when tasks are restricted to be of equal (unit) length. Several increasingly restrictive heuristic scheduling algorithms are reviewed. The "added processor" anomaly is shown to persist through all of them, though in successively weaker form. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/310/CS-TR-72-310.pdf %R CS-TR-72-317 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An analysis of drum storage units. %A Fuller, Samuel H. %A Baskett, Forest %D August 1972 %X This article discusses the modeling and analysis of drum-like storage units. Two common forms of drum organizations and two common scheduling disciplines are considered: the file drum and the paging drum; first-in-first-out (FIFO) scheduling and shortest-latency-time-first (SLTF) scheduling. The modeling of the I/O requests to the drum is an important aspect of this analysis.
Measurements are presented to indicate that it is realistic to model requests for records, or blocks of information to a file drum, as requests that have starting addresses uniformly distributed around the circumference of the drum and transfer times that are exponentially distributed with a mean of 1/2 to 1/3 of a drum revolution. The arrival of I/O requests is first assumed to be a Poisson process and then generalized to the case of a computer system with a finite degree of multiprogramming. An exact analysis of all the models except the SLTF file drum is presented; in this case the complexity of the drum organization has forced us to accept an approximate analysis. In order to examine the error introduced into the analysis of the SLTF file drum by our approximations, the results of the analytic models are compared to a simulation model of the SLTF file drum. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/317/CS-TR-72-317.pdf %R CS-TR-72-318 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Constructive graph labeling using double cosets. %A Brown, Harold %A Masinter, Larry M. %A Hjelmeland, Larry %D October 1972 %X Two efficient computer-implemented algorithms are presented for explicitly constructing all distinct labelings of a graph G with a set of (not necessarily distinct) labels L, given the symmetry group B of G. Two recursive reductions of the problem and a precomputation involving certain orbits of stabilizer subgroups are the techniques used by the algorithms. Moreover, for each labeling, the subgroup of B which preserves that labeling is calculated. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/318/CS-TR-72-318.pdf %R CS-TR-72-319 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On a characterization of the best $\ell_2$ scaling of a matrix. %A Golub, Gene H. %A Varah, James M. %D October 1972 %X This paper is concerned with best two-sided scaling of a general square matrix, and in particular with a certain characterization of that best scaling: namely that the first and last singular vectors (on left and right) of the scaled matrix have components of equal modulus. Necessity, sufficiency, and its relation with other characterizations are discussed. Then the problem of best scaling for rectangular matrices is introduced and a conjecture made regarding a possible best scaling. The conjecture is verified for some special cases. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/319/CS-TR-72-319.pdf %R CS-TR-72-320 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Winged edge polyhedron representation. %A Baumgart, Bruce G. %D October 1972 %X A winged edge polyhedron representation is stated and a set of primitives that preserve Euler's $F - E + V = 2$ equation is explained. Present use of this representation in artificial intelligence for computer graphics and world modeling is illustrated and its intended future application to computer vision is described. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/320/CS-TR-72-320.pdf %R CS-TR-72-322 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Methods for modifying matrix factorizations. %A Gill, Philip E. %A Golub, Gene H. %A Murray, Walter A. %A Saunders, Michael A. %D November 1972 %X In recent years several algorithms have appeared for modifying the factors of a matrix following a rank-one change.
These methods have always been given in the context of specific applications and this has probably inhibited their use over a wider field. In this report several methods are described for modifying Cholesky factors. Some of these have been published previously while others appear for the first time. In addition, a new algorithm is presented for modifying the complete orthogonal factorization of a general matrix, from which the conventional QR factors are obtained as a special case. A uniform notation has been used and emphasis has been placed on illustrating the similarity between different methods. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/322/CS-TR-72-322.pdf %R CS-TR-72-323 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A fast method for solving a class of tri-diagonal linear systems. %A Malcolm, Michael A. %A Palmer, John %D November 1972 %X The solution of linear systems having real, symmetric, diagonally dominant, tridiagonal coefficient matrices with constant diagonals is considered. It is proved that the diagonals of the LU decomposition of the coefficient matrix rapidly converge to full floating-point precision. It is also proved that the computed LU decomposition converges when floating-point arithmetic is used and that the limits of the LU diagonals using floating point are roughly within machine precision of the limits using real arithmetic. This fact is exploited to reduce the number of floating-point operations required to solve a linear system from 8n-7 to 5n+2k-3, where k is much less than n, the order of the matrix. If the elements of the sub- and superdiagonals are 1, then only 4n+2k-3 operations are needed. The entire LU decomposition takes k words of storage, and considerable savings in array subscripting are achieved. Upper and lower bounds on k are obtained in terms of the ratio of the coefficient matrix diagonal constants and parameters of the floating-point number system. Various generalizations of these results are discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/323/CS-TR-72-323.pdf %R CS-TR-72-325 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Review of Hubert Dreyfus' What Computers Can't Do: a Critique of Artificial Reason (Harper & Row, New York, 1972). %A Buchanan, Bruce G. %D November 1972 %X The recent book $\underline{What Computers Can't Do}$ by Hubert Dreyfus is an attack on artificial intelligence research. This review takes the position that the philosophical content of the book is interesting, but that the attack on artificial intelligence is not well reasoned. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/325/CS-TR-72-325.pdf %R CS-TR-72-326 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Can expert judges, using transcripts of teletyped psychiatric interviews, distinguish human paranoid patients from a computer simulation of paranoid processes? %A Colby, Kenneth Mark %A Hilf, Franklin Dennis %D December 1972 %X Expert judges (psychiatrists and computer scientists) could not correctly distinguish a simulation model of paranoid processes from actual paranoid patients. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/326/CS-TR-72-326.pdf %R CS-TR-72-328 %Z Mon, 16 Oct 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An efficient implementation of Edmonds' maximum matching algorithm. %A Gabow, Harold N. 
%D June 1972 %X A matching in a graph is a collection of edges, no two of which share a vertex. A maximum matching contains the greatest number of edges possible. This paper presents an efficient implementation of Edmonds' algorithm for finding maximum matchings. The computation time is proportional to $V^3$, where V is the number of vertices; previous algorithms have computation time proportional to $V^4$. The implementation avoids Edmonds' blossom reduction by using pointers to encode the structure of alternating paths. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/72/328/CS-TR-72-328.pdf %R CS-TR-71-188 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The translation of 'go to' programs to 'while' programs %A Ashcroft, Edward A. %A Manna, Zohar %D January 1971 %X In this paper we show that every flowchart program can be written without $\underline{go to}$ statements by using $\underline{while}$ statements. The main idea is to introduce new variables to preserve the values of certain variables at particular points in the program; or alternatively, to introduce special Boolean variables to keep information about the course of the computation. The 'while' programs produced yield the same final results as the original flowchart program but need not perform computations in exactly the same way. However, the new programs do preserve the 'topology' of the original flowchart program, and are of the same order of efficiency. We also show that this cannot be done in general without adding variables. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/188/CS-TR-71-188.pdf %R CS-TR-71-189 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Mathematical theory of partial correctness %A Manna, Zohar %D January 1971 %X In this work we show that it is possible to express most properties regularly observed in algorithms in terms of 'partial correctness' (i.e., the property that the final results of the algorithm, if any, satisfy some given input-output relation). This result is of special interest since 'partial correctness' has already been formulated in predicate calculus and in partial function logic for many classes of algorithms. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/189/CS-TR-71-189.pdf %R CS-TR-71-190 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An n log n algorithm for minimizing states in a finite automaton %A Hopcroft, John E. %D January 1971 %X An algorithm is given for minimizing the number of states in a finite automaton or for determining if two finite automata are equivalent. The asymptotic running time of the algorithm is bounded by k n log n where k is some constant and n is the number of states. The constant k depends linearly on the size of the input alphabet. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/190/CS-TR-71-190.pdf %R CS-TR-71-191 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An introduction to the direct emulation of control structures by a parallel micro-computer %A Lesser, Victor R. %D January 1971 %X This paper is an investigation of the organization of a parallel micro-computer designed to emulate a wide variety of sequential and parallel computers. This micro-computer allows tailoring of its control structure so that it is appropriate for the particular computer to be emulated.
The control structure of this micro-computer is dynamically modified by changing the organization of its data structure for control. The micro-computer contains six primitive operators which dynamically manipulate and generate a tree-type data structure for control. This data structure for control is used as a syntactic framework within which particular implementations of control concepts, such as iteration, recursion, co-routines, parallelism, interrupts, etc., can be easily expressed. The major features of the control data structure and the primitive operators are: (1) once the fixed control and data linkages among processes have been defined, they need not be rebuilt on subsequent executions of the control structure; (2) micro-programs may be written so that they execute independently of the number of physical processors present and still take advantage of available processors; (3) control structures for I/O processes, data-accessing processes, and computational processes are expressed in a single uniform framework. An emulator programmed on this micro-computer works as an iterative two-step process similar to the process of dynamic compilation or run-time macro-expansion. This dynamic compilation approach to emulation differs considerably from the conventional approach to emulation, and provides a unifying approach to the emulation of a wide variety of sequential and parallel computers. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/191/CS-TR-71-191.pdf %R CS-TR-71-192 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An n log n algorithm for isomorphism of planar triply connected graphs %A Hopcroft, John E. %D January 1971 %X It is shown that the isomorphism problem for triply connected planar graphs can be reduced to the problem of minimizing states in a finite automaton. By making use of an n log n algorithm for minimizing the number of states in a finite automaton, an algorithm for determining whether two planar triply connected graphs are isomorphic is developed. The asymptotic running time of the algorithm grows as n log n, where n is the number of vertices in the graph. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/192/CS-TR-71-192.pdf %R CS-TR-71-193 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Intention, memory, and computer understanding %A Schank, Roger C. %D January 1971 %X Procedures are described for discovering the intention of a speaker by relating the Conceptual Dependency representation of the speaker's utterance to the computer's world model such that simple implications can be made. These procedures function at levels higher than that of the sentence by allowing for predictions based on context and the structure of the memory. Computer understanding of natural language is shown to consist of the following parts: assigning a conceptual representation to an input; relating that representation to the memory so as to extract the intention of the speaker; and selecting the correct response type triggered by such an utterance according to the situation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/193/CS-TR-71-193.pdf %R CS-TR-71-195 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The direct solution of the discrete Poisson equation on irregular regions %A Buzbee, B. L. %A Dorr, Fred W. %A George, John Alan %A Golub, Gene H.
%D December 1970 %X There are several very fast direct methods which can be used to solve the discrete Poisson equation on rectangular domains. We show that these methods can also be used to treat problems on irregular regions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/195/CS-TR-71-195.pdf %R CS-TR-71-197 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T MIX/360 user's guide %A Knuth, Donald E. %A Sites, Richard L. %D March 1971 %X MIX/360 is an assembler and simulator for the hypothetical MIX machine, which is described for example in Knuth's $\underline{The Art of Computer Programming}$, Section 1.3.1. The system contains several debugging aids to help program construction and verification. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/197/CS-TR-71-197.pdf %R CS-TR-71-201 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Planarity testing in V log V steps: extended abstract %A Hopcroft, John E. %A Tarjan, Robert Endre %D February 1971 %X An efficient algorithm is presented for determining whether or not a given graph is planar. If V is the number of vertices in the graph, the algorithm requires time proportional to V log V and space proportional to V when run on a random-access computer. The algorithm constructs the facial boundaries of a planar representation without backup, using extensive list-processing features to speed computation. The theoretical time bound improves on that of previously published algorithms. Experimental evidence indicates that graphs with a few thousand edges can be tested within seconds. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/201/CS-TR-71-201.pdf %R CS-TR-71-202 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Communicating semaphores %A Saal, Harry J. %A Riddle, William E. %D February 1971 %X This paper describes two extensions to the semaphore operators originally introduced by Dijkstra. These extensions can be used to reduce: 1) the number of semaphore references; 2) the time spent in critical sections; and 3) the number of distinct semaphores required for proper synchronization without greatly increasing the time required for semaphore operations. Communicating semaphores may be utilized not only for synchronization but also for message switching, resource allocation from pools and as general queueing mechanisms. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/202/CS-TR-71-202.pdf %R CS-TR-71-203 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Heuristic DENDRAL program for explaining empirical data %A Buchanan, Bruce G. %A Lederberg, Joshua %D February 1971 %X The Heuristic DENDRAL program uses an information processing model of scientific reasoning to explain experimental data in organic chemistry. This report summarizes the organization and results of the program for computer scientists. The program is divided into three main parts: planning, structure generation, and evaluation. The planning phase infers constraints on the search space from the empirical data input to the system. The structure generation phase searches a tree whose termini are models of chemical molecules using pruning heuristics of various kinds. The evaluation phase tests the candidate structures against the original data. Results of the program's analyses of some test data are discussed. 
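The plan-generate-test organization described in the Heuristic DENDRAL abstract above can be sketched in a few lines. The following Python fragment is a toy illustration of that three-phase control flow only, not DENDRAL itself; the candidate "structures" (orderings of atoms), the single constraint inferred by the planner, and the scoring function are all invented stand-ins.

    from itertools import permutations

    # Toy plan / generate / test cycle in the style described above.
    def plan(data):
        # Planning: infer a constraint on the search space from the data.
        return {"first_atom": data["heaviest_atom"]}

    def generate(atoms, constraints):
        # Generation: enumerate candidate structures, pruning by the plan.
        for cand in permutations(atoms):
            if cand[0] == constraints["first_atom"]:
                yield cand

    def evaluate(cand, data):
        # Evaluation: test a candidate against the original data (toy score).
        return sum(1 for atom in cand if atom in data["observed_fragments"])

    data = {"heaviest_atom": "C", "observed_fragments": {"C", "O"}}
    print(max(generate(["C", "H", "O"], plan(data)),
              key=lambda c: evaluate(c, data)))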
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/203/CS-TR-71-203.pdf %R CS-TR-71-204 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T FETE: a Fortran execution time estimator %A Ingalls, Daniel H. H. %D February 1971 %X If you want to live cheaply, you must make a list of how much money is spent on each thing every day. This enumeration will quickly reveal the principal areas of waste. The same method works for saving computer time. Originally, one had to put his own timers and counters into a program to determine the distribution of time spent in each part. Recently, several automated systems have appeared which either insert counters automatically or interrupt the program during its execution to produce the tallies. FETE is a system of the former type which has two outstanding characteristics: it is very easy to implement and it is very easy to use. By demonstrating such convenience, it should establish execution timing as a standard tool in program development. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/204/CS-TR-71-204.pdf %R CS-TR-71-205 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An algebraic definition of simulation between programs %A Milner, Robin %D February 1971 %X A simulation relation between programs is defined which is a quasi-ordering. Mutual simulation is then an equivalence relation, and by dividing out by it we abstract from a program such details as how the sequencing is controlled and how data is represented. The equivalence classes are approximations to the algorithms which are realized, or expressed, by their member programs. A technique is given and illustrated for proving simulation and equivalence of programs; there is an analogy with Floyd's technique for proving correctness of programs. Finally, necessary and sufficient conditions for simulation are given. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/205/CS-TR-71-205.pdf %R CS-TR-71-207 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Efficient algorithms for graph manipulation %A Hopcroft, John E. %A Tarjan, Robert Endre %D March 1971 %X Efficient algorithms are presented for partitioning a graph into connected components, biconnected components and simple paths. The algorithm for partitioning a graph into simple paths is iterative and each iteration produces a new path between two vertices already on paths. (The start vertex can be specified dynamically.) If V is the number of vertices and E is the number of edges, each algorithm requires time and space proportional to max(V,E) when executed on a random-access computer. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/207/CS-TR-71-207.pdf %R CS-TR-71-209 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Project technical report %A McCarthy, John %A Samuel, Arthur L. %A Feigenbaum, Edward A. %A Lederberg, Joshua %D March 1971 %X An overview is presented of current research at Stanford in artificial intelligence and heuristic programming. This report is largely the text of a proposal to the Advanced Research Projects Agency for fiscal years 1972-73. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/209/CS-TR-71-209.pdf %R CS-TR-71-210 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T ACCESS: a program for the catalog and access of information %A Purdy, J.
Gerry %D March 1971 %X ACCESS is a program for the catalog and access of information. The program is primarily designed for and intended to handle a personal library, although larger applications are possible. ACCESS produces a listing of all entries by locator code (so one knows where to find the entry in his library), a listing of entry titles by user-specified category codes, and a keyword-in-context (KWIC) listing (each keyword specified by the user). ACCESS is presently programmed in FORTRAN and operates on any IBM System/360 under OS (it uses the IBM SORT/MERGE package). It is anticipated that a machine-language version (soon to be implemented) will greatly decrease the running time of the program. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/210/CS-TR-71-210.pdf %R CS-TR-71-211 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Algorithms to reveal properties of floating-point arithmetic %A Malcolm, Michael A. %D March 1971 %X Two algorithms are presented in the form of Fortran subroutines. Each subroutine computes the radix and number of digits of the floating-point numbers and whether rounding or chopping is done by the machine on which it is run. The methods are shown to work on any "reasonable" floating-point computer. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/211/CS-TR-71-211.pdf %R CS-TR-71-212 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Time and memory requirements for solving linear systems %A Morgana, Maria Aurora %D March 1971 %X The Computer Science Department program library contains a number of ALGOL W procedures and FORTRAN subroutines which can be used to solve systems of linear equations. This report describes the results of tests to determine the amount of time and memory required to solve systems of various orders. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/212/CS-TR-71-212.pdf %R CS-TR-71-213 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The switchyard problem: sorting using networks of queues and stacks %A Tarjan, Robert Endre %D April 1971 %X The problem of sorting a sequence of numbers using a network of queues and stacks is presented. A characterization of sequences sortable using parallel queues is given, along with partial characterizations of sequences sortable using parallel stacks and networks of queues. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/213/CS-TR-71-213.pdf %R CS-TR-71-215 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T PL360 (revised): a programming language for the IBM 360 %A Malcolm, Michael A. %D May 1972 %X In 1968, N. Wirth (Jan. JACM) published a formal description of PL360, a programming language designed specifically for the IBM 360. PL360 has an appearance similar to that of Algol, but it provides the facilities of a symbolic machine language. Since 1968, numerous extensions and modifications have been made to the PL360 compiler which was originally designed and implemented by N. Wirth and J. Wells. Interface and input-output subroutines have been written which allow the use of PL360 under OS, DOS, MTS and Orvyl. A formal description of PL360 as it is presently implemented is given. The description of the language is followed by sections on the use of PL360 under various operating systems, namely OS, DOS and MTS.
Instructions on how to use the PL360 compiler and PL360 programs in an interactive mode under the Orvyl time-sharing monitor are also included. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/215/CS-TR-71-215.pdf %R CS-TR-71-217 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Decidable properties of monadic functional schemas %A Ashcroft, Edward A. %A Manna, Zohar %A Pnueli, Amir %D July 1971 %X We define a class of (monadic) functional schemas which properly includes 'Ianov' flowchart schemas. We show that the termination, divergence and freedom problems for functional schemas are decidable. Although it is possible to translate a large class of non-free functional schemas into equivalent free functional schemas, we show that this cannot be done in general. We show also that the equivalence problem for free functional schemas is decidable. Most of the results are obtained from well-known results in Formal Languages and Automata Theory. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/217/CS-TR-71-217.pdf %R CS-TR-71-221 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A heuristic programming study of theory formation in science %A Buchanan, Bruce G. %A Feigenbaum, Edward A. %A Lederberg, Joshua %D July 1971 %X The Meta-DENDRAL program is a vehicle for studying problems of theory formation in science. The general strategy of Meta-DENDRAL is to reason from data to plausible generalizations and then to organize the generalizations into a unified theory. Three main subproblems are discussed: (1) explain the experimental data for each individual chemical structure, (2) generalize the results from each structure to all structures, and (3) organize the generalizations into a unified theory. The program is built upon the concepts and programmed routines already available in the Heuristic DENDRAL performance program, but goes beyond the performance program in attempting to formulate the theory which the performance program will use. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/221/CS-TR-71-221.pdf %R CS-TR-71-224 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Parallel programming %A Ershov, Andrei P. %D July 1971 %X This report is based on lectures given at Stanford University by Dr. Ershov in November, 1970. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/224/CS-TR-71-224.pdf %R CS-TR-71-225 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Numerical methods for computing angles between linear subspaces %A Bjoerck, Ake %A Golub, Gene H. %D July 1971 %X Assume that two subspaces F and G of unitary space are defined as the ranges (or nullspaces) of given rectangular matrices A and B. Accurate numerical methods are developed for computing the principal angles $\theta_k(F,G)$ and orthogonal sets of principal vectors $u_k \in F$ and $v_k \in G$, $k = 1, 2, \ldots, q$, where $q = \dim(G) \leq \dim(F)$. An important application in statistics is computing the canonical correlations $\sigma_k = \cos \theta_k$ between two sets of variates. A perturbation analysis shows that the condition number for $\theta_k$ is essentially $\max(\kappa(A), \kappa(B))$, where $\kappa$ denotes the condition number of a matrix. The algorithms are based on a preliminary QR-factorization of A and B (or $A^H$ and $B^H$), for which either the method of Householder transformations (HT) or the modified Gram-Schmidt method (MGS) is used.
Then $\cos \theta_k$ and $\sin \theta_k$ are computed as the singular values of certain related matrices. Experimental results are given, which indicate that MGS gives $\theta_k$ with equal precision and fewer arithmetic operations than HT. However, HT gives principal vectors that are orthogonal to working accuracy, which is not in general true for MGS. Finally, the case when A and/or B are rank deficient is discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/225/CS-TR-71-225.pdf %R CS-TR-71-226 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T SIMPLE: a simple precedence translator writing system %A George, James E. %D July 1971 %X SIMPLE is a translator writing system composed of a simple precedence syntax analyzer and a semantic constructor and is implemented in PL/I. It provides an error diagnostic and recovery mechanism for any system implemented using SIMPLE. The removal of precedence conflicts is discussed in detail with several examples. The utilization of SIMPLE is illustrated by defining a command-language meta-system for the construction of scanners for a wide variety of command-oriented languages. This meta-system is illustrated by defining commands from several text editors. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/226/CS-TR-71-226.pdf %R CS-TR-71-228 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Function minimization and automatic therapeutic control %A Kaufman, Linda C. %D July 1971 %X No abstract available. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/228/CS-TR-71-228.pdf %R CS-TR-71-229 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Variational study of nonlinear spline curves %A Lee, Erastus H. %A Forsythe, George E. %D August 1971 %X This is an exposition of the variational and differential properties of nonlinear spline curves, based on the Euler-Bernoulli theory for the bending of thin beams or elastica. For both open and closed splines through prescribed nodal points in the Euclidean plane, various types of nodal constraints are considered, and the corresponding algebraic and differential equations relating curvature, angle, arc length, and tangential force are derived in a simple manner. The results for closed splines are apparently new, and they cannot be derived by the consideration of a constrained conservative system. There is a survey of the scanty recent literature. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/229/CS-TR-71-229.pdf %R CS-TR-71-230 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T ALGOL W reference manual %A Sites, Richard L. %D February 1972 %X "A Contribution to the Development of ALGOL" by Niklaus Wirth and C. A. R. Hoare was the basis for a compiler developed for the IBM 360 at Stanford University. This report is a description of the implemented language, ALGOL W. Historical background and the goals of the language may be found in the Wirth and Hoare paper. This manual refers to the version of the Algol W compiler dated 16 January 1972. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/230/CS-TR-71-230.pdf %R CS-TR-71-234 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Some modified eigenvalue problems %A Golub, Gene H. %D August 1971 %X We consider the numerical calculation of several eigenvalue problems which require some manipulation before the standard algorithms may be used.
This includes finding the stationary values of a quadratic form subject to linear constraints and determining the eigenvalues of a matrix which is modified by a matrix of rank one. We also consider several inverse eigenvalue problems. This includes the problem of computing the Gauss-Radau and Gauss-Lobatto quadrature rules. In addition, we study several eigenvalue problems which arise in least squares. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/234/CS-TR-71-234.pdf %R CS-TR-71-236 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Numerical computations for univariate linear models %A Golub, Gene H. %A Styan, George P. H. %D September 1971 %X We consider the usual univariate linear model E($y$) = $X\gamma$, V($y$) = $\sigma^2 I$. In Part One of this paper $X$ has full column rank. Numerically stable and efficient computational procedures are developed for the least squares estimation of $\gamma$ and the error sum of squares. We employ an orthogonal triangular decomposition of $X$ using Householder transformations. A lower bound for the condition number of $X$ is immediately obtained from this decomposition. Similar computational procedures are presented for the usual F-test of the general linear hypothesis $L'\gamma = 0$; the hypothesis $L'\gamma = m$ is also considered for $m \neq 0$. Updating techniques are given for adding to or removing from $(X, y)$ a row, a set of rows or a column. In Part Two, $X$ has less than full rank. Least squares estimates are obtained using generalized inverses. The function $L'\gamma$ is estimable whenever it admits an unbiased estimator linear in $y$. We show how to computationally verify estimability of $L'\gamma$ and the equivalent testability of $L'\gamma = 0$. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/236/CS-TR-71-236.pdf %R CS-TR-71-237 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A generalization of the divide-sort-merge strategy for sorting networks %A Van Voorhis, David C. %D August 1971 %X With a few notable exceptions the best sorting networks known have employed a "divide-sort-merge" strategy. That is, the N inputs are divided into 2 groups -- normally of size $\lceil \frac{1}{2} N\rceil$ and $\lfloor \frac{1}{2} N\rfloor$ [Here $\lceil x\rceil$ denotes the smallest integer greater than or equal to x, whereas $\lfloor x\rfloor$ denotes the largest integer less than or equal to x] -- that are sorted independently and then "merged" together to form a single sorted sequence. An N-sorter network that uses this strategy consists of 2 smaller sorting networks followed by a merge network. The best merge networks known are also constructed recursively, using 2 smaller merge networks followed by a simple arrangement of $\lceil \frac{1}{2} N\rceil - 1$ comparators. We consider a generalization of the divide-sort-merge strategy in which the N inputs are divided into g $\geq$ 2 disjoint groups that are sorted independently and then merged together.
The merge network that combines these g sorted groups uses d $\geq$ 2 smaller merge networks as an initial subnetwork. The two parameters g and d together define what we call a "[g,d]" strategy. A [g,d] N-sorter network consists of g smaller sorting networks followed by a [g,d] merge network. The initial portion of the [g,d] merge network consists of d smaller merge networks; the final portion, which we call the "f-network," includes whatever additional comparators are required to complete the merge. When g = d = 2, the f-network is a simple arrangement of $\lceil \frac{1}{2} N\rceil - 1$ comparators; however, for larger g, d the structure of the [g,d] f-network becomes increasingly complicated. In this paper we describe how to construct [g,d] f-networks for arbitrary g, d. For N > 8 the resulting [g,d] N-sorter networks are more economical than any previous networks that use the divide-sort-merge strategy; for N > 34 the resulting networks are more economical than previous networks of any construction. The [4,4] N-sorter network described in this paper requires $\frac{1}{4} N(\log_2 N)^2 - \frac{1}{3} N(\log_2 N) + O(N)$ comparators, which represents an asymptotic improvement of $\frac{1}{12} N(\log_2 N)$ comparators over the best previous N-sorter. We indicate that special constructions (not described in this paper) have been found for [$2^r, 2^r$] f-networks, which lead to an N-sorter network that requires only $.25 N(\log_2 N)^2 - .372 N(\log_2 N) + O(N)$ comparators. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/237/CS-TR-71-237.pdf %R CS-TR-71-238 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A lower bound for sorting networks that use the divide-sort-merge strategy %A Van Voorhis, David C. %D August 1971 %X Let $M_g (g^{k+1})$ represent the minimum number of comparators required by a network that merges g sorted multisets containing $g^k$ members each. In this paper we prove that $M_g(g^{k+1}) \geq g M_g(g^k) + g^{k-1} \sum_{\ell=2}^{g} \lfloor (\ell-1)g/\ell \rfloor$. From this relation we are able to show that an N-sorter network which uses the g-way divide-sort-merge strategy must contain at least order $N(\log_2 N)^2$ comparators. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/238/CS-TR-71-238.pdf %R CS-TR-71-239 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Large [g,d] sorting networks %A Van Voorhis, David C. %D August 1971 %X With only a few exceptions the minimum-comparator N-sorter networks employ the generalized "divide-sort-merge" strategy. That is, the N inputs are divided among g $\geq$ 2 smaller sorting networks -- of size $N_1,N_2,...,N_g$, where $N = \sum_{k=1}^{g} N_k$ -- that comprise the initial portion of the N-sorter network. The remainder of the N-sorter is a comparator network that merges the outputs of the $N_1-, N_2-, ...,$ and $N_g$-sorter networks into a single sorted sequence. The most economical merge networks yet designed, known as the "[g,d]" merge networks, consist of d smaller merge networks -- where d is a common divisor of $N_1,N_2,...,N_g$ -- followed by a special comparator network labeled a "[g,d] f-network." In this paper we describe special constructions for $[2^r,2^r]$ f-networks, r > 1, which enable us to reduce the number of comparators required by a large N-sorter network from $.25N(\log_2 N)^2 - .25N(\log_2 N) + O(N)$ to $.25N(\log_2 N)^2 - .37N(\log_2 N) + O(N)$.
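The g = d = 2 case of the divide-sort-merge strategy discussed in the three Van Voorhis abstracts above is realized by Batcher's odd-even construction: sort the two halves, merge their even and odd subsequences with two half-size merges, and finish with the final rank of comparators. The Python sketch below (written for this note, with N restricted to a power of two; it is Batcher's network, not Van Voorhis's [g,d] construction) generates the comparator list, checks that it sorts, and tabulates its size against the $.25N(\log_2 N)^2$ leading term.

    import random
    from math import log2

    def oddeven_merge(lo, hi, r):
        # Batcher odd-even merge on index range [lo, hi] (inclusive),
        # comparing elements r apart; yields comparator pairs (i, j).
        step = r * 2
        if step < hi - lo:
            yield from oddeven_merge(lo, hi, step)        # even subsequence
            yield from oddeven_merge(lo + r, hi, step)    # odd subsequence
            for i in range(lo + r, hi - r, step):         # final clean-up rank
                yield (i, i + r)
        else:
            yield (lo, lo + r)

    def oddeven_mergesort(lo, hi):
        # Divide-sort-merge with g = d = 2: sort both halves, then merge.
        if hi - lo >= 1:
            mid = lo + (hi - lo) // 2
            yield from oddeven_mergesort(lo, mid)
            yield from oddeven_mergesort(mid + 1, hi)
            yield from oddeven_merge(lo, hi, 1)

    def apply_network(network, xs):
        # Run the comparator list over a concrete input sequence.
        xs = list(xs)
        for i, j in network:
            if xs[i] > xs[j]:
                xs[i], xs[j] = xs[j], xs[i]
        return xs

    for k in range(2, 9):
        n = 2 ** k
        net = list(oddeven_mergesort(0, n - 1))
        xs = [random.random() for _ in range(n)]
        assert apply_network(net, xs) == sorted(xs)
        print(n, len(net), round(0.25 * n * log2(n) ** 2))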
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/239/CS-TR-71-239.pdf %R CS-TR-71-240 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Correctness of two compilers for a Lisp subset %A London, Ralph L. %D October 1971 %X Using mainly structural induction, proofs of correctness of each of two running Lisp compilers for the PDP-10 computer are given. Included are the rationale for presenting these proofs, a discussion of the proofs, and the changes needed to the second compiler to complete its proof. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/240/CS-TR-71-240.pdf %R CS-TR-71-242 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The frame problem and related problems in artificial intelligence %A Hayes, Patrick J. %D November 1971 %X The frame problem arises in considering the logical structure of a robot's beliefs. It has been known for some years, but only recently has much progress been made. The problem is described and discussed. Various suggested methods for its solution are outlined, and described in a uniform notation. Finally, brief consideration is given to the problem of adjusting a belief system in the face of evidence which contradicts beliefs. It is shown that a variation on the situation notation of McCarthy and Hayes (1969) permits an elegant approach, and relates this problem to the frame problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/242/CS-TR-71-242.pdf %R CS-TR-71-246 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A resemblance test for the validation of a computer simulation of paranoid processes %A Colby, Kenneth Mark %A Hilf, Franklin Dennis %A Weber, Sylvia %A Kraemer, Helena C. %D November 1971 %X A computer simulation of paranoid processes in the form of a dialogue algorithm was subjected to a validation study using an experimental resemblance test in which judges rated degrees of paranoia present in initial psychiatric interviews of both paranoid patients and of versions of the paranoid model. The statistical results indicate a satisfactory degree of resemblance between the two groups of interviews. It is concluded that the model provides a successful simulation of naturally occurring paranoid processes. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/246/CS-TR-71-246.pdf %R CS-TR-71-247 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T One small head -- some remarks on the use of 'model' in linguistics %A Wilks, Yorick A. %D December 1971 %X I argue that the present situation in formal linguistics, where much new work is presented as being a "model of the brain", or of "human language behavior", is an undesirable one. My reason for this judgement is not the conservative (Braithwaitian) one that the entities in question are not really models but theories. It is rather that they are called models because they cannot be theories of the brain at the present stage of brain research, and hence that the use of "model" in this context is not so much aspirational as resigned about our total ignorance of how the brain stores and processes linguistic information. The reason such explanatory entities cannot be theories is that this ignorance precludes any "semantic ascent" up the theory; i.e., interpreting the items of the theory in terms of observables.
And the brain items, whatever they may be, are not, as Chomsky has sometimes claimed, in the same position as the "occult entities" of Physics like Gravitation; for the brain items are not theoretically unreachable, merely unreached. I then examine two possible alternate views of what linguistic theories should be proffered as theories of: theories of sets of sentences, and theories of a particular class of algorithms. I argue for a form of the latter view, and that its acceptance would also have the effect of making Computational Linguistics a central part of Linguistics, rather than the poor relation it is now. I examine a distinction among "linguistic models" proposed recently by Mey, who was also arguing for the self-sufficiency of Computational Linguistics, though as a "theory of performance". I argue that his distinction is a bad one, partly for the reasons developed above and partly because he attempts to tie it to Chomsky's inscrutable competence-performance distinction. I conclude that the independence and self-sufficiency of Computational Linguistics are better supported by the arguments of this paper. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/247/CS-TR-71-247.pdf %R CS-TR-71-249 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An annotated bibliography on the construction of compilers %A Pollack, Bary W. %D December 1971 %X This bibliography is divided into 9 sections: 1. General Information on Compiling Techniques 2. Syntax- and Base-Directed Parsing 3. Parsing in General 4. Resource Allocation 5. Errors - Detection and Correction 6. Compiler Implementation in General 7. Details of Compiler Construction 8. Additional Topics 9. Miscellaneous Related References. Within each section the entries are alphabetical by author. Keywords describing each entry are set off by pound signs (#). Some amount of cross-referencing has been done; e.g., entries which fall into Section 3 as well as Section 7 will generally be found in both sections. However, entries will be found listed only under the principal or first author's name. "Computing Reviews" citations are given following the annotation when available. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/249/CS-TR-71-249.pdf %R CS-TR-71-250 %Z Wed, 01 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Program schemas with equality %A Chandra, Ashok K. %A Manna, Zohar %D December 1971 %X We discuss the class of program schemas augmented with equality tests, that is, tests of equality between terms. In the first part of the paper we discuss and illustrate the "power" of equality tests. It turns out that the class of program schemas with equality is more powerful than the "maximal" classes of schemas suggested by other investigators. In the second part of the paper we discuss the decision problems of program schemas with equality. It is shown for example that while the decision problems normally considered for schemas (such as halting, divergence, equivalence, isomorphism and freedom) are solvable for Ianov schemas, they all become unsolvable if general equality tests are added. We suggest, however, limited equality tests which can be added to certain subclasses of program schemas while preserving their solvable properties.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/71/250/CS-TR-71-250.pdf %R CS-TR-70-146 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Roundoff error analysis of the fast Fourier transform %A Ramos, George U. %D February 1970 %X This paper presents an analysis of roundoff errors occurring in the floating-point computation of the fast Fourier transform. Upper bounds are derived for the ratios of the root-mean-square (RMS) and maximum roundoff errors in the output data to the RMS value of the input data for both single and multidimensional transformations. These bounds are compared experimentally with actual roundoff errors. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/146/CS-TR-70-146.pdf %R CS-TR-70-147 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Pitfalls in computation, or why a math book isn't enough %A Forsythe, George E. %D January 1970 %X The floating-point number system is contrasted with the real numbers. The author then illustrates the variety of computational pitfalls that await a person who merely translates information gained from pure mathematics courses into computer programs. Examples include summing a Taylor series, solving a quadratic equation, solving linear algebraic systems, solving ordinary and partial differential equations, and finding polynomial zeros. It is concluded that mathematics courses should be taught with a greater awareness of automatic computation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/147/CS-TR-70-147.pdf %R CS-TR-70-150 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Elementary proof of the Wielandt-Hoffman Theorem and of its generalization %A Wilkinson, James H. %D January 1970 %X An elementary proof is given of the Wielandt-Hoffman Theorem for normal matrices and of a generalization of this theorem. The proof makes no direct appeal to results from linear-programming theory. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/150/CS-TR-70-150.pdf %R CS-TR-70-151 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T "On the Properties of the Derivatives of the Solutions of Laplace's Equation and the Errors of the Method of Finite Differences for Boundary Values in $C_2$ and $C_{1,1}$" by E. A. Volkov %A Volkov, E. A. %A Forsythe, George E. %D January 1970 %X If a function u is harmonic in a circular disk and its boundary values are twice continuously differentiable, u need not have bounded second derivatives in the open disk. For the Dirichlet problem for Laplace's equation in a more general two-dimensional region the discretization error of the ordinary method of finite differences is studied when Collatz's method of linear interpolation is used at the boundary. If the boundary of the region has a tangent line whose angle satisfies a Lipschitz condition, and if the boundary values have a first derivative satisfying a Lipschitz condition, then the discretization error is shown to be of order $h^2 \ln h^{-1}$. This bound is shown to be sharp. By a different method of interpolation at the boundary one can improve the bound to $o(h^2)$. There are other similar results. Translated by G. E. Forsythe.
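The "ordinary method of finite differences" analyzed in the Volkov paper above replaces Laplace's equation by the five-point average at each interior mesh point. Below is a minimal Python sketch on the unit square, where the boundary falls on mesh points and Collatz's interpolation step is therefore not needed; the boundary data g(x,y) = x^2 - y^2 is a hypothetical choice, harmonic so the exact solution is known and the printed figure is essentially iteration error.

    import numpy as np

    # Five-point finite-difference method for Laplace's equation on the
    # unit square, solved by Jacobi sweeps. The exact solution x^2 - y^2
    # is discretely harmonic, so the converged error is near round-off.
    n = 33                                    # mesh width h = 1/(n-1)
    xs = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    exact = X**2 - Y**2
    u = exact.copy()
    u[1:-1, 1:-1] = 0.0                       # unknown interior values
    for _ in range(4000):                     # each sweep replaces every
        u[1:-1, 1:-1] = 0.25 * (u[2:, 1:-1] + u[:-2, 1:-1]   # interior value
                                + u[1:-1, 2:] + u[1:-1, :-2]) # by the 4-point mean
    print(np.abs(u - exact).max())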
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/151/CS-TR-70-151.pdf %R CS-TR-70-155 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The method of odd/even reduction and factorization with application to Poisson's equation, part II %A Buzbee, B. L. %A Golub, Gene H. %A Nielson, C. W. %D March 1970 %X In this paper, we derive and generalize the methods of Buneman for solving elliptic partial difference equations in a rectangular region. We show why the Buneman methods lead to numerically accurate solutions whereas the CORF algorithm may be numerically unstable. Several numerical examples are given and discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/155/CS-TR-70-155.pdf %R CS-TR-70-156 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On a model for computing round-off error of a sum %A Dantzig, George B. %D March 1970 %X No abstract available. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/156/CS-TR-70-156.pdf %R CS-TR-70-157 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Algorithms for matrix multiplication %A Brent, Richard P. %D March 1970 %X Strassen's and Winograd's algorithms for matrix multiplication are investigated and compared with the normal algorithm. Floating-point error bounds are obtained, and it is shown that scaling is essential for numerical accuracy using Winograd's method. In practical cases Winograd's method appears to be slightly faster than the other two methods, but the gain is, at most, about 20%. Finally, an attempt to generalize Strassen's method is described. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/157/CS-TR-70-157.pdf %R CS-TR-70-159 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The use of direct methods for the solution of the discrete Poisson equation on non-rectangular regions %A George, John Alan %D June 1970 %X Some direct and iterative schemes are presented for solving a standard finite-difference scheme for Poisson's equation on a two-dimensional bounded region R with Dirichlet conditions specified on the boundary $\partial R$. These procedures make use of special-purpose direct methods for solving rectangular Poisson problems. The region is imbedded in a rectangle and a uniform mesh is superimposed on it. The usual five-point Poisson difference operator is applied over the whole rectangle, yielding a block-tridiagonal system of equations. The original problem, however, determines only the elements of the right-hand side which correspond to grid points lying within $\partial R$; the remaining elements can be treated as parameters. The iterative algorithms construct a sequence of right-hand sides in such a way that the corresponding sequence of solutions on the rectangle converges to the solution of the imbedded problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/159/CS-TR-70-159.pdf %R CS-TR-70-160 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A model for parallel computer systems %A Bredt, Thomas H. %A McCluskey, Edward J. %D April 1970 %X A flow table model is defined for parallel computer systems. In this model, fundamental-mode flow tables are used to describe the operation of system components, which may be programs or circuits. Components communicate by changing the values on interconnecting lines which carry binary-level signals.
It is assumed that there is no bound on the time for value changes to propagate over the interconnecting lines. Given this delay assumption, it is necessary to specify a mode of operation for system components such that input changes which arrive while a component is unstable do not affect the operation of the component. Such a mode of operation is specified. Using the flow table model, a new control algorithm for the two-process mutual exclusion problem is designed. This algorithm does not depend on the exclusive execution of any primitive operations used in its implementation. A circuit implementation of the control algorithm is described. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/160/CS-TR-70-160.pdf %R CS-TR-70-162 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Numerical techniques in mathematical programming %A Bartels, Richard H. %A Golub, Gene H. %A Saunders, Michael A. %D May 1970 %X The application of numerically stable matrix decompositions to minimization problems involving linear constraints is discussed and shown to be feasible without undue loss of efficiency. Part A describes computation and updating of the product-form of the LU decomposition of a matrix and shows it can be applied to solving linear systems at least as efficiently as standard techniques using the product-form of the inverse. Part B discusses orthogonalization via Householder transformations, with applications to least squares and quadratic programming algorithms based on the principal pivoting method of Cottle and Dantzig. Part C applies the singular value decomposition to the nonlinear least squares problem and discusses related eigenvalue problems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/162/CS-TR-70-162.pdf %R CS-TR-70-163 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T An algorithm for floating-point accumulation of sums with small relative error %A Malcolm, Michael A. %D June 1970 %X A practical algorithm for floating-point accumulation is presented. Through the use of multiple accumulators, errors due to cancellation are avoided. An example in Fortran is included. An error analysis providing a sharp bound on the relative error is also given. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/163/CS-TR-70-163.pdf %R CS-TR-70-164 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T "Estimates of the Roundoff Error in the Solution of a System of Conditional Equations" by V. I. Gordonova %A Gordonova, V. I. %A Kaufman, Linda C. %D June 1970 %X Using backward error analysis, this paper compares the roundoff error in the least-squares solution of a system of conditional equations $Ax = f$ by two different methods. The first one entails solving the normal equations $A^T A x = A^T f$ and the second is one proposed by Faddeev, Faddeeva, and Kublanovskaya in 1966. This latter method involves multiplying the system by orthogonal matrices to transform the matrix A into upper triangular form. Translated by Linda Kaufman. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/164/CS-TR-70-164.pdf %R CS-TR-70-165 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The scheduling of n tasks with m operations on two processors %A Bauer, Henry R. %A Stone, Harold S. %D July 1970 %X The job shop problem is one scheduling problem for which no efficient algorithm exists.
That is, no algorithm is known in which the number of computational steps grows algebraically as the problem enlarges. This paper presents a discussion of the problem of scheduling N tasks on two processors when each task consists of three operations. The operations of each task must be performed in order and are divided among the processors. We analyze this problem through four sub-problems. Johnson's scheduling algorithm is generalized to solve two of these sub-problems, and functional equation algorithms are used to solve the remaining two problems. Except for one case, the algorithms are efficient. The exceptional case has been labelled the "core" problem and the difficulties are described. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/165/CS-TR-70-165.pdf %R CS-TR-70-170 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Analysis and synthesis of concurrent sequential programs %A Bredt, Thomas H. %D May 1970 %X This paper presents analysis and synthesis procedures for a class of sequential programs. These procedures aid in the design of programs for parallel computer systems. In particular, the interactions of a given program with other programs or circuits in a system can be described precisely. The basis for this work is a model for parallel computer systems in which the operation of each component is described by a flow table and the components interact by changing values on interconnecting lines. The details of this model are discussed in another paper [Stanford University Department of Computer Science report STAN-CS-70-160]. The analysis procedure produces a flow table description of a program. In program synthesis, a flow table description is converted to a sequential program. Using flow table design procedures, a control program for the two-program mutual exclusion problem is produced. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/170/CS-TR-70-170.pdf %R CS-TR-70-171 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A survey of models for parallel computing %A Bredt, Thomas H. %D August 1970 %X The work of Adams, Karp and Miller, Luconi, and Rodriguez on formal models for parallel computations and computer systems is reviewed. A general definition of a parallel schema is given so that the similarities and differences of the models can be discussed. Primary emphasis is on the control structures used to achieve parallel operation and on properties of the models such as determinacy and equivalence. Decidable and undecidable properties are summarized. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/171/CS-TR-70-171.pdf %R CS-TR-70-172 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Analysis of parallel systems %A Bredt, Thomas H. %D August 1970 %X A formal analysis procedure for parallel computer systems is presented. The flow table model presented in an earlier paper [Stanford University Department of Computer Science report STAN-CS-70-160] is used to describe a system. Each component of the system is described by a completely specified fundamental-mode flow table. All delays in a parallel system are assumed to be finite. Component delays are assumed to be bounded and line delays unbounded. The concept of an output hazard is introduced to account for the effects of line delay and the lack of synchronization among components. Necessary and sufficient conditions for the absence of output hazards are given.
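(Editorial aside, not part of the abstract: the testing of restrictions against the system state graph described next amounts, in modern terms, to a reachability check over the product of component states; a compressed sketch, all names mine:)

    from collections import deque

    def reachable(initial, successors):
        # Breadth-first construction of the system state graph from a
        # given initial system state; successors(s) enumerates the
        # states reachable from s in one transition.
        seen, queue = {initial}, deque([initial])
        while queue:
            s = queue.popleft()
            for t in successors(s):
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
        return seen

    # A forbidden-state restriction (e.g., both components in their
    # critical sections at once) is violated exactly when
    # reachable(s0, successors) intersects the forbidden set.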
The state of a parallel system is defined by the present internal state and input state of each component. The operation of the system is described by a system state graph which specifies all possible state transitions for a specified initial system state. A procedure for constructing the system state graph is given. The analysis procedure may be summarized as follows. A problem is stated in terms of restrictions on system operation. A parallel system is said to operate correctly with respect to the given problem if the associated restrictions are always satisfied. The restrictions specify either forbidden system states, which are never to be entered during the operation of the system, or forbidden system state sequences, which must never appear during system operation. The restrictions are tested by examining the system state graph. A parallel system for the two-process mutual exclusion problem is analyzed and the system is shown to operate correctly with respect to this problem. Finally, the conditions of determinacy and output functionality, which have been used in other models of parallel computing, are discussed as they relate to correct solutions to the mutual exclusion problem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/172/CS-TR-70-172.pdf %R CS-TR-70-173 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The mutual exclusion problem %A Bredt, Thomas H. %D August 1970 %X This paper discusses how n components, which may be programs or circuits, in a computer system can be controlled so that (1) at most one component may perform a designated "critical" operation at any instant and (2) if one component wants to perform its critical operation, it is eventually allowed to do so. This control problem is known as the mutual exclusion or interlock problem. A summary of the flow table model [Stanford University Department of Computer Science report STAN-CS-70-160] for computer systems is given. In this model, a control algorithm is represented by a flow table. The number of internal states in the control flow table is used as a measure of the complexity of control algorithms. A lower bound of n + 1 internal states is shown to be necessary if the mutual exclusion problem is to be solved. Procedures to generate control flow tables for the mutual exclusion problem which require the minimum number of internal states are described and it is proved that these procedures give correct control solutions. Other so-called "unbiased" algorithms are described which require 2·n! internal states but break ties in the case of multiple requests in favor of the component that least recently executed its critical operation. The paper concludes with a discussion of the tradeoffs between central and distributed control algorithms. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/173/CS-TR-70-173.pdf %R CS-TR-70-174 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Towards automatic program synthesis %A Manna, Zohar %A Waldinger, Richard J. %D July 1970 %X An elementary outline of the theorem-proving approach to automatic program synthesis is given, without dwelling on technical details. The method is illustrated by the automatic construction of both recursive and iterative programs operating on natural numbers, lists, and trees. In order to construct a program satisfying certain specifications, a theorem induced by those specifications is proved, and the desired program is extracted from the proof.
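(Editorial aside, not drawn from the report: the flavor of extracting a program from an inductive proof can be suggested with the quotient-remainder theorem; a constructive proof by induction on a that q and r exist with a = q*b + r and 0 <= r < b dictates the recursive program, names mine:)

    def divide(a: int, b: int) -> tuple[int, int]:
        # Assumes a >= 0 and b > 0, as in the theorem's hypotheses.
        # Base case of the induction (a < b): take q = 0, r = a.
        if a < b:
            return 0, a
        # Induction step: from (q', r') for a - b, return (q' + 1, r').
        q, r = divide(a - b, b)
        return q + 1, r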
The same technique is applied to transform recursively defined functions into iterative programs, frequently with a major gain in efficiency. It is emphasized that in order to construct a program with loops or with recursion, the principle of mathematical induction must be applied. The relation between the version of the induction rule used and the form of the program constructed is explored in some detail. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/174/CS-TR-70-174.pdf %R CS-TR-70-175 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A description and comparison of subroutines for computing Euclidean inner products on the IBM 360 %A Malcolm, Michael A. %D October 1970 %X Several existing subroutines and an Algol W procedure for computing inner products on the IBM 360, using more precision than long, are described and evaluated. Error bounds (when they exist) and execution timing tests are included. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/175/CS-TR-70-175.pdf %R CS-TR-70-176 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T On generality and problem solving: a case study using the DENDRAL program %A Feigenbaum, Edward A. %A Buchanan, Bruce G. %A Lederberg, Joshua %D August 1970 %X Heuristic DENDRAL is a computer program written to solve problems of inductive inference in organic chemistry. This paper will use the design of Heuristic DENDRAL and its performance on different problems for a discussion of the following topics: 1. the design for generality; 2. the performance problems attendant upon too much generality; 3. the coupling of expertise to the general problem solving processes; 4. the symbiotic relationship between generality and expertness, and the implications of this symbiosis for the study and design of problem solving systems. We conclude the paper with a view of the design for a general problem solver that is a variant of the "big switch" theory of generality. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/176/CS-TR-70-176.pdf %R CS-TR-70-178 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Research in the Computer Science Department and selected other research in computing at Stanford University %A Forsythe, George E. %A Miller, William F. %D October 1970 %X The research program of the Computer Science Department can perhaps be best summarized in terms of its research projects. The chart on page ii lists the projects and the participation by faculty and students. The sections following the chart provide descriptions of the individual projects. There are a number of projects in other schools or departments which are making significant contributions to computer science; and these add to the total computer environment. Descriptions of a few of these projects are also included with this report. This list of projects outside of Computer Science does not purport to be complete or even representative. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/178/CS-TR-70-178.pdf %R CS-TR-70-179 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T MLISP %A Smith, David Canfield %D October 1970 %X MLISP is a high level list-processing and symbol-manipulation language based on the programming language LISP. MLISP programs are translated into LISP programs and then executed or compiled. 
MLISP exists for two purposes: (1) to facilitate the writing and understanding of LISP programs; (2) to remedy certain important deficiencies in the list-processing ability of LISP. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/179/CS-TR-70-179.pdf %R CS-TR-70-183 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Machine learning through signature trees: applications to human speech %A White, George M. %D October 1970 %X Signature-tree "machine learning" pattern-recognition heuristics are investigated for the specific problem of computer recognition of human speech. When the data base of given utterances is insufficient to establish trends with confidence, a large number of feature extractors must be employed and "recognition" of an unknown pattern made by comparing its feature values with those of known patterns. When the data base is replete, a "signature" tree can be constructed and recognition can be achieved by the evaluation of a select few features. Learning results from selecting an optimal minimal set of features to achieve recognition. Properties of signature trees and the heuristics for this type of learning are of primary interest in this exposition. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/183/CS-TR-70-183.pdf %R CS-TR-70-184 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A note on a conjecture of L. J. Mordell %A Malcolm, Michael A. %D November 1970 %X A computer proof is described for a previously unsolved problem concerning the inequality $\sum_{i=1}^{n} x_i/(x_{i+1} + x_{i+2}) \geq \frac{n}{2}$. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/184/CS-TR-70-184.pdf %R CS-TR-70-185 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Graph program simulation %A Nelson, Edward C. %D October 1970 %X This report describes the simulation of a parallel processing system based on a directed graph representation of parallel computations. The graph representation is based on the model developed by Duane Adams in which programs are written as directed graphs whose nodes represent operations and whose edges represent data flow. The first part of the report describes a simulator which interprets these graph programs. The second part describes the use of the simulator in a hypothetical environment which has an unlimited number of processors and an unlimited amount of memory. Three programs, a trapezoidal quadrature, a sort and a matrix multiplication, were used to study the effect of varying the relative speed of primitive operations on computation time as the problem size grows. The system was able to achieve a high degree of parallelism. For example, the simulator multiplied two n by n matrices in a simulated time proportional to n. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/185/CS-TR-70-185.pdf %R CS-TR-70-187 %Z Mon, 06 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T MPL, Mathematical Programming Language: specification manual for Committee review %A Eisenstat, Stanley C. %A Magnanti, Thomas L. %A Maier, Steven F. %A McGrath, Michael B. %A Nicholson, Vincent J. %A Riedl, Christiane %A Dantzig, George B. %D November 1970 %X Mathematical Programming Language (MPL) is intended as a highly readable, user-oriented programming tool for use in the writing and testing of mathematical algorithms, in particular experimental algorithms for solving large-scale linear programs.
It combines the simplicity of standard mathematical notation with the power of complex data structures. Variables may be implicitly introduced into a program by their use in the statement in which they first appear. No formal defining statement is necessary. Statements of the "let" and "where" type are part of the language. Included within the allowable data structures of MPL are matrices, partitioned matrices, and multidimensional arrays. Ordered sets are included as vectors with their constructs closely paralleling those found in set theory. Allocation of storage is dynamic, thereby eliminating the need for a data manipulating subset of the language, as is characteristic of most high level scientific programming languages. This report summarizes the progress that has been made to date in developing MPL. It contains a specification manual, examples of the application of the language, and the future directions and goals of the project. A version of MPL, called MPL/70, has been implemented using PL/I as a translator. This will be reported separately. Until fully implemented, MPL is expected to serve primarily as a highly readable communication language for mathematical algorithms. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/70/187/CS-TR-70-187.pdf %R CS-TR-69-120 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T MUTANT 0.5: an experimental programming language %A Satterthwaite, Edwin H. %D February 1969 %X A programming language which continues the extension and simplification of ALGOL 60 in the direction suggested by EULER is defined and described. Techniques used in an experimental implementation of that language, called MUTANT 0.5, are briefly summarized. The final section of this report is an attempt to assess the potential value of the approach to procedural programming language design exemplified by MUTANT 0.5. Implementation and use of the experimental system have indicated a sufficient number of conceptual and practical problems to suggest that the general approach is of limited value; however, a number of specific features were found to be convenient, useful, and adaptable to other philosophies of language design. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/120/CS-TR-69-120.pdf %R CS-TR-69-121 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Accurate bounds for the eigenvalues of the Laplacian and applications to rhombical domains %A Moler, Cleve B. %D February 1969 %X We deal with the eigenvalues and eigenfunctions of Laplace's differential operator on a bounded two-dimensional domain G with zero values on the boundary. The paper describes a new technique for determining the coefficients in the expansion of an eigenfunction in terms of particular eigenfunctions of the differential operator. The coefficients are chosen to make the sum of the expansion come close to satisfying the boundary conditions. As an example, the eigenvalues and eigenfunctions are determined for a rhombical membrane. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/121/CS-TR-69-121.pdf %R CS-TR-69-122 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Heuristic analysis of numerical variants of the Gram-Schmidt orthonormalization process %A Mitchell, William C. %A McCraith, Douglas L. %D February 1969 %X The Gram-Schmidt orthonormalization process is a fundamental formula of analysis which is notoriously unstable computationally. 
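(Editorial aside, not part of the abstract: the instability in question is conventionally exhibited by contrasting the classical update, which projects each original column against the computed basis, with the modified variant, which removes each new direction from all remaining columns immediately; a minimal NumPy sketch, all names mine:)

    import numpy as np

    def gram_schmidt(A, modified=True):
        # Orthonormalize the columns of A. The modified variant loses
        # far less orthogonality to round-off than the classical one.
        A = A.astype(float)
        n = A.shape[1]
        Q = np.zeros_like(A)
        for k in range(n):
            v = A[:, k].copy()
            if not modified:
                # Classical: subtract projections of the original column.
                for j in range(k):
                    v -= (Q[:, j] @ A[:, k]) * Q[:, j]
            Q[:, k] = v / np.linalg.norm(v)
            if modified:
                # Modified: withdraw q_k from the columns still pending.
                for j in range(k + 1, n):
                    A[:, j] -= (Q[:, k] @ A[:, j]) * Q[:, k]
        return Q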
This report provides a heuristic analysis of the process, which shows why the method is unstable. Formulas are derived which describe the propagation of round-off error through the process. These formulas are supported by numerical experiments. These formulas are then applied to a computational variant of a basic method proposed by John R. Rice, and this method is shown to offer significant improvement over the basic algorithm. This finding is also supported by numerical experiment. The formulas for the error propagation are then used to produce a linear corrector for the basic Gram-Schmidt process, which shows significant improvement over both previous methods, but at the cost of slightly more computation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/122/CS-TR-69-122.pdf %R CS-TR-69-124 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Matrix decompositions and statistical calculations %A Golub, Gene H. %D March 1969 %X Several matrix decompositions which are of some interest in statistical calculations are presented. An accurate method for calculating the canonical correlation is given. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/124/CS-TR-69-124.pdf %R CS-TR-69-125 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Grammatical complexity and inference %A Feldman, Jerome A. %A Gips, James %A Horning, James J. %A Reder, Stephen %D June 1969 %X The problem of inferring a grammar for a set of symbol strings is considered and a number of new decidability results obtained. Several notions of grammatical complexity and their properties are studied. The question of learning the least complex grammar for a set of strings is investigated leading to a variety of positive and negative results. This work is part of a continuing effort to study the problems of representation and generalization through the grammatical inference question. Appendices A and B and Section 2a.0 are primarily the work of Reder, Sections 2b and 3d of Horning, Section 4 and Appendix C of Gips, and the remainder the responsibility of Feldman. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/125/CS-TR-69-125.pdf %R CS-TR-69-126 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Complementary spanning trees %A Dantzig, George B. %D March 1969 %X Given a network G whose arcs partition into non-overlapping 'clubs' (sets) $R_i$, D. Ray Fulkerson has considered the problem of constructing a spanning tree such that no two of its arcs belong to (represent) the same club and has stated necessary and sufficient conditions for such trees to exist. When each club $R_i$ consists of exactly two arcs, we shall refer to each arc of the pair as the 'complement' of the other, and the representative tree as a complementary tree. Our objective is to prove the following theorem: If there exists one complementary tree, there exist at least two. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/126/CS-TR-69-126.pdf %R CS-TR-69-128 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The method of odd/even reduction and factorization with application to Poisson's equation %A Buzbee, B. L. %A Golub, Gene H. %A Nielson, C. W. %D April 1969 %X Several algorithms are presented for solving block tridiagonal systems of linear algebraic equations when the matrices on the diagonal are equal to each other and the matrices on the subdiagonals are all equal to each other.
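(Editorial aside, not from the abstract: for the scalar analogue of such a constant-coefficient tridiagonal system, one odd/even reduction step eliminates half the unknowns and leaves a half-size system of the same form; a recursive sketch assuming n = 2**k - 1 unknowns and zero boundary values, names mine:)

    def odd_even_reduction(a, d, b):
        # Solve a*x[i-1] + d*x[i] + a*x[i+1] = b[i], i = 0..n-1, with
        # x[-1] = x[n] = 0 and n = 2**k - 1.
        n = len(b)
        if n == 1:
            return [b[0] / d]
        # Combining rows i-1, i, i+1 eliminates the even-indexed
        # unknowns and couples x[1], x[3], ... in the same form:
        a2, d2 = -a * a, d * d - 2 * a * a
        b2 = [d * b[i] - a * (b[i - 1] + b[i + 1]) for i in range(1, n, 2)]
        x = [0.0] * n
        x[1::2] = odd_even_reduction(a2, d2, b2)
        # Back-substitute the remaining unknowns from their own rows.
        for i in range(0, n, 2):
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i + 1 < n else 0.0
            x[i] = (b[i] - a * (left + right)) / d
        return x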
It is shown that these matrices arise from the finite difference approximation to certain elliptic partial differential equations on rectangular regions. Generalizations are derived for higher order equations and non-rectangular regions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/128/CS-TR-69-128.pdf %R CS-TR-69-129 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Research in the Computer Science Department, Stanford University %A Miller, William F. %D April 1969 %X The research program of the Computer Science Department can perhaps be best summarized in terms of its research projects. The chart on the following page lists the projects and the participation by faculty and students. Two observations should be made to complete the picture. Within the Artificial Intelligence Project, the Stanford Computation Center, the SLAC Computation Group, and the INFO project, there are a large number of highly competent professional computer scientists who add greatly to the total capability of the campus. Also, there are a number of projects in other schools or departments which are making significant contributions to computer science. These, too, add to the total computer environment. Summarized by Professor W. F. Miller. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/129/CS-TR-69-129.pdf %R CS-TR-69-134 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Linear least squares and quadratic programming %A Golub, Gene H. %A Saunders, Michael A. %D May 1969 %X Several algorithms are presented for solving linear least squares problems; the basic tools are orthogonalization techniques. A highly accurate algorithm is presented for solving least squares problems with linear inequality constraints. A method is also given for finding the least squares solution when there is a quadratic constraint on the solution. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/134/CS-TR-69-134.pdf %R CS-TR-69-135 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T CIL: Compiler Implementation Language %A Gries, David %D May 1969 %X This report is a manual for the proposed Compiler Implementation Language, CIL. It is not an expository paper on the subject of compiler writing or compiler-compilers. The language definition may change as work progresses on the project. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/135/CS-TR-69-135.pdf %R CS-TR-69-137 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Fixed points of analytic functions %A Henrici, Peter %D July 1969 %X A continuous mapping of a simply connected, closed, bounded set of the euclidean plane into itself is known to have at least one fixed point. It is shown that the usual condition for the fixed point to be unique, and for convergence of the iteration sequence to the fixed point, can be relaxed if the mapping is defined by an analytic function of a complex variable. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/137/CS-TR-69-137.pdf %R CS-TR-69-141 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Bounds for the error of linear systems of equations using the theory of moments %A Dahlquist, Germund %A Eisenstat, Stanley C. %A Golub, Gene H.
%D October 1969 %X Consider the system of linear equations $Ax = b$ where A is an $n \times n$ real symmetric, positive definite matrix and $b$ is a known vector. Suppose we are given an approximation $\xi$ to $x$, and we wish to determine upper and lower bounds for $\Vert x - \xi \Vert$ where $\Vert \cdot \Vert$ indicates the euclidean norm. Given the sequence of vectors $\{ r_i \}_{i=0}^{k}$ where $r_i = A r_{i-1}$ and $r_0 = b - A\xi$, it is shown how to construct a sequence of upper and lower bounds for $\Vert x - \xi \Vert$ using the theory of moments. In addition, consider the Jacobi algorithm for solving the system $x = Mx + b$, viz. $x_{i+1} = M x_i + b$. It is shown that by examining $\delta_i = x_{i+1} - x_i$, it is possible to construct upper and lower bounds for $\Vert x_i - x \Vert$. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/141/CS-TR-69-141.pdf %R CS-TR-69-142 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Stationary values of the ratio of quadratic forms subject to linear constraints %A Golub, Gene H. %A Underwood, Richard R. %D November 1969 %X Let A be a real symmetric matrix of order n, B a real symmetric positive definite matrix of order n, and C an $n \times p$ matrix of rank r with r $\leq$ p < n. We wish to determine vectors $x$ for which $x^T A x / x^T B x$ is stationary and $C^T x = \Theta$, the null vector. An algorithm is given for generating a symmetric eigensystem whose eigenvalues are the stationary values and for determining the vectors $x$. Several Algol procedures are included. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/142/CS-TR-69-142.pdf %R CS-TR-69-144 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The maximum and minimum of a positive definite quadratic polynomial on a sphere are convex functions of the radius %A Forsythe, George E. %D July 1969 %X It is proved that in euclidean n-space the maximum $M(\rho)$ and minimum $m(\rho)$ of a fixed positive definite quadratic polynomial Q on spheres with fixed center are both convex functions of the radius $\rho$ of the sphere. In the proof, which uses elementary calculus and a result of Forsythe and Golub, $m''(\rho)$ and $M''(\rho)$ are shown to exist and lie in the interval $[2\lambda_1, 2\lambda_n]$, where $\lambda_i$ are the eigenvalues of the quadratic form of Q. Hence $m''(\rho) > 0$ and $M''(\rho) > 0$. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/144/CS-TR-69-144.pdf %R CS-TR-69-145 %Z Mon, 27 Nov 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Methods of search for solving polynomial equations %A Henrici, Peter %D December 1969 %X The problem of determining a zero of a given polynomial with guaranteed error bounds, using an amount of work that can be estimated a priori, is attacked here by means of a class of algorithms based on the idea of systematic search.
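(Editorial aside, not from the report: one simple proximity test of the kind discussed follows from the logarithmic derivative $p'/p = \sum_i 1/(z - z_i)$, which gives $\min_i |z - z_i| \leq n |p(z)/p'(z)|$; a sketch with my own naming:)

    def proximity_test(coeffs, z, r):
        # True guarantees some zero of p lies in the disk |w - z| <= r.
        # coeffs lists complex coefficients, highest degree first.
        n = len(coeffs) - 1
        p, dp = 0, 0
        for c in coeffs:            # Horner's rule for p(z) and p'(z)
            dp = dp * z + p
            p = p * z + c
        if p == 0:
            return True
        if dp == 0:
            return False            # inconclusive at a critical point
        return n * abs(p / dp) <= r

    # e.g. proximity_test([1, 0, -2], 1.5, 0.2) is True: a zero of
    # z**2 - 2 (namely 2**0.5) lies within 0.2 of 1.5.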
Lehmer's "machine method" for solving polynomial equations is a special case. The use of the Schur-Cohn algorithm in Lehmer's method is replaced by a more general proximity test which reacts positively if applied at a point close to a zero of a polynomial. Various such tests are described, and the work involved in their use is estimated. The optimality and non-optimality of certain methods, both on a deterministic and on a probabilistic basis, are established. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/69/145/CS-TR-69-145.pdf %R CS-TR-68-83 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Iterative refinements of linear least squares solutions by Householder transformations %A Bjorck, Ake %A Golub, Gene H. %D January 1968 %X An algorithm is presented in ALGOL for iteratively refining the solution to a linear least squares problem with linear constraints. Numerical results presented show that a high degree of accuracy is obtained. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/83/CS-TR-68-83.pdf %R CS-TR-68-84 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A computer system for transformational grammar %A Friedman, Joyce %D January 1968 %X A comprehensive system for transformational grammar has been designed and is being implemented on the IBM 360/67 computer. The system deals with the transformational model of syntax, along the lines of Chomsky's "Aspects of the Theory of Syntax." The major innovations include a full and formal description of the syntax of a transformational grammar, a directed random phrase structure generator, a lexical insertion algorithm, and a simple problem-oriented programming language in which the algorithm for application of transformations can be expressed. In this paper we present the system as a whole, first discussing the philosophy underlying the development of the system, then outlining the system and discussing its more important special features. References are given to papers which consider particular aspects of the system in detail. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/84/CS-TR-68-84.pdf %R CS-TR-68-85 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Computer-aided language development in nonspeaking mentally disturbed children %A Colby, Kenneth Mark %D December 1967 %X Experience with a computer-based method for aiding language development in nonspeaking mentally disturbed children is described. Out of a group of 10 children 8 improved linguistically while 2 were unimproved. Problems connected with the method and its future prospects are briefly discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/85/CS-TR-68-85.pdf %R CS-TR-68-86 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T ALGOL W %A Bauer, Henry R. %A Becker, Sheldon I. %A Graham, Susan L. %D January 1968 %X The textbook "Introduction to Algol" by Baumann, Feliciano, Bauer, and Samelson describes the internationally recognized language ALGOL 60 for algorithm communication. ALGOL W can be viewed as an extension of ALGOL. This document consists of (1) "Algol W Notes for Introductory Computer Science Courses" [by Henry R. Bauer, Sheldon Becker, and Susan L. Graham] which describes the differences between ALGOL 60 and ALGOL W and presents the new features of ALGOL W; (2) "Deck Set-Up"; (3) "Algol W Language Description" [by Henry R. Bauer, Sheldon Becker, and Susan L. 
Graham], a complete syntactic and semantic description of the language; (4) "Unit Record Equipment"; and (5) "Error Message." %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/86/CS-TR-68-86.pdf %R CS-TR-68-87 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T CS139 lecture notes. Part I: Sections 1 thru 21. Preliminary version %A Ehrman, John R. %D June 1968 %X These notes are meant to provide an introduction to the IBM System/360 which will help the reader to understand and to make effective use of the capabilities of both the machinery and some of its associated service programs. They are largely self-contained, and in general the reader should need to make only occasional reference to the "System/360 Principles of Operation" manual (IBM File No. S360-01, Form A22-6821) and to the "Operating System/360 Assembler Language" manual (IBM File No. S360-21, Form C28-6514). %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/87/CS-TR-68-87.pdf %R CS-TR-68-88 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Relaxation methods for convex problems %A Schechter, Samuel %D February 1968 %X Extensions and simplifications are made for convergence proofs of relaxation methods for nonlinear systems arising from the minimization of strictly convex functions. This work extends these methods to group relaxation, which includes an extrapolated form of Newton's method, for various orderings. A relatively simple proof is given for cyclic orderings, sometimes referred to as nonlinear overrelaxation, and for residual orderings where an error estimate is given. A less restrictive choice of relaxation parameter is obtained than that given previously. Applications are indicated primarily to the solution of nonlinear elliptic boundary problems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/88/CS-TR-68-88.pdf %R CS-TR-68-89 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T ALGOL W (revised) %A Bauer, Henry R. %A Becker, Sheldon I. %A Graham, Susan L. %A Forsythe, George E. %A Satterthwaite, Edwin H. %D March 1968 %X The textbook "Introduction to Algol" by Baumann, Feliciano, Bauer, and Samelson describes the internationally recognized language ALGOL 60 for algorithm communication. ALGOL W can be viewed as an extension of ALGOL. This document consists of (1) "Algol W Deck Set-Up" [by E.H. Satterthwaite, Jr.]; (2) "Algol W Language Description" [by Henry R. Bauer, Sheldon Becker, and Susan L. Graham], a complete syntactic and semantic description of the language; (3) "Algol W Error Messages" [by Henry R. Bauer, Sheldon Becker, and Susan L. Graham]; (4) "Algol W Notes for Introductory Computer Science Courses" [by Henry R. Bauer, Sheldon Becker, and Susan L. Graham] which describes the differences between ALGOL 60 and ALGOL W and presents the new features of ALGOL W; and (5) "Notes on Number Representation on System/360 and relations to Algol W" [by George E. Forsythe]. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/89/CS-TR-68-89.pdf %R CS-TR-68-90 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A multi-level computer organization designed to separate data-accessing from the computation %A Lesser, Victor R. %D March 1968 %X The computer organization to be described in this paper has been developed to overcome the inflexibility of computers designed around a few fixed data structures, and only binary operations.
This has been accomplished by separating the data-accessing procedures from the computational algorithm. By this separation, a new and different language may be used to express data-accessing procedures. The new language has been designed to allow the programmer to define the procedures for generating the names of the operands for each computation, and locating the value of an operand given its name. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/90/CS-TR-68-90.pdf %R CS-TR-68-91 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The PL360 system %A Wirth, Niklaus E. %A Wells, Joseph W. %A Satterthwaite, Edwin H. %D April 1968 %X This report describes the use of two operating systems which serve as environments for the PL360 language defined in the companion report [Niklaus Wirth, "A Programming Language for the 360 Computers," Stanford University Computer Science Department report CS 53 (revised), June 1967]. Some additions to that language, not described in CS 53, are documented in the Appendix. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/91/CS-TR-68-91.pdf %R CS-TR-68-98 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T ALGOL W implementation %A Bauer, Henry R. %A Becker, Sheldon I. %A Graham, Susan L. %D May 1968 %X In writing a compiler of a new language (ALGOL W) for a new machine (IBM System/360) we were forced to deal with many unforeseen problems in addition to the problems we expected to encounter. This report describes the final version of the compiler. The implemented language ALGOL W is based on the Wirth/Hoare proposal for a successor to ALGOL 60. The major differences from that proposal are in string definition and operations and in complex number representation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/98/CS-TR-68-98.pdf %R CS-TR-68-100 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A computer model of information processing in children %A Bredt, Thomas H. %D June 1968 %X A model of cognitive information processing has been constructed on the basis of a protocol gathered from a child taking an object association test. The basic elements of the model are a graph-like data base and strategy. The data base contains facts that relate objects in the experiment. The graph distance that separates two objects in the data base is the measure of how well a relation is known. The strategy used in searching for facts that relate two objects is sequential in nature. The model has been programmed for computer testing in the LISP programming language. The responses of the computer model and the original subject are compared. To aid in the model evaluation a revised test was defined and administered to two children. The results were modeled and the correspondence of model and subject performance is discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/100/CS-TR-68-100.pdf %R CS-TR-68-92 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T MLISP %A Enea, Horace J. %D March 1968 %X Mlisp is an Algol-like list processing language based on Lisp 1.5. It is currently implemented on the IBM 360/67 at the Stanford Computation Center, and is being implemented on the DEC PDP-6 at the Stanford Artificial Intelligence Project. The balance of this paper is a very informal presentation of the language so that the reader will be able to run programs in Mlisp with a minimum of effort. 
The language has an extremely simple syntax which is presented in Appendix I. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/92/CS-TR-68-92.pdf %R CS-TR-68-95 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A formal syntax for transformational grammar %A Friedman, Joyce %A Doran, Robert W. %D March 1968 %X A formal definition of the syntax of a transformational grammar is given using a modified Backus Naur Form as the metalanguage. Syntax constraints and interpretation are added in English. The underlying model is that presented by Chomsky in "Aspects of the Theory of Syntax." Definitions are given for the basic concepts of tree, analysis, restriction, complex symbol, and structural change, as well as for the major components of a transformational grammar, phrase structure, lexicon, and transformations. The syntax was developed as a specification of input formats for the computer system for transformational grammar described in [Joyce Friedman, "A Computer System for Transformational Grammar," Stanford University Computer Science Department report CS-84, January 1968]. It includes as a subcase a fairly standard treatment of transformational grammar, but has been generalized in many respects. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/95/CS-TR-68-95.pdf %R CS-TR-68-96 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Interval arithmetic determinant evaluation and its use in testing for a Chebyshev system %A Smith, Lyle B. %D April 1968 %X Two recent papers by Hansen and by Hansen and R. R. Smith have shown how interval arithmetic (I.A.) can be used effectively to bound errors in matrix computations. This paper compares a method proposed by Hansen and R. R. Smith to straightforward use of I.A. in determinant evaluation. Computational results show what accuracy and running times can be expected when using I.A. for determinant evaluation. An application using I.A. determinants in a program to test a set of functions to see if they form a Chebyshev system is then presented. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/96/CS-TR-68-96.pdf %R CS-TR-68-113 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T The impact of storage management on plex processing language implementation %A Hansen, Wilfred J. %D July 1969 %X A plex processing system is implemented within a set of environments whose relationships are vital to the system's time/space efficiency: the data environment (stack structures, data structures); the subroutine environment (routine linkage, variable binding); and the storage management environment (memory organization for allocation, storage control). This paper discusses these environments and their relationships in detail. For each environment there is some discussion of alternative implementation techniques, the dependence of the implementation on the hardware, and the dependence of the environment on the language design. In particular, two language features are shown to affect substantially the environment design: variable length plexes and 'release' of active plexes. Storage management is complicated by the requirement for variable length plexes, but they can substantially reduce memory requirements. If inactive plexes are released, a garbage collector can be avoided; but considerable tedious programming may be required to maintain the status of each plex.
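(Editorial aside, not from the paper: the explicit-release discipline just described can be caricatured with per-size free lists; a toy sketch, every name mine:)

    class PlexStore:
        # Toy storage manager in the spirit described: variable-length
        # records ("plexes") carved from one array, with an explicit
        # release feeding per-size free lists in place of a garbage
        # collector.
        def __init__(self, size):
            self.heap = [None] * size
            self.next_free = 0
            self.free_lists = {}            # plex length -> offsets

        def allocate(self, length):
            if self.free_lists.get(length):
                return self.free_lists[length].pop()
            offset = self.next_free
            if offset + length > len(self.heap):
                raise MemoryError("heap exhausted")
            self.next_free = offset + length
            return offset

        def release(self, offset, length):
            # The caller must know the plex is inactive -- the tedious
            # bookkeeping the abstract warns about lives here.
            self.free_lists.setdefault(length, []).append(offset)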
Many plex processing systems store numbers in strange formats and compile arithmetic operations as subroutine calls, thus handicapping the computer on the only operations it does well. Careful coordination of the system environments can permit direct numeric computation, that is, a single instruction for each arithmetic operation. This paper considers, with each environment, the requirements for direct numeric computation. To explore the techniques discussed, a collection of environments called Swym was implemented. This system permits variable length plexes and compact lists. The latter is a list representation requiring less space than chained lists because pointers to the elements are stored in consecutive words. In Swym, a list can be partly compact and partly chained. The garbage collector converts chained lists into compact lists when possible. Swym has careful provision for direct numeric computation, but no compiler has been built. To illustrate Swym, an interpreter was implemented for a small language similar to LISP 1.5. Details of Swym and the language are in a series of appendices. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/113/CS-TR-68-113.pdf %R CS-TR-68-115 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Programmers manual for a computer system for transformational grammar %A Friedman, Joyce %A Bredt, Thomas H. %A Doran, Robert W. %A Martner, Theodore S. %A Pollack, Bary W. %D August 1968 %X This volume provides programming notes on a computer system for transformational grammar. The important ideas of the system have been presented in a series of reports which are listed in Appendix B; this document is the description of the system as a program. It is intended for programmers who might wish to maintain, modify or extend the system. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/115/CS-TR-68-115.pdf %R CS-TR-68-102 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Integer programming over a cone %A Pnueli, Amir %D July 1968 %X The properties of a special form integer programming problem are discussed. We restrict ourselves to optimization over a cone (a set of n constraints in n unconstrained variables) with a square matrix of positive diagonal and non-positive off-diagonal elements (called a bounding form by F. Glover [1964]). It is shown that a simple iterative process gives the optimal integer solution in a finite number of steps. It is then shown that any cone problem with bounded rational solution can be transformed to the bounding form and hence solved by the outlined method. Some extensions to more than n constraints are discussed and a numerical example of solving a larger problem is shown. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/102/CS-TR-68-102.pdf %R CS-TR-68-103 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Lexical insertion in transformational grammar %A Friedman, Joyce %A Bredt, Thomas H. %D June 1968 %X In this paper, we describe the lexical insertion process for generative transformational grammars. We also give detailed descriptions of many of the concepts in transformational theory. These include the notions of complex symbol, syntactic feature (particularly contextual feature), redundancy rule, tests for pairs of complex symbols, and change operations that may be applied to complex symbols.
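(Editorial aside, not from the paper: a pairwise test of the sort meant here can be pictured as a clash check over shared features; a two-line sketch, with feature names invented:)

    def compatible(cs1: dict, cs2: dict) -> bool:
        # Complex symbols as feature -> value maps; compatible when no
        # feature carried by both is assigned conflicting values.
        return all(cs2[f] == v for f, v in cs1.items() if f in cs2)

    # compatible({"N": True, "Common": True}, {"Common": True})  -> True
    # compatible({"Animate": True}, {"Animate": False})          -> False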
Because of our general interpretation of redundancy rules, we define a new complex symbol test known as compatibility. This test replaces the old notion of nondistinctness. The form of a lexicon suitable for use with a generative grammar is specified. In lexical insertion, vocabulary words and associated complex symbols are selected from a lexicon and inserted at lexical category nodes in the tree. Complex symbols are lists of syntactic features. The compatibility of a pair of complex symbols and the analysis procedure used for contextual features are basic in determining suitable items for insertion. Contextual features (subcategorization and selectional) have much in common with the structural description for a transformation and we use the same analysis procedure for both. A problem encountered in the insertion of a complex symbol that contains selectional features is side effects. We define the notion of side effects and describe how these effects are to be treated. The development of the structure of the lexicon and the lexical insertion algorithm has been aided by a system of computer programs that enable the linguist to study transformational grammar. In the course of this development, a computer program to perform lexical insertion was written. Results obtained using this program with fragments of transformational grammar are presented. The paper concludes with suggestions for extensions of this work and a discussion of interpretations of transformational theory that do not fit immediately into our framework. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/103/CS-TR-68-103.pdf %R CS-TR-68-107 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A three-stage variable-shift iteration for polynomial zeros and its relation to generalized Rayleigh iteration %A Jenkins, M. A. %A Traub, Joseph F. %D August 1968 %X We introduce a new three-stage process for calculating the zeros of a polynomial with complex coefficients. The algorithm is similar in spirit to the two-stage algorithms studied by Traub in a series of papers. The algorithm is restriction free, that is, it converges for any distribution of zeros. A proof of global convergence is given. Zeros are calculated in roughly increasing order of magnitude to avoid deflation instability. Shifting is incorporated in a natural and stable way to break equimodularity and speed convergence. The three stages use no shift, a fixed shift, and a variable shift, respectively. To obtain additional insight we recast the problem and algorithm into matrix form. The third stage is inverse iteration with the companion matrix, followed by generalized Rayleigh iteration. A program implementing the algorithm was written in a dialect of ALGOL 60 and run on Stanford University's IBM 360/67. The program has been extensively tested and testing is continuing. For polynomials with complex coefficients and of degrees ranging from 20 to 50, the time required to calculate all zeros averages $8n^2$ milliseconds. Timing information and a numerical example are provided. A description of the implementation, an analysis of the effects of finite-precision arithmetic, an ALGOL 60 program, the results of extensive testing, and a second program which clusters the zeros and provides a posteriori error bounds will appear elsewhere.
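(Editorial aside, not from the report: the matrix form mentioned in the abstract rests on the companion matrix, whose eigenvalues are exactly the polynomial's zeros; a NumPy sketch under my own naming, and essentially the approach numpy.roots takes today:)

    import numpy as np

    def companion(coeffs):
        # Companion matrix of a polynomial (highest-degree coefficient
        # first, made monic); eigvals(companion(c)) are the zeros of c.
        c = np.asarray(coeffs, dtype=complex)
        c = c / c[0]
        n = len(c) - 1
        C = np.zeros((n, n), dtype=complex)
        C[1:, :-1] = np.eye(n - 1)
        C[:, -1] = -c[:0:-1]
        return C

    # np.linalg.eigvals(companion([1, 0, -2])) -> approximately +-sqrt(2)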
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/107/CS-TR-68-107.pdf %R CS-TR-68-109 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A computer system for writing and testing transformational grammars: final report %A Friedman, Joyce %D September 1968 %X A comprehensive system for transformational grammar has been designed and is being implemented on the IBM 360/67 computer. The system deals with the transformational model of syntax, along the lines of Chomsky's "Aspects of the Theory of Syntax." The major innovations include a full and formal description of the syntax of a transformational grammar, a directed random phrase structure generator, a lexical insertion algorithm, and a simple problem-oriented programming language in which the algorithm for application of transformations can be expressed. In this paper we present the system as a whole, first discussing the philosophy underlying the development of the system, then outlining the system and discussing its more important special features. References are given to papers which consider particular aspects of the system in detail. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/109/CS-TR-68-109.pdf %R CS-TR-68-111 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Analysis in transformational grammar %A Friedman, Joyce %A Martner, Theodore S. %D August 1968 %X In generating sentences by means of a transformational grammar, it is necessary to analyze trees, testing for the presence or absence of various structures. This analysis occurs at two stages in the generation process -- during insertion of lexical items (more precisely, in testing contextual features), and during the transformation process, when individual transformations are being tested for applicability. In this paper we describe a formal system for the definition of tree structure of sentences. The system consists of a formal language for partial or complete definition of the tree structure of a sentence, plus an algorithm for comparison of such a definition with a tree. It represents a significant generalization of Chomsky's notion of "proper analysis", and is flexible enough to be used within any transformational grammar which we have seen. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/111/CS-TR-68-111.pdf %R CS-TR-68-112 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T A control language for transformational grammar %A Friedman, Joyce %A Pollack, Bary W. %D August 1968 %X Various orders of application of transformations have been considered in transformational grammar, ranging from unordered application to cyclical orders involving notions of "lowest sentence" and of numerical indices on depth of embedding. The general theory of transformational grammar does not yet offer a uniform set of "traffic rules" which are accepted by most linguists. Thus, in designing a model of transformational grammar, it seems advisable to allow the specification of the order and point of application of transformations to be a proper part of the grammar. In this paper we present a simple control language designed to be used by linguists for this specification. In the control language the user has the ability to: 1. Group transformations into ordered sets and apply transformations either individually or by transformation set. 2. Specify the order in which the transformation sets are to be considered. 3.
Specify the subtrees in which a transformation set is to be applied. 4. Allow the order of application to depend on which transformations have previously modified the tree. 5. Apply a transformation set either once or repeatedly. In addition, since the control language has been implemented as part of a computer system, the behavior of the transformations may be monitored giving additional information on their operation. In this paper we present the control language and examples of its use. Discussion of the computer implementation will be found in [Pollack, B.W. The Control Program and Associated Subroutines. Stanford University. Computer Science Department. Computational Linguistics Project. Report no. AF-28. June 1968.]. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/112/CS-TR-68-112.pdf %R CS-TR-68-110 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T ALGOL W (revised) %A Bauer, Henry R. %A Becker, Sheldon I. %A Graham, Susan L. %A Floyd, Robert W. %A Forsythe, George E. %A Satterthwaite, Edwin H. %D September 1969 %X "A Contribution to the Development of ALGOL" by Niklaus Wirth and C. A. R. Hoare [Comm. ACM, v.9, no. 6 (June 1966), pp. 413-431] was the basis for a compiler developed for the IBM 360 at Stanford University. This report is a description of the implemented language, ALGOL W. Historical background and the goals of the language may be found in the Wirth and Hoare paper. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/110/CS-TR-68-110.pdf %R CS-TR-68-114 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T Calgen - an interactive picture calculus generation system %A George, James E. %D December 1968 %X A sub-set of the Picture Calculus was implemented on the IBM 360/75 to experiment with the proposed data structure, to study the capability of PL/1 for implementing the Picture Calculus and to evaluate the usefulness of drawing pictures with this formalized language. The system implemented is referred to as Calgen. Like many other drawing programs, Calgen utilizes a graphic display console; however, it differs from previous drawing systems in one major area, namely, Calgen retains structure information. Since the Picture Calculus is highly structured, Calgen retains structure information and saves scope images only where convenient; further, the scope images saved may be altered by changing the structure information. The only reason scope images are saved by Calgen is to avoid regeneration of a previously generated picture. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/114/CS-TR-68-114.pdf %R CS-TR-68-119 %Z Wed, 20 Dec 95 00:00:00 GMT %I Stanford University, Department of Computer Science %T MPL: Mathematical Programming Language %A Bayer, Rudolf %A Bigelow, James H. %A Dantzig, George B. %A Gries, David J. %A McGrath, Michael B. %A Pinsky, Paul D. %A Schuck, Stephen K. %A Witzgall, Christoph %D May 1968 %X The purpose of MPL is to provide a language for writing mathematical programming algorithms that will be easier to write, to read, and to modify than those written in currently available computer languages. It is believed that the writing, testing, and modification of codes for solving large-scale linear programs will be a less formidable undertaking once MPL becomes available. It is hoped that by the Fall of 1968, work on a compiler for MPL will be well underway.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/68/119/CS-TR-68-119.pdf %R CS-TR-67-54 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T A generalized Bairstow algorithm %A Golub, Gene H. %A Robertson, Thomas N. %D January 1967 %X This report discusses convergence and applications for the generalized Bairstow algorithm. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/54/CS-TR-67-54.pdf %R CS-TR-67-55 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T A stopping criterion for polynomial root finding %A Adams, Duane A. %D February 1967 %X When solving for the roots of a polynomial, it is generally difficult to know just when to terminate the iteration process. In this paper an algorithm is derived and discussed which allows one to terminate the iteration process on the basis of calculated bounds for the roundoff error. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/55/CS-TR-67-55.pdf %R CS-TR-67-56 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T QD-method with Newton shift %A Bauer, Friedrich L. %D March 1967 %X Theoretically, for symmetric matrices, a QR-step is equivalent to two successive LR-steps, and the LR-transformation for a tridiagonal matrix is, apart from organizational details, identical with the qd-method. For non-positive definite matrices, however, the LR-transformation cannot be guaranteed to be numerically stable unless pivotal interchanges are made. This has led to preference for the QR-transformation, which is always numerically stable. If, however, some of the smallest or some of the largest eigenvalues are wanted, then the QR-transformation will not necessarily give only these, and bisection might seem too slow with its fixed convergence rate of 1/2. In this situation, Newton's method would be fine if the Newton correction can be computed sufficiently simply, since it will always tend monotonically to the nearest root starting from a point outside the spectrum. Consequently, if one always worked with positive (or negative) definite matrices, there would be no objection to using the now stable qd-algorithm. The report shows that for a qd-algorithm, the Newton correction can very easily be calculated, and accordingly a shift which avoids under-shooting, or a lower bound. Since the last diagonal element gives an upper bound, the situation is quite satisfactory with respect to bounds. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/56/CS-TR-67-56.pdf %R CS-TR-67-57 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T The use of transition matrices in compiling %A Gries, David %D March 1967 %X The construction of efficient parsing algorithms for programming languages has been the subject of many papers in the last few years. Techniques for efficient parsing and algorithms which generate the parser from a grammar or phrase structure system have been derived. Some of the well-known methods are the precedence techniques of Floyd, and Wirth and Weber, and the production language of Feldman. Perhaps the first such discussion was by Samelson and Bauer. There the concept of the push-down stack was introduced, along with the idea of a transition matrix. A transition matrix is just a switching table which lets one determine from the top element of the stack (denoting a row of the table) and the next symbol of the program to be processed (represented by a column of the table) exactly what should be done.
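(Editorial aside, not from the report: a toy transition matrix for the grammar E -> a | ( E ), where each (top of stack, next symbol) entry dictates the shift or reduce action the abstract goes on to name; all code mine:)

    def parse(tokens):
        # Transition "matrix": (stack top, lookahead) -> action.
        table = {
            ("$", "a"): "shift", ("$", "("): "shift",
            ("(", "a"): "shift", ("(", "("): "shift",
            ("E", ")"): "shift", ("E", "$"): "accept",
        }
        stack, toks, i = ["$"], list(tokens) + ["$"], 0
        while True:
            top = stack[-1]
            if top == "a":                  # reduce E -> a
                stack[-1] = "E"
            elif top == ")":                # reduce E -> ( E )
                del stack[-3:]
                stack.append("E")
            else:
                action = table.get((top, toks[i]))
                if action == "accept":
                    return stack == ["$", "E"]
                if action != "shift":
                    return False
                stack.append(toks[i])
                i += 1

    # parse(["(", "a", ")"]) -> True;  parse(["(", "a"]) -> False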
Either a reduction is made in the stack, or the incoming symbol is pushed onto the stack. Considering its efficiency, the transition matrix technique does not seem to have received much attention, probably because it was not sufficiently well-defined. The purpose of this paper is to define the concept more formally, to illustrate that the technique is very efficient, and to describe an algorithm which generates a transition matrix from a suitable grammar. The report also describes other uses of transition matrices besides the usual ones of syntax checking and compiling. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/57/CS-TR-67-57.pdf %R CS-TR-67-59 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Almost diagonal matrices with multiple or close eigenvalues %A Wilkinson, James H. %D April 1967 %X If A = D + E where D is the matrix of diagonal elements of A, then when A has some multiple or very close eigenvalues, E has certain characteristic properties. These properties are considered both for hermitian and non-hermitian A. The properties are important in connexion with several algorithms for diagonalizing matrices by similarity transformations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/59/CS-TR-67-59.pdf %R CS-TR-67-60 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Two algorithms based on successive linear interpolation %A Wilkinson, James H. %D April 1967 %X The method of successive linear interpolation has a very satisfactory asymptotic rate of convergence but the behavior in the early steps may lead to divergence. The regula falsi has the advantage of being safe but its asymptotic behavior is unsatisfactory. Two modified algorithms are described here which overcome these weaknesses. Although neither is new, discussions of their main features do not appear to be readily available in the literature. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/60/CS-TR-67-60.pdf %R CS-TR-67-61 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the asymptotic directions of the s-dimensional optimum gradient method %A Forsythe, George E. %D April 1967 %X The optimum s-gradient method for minimizing a positive definite quadratic function f(x) on $E_n$ has long been known to converge for s $\geq$ 1. For these $\underline{s}$ the author studies the directions from which the iterates $x_k$ approach their limit, and extends to s > 1 a theory proved by Akaike for s = 1. It is shown that f($x_k$) can never converge to its minimum value faster than linearly, except in degenerate cases where it attains the minimum in one step. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/61/CS-TR-67-61.pdf %R CS-TR-67-62 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Varying length floating point arithmetic: a necessary tool for the numerical analyst %A Tienari, Martti %D April 1967 %X The traditional floating point arithmetic of scientific computers is biased towards fast and easy production of numerical results without enough provision to enable the programmer to control and solve problems connected with numerical accuracy and cumulative round-off errors. The author suggests varying length floating point arithmetic as a general purpose solution for most of these problems. Some general philosophies are outlined for applications of this feature in numerical analysis.
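The general idea behind such varying-length arithmetic, a working precision the programmer can raise at will to expose and control round-off, can be imitated today with Python's decimal module. The sketch below is only an analogy, not Tienari's proposal, and the cancellation-prone series example is invented.

    from decimal import Decimal, getcontext

    def exp_series(x, terms=200):
        # plain Taylor series for e**x, summed in the current working precision
        s = term = Decimal(1)
        for n in range(1, terms):
            term *= x / n
            s += term
        return s

    # Summing the series at x = -20 is badly cancellation-prone; re-running it
    # at ever higher precision shows how much of a short-precision result was
    # pure round-off (the true value is about 2.061E-9).
    for prec in (16, 32, 64):
        getcontext().prec = prec
        print(prec, +exp_series(Decimal(-20)))   # unary + rounds to `prec`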
The idea is analyzed further, discussing hardware and software implementations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/62/CS-TR-67-62.pdf %R CS-TR-67-63 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Graeffe's method for eigenvalues %A Polya, George %D April 1967 %X Let an entire function F(z) of finite genus have infinitely many zeros which are all positive, and take real values for real z. Then it is shown how to give two-sided bounds for all the zeros of F in terms of the coefficients of the power series of F, and of coefficients obtained by Graeffe's algorithm applied to F. A simple numerical illustration is given for a Bessel function. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/63/CS-TR-67-63.pdf %R CS-TR-67-64 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Floating-point number representations: base choice versus exponent range %A Richman, Paul L. %D April 1967 %X A digital computer whose memory words are composed of r-state devices is considered. The choice of the base, $\beta$, for the internal floating-point numbers on such a computer is discussed. Larger values of $\beta$ necessitate the use of more r-state devices for the mantissa, in order to preserve some "minimum accuracy," leaving fewer r-state devices for the exponent of $\beta$. As $\beta$ increases, the exponent range may increase for a short period, but it must ultimately decrease to zero. Of course, this behavior depends on what definition of accuracy is used. This behavior is analyzed for a recently proposed definition of accuracy which specifies when it is to be said that the set of q-digit base $\beta$ floating-point numbers is accurate to p-digits base t. The only case of practical importance today is t=10 and r=2; and in this case we find that $\beta$ = 2 is always best. However, the analysis is done to cover all cases. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/64/CS-TR-67-64.pdf %R CS-TR-67-65 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T On certain basic concepts of programming languages %A Wirth, Niklaus %D May 1967 %X Recent developments of programming languages have led to the emergence of languages whose growth showed cancerous symptoms: the proliferation of new elements defied every control exercised by the designers, and the nature of the new cells often proved to be incompatible with the existing body. In order that a language be free from such symptoms, it is necessary that it be built upon basic concepts which are sound and mutually independent. The rules governing the language must be simple, generally applicable and consistent. In order that simplicity and consistency can be achieved, the fundamental concepts of a language must be well-chosen and defined with utmost clarity. In practice, it turns out that there exists an optimum in the number of basic concepts, below which not only implementability of these concepts on actual computers, but also their appeal to human intuition becomes questionable because of their high degree of generalization. These informal notes do not abound with ready-made solutions, but it is hoped they shed some light on several related subjects and inherent difficulties. They are intended to summarize and interrelate various ideas which are partly present in existing languages, partly debated within the IFIP Working Group 2.1, and partly new.
While emphasis is put on clarification of conceptual issues, consideration of notation cannot be ignored. However, no formal or concise definitions of notation (syntax) will be given or used; the concepts will instead be illustrated by examples, using notation based on Algol as far as possible. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/65/CS-TR-67-65.pdf %R CS-TR-67-67 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Computational considerations regarding the calculation of Chebyshev solutions for overdetermined linear equation systems by the exchange method %A Bartels, Richard H. %A Golub, Gene H. %D June 1967 %X An implementation, using Gaussian LU decomposition with row interchanges, of Stiefel's exchange algorithm for determining a Chebyshev solution to an overdetermined system of linear equations is presented. The implementation is computationally more stable than those usually given in the literature. A generalization of Stiefel's algorithm is developed which permits the occasional exchange of two equations simultaneously. Finally, some experimental comparisons are offered. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/67/CS-TR-67-67.pdf %R CS-TR-67-69 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Translator writing systems %A Feldman, Jerome A. %A Gries, David %D June 1967 %X Compiler writing has long been a glamour field within programming and has a well developed folklore. More recently, the attention of researchers has been directed toward various schemes for automating different parts of the compiler writer's task. This paper contains neither a history of nor an introduction to these developments; the references at the end of this section provide what introductory material there is in the literature. Although we will make comparisons between individual systems and between various techniques, this is certainly not a consumer's guide to translator writing systems. Our intended purpose is to carefully consider the existing work in an attempt to form a unified scientific basis for future research. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/69/CS-TR-67-69.pdf %R CS-TR-67-70 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T On computation of flow patterns of compressible fluids in the transonic region %A Bergman, Stefan %A Herriot, John G. %A Richman, Paul L. %D July 1967 %X The first task in devising a numerical procedure for solving a given problem is that of finding a constructive mathematical solution to the problem. But even after such a solution is found there is much to be done. Mathematical solutions normally involve infinite processes such as integration and differentiation as well as infinitely precise arithmetic and functions defined in arbitrarily involved ways. Numerical procedures suitable for a computer can involve only finite processes, fixed or at least bounded length arithmetic and rational functions. Thus one must find efficient methods which yield approximate solutions. Of interest here are the initial and boundary value problems for compressible fluid flow. Constructive solutions to these problems can be found in [Bergman, S., "On representation of stream functions of subsonic and supersonic flows of compressible fluids," Journal of Rational Mechanics and Analysis, v.4 (1955), no. 6, pp. 883-905].
As presented there, solution of the boundary value problem is limited to the subsonic region, and is given symbolically as a linear combination of orthogonal functions. A numerical continuation of this (subsonic) solution into the supersonic region can be done by using the (subsonic) solution and its derivative to set up an initial value problem. The solution to the initial value problem may then be valid in (some part of) the supersonic region. Whether this continuation will lead to a closed, meaningful flow is an open question. In this paper, we deal with the numerical solution of the initial value problem. We are currently working on the rest of the procedure described above. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/70/CS-TR-67-70.pdf %R CS-TR-67-75 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Theory of norms %A Bauer, Friedrich L. %D August 1967 %X These notes are based on lectures given during the winter of 1967 as CS 233, Computer Science Department, Stanford University. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/75/CS-TR-67-75.pdf %R CS-TR-67-68 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T The PL360 system %A Wirth, Niklaus %D June 1967 %X This report describes the use and the organization of the operating system which serves as the environment of the PL360 language defined in the companion report, CS 53 [Niklaus Wirth, "A Programming Language for the 360 Computers," Stanford University Department of Computer Science, June 1967]. Edited by Niklaus Wirth. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/68/CS-TR-67-68.pdf %R CS-TR-67-72 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Chebyshev approximation of continuous functions by a Chebyshev system of functions %A Golub, Gene H. %A Smith, Lyle B. %D July 1967 %X The second algorithm of Remez can be used to compute the minimax approximation to a function, f(x), by a linear combination of functions, ${\{Q_i(x)\}}_{0}^{N}$, which form a Chebyshev system. The only restriction on the function to be approximated is that it be continuous on a finite interval [a,b]. An Algol 60 procedure is given which will accomplish the approximation. This implementation of the second algorithm of Remez is quite general in that the continuity of f(x) is all that is required whereas previous implementations have required differentiability, that the end points of the interval be "critical points," and that the number of "critical points" be exactly N+2. Discussion of the method used and its numerical properties is given as well as some computational examples of the use of the algorithm. The use of orthogonal polynomials (which change at each iteration) as the Chebyshev system is also discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/72/CS-TR-67-72.pdf %R CS-TR-67-76 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Collectively compact operator approximations. %A Anselone, Phillip M. %D September 1967 %X This report consists of notes based on lectures presented July-August 1967. The notes were prepared by Lyle Smith. A general approximation theory for linear and nonlinear operators on Banach spaces is presented. It is applied to numerical integration approximations of integral operators. Convergence of the operator approximations is pointwise rather than uniform on bounded sets, which is assumed in other theories.
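For a concrete instance of a numerical-integration approximation to an integral operator, the standard Nystrom construction replaces (Ku)(x) = integral of k(x,t)u(t)dt by a quadrature sum and solves the resulting finite linear system. The Python sketch below uses an invented kernel and right-hand side and is not taken from these notes.

    import numpy as np

    # Nystrom-type discretization of u(x) - (Ku)(x) = f(x) on [0,1]:
    # the integral operator is replaced by a trapezoidal quadrature sum.
    n = 64
    t = np.linspace(0.0, 1.0, n)
    w = np.full(n, 1.0 / (n - 1))
    w[0] = w[-1] = 0.5 / (n - 1)                   # trapezoidal weights

    k = lambda x, s: 0.5 * np.exp(-np.abs(x - s))  # invented smooth kernel
    f = lambda x: np.sin(np.pi * x)                # invented right-hand side

    A = np.eye(n) - k(t[:, None], t[None, :]) * w[None, :]
    u = np.linalg.solve(A, f(t))                   # solution values at the nodes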
The operator perturbations form a collectively compact set, i.e., they map each bounded set into a single compact set. In the nonlinear case, Frechet differentiability conditions are also imposed. Principal results include convergence and error bounds for approximate solutions and, for linear operators, results on spectral approximations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/76/CS-TR-67-76.pdf %R CS-TR-67-77 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T What to do till the computer scientist comes %A Forsythe, George E. %D September 1967 %X The potential impact of computer science departments in the field of education is discussed. This is an expanded version of a presentation to a panel session before the Mathematical Association of America, Toronto, 30 August 1967. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/77/CS-TR-67-77.pdf %R CS-TR-67-78 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Machine utilization of the natural language word 'good' %A Colby, Kenneth Mark %A Enea, Horace J. %D September 1967 %X Using the term 'good' as an example, the effect of natural language input on an interviewing computer program is described. The program utilizes syntactic and semantic information to generate relevant plausible inferences from which statements for a goal-directed man-machine dialogue can be constructed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/78/CS-TR-67-78.pdf %R CS-TR-67-79 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T 360 O.S. FORTRAN IV free field input/output subroutine package %A Doran, Robert W. %D October 1967 %X Programmers dealing with aspects of natural language processing have a difficult task in choosing a computer language which enables them to program easily, produce efficient code and accept as data freely written sentences with words of arbitrary length. List processing languages such as LISP are reasonably easy to program in but do not execute very quickly. Other, formula oriented, languages like FORTRAN are not provided with free field input. The Computational Linguistics group at the Stanford University Computer Science Department is writing a system for testing transformational grammars. As these grammars are generally large and complicated, it is important to make the system as efficient as possible, so we are using FORTRAN IV (O.S. on IBM 360-65) as our language. To enable us to handle free field input we have developed a subroutine package which we describe here in the hope that it will be useful to others embarking on natural language tasks. The package consists of two main programs, a free field reader and a free field writer, together with a number of utility routines and constant COMMON blocks. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/79/CS-TR-67-79.pdf %R CS-TR-67-80 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Directed random generation of sentences %A Friedman, Joyce %D October 1967 %X The problem of producing sentences of a transformational grammar by using a random generator to create phrase structure trees for input to the lexical insertion and transformational phases is discussed. A purely random generator will produce base trees which will be blocked by the transformations, and which are frequently too long to be of practical interest.
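The difficulty is easy to reproduce: unconstrained random expansion of recursive rules has no bound on its output, so some restriction on the derivation is needed. The following is a hypothetical Python sketch; the toy grammar and the crude depth cutoff are mine for illustration, not the report's mechanism of restricted subtrees.

    import random

    # Invented toy phrase-structure grammar; NP and VP are recursive, so a
    # purely random derivation can grow without bound.
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["det", "noun"], ["NP", "PP"]],
        "VP": [["verb", "NP"], ["VP", "PP"]],
        "PP": [["prep", "NP"]],
    }

    def generate(symbol, depth=0, max_depth=6):
        if symbol not in GRAMMAR:
            return [symbol]                   # terminal symbol
        rules = GRAMMAR[symbol]
        if depth >= max_depth:
            rules = rules[:1]                 # past the cap, force the
        out = []                              # non-recursive first alternative
        for s in random.choice(rules):
            out.extend(generate(s, depth + 1, max_depth))
        return out

    print(" ".join(generate("S")))

Friedman's restricted subtrees play the role the crude max_depth plays here, but in addition steer the derivation toward trees the transformational phase will accept.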
A solution is offered in the form of a computer program which allows the user to constrain and direct the generation by the simple but powerful device of restricted subtrees. The program is a directed random generator which accepts as input a subtree with restrictions and produces around it a tree which satisfies the restrictions and is ready for the next phase of the grammar. The underlying linguistic model is that of Noam Chomsky, as presented in "Aspects of the Theory of Syntax." The program is written in Fortran IV for the IBM 360/67 and is part of the Stanford Transformational Grammar Testing System. It is currently being used with several partial grammars of English. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/80/CS-TR-67-80.pdf %R CS-TR-67-81 %Z Wed, 03 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Calculation of Gauss quadrature rules %A Golub, Gene H. %A Welsch, John H. %D November 1967 %X Most numerical integration techniques consist of approximating the integrand by a polynomial in a region or regions and then integrating the polynomial exactly. Often a complicated integrand can be factored into a non-negative 'weight' function and another function better approximated by a polynomial, thus $\int_{a}^{b} g(t)dt = \int_{a}^{b} \omega (t)f(t)dt \approx \sum_{i=1}^{N} w_i f(t_i)$. Hopefully, the quadrature rule ${\{w_j, t_j\}}_{j=1}^{N}$ corresponding to the weight function $\omega$(t) is available in tabulated form, but more likely it is not. We present here two algorithms for generating the Gaussian quadrature rule defined by the weight function when: a) the three term recurrence relation is known for the orthogonal polynomials generated by $\omega$(t), and b) the moments of the weight function are known or can be calculated. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/67/81/CS-TR-67-81.pdf %R CS-TR-66-34 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Eigenvectors of a real matrix by inverse iteration %A Varah, James M. %D February 1966 %X This report contains the description and listing of an ALGOL 60 program which calculates the eigenvectors of an arbitrary real matrix, using the technique of inverse iteration. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/34/CS-TR-66-34.pdf %R CS-TR-66-37 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T COGENT 1.2 operations manual %A Reynolds, John C. %D April 1966 %X This document is an addendum to the COGENT Programming Manual (Argonne National Laboratory, ANL-7022, March 1965, hereafter referred to as CPM) which describes a specific implementation of the COGENT system, COGENT 1.2, written for the Control Data 3600 Computer. Chapters I and II describe a variety of features available in COGENT 1.2 which are not mentioned in CPM; these chapters parallel the material in Chapters II and III of CPM. Chapter III of this report gives various operational details concerning the assembly and loading of both COGENT-compiled programs and the compiler itself. Chapter IV describes system and error messages. Familiarity with the contents of CPM is assumed throughout this report. In addition, a knowledge of the 3600 operating system SCOPE, and the assembler COMPASS is assumed in Chapter III. 
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/37/CS-TR-66-37.pdf %R CS-TR-66-39 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T A university's educational program in computer science %A Forsythe, George E. %D May 1966 %X After a review of the power of contemporary computers, computer science is defined in several ways. The objectives of computer science education are stated, and it is asserted that in a U.S. university these will be achieved only through a computer science department. The program at Stanford University is reviewed as an example. The appendix includes syllabi of Ph.D. qualifying examinations for Stanford's Computer Science Department. This is a revision of a previous Stanford Computer Science Department report, CS 26. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/39/CS-TR-66-39.pdf %R CS-TR-66-40 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T How do you solve a quadratic equation? %A Forsythe, George E. %D June 1966 %X The nature of the floating-point number system of digital computers is explained to a reader whose university mathematical background is very limited. The possibly large errors in using mathematical algorithms blindly with floating-point computation are illustrated by the formula for solving a quadratic equation. An accurate way of solving a quadratic is outlined. A few general remarks are made about computational mathematics, including the backwards analysis of rounding error. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/40/CS-TR-66-40.pdf %R CS-TR-66-41 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Accurate eigenvalues of a symmetric tri-diagonal matrix %A Kahan, William %D July 1966 %X Having established tight bounds for the quotient of two different lub-norms of the same tri-diagonal matrix J, the author observes that these bounds could be of use in an error-analysis provided a suitable algorithm were found. Such an algorithm is exhibited, and its errors are thoroughly accounted for, including the effects of scaling, over/underflow and roundoff. A typical result is that, on a computer using rounded floating point binary arithmetic, the biggest eigenvalue of J can be computed easily to within 2.5 units in its last place, and the smaller eigenvalues will suffer absolute errors which are no larger. These results are somewhat stronger than had been known before. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/41/CS-TR-66-41.pdf %R CS-TR-66-42 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T When to neglect off-diagonal elements of symmetric tri-diagonal matrices %A Kahan, William %D July 1966 %X Given a tolerance $\epsilon$ > 0, we seek a criterion by which an off-diagonal element of the symmetric tri-diagonal matrix J may be deleted without changing any eigenvalue of J by more than $\epsilon$. The criterion obtained here permits the deletion of elements of order $\sqrt{\epsilon }$ under favorable circumstances, without requiring any prior knowledge about the separation between the eigenvalues of J. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/42/CS-TR-66-42.pdf %R CS-TR-66-43 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Two working algorithms for the eigenvalues of a symmetric tridiagonal matrix %A Kahan, William %A Varah, James M. 
%D August 1966 %X Two tested programs are supplied to find the eigenvalues of a symmetric tridiagonal matrix. One program uses a square-root-free version of the QR algorithm. The other uses a compact kind of Sturm sequence algorithm. These programs are faster and more accurate than the other comparable programs published previously with which they have been compared. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/43/CS-TR-66-43.pdf %R CS-TR-66-44 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Relaxation methods for an eigenproblem %A Kahan, William %D August 1966 %X A theory is developed to account for the convergence properties of certain relaxation iterations which have been widely used to solve the eigenproblem $(A - \lambda B) \underline{x} = 0$, $\underline{x} \neq 0$, with large symmetric matrices A and B and positive definite B. These iterations always converge, and almost always converge to the right answer. Asymptotically, the theory is essentially that of the relaxation iteration applied to a semi-definite linear system discussed in the author's previous report [Stanford University Computer Science Department report CS45, 1966]. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/44/CS-TR-66-44.pdf %R CS-TR-66-45 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Relaxation methods for semi-definite systems %A Kahan, William %D August 1966 %X Certain non-stationary relaxation iterations, which are commonly applied to positive definite symmetric systems of linear equations, are also applicable to a semi-definite system provided that system is consistent. Some of the convergence theory of the former application is herein extended to the latter application. The effects of rounding errors and of inconsistency are discussed too, but with few helpful conclusions. Finally, the application of these relaxation iterations to an indefinite system is shown here to be ill-advised because these iterations will almost certainly diverge exponentially. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/45/CS-TR-66-45.pdf %R CS-TR-66-47 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T An interpreter for "Iverson notation" %A Abrams, Philip S. %D August 1966 %X Kenneth E. Iverson's book, "A Programming Language" [New York: Wiley, 1962], presented a highly elegant language for the description and analysis of algorithms. Although not widely acclaimed at first, "Iverson notation" (referred to as "the language" in this report) is coming to be recognized as an important tool by computer scientists and programmers. The current report contains an up-to-date definition of a subset of the language, based on recent work by Iverson and his colleagues. Chapter III describes an interpreter for the language, written jointly by the author and Lawrence M. Breed of IBM. The remainder of the paper consists of critiques of the implementation and the language, with suggestions for improvement. This report was originally submitted in fulfillment of a Computer Science 239 project supervised by Professor Niklaus Wirth, Stanford University, May 30, 1966. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/47/CS-TR-66-47.pdf %R CS-TR-66-52 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Lecture notes on a course in systems programming %A Shaw, Alan C.
%D December 1966 %X These notes are based on the lectures of Professor Niklaus Wirth which were given during the winter and spring of 1965/66 as CS 236a and part of CS 236b, Computer Science Department, Stanford University. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/52/CS-TR-66-52.pdf %R CS-TR-66-53 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T A programming language for the 360 computers %A Wirth, Niklaus %D December 1966 %X A programming language for the IBM 360 computers and its implementation are described. The language, called PL360, provides the facilities of a symbolic machine language, but displays a structure defined by a recursive syntax. The compiler, consisting of a precedence syntax analyser and a set of interpretation rules with strict one-to-one correspondence to the set of syntactic rules, directly reflects the definition of the language: the k-th syntax rule $S_0 ::= S_1 S_2 \ldots S_n$ is paired with the k-th interpretation rule $V_0 := f_k (V_1, V_2, \ldots, V_n)$. PL360 was designed to improve the readability of programs which must take into account specific characteristics and limitations of a particular computer. It represents an attempt to further the state of the art of programming by encouraging and even forcing the programmer to improve his style of exposition and his principles and discipline in program organization, and not by merely providing a multitude of "new" features and facilities. The language is therefore particularly well suited for tutorial purposes. The attempt to present a computer as a systematically organized entity is also hoped to be of interest to designers of future computers. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/66/53/CS-TR-66-53.pdf %R CS-TR-65-16 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Maximizing a second-degree polynomial on the unit sphere %A Forsythe, George E. %A Golub, Gene H. %D February 1965 %X Let A be a hermitian matrix of order n, and b a known vector in $C^n$. The problem is to determine which vectors make $\Phi (x) = {(x-b)}^H A(x-b)$ a maximum or minimum on the unit sphere U = {x : $x^H$x = 1}. The problem is reduced to the determination of a finite point set, the spectrum of (A,b). The theory reduces to the usual theory of hermitian forms when b = 0. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/65/16/CS-TR-65-16.pdf %R CS-TR-65-17 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Automatic grading programs %A Forsythe, George E. %A Wirth, Niklaus %D February 1965 %X Two ALGOL grader programs are presented for the computer evaluation of student ALGOL programs. One is for a beginner's program; it furnishes random data and checks answers. The other provides a searching test of the reliability and efficiency of a rootfinding procedure. There is a statement of the essential properties of a computer system, in order that grader programs can be effectively used. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/65/17/CS-TR-65-17.pdf %R CS-TR-65-18 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T The difference correction method for non-linear two-point boundary value problems %A Pereyra, Victor %D February 1965 %X The numerical solution of non-linear two-point boundary value problems is discussed.
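To fix ideas, the kind of basic finite-difference approximation whose order a difference correction would then raise can be sketched as follows: standard second-order central differences on a uniform mesh, applied to an invented linear test problem with exact solution sin(pi x).

    import numpy as np

    # Central-difference approximation to u''(x) = g(x), u(0) = u(1) = 0;
    # solving the tridiagonal system gives an O(h^2) approximation, the
    # starting point that a difference correction improves.
    n = 50
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)

    A = (np.diag(-2.0 * np.ones(n - 1))
         + np.diag(np.ones(n - 2), 1)
         + np.diag(np.ones(n - 2), -1)) / h**2
    g = -np.pi**2 * np.sin(np.pi * x[1:-1])    # invented test problem

    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(A, g)
    print(abs(u - np.sin(np.pi * x)).max())    # error is O(h^2)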
It is shown that for a certain class of finite difference approximations the a posteriori use of a difference correction raises the order of the approximation by at least two. The difference correction itself involves only the solution of one system of linear equations. If Newton's method is used in the early stage, then it is shown that the matrices in both processes are identical, which is a useful feature in coding the method for an automatic computer. Several numerical examples are given. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/65/18/CS-TR-65-18.pdf %R CS-TR-65-23 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Convex polynomial approximation %A Rudin, Bernard D. %D June 1965 %X Let f(t) be a continuous function on [0,1], or let it be discretely defined on a finite point set in [0,1]. The problem is the following: among all polynomials p(t) of degree n or less which are convex on [0,1], find one which minimizes the functional $\|p(t)-f(t)\|$, where $\|\cdot\|$ is a suitably defined norm (in particular, the $L^p$, ${\ell}^p$, and Chebyshev norms). The problem is treated by showing it to be a particular case of a more general problem: let f be an element of a real normed linear space V; let $x_{1}(z),...,x_{k}(z)$ be continuous functions from a subset S of the Euclidean space $E^n$ into V such that for each $z_o$ in S the set {$x_{1}(z_{o}),...,x_{k}(z_{o})$} is linearly independent in V; let $(y_{1},...,y_{k})$ denote an element of the Euclidean space $E^k$ and let H be a subset of $E^k$; then among all (y,z) in H $\times$ S, find one which minimizes the functional $\|y_{1}x_{1}(z)+ ... +y_{k}x_{k}(z) - f\|$. It is shown that solutions to this problem exist when H is closed and S is compact. Conditions for uniqueness and location of solutions on the boundary of H $\times$ S are also given. Each polynomial of degree n + 2 or less which is convex on [0,1] is shown to be uniquely representable in the form $y_{0}+y_{1}t+y_{2}\int\int p(z,t)dt^2$, where p(z,t) is a certain representation of the polynomials positive on [0,1], $y_{2} \geq 0$, and z is constrained to lie in a certain convex hyperpolyhedron. With this representation, the convex polynomial approximation problem can be treated by the theory mentioned above. It is reduced to a problem of minimizing a functional subject to linear constraints. Computation of best least squares convex polynomial approximation is illustrated in the continuous and discrete cases. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/65/23/CS-TR-65-23.pdf %R CS-TR-65-25 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Yield-point load determination by nonlinear programming %A Hodge, Philip G. Jr. %D June 1965 %X The determination of the yield-point load of a perfectly plastic structure can be formulated as a nonlinear programming problem by means of the theorems of limit analysis. This formulation is discussed in general terms and then applied to the problem of a curved beam. Recent results in the theory of nonlinear programming are called upon to solve typical problems for straight and curved beams. The theory of limit analysis enables intermediate answers to be given a physical interpretation in terms of upper and lower bounds on the yield-point load. The paper closes with some indication of how the method may be generalized to more complex problems of plastic yield-point load determination.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/65/25/CS-TR-65-25.pdf %R CS-TR-65-26 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Stanford University's Program in Computer Science %A Forsythe, George E. %D June 1965 %X This report discusses the nature and objectives of Stanford University's Program in Computer Science. Listings of course offerings and syllabi for Ph.D. examinations are given in appendices. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/65/26/CS-TR-65-26.pdf %R CS-TR-65-28 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Matrix theorems for partial differential and difference equations %A Miller, John J. H. %A Strang, Gilbert %D July 1965 %X We extend the work of Kreiss and Morton to prove: for some constant K(m), where m is the order of the matrix A, $|A^{n}v| \leq C(v)$ for all $n \geq 0$ and $|v| = 1$ implies that $|{SAS}^{-1}| \leq 1$ for some S with $|S^{-1}| \leq 1$ and $|Sv| \leq K(m)C(v)$. We establish the analogue for exponentials $e^{Pt}$, and use it to construct the minimal Hilbert norm dominating $L_2$ in which a given partial differential equation with constant coefficients is well-posed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/65/28/CS-TR-65-28.pdf %R CS-TR-65-29 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T On improving an approximate solution of a functional equation by deferred corrections %A Pereyra, Victor %D August 1965 %X The improvement of discretization algorithms for the approximate solution of nonlinear functional equations is considered. Extensions to the method of difference corrections by Fox are discussed and some general results are proved. Applications to nonlinear boundary problems and numerical examples are given in some detail. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/65/29/CS-TR-65-29.pdf %R CS-TR-65-31 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the approximation of weak solutions of linear parabolic equations by a class of multistep difference methods %A Raviart, Pierre Arnaud %D December 1965 %X We consider evolution equations of the form (1) du(t)/dt + A(t)u(t) = f(t), $0 \leq t \leq T$, f given, with the initial condition (2) u(0) = $u_0$, $u_0$ given, where each A(t) is an unbounded linear operator in a Hilbert space H, which is in practice an elliptic partial differential operator subject to appropriate boundary conditions. Let $V_h$ be a Hilbert space which depends on the parameter h. Let k be the time-step such that m = $\frac{T}{k}$ is an integer. We approximate the solution u of (1), (2) by the solution $u_{h,k}$ ($u_{h,k}$ = {$u_{h,k}(rk) \in V_{h}$, r = 0,1,...,m-1}) of the multistep difference scheme (3) $\frac{u_{h,k}(rk) - u_{h,k}((r-1)k)}{k} + \sum_{{\ell}=0}^{p} {\gamma}_{\ell} A_{h}((r-{\ell})k) u_{h,k}((r-{\ell})k) = \sum_{{\ell}=0}^{p} {\gamma}_{\ell} f_{h,k}((r-{\ell})k), r = p,...,m-1$ (4) $u_{h,k}(0),...,u_{h,k}((p-1)k)$ given, where each $A_{h}(rk)$ is a linear continuous operator from $V_h$ into $V_h$, $f_{h,k}(rk)$ (r = 0,1,...,m-1) are given, and ${\gamma}_{\ell}$ (${\ell}=0,...,p$) are given complex numbers. Our paper is mainly concerned with the study of the stability of the approximation. The methods used here are very closely related to those developed in the author's thesis and we shall refer to the thesis frequently. In Sections 1 and 2, we define the continuous and approximate problems in precise terms.
In Section 4, we find sufficient conditions for $u_{h,k}$ to satisfy some a priori estimates. The definition of the stability is given in Section 5 and we use the a priori estimates for proving a general stability theorem. In Section 6 we prove that the stability conditions may be weakened when A(t) is a self-adjoint operator (or when only the principal part of A(t) is self-adjoint). We give in Section 7 a weak convergence theorem. Section 8 is concerned with regularity properties. We apply our abstract analysis to a class of parabolic partial differential equations with variable coefficients in Section 9. Strong convergence theorems can be obtained as in the author's thesis (via compactness arguments) or as in the thesis of J.P. Aubin. We do not study here the discretization error (see author's thesis). For the study of the stability of multistep difference methods in the case of the Cauchy problem for parabolic differential operators, we refer to Kreiss [1959] and Widlund [1965]. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/65/31/CS-TR-65-31.pdf %R CS-TR-65-32 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Minimum multiplication Fourier analysis %A Hockney, Roger W. %D December 1965 %X Fourier analysis and synthesis is a frequently used tool in applied mathematics but is found to be a time-consuming process to apply on a digital computer, and this fact may prevent the practical application of the technique. This paper describes an algorithm which uses the symmetries of the sine and cosine functions to reduce the number of arithmetic operations by a factor between 10 and 30. The algorithm is applicable to a finite Fourier (or harmonic) analysis on $12 \times 2^q$ values, where q is any integer $\geq$ 0, and to a variety of end conditions. A complete and tested B5000 Algol program known as FOURIER12 is included. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/65/32/CS-TR-65-32.pdf %R CS-TR-65-33 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T A programming language for the 360 computers %A Wirth, Niklaus %D December 1965 %X This paper is a preliminary definition of a programming language which is specifically designed for use on IBM 360 computers, and is therefore appropriately called PL360. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/65/33/CS-TR-65-33.pdf %R CS-TR-65-20 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T EULER: a generalization of ALGOL, and its formal definition %A Wirth, Niklaus %A Weber, Helmut %D April 1965 %X A method for defining programming languages is developed which introduces a rigorous relationship between structure and meaning. The structure of a language is defined by a phrase structure syntax, the meaning in terms of the effects which the execution of a sequence of interpretation rules exerts upon a fixed set of variables, called the Environment. There exists a one-to-one correspondence between syntactic rules and interpretation rules, and the sequence of executed interpretation rules is determined by the sequence of corresponding syntactic reductions which constitute a parse. The individual interpretation rules are explained in terms of an elementary and obvious algorithmic notation. A constructive method for evaluating a text is provided, and for certain decidable classes of languages their unambiguity is proven.
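The one-to-one pairing of syntactic rules with interpretation rules, the same organization described for PL360 above, has a compact modern rendering: each reduction by rule k applies a semantic function f_k to the values of the right-hand-side symbols. The following is a hypothetical Python sketch with an invented three-rule grammar, not the EULER definition itself.

    # Each production is paired with one interpretation rule f_k that maps
    # the values V_1..V_n of the right-hand side to the value V_0 of the left.
    RULES = {
        "num": lambda v: int(v[0]),        # factor ::= number
        "mul": lambda v: v[0] * v[2],      # term   ::= term '*' factor
        "add": lambda v: v[0] + v[2],      # expr   ::= expr '+' term
    }

    def evaluate(node):
        # node is either a token string or (rule_name, children)
        if isinstance(node, str):
            return node
        rule, children = node
        return RULES[rule]([evaluate(c) for c in children])

    # parse tree for "2 + 3 * 4", in the order the reductions would build it
    tree = ("add", [("num", ["2"]), "+",
                    ("mul", [("num", ["3"]), "*", ("num", ["4"])])])
    print(evaluate(tree))                  # 14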
As an example, a generalization of ALGOL is described in full detail to demonstrate that concepts like block-structure, procedures, parameters, etc. can be defined adequately and precisely by this method. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/65/20/CS-TR-65-20.pdf %R CS-TR-65-21 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Vectorcardiographic analysis by digital computer, selected results %A Fisher, Donald D. %A Groeben, Jobst von der %A Toole, J. Gerald %D May 1965 %X Instrumentation, recording devices and digital computers may now be combined to obtain detailed statistical measures of physiological phenomena. Computers make it possible to study several models of a system in depth as well as breadth. This report is concerned with methods employed in a detailed statistical study of some 600 vectorcardiograms from different "normal" individuals which were recorded on analog magnetic tape using two different orthogonal lead systems (Helm, Frank) giving a total of 1200 cardiograms. A "normal" individual is defined as one in whom no abnormal heart condition was detected by either medical history or physical examination. One heartbeat in a train of 15 or 20 was selected for digitization. An average of 1.2 seconds' worth of data was digitized from each of the three vector leads simultaneously at a rate of 1000 samples per second for each lead, giving a total of over $4 \times 10^6$ values. Statistical models by sex and lead system of the P wave and QRS complex (at 1 millisecond intervals) and T wave (normalized to 60 points in time) were obtained for 43 age groups from age 19 to 61 in rectangular coordinates, polar coordinates and ellipsoidal fit (F-test) coordinates. Several programs were written to perform the analyses on an IBM 7090. Two of the programs used 300000+ words of disk storage to collect the necessary statistics. Various aspects of the study are presented in this report. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/65/21/CS-TR-65-21.pdf %R CS-TR-64-6 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T A fast direct solution of Poisson's equation using Fourier analysis %A Hockney, Roger W. %D April 1964 %X The demand for rapid procedures to solve Poisson's equation has led to the development of a direct method of solution involving Fourier analysis which can solve Poisson's equation in a square region covered by a 48 x 48 mesh in 0.9 seconds on the IBM 7090. This compares favorably with the best iterative methods which would require about 10 seconds to solve the same problem. The method is applicable to rectangular regions with simple boundary conditions and the maximum observed error in the potential for several random charge distributions is $5 \times 10^{-7}$ of the maximum potential change in the region. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/64/6/CS-TR-64-6.pdf %R CS-TR-64-9 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T The QD-algorithm as a method for finding the roots of a polynomial equation when all roots are positive %A Andersen, Christian %D June 1964 %X The Quotient-Difference (QD)-scheme, symmetric functions and some results from the theory of Hankel determinants are treated. Some well known relations expressing the elements of the QD-scheme by means of the Hankel determinants are presented. The question of convergence of the columns of the QD-scheme is treated.
An exact expression for $q_{n}^{k}$ is developed for the case of different roots. It is proved that the columns of the QD-scheme will converge not only in the well known case of different roots, but in all cases where the roots are positive. A detailed examination of the convergence to the smallest root is presented. An exact expression for $q_{n}^{N}$ is developed. This expression is correct in all cases of multiple positive roots. It is shown that the progressive form of the QD-algorithm is only 'mildly unstable'. Finally, some ALGOL programs and some results obtained by means of these are given. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/64/9/CS-TR-64-9.pdf %R CS-TR-64-11 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Elastic-plastic analysis of trusses by the gradient projection method %A Nakamura, Tsuneyoshi %A Rosen, Judah Ben %D July 1964 %X The gradient projection method has been applied to the problem of obtaining the elastic-plastic response of a perfectly plastic ideal truss with several degrees of redundancy to several independently varying sets of quasi-static loads. It is proved that the minimization of stress rate intensity subject to a set of yield inequalities is equivalent to the maximization process of the gradient projection method. This equivalence proof establishes the basis of the computational method. The technique is applied to the problem of investigating the possibilities of shakedown and to limit analysis. A closed convex "safe load domain" is defined to represent the load carrying capacity characteristics of a truss subjected to various combinations of the several sets of loads. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/64/11/CS-TR-64-11.pdf %R CS-TR-64-12 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Numerical methods for solving linear least squares problems (by G. Golub); An Algol procedure for finding linear least squares solutions (by Peter Businger) %A Golub, Gene H. %A Businger, Peter A. %D August 1964 %X A common problem in a Computer Laboratory is that of finding linear least squares solutions. These problems arise in a variety of areas and in a variety of contexts. Linear least squares problems are particularly difficult to solve because they frequently involve large quantities of data, and they are ill-conditioned by their very nature. In this paper, we shall consider stable numerical methods for handling these problems. Our basic tool is a matrix decomposition based on orthogonal Householder transformations. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/64/12/CS-TR-64-12.pdf %R CS-TR-64-13 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Computation of the pseudoinverse of a matrix of unknown rank %A Pereyra, Victor %A Rosen, Judah Ben %D September 1964 %X A program is described which computes the pseudoinverse, and other related quantities, of an m $\times$ n matrix A of unknown rank. The program obtains least squares solutions to singular and/or inconsistent linear systems Ax = B, where m $\leq$ n or m > n and the rank of A may be less than min(m,n). A complete description of the program and its use is given, including computational experience on a variety of problems.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/64/13/CS-TR-64-13.pdf %R CS-TR-63-2 %Z Fri, 19 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T The solution of large systems of algebraic equations %A Pavkovich, John M. %D December 1963 %X The solution of a system of linear algebraic equations using a computer is not a difficult problem as long as the equations are not ill-conditioned and all of the coefficients can be stored in the computer. However, when the number of coefficients is so large that supplemental means of storage, such as magnetic tape, are required, the difficulty of solving the system efficiently increases considerably. This paper describes a method of solution whereby such systems of equations can be solved in an efficient manner. The problems associated with ill-conditioned systems of equations are not discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/63/2/CS-TR-63-2.pdf %R CS-TR-96-1563 %Z Wed, 21 Feb 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Database Research: Achievements and Opportunities into the 21st Century %A Silberschatz, Avi %A Stonebraker, Michael %A Ullman, Jeffrey D. %D February 1996 %X In May 1995, an NSF workshop on the future of database management systems research was convened. This paper reports the conclusions of that meeting. Among the most important directions for future DBMS research recommended by the panel are: support for multimedia objects; managing distributed and loosely coupled information, as on the world-wide web; supporting new database applications such as data mining and warehousing; workflow and other complex transaction-management problems; and enhancing the ease-of-use of DBMS's for both users and system managers. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1563/CS-TR-96-1563.pdf %R CS-TR-96-1564 %Z Fri, 01 Mar 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Medical Applications of Neural Networks: Connectionist Models of Survival %A Ohno-Machado, Lucila %D March 1996 %X Although neural networks have been applied to medical problems in recent years, their applicability has been limited for a variety of reasons. One of those barriers has been the problem of recognizing rare categories. In this dissertation, I demonstrate, and prove the utility of, a new method for tackling this problem. In particular, I have developed a method that allows the recognition of rare categories with high sensitivity and specificity, and will show that it is practical and robust. This method involves the construction of sequential neural networks. Rare categories occur and must be learned if practical application of neural-network technology is to be achieved. Survival analysis is one area in which this problem appears. In this work, I test the hypotheses that (1) sequential systems of neural networks produce results that are more accurate (in terms of calibration and resolution) than nonsequential neural networks; and (2) in certain circumstances, sequential neural networks produce more accurate estimates of survival time than Cox proportional hazards and logistic regression models. I use two sets of data to test the hypotheses: (1) a data set of HIV+ patients; and (2) a data set of patients followed prospectively for the development of cardiac conditions. I show that a neural network model can predict death due to AIDS more accurately than a Cox proportional hazards model.
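One schematic reading of the "sequential" construction (my illustration, not the dissertation's architecture) is a two-stage cascade: a first model screens with a deliberately low threshold so that sensitivity on the rare class stays high, and a second model is trained only on the screened cases, where the class is no longer rare. With ordinary logistic models standing in for the neural stages, and entirely invented data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic data with a rare positive class (roughly 8%).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 5))
    y = (X[:, 0] + 0.5 * X[:, 1]
         + rng.normal(scale=2.0, size=2000) > 3.2).astype(int)

    stage1 = LogisticRegression().fit(X, y)
    keep = stage1.predict_proba(X)[:, 1] > 0.02   # low cut: high sensitivity

    # The second stage sees only screened cases, so the rare class is a much
    # larger fraction of its training set.
    stage2 = LogisticRegression().fit(X[keep], y[keep])
    p = np.zeros(len(y))
    p[keep] = stage2.predict_proba(X[keep])[:, 1]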
Furthermore, I show that a sequential neural network model is more accurate than a standard neural network model. I show that the predictions of logistic regression and neural networks are not significantly different, but that any of these models used sequentially is more accurate than its standard counterpart. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1564/CS-TR-96-1564.pdf %R CS-TR-96-1568 %Z Wed, 10 Apr 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Algorithms for computing intersection and union of toleranced polygons with applications %A Cazals, Frederic %A Ramkumar, G. D. S. %D April 1996 %X Since mechanical operations are performed only up to a certain precision, the geometry of parts involved in real life products is never known precisely. Nevertheless, operations on toleranced objects have not been studied extensively. In this paper, we initiate a study of the analysis of the union and intersection of toleranced simple polygons. We provide a practical and efficient algorithm that stores in an implicit data structure the information necessary to answer a request for specific values of the tolerances without performing a computation from scratch. If the polygons are of sizes m and n, and s is the number of intersections between edges occurring for all the combinations of tolerance values, the pre-processed data structure takes O(s) space and the algorithm that computes a union/intersection from it takes O((n+m) log(s) + k' + k log(k)) time where k is the number of vertices of the union/intersection and k <= k' <= s. Although the algorithm is not output sensitive, we show that the expectations of k and k' remain within a constant factor tau, a function of the input geometry. Finally, we list interesting applications of the algorithms related to feasibility of assembly and assembly sequencing of real assemblies. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1568/CS-TR-96-1568.pdf %R CS-TR-96-1569 %Z Thu, 25 Apr 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Using Automatic Abstraction for Problem-Solving and Learning %A Unruh, Amy %D April 1996 %X Abstraction is a powerful tool for controlling search combinatorics. This research presents a framework for automatic abstraction planning, and a family of associated abstraction methods, called SPATULA. The framework provides a structure within which different parameterized methods for automatic abstraction can be instantiated to generate abstraction planning behavior, and provides an integrated environment for abstract problem-solving and learning. A core idea underlying the abstraction techniques is that abstraction can arise as an obviation response to impasses in planning. Abstraction is performed at problem-solving time with respect to impasses in the current problem context, and thus the planner generates abstractions in response to specific situations. This approach is used to reduce the cost of lookahead evaluation searches, by performing abstract search in problem spaces which are automatically abstracted from the ground spaces during search. New search control rules are learned during abstract search; they constitute an abstract plan used in future situations, and produce an emergent multi-level abstraction behavior. The abstraction method has been implemented and evaluated.
It has been shown to: reduce planning time, while still yielding good solutions; reduce learning time; and increase the effectiveness of learned rules by enabling them to transfer more widely. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1569/CS-TR-96-1569.pdf %R CS-TR-96-1565 %Z Thu, 04 Apr 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Formal Model for Bridging Heterogeneous Relational Databases in Clinical Medicine %A Sujansky, Walter %D April 1996 %X This document describes the results of my thesis research, which focused on developing a standard query interface to heterogeneous clinical databases. The high-level goal of this work was to *insulate* the developers of clinical computer applications from the implementation details of clinical databases, thereby facilitating the *sharing* of clinical computer applications across institutions with different database implementations. Most clinical databases store information about patients' diagnoses, laboratory results, medication orders, drug allergies, and demographic background. These data are valuable as the inputs to computer applications that provide real-time decision support, monitor the quality of care, and analyze data for research purposes. Clinical databases at different institutions, however, vary significantly in the way the databases model, represent, and retrieve clinical data. This database heterogeneity makes it impossible for a single computer application to retrieve data from the clinical databases of various institutions because the database queries included in the application must be formulated differently for each institution. Therefore, database heterogeneity makes it difficult to share computer applications across institutions with different database implementations. In my work, I have developed an *abstract* model of clinical data and an *abstract* query language that allow the developers of computer applications to formulate queries independently of the institution-specific features of clinical databases. I have also developed a database mapping language and a formal query-translation method that automatically translate the abstract queries that appear in applications into equivalent institution-specific queries. This framework ostensibly allows copies of a single computer application to be distributed to multiple institutions and to be customized automatically at each of the institutions such that the queries in each copy of the application can retrieve data from the local clinical database. This dissertation formally describes the abstract data model, the abstract query language, the mapping language, and the translation algorithm. It also presents the results of a formal evaluation that I performed to assess the feasibility and utility of this approach for sharing clinical computer applications. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1565/CS-TR-96-1565.pdf %R CS-TR-96-1566 %Z Tue, 09 Apr 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Clocked Transition Systems %A Manna, Zohar %A Pnueli, Amir %D April 1996 %X This paper presents a new computational model for real-time systems, called the clocked transition system model. The model is a development of our previous timed transition model, where some of the changes are inspired by the model of timed automata. The new model leads to a simpler style of temporal specification and verification, requiring no extension of the temporal language.
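In rough outline (a generic rendering of the clocked-transition-system idea, not the paper's formal definitions), a state carries clock values, time passes through explicit tick steps that advance every clock uniformly, and discrete transitions may reset clocks; a safety property is then a predicate checked along all explored runs:

    # Hypothetical two-location example: clocks advance by "tick" steps and
    # are reset by discrete transitions; safety: "busy" never exceeds 4 time
    # units on any explored run.
    def successors(loc, clock):
        if loc == "idle":
            yield ("busy", 0.0)              # start a job, resetting the clock
        if loc == "busy" and clock >= 1.0:
            yield ("idle", clock)            # the job completes
        else:
            yield (loc, clock + 0.5)         # tick: half a time unit passes

    def safe(bound=4.0, depth=12):
        frontier = {("idle", 0.0)}
        for _ in range(depth):
            frontier = {s for st in frontier for s in successors(*st)}
            if any(loc == "busy" and c > bound for loc, c in frontier):
                return False
        return True

    print(safe())   # True: the job is forced to finish once its clock hits 1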
For verifying safety properties, we present a run-preserving reduction from the new real-time model to the untimed model of fair transition systems. This reduction allows the (re)use of safety verification methods and tools, developed for untimed reactive systems, for proving safety properties of real-time systems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1566/CS-TR-96-1566.pdf %R CS-TR-96-1570 %Z Wed, 29 May 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Optimization of SQL Queries for Parallel Machines %A Hasan, Waqar %D May 1996 %X Parallel execution offers a method for reducing the response time of queries against large databases. We address the problem of parallel query optimization: Given a declarative SQL query, find a procedural parallel plan that delivers the query result in minimal time. We develop optimization algorithms using models that incorporate both the sources of speedup and the obstacles to it. We address independent, pipelined and partitioned parallelism. We incorporate inherent constraints on available parallelism and the extra cost of parallel execution. Our models are motivated by experiments with NonStop SQL, a commercial parallel DBMS. We adopt a two-phase approach to parallel query optimization: JOQR (join ordering and query rewrite), followed by parallelization. JOQR minimizes total work. Then, parallelization spreads work among processors to minimize response time. For JOQR, we model communication costs and abstract physical characteristics of data as colors. We devise tree coloring and reordering algorithms that are efficient and optimal. We model parallelization as scheduling a tree whose nodes represent operators and edges represent parallel/precedence constraints. Computation/communication costs are represented as node/edge weights. We prove worst-case bounds on the performance ratios of our algorithms and measure average cases using simulation. Our results enable the construction of SQL compilers that effectively exploit parallel machines. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1570/CS-TR-96-1570.pdf %R CS-TR-96-1567 %Z Tue, 30 Apr 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Synthesis of Reactive Programs %A Anuchitanukul, Anuchit %D April 1996 %X We study various problems of synthesizing reactive programs. A reactive program is a program whose behaviors are not merely functional relationships between inputs and outputs, but sequences of actions as well as interactions between the program and its environment. The goal of program synthesis in general is to find an implementation of a program such that the behaviors of the implementation satisfy a given specification. The reactive behaviors that we study are omega-regular infinite sequences and regular finite sequences. The domain of the implementation is (finite) transition systems for closed system synthesis, and transition system modules for open system synthesis. We consider various solutions, e.g. basic, maximal, modular and exact, for particular subclasses of the implementation language and investigate how characteristics of the program, such as fairness, number of processes, and composition operations, affect the synthesis algorithm. In addition to the automata-theoretic algorithms, we give an algorithm that synthesizes a program directly from the linear-time temporal logic ETL. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1567/CS-TR-96-1567.pdf
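For illustration only, here is a minimal sketch, in Python, of the kind of fixpoint computation that automata-theoretic synthesis procedures build on: solve a finite-state safety game and read a controller off the winning set. The encoding (two move relations and a safe set) and all names are assumptions of this sketch, not the report's formalism.

    # Minimal sketch: synthesize a controller that keeps a finite game
    # inside a safe set.  system_moves/env_moves map each state to its
    # successor set; every state belongs to exactly one of the two dicts.
    def synthesize_safety(system_moves, env_moves, safe):
        win = set(safe)
        while True:
            # A system state survives if SOME move stays in `win`;
            # an environment state survives only if ALL its moves do.
            keep = {s for s in win
                    if (s in system_moves and system_moves[s] & win)
                    or (s in env_moves and env_moves[s] <= win)}
            if keep == win:
                break
            win = keep
        # The synthesized "program": one winning move per system state.
        strategy = {s: next(iter(system_moves[s] & win))
                    for s in win if s in system_moves}
        return win, strategy

    # Example: system state 0 may move to 1 (safe) or 2 (unsafe).
    print(synthesize_safety({0: {1, 2}}, {1: {0}}, safe={0, 1}))
    # -> ({0, 1}, {0: 1})

The greatest-fixpoint structure is what makes safety specifications over finite transition systems effectively solvable; richer omega-regular behaviors need correspondingly richer automata constructions.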
%R CS-TR-96-1571 %Z Mon, 10 Jun 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Formal Verification of Performance and Reliability of Real-Time Systems %A DeAlfaro, Luca %D June 1996 %X In this paper we propose a methodology for the specification and verification of performance and reliability properties of real-time systems within the framework of temporal logic. The methodology is based on the system model of stochastic real-time systems (SRTSs), and on branching-time temporal logics that are extensions of the probabilistic logics pCTL and pCTL*. SRTSs are discrete-time transition systems that can model both probabilistic and nondeterministic behavior. The specification language extends the branching-time logics pCTL and pCTL* by introducing an operator to express bounds on the average time between events. We present model-checking algorithms for the algorithmic verification of system specifications, and we discuss their complexity. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1571/CS-TR-96-1571.pdf %R CS-TR-96-1572 %Z Tue, 18 Jun 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Caching and Non-Horn Inference in Model Elimination Theorem Provers %A Geddis, Donald F. %D June 1996 %X Caching in an inference procedure holds the promise of replacing exponential search with constant-time lookup, at a cost of slightly increased overhead for each node expansion. Caching will be useful if subgoals are repeated often enough during proofs. In experiments on solving queries using a backward chainer on Horn theories, caching appears to be very helpful on average. When trying to extend this success to first-order theories, however, intuition suggests that subgoal caches are no longer useful. The cause is that complete first-order backward chaining requires goal-goal resolutions in addition to resolutions with the database, and this introduces a context-sensitivity into the proofs for a subgoal. A cache is only feasible if the solutions are independent of context, so that they may be copied from one part of the space to another. It is shown here that a full exploration of a subgoal in one context actually provides complete information about the solutions to the same subgoal in all other contexts of the proof. In a straightforward way, individual solutions from one context may be copied over directly. More importantly, non-Horn failure caching is also feasible: no additional solutions that might affect the query are possible in the new context, and therefore there is no need to re-explore the space there. Thus most Horn clause caching schemes may be used with minimal changes in a non-Horn setting. In addition, a new Horn clause caching scheme is proposed: postponement caching. This new scheme involves exploring the inference space as a graph instead of as a tree, so that a given literal will only occur once in the proof space. Despite the previous extension of failure caching to non-Horn theories, postponement caching is incomplete in the non-Horn case. A counterexample is presented, and possible enhancements to reclaim completeness are investigated. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1572/CS-TR-96-1572.pdf
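The context sensitivity at issue is easy to see in miniature. Below is a minimal sketch of subgoal caching in a propositional Horn backward chainer (hypothetical names, and a far simpler setting than model elimination): successes are cached unconditionally, while a failure that depended on a goal already on the stack is left uncached, since it might not hold in other contexts.

    # rules: atom -> list of bodies (each body is a list of atoms).
    def prove(goal, rules, cache, stack=frozenset()):
        if goal in cache:                    # constant-time lookup
            return cache[goal]
        if goal in stack:                    # blocked on an ancestor goal
            return None                      # "unknown in this context"
        blocked = False
        for body in rules.get(goal, []):
            results = [prove(g, rules, cache, stack | {goal}) for g in body]
            if all(r is True for r in results):
                cache[goal] = True           # successes are context-free
                return True
            if any(r is None for r in results):
                blocked = True               # failure depended on the stack
        if blocked:
            return None                      # contextual failure: don't cache
        cache[goal] = False                  # context-free failure: cache it
        return False

    # 'a' is provable via 'c'; the loop-blocked subgoal 'x' stays uncached,
    # so a later top-level query for 'x' still gets the right answer.
    rules = {"a": [["x"], ["c"]], "x": [["a"]], "c": [[]]}
    cache = {}
    print(prove("a", rules, cache) is True, cache)  # True {'c': True, 'a': True}
    print(prove("x", rules, cache) is True)         # True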
%R CS-TR-96-1573 %Z Wed, 17 Jul 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Depth Discontinuities by Pixel-To-Pixel Stereo %A Birchfield, Stan %A Tomasi, Carlo %D July 1996 %X This report describes a two-pass binocular stereo algorithm that is specifically geared towards the detection of depth discontinuities. In the first pass, introduced in part I of the report, stereo matching is performed independently on each epipolar pair for maximum efficiency. In the second pass, described in part II, disparity information is propagated between the scanlines. Part I. Our stereo algorithm explicitly matches the pixels in the two images, leaving occluded pixels unpaired. Matching is based upon intensity alone without utilizing windows. Since the algorithm prefers piecewise constant disparity maps, it sacrifices depth accuracy for the sake of crisp boundaries, leading to precise localization of the depth discontinuities. Three features of the algorithm are worth noting: (1) unlike most stereo algorithms, it does not require texture throughout the images, making it useful in unmodified indoor settings, (2) it uses a measure of pixel dissimilarity that is provably insensitive to sampling, and (3) it prunes bad nodes during the search, resulting in a running time that is faster than that of standard dynamic programming. Part II. After the scanlines are processed independently, the disparity map is postprocessed, leading to more accurate disparities and depth discontinuities. Both the algorithm and the postprocessor are fast, producing a dense disparity map in about 1.5 microseconds per pixel per disparity on a workstation. Results on five stereo pairs are given. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1573/CS-TR-96-1573.pdf %R CS-TR-96-1574 %Z Wed, 04 Sep 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Effective Remote Modeling in Large-Scale Distributed Simulation and Visualization Environments %A Singhal, Sandeep K. %D September 1996 %X A Distributed Interactive Simulation provides the illusion of a single, coherent virtual world to a group of users located at different machines connected by a network. Networked virtual environments are used for multiplayer video games, military and industrial training, and collaborative engineering. Network bandwidth, network latency, and host processing power limit the achievable size and detail of future simulations. This thesis describes network protocols and algorithms to support "remote modeling," allowing a host to model and render remote entities in large-scale distributed simulations. These techniques require fewer network resources and support more entity types than previous approaches. The Position History-Based Dead Reckoning (PHBDR) protocol provides accurate remote position modeling and minimizes dependencies on network performance and entity representation. PHBDR is a foundation for three protocols which model entity orientation, entity structural change, and entity groups. This thesis shows that a simple, efficient protocol can provide smooth, accurate remote position modeling and that it can be applied recursively to support entity orientation, structure, and aggregation at multiple levels of detail; these protocols offer performance and costs that are competitive with more complex and application-specific approaches, while providing simpler analyses of behavior by exploiting this recursive structure. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1574/CS-TR-96-1574.pdf
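To make the position-modeling idea concrete, here is a minimal sketch of history-based dead reckoning for a one-dimensional position; the constant-velocity fit, the threshold, and all names are assumptions of the sketch, not the PHBDR protocol itself.

    # history: list of (time, position) updates, oldest first.
    def extrapolate(history, t):
        (t0, p0), (t1, p1) = history[-2], history[-1]
        v = (p1 - p0) / (t1 - t0)       # velocity from the last two updates
        return p1 + v * (t - t1)        # constant-velocity prediction

    # The sender runs the same predictor its receivers run, and transmits
    # a fresh update only when the true position has drifted too far from
    # what every receiver is currently extrapolating.
    def should_transmit(history, true_pos, t, threshold=0.5):
        return abs(extrapolate(history, t) - true_pos) > threshold

    history = [(0.0, 0.0), (1.0, 2.0)]            # moving ~2 units/sec
    print(extrapolate(history, 1.5))              # -> 3.0
    print(should_transmit(history, 5.0, 1.5))     # -> True (drift > 0.5)

Higher-order fits over a longer history, and recursive application to orientation and group aggregates, follow the same transmit-on-divergence pattern.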
%R CS-TR-96-1576 %Z Fri, 06 Dec 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Query Reformulation under Incomplete Mappings %A Huyn, Nam %D December 1996 %X This paper focuses on some of the important new translatability issues that arise in the problem of interoperation between two database schemas when mappings between these schemas are inherently more complex than traditional views or pure Datalog programs can capture. In many cases, sources cannot be redesigned, and mappings among them exhibit some form of incompleteness under which the question of whether a query can be translated across different schemas is not immediately obvious. The notion of query we consider here is the traditional one, in which the answers to a query are required to be definite: answers cannot be disjunctive or conditional and must refer only to domain constants. In this paper, mappings are modeled by Horn programs that allow existential variables, and queries are modeled by pure Datalog programs. We then consider the problem of eliminating functional terms from the answers to a Horn query where function symbols are allowed. We identify a class of Horn queries called "term-bounded" that are equivalent to pure Datalog queries. We present an algorithm that rewrites a term-bounded query into an "equivalent" pure Datalog query. Equivalence is defined here as yielding the same function-free answer. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1576/CS-TR-96-1576.pdf %R CS-TR-96-1575 %Z Wed, 09 Oct 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T Routing and Admission Control in General Topology Networks with Poisson Arrivals %A Kamath, Anil %A Palmon, Omri %A Plotkin, Serge %D October 1996 %X Emerging high-speed networks will carry traffic for services -- such as video-on-demand and video teleconferencing -- that require resource reservation along the path on which the traffic is sent. The high bandwidth-delay product of these networks prevents circuit rerouting: once a circuit is routed on a certain path, the bandwidth taken by this circuit remains unavailable for the duration (holding time) of this circuit. As a result, such networks will need effective routing and admission control strategies. Recently developed online routing and admission control strategies have logarithmic competitive ratios with respect to the admission ratio (the fraction of admitted circuits). Such guarantees on performance are rather weak in the most interesting case, where the rejection ratio of the optimum algorithm is very small or even 0. Unfortunately, these guarantees cannot be improved in the context of the considered models, making it impossible to use these models to identify algorithms that are going to perform well in practice. In this paper we develop routing and admission control strategies for a more realistic model, where the requests for virtual circuits between any two points arrive according to a Poisson process and where the circuit holding times are exponentially distributed. Our model is close to the one that was developed to analyze and tune the (currently used) strategies for managing traffic in long-distance telephone networks. We strengthen this model by assuming that the rates of the Poisson processes (the ``traffic matrix'') are unknown to the algorithm and are chosen by the adversary. Our strategy is competitive with respect to the expected rejection ratio.
More precisely, it achieves an expected rejection ratio of at most R + epsilon, where R is the optimum expected rejection ratio. The expectations are taken over the distribution of the request sequences, and epsilon = sqrt(r log n), where r is the maximum fraction of an edge bandwidth that can be requested by a single circuit. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1575/CS-TR-96-1575.pdf %R CS-TR-96-1577 %Z Tue, 17 Dec 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T A More Aggressive Use Of Views To Extract Information %A Huyn, Nam %D December 1996 %X Much recent work has focused on using views to evaluate queries. More specifically, queries are rewritten to refer to views instead of the base relations over which the queries were originally written. The motivation is that the views represent the only ways in which some information source may be accessed. Another use of views that has been overlooked becomes important especially when no equivalent rewriting of a query in terms of views is possible: even though we cannot use the views to get all the answers to the query, we can still use them to deduce as many answers as possible. In many global information applications, the notion of equivalence used is often too restrictive. We propose a notion of pseudo-equivalence that allows more queries to be rewritten usefully: we show that if a query has an equivalent rewriting, the query also has a pseudo-equivalent rewriting. The converse is not true in general. In particular, when the views are conjunctive, we show that all Datalog queries over the source do have a pseudo-equivalent Datalog query over the views. We reduce the problem of finding pseudo-equivalent queries to that of rewriting Horn queries with Skolem functions as Datalog queries. We present an algorithm for the class of term-bounded Horn queries. We discuss extending the problem to larger classes of Horn queries, other non-Horn queries that result from ``inverting'' Datalog views, and adding functional dependencies. The theory and methods developed in our work have important uses in query mediation between heterogeneous sources, automatic join discovery, and view updates. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1577/CS-TR-96-1577.pdf %R CS-TR-96-1578 %Z Tue, 17 Dec 96 00:00:00 GMT %I Stanford University, Department of Computer Science %T State Reduction Methods for Automatic Formal Verification %A Ip, C. Norris %D December 1996 %X Validation of industrial designs is becoming more challenging as technology advances. One of the most suitable debugging aids is automatic formal verification. This thesis presents several techniques for reducing the state explosion problem, that is, for reducing the number of states that are examined. A major contribution of this thesis is the design of simple extensions to the Murphi description language, which enable us to convert two existing abstraction strategies into two fully automatic algorithms, making these strategies easy to use and safe to apply. These two algorithms rely on two facts about high-level designs: they frequently exhibit structural symmetry, and their behavior is often independent of the exact number of replicated components they contain. Another contribution is the design of a new state reduction algorithm, which relies on reversible rules (transitions that do not lose information) in a system description. This new reduction algorithm can be used simultaneously with the other two algorithms.
These techniques, implemented in the Murphi verification system, have been applied to many applications, such as cache coherence protocols and distributed algorithms. In the cases of two important classes of infinite systems, infinite state graphs can be automatically converted to small finite state graphs. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/96/1578/CS-TR-96-1578.pdf %R CS-TR-97-1580 %Z Thu, 09 Jan 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T STARTS: Stanford Protocol Proposal for Internet Retrieval and Search %A Gravano, Luis %A Chang, Kevin %A Garcia-Molina, Hector %A Paepcke, Andreas %D January 1997 %X Document databases are available everywhere, both within the internal networks of organizations and on the Internet. The database contents are often "hidden" behind search interfaces. These interfaces vary from database to database. Also, the algorithms with which the associated search engines rank the documents in the query results are usually incompatible across databases. Even individual organizations use search engines from different vendors to index their internal document collections. These organizations could benefit from unified query interfaces to multiple search engines that would, for example, give users the illusion of a single big document database. Building such "metasearchers" is nowadays a hard task because different search engines are largely incompatible and do not allow for interoperability. To improve this situation, the Digital Library project at Stanford has coordinated among search-engine vendors and other key players to reach informal agreements for unifying basic interactions in three key areas. This is the final writeup of our informal "standards" effort. This draft is based on feedback from people at Excite, Fulcrum, GILS, Harvest, Hewlett-Packard Laboratories, Infoseek, Microsoft Network, Netscape, PLS, Verity, and WAIS, among others. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1580/CS-TR-97-1580.pdf %R CS-TR-97-1581 %Z Thu, 09 Jan 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Towards Interoperability in Digital Libraries: Overview and Selected Highlights of the Stanford Digital Library Project %A Paepcke, Andreas %A Cousins, Steve B. %A Garcia-Molina, Hector %A Hassan, Scott W. %A Ketchpel, Steven K. %A Roscheisen, Martin %A Winograd, Terry %D January 1997 %X We outline the scope of the Stanford Digital Library Project, which covers five areas: user interface work, technologies for locating information and library services, the emerging economic perspective of digital libraries, infrastructure technology, and the use of agent technologies to support all of these aspects. We describe technical details for two specific efforts that have been realized in prototype implementations. First, we describe how we employ distributed object technology to move towards an implementation of our InfoBus vision. The InfoBus consists of translation services and wrappers around existing protocols to cope with the problem of interoperability and the distributed nature of emerging digital library services. We model autonomous, heterogeneous library services as CORBA proxy objects. This allows the construction of unified but extensible method-based interfaces for client programs to interact through. We describe how distributed objects enable the design of communication protocols that leave implementors a large degree of freedom. This is a benefit because the resulting implementations can allow users to choose among multiple performance-profile tradeoffs while staying within the confines of the protocol. The second effort we cover is InterPay, which uses the object approach for an architecture that helps manage heterogeneity in payment mechanisms among autonomous services. The architecture is organized into three layers. The top layer contains elements involved in the task-level interaction with the services. The middle layer is responsible for enforcing user-specified payment policies. The lowest layer manages the mechanics of diverse online payment schemes. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1581/CS-TR-97-1581.pdf
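As a toy illustration of the proxy-object idea (hypothetical names; the actual InfoBus wraps services as CORBA objects and is far more elaborate), a wrapper can expose one uniform, method-based search interface while translating calls into each engine's native protocol:

    # Each legacy engine keeps its own query syntax; the proxy gives
    # clients a single uniform method to call.
    class LegacyEngine:
        def native_query(self, expr):             # e.g. expects "w1 AND w2"
            return ["doc1", "doc2"]               # stub result list

    class SearchProxy:
        def __init__(self, engine):
            self.engine = engine
        def search(self, terms):                  # the uniform interface
            expr = " AND ".join(terms)            # translate to native form
            return self.engine.native_query(expr)

    results = SearchProxy(LegacyEngine()).search(["digital", "library"])
    print(results)                                # -> ['doc1', 'doc2']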
%R CS-TR-97-1582 %Z Thu, 09 Jan 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Shared Web Annotations as a Platform for Third-Party Value-Added Information Providers: Architecture, Protocols, and Usage Examples %A Roscheisen, Martin %A Mogensen, Christian %A Winograd, Terry %D January 1997 %X In this paper, we present an architecture, called "ComMentor", which provides a platform for third-party providers to add lightweight super-structures to material supplied by conventional content providers. It enables people to share structured in-place annotations about arbitrary on-line documents. The system is part of a general "virtual document" architecture ("PCD BRIO") in which--with the help of lightweight distributed meta-information--documents are dynamically synthesized from distributed sources depending on the user context and the meta-information which has been attached to them. The meta-information is managed independently of the documents themselves on separate meta-information servers, both in terms of storage and authority. A wide range of useful scenarios can be readily realized on this platform. We give examples of how a more personalized content presentation can be achieved by leveraging the database storage of the uniform meta-information and generating documents dynamically for a particular user perspective. These include structured discussion about paper drafts, collaborative filtering, seals of approval, tours, shared "hotlists" with section-based visibility control, usage indicators, co-presence, and value-added trails. Our object model and request interface for the prototype implementation are defined in technical detail in the appendix. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1582/CS-TR-97-1582.pdf %R CS-TR-97-1583 %Z Thu, 09 Jan 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Boolean Query Mapping Across Heterogeneous Information Sources (Extended Version) %A Chang, Kevin Chen-Chuan %A Garcia-Molina, Hector %A Paepcke, Andreas %D January 1997 %X Searching over heterogeneous information sources is difficult because of the non-uniform query languages. Our approach is to allow a user to compose Boolean queries in one rich front-end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a filter query to yield the correct final result. In this paper we introduce the architecture and associated algorithms for generating the supported subsuming queries and filters. We show that generated subsuming queries return a minimal number of documents; we also discuss how minimal-cost filters can be obtained. We have implemented prototype versions of these algorithms and demonstrated them on heterogeneous Boolean systems. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1583/CS-TR-97-1583.pdf
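A miniature version of the subsume-then-filter pattern (illustrative names and a deliberately tiny query model, not the paper's algorithms): a source that cannot evaluate negation receives the query with its negated terms stripped, which can only enlarge the result set, and the negations are then applied locally as a filter.

    # Query: (positive_terms, negated_terms).  A source that cannot
    # evaluate NOT gets only the positive part -- a subsuming query --
    # and the negations are applied as a local filter afterwards.
    def subsuming_query(query):
        positives, _ = query
        return positives                          # superset of true answers

    def filter_results(docs, query):
        positives, negatives = query
        return [d for d in docs
                if all(t in d["terms"] for t in positives)
                and not any(t in d["terms"] for t in negatives)]

    docs = [{"id": 1, "terms": {"database", "parallel"}},
            {"id": 2, "terms": {"database", "survey"}}]
    query = (["database"], ["survey"])
    print(subsuming_query(query))                 # source sees: ['database']
    print(filter_results(docs, query))            # only doc 1 survives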
%R CS-TR-97-1584 %Z Thu, 09 Jan 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Grassroots: a System Providing a Uniform Framework for Communicating, Structuring, Sharing Information, and Organizing People %A Kamiya, Kenichi %A Roscheisen, Martin %A Winograd, Terry %D January 1997 %X People keep pieces of information in diverse collections such as folders, hotlists, e-mail inboxes, newsgroups, and mailing lists. These collections mediate various types of collaborations, including communicating, structuring, sharing information, and organizing people. Grassroots is a system that provides a uniform framework to support people's collaborative activities mediated by collections of information. The system seamlessly integrates functionalities currently found in such disparate systems as e-mail, newsgroups, shared hotlists, hierarchical indexes, hypermail, etc. Grassroots co-exists with these systems in that its users benefit from the uniform image provided by Grassroots, but other people can continue using other mechanisms, which Grassroots leverages. The current Grassroots prototype is based on an http-proxy implementation, and can be used with any Web browser. In the context of the design of a next-generation version of the Web, Grassroots demonstrates the utility of a uniform notification infrastructure. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1584/CS-TR-97-1584.pdf %R CS-TR-97-1585 %Z Thu, 09 Jan 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Techniques and Tools for Making Sense out of Heterogeneous Search Service Results %A Baldonado, Michelle Q. Wang %A Winograd, Terry %D January 1997 %X We describe a set of techniques that allows users to interact with results at a higher level than the citation level, even when those results come from a variety of heterogeneous on-line search services. We believe that interactive result analysis allows users to "make sense" out of the potentially many results that may match the constraints they have supplied to the search services. The inspiration for this approach comes from reference librarians, who do not respond to patrons' questions with lists of citations, but rather give high-level answers that are tailored to the patrons' needs. We outline here the details of the methods we employ in order to meet our goal of allowing for dynamic, user-directed abstraction over result sets, as well as the prototype tool (SenseMaker) we have built based upon these techniques. We also take a brief look at the more general theory that underlies the tool, and hypothesize that it is applicable to flexible duplicate detection as well. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1585/CS-TR-97-1585.pdf %R CS-TR-97-1579 %Z Wed, 08 Jan 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T From the Valley of Heart's Delight to Silicon Valley: A Study of Stanford University's Role in the Transformation %A Tajnai, Carolyn %D January 1997 %X This study examines the role of Stanford University in the transformation from the Valley of Heart's Delight to Silicon Valley. At the dawn of the Twentieth Century, California's Santa Clara County was an agricultural paradise.
Because of the benign climate and thousands of acres of fruit orchards, the area became known as the Valley of Heart's Delight. In the early 1890s, Leland and Jane Stanford donated land in the valley to build a university in memory of their son. Thus, Leland Stanford, Jr., University was founded. In the early 1930s, there were almost no jobs for young Stanford engineering graduates. This was about to change. Although there was no organized plan to help develop the economic base of the area around Stanford University, the concern about the lack of job opportunities for their graduates motivated Stanford faculty to begin the chain of events that led to the birth of Silicon Valley. Stanford University's role in the transformation of the Valley of Heart's Delight into Silicon Valley is history, but it is enduring history. Stanford continues to affect the local economy by spawning new and creative ideas, dreams, and ambitions. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1579/CS-TR-97-1579.pdf %R CS-TR-97-1586 %Z Mon, 24 Feb 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Construction of a Three-dimensional Geometric Model for Segmentation and Visualization of Cervical Spine Images %A Pichumani, Ramani %D February 1997 %X This report introduces a new technique for automatically extracting vertebral segments from three-dimensional computerized tomography (CT) and magnetic resonance (MR) images of the human cervical spine. An important motivation for this work is to provide accurate information for registration and for fusion of CT and MR images into a composite three-dimensional image. One of the major hurdles in performing image fusion is the difficulty of extracting and matching corresponding anatomical regions in an accurate, robust, and timely manner. The complementary properties of soft and bony tissues revealed in CT and MR imaging modalities make it challenging to extract corresponding regions that can be correlated in an accurate and robust manner. Ambiguities in the images due to noise, distortion, limited resolution, and patient-specific structural variations also create additional challenges. Whereas fusion of CT and MR images of the cranium has already been performed, no one has yet developed an automated technique for fusing multimodality images of the spine. Unlike the head, which is relatively rigid, the spine is a complex, articulating object and is subject to structural deformation throughout the multimodal scanning process. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1586/CS-TR-97-1586.pdf %R CS-TR-97-1587 %Z Mon, 24 Mar 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Ensembles for Supervised Classification Learning %A Matan, Ofer %D March 1997 %X This dissertation studies the use of multiple classifiers (ensembles or committees) in learning tasks. Both theoretical and practical aspects of combining classifiers are studied. First, we analyze the representational ability of voting ensembles. A voting ensemble may perform either better or worse than each of its individual members. We give tight upper and lower bounds on the classification performance of a voting ensemble as a function of the classification performances of its individual members. Boosting is a method of combining multiple "weak" classifiers to form a "strong" classifier. Several issues concerning boosting are studied in this thesis.
We study SBA, a hierarchical boosting algorithm proposed by Schapire, in terms of its representation and its search. We present a rejection boosting algorithm that trades off exploration and exploitation: it requires fewer pattern labels at the expense of lower boosting ability. Ensembles may be useful in gaining information. We study their use to minimize the labeling costs of data and to enable improvements in performance over time. For that purpose, a model for on-site learning is presented. The system learns by querying "hard" patterns while classifying "easy" ones. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1587/CS-TR-97-1587.pdf %R CS-TR-97-1588 %Z Tue, 08 Apr 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Systems of Bilinear Equations %A Cohen, Scott %A Tomasi, Carlo %D April 1997 %X How hard is it to solve a system of bilinear equations? No solutions are presented in this report, but the problem is posed and some preliminary remarks are made. In particular, solving a system of bilinear equations is reduced, by a suitable transformation of its columns, to solving a homogeneous system of bilinear equations. In turn, the latter has a nontrivial solution if and only if there exist two invertible matrices that, when applied to the tensor of the coefficients of the system, zero its first column. Matlab code is given to manipulate three-dimensional tensors, including a procedure that often, but not always, finds one solution to a bilinear system. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1588/CS-TR-97-1588.pdf %R CS-TR-97-1589 %Z Thu, 17 Apr 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Learning Action Models for Reactive Autonomous Agents %A Benson, Scott Sherwood %D April 1997 %X To be maximally effective, autonomous agents such as robots must be able both to react appropriately in dynamic environments and to plan new courses of action in novel situations. Reliable planning requires accurate models of the effects of actions---models which are often more appropriately learned through experience than designed. This thesis describes TRAIL (Teleo-Reactive Agent with Inductive Learning), an integrated agent architecture which learns models of actions based on experiences in the environment. These action models are then used to create plans that combine both goal-directed and reactive behaviors. Previous work on action-model learning has focused on domains that contain only deterministic, atomic action models that explicitly describe all changes that can occur in the environment. The thesis extends this previous work to cover domains that contain durative actions, continuous variables, nondeterministic action effects, and actions taken by other agents. Results have been demonstrated in several robot simulation environments and the Silicon Graphics, Inc. flight simulator. The main emphasis in this thesis is on the action-model learning process within TRAIL. First, the agent records experiences in its environment, either by observing a trainer or by executing a plan. Second, the agent identifies instances of action success or failure during these experiences using a new analysis demonstrating nine possible causes of action failure. Finally, a variant of the Inductive Logic Programming algorithm DINUS is used to induce action models based on the action instances. As the action models are learned, they can be used for constructing plans whose execution contributes to additional learning experiences. Diminishing reliance on the teacher signals successful convergence of the learning process. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1589/CS-TR-97-1589.pdf
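A minimal sketch of the reactive half of such an agent, in the spirit of teleo-reactive rule lists (illustrative names only; TRAIL's learned action models and planner are far richer): each control tick fires the first rule whose condition holds, so goal-directed structure and reactivity coexist in one ordered list.

    # Ordered (condition, action) pairs: earlier rules are closer to the
    # goal, later rules make progress toward enabling the earlier ones.
    def tr_step(state, rules):
        for condition, action in rules:
            if condition(state):
                return action
        return None

    # Toy goal: reach position 10 on a line.
    rules = [(lambda s: s == 10, "done"),
             (lambda s: s < 10, "move_right"),
             (lambda s: True, "move_left")]
    print(tr_step(3, rules))   # -> 'move_right'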
%R CS-TR-97-1590 %Z Tue, 24 Jun 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Complexity Measures for Assembly Sequences %A Goldwasser, Michael %D June 1997 %X Our work focuses on various complexity measures for two-handed assembly sequences. For many products, there exists an exponentially large set of valid sequences, and a natural goal is to use automated systems to select wisely from the choices. Although there has been a great deal of algorithmic success in finding feasible assembly sequences, there has been very little success towards optimizing the costs of sequences. We attempt to explain this lack of progress by proving the inherent difficulty of finding optimal, or even near-optimal, assembly sequences. To begin, we define "virtual assembly sequencing", a graph-theoretic problem that is a generalization of assembly sequencing, focusing on the combinatorial aspect of the family of feasible assembly sequences while temporarily separating out the specific geometric assumptions inherent to assembly sequencing. We formally prove the hardness of finding even near-optimal sequences for most cost measures in our generalized framework. As a special case, we prove similar, strong inapproximability results for the problem of scheduling with AND/OR precedence constraints. Finally, we re-introduce the geometry and continue by realizing several of these hardness results in rather simple geometric settings. We are able to show strong inapproximability results, for example, using an assembly consisting solely of unit disks in the plane. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1590/CS-TR-97-1590.pdf %R CS-TR-97-1595 %Z Fri, 19 Sep 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Maintaining data warehouses under limited source access %A Huyn, Nam %D September 1997 %X A data warehouse stores views derived from data that may not reside at the warehouse. Using these materialized views, user queries can be answered quickly because querying the external sources where the base data reside is avoided. However, when the sources change, the views in the warehouse can become inconsistent with the base data and must be maintained. A variety of approaches have been proposed for maintaining these views incrementally. At one end of the spectrum, the required view updates are computed without restricting which base relations can be used. View maintenance with this approach is simple but can be expensive, since it may involve querying the external data sources. At the other end of the spectrum, additional views are stored at the warehouse to make sure that there is enough information to maintain the views without ever having to query the data sources. While this approach saves on external source access, it may require a large amount of information to be stored and maintained at the warehouse. In this thesis, we propose an intermediate approach to warehouse maintenance based on what we call {\em Runtime View Self-Maintenance}, where the views are incrementally maintained without using all the base relations but without requiring additional views to facilitate maintenance. Under limited information, however, maintaining a view unambiguously may not always be possible. Thus, the main questions in runtime view self-maintenance are:
- View self-maintainability: under what conditions (on the given information) can a view be maintained unambiguously with respect to a given update?
- View self-maintenance: if a view can be maintained unambiguously, how do we maintain it using only the given information?
The information we consider using for maintaining a view includes:
- at a minimum, the contents of the view itself and the update instance;
- optionally, the contents of other views in the warehouse, functional dependencies the base relations are known to satisfy, a subset of the base relations, and partial contents of a base relation.
Developing efficient, complete solutions for the runtime self-maintenance of conjunctive-query views is the main focus and the main contribution of this thesis. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1595/CS-TR-97-1595.pdf %R CS-TR-97-1594 %Z Wed, 17 Sep 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Interval and Point-Based Approaches to Hybrid System Verification %A Kapur, Arjun %D September 1997 %X Hybrid systems are real-time systems consisting of both continuous and discrete components. This thesis presents deductive and diagrammatic methodologies for proving point-based and interval-based properties of hybrid systems, where the hybrid system is modeled in either a sampling semantics or a continuous semantics. Under a sampling semantics, the behavior of the system consists of a discrete number of system snapshots, where each snapshot records the state of the system at a particular moment in time. Under a continuous semantics, the system behavior is given by a function mapping each point in time to a system state. Two continuous semantics are studied: a continuous interval semantics, where at any given point in time the system is in a unique state, and a super-dense semantics, where no such requirement is imposed. We use Linear-time Temporal Logic for expressing properties under either a sampling semantics or a super-dense semantics, and we introduce Hybrid Temporal Logic for expressing properties under a continuous interval semantics. Linear-time Temporal Logic is useful for expressing point-based properties, whose validity is dependent on individual states, while Hybrid Temporal Logic is useful for expressing both interval-based properties, whose validity is dependent on intervals of time, and point-based properties. Finally, two different verification methodologies are presented: a diagrammatic approach for verifying properties specified in Linear-time Temporal Logic, and a deductive approach for verifying properties specified in Hybrid Temporal Logic. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1594/CS-TR-97-1594.pdf %R CS-TR-97-1596 %Z Mon, 13 Oct 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Distributed Development of a Logic-Based Controlled Medical Terminology %A Campbell, Keith Eugene %D October 1997 %X A controlled medical terminology (CMT) encodes clinical data: patients' physical signs, symptoms, and diagnoses. Application developers lack a robust CMT and the methodologies needed to coordinate terminology development within and between projects. In this dissertation, I argue that if a formal terminology model is adopted and integrated into a change-management process that supports dynamic CMTs, then CMTs can evolve from being an impediment to application development and data analysis to a valuable resource.
My thesis states that such an evolutionary approach can be supported by using semantics-based methods for managing concurrent terminology development, thereby bypassing the disadvantages of traditional lock-based approaches common in database systems. By allowing developers to work concurrently on the terminology while relying on semantics-based methods to resolve the "collisions" that are inevitable in concurrent work, a scalable approach to terminology development can be supported. This dissertation discusses CMT development in terms of three research topics:
1. Representation of Clinical Data
2. Concurrency Control
3. Configuration Management
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1596/CS-TR-97-1596.pdf %R CS-TR-97-1598 %Z Mon, 08 Dec 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Query Planning and Optimization in Information Integration %A Duschka, Oliver M. %D December 1997 %X Information integration systems provide uniform user interfaces to a variety of different information sources. Our work focuses on query planning in such systems. Query planning is the task of transforming a user query, represented in the user's interface language and vocabulary, into queries that can be executed by the information sources. Every information source might require a different query language and might use a different vocabulary. We show that query plans with a fixed number of database operations are insufficient to extract all information from the sources if functional dependencies or limitations on binding patterns are present. Dependencies complicate query planning because they allow query plans that would otherwise be invalid. We present an algorithm that constructs query plans that are guaranteed to extract all available information in these more general cases. This algorithm is also able to handle datalog user queries. We examine further extensions of the languages allowed for user queries and for describing information sources: disjunction, recursion, and negation in source descriptions, and negation and inequality in user queries. For these more expressive cases, we determine the data complexity required of languages able to represent "best possible" query plans. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1598/CS-TR-97-1598.pdf %R CS-TR-97-1600 %Z Mon, 22 Dec 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T An Implementation of a Combinatorial Approximation Algorithm for Minimum-Cost Multicommodity Flow %A Goldberg, Andrew %A Oldham, Jeffrey D. %A Plotkin, Serge %A Stein, Cliff %D December 1997 %X The minimum-cost multicommodity flow problem involves simultaneously shipping multiple commodities through a single network so that the total flow obeys arc capacity constraints and has minimum cost. Multicommodity flow problems can be expressed as linear programs, and most theoretical and practical algorithms use linear-programming algorithms specialized for the problems' structures. Combinatorial approximation algorithms yield flows with costs slightly larger than the minimum cost and use capacities slightly larger than the given capacities. Theoretically, the running times of these algorithms are much less than those of linear-programming-based algorithms. We combine and modify the theoretical ideas in these approximation algorithms to yield a fast, practical implementation solving the minimum-cost multicommodity flow problem.
Experimentally, the algorithm solved our problem instances (to 1% accuracy) two to three orders of magnitude faster than the linear-programming package CPLEX and the linear-programming-based multicommodity flow program PPRN. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1600/CS-TR-97-1600.pdf %R CS-TR-97-1597 %Z Fri, 07 Nov 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Earth Mover's Distance: Lower Bounds and Invariance under Translation %A Cohen, Scott %A Guibas, Leonidas %D November 1997 %X The Earth Mover's Distance (EMD) between two finite distributions of weight is proportional to the minimum amount of work required to transform one distribution into the other. Current content-based retrieval work in the Stanford Vision Laboratory uses the EMD as a common framework for measuring image similarity with respect to color, texture, and shape content. In this report, we present some fast-to-compute lower bounds on the EMD which may allow a system to avoid exact, more expensive EMD computations during query processing. The effectiveness of the lower bounds is tested in a color-based retrieval system. In addition to the lower bound work, we also show how to compute the EMD under translation. In this problem, the points in one distribution are free to translate, and the goal is to find a translation that minimizes the EMD to the other distribution. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1597/CS-TR-97-1597.pdf %R CS-TR-97-1599 %Z Wed, 17 Dec 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Trial Banks: An Informatics Foundation for Evidence-Based Medicine %A Sim, Ida, MD, PhD %D December 1997 %X Randomized clinical trials constitute one of our main sources of medical knowledge, yet trial reports are difficult to find, read, and apply to clinical care. I propose that authors report trials both as entries into electronic knowledge bases - or trial banks - and as text articles in traditional journals. Trial banks should be interoperable, and we thus require a shared ontology of clinical-trial concepts. My thesis work is the design, implementation, and evaluation of such an ontology. Using a new approach called competency decomposition, I show that my ontology design is reasonable, and that the ontology is competent for three of the four core tasks of clinical-trials interpretation for a broad range of trial types. Using this ontology, I implemented a frame-based trial bank that can be queried dynamically over the World Wide Web. Clinical researchers successfully used this system to critique trials in the trial bank. With the advent of digital publication, we have a window of opportunity to design our publication systems such that they support the transfer of evidence from the research world to the clinic. This dissertation presents foundational work for an interoperating trial-bank system that will help us achieve the day-to-day practice of evidence-based medicine. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1599/CS-TR-97-1599.pdf %R CS-TR-97-1592 %Z Tue, 15 Jul 97 00:00:00 GMT %I Stanford University, Department of Computer Science %T Online Throughput-Competitive Algorithm for Multicast Routing and Admission Control %A Goel, Ashish %A Henzinger, Monika R. %A Plotkin, Serge %D July 1997 %X We present the first polylog-competitive online algorithm for the general multicast problem in the throughput model.
The ratio of the number of requests accepted by the optimum offline algorithm to the expected number of requests accepted by our algorithm is polylogarithmic in M and n, where M is the number of multicast groups and n is the number of nodes in the graph. We show that this is close to optimum by presenting an Omega(log n log M) lower bound on this ratio for any randomized online algorithm against an oblivious adversary. We also show that it is impossible to be competitive against an adaptive online adversary. As in previous online routing algorithms, our algorithm uses edge costs when deciding which path is best. In contrast to the previous competitive algorithms in the throughput model, our cost is not a direct function of the edge load. The new cost definition allows us to decouple the effects of routing and admission decisions of different multicast groups. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/97/1592/CS-TR-97-1592.pdf %R CS-TR-98-1602 %Z Fri, 20 Feb 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Type Systems for Object-Oriented Programming Languages %A Fisher, Kathleen %D February 1998 %X Object-oriented programming languages (OOPLs) provide important support for today's large-scale software projects. Unfortunately, typed OOPLs have suffered from overly restrictive type systems that have forced programmers to use type-casts to achieve flexibility, a notorious source of hard-to-find bugs. One source of this inflexibility is the conflation of subtyping and inheritance, which reduces potential code reuse. Attempts to fix this rigidity have resulted in unsound type systems, most notably Eiffel's. This thesis develops a sound type system for a formal object-oriented language. It gains flexibility by separating subtyping and inheritance and by supporting method specialization, which allows the types of methods to be refined during inheritance. The lack of such a mechanism is a key source of type-casts in languages like C++. Abstraction primitives in this formal language support a class construct similar to the one found in C++ and Java, explaining the link between inheritance and subtyping: object types that include implementation information are a form of abstract type, and the only way to produce a subtype of an abstract type is via inheritance. Formally, the language is presented as an object calculus. The thesis proves type soundness with respect to an operational semantics via a subject reduction theorem. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1602/CS-TR-98-1602.pdf %R CS-TR-98-1603 %Z Thu, 05 Mar 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Using Complete Machine Simulation to Understand Computer System Behavior %A Herrod, Stephen Alan %D March 1998 %X This dissertation describes complete machine simulation, a novel approach to understanding the behavior of modern computer systems. Complete machine simulation models all of the hardware found in modern computer systems, allowing it to investigate the behavior of highly configurable machines running commercial operating systems and important workloads such as database and web servers. Complete machine simulation extends the applicability of traditional machine simulation techniques by addressing speed and data organization challenges. To achieve the speed needed to investigate long-running workloads, complete machine simulation allows an investigator to dynamically adjust the characteristics of its hardware simulation.
An investigator can select a high-speed, low-detail simulation setting to quickly pass through uninteresting portions of a workload's execution. Once the workload has reached a more interesting execution state, an investigator can switch to slower, more detailed simulation to obtain behavioral information. To efficiently organize low-level hardware simulation data into more useful information, complete machine simulation provides several mechanisms that incorporate higher-level workload knowledge into the data management process. These mechanisms are efficient and further improve simulation speed by customizing all data collection and reporting to the specific needs of an investigation. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1603/CS-TR-98-1603.pdf %R CS-TR-98-1604 %Z Thu, 19 Mar 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Theory and Applications of Steerable Functions %A Teo, Patrick C. %D March 1998 %X A function is called steerable if transformed versions of the function can be expressed using linear combinations of a fixed set of basis functions. In this dissertation, we propose a framework, based on Lie group theory, for studying and constructing functions steerable under any smooth transformation group. Existing analytical approaches to steerability are consistently explained within the framework. The design of a suitable set of basis functions given any arbitrary steerable function is one of the main problems concerning steerable functions. To this end, we have developed two different algorithms. The first algorithm is a symbolic method that derives the minimal set of basis functions automatically given an arbitrary steerable function. In practice, functions that need to be steered might not be steerable with a finite number of basis functions. Moreover, it is often the case that only a small subset of transformations within the group of transformations needs to be considered. In response to these two concerns, the second algorithm computes the optimal set of k basis functions to steer an arbitrary function under a subset of the group of transformations. Lastly, we demonstrate the usefulness of steerable functions in a variety of applications. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1604/CS-TR-98-1604.pdf %R CS-TR-98-1605 %Z Mon, 06 Apr 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Learning to Surf: Multiagent Systems for Adaptive Web Page Recommendation %A Balabanovic, Marko %D March 1998 %X Imagine a newspaper personalized for your tastes. Instead of a selection of articles chosen for a general audience by a human editor, a software agent picks items just for you, covering your particular topics of interest. Since there are no journalists at its disposal, the agent searches the Web for appropriate articles. Over time, it uses your feedback on recommended articles to build a model of your interests. This thesis investigates the design of "recommender systems" which create such personalized newspapers. Two research issues motivate this work and distinguish it from approaches usually taken by information retrieval or machine learning researchers. First, a recommender system will have many users, with overlapping interests. How can this be exploited? Second, each edition of a personalized newspaper consists of a small set of articles. Techniques for deciding on the relevance of individual articles are well known, but how is the composition of the set determined? 
One of the primary contributions of this research is an implemented architecture linking populations of adaptive software agents. Common interests among its users are used both to increase efficiency and scalability, and to improve the quality of recommendations. A novel interface infers document preferences by monitoring user drag-and-drop actions, and affords control over the composition of sets of recommendations. Results are presented from a variety of experiments: user tests measuring learning performance, simulation studies isolating particular tradeoffs, and usability tests investigating interaction designs. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1605/CS-TR-98-1605.pdf %R CS-TR-98-1607 %Z Mon, 18 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Network-Centric Design for Relationship-Based Rights Management %A Roscheisen, Martin %D May 1998 %X Networked environments such as the Internet provide a new platform for communication and information access. In this thesis, we address the question of how to articulate and enforce boundaries of control on top of this platform, while enabling collaboration and sharing in a peer-to-peer environment. We develop the concepts and technologies for a new Internet service layer, called FIRM, that enables structured rights/relationship management. Using a prototype implementation, RManage, we show how FIRM makes it possible to unify rights/relationship management from a user-centered perspective and to support full end-to-end integration of shared control state in network services and users' client applications. We present a network-centric architecture for managing control information, which generalizes previous, client/server-based models to a peer-to-peer environment. Principles and concepts from contract law are used to identify a generic way of representing the shared structure of different kinds of relationships. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1607/CS-TR-98-1607.pdf %R CS-TR-98-1606 %Z Thu, 14 May 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Associative Caching in Client-Server Databases %A Basu, Julie %D May 1998 %X Client-server configuration is a popular architecture for modern databases. A traditional assumption in such systems is that clients have limited resources, and query processing is always performed by the server. The server is thus a potential performance bottleneck. To improve the system performance and scalability, today's powerful clients can cache data locally. In this dissertation, we study a new scheme, A*Cache, for associative client-side caching. In contrast to navigational data access using object or page identifiers, A*Cache supports content-based associative access for better data reuse. Query results are stored locally along with their description, and predicate-based reasoning is used to examine and maintain the client cache. Clients execute queries locally if the data is cached, and use update notifications generated by the server for cache maintenance. We first describe the architecture of A*Cache and its transaction execution model. We then develop new optimization techniques for improving the performance of A*Cache. Next, A*Cache performance is investigated through detailed simulation of a client-server database under many different workloads, and compared with other types of caching systems. 
The simulation results clearly demonstrate the effectiveness of our associative caching scheme for read-only environments, and also for read-write scenarios with moderately high data update probabilities. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1606/CS-TR-98-1606.pdf %R CS-TR-98-1608 %Z Fri, 12 Jun 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T A New Perspective on Partial Evaluation and Use Analysis %A Katz, Morris J. %D June 1998 %X Partial evaluators are compile time optimizers achieving performance improvements through a program modification technique called specialization. Partial evaluators produce one or more copies, or specializations, of each procedure in a source program in the output program. Specializations are distinguished by being optimized for invocation from call sites with different characteristics, for example, placing certain constraints on argument values. Specializations are created by partially executing procedures, leaving only unexecutable portions as residual code. Symbolic execution can replace variable references by the referenced values, executed primitives by their computed results, and function applications by the bodies of the applied functions, yielding inlining. One core challenge of partial evaluation is selecting what specializations to create. Attempting to produce an infinite number of specializations results in divergence. The termination mechanism of a partial evaluator decides whether or not to symbolically execute a procedure in order to create a new specialization. Creating a termination mechanism that precludes divergence is not difficult. However, crafting a termination mechanism resulting in the production of a sufficient number of appropriate specializations to produce high quality residual code while still terminating all, or most, of the time is quite challenging. This dissertation presents a new type of analysis, called use analysis, forming the basis of a termination mechanism designed to yield a better combination of residual code quality and frequent termination than the current state-of-the-art. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1608/CS-TR-98-1608.pdf %R CS-TR-98-1601 %Z Fri, 12 Jun 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Formal Verification of Probabilistic Systems %A Alfaro, Luca de %D June 1998 %X This dissertation presents methods for the formal modeling and specification of probabilistic systems, and algorithms for the automated verification of these systems. Our system models describe the behavior of a system in terms of probability, nondeterminism, fairness and time. The formal specification languages we consider are based on extensions of branching-time temporal logics, and enable the expression of single-event and long-run average system properties. This latter class of properties, not expressible with previous formal languages, includes most of the performance properties studied in the field of performance evaluation, such as system throughput and average response time. Our choice of system models and specification languages has been guided by the goal of providing efficient verification algorithms. The algorithms rely on the theory of Markov decision processes, and exploit a connection between the graph-theoretical and probabilistic properties of these processes. 
This connection also leads to new results about classical problems, such as an extension to the solvable cases of the stochastic shortest path problem, an improved algorithm for the computation of reachability probabilities, and new results on the average reward problem for semi-Markov decision processes. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1601/CS-TR-98-1601.pdf %R CS-TR-98-1609 %Z Tue, 28 Jul 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Automated creation of clinical-practice guidelines from decision models %A Sanders, Gillian D. %D July 1998 %X I developed an approach that allows clinical-practice guideline (CPG) developers to create, disseminate, and tailor CPGs, using decision models (DMs). I propose that guideline developers can use computer-based DMs that reflect global and site-specific data to generate CPGs. Such CPGs are high quality, can be tailored to specific settings, and can be modified automatically as the DM or evidence evolves. I defined conceptual models for representing CPGs and DMs, and formalized a method for mapping between these two representations. I designed a DM annotation editor that queries the decision analyst for missing knowledge. I implemented the ALCHEMIST system that encompasses the conceptual models, mapping algorithm, and the resulting tailoring abilities. I evaluated the design of both conceptual models, and the accuracy of the mapping algorithm. To show that ALCHEMIST produces high-quality CPGs, I had users rate the quality of produced CPGs using a guideline-rating key, and evaluate ALCHEMIST's tailoring abilities. ALCHEMIST automates the DM-to-CPG process and distributes the CPG over the web to allow local developers to apply, tailor, and maintain a global CPG. I argue that my framework is a method for guideline developers to create and maintain automated CPGs, and it thus promotes high-quality and cost-effective health care. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1609/CS-TR-98-1609.pdf %R CS-TR-98-1611 %Z Tue, 15 Sep 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Approximation Algorithms for Scheduling Problems %A Chekuri, Chandra %D September 1998 %X This thesis describes efficient approximation algorithms for some NP-Hard deterministic machine scheduling and related problems. We study the objective functions of minimizing makespan (the time to complete all jobs) and minimizing average completion time in a variety of settings described below. 1. Minimizing average completion time and its weighted generalization for single and parallel machine problems. We introduce new techniques that either improve earlier results and/or result in simple and efficient approximation algorithms. In addition to improved results for specific problems, we give a general algorithm that converts an x approximate single machine schedule into a (2x + 2) approximate parallel machine schedule. 2. Minimizing makespan on machines with different speeds when jobs have precedence constraints. We obtain an O(log m) approximation (m is the number of machines) in O(n^3) time. 3. We introduce a class of new scheduling problems that arise from query optimization in parallel databases. The novel aspect consists of modeling communication costs in query execution. We devise algorithms for pipelined operator scheduling. We obtain a PTAS and also simpler O(n log n) time algorithms with ratios of 3.56 and 2.58. 4. 
Multi-dimensional generalizations of three well-known problems in combinatorial optimization: multi-processor scheduling, bin packing, and the knapsack problems. We obtain several approximability and inapproximability results. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1611/CS-TR-98-1611.pdf %R CS-TR-98-1613 %Z Fri, 02 Oct 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T On the synchronization of Poisson processes and queueing networks with service and synchronization nodes. %A Prabhakar, Balaji %A Bambos, Nicholas %A Mountford, Tom %D October 1998 %X This paper investigates the dynamics of a synchronization node in isolation, and of networks of service and synchronization nodes. A synchronization node consists of $M$ infinite capacity buffers, where tokens arriving on $M$ distinct random input flows are stored (there is one buffer for each flow). Tokens are held in the buffers until one is available from each flow. When this occurs, a token is drawn from each buffer to form a group-token, which is instantaneously released as a synchronized departure. Under independent Poisson inputs, the output of a synchronization node is shown to converge weakly (and in certain cases strongly) to a Poisson process with rate equal to the minimum rate of the input flows. Hence synchronization preserves the Poisson property, as do superposition, Bernoulli sampling and M/M/1 queueing operations. We then consider networks of synchronization and exponential server nodes with Bernoulli routing and exogenous Poisson arrivals, extending the standard Jackson Network model to include synchronization nodes. It is shown that if the synchronization skeleton of the network is acyclic (i.e. no token visits any synchronization node twice although it may visit a service node repeatedly), then the distribution of the joint queue-length process of only the service nodes is product form (under standard stability conditions) and easily computable. Moreover, the network output flows converge weakly to Poisson processes. Finally, certain results for networks with finite capacity buffers are presented, and the limiting behavior of such networks as the buffer capacities become large is studied. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1613/CS-TR-98-1613.pdf %R CS-TR-98-1614 %Z Fri, 16 Oct 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Decomposing, Transforming and Composing Diagrams: The Joys of Modular Verification %A Alfaro, Luca de %A Manna, Zohar %A Sipma, Henny %D October 1998 %X The paper proposes a modular framework for the verification of temporal logic properties of systems based on the deductive transformation and composition of diagrams. The diagrams represent abstractions of the modules composing the system, together with information about the environment of the modules. The proof of a temporal specification is constructed with the help of diagram transformation and composition rules, which enable the gradual decomposition of the system into manageable modules, the study of the modules, and the final combination of the diagrams into a proof of the specification. We illustrate our methodology with the modular verification of a database demarcation protocol.
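Returning to the synchronization node of CS-TR-98-1613 above, its rate result is easy to make plausible by simulation: with one infinite buffer per flow and instantaneous departures, the k-th group-token leaves at the maximum of the k-th arrival times over all flows, so the empirical output rate approaches the minimum input rate. A rough Python sketch (illustrative; the rates and horizon are arbitrary):

    import random

    def sync_node_output_rate(rates, horizon, seed=0):
        rng = random.Random(seed)
        arrivals = []
        for lam in rates:  # independent Poisson arrival streams
            t, times = 0.0, []
            while t < horizon:
                t += rng.expovariate(lam)
                times.append(t)
            arrivals.append(times)
        # the k-th departure occurs once every buffer holds its k-th token
        n = min(len(a) for a in arrivals)
        departures = [max(a[k] for a in arrivals) for k in range(n)]
        return sum(1 for d in departures if d <= horizon) / horizon

    print(sync_node_output_rate([3.0, 5.0], horizon=10000.0))  # close to 3.0
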
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1614/CS-TR-98-1614.pdf %R CS-TR-98-1612 %Z Thu, 01 Oct 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Pleiades Project: Collected Work 1997-1998 %A Cervesato, Iliano, (editor) %A Mitchell, John C., (editor) %D October 1998 %X This report collects the papers that were written by the participants of the Pleiades Project and their collaborators from April 1997 to August 1998. Its intent is to give the reader an overview of our accomplishments during this initial phase of the project. Therefore, rather than including complete publications, we chose to reproduce only the first four pages of each paper. In order to satisfy the legitimate curiosity of readers interested in specific articles, each paper can be integrally retrieved from the World-Wide Web through the provided URL. A list of the current publications of the Pleiades Project is accessible at the URL http://theory.stanford.edu/muri/papers.html. Future articles will be posted there as they become available. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1612/CS-TR-98-1612.pdf %R CS-TR-98-1615 %Z Tue, 15 Dec 98 00:00:00 GMT %I Stanford University, Department of Computer Science %T Using Machine Learning to Improve Information Access %A Sahami, Mehran %D December 1998 %X We address the problem of topical information space navigation. Specifically, we combine query tools with methods for automatically creating topic taxonomies in order to organize text collections. Our system, named SONIA (Service for Organizing Networked Information Autonomously), is implemented in the Stanford Digital Libraries testbed. It employs several novel probabilistic Machine Learning methods that enable the automatic creation of dynamic topic hierarchies based on the full-text content of documents. First, to generate such topical hierarchies, we employ a novel clustering scheme that outperforms traditional methods used in both Information Retrieval and Probabilistic Reasoning. Furthermore, we develop methods for classifying new articles into such automatically generated, or existing manually generated, hierarchies. Our method explicitly uses the hierarchical relationships between topics to improve classification accuracy. Much of this improvement is derived from the fact that the classification decisions in such a hierarchy can be made by considering only the presence (or absence) of a small number of features (words) in each document. The choice of relevant words is made using a novel information theoretic algorithm for feature selection. The algorithms used in SONIA are also general enough to have been successfully applied to data mining problems in different domains than text. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/98/1615/CS-TR-98-1615.pdf %R CS-TR-99-1617 %Z Thu, 11 Feb 99 00:00:00 GMT %I Stanford University, Department of Computer Science %T Segmentation of Medical Image Volumes Using Intrinsic Shape Information %A Shiffman, Smadar %D February 1999 %X I propose a novel approach to segmentation of image volumes that requires only a small amount of user intervention and that does not rely on prior global shape models. The approach, intrinsic shape for volume segmentation (IVSeg), comprises two methods. T he first method analyzes isolabel-contour maps to identify salient regions that correspond to major objects. 
The method detects transitions from within objects into the background by matching isolabel contours that form along the boundaries of objects as a result of multilevel thresholding with a fine partition of the intensity range. The second method searches in the entire sequence for regions that belong to an object that the user selects from one or a few sections. The method uses local overlap criteria to determine whether regions that overlap in a given direction (coronal, sagittal, or axial) belong to the same object. For extraction of blood vessels, the method derives the criteria dynamically by fitting cylinders to regions in consecutive sections and computing the expected overlap of slices of these cylinders. In a formal evaluation study with CTA data, I showed that IVSeg reduced user editing time by a factor of 5 without affecting the results in any significant way. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/99/1617/CS-TR-99-1617.pdf %R CS-TR-99-1618 %Z Fri, 26 Mar 99 00:00:00 GMT %I Stanford University, Department of Computer Science %T Abstraction-based Deductive-Algorithmic Verification of Reactive Systems %A Uribe, Tomas E. %D March 1999 %X This thesis presents a framework that combines deductive and algorithmic methods for verifying temporal properties of reactive systems, to allow more automatic verification of general infinite-state systems and the verification of larger finite-state ones. Underlying these methods is the theory of property-preserving assertion-based abstractions, where a finite-state abstraction of the system is deductively justified and algorithmically model checked. After presenting an abstraction framework that accounts for fairness, we describe a method to automatically generate finite-state abstractions. We then show how a number of other verification methods, including deductive rules, (Generalized) Verification Diagrams, and Deductive Model Checking, can also be understood as constructing finite-state abstractions that are model checked. Our analysis leads to a better classification and understanding of these verification methods. Furthermore, it shows how the different abstractions that they construct can be combined. For this, we present an algorithmic Extended Model Checking procedure, which uses all the information that these methods produce, in a finite-state format that can be easily and incrementally combined. Besides a standard safety component, the combined abstractions include extra bounds on fair transitions, well-founded orders, and constrained transition relations for the generation of counterexamples. Thus, our approach minimizes the need for user interaction and maximizes the impact of the available automated deduction and model checking tools. Once proved, verification conditions are re-used as much as possible, leaving the temporal and combinatorial reasoning to automatic tools. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/99/1618/CS-TR-99-1618.pdf %R CS-TR-99-1620 %Z Fri, 28 May 99 00:00:00 GMT %I Stanford University, Department of Computer Science %T Finding Color and Shape Patterns in Images %A Cohen, Scott %D May 1999 %X This thesis is devoted to the Earth Mover's Distance (EMD), an edit distance between distributions, and its use within content-based image retrieval (CBIR). The major CBIR problem discussed is the pattern problem: Given an image and a query pattern, determine if the image contains a region which is visually similar to the pattern; if so, find at least one such image region.
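For one-dimensional histograms of equal total mass on unit-spaced bins, the EMD has a closed form that makes the mass-moving intuition concrete: it equals the L1 distance between the two cumulative distributions. A minimal Python sketch (illustrative only; the thesis treats far more general distributions and ground distances):

    import numpy as np

    def emd_1d(p, q):
        # equal-mass 1-D histograms on unit-spaced bins:
        # EMD = sum of |CDF(p) - CDF(q)|
        p, q = np.asarray(p, float), np.asarray(q, float)
        assert np.isclose(p.sum(), q.sum())
        return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())

    print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0: one unit of mass moved 2 bins
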
An important problem that arises in applying the EMD to CBIR is the EMD under transformation (EMD_G) problem: find a transformation of one distribution which minimizes its EMD to another, where the set of allowable transformations G is given. The problem of estimating the size/scale at which a pattern occurs in an image is phrased and efficiently solved as an EMD_G problem. For a large class of transformation sets, we also present a monotonically convergent iteration to find at least a locally optimal transformation. Our pattern problem solution is the SEDL (Scale Estimation for Directed Location) image retrieval system. Three important contributions of SEDL are (1) a general framework for finding both color and shape patterns, (2) the previously mentioned scale estimation algorithm using the EMD, and (3) a directed (as opposed to exhaustive) search strategy. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/99/1620/CS-TR-99-1620.pdf %R CS-TR-99-1619 %Z Wed, 07 Apr 99 00:00:00 GMT %I Stanford University, Department of Computer Science %T Intelligent Alarms: Allocating Attention Among Concurrent Processes %A Huang, Cecil %D April 1999 %X I have developed and evaluated a computable, normative framework for intelligent alarms: automated agents that allocate scarce attention resources to concurrent processes in a globally optimal manner. My approach is decision-theoretic, and relies on Markov decision processes to model time-varying, stochastic systems that respond to externally applied actions. Given a collection of continuing processes and a specified time horizon, my framework computes, for each process: (1) an attention allocation, which reflects how much attention the process is awarded, and (2) an activation price, which reflects the process's priority in receiving the allocated attention amount. I have developed a prototype, Simon, that computes these alarm signals for a simulated ICU. My validity experiments investigate whether sensible input results in sensible output. The results show that Simon produces alarm signals that are consistent with sound clinical judgment. To assess computability, I used Simon to generate alarm signals for an ICU that contained 144 simulated patients; the entire computation took about 2 seconds on a machine with only moderate processing capabilities. I thus conclude that my alarm framework is valid and computable, and therefore is potentially useful in a real-world ICU setting. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/99/1619/CS-TR-99-1619.pdf %R CS-TR-99-1623 %Z Mon, 23 Aug 99 00:00:00 GMT %I Stanford University, Department of Computer Science %T Efficient Maintenance and Recovery of Data Warehouses %A Labio, Wilburt Juan %D August 1999 %X Data warehouses collect data from multiple remote sources and integrate the information as materialized views in a local database. The materialized views are used to answer queries that analyze the collected data for patterns and trends. This type of query processing is often called on-line analytical processing (OLAP). The warehouse views must be updated when changes are made to the remote information sources. Otherwise, the answers to OLAP queries are based on stale data. Answering OLAP queries based on stale data is clearly a problem, especially if OLAP queries are used to support critical decisions made by the organization that owns the data warehouse.
Because the primary purpose of the data warehouse is to answer OLAP queries, only a limited amount of time and/or resources can be devoted to the warehouse update. Hence, we have developed new techniques to ensure that the warehouse update can be done efficiently. Also, the warehouse update is not devoid of failures. Since only a limited amount of time and/or resources are devoted to the warehouse update, it is most likely infeasible to restart the warehouse update from scratch. Thus, we have developed new techniques for resuming failed warehouse updates. Finally, warehouse updates typically transfer gigabytes of data into the warehouse. Although the price of disk storage is decreasing, there will be a point in the ``lifetime'' of a data warehouse when keeping and administering all of the collected data is unreasonable. Thus, we have investigated techniques for reducing the storage cost of a data warehouse by selectively ``expiring'' information that is not needed. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/99/1623/CS-TR-99-1623.pdf %R CS-TR-99-1622 %Z Mon, 23 Aug 99 00:00:00 GMT %I Stanford University, Department of Computer Science %T Multicommodity and Generalized Flow Algorithms: Theory and Practice %A Oldham, Jeffrey David %D August 1999 %X We present several simple, practical, and fast algorithms for linear programs, concentrating on network flow problems. Since the late 1980s, researchers developed different combinatorial approximation algorithms for fractional packing problems, obtaining the fastest theoretical running times to solve multicommodity minimum-cost and concurrent flow problems. A direct implementation of these multicommodity flow algorithms was several orders of magnitude slower than solving these problems using a commercial linear programming solver. Through experimentation, we determined which theoretically equivalent constructs are experimentally efficient. Guided by theory, we designed and implemented practical improvements while maintaining the same worst-case complexity bounds. The resulting algorithms solve problems orders of magnitude faster than commercial linear programming solvers and problems an order of magnitude larger. We also present simple, combinatorial algorithms for generalized flow problems. These problems generalize ordinary network flow problems by specifying a flow multiplier \mu(a) for each arc a. Using multipliers permits a flow problem to model transforming one type into another, e.g., currency exchange, and modification of the amount of flow, e.g., water evaporation from canals or accrual of interest in bank accounts. First, we show the generalized shortest paths problem can be solved using existing network flow ideas, i.e., by combining the Bellman-Ford-Moore shortest path framework and Megiddo's parametric search. Second, we combine this algorithm with fractional packing frameworks to yield the first polynomial-time combinatorial approximation algorithms for the generalized versions of the nonnegative-cost minimum-cost flow, concurrent flow, multicommodity maximum flow, and multicommodity nonnegative-cost minimum-cost flow problems. These algorithms show that generalized concurrent flow and multicommodity maximum flow have strongly polynomial approximation algorithms.
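The currency-exchange reading of arc multipliers suggests a small worked instance of the generalized shortest paths problem: the most favorable conversion path maximizes the product of multipliers, which a Bellman-Ford-style relaxation finds after converting each multiplier mu to the cost -log(mu). The Python sketch below assumes no gain-producing cycles and is not the parametric-search algorithm of the report:

    import math

    def best_conversion(arcs, src, dst):
        # arcs: (from, to, multiplier); maximize the product of multipliers
        nodes = {u for u, v, m in arcs} | {v for u, v, m in arcs}
        dist = {n: math.inf for n in nodes}
        dist[src] = 0.0
        for _ in range(len(nodes) - 1):  # Bellman-Ford relaxation rounds
            for u, v, m in arcs:
                c = -math.log(m)
                if dist[u] + c < dist[v]:
                    dist[v] = dist[u] + c
        return math.exp(-dist[dst])

    arcs = [("USD", "EUR", 0.9), ("EUR", "JPY", 130.0), ("USD", "JPY", 110.0)]
    print(best_conversion(arcs, "USD", "JPY"))  # approximately 117.0, via EUR
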
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/99/1622/CS-TR-99-1622.pdf %R CS-TR-99-1625 %Z Thu, 26 Aug 99 00:00:00 GMT %I Stanford University, Department of Computer Science %T Pleiades Project: Collected Work 1998-1999 %A Cervesato, Iliano (editor) %A Mitchell, John C. (editor) %D August 1999 %X This report collects the papers that were written by the participants of the Pleiades Project and their collaborators from September 1998 to August 1999. Its intent is to give the reader an overview of our accomplishments during this central phase of the project. Therefore, rather than including complete publications, we chose to reproduce only the first four pages of each paper. The papers can be integrally retrieved from the World-Wide Web through the provided URLs. A list of the current publications of the Pleiades Project is accessible at the URL http://theory.stanford.edu/muri/papers.html. Future articles will be posted there as they become available. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/99/1625/CS-TR-99-1625.pdf %R CS-TR-99-1621 %Z Mon, 23 Aug 99 00:00:00 GMT %I Stanford University, Department of Computer Science %T Perceptual Metrics for Image Database Navigation %A Rubner, Yossi %D August 1999 %X The increasing amount of information available in today's world raises the need to retrieve relevant data efficiently. Unlike text-based retrieval, where keywords are successfully used to index into documents, content-based image retrieval poses up front the fundamental questions of how to extract useful image features and how to use them for intuitive retrieval. We present a novel approach to the problem of navigating through a collection of images for the purpose of image retrieval, which leads to a new paradigm for image database search. We summarize the appearance of images by distributions of color or texture features, and we define a metric between any two such distributions. This metric, which we call the "Earth Mover's Distance" (EMD), represents the least amount of work that is needed to rearrange the mass in one distribution in order to obtain the other. We show that the EMD matches perceptual dissimilarity better than other dissimilarity measures, and argue that it has many desirable properties for image retrieval. Using this metric, we employ Multi-Dimensional Scaling techniques to embed a group of images as points in a two- or three-dimensional Euclidean space so that their distances reflect image dissimilarities as well as possible. Such geometric embeddings exhibit the structure in the image set at hand, allowing the user to understand better the result of a database query and to refine the query in a perceptually intuitive way. By iterating this process, the user can quickly zoom in to the portion of the image space of interest. We also apply these techniques to other modalities such as mug-shot retrieval. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/99/1621/CS-TR-99-1621.pdf %R CS-TR-99-1624 %Z Thu, 26 Aug 99 00:00:00 GMT %I Stanford University, Department of Computer Science %T Non-blocking Synchronization and System Design %A Greenwald, Michael %D August 1999 %X Non-blocking synchronization (NBS) has significant advantages over blocking synchronization in areas of fault-tolerance, system structure, portability, and performance. These advantages gain importance with the increased use of parallelism and multiprocessors, and as delays increase relative to processor speed.
This thesis demonstrates that non-blocking synchronization is practical as the sole co-ordination mechanism in systems by showing that careful OS design eases implementation of efficient NBS, by demonstrating that DCAS (Double-Compare-and-Swap) is the necessary and sufficient primitive for implementing NBS, and by demonstrating that efficient hardware DCAS is practical for RISC processors. This thesis presents high-performance non-blocking implementations of common data-structures sufficient to implement an operating system kernel. I also present more general algorithms: non-blocking implementations of CASn and software transactional memory. Both have overhead proportional to the number of writes, support multi-objects, and use a DCAS-based contention-reduction technique that is fault-tolerant and OS-independent yet performs as well as the best previously published techniques. I demonstrate that proposed OS implementations of DCAS are inefficient, and propose a design for efficient hardware DCAS specific to the R4000 but generalizable to other RISC processors. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/99/1624/CS-TR-99-1624.pdf %R CS-TR-00-1631 %Z Fri, 14 Apr 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Early Vision Using Distributions %A Ruzon, Mark A. %D April 2000 %X For over thirty years researchers in computer vision have been proposing new methods for performing ``early vision'' tasks such as detecting edges and corners. One key element shared by most methods is that they represent local image neighborhoods as constant in color or intensity, with deviations modeled as noise. Due to computational considerations that encourage the use of small neighborhoods where this assumption holds, these methods remain popular. This research models a neighborhood as a distribution of colors. Our goal is to show that the increase in accuracy of this representation translates into higher-quality results for early vision tasks on difficult, natural images, especially as neighborhood size increases. We emphasize large neighborhoods because small ones often do not contain enough information. We emphasize color because it subsumes greyscale as an image range and because it limits the number of valid models we should consider; using only greyscale images allows assumptions that do not hold for color. We discuss distributions in the context of three related image boundary tasks: edge detection, corner detection, and estimating alpha, or the percentage with which two colors from two objects mix to form the color of a pixel at a boundary. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/00/1631/CS-TR-00-1631.pdf %R CS-TR-00-1630 %Z Fri, 14 Apr 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Dynamic Categorization: A Method for Decreasing Information Overload %A Pratt, Wanda %D April 2000 %X When people use computer-based tools to find answers to general questions, they often are faced with a daunting list of search results that are returned by the search engine. Many search tools address this problem by helping users to make their searches more specific. However, when dozens or hundreds of documents are relevant to their question, users need tools that help them to explore and to understand their search results, rather than ones that eliminate a portion of those results.
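The DCAS primitive at the center of CS-TR-99-1624 above atomically updates two independent words only if both still hold expected values; real implementations live in hardware or the OS. The Python sketch below merely simulates those semantics with a lock, to show the optimistic retry-loop style in which DCAS-based non-blocking algorithms are written (illustrative only):

    import threading

    class Ref:
        def __init__(self, value):
            self.value = value

    _atomic = threading.Lock()  # stands in for hardware atomicity

    def dcas(r1, old1, new1, r2, old2, new2):
        # atomically: if both locations match their expected values,
        # write both new values and report success
        with _atomic:
            if r1.value == old1 and r2.value == old2:
                r1.value, r2.value = new1, new2
                return True
            return False

    count, total = Ref(0), Ref(0)

    def add(x):
        while True:  # optimistic retry loop; no data lock is ever held
            c, t = count.value, total.value
            if dcas(count, c, c + 1, total, t, t + x):
                return
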
I have developed a new approach, called dynamic categorization, that addresses this problem by automatically organizing search results into meaningful groups that correspond to the user's query. This approach uses knowledge of important kinds of queries and a model of the domain terminology to generate a hierarchical categorization of search results. I implemented this approach for the domain of medicine, where the amount of information in the primary medical literature alone is overwhelming. Results from my evaluation show that a tool based on this approach helps users to find answers to those important kinds of questions more quickly and easily than when they use a relevance-ranking system or a clustering system. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/00/1630/CS-TR-00-1630.pdf %R CS-TR-00-1632 %Z Fri, 02 Jun 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Finite-State Analysis of Security Protocols %A Shmatikov, Vitaly %D June 2000 %X Security protocols are notoriously difficult to design and debug. Even if the cryptographic primitives underlying a protocol are secure, unexpected interactions between parts of the protocol or several instances of the same protocol can lead to catastrophic security breaches. Since protocol attacks tend to be very subtle, some computer assistance is desirable. The main contribution of this thesis is to demonstrate how fully automatic finite-state techniques can be used to analyze a wide variety of security protocols. We present several case studies in which we model security protocols as finite-state systems, then perform automatic exhaustive state search that either discovers an attack or proves the protocol correct subject to the limitations of the model. In our first study, we analyze SSL 3.0, a widely used Internet security protocol. The second study focuses on contract signing protocols designed to guarantee properties such as fairness and accountability. All analyses were performed using a general-purpose finite-state tool called Murphi. To alleviate the state-space explosion problem, we develop several state reduction techniques that exploit fundamental properties of security protocols. These optimizations make analysis of large protocols feasible, and establish Murphi as a viable protocol analysis tool. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/00/1632/CS-TR-00-1632.pdf %R CS-TR-00-1629 %Z Wed, 26 Jul 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Kinetic vertical decomposition trees %A Comba, Joao Luiz Dihl %D March 2000 %X This thesis presents a new structure called the Kinetic Vertical Decomposition Tree (KVD), used for the dynamic maintenance of visibility information for a set of moving objects in space. The KVD is a single structure that not only (1) allows dynamic maintenance of visibility, but also (2) represents a vertical decomposition of the space, (3) allows collision detection among moving objects, and (4) is kinetically maintained based on the kinetic data structures framework. The KVD is a special type of Binary Space Partition tree (BSP), a hierarchical data structure commonly used in solid modeling and computer graphics for feature classification and visibility determination. In the KVD, additional cuts are introduced from edges and vertices, so that a vertical decomposition is formed.
The bounded complexity of the cells in this decomposition allows the creation of certificates that indicate times when the movement of objects causes a change in the decomposition. These certificates are used within the framework of kinetic data structures to identify when the structure of the KVD changes. The update of the KVD involves local changes in the tree, accomplished by special update algorithms. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/00/1629/CS-TR-00-1629.pdf %R CS-TR-00-1633 %Z Wed, 16 Aug 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Roma Personal Metadata Service %A Swierk, Edward %A Kiciman, Emre %A Laviano, Vince %A Baker, Mary %D August 2000 %X People now have available to them a diversity of digital storage devices for their personal files. These devices include palmtops, cell phone address books, laptops, desktop computers and web-based services. Unfortunately, as the number of personal data repositories increases, so does the management problem of ensuring that the most up-to-date version of any document is available to the user on the storage device he is currently using. We introduce the Roma personal metadata service to make it easier to locate current file versions and ensure their availability across different repositories. Roma does this through the use of a centralized, available and usually portable metadata store used by mobility-aware clients. Separating out the metadata store from the repositories eases deployment of the system, since it allows us to use existing repositories without change. In this paper we describe the design requirements, architecture and current prototype implementation of Roma. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/00/1633/CS-TR-00-1633.pdf %R CS-TR-00-1634 %Z Wed, 30 Aug 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Simulation-Based Search for Hybrid System Control and Analysis %A Neller, Todd William %D August 2000 %X This dissertation explores new algorithmic approaches to simulation-based optimization, game-tree search, and tree search for the control and analysis of hybrid systems. Hybrid systems are systems that evolve with both discrete and continuous behaviors. Examples of hybrid systems include diverse mode-switching systems such as those we have used as focus problems: stepper motors, magnetic levitation units, and submarine detection avoidance scenarios. For hybrid systems with complex dynamics, the designer may have little other than simulation as a tool to detect design flaws or inform offline or real-time control. In approaching control and analysis of such systems, we thus limit ourselves to a black-box simulation of the system. Among our algorithmic contributions are: - the first multi-dimensional information-based optimization approach, - a generalization of previous multi-level optimization methods, - information-based alpha-beta game-tree search, - syntheses of cell-mapping and game-tree search techniques, - iterative refinement approaches for dynamic action timing discretization, - a best-first search variant with dynamic time-step refinement, - iterative refinement with an epsilon variant of recursive best-first search, and - a dispersion technique for dynamic action parameter discretization. We also formally define several hybrid system game-tree and tree search problems.
%U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/00/1634/CS-TR-00-1634.pdf %R CS-TR-00-1635 %Z Thu, 31 Aug 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Change Management and Synchronization of Local and Shared Versions of a Controlled Vocabulary %A Oliver, Diane E. %D August 2000 %X To share clinical data and to build interoperating computer systems that permit data entry, data retrieval, and data analysis, users and systems at multiple sites must share a common controlled clinical vocabulary (or ontology). However, local sites that adopt a shared vocabulary have local needs, and local-vocabulary maintainers make changes to the local version of that vocabulary. If the local site is motivated to conform to the shared vocabulary, then the burden lies with the local site to manage its own changes and to incorporate changes from the shared version at periodic intervals. I call this process synchronization. In this dissertation, I present an approach to change management and synchronization of local and shared versions of a controlled vocabulary. I describe the CONCORDIA model, which comprises a structural model, a change model, and a log model to which the shared and local vocabularies conform. I demonstrate use of this model in the implementation of a synchronization-support tool that supports carefully controlled divergence. I evaluated my model and methods by performing synchronization on a small test set of medical concepts in the subdomain of rickettsial diseases. The CONCORDIA model served as an effective approach for representation and communication of vocabulary change. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/00/1635/CS-TR-00-1635.pdf %R CS-TR-00-1636 %Z Thu, 07 Sep 00 00:00:00 GMT %I Stanford University, Department of Computer Science %T Design and Analysis of Fast Low Power SRAMs %A Amrutur, Bharadwaj S. %D September 2000 %X This thesis explores the design and analysis of Static Random Access Memories (SRAMs), focusing on optimizing delay and power. The SRAM access path is split into two portions: from address input to word line rise (the row decoder) and from word line rise to data output (the read data path). Techniques to optimize both of these paths are investigated. We determine the optimal decoder structure for fast low power SRAMs. Optimal decoder implementations result when the decoder, excluding the predecoder, is implemented as a binary tree. We find that skewed circuit techniques with self-resetting gates work the best and evaluate some simple sizing heuristics for low delay and power. We find that the heuristic of using equal fanouts of about 4 per stage works well even with interconnect in the decode path, provided the interconnect delay is reduced by wire sizing. For fast, lower power solutions, the heuristic of reducing the sizes of the input stage in the higher levels of the decode tree allows for good trade-offs between delay and power. The key to low power operation in the SRAM data path is to reduce the signal swings on the high capacitance nodes like the bitlines and the data lines. Clocked voltage sense amplifiers are essential for obtaining low sensing power, and accurate generation of their sense clock is required for high speed operation. We investigate tracking circuits to limit bitline and I/O line swings and aid in the generation of the sense clock to enable clocked sense amplifiers.
The tracking circuits essentially use a replica memory cell and a replica bitline to track the delay of the memory cell over a wide range of process and operating conditions. We present experimental results from two different prototypes. Finally, we look at the scaling trends in the speed and power of SRAMs with size and technology and find that the SRAM delay scales as the logarithm of its size as long as the interconnect delay is negligible. Non-scaling of threshold mismatches with process scaling causes the signal swings in the bitlines and data lines not to scale either, leading to an increase in the relative delay of an SRAM across technology generations. The wire delay starts becoming important for SRAMs beyond the 1Mb generation. Across process shrinks, the wire delay becomes worse, and wire redesign has to be done to keep the wire delay in the same proportion to the gate delay. Hierarchical SRAM structures have enough space over the array for using fat wires, and these can be used to control the wire delay for 4Mb and smaller designs across process shrinks. %U ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/00/1636/CS-TR-00-1636.pdf %R CSL-TN-99-1 %Z Fri, 18 Feb 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A Graph-Oriented Model for Articulation of Ontology Interdependencies %A Mitra, Prasenjit %A Wiederhold, Gio %A Kersten, Martin L. %D August 1999 %X Ontologies are knowledge structures to explicate the contents, essential properties, and relationships between terms in a knowledge source. Many sources are now accessible with associated ontologies. Most prior work on use of ontologies relies on the construction of a single global ontology covering all sources. Such an approach is neither scalable nor maintainable, especially when the sources change frequently. We propose a scalable and easily maintainable approach based on the interoperation of ontologies. To handle user queries crossing the boundaries of the underlying information systems, the interoperation between the ontologies should be precisely defined. Our approach is to use rules that cross the semantic gap by creating an articulation or linkage between the systems. The rules are generated using a semi-automatic articulation tool with the help of a domain expert. To make the ontologies amenable for automatic composition based on the accumulated knowledge rules, we represent them using a graph-oriented model extended with a small algebraic operator set. ONION, a user-friendly toolkit, aids the experts in bridging the semantic gap in real-life settings. Our framework provides a sound foundation to simplify the work of domain experts, enables integration with public semantic dictionaries, like Wordnet, and will derive ODMG-compliant mediators automatically. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tn/99/1/CSL-TN-99-1.pdf %R CSL-TR-83-236 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design of a high performance VLSI processor %A Hennessy, John L. %A Jouppi, Norman %A Przybylski, Steven %A Rowen, Christopher %A Gross, Thomas %D February 1983 %X Current VLSI fabrication technology makes it possible to design a 32-bit CPU on a single chip. However, to achieve high performance from that processor, the architecture and implementation must be carefully designed and tuned. The MIPS processor incorporates some new architectural ideas into a single-chip, nMOS implementation.
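The fanout-of-4 decoder heuristic reported in CS-TR-00-1636 above matches a textbook logical-effort estimate: splitting a path of overall electrical effort H into n stages gives a normalized delay of roughly n * (H^(1/n) + p). A back-of-the-envelope Python sketch (H and the parasitic term p are assumed values for illustration, not data from the thesis):

    H, p = 256.0, 1.0  # assumed overall effort and per-stage parasitic delay

    for n in range(2, 9):
        fanout = H ** (1.0 / n)     # equal effort per stage
        delay = n * (fanout + p)    # normalized path delay
        print(n, round(fanout, 2), round(delay, 2))
    # the minimum occurs at n = 4, i.e. a fanout of about 4 per stage
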
Processor performance is obtained by the careful integration of the software (e.g., compilers), the architecture, and the hardware implementation. This integrated view also simplifies the design, making it practical to implement the processor at a university. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/83/236/CSL-TR-83-236.pdf %R CSL-TR-83-240 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T ADAM: an ADA-based language for multiprocessing %A Luckham, David C. %A von Henke, Frederick W. %A Larsen, H. J. %A Stevenson, D. R. %D May 1983 %X Adam is a high level language for parallel processing. It is intended for programming resource scheduling applications, in particular supervisory packages for runtime scheduling of multiprocessing systems. An important design goal was to provide support for implementation of Ada and its runtime environment. Adam has been used to implement Ada task supervision and also as a high level target language for compilation of Ada tasking. Adam provides facilities that match the Ada sequential constructs (including subprograms, packages, exceptions, generics). In addition there are specialized module constructs for implementation of packages that may be shared between parallel processes. Adam omits the Ada real types but includes some new predefined types for scheduling. The parallel processing constructs of Adam are more primitive than Ada tasking. Some restrictions are enforced on the ways in which parallel processes can interact. A compiler for Adam has been implemented in MacLisp on DEC PDP-10 computers. Runtime support packages in Adam for scheduling (on a single CPU) and I/O are also provided. The compiler contains a library manipulation facility for separate compilation. The Adam compiler has been used to build an Ada compiler for most of the July 1980 Ada language design including task types and rendezvous constructs. This was achieved by implementing algorithms translating Ada tasking into Adam parallel processing as a preprocessor to the Adam compiler. This present Ada compiler, which has been operational since December 1980, uses a procedure call implementation of tasking (due to Haberman and Nassi and to Stevenson). It can be easily modified to other implementations. Compilation of Ada tasking into a high level target language such as Adam facilitates studying questions of correctness and efficiency of various compilation algorithms, and code optimizations specific to tasking, e.g. elimination of unnecessary threads of control. This paper gives an overview of Adam and examples of its use. Emphasis is placed on the differences from Ada. Experience using Adam to build the experimental Ada system is evaluated. Design of runtime supervisors in Adam and algorithms for translating Ada tasking to Adam processing are discussed in detail. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/83/240/CSL-TR-83-240.pdf %R CSL-TR-83-242 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Fault simulation using ADLIB-SABLE %A Ghosh, Sumit %A vanCleemput, Willem %D March 1983 %X This technical report presents work in the area of deductive fault simulation. This technique, one of the three fault simulation techniques discussed in the literature, has been implemented in ADLIB-SABLE, a hierarchical multi-level simulator designed and used at Stanford University.
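For contrast with the deductive technique, the most literal approach is serial fault simulation: simulate the good circuit, then one faulty copy per single stuck-at fault, and flag the faults whose primary output differs under a test vector. A tiny Python sketch (illustrative; the netlist and vector are invented, and this one-fault-at-a-time style is precisely what exhibits the quadratic behavior the report improves upon):

    NAND = lambda a, b: 1 - (a & b)

    # each entry: (output net, gate function, input nets), in topological order
    netlist = [("n1", NAND, ("a", "b")),
               ("out", NAND, ("n1", "c"))]

    def evaluate(inputs, fault=None):
        values = dict(inputs)
        for net, fn, ins in netlist:
            v = fn(*(values[i] for i in ins))
            if fault is not None and fault[0] == net:
                v = fault[1]  # force the stuck-at value on this net
            values[net] = v
        return values["out"]

    test = {"a": 1, "b": 1, "c": 1}
    good = evaluate(test)
    for net in ("n1", "out"):
        for stuck in (0, 1):
            detected = evaluate(test, fault=(net, stuck)) != good
            print(net, "stuck-at-%d:" % stuck,
                  "detected" if detected else "missed")
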
Most of the fault models illustrated in this report consider only two fault types: single stuck-at-0 and single stuck-at-Z (high impedance). Gate level fault models have been built for most commonly used gates. The ability to model the fault behavior of functional blocks in ADLIB-SABLE is also demonstrated. The motivation is that for many functional blocks, a gate level description may not be available or that the designer wishes to sacrifice detailed analysis for a higher simulation speed. Functional fault models are built for many commonly used blocks, using a decomposition technique. The ratio of functional fault simulation speed to gate level fault simulation speed has been observed to be of the order of 5 for the typical functional block sizes considered. The ratio, however, is not the upper limit and will be larger for larger-sized functional blocks. It was also proved that the functional fault models are invariant with respect to the internal implementation details. A design discipline for sequential circuits is worked out which allows deductive fault simulation. Extensions to the simple (0,1) deductive techniques are studied and the fault models built in the extended domain are observed to be useful in modelling gates of some technologies. A comparison between deductive and concurrent fault simulation methods is given. Performance of deductive fault simulation, implemented in ADLIB-SABLE, shows that for sequential as well as combinational circuits, the CPU time increases linearly with increasing number of components simulated, an advantage over fault simulators which simulate one fault at a time and display a quadratic behavior. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/83/242/CSL-TR-83-242.pdf %R CSL-TR-83-244 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T High speed image rasterization using a highly parallel smart bulk memory %D June 1983 %X VLSI technology allows the efficient realization of a class of highly parallel architectures consisting of high density semiconductor memory with an on-chip processor which accesses the memory in large sections simultaneously. A processor is described which uses this architecture to rasterize lines, polygons and text quickly, providing the rasterization support required in high performance graphic raster displays and fast page printers. This on-chip processor translates high-level low bandwidth commands into low-level high bandwidth actions on chip, where the high bandwidth can be tolerated. This architecture is capable of achieving performance comparable to the "processor per pixel" approaches while avoiding the tremendous density penalty incurred by such approaches. Consequently, it is practical to build a very high performance high resolution system from a small number of these chips. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/83/244/CSL-TR-83-244.pdf %R CSL-TR-83-245 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T EDT - a syntax-based program editor reference manual %A Finlayson, Ross S. %D July 1983 %X This report describes an experimental syntax-based editor that has recently been developed at Stanford. Syntax-based editors are unlike conventional text editors in that they use knowledge of the syntactic structure of the item (typically a program) being edited to provide "high level" editing operations. The editor described in this report is currently being used as an editor for programs written in Ada.
Other programming languages could also be handled by replacing the appropriate language definition files by those for another language. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/83/245/CSL-TR-83-245.pdf %R CSL-TR-83-247 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Maintaining the time in a distributed system %A Marzullo, Keith %A Owicki, Susan %D August 1983 %X To a client, one of the simplest services provided by a distributed system is a time service. A client simply requests the time from any set of servers, and uses any reply. The simplicity in this interaction, however, misrepresents the complexity of implementing such a service. An algorithm is needed that will keep a set of clocks synchronized, reasonably correct and accurate with respect to a standard, and able to withstand errors such as communication failures and inaccurate clocks. This paper presents a partial solution to the problem by describing two algorithms which will keep clocks both correct and synchronized. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/83/247/CSL-TR-83-247.pdf %R CSL-TR-83-249 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Runtime and description of deadness errors in ADA tasking %A Helmbold, D. %A Luckham, David C. %D November 1983 %X A runtime monitoring system for detecting and describing tasking errors in Ada programs is presented. Basic concepts for classifying tasking errors, called deadness errors, are defined. These concepts indicate which aspects of an Ada computation must be monitored in order to detect deadness errors resulting from attempts to rendezvous or terminate. They also provide a basis for the definition and proof of correct detection. Descriptions of deadness errors are given in terms of the basic concepts. The monitoring system has two parts: (1) a separately compiled runtime monitor that is added to any Ada source to be monitored, and (2) a preprocessor that transforms the Ada source so that necessary descriptive data is communicated to the monitor at runtime. Some basic preprocessing transformations and an abstract monitoring system for a limited class of errors were previously presented. Here an Ada implementation of a monitor and a more extensive set of preprocessing transformations are described. This system provides an experimental automated tool for detecting deadness errors in Ada83 tasking and supplies useful diagnostics. The use of the runtime monitor for debugging and for programming evasive actions to avoid imminent errors is described and examples of experiments are given. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/83/249/CSL-TR-83-249.pdf %R CSL-TR-83-250 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Data buffers for execution architectures %A Alpert, Donald %D November 1983 %X Directly Executed Language (DEL) architectures are derived from idealized representations of high-level languages. DEL architectures show dramatic reduction in the number of instructions and memory references executed when compared to traditional architectures. This report presents the design considerations for the data buffer in a DEL microprocessor. Simulation techniques were used to evaluate the performance of different sized buffers for a set of Pascal test programs. The results show that a buffer with 256 words typically faults on less than 5% of storage allocations.
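The agreement step behind such a time service (CSL-TR-83-247 above) is often realized by interval intersection: each server reports an interval guaranteed to contain the correct time, and the client seeks a sub-interval consistent with as many reports as possible. A compact sweep-line sketch in Python (an illustrative reconstruction, not necessarily the paper's two algorithms):

    def intersect(intervals):
        # return (count, (lo, hi)): a sub-interval lying within the largest
        # number of the given [lo, hi] time estimates
        events = sorted([(lo, -1) for lo, hi in intervals] +
                        [(hi, +1) for lo, hi in intervals])
        best = cnt = 0
        best_range = None
        for i, (x, kind) in enumerate(events):
            if kind == -1:  # an interval opens
                cnt += 1
                if cnt > best:
                    best = cnt
                    best_range = (x, events[i + 1][0])
            else:           # an interval closes
                cnt -= 1
        return best, best_range

    print(intersect([(8, 12), (11, 13), (14, 15)]))  # (2, (11, 12))
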
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/83/250/CSL-TR-83-250.pdf %R CSL-TR-83-251 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T GEM: a tool for concurrency specification and verification %A Lansky, Amy %A Owicki, Susan %D November 1983 %X The GEM model of concurrent computation is presented. Each GEM computation consists of a set of partially ordered events, and represents a particular concurrent execution. Language primitives for concurrency, code segments, as well as concurrency problems may be described as logic formulae (restrictions) on the domain of possible GEM computations. An event-oriented method of program verification is also presented. GEM is unique in its ability to easily describe and reason about synchronization properties. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/83/251/CSL-TR-83-251.pdf %R CSL-TR-83-253 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Evaluation of an interpreted architecture for Pascal on a personal computer %A Mitchell, Chad Leland %D December 1983 %X This report describes the design and implementation of an interpreter on a personal computer. The architecture interpreted was specifically designed for the execution of Pascal and belongs to the class of architectures known as Direct Correspondence Architectures. The evaluation of the interpreter provides information about the suitability of the host for this architecture and identifies features of the architecture which are not adequately supported by the host. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/83/253/CSL-TR-83-253.pdf %R CSL-TR-84-256 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Instruction selection by attributed parsing %A Ganapathi, Mahadevan %A Fischer, Charles N. %D February 1984 %X Affix grammars are used to describe the instruction-set of a target architecture for purposes of compiler code generation. A code generator is obtained automatically for a compiler using attributed parsing techniques. A compiler built on this model can automatically perform most popular machine-dependent optimizations, including peephole optimizations. Implementations of code generators based on this model exist for the VAX-11, iAPX-86, Z-8000, PDP-11 and IBM-370 architectures. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/84/256/CSL-TR-84-256.pdf %R CSL-TR-84-257 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Reverse synthesis compilation for architectural research %A Ganapathi, Mahadevan %A Hennessy, John %A Sarkar, Vivek %D March 1984 %X This paper discusses the development of compilation strategies for DEL architectures and tools to assist in the evaluation of their efficiency. Compilation is divided into a series of independent simpler problems. To explore optimization of code for DEL compilers, two intermediate representations are employed. One of these representations is at a lower level than target machine instructions. Machine-independent optimization is performed on this intermediate representation. The other intermediate representation has been specifically designed for compiler retargetability. It is at a higher level than the target machine. Target code generation is performed by reverse synthesis followed by attributed parsing. This technique demonstrates the feasibility of using automated table-driven code generation techniques for inflexible architectures.
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/84/257/CSL-TR-84-257.pdf %R CSL-TR-84-258 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A strongly typed language for specifying programs %A Henke, Friedrich W. von %D January 1984 %X A language for specifying and annotating programs is presented. The language is intended to be used in connection with a strongly typed programming language. It provides a framework for the definition of specification concepts and the specification of programs by means of assertions and annotations. The language includes facilities for defining concepts axiomatically and for grouping definitions of related concepts and derived properties (lemmas) into theories. All entities in the language are required to be strongly typed; however, the language provides a very flexible type system that includes polymorphic (or generic) types. The paper presents a type checking algorithm for the language and discusses the relationship between specification language and programming language. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/84/258/CSL-TR-84-258.pdf %R CSL-TR-84-261 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T ANNA: a language for annotating ADA programs %A Luckham, David C. %A Henke, Friedrich W. von %A Krieg-Brueckner, Bernd %A Owe, Olaf %D July 1984 %X ANNA is a proposed language extension of Ada to include facilities for formally specifying the intended behavior of Ada programs (or portions thereof) at all stages of program development. Anna programs are Ada programs extended by formal comments. Formal comments in ANNA consist of virtual Ada text and annotations. Anna provides annotations for all Ada constructs, including declarative annotations (for variables, subtypes, subprograms, and packages), statement annotations, annotations of generic units, exception annotations, and visibility annotations. (The current Anna design does not include extensions for annotating Ada multi-tasking constructs.) Anna also includes a small number of new predefined attributes, which may appear only in annotations, e.g., the collection attribute of an access type. Since all Anna extensions appear as Ada comments, Anna programs are also legal Ada programs and acceptable to Ada translators. The semantics of annotations are defined in terms of Ada concepts; in particular, many kinds of annotations are generalizations of the Ada constraint concept. This simplifies the training of Ada programmers to use Anna for formal specification of Ada programs. Anna provides a formal framework within which different theories of formal specification may be applied to Ada. This manual also describes a translation of annotations into Ada text for run-time checking of consistency with annotations. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/84/261/CSL-TR-84-261.pdf %R CSL-TR-84-262 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Debugging Ada tasking problems %A Helmbold, David %A Luckham, David %D July 1984 %X A new class of errors, not found in sequential languages, can result when the tasking constructs of Ada are used. These errors are called deadness errors and arise when task communication fails. Since deadness errors often occur intermittently, they are particularly hard to detect and diagnose. Previous papers describe the theory and implementation of runtime monitors to detect deadness errors in tasking programs.
The problems of detection and description of errors are different. Even when a dead state is detected, giving adequate diagnostics that enable the programmer to locate its cause in the Ada text is difficult. This paper discusses the use of simple diagnostic descriptions based on Ada tasking concepts. These diagnostics are implemented in an experimental runtime monitor. Similar facilities could be implemented in task debuggers in forthcoming Ada support environments. Their usefulness and shortcomings are illustrated in an example experiment with the runtime monitor. Possible future directions in task error monitoring and diagnosis based on formal specifications are discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/84/262/CSL-TR-84-262.pdf %R CSL-TR-84-265 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An overview of ANNA - a specification language for ADA %A Luckham, David %A Henke, Friedrich W. von %D September 1984 %X A specification language permits information about various aspects of a program to be expressed in a precise machine-processable form. This information is not normally part of the program itself. Specification languages are viewed as evolving from modern high level programming languages. The first step in this evolution is a cautious extension of the programming language. Some of the features of Anna, a specification language extending Ada, are discussed. The extensions include generalizations of constructs (such as type constraints) that are already in Ada, and new constructs for specifying subprograms, packages, exceptions, and contexts. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/84/265/CSL-TR-84-265.pdf %R CSL-TR-84-259 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Organization and VLSI implementation of MIPS %A Przybylski, Steven A. %A Gross, Thomas R. %A Hennessy, John L. %A Jouppi, Norman P. %A Rowen, Christopher %D April 1984 %X MIPS is a 32-bit, high-performance processor architecture implemented as an nMOS VLSI chip. The processor uses a low-level, streamlined instruction set coupled with a fast pipeline to achieve an instruction rate of two million instructions per second. Close interaction between the processor design and compilers for the machine yields efficient execution of programs on the chip. Simplifying the instruction set and the requirements placed on the hardware by the architecture facilitates both processor control and interrupt handling in the pipeline. High speed MOS circuit design techniques and a sophisticated timing methodology enable the processor to achieve a 250 ns clock cycle. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/84/259/CSL-TR-84-259.pdf %R CSL-TR-85-270 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A Model and Temporal Proof System for Networks of Processes %A Nguyen, Van %A Gries, David %A Owicki, Susan %D February 1985 %X A model and a sound and complete proof system for networks of processes in which component processes communicate exclusively through messages are given. The model, an extension of the trace model, can describe both synchronous and asynchronous networks. The proof system uses temporal-logic assertions on sequences of observations - a generalization of traces. The use of observation traces makes the proof system simple, compositional, and modular, since internal details can be hidden.
The expressive power of temporal logic makes it possible to prove temporal properties (safety, liveness, precedence, etc.) in the system. The proof system is language-independent and works for both synchronous and asynchronous networks. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/85/270/CSL-TR-85-270.pdf %R CSL-TR-86-289 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T MIPS-X instruction set and programmer's manual %A Chow, Paul %D May 1986 %X MIPS-X is a high-performance, second-generation reduced instruction set microprocessor. This document describes the visible architecture of the machine, the basic timing of the instructions, and the instruction set. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/86/289/CSL-TR-86-289.pdf %R CSL-TR-86-298 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Parallel program behavior - specification and abstraction using BDL %A Yan, Jerry C. %D August 1986 %X This paper describes the syntax, semantics, and usage of BDL - a Behavior Description Language for concurrent programs. BDL program models can be used to describe and abstract the behavior of real programs formulated in various computation paradigms (such as CSP, remote procedures, data-flow, and actors). BDL models are constructed from abstract computing entities known as "players". The models behave as closely as possible to the actual program in terms of message passing, player creation, and cpu usage. Although behavior abstraction using BDL only involves identifying the "redundant parts" of the computation and replacing them with simple "NO-OP" statements, proper application of this technique remains difficult and requires a thorough understanding of how the program is structured. Simulating BDL models is much more economical than instruction-level emulation while program behavior is realistically preserved. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/86/298/CSL-TR-86-298.pdf %R CSL-TR-86-300 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An overview of the MIPS-X-MP project %A Hennessy, John L. %A Horowitz, Mark A. %D April 1986 %X MIPS-X-MP is a research project whose end goal is to build a small (workstation-sized) multiprocessor with a total throughput of 100-200 mips. The architectural approach uses a small number (tens) of high-performance RISC-based microprocessors (10-20 mips each). The multiprocessor architecture uses software-controlled cache coherency to allow cooperation among processors without sacrificing performance of the processors. Software technology for automatically decomposing problems to allow the entire machine to be concentrated on a single problem is a key component of the research. This report surveys the four key components of the project: high performance VLSI processor architecture and design, multiprocessor architectural studies, multiprocessor programming systems, and optimizing compiler technology. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/86/300/CSL-TR-86-300.pdf %R CSL-TR-86-301 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The complete transformation methodology for sequential runtime checking of an ANNA subset %A Sankar, Sriram %A Rosenblum, David %D June 1986 %X We present in this report a complete description of a methodology for transformation of Anna (Annotated Ada) programs to executable self-checking Ada programs.
The methodology covers a subset of Anna that allows annotation of scalar types and objects. The allowed annotations include subtype annotations, subprogram annotations, result annotations, object annotations, out annotations, and statement annotations. Except for package state expressions and quantified expressions, the full expression language of Anna is allowed in the subset. The transformation of annotations to executable checking functions is thoroughly illustrated through informal textual description, universal checking function templates, and several transformation examples. We also describe the transformer and related software tools used to transform Anna programs. In conclusion, we describe validation of the transformer and some methods of making the transformation and runtime checking processes more efficient. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/86/301/CSL-TR-86-301.pdf %R CSL-TR-86-303 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The semantics of timing constructs in hardware description languages %A Luckham, David C. %A Huh, Youm %A Stanculescu, Alec G. %D August 1986 %X Three different approaches to the representation of time in high level hardware design languages are described and compared. The first is the timed assignment statement of ADLIB/SABLE, which anticipates future events. The second is the timed assignment of VHDL, which predicts future events and allows predictions to be preempted by other predictions. The third is a new proposed method of expressing time dependency by qualifying expressions so that their values are required to be constant over a specified time interval. Examples comparing these three approaches are given. It is shown how time-qualified expressions could be introduced into a hardware description language. The possibility of proving correctness of hardware models in this language is illustrated. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/86/303/CSL-TR-86-303.pdf %R CSL-TR-86-306 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Queueing network models for parallel processing of task systems: an operational approach %A Mak, Victor W.K. %D September 1986 %X Computer performance modeling of possibly complex computations running on highly concurrent systems is considered. Earlier works in this area either dealt with a very simple program structure or resulted in methods with exponential complexity. A computationally efficient approximate solution method is developed to compute the performance measures for series-parallel-reducible task systems using queueing network models. Numerical results for a number of test cases are presented and compared to those of simulations. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/86/306/CSL-TR-86-306.pdf %R CSL-TR-86-307 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A survey of concurrent architectures %A Mak, Victor W.K. %D September 1986 %X A survey of 18 different concurrent architectures is presented in this report. Although the survey is by no means complete, it does cover a wide spectrum of both commercial and research architectures. A scheme is proposed to describe concurrent architectures along five dimensions: models of computation, interconnection network, processing element, memory system, and application areas.
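The queueing-network report above (CSL-TR-86-306) develops an approximate solution method whose details are not reproduced here. As background for the kind of computation such models involve, the C++ sketch below implements the standard exact mean-value analysis (MVA) recursion for a closed, single-class product-form network; the service demands and task population are made-up example values, not data from the report.

    // Standard exact MVA recursion for a closed, single-class product-form
    // queueing network: background only, not the report's approximation.
    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<double> D = {0.10, 0.05, 0.02}; // service demand per center (s)
        const int N = 20;                           // number of circulating tasks
        std::vector<double> Q(D.size(), 0.0);       // mean queue lengths at n = 0

        double X = 0.0;                             // system throughput
        for (int n = 1; n <= N; ++n) {
            double Rtotal = 0.0;
            std::vector<double> R(D.size());
            for (std::size_t k = 0; k < D.size(); ++k) {
                R[k] = D[k] * (1.0 + Q[k]);         // residence time at center k
                Rtotal += R[k];
            }
            X = n / Rtotal;                         // throughput with n tasks
            for (std::size_t k = 0; k < D.size(); ++k)
                Q[k] = X * R[k];                    // Little's law per center
        }
        std::cout << "throughput with N=20: " << X << " tasks/s\n";
    }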
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/86/307/CSL-TR-86-307.pdf %R CSL-TR-86-309 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design of testbed and emulation tools %A Lundstrom, Stephen F. %A Flynn, Michael J. %D September 1986 %X The research summarized in this report was concerned with the design of testbed and emulation tools suitable to assist in projecting, with reasonable accuracy, the expected performance of highly concurrent computing systems on large, complete applications. Such testbed and emulation tools are intended for the eventual use of those exploring new concurrent system architectures and organizations, either as users or as designers of such systems. While a range of alternatives was considered, a software-based set of hierarchical tools was chosen to provide maximum flexibility, to ease moving to new computers as technology improves, and to take advantage of the inherent reliability and availability of commercially available computing systems. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/86/309/CSL-TR-86-309.pdf %R CSL-TR-86-310 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Dynamic resource allocation in a hierarchical multiprocessor system - a preliminary report %D October 1986 %X In this report, an integrated system approach to dynamic resource allocation is proposed. Some of the problems in dynamic resource allocation and the relationship of these problems to system structures are examined. A general dynamic resource allocation scheme is presented. A hierarchical system architecture which dynamically maps between processor structure and programs at multiple levels of instantiation is described. Simulation experiments have been conducted to study dynamic resource allocation on the proposed system. Preliminary evaluation based on simple dynamic resource allocation algorithms indicates that with the proposed system approach, the complexity of dynamic resource management could be significantly reduced while achieving reasonably effective dynamic resource allocation. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/86/310/CSL-TR-86-310.pdf %R CSL-TR-87-314 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Post-game analysis -- an initial experiment for heuristic-based resource management in concurrent systems %A Yan, Jerry C %D February 1987 %X In concurrent systems, a major responsibility of the resource management system is to decide how the application program is to be mapped onto the multi-processor. Instead of using abstract program and machine models, a generate-and-test framework known as "post-game analysis," based on data gathered during program execution, is proposed. Each iteration consists of (i) (a simulation of) an execution of the program; (ii) analysis of the data gathered; and (iii) the proposal of a new mapping that would have a smaller execution time. Heuristics are applied to predict execution-time changes in response to small perturbations of the current mapping. An initial experiment was carried out using simple strategies on "pipeline-like" applications. The results obtained from four simple strategies demonstrated that for this kind of application, even simple strategies can produce acceptable speed-up with a small number of iterations.
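The generate-and-test shape of post-game analysis can be suggested with a minimal sketch. The C++ program below evaluates a task-to-processor mapping, applies small perturbations (moving one task at a time), and keeps any change that improves a predicted cost. The cost function used here (maximum per-processor load) and all numbers are stand-in assumptions; the report derives its predictions from data gathered during simulated program executions.

    // Sketch of a generate-and-test mapping loop: evaluate, perturb, keep
    // improvements. The load-balance cost function is a stand-in only.
    #include <algorithm>
    #include <iostream>
    #include <vector>

    // Hypothetical cost: the most heavily loaded processor's total work.
    double cost(const std::vector<int>& map, const std::vector<double>& w, int P) {
        std::vector<double> load(P, 0.0);
        for (std::size_t t = 0; t < map.size(); ++t) load[map[t]] += w[t];
        return *std::max_element(load.begin(), load.end());
    }

    int main() {
        const int P = 3;                             // processors
        std::vector<double> w = {5, 3, 8, 2, 7, 4};  // per-task "execution times"
        std::vector<int> map = {0, 0, 0, 1, 1, 2};   // initial mapping

        for (int iter = 0; iter < 50; ++iter) {
            double best = cost(map, w, P);
            // Perturbation: try moving each task to each other processor.
            for (std::size_t t = 0; t < map.size(); ++t)
                for (int p = 0; p < P; ++p) {
                    int old = map[t];
                    map[t] = p;
                    double c = cost(map, w, P);
                    if (c < best) best = c;          // keep the improvement
                    else map[t] = old;               // undo the move
                }
        }
        std::cout << "final makespan estimate: " << cost(map, w, P) << "\n";
    }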
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/87/314/CSL-TR-87-314.pdf %R CSL-TR-87-326 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T SRT division diagrams and their usage in designing integrated circuits for division %A Williams, Ted E. %A Horowitz, Mark %D November 1986 %X This paper describes the construction and analysis of several diagrams which depict SRT division algorithms. These diagrams yield insight into the operation of the algorithms and the many implementation tradeoffs available in custom circuit design. Examples of simple low radix diagrams are shown, as well as tables for higher radices. The tables were generated by a program which can create and verify the diagrams for different division schemes. Also discussed is a custom CMOS integrated circuit that performs SRT division using self-timed circuit techniques. This chip implements an intermediate approach between a fully combinational array and a fully iterative-in-time method in order to get both speed and small silicon area. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/87/326/CSL-TR-87-326.pdf %R CSL-TR-87-333 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Managing and measuring two parallel programs on a multiprocessor %A Yan, Jerry C %D June 1987 %X Research is being conducted to determine how distributed computations can be mapped onto multiprocessors so as to minimize execution time. Instead of employing optimization techniques based on some abstract program/machine models, the approach being investigated here (called "post-game analysis") is based on placement heuristics that utilize program execution history. Although initial experiments have demonstrated that "post-game analysis" indeed discovered mappings that exhibit significantly shorter execution times than the worst cases for the programs tested, three important issues remain to be addressed: (i) evaluating the performance of placement heuristics against the "optimal" speed-up attainable; (ii) finding evidence to help explain why these heuristics work; and (iii) developing better heuristics by understanding how and why the basic set performed well. Parallel program execution was simulated using "Axe" -- an integrated environment for computation model description, processor architecture specification, discrete-time simulation, and automated data collection. Five groups of parameters are measured, representing different aspects of the concurrent execution environment: (i) overall measurements, (ii) communication parameters, (iii) cpu utilization, (iv) cpu contention, and (v) dependencies between players. Two programs were simulated -- a "pipe-line" of players and a "divide-and-conquer" program skeleton. The results showed that program execution time indeed correlated well with some of the parameters measured. It was also shown that "post-game" analysis achieved close to 96% of the optimal speed-up for both programs in most cases. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/87/333/CSL-TR-87-333.pdf %R CSL-TR-87-335 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Allocations of Objects Considered as Nondeterministic Expressions - Towards a More Abstract Axiomatics of Access Types %A Meldal, Sigurd %D September 1987 %X The concept of access ("reference" or "pointer") values is formalized as parametrized abstract data types, using the axiomatic method of Guttag and Horning as extended by Owe.
Two formalizations are given. The first is a formalization of the approach used in the definition of a partial correctness system for Pascal by Hoare and Wirth. Its lack of abstraction is pointed out. This is caused by the annotation language being too expressive. An approach is taken that results in a more abstract system: the expressiveness of the annotation language is reduced and the allocation operator is viewed as a nondeterministic expression. This reinterpretation of the program language results in an appropriate level of abstraction of the proof system. An example is given: the verification of a package defining a set type. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/87/335/CSL-TR-87-335.pdf %R CSL-TR-87-337 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design of testbed and emulation tools %A Flynn, Michael J. %A Lundstrom, Stephen %D October 1987 %X In order to understand how to predict the performance of concurrent computing systems, an experimental environment is needed. The purpose of the research conducted under the grant was to investigate various aspects of this environment. A first performance prediction system was developed and evaluated (by comparison both with simulations and with actual systems). The creation of a second, complementary system is well underway. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/87/337/CSL-TR-87-337.pdf %R CSL-TR-87-339 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T MIPS-X: the external interface %A Salz, Arturo %A Agarwal, Anant %A Chow, Paul %D November 1987 %X MIPS-X is a 20-MIPS-peak VLSI processor designed at Stanford University. This document describes the external interface of MIPS-X and the organization of the MIPS-X processor system, including the external cache and coprocessors. The external interface has been designed to optimize the paths between the processor, the external cache, and the coprocessors. The signals used by the processor and their timing are documented here. Signal use and timings during exceptions and cache misses are also shown. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/87/339/CSL-TR-87-339.pdf %R CSL-TR-87-342 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Interprocedural analysis useless for code optimization %A Richardson, S. %A Ganapathi, M. %D November 1987 %X The problem of tracking data flow across procedure boundaries has a long history of theoretical study by people who believed that such information would be useful for code optimization. Building upon previous work, we have implemented an algorithm for interprocedural data flow analysis. The algorithm produces three flow-insensitive summary sets: MOD, USE, and ALIASES. The utility of the resulting information was investigated using an optimizing Pascal compiler. Over a sampling of 27 benchmarks, we found that additional optimizations performed as a result of interprocedural summary information contributed almost nothing to program execution speed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/87/342/CSL-TR-87-342.pdf %R CSL-TR-87-338 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Sparse, distributed memory prototype: principles of operation %A Flynn, Michael J. %A Kanerva, Pentti %A Ahanin, Bahram %A Flaherty, Paul A. %A Hickey, Philip %A Bhadkamkar, Neal A.
%D February 1988 %X Sparse distributed memory is a generalized random-access memory (RAM) for long (e.g., 1,000 bit) binary words. Such words can be written into and read from the memory, and they can also be used to address the memory. The main attribute of the memory is sensitivity to similarity, meaning that a word can be read back not only by giving the original write address but also by giving one close to it as measured by the Hamming distance between addresses. Large memories of this kind are expected to have wide use in speech and scene analysis, in signal detection and verification, and in adaptive control of automated equipment---in general, in dealing with real-world information in real time. The memory can be realized as a simple, massively parallel computer. Digital technology has reached a point where building large memories is becoming practical. This research project is aimed at resolving major design issues that have to be faced in building the memories. This report describes the design of a prototype memory with 256-bit addresses and from 8K to 128K locations for 256-bit words. A key aspect of the design is extensive use of dynamic RAM and other standard components. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/87/338/CSL-TR-87-338.pdf %R CSL-TR-88-347 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Trace compaction using cache filtering with blocking %A Agarwal, Anant %D December 1987 %X Trace-driven simulation is a popular method of estimating the performance of cache memories, translation lookaside buffers, and paging schemes. Because the cost of trace-driven simulation is directly proportional to trace length, reducing the number of references in the trace significantly impacts simulation time. This paper concentrates on trace-driven simulation for cache analysis. A technique called cache filtering with blocking is presented that compresses traces by exploiting both the temporal and spatial locality in the trace. Experimental results show that this scheme can reduce trace length by nearly two orders of magnitude while introducing less than 15% error in cache miss rate estimates. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/88/347/CSL-TR-88-347.pdf %R CSL-TR-88-348 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Thor user's manual: tutorial and commands %A Alverson, Robert %A Blank, Tom %A Choi, Kiyoung %A Hwang, Sun Young %A Salz, Arturo %A Soule, Larry %A Rokicki, Thomas %D January 1988 %X THOR is a behavioral simulation environment intended for use with digital circuits at either the gate, register transfer, or functional levels. Models are written in the CHDL modeling language (a hardware description language based on the C programming language). Network descriptions are written in the CSL language, which supports hierarchical network descriptions. In interactive mode, batch mode, or both combined, a variety of commands are available to control execution. Simulation output can be viewed in tabular format or in waveforms. A library of components and a toolbox for building simulation models are also provided. Other tools include CSLIM, used to generate Boolean equations directly from THOR models, and an interface to other simulators (e.g., RSIM and a physical chip tester) so that two simulations can be run concurrently, verifying equivalent operation. This technical report is part one of two parts and is formatted similarly to UNIX manuals.
Part one contains the THOR tutorial and all the commands associated with THOR. Part two contains descriptions of the general-purpose functions used in models, the parts library including many TTL components, and the logic analyzer model. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/88/348/CSL-TR-88-348.pdf %R CSL-TR-88-349 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Thor user's manual: library functions %A Alverson, Robert %A Blank, Tom %A Choi, Kiyoung %A Hwang, Sun Young %A Salz, Arturo %A Soule, Larry %A Rokicki, Thomas %D January 1988 %X THOR is a behavioral simulation environment intended for use with digital circuits at either the gate, register transfer, or functional levels. Models are written in the CHDL modeling language (a hardware description language based on the "C" programming language). Network descriptions are written in the CSL language, which supports hierarchical network descriptions. In interactive mode, batch mode, or both combined, a variety of commands are available to control execution. Simulation output can be viewed in tabular format or in waveforms. A library of components and a toolbox for building simulation models are also provided. Other tools include CSLIM, used to generate Boolean equations directly from THOR models, and an interface to other simulators (e.g., RSIM and a physical chip tester) so that two simulations can be run concurrently, verifying equivalent operation. This technical report is part two of two parts and is formatted similarly to UNIX manuals. Part one contains the THOR tutorial and all the commands associated with THOR. Part two contains descriptions of the general-purpose functions used in models, the parts library including many TTL components, and the logic analyzer model. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/88/349/CSL-TR-88-349.pdf %R CSL-TR-88-350 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The ILSP behavioral description language and its graph representation for behavioral synthesis %A Odani, Masayasu %A Hwang, Sun Young %A Blank, Tom %A Rokicki, Thomas %D March 1988 %X This report describes the ILSP behavioral description language and its internal representation employed in the Hermod behavioral synthesis system. Using a combined control and data flow graph (C/DFG) as an intermediate representation, the Hermod system generates hardware modules and their interconnections from behavioral descriptions. The Hermod system is included in an integrated environment for hardware simulation and synthesis under development at Stanford University. The functional models written in ILSP can be simulated on the THOR logic/functional/behavioral simulator without translation. After proper verification of its behavior, an ILSP model can be input to the synthesizer for compilation into an RT-level description. This report consists of two parts: the specification of the ILSP language and its graph representation.
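A combined control and data flow graph of the kind the ILSP report mentions can be suggested with a small data-structure sketch. The C++ fragment below is illustrative only; the node kinds and edge sets are assumptions for the example, not the representation defined in the report.

    // Illustrative-only sketch of a combined control/data flow graph
    // (C/DFG) node, the kind of intermediate form a behavioral synthesis
    // system might use; not the ILSP representation itself.
    #include <string>
    #include <vector>

    struct CdfgNode {
        enum Kind { Operation, Branch, Merge, Loop } kind;
        std::string op;                     // e.g. "add", "mul" for Operation
        std::vector<CdfgNode*> dataPreds;   // data-flow edges: value producers
        std::vector<CdfgNode*> ctrlSuccs;   // control-flow edges: what runs next
    };

    int main() {
        // a := b + c, then branch on a
        CdfgNode add{CdfgNode::Operation, "add", {}, {}};
        CdfgNode br{CdfgNode::Branch, "", {&add}, {}};
        add.ctrlSuccs.push_back(&br);       // control passes from add to branch
        return 0;
    }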
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/88/350/CSL-TR-88-350.pdf %R CSL-TR-88-355 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Introductory user's guide to the architect's workbench tools %A Torrellas, Josep %A Bray, Brian %A Cuderman, Kathy %A Goldschmidt, Stephen %A Kobrin, Alan %A Zimmerman, Andrew %D May 1988 %X The Architect's Workbench is a set of simulation tools to provide insight into how the instruction set and the organization of registers and cache affect processor-memory traffic and, as a result, processor performance. This report is designed to be an introductory guide to the tools for the novice user. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/88/355/CSL-TR-88-355.pdf %R CSL-TR-88-358 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T InterViews: a C++ graphical interface toolkit %A Linton, Mark A. %A Calder, Paul R. %A Vlissides, John M. %D July 1988 %X We have implemented an object-oriented user interface package, called InterViews, that supports the composition of a graphical user interface from a set of interactive objects. The base class for interactive objects, called an interactor, and the base class for composite objects, called a scene, define a protocol for combining interactive behaviors. Subclasses of scenes define common types of composition: a box tiles its components, a tray allows components to overlap or constrain each other's placement, a deck stacks its components so that only one is visible, a frame adds a border, and a viewport shows part of a component. Predefined components include menus, scrollers, buttons, and text editors. InterViews also includes classes for structured text and graphics. InterViews is written in C++ and runs on top of the X window system. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/88/358/CSL-TR-88-358.pdf %R CSL-TR-88-364 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Applying object-oriented design to structured graphics %A Vlissides, John M. %A Linton, Mark A. %D August 1988 %X Structured graphics are useful for building applications that use a direct manipulation metaphor. Object-oriented languages offer inheritance, encapsulation, and runtime binding of operations to objects. Unfortunately, standard structured graphics packages do not use an object-oriented model, and object-oriented systems do not provide general-purpose structured graphics, relying instead on low-level graphics primitives. An object-oriented approach to structured graphics can give application programmers the benefits of both paradigms. We have implemented a two-dimensional structured graphics library in C++ that presents an object-oriented model to the programmer. The graphic class defines a general graphical object from which all others are derived. The picture subclass supports hierarchical composition of graphics. Programmers can define new graphical objects either statically by subclassing or dynamically by composing instances of existing classes. We have used both this library and an earlier, non-object-oriented library to implement a MacDraw-like drawing editor. We discuss the fundamentals of the object-oriented design and its advantages based on our experiences with both libraries. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/88/364/CSL-TR-88-364.pdf %R CSL-TR-88-367 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An overview of VAL %A Augustin, Larry M.
%A Gennart, Benoit A. %A Huh, Youm %A Luckham, David C. %A Stanculescu, Alec G. %D October 1988 %X VAL (VHDL Annotation Language) provides a small number of new language constructs to annotate VHDL hardware descriptions. VAL annotations, added to the VHDL entity declaration in the form of formal comments, express intended behavior common to all architectural bodies of the entity. Annotations are expressed as parallel processes that accept streams of input signals and generate constraints on output streams. VAL views signals as streams of values ordered by time. Generalized timing expressions allow the designer to refer to relative points on a stream. No concept of preemptive delayed assignment or inertial delay is needed when referring to different relative points in time on a stream. The VAL abstract state model permits abstract data types to be used in specifying history-dependent device behavior. Annotations placed inside a VHDL architecture define detailed correspondences between the behavior specification and architecture. The result is a simple but expressive language extension of VHDL with possible applications to automatic checking of VHDL simulations, hierarchical design, and automatic verification of hardware designs in VHDL. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/88/367/CSL-TR-88-367.pdf %R CSL-TR-88-369 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Composing user interfaces with InterViews %A Linton, Mark A. %A Vlissides, John M. %A Calder, Paul R. %D November 1988 %X In this paper we show how to compose user interfaces with InterViews, a user interface toolkit we have developed at Stanford. InterViews provides a library of predefined objects and a set of protocols for composing them. A user interface is created by composing simple primitives in a hierarchical fashion, allowing complex user interfaces to be implemented easily. InterViews supports the composition of interactive objects (such as scroll bars and menus), text objects (such as words and whitespace), and graphics objects (such as circles and polygons). To illustrate how InterViews composition mechanisms facilitate the implementation of user interfaces, we present three simple applications: a dialog box built from interactive objects, a drawing editor using a hierarchy of graphical objects, and a class browser using a hierarchy of text objects. We also describe how InterViews supports consistency across applications as well as end-user customization. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/88/369/CSL-TR-88-369.pdf %R CSL-TR-88-373 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Sparse distributed memory prototype: address module hardware guide %A Flynn, M. J. %A Zeidman, R. %A Lochner, E. %D December 1988 %X This document is a detailed specification of the hardware design of the Address Module for the prototype Sparse Distributed Memory. It contains all of the information needed to build, test, debug, modify and operate the Address Module. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/88/373/CSL-TR-88-373.pdf %R CSL-TR-89-378 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Analysis of Parallelism and Deadlocks in Distributed-Time Logic Simulation %A Soule, Larry %A Gupta, Anoop %D May 1989 %X This paper explores the suitability of the Chandy-Misra algorithm for digital logic simulation.
We use four realistic circuits as benchmarks for our analysis, one of which is the vector-unit controller for the Titan supercomputer from Ardent. Our results show that the average number of logic elements available for concurrent execution ranges from 10 to 111 for the four circuits, with an overall average of 68. Although this is twice as much parallelism as that obtained by traditional event-driven algorithms for these circuits, we feel it is still too low. One major factor limiting concurrency is the large number of global synchronization points --- "deadlocks" in the Chandy-Misra terminology --- that occur during execution. Towards the goal of reducing the number of deadlocks, the paper presents a classification of the types of deadlocks that occur during digital logic simulation. Four different types are identified and described intuitively in terms of circuit structure. Using domain-specific knowledge, the paper proposes methods for reducing these deadlock occurrences. For one of the benchmark circuits, the use of the proposed techniques eliminated all deadlocks and increased the average parallelism from 40 to 160. We believe that the use of such domain knowledge will make the Chandy-Misra algorithm significantly more effective than it would be in its generic form. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/89/378/CSL-TR-89-378.pdf %R CSL-TR-89-379 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Two Dimensional Pinpointing: An Application of Formal Specification to Debugging Packages %A Luckham, David %D April 1989 %X New methods of testing and debugging software utilizing high-level formal specifications are presented. These methods require a new generation of support tools. Such tools must be capable of automatically comparing the runtime behavior of hierarchically structured software with high-level specifications; they must provide information about inconsistencies in terms of abstractions used in specifications. This use of specifications has several advantages over present-day debugging methods: (1) the debugging problem itself is precisely defined by specifications; (2) violations of specifications are detected automatically, thus eliminating the need to search output traces and recognize errors manually; (3) complex tests, such as tests for side-effects on global data, can be made easily; (4) the new methods are independent of any compiler and runtime environment for a programming language; (5) they apply generally to hierarchically structured software --- e.g., packages containing nested units; (6) they also apply to other life-cycle processes such as analysis of prototypes, and the use of prototypes to build formal specifications. In this paper a particular process for locating errors in software packages, called two dimensional pinpointing, is described. Tests consist of sequences of package operations (first dimension). Specifications at the highest (most abstract) level are checked first. If violations occur, then new specifications are added if possible; otherwise, checking of specifications at the next lower level (second dimension) is activated. Violation of a new specification provides more information about the error, which reduces the region of program text under suspicion. All interaction between programmer and toolset is phrased in terms of the concepts used to specify the program. Two dimensional pinpointing is presented using the Anna specification language for Ada programs.
Anna and a toolset for comparing the behavior of Ada programs with Anna specifications are described. Pinpointing techniques are then illustrated by examples. The examples involve debugging of Ada packages, for which Anna provides a rich set of specification constructs. The Anna toolset supports use of the methodology on the full Ada/Anna languages, and is being engineered to commercial standards. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/89/379/CSL-TR-89-379.pdf %R CSL-TR-89-380 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Unidraw: A Framework for Building Domain-Specific Graphical Editors %A Vlissides, John M. %A Linton, Mark A. %D July 1989 %X Unidraw is a framework for creating object-oriented graphical editors in domains such as technical and artistic drawing, music composition, and CAD. The Unidraw architecture simplifies the construction of these editors by providing programming abstractions that are common across domains. Unidraw defines four basic abstractions: components encapsulate the appearance and semantics of objects in a domain, tools support direct manipulation of components, commands define operations on components and other objects, and external representations define the mapping between components and the file format generated by the editor. Unidraw also supports multiple views, graphical connectivity and confinement, and dataflow between components. This paper describes the Unidraw design, implementation issues, and three prototype domain-specific editors we have developed with Unidraw: a drawing editor, a user interface builder, and a schematic capture system. Experience indicates a substantial reduction in implementation time compared with existing tools. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/89/380/CSL-TR-89-380.pdf %R CSL-TR-89-387 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Specification and automatic verification of self-timed queues %A Dill, David L. %A Nowick, Steven M. %A Sproull, Robert F. %D August 1989 %X Speed-independent circuit design is of increasing interest because of global timing problems in VLSI. Unfortunately, speed-independent design is very subtle. We propose the use of state-machine verification tools to ameliorate this problem. This paper illustrates issues in the modelling, specification, and verification of speed-independent circuits through consideration of self-timed queues. User-level specifications are given as Petri nets, which are translated into trace structures for automatic processing. Three different implementations of queues are considered: a chain of queue cells, two parallel chains, and a "circular buffer" example using a separate RAM. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/89/387/CSL-TR-89-387.pdf %R CSL-TR-89-390 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T VAL to VHDL transformer: an implementation guide %A Augustin, Larry M. %A Gennart, Benoit A. %A Huh, Youm %A Luckham, David C. %A Sahai, Bob %A Stanculescu, Alec G. %D September 1989 %X This report presents one implementation of the VAL semantics. It is based on a transformation from VAL-annotated VHDL to self-checking VHDL that is equivalent to the original source from the standpoint of simulation semantics. The transformation is performed as a sequence of tree-to-tree transformations. The report describes the semantics-preserving transformations, as well as the structure of the transformer.
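The "sequence of tree-to-tree transformations" that the transformer report describes has a general shape worth sketching, though the VAL and VHDL syntax trees themselves are not shown here. The C++ fragment below performs a bottom-up rewrite of a toy tree, replacing one hypothetical node kind by a wrapper node; all node names are invented for the illustration.

    // General shape of a recursive tree-to-tree rewriting pass, as used by
    // source-to-source transformers; the node kinds here are hypothetical,
    // not the VAL/VHDL syntax trees of the report.
    #include <memory>
    #include <string>
    #include <vector>

    struct Tree {
        std::string kind;                         // e.g. "Annotation", "Process"
        std::vector<std::unique_ptr<Tree>> kids;
    };

    // Rewrite bottom-up: transform children first, then this node.
    std::unique_ptr<Tree> rewrite(std::unique_ptr<Tree> t) {
        for (auto& k : t->kids) k = rewrite(std::move(k));
        if (t->kind == "Annotation") {
            // Replace an annotation node by a checking node that wraps it.
            auto check = std::make_unique<Tree>();
            check->kind = "CheckingProcess";
            check->kids.push_back(std::move(t));
            return check;
        }
        return t;
    }

    int main() {
        auto root = std::make_unique<Tree>();
        root->kind = "Annotation";
        root = rewrite(std::move(root));          // root is now a CheckingProcess
    }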
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/89/390/CSL-TR-89-390.pdf %R CSL-TR-89-395 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design of Run Time Monitors for Concurrent Programs %A Helmbold, David %A Bryan, Doug %D October 1989 %X We address the problem of correctly monitoring the run time behavior of a concurrent program. We view a program as having three (potentially different) sets of behavior: computations of the original program when monitoring is not performed, computations after the monitor is added to the program, and "observations" produced by the monitor. Using these sets of behaviors, we define four properties of monitor systems: non-interference, safety, accuracy, and correctness. We define both a minimal level and a total level for each of these properties. The non-interference and safety properties address the degree to which the presence of the monitor alters a computation (the differences between the first two sets of computations). Accuracy is a relationship between a monitored computation and the observation of the computation produced by the monitor. Correctness is a relationship between observations and the unmonitored computations. A run time monitor for TSL-1 and Ada has been implemented. This monitor system uses two techniques for constructing the observation. We show that any monitoring system using these two techniques is at least minimally correct, from which the (minimal) correctness of the TSL-1 monitor follows. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/89/395/CSL-TR-89-395.pdf %R CSL-TR-89-396 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T COOL: a language for parallel programming %A Chandra, Rohit %A Gupta, Anoop %A Hennessy, John L. %D October 1989 %X We present COOL, an object-oriented parallel language derived from C++ by adding constructs to specify concurrent execution. We describe the language design and the facilities for creating parallelism, performing synchronization, and communicating. The basic parallel construct is the parallel function, which executes asynchronously. Synchronization support includes mutex functions and future types. A shared-memory model is assumed for parallel execution, and all communication is through shared memory. The parallel programming model of COOL has proved useful in several small programs that we have written. We present some examples and discuss the primary implementation issues. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/89/396/CSL-TR-89-396.pdf %R CSL-TR-89-398 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The relative effects of optimization on instruction architecture performance %A Cuderman, K. J. %A Flynn, M. J. %D October 1989 %X The Stanford Architect's Workbench is a simulation platform used to evaluate the impact of optimization on the relative performance of instruction set architectures. The total impact optimization makes on an application is the combined interaction of the optimizer, the architecture, and the cache configuration. The relative performance of seven architectures is compared using a suite of six application programs. Optimization reduces the number of executed instructions, but its effectiveness varies with architecture. Register architectures capitalize on temporaries introduced by optimization without incurring penalties for moving data.
Short instructions for register operations reduce the instruction bandwidth in addition to reducing the number of instructions. Reducing the number of executed instructions does not yield a reduction in memory traffic. Optimization only slightly alters the program working set size. An instruction cache quickly masks the effect of optimization. The result is that the instruction memory traffic remains almost constant for an application. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/89/398/CSL-TR-89-398.pdf %R CSL-TR-89-400 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Sparse distributed memory prototype: principles and operation %A Flynn, Michael J. %A Kanerva, Pentti %A Bhadkamkar, Neil %D December 1989 %X Sparse distributed memory is a generalized random-access memory (RAM) for long (e.g., 1,000 bit) binary words. Such words can be written into and read from the memory, and they can also be used to address the memory. The main attribute of the memory is sensitivity to similarity, meaning that a word can be read back not only by giving the original write address but also by giving one close to it as measured by the Hamming distance between addresses. Large memories of this kind are expected to have wide use in speech recognition and scene analysis, in signal detection and verification, and in adaptive control of automated equipment---in general, in dealing with real-world information in real time. The memory can be realized as a simple, massively parallel computer. Digital technology has reached a point where building large memories is becoming practical. This research project is aimed at resolving major design issues that have to be faced in building the memories. This report describes the design of a prototype memory with 256-bit addresses and from 8K to 128K locations for 256-bit words. A key aspect of the design is extensive use of dynamic RAM and other standard components. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/89/400/CSL-TR-89-400.pdf %R CSL-TR-89-383 %Z Thu, 29 Apr 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Super-Scalar Processor Design %A Johnson, William M. %D June 1989 %X A super-scalar processor is one that is capable of sustaining an instruction-execution rate of more than one instruction per clock cycle. Maintaining this execution rate is primarily a problem of scheduling processor resources (such as functional units) for high utilization. A number of scheduling algorithms have been published, with wide-ranging claims of performance over the single-instruction issue of a scalar processor. However, a number of these claims are based on idealizations or on special-purpose applications. This study uses trace-driven simulation to evaluate many different super-scalar hardware organizations. Super-scalar performance is limited primarily by instruction-fetch inefficiencies caused by both branch delays and instruction misalignment. Because of this instruction-fetch limitation, it is not worthwhile to explore highly concurrent execution hardware. Rather, it is more appropriate to explore economical execution hardware that more closely matches the instruction throughput provided by the instruction fetcher. This study examines techniques for reducing the instruction-fetch inefficiencies and explores the resulting hardware organizations.
This study concludes that a super-scalar processor can have nearly twice the performance of a scalar processor, but that this requires four major hardware features: out-of-order execution, register renaming, branch prediction, and a four-instruction decoder. These features are interdependent, and removing any single feature reduces average performance by 18% or more. However, there are many hardware simplifications that cause only a small performance reduction. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/89/383/CSL-TR-89-383.pdf %R CSL-TR-89-397 %Z Wed, 05 May 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design and Clocking of VLSI Multipliers %A Santoro, Mark Ronald %D October 1989 %X This thesis presents a versatile new multiplier architecture, which can provide better performance than conventional linear array multipliers at a fraction of the silicon area. The high performance is obtained by using a new binary tree structure, the 4-2 tree. The 4-2 tree is symmetric and far more regular than other multiplier trees while offering comparable performance, making it better suited for VLSI implementations. To reduce area, a partial, pipelined 4-2 tree is used with a 4-2 carry-save accumulator placed at its outputs to iteratively sum the partial products as they are generated. Maximum performance is obtained by accurately matching the iterative clock to the pipeline rate of the 4-2 tree, using a stoppable on-chip clock generator. To prove the new architecture, a test chip called SPIM was fabricated in a 1.6 µm CMOS process. SPIM contains 41,000 transistors with an array size of 2.9 × 5.3 mm. Running at an internal clock frequency of 85 MHz, SPIM performs the 64-bit mantissa portion of a double extended precision floating-point multiply in under 120 ns. To make the new architecture commercially interesting, several high-performance rounding algorithms compatible with IEEE standard 754 for binary floating-point arithmetic have also been developed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/89/397/CSL-TR-89-397.pdf %R CSL-TR-90-410 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Tango introduction and tutorial %A Goldschmidt, Stephen R. %A Davis, Helen %D January 1990 %X Tango is a software-based multiprocessor simulator that can generate traces of synchronization events and data references. The system runs on a uniprocessor and provides a simulated multiprocessor environment. The user code is augmented during compilation to produce a compiled simulation system with optional logging. Tango offers flexible and accurate tracing by allowing the user to incorporate various memory and synchronization models. Tango achieves high efficiency by running compiled user code, by focusing on information that is of specific interest to multiprocessing studies, and by allowing the user to select the most efficient memory simulation that is appropriate for a set of experiments. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/410/CSL-TR-90-410.pdf %R CSL-TR-90-411 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Branch strategies: modeling and optimization %A Dubey, Pradeep K. %A Flynn, Michael J. %D February 1990 %X Instruction dependency introduced by conditional branch instructions, which is resolved only at run-time, can have a severe performance impact on pipelined machines. A variety of strategies are in wide use to minimize this impact.
Additional instruction traffic generated by these branch strategies can also have an adverse effect on the system performance. Therefore, in addition to the likely reduction a branch prediction strategy offers in average branch delay, the resulting excess i-traffic can be an important parameter in evaluating its overall effectiveness. The objective of this paper is twofold: to develop a model for different approaches to the branch problem and to help select an optimal strategy after taking into account the additional i-traffic generated by i-buffering. The model presented provides a flexible tool for comparing different branch strategies in terms of the reduction each offers in average branch delay and in terms of the associated cost of wasted instruction fetches. This additional criterion turns out to be a valuable consideration in choosing between two almost equally performing strategies. More importantly, it provides a better insight into the expected overall system performance. Simple, low-implementation-cost strategies based on compiler support can be very effective under certain conditions. An active branch prediction scheme based on a loop buffer can be as competitive as a branch-target-buffer-based strategy. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/411/CSL-TR-90-411.pdf %R CSL-TR-90-413 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An area-utility model for on-chip memories and its application %A Mulder, Johannes M. %A Quach, Nhon T. %A Flynn, Michael J. %D February 1990 %X Utility can be defined as quality per unit of cost. The utility of a particular function in a microprocessor can be defined as its contribution to the overall processor performance per unit of implementation cost. In the case of on-chip data memory (e.g., registers, caches), the performance contribution can be reduced to its effectiveness in reducing memory traffic or in reducing the average time to fetch operands. An important cost measure for on-chip memory is occupied area. On-chip memory performance, however, is expressed much more easily as a function of size (the storage capacity) than as a function of area. Simple models have been proposed for mapping memory size to occupied area. These models, however, are of unproven validity and only apply when comparing relatively large buffers (≥ 128 words for caches, ≥ 32 words for register sets) of the same structure (e.g., cache versus cache). In this paper we present an area model for on-chip memories. The area model considers the supplied bandwidth of the individual memory cells and includes such overhead as control logic, driver logic, and tag storage, thereby permitting comparison of data buffers of different organizations and of arbitrary sizes. The model gave less than 10% error when verified against real caches and register files. Using this area-utility measure F(Performance,Area), we first investigated the performance of various cache organizations and then compared the performance of register buffers (e.g., register sets, multiple overlapping sets) and on-chip caches. Comparing cache performance as a function of area, rather than size, leads to a significantly different set of organizational tradeoffs. Caches occupy more area per bit than register buffers for sizes of 128 words or less. For data caches, line size is a primary determinant of performance for small sizes, while write policy becomes the primary factor for larger caches.
For the same area, multiple register sets have poorer performance than a single register set with a cache, except when the memory access time is very fast (under 3 processor cycles). %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/413/CSL-TR-90-413.pdf %R CSL-TR-90-415 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T High-speed addition in CMOS %A Quach, Nhon T. %A Flynn, Michael J. %D February 1990 %X This paper describes a fully static Complementary Metal-Oxide Semiconductor (CMOS) implementation of a Ling type adder. The implementation described herein saves up to one gate delay and always reduces the number of serial transistors in the worst-case (critical) path over the conventional carry look-ahead (CLA) approach with a negligible increase in hardware. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/415/CSL-TR-90-415.pdf %R CSL-TR-90-418 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Runtime Access to Type Information in C++ %A Interrante, John A. %A Linton, Mark A. %D March 1990 %X The C++ language currently does not provide a mechanism for an object to determine its type at runtime. We propose the Dossier class as a standard interface for accessing type information from within a C++ program. We have implemented a tool called mkdossier that automatically generates type information in a form that can be compiled and linked with an application. In the prototype implementation, a class must have a virtual function to access an object's dossier given the object. We propose this access be provided implicitly by the language through a predefined member in all classes. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/418/CSL-TR-90-418.pdf %R CSL-TR-90-419 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T HardwareC -- A Language for Hardware Design (Version 2.0) %A Ku, David %A DeMicheli, Giovanni %D April 1990 %X High-level synthesis is the transformation from a behavioral level specification of hardware, through a series of optimizations and translations, to an implementation in terms of logic gates and registers. The success of a high-level synthesis system is heavily dependent on how effectively the high-level language captures the ideas of the designer in a simple and understandable way. Furthermore, as system-level issues such as communication protocols and design partitioning dominate the design process, the ability to specify constraints on the timing requirements and resource utilization of a design is necessary to ensure that the design can integrate with the rest of the system. In this paper, a hardware description language called HardwareC is presented. HardwareC supports both declarative and procedural semantics, has a C-like syntax, and is extended with the notion of concurrent processes, message passing, timing constraints via tagging, resource constraints, explicit instantiation of models, and template models. The language is used as the input to the Hercules High-level Synthesis System. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/419/CSL-TR-90-419.pdf %R CSL-TR-90-423 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Implementing a Directory-Based Cache Consistency Protocol %A Simoni, Richard %D March 1990 %X Directory-based cache consistency protocols have the potential to allow shared-memory multiprocessors to scale to a large number of processors.
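As background for the kind of bookkeeping such a protocol implies, here is a minimal C sketch of a full-bit-vector directory entry handling read and write misses under invalidation; the field names, the 32-cache limit, and the printf standing in for a network message are assumptions of the sketch, not details of the design developed in the report.

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch of per-block directory state for an invalidation protocol. */
    typedef struct {
        uint32_t sharers;   /* bit i set => cache i holds a copy */
        int      dirty;     /* one cache holds the only, modified copy */
    } dir_entry_t;

    /* A read miss from cache `c`: record the new sharer.  A real protocol
       would first fetch a dirty block back from its owner; the sketch
       only updates the bookkeeping. */
    static void dir_read_miss(dir_entry_t *e, int c) {
        e->dirty = 0;
        e->sharers |= 1u << c;
    }

    /* A write miss from cache `c`: invalidate all other copies. */
    static void dir_write_miss(dir_entry_t *e, int c) {
        for (int i = 0; i < 32; i++)
            if (((e->sharers >> i) & 1) && i != c)
                printf("send invalidate to cache %d\n", i);
        e->sharers = 1u << c;
        e->dirty = 1;
    }

    int main(void) {
        dir_entry_t e = {0, 0};
        dir_read_miss(&e, 0);
        dir_read_miss(&e, 3);
        dir_write_miss(&e, 5);   /* invalidates caches 0 and 3 */
        return 0;
    }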
While many variations of these coherence schemes exist in the literature, they have typically been described at a rather high level, making adequate evaluation difficult. This paper explores the implementation issues of directory-based coherency strategies by developing a design at the level of detail needed to write a memory system functional simulator with an accurate timing model. The paper presents the design of both an invalidation coherency protocol and the associated directory/memory hardware. Support is added to prevent deadlock, handle subtle consistency situations, and implement a proper programming model of multiprocess execution. Extensions are delineated for realizing a multiple-threaded directory that can continue to process commands while waiting for a reply from a cache. The final hardware design is evaluated in the context of the number of parts required for implementation. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/423/CSL-TR-90-423.pdf %R CSL-TR-90-425 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Concurrent runtime monitoring of formally specified programs %A Mandal, Manas %A Sankar, Sriram %D April 1990 %X This paper describes an application of formal specifications after an executable program has been constructed. We describe how high level specifications can be utilized to monitor critical aspects of the behavior of a program continuously while it is executing. This methodology provides a capability to distribute the monitoring of specifications on multi-processor hardware platforms to meet practical time constraints. Typically, runtime checking of formal specifications involves a significant time penalty which makes it impractical during normal production operation of a program. In previous research, runtime checking has been applied during testing and debugging of software, but not on a permanent basis. Crucial to our current methodology is the use of multi-processor machines - hence runtime monitoring can be performed concurrently on different processors. We describe techniques for distributing checks onto different processors. To control the degree of concurrency, we introduce checkpoints - points in the program beyond which execution cannot proceed until the specified checks have been completed. Error reporting and recovery in a multi-processor environment are complicated, and there are various techniques for handling them. We describe a few of these techniques in this paper. An implementation of this methodology for the Anna specification language for Ada programs is described. Results of experiments conducted on this implementation using a 12 processor Sequent Symmetry demonstrate that permanent concurrent monitoring of programs based on formal specifications is indeed feasible. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/425/CSL-TR-90-425.pdf %R CSL-TR-90-426 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A VLSI architecture for the FCHC isometric lattice gas model %A Lee, Fung F. %A Flynn, Michael J. %A Morf, Martin %D April 1990 %X Lattice gas models are cellular automata used for the simulation of fluid dynamics. This paper addresses the design issues of a lattice gas collision rule processor for the four-dimensional FCHC isometric lattice gas model. A novel VLSI architecture based on an optimized version of Henon's isometric algorithm is proposed.
One of the key concepts behind this architecture is the permutation group representation of the isometry group of the lattice. In contrast to the straightforward table lookup approach, which would take 4.5 billion bits to implement this set of collision rules, the size of our processor is only about 5000 gates. With a reasonable number of pipeline stages, the processor can deliver one result per cycle with a cycle time comparable to or less than that of a common commercial DRAM. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/426/CSL-TR-90-426.pdf %R CSL-TR-90-428 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Sub-nanosecond arithmetic %A Flynn, Michael J. %A DeMicheli, Giovanni %A Dutton, Robert %A Wooley, Bruce %A Pease, R. Fabian %D May 1990 %X The SNAP (Stanford Nanosecond Arithmetic Processor) project is targeted at realizing an arithmetic processor with performance approximately an order of magnitude faster than currently available technology. The realization of SNAP is predicated on an interdisciplinary approach and effort spanning research in algorithms, data representation, CAD, circuits and devices, and packaging. SNAP is visualized as an arithmetic coprocessor implemented on an active substrate containing several chips, each of which realizes a particular arithmetic function. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/428/CSL-TR-90-428.pdf %R CSL-TR-90-431 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Latency and throughput tradeoffs in self-timed speed-independent pipelines and rings %A Williams, Ted %D August 1990 %X Asynchronous pipelines control the flow of tokens through a sequence of logical stages based on the status of local completion detectors. As in a synchronously clocked circuit, the design of self-timed pipelines can trade off between achieving low latency and high throughput. However, there are more degrees of freedom because of the variances in specific latch and function block styles, and the possibility of varying both the number of latches between function blocks and their connections to the completion detectors. This report demonstrates the utility of a graph-based methodology for analyzing the timing dependencies and uses it to make comparisons of different configurations. It is shown that the extremes for high throughput and low latency differ significantly, that the placement of the completion detectors influences timing as much as adding an additional latch, and that the choice as to whether precharged or static logic is best depends on the cost in complexity of the completion detectors. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/431/CSL-TR-90-431.pdf %R CSL-TR-90-436 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Application of formal specification to software maintenance %A Madhav, Neel %A Sankar, Sriram %D August 1990 %X This paper describes the use of formal specifications and associated tools in addressing various aspects of software maintenance --- corrective, perfective, and adaptive. It also addresses the refinement of the software development process to build programs that are easily maintainable. The task of software maintenance in our case includes the task of maintaining the specification as well as maintaining the program. We focus on the use of Anna, a specification language for formally specifying Ada programs, to aid us in maintaining Ada programs.
These techniques are applicable to most other specification language and programming language environments. The tools of interest are: (1) the Anna Specification Analyzer which allows us to analyze the specification for correctness with respect to our informal understanding of program behavior; and (2) the Anna Consistency Checking System which monitors the Ada program at runtime based on the Anna specification. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/436/CSL-TR-90-436.pdf %R CSL-TR-90-438 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A Methodology for Formal Specification and Implementation of Ada Packages Using Anna %A Madhav, Neel %A Mann, Walter %D August 1990 %X This paper presents a methodology for formal specification and prototype implementation of Ada packages using the Anna specification language. Specifications play an important role in the software development cycle. The methodology allows specifiers of Ada packages to follow a sequence of simple steps to formally specify packages. Given the formal specification of a package resulting from the methodology for package specifications, the methodology allows implementors of packages to follow a few simple steps to implement the package. The implementation is meant to be a prototype. This methodology for specification and implementation is applicable to most Ada packages. Limitations of this approach are pointed out at various points in the paper. We present software tools which help the process of specification and implementation. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/438/CSL-TR-90-438.pdf %R CSL-TR-90-439 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Tango: A Multiprocessor Simulation and Tracing System %A Davis, Helen %A Goldschmidt, Stephen R. %D July 1990 %X Tango is a software simulation and tracing system used to obtain data for evaluating parallel programs and multiprocessor systems. The system provides a simulated multiprocessor environment by multiplexing application processes onto a single processor. Tango achieves high efficiency by running compiled user code, and by focusing on the information of greatest interest to multiprocessing studies. The system is being applied to a wide range of investigations, including algorithm studies and a variety of hardware evaluations. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/439/CSL-TR-90-439.pdf %R CSL-TR-90-441 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Computing Types During Program Specialization %A Weise, Daniel %A Ruf, Erik %D October 1990 %X We have developed techniques for obtaining and using type information during program specialization (partial evaluation). Computed along with every residual expression and every specialized program is type information that bounds the possible values that the specialized program will compute at run time. The three keystones of this research are symbolic values that represent both a value and the code for creating the value, generalization of symbolic values, and the use of online fixed-point iterations for computing the type of values returned by specialized recursive functions. The specializer exploits type information to increase the efficiency of specialized functions. This research has two benefits, one anticipated and one unanticipated. 
The anticipated benefit is that programs that are to be specialized can now be written in a more natural style without losing accuracy during specialization. The unanticipated benefit is the creation of what we term concrete abstract interpretation. This is a method of performing abstract interpretation with concrete values where possible. The specializer abstracts values as needed, instead of requiring that all values be abstracted prior to abstract interpretation. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/441/CSL-TR-90-441.pdf %R CSL-TR-90-442 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An improved algorithm for high-speed floating-point addition %A Quach, Nhon T. %A Flynn, Michael J. %D August 1990 %X This paper describes an improved, IEEE-conforming floating-point addition algorithm. This algorithm has only one addition step involving the significand in the worst-case path, hence offering a considerable speed advantage over the existing algorithms, which typically require two to three addition steps. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/442/CSL-TR-90-442.pdf %R CSL-TR-90-443 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A queuing analysis for disk array systems %A Ogata, Mikito %A Flynn, Michael J. %D August 1990 %X Using a queuing model of disk arrays, we study the performance and tradeoffs in disk array sub-systems and develop guidelines for designing these sub-systems in various CPU environments. Finally, we compare our model with some earlier simulation results. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/443/CSL-TR-90-443.pdf %R CSL-TR-90-453 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Event patterns: A language construct for hierarchical designs of concurrent systems %A Luckham, David D. %A Gennart, Benoit A. %D November 1990 %X Event patterns are a language construct for expressing relationships between specifications at different levels of a hierarchical design of a concurrent system. They provide a facility missing from current hardware design languages such as VHDL, or programming languages with parallel constructs such as Ada. This paper explains the use of event patterns in (1) defining mappings between different levels of a design hierarchy, and (2) automating the comparison of the behavior of different design levels during simulation. It describes the language constructs for defining event patterns and mappings, and shows their use in a design example, a 16-bit CPU. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/453/CSL-TR-90-453.pdf %R CSL-TR-90-454 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Page allocation to reduce access time of physical caches %A Bray, Brian K. %A Lynch, William L. %A Flynn, Michael J. %D November 1990 %X A simple modification to an operating system's page allocation algorithm can give physically addressed caches the speed of virtually addressed caches. Colored page allocation reduces the number of bits that need to be translated before cache access, allowing large low-associativity caches to be indexed before address translation, which reduces the latency to the processor. The colored allocation also has other benefits: caches miss less (in general) and more uniformly, and the inclusion principle holds for second level caches with less associativity.
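A minimal sketch of how a colored allocator can work, assuming per-color free lists and a color derived from the virtual page number (standard choices for this technique, not details confirmed by the report):

    #include <stdio.h>

    /* Sketch of colored page allocation: a free page is taken from the
       free list whose color matches the virtual page number, so the
       cache-index bits above the page offset agree for the virtual and
       physical address.  In a real system NUM_COLORS would be derived
       from the cache and page sizes; 4 is an arbitrary value here. */
    #define NUM_COLORS 4

    static int free_list[NUM_COLORS][8];   /* physical page numbers, by color */
    static int free_count[NUM_COLORS];

    static int alloc_page(unsigned long vpn) {
        int color = vpn % NUM_COLORS;       /* the page's required color */
        if (free_count[color] == 0)
            return -1;                      /* caller must reclaim or wait */
        return free_list[color][--free_count[color]];
    }

    int main(void) {
        /* seed each color's list with physical pages of that color */
        for (int ppn = 0; ppn < 32; ppn++) {
            int c = ppn % NUM_COLORS;
            free_list[c][free_count[c]++] = ppn;
        }
        printf("vpn 5 -> ppn %d\n", alloc_page(5));   /* a page with ppn % 4 == 1 */
        return 0;
    }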
However, the colored allocation requires main memory partitioning, and more common bits for shared virtual addresses. Simulation results show high non-uniformity of cache miss rates for normal allocation. Analysis demonstrates the extent of second-level cache inclusion, and the reduction in effective main memory due to partitioning. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/90/454/CSL-TR-90-454.pdf %R CSL-TR-91-459 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T On fast IEEE rounding %A Quach, Nhon %A Takagi, Naofumi %A Flynn, Michael J. %D January 1991 %X A systematic general rounding procedure is proposed. This procedure consists of two steps: constructing a rounding table and selecting a prediction scheme. Optimization guidelines are given in each step to minimize the hardware used. This procedure-based rounding method has the additional advantage that verification and generalization are trivial. Two rounding hardware models are described. The first is shown to be identical to that reported by Santoro, et al. The second is more powerful, providing solutions where the first fails. Applying this approach to the IEEE rounding modes for high-speed conventional binary multipliers reveals that round to infinity is more difficult to implement than the round to nearest mode; more adders are potentially needed. Round to zero requires the least amount of hardware. A generalization of this procedure to redundant binary multipliers reveals two major advantages over conventional binary multipliers. First, the computation of the sticky bit consumes considerably less hardware. Second, implementing the round to plus and minus infinity modes does not require the examination of the sticky bit, removing a possible worst-case path. A generalization of this approach to addition produces a similar solution to that reported by Quach and Flynn. Although generalizable to other kinds of rounding as well as other arithmetic operations, we only treat the case of IEEE rounding for addition and multiplication; IEEE rounding because it is the current standard on rounding, addition and multiplication because they are the most frequently used arithmetic operations in a typical scientific computation. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/459/CSL-TR-91-459.pdf %R CSL-TR-91-463 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Leading One Detection --- Implementation, Generalization, and Application %A Quach, Nhon %A Flynn, Michael J. %D March 1991 %X This paper presents the concept of leading-one prediction (LOP) in greater detail and describes two existing implementations. The first one is similar to that used in the IBM RS/6000 processor. The second is a distributed version of the first, consuming less hardware when multiple patterns need to be detected. We show how to modify these circuits for sign-magnitude numbers as dictated by the IEEE standard. We then point out that (1) LOP and carry lookahead in parallel addition belong to the same class of problem, that of bit pattern detection. This recognition allows techniques developed for parallel addition to be borrowed for bit pattern detection. And (2) LOP can be applied to compute the sticky bit needed for binary multipliers to perform IEEE rounding.
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/463/CSL-TR-91-463.pdf %R CSL-TR-91-468 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Efficient moment-based timing analysis for variable accuracy switch level simulation %A Kao, Russell %A Horowitz, Mark %D April 1991 %X We describe a timing analysis algorithm which can achieve the efficiency of RC tree analysis while retaining much of the generality of Asymptotic Waveform Evaluation. RC tree analysis from switch level simulation is generalized to handle piecewise linear transistor models, non-tree topologies, floating capacitors, and feedback. For simple switch level models the complexity is O(n). The algorithm allows the user to trade off efficiency versus accuracy through the selection of transistor models of varying complexity. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/468/CSL-TR-91-468.pdf %R CSL-TR-91-469 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T SPLASH: Stanford parallel applications for shared-memory %A Singh, Jaswinder Pal %A Weber, Wolf-Dietrich %A Gupta, Anoop %D April 1991 %X This report was replaced and updated by CSL-TR-92-526 %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/469/CSL-TR-91-469.pdf %R CSL-TR-91-470 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Write caches as an alternative to write buffers %A Bray, Brian K. %A Flynn, Michael J. %D April 1991 %X Write buffers help unbind one level of a memory hierarchy from the next; thus, write buffers are used to reduce write stalls. Write buffers are used in write-through systems so that writes can occur at the rate the cache can handle them, but they neither reduce the number of writes nor cluster writes for block transfers. A write cache is a cache that uses an allocate on write miss, write-back, no allocate on read miss strategy. A write cache tries to reduce the total number of writes (write traffic) to the next level by taking advantage of the temporal locality of writes. A write cache also groups writes for block transfers by taking advantage of the spatial locality of writes. We have found that small write caches can significantly reduce the write traffic to the first write-back level after the processor's register set. Systems that would benefit from reduced write traffic to the first write-back level would benefit from using a write cache instead of a write buffer. The temporal and spatial locality of writes is very important in determining what organization the write cache should have. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/470/CSL-TR-91-470.pdf %R CSL-TR-91-475 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Making effective use of shared-memory multiprocessors: the process control approach %A Gupta, Anoop %A Tucker, Andrew %A Stevens, Luis %D May 1991 %X We present the design, implementation, and performance of a novel approach for effectively utilizing shared-memory multiprocessors in the presence of multiprogramming. Our approach offers high performance by combining the techniques of process control and processor partitioning. The process control technique is based on the principle that to maximize performance, a parallel application must dynamically match the number of runnable processes associated with it to the effective number of processors available to it.
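The matching step at the heart of process control can be sketched in a few lines of C; processors_available(), suspend_worker(), and resume_worker() are hypothetical stand-ins for whatever kernel or server interface actually provides this information, not a real API:

    #include <stdio.h>

    #define MAX_WORKERS 16

    static int running = MAX_WORKERS;       /* workers currently runnable */

    static int processors_available(void) { return 4; }  /* stand-in value */

    static void suspend_worker(void) { running--; }      /* stand-ins for the */
    static void resume_worker(void)  { running++; }      /* real suspend/resume */

    /* Periodically match runnable workers to available processors. */
    static void process_control_step(void) {
        int avail = processors_available();
        if (avail > MAX_WORKERS) avail = MAX_WORKERS;
        while (running > avail) suspend_worker();   /* avoid oblivious preemption */
        while (running < avail) resume_worker();    /* use newly freed processors */
    }

    int main(void) {
        process_control_step();
        printf("runnable workers: %d\n", running);  /* now matches the 4 CPUs */
        return 0;
    }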
This avoids the problems arising from oblivious preemption of processes and it allows an application to work at a better operating point on its speedup versus processors curve. Processor partitioning is necessary for dealing with realistic multiprogramming environments, where both process controlled and non-controlled applications may be present. It also helps improve the cache performance of applications and removes the bottleneck associated with a single centralized scheduler. Preliminary results from an implementation of the process control approach, with a user-level server, were presented in a previous paper. In this paper, we extend the process control approach to work with processor partitioning and fully integrate the approach with the operating system kernel. This also allows us to address a limitation in our earlier implementation wherein a close correspondence between runnable processes and the available processors was not maintained in the presence of I/O. The paper presents the design decisions and the rationale for the current implementation, along with extensive results from executions on a high-performance Silicon Graphics 4D/340. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/475/CSL-TR-91-475.pdf %R CSL-TR-91-480 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Strategies for branch target buffers %A Bray, Brian K. %A Flynn, M. J. %D June 1991 %X Achieving high instruction issue rates depends on the ability to dynamically predict branches. We compare two schemes for dynamic branch prediction: a separate branch target buffer and an instruction cache based branch target buffer. For instruction caches of 4KB and greater, instruction cache based branch prediction performance is a strong function of line size, and a weak function of instruction cache size. An instruction cache based branch target buffer with a line size of 8 (or 4) instructions performs about as well as a separate branch target buffer structure which has 64 (or 256, respectively) entries. Software can rearrange basic blocks in a procedure to reduce the number of taken branches, thus reducing the amount of branch prediction hardware needed. With software assistance, predicting all branches as not branching performs as well as a 4 entry branch target buffer without assistance, and a 4 entry branch target buffer with assistance performs as well as a 32 entry branch target buffer without assistance. The instruction cache based branch target buffer also benefits from the software, but only for line sizes of more than 4 instructions. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/480/CSL-TR-91-480.pdf %R CSL-TR-91-481 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Subnanosecond arithmetic (Second Report) %A Flynn, Michael J. %A DeMicheli, Giovanni %A Dutton, Robert %A Pease, R. Fabian %A Wooley, Bruce %D June 1991 %X The Stanford Nanosecond Arithmetic Project is targeted at realizing an arithmetic processor with performance approximately an order of magnitude faster than currently available technology. The realization of SNAP is predicated on an interdisciplinary approach and effort spanning research in algorithms, data representation, CAD, circuits and devices, and packaging. SNAP is visualized as an arithmetic coprocessor implemented on an active substrate containing several chips, each of which realizes a particular arithmetic function. This year's report highlights recent results in the area of wave pipelining.
We have fabricated a number of prototype dies implementing a multiplier slice. Cycle times below 5 ns were realized. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/481/CSL-TR-91-481.pdf %R CSL-TR-91-483 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Suggestions for implementing a fast IEEE multiply-add-fused instruction %A Quach, Nhon %A Flynn, Michael %D July 1991 %X We studied three possible strategies to overlap the operations in a floating-point add (FPA) and a floating-point multiply (FPM) for implementing an IEEE multiply-add-fused (MAF) instruction. The operations in FPM and FPA are: (a) non-overlapped, (b) fully-overlapped, and (c) partially-overlapped. The first strategy corresponds to the multiply-add-chained (MAC) approach widely used in vector processors. The second (Greedy) strategy uses a greedy algorithm, yielding an implementation similar to the IBM RS/6000 one. The third and final (SNAP) strategy uses a less aggressive starting configuration and corresponds to the SNAP implementation. An IEEE MAF delivers the same result as that obtained via a separate IEEE FPM and FPA. Two observations have prompted this study. First, in the IBM RS/6000 implementation, the design tradeoffs have been made for high internal data precision, which facilitates the execution of elementary functions. These tradeoff decisions, however, may not be valid for an IEEE MAF. Second, the RS/6000 implementation assumed a different critical path for FPA and FPM, which does not reflect the current state-of-the-art in FP technology. Using latency and hardware costs as the performance metrics, we show that: (1) MAC has the lowest FPA latency and consumes the least hardware. But its MAF latency is the highest. (2) Greedy has a medium MAF latency but the highest FPA latency. And (3) SNAP has the lowest MAF latency and a slightly higher FPA latency than that of MAC, consuming an area that is comparable to that of Greedy. Both Greedy and SNAP have higher design complexity arising from rounding for the IEEE standard. SNAP has an additional wire complexity, which Greedy does not have because of its simpler datapath. If rounding for the IEEE standard is not an issue, the Greedy strategy --- and therefore the RS/6000 --- seems reasonable for applications with a high MAF to FPA ratio. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/483/CSL-TR-91-483.pdf %R CSL-TR-91-485 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A Unidraw-Based User Interface Builder %A Vlissides, John M. %A Tang, Steven %D August 1991 %X Ibuild is a user interface builder that lets a user manipulate simulations of toolkit objects rather than actual toolkit objects. Ibuild is built with Unidraw, a framework for building graphical editors that is part of the InterViews toolkit. Unidraw makes the simulation-based approach attractive. Simulating toolkit objects in Unidraw makes it easier to support editing facilities that are common in other kinds of graphical editors, and it keeps the builder insulated from a particular toolkit implementation. Ibuild supports direct manipulation analogs of InterViews' composition mechanisms, which simplify the specification of an interface's layout and resize semantics. Ibuild also leverages the C++ inheritance mechanism to decouple builder-generated code from the rest of the application.
And while current user interface builders stop at the widget level, ibuild incorporates Unidraw abstractions to simplify the implementation of graphical editors. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/485/CSL-TR-91-485.pdf %R CSL-TR-91-488 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The Stanford Ada style checker: an application of the ANNA tools and methodology %A Walicki, Michal %A Skakkebaek, Jens Ulrik %A Sankar, Sriram %D August 1991 %X This report describes the Ada style checker, which was designed and constructed in Winter and Spring 1989-90. The style checker is based on the Stanford Anna Tools and has been annotated using Anna. The style checker examines Ada programs for "correct style'' which is defined in a style specification language (SSL). A style checker generator is used to automatically generate a style checker based on a set of style specifications. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/488/CSL-TR-91-488.pdf %R CSL-TR-91-492 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Paging Performance with Page Coloring. %A Lynch, William L. %A Flynn, Michael J. %D October 1991 %X Constraining the mapping of virtual to physical addresses (page coloring) can speed and/or simplify caches in the presence of virtual memory. For the mapping to hold, physical memory must be partitioned into distinct colors, and virtual pages must be allocated to physical pages of the color determined by the mapping. This paper uses an analytical model and simulation to compare the paging effects of colored versus uncolored (conventional) page allocation, and concludes that these effects are small. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/492/CSL-TR-91-492.pdf %R CSL-TR-91-496 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T ANNA package specification: case studies %A Kenney, John %A Mann, Walter %D October 1991 %X We present techniques of software specification of Ada* software based on the Anna specification language and examples of Ada packages formally specified in Anna. A package specification for an abstract set type is used to illustrate the techniques and pitfalls involved in the process of software specification and development. This specification not only exemplifies good Anna style and specification approach, but has a secondary goal of teaching the reader how to use Anna and the associated set of Anna tools developed at Stanford University over the past six years. The technical report thus aims to give readers a new way of looking at the software design and development process, synthesizing fifteen years of research in the process. *Ada is a registered trademark of the U.S. Government (Ada Joint Program Office) %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/496/CSL-TR-91-496.pdf %R CSL-TR-91-498 %Z Wed, 30 Mar 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory. %T Spectral Techniques for Technology Mapping %A Yang, Jerry Chih-Yuan %A DeMicheli, Giovanni %D March 1994 %X Technology mapping is the crucial step in logic synthesis where technology dependent optimizations take place. The matching phase of a technology mapping algorithm is generally considered the most computationally intensive task, because it is called on repeatedly. In this work, we investigate applications of spectral techniques in doing matching. In particular, we present an algorithm that will detect NPN-equivalent Boolean functions.
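For orientation, two functions are NPN-equivalent when one can be obtained from the other by permuting inputs, complementing inputs, and possibly complementing the output. The brute-force canonical form below, written for 3-input functions with 8-bit truth tables, is the naive search that signature-based pruning is meant to avoid; it is an illustrative sketch, not the Specter algorithm:

    #include <stdint.h>
    #include <stdio.h>

    static const int perms[6][3] = {
        {0,1,2},{0,2,1},{1,0,2},{1,2,0},{2,0,1},{2,1,0}
    };

    /* Apply an input permutation and input-complement mask to a truth table. */
    static uint8_t transform(uint8_t tt, const int *p, int negmask) {
        uint8_t out = 0;
        for (int v = 0; v < 8; v++) {           /* each input assignment */
            int w = 0;
            for (int j = 0; j < 3; j++)         /* permute the input bits */
                if ((v >> j) & 1) w |= 1 << p[j];
            w ^= negmask;                       /* complement selected inputs */
            if ((tt >> w) & 1) out |= 1 << v;
        }
        return out;
    }

    /* Canonical form: minimum truth table over the whole NPN orbit. */
    static uint8_t npn_canonical(uint8_t tt) {
        uint8_t best = 0xFF;
        for (int p = 0; p < 6; p++)
            for (int nm = 0; nm < 8; nm++) {
                uint8_t t = transform(tt, perms[p], nm);
                if (t < best) best = t;
                if ((uint8_t)~t < best) best = (uint8_t)~t;  /* output negation */
            }
        return best;
    }

    int main(void) {
        /* AND(a,b,c) = 0x80 and NOR(a,b,c) = 0x01 are NPN-equivalent */
        printf("%02x %02x\n", npn_canonical(0x80), npn_canonical(0x01));
        return 0;
    }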
We show that while generating the spectra for Boolean functions may be expensive, this algorithm offers significant pruning of the search space and is simple to implement. The algorithm is implemented as part of the Specter technology mapper, and results are compared to other Boolean matching techniques. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/498/CSL-TR-91-498.pdf %R CSL-TR-91-484 %Z Mon, 26 Jan 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Self-Consistency and Transitivity in Self-Calibration Procedures %A Raugh, Michael %D July 1991 %X Self-calibration refers to the use of an uncalibrated measuring instrument and an uncalibrated object called an artifact, such as a rigid marked plate, to simultaneously measure the artifact and calibrate the instrument. Typically, the artifact is measured in more than one position, and the required information is derived from comparisons of the various measurements. The problems of self-calibration are surprisingly subtle. This paper develops concepts and vocabulary for dealing with such problems in one and two dimensions and uses simple (non-optimal) measurement procedures to reveal the underlying principles. The approach in two dimensions is mathematically constructive: procedures are described for measuring an uncalibrated artifact in several stages, involving progressive transformations of the instrument's uncalibrated coordinate system, until correct coordinates for the artifact are obtained and calibration of the instrument is achieved. Self-consistency and transitivity, as defined within, emerge as key concepts. It is shown that self-consistency and transitivity are necessary conditions for self-calibration. Consequently, in general, it is impossible to calibrate a two-dimensional measuring instrument by simply rotating and measuring a calibration plate about a fixed center. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/484/CSL-TR-91-484.pdf %R CSL-TR-91-465 %Z Mon, 28 Dec 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Analysis of Power Supply Networks in VLSI Circuits %A Stark, Don %D March 1991 %X Although the trend toward finer geometries and larger chips has produced faster systems, it has also created larger voltage drops and higher current densities in chip power supply networks. Excessive voltage drops in the power supply lines cause incorrect circuit operation, and high current densities lead to circuit failure via electromigration. Analyzing this power supply noise by hand for large circuits is difficult and error prone; automatic checking tools are needed to make the analysis easier. This thesis describes Ariel, a CAD tool that helps VLSI designers analyze power supply noise. The system consists of three main components, a resistance extractor, a current estimator, and a linear solver, that are used together to determine the voltage drops and current density along the supply lines. The resistance extractor includes two parts: a fast extractor that calculates resistances quickly using simple heuristics, and a slower, more accurate finite element extractor. Despite its simplicity, the fast extractor obtained nearly the same results as the finite element one and is two orders of magnitude faster. The system also contains two current estimators, one for CMOS designs and one for ECL.
The CMOS current estimator is based on the switch level simulator Rsim, and produces a time-varying current distribution that includes the effects of charge sharing, image currents, and the slope of the gate inputs. The ECL estimator does a static analysis of the design, calculating each gate's tail current and tracing through the network to find where it enters the power supplies. Extensions to the estimator allow it to handle more complex circuits, such as shared current lines and diode decoders. Finally, the linear solver applies this current pattern to the resistance network, and efficiently calculates voltages and current densities by taking advantage of topological characteristics peculiar to power supply networks. It removes trees, simple loops, and series sections for separate analysis. These techniques substantially reduce the time required for solution. This report also includes the results of running the system on several large designs, and points out flaws that Ariel uncovered in their power networks. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/91/465/CSL-TR-91-465.pdf %R CSL-TR-92-510 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Rapide-0.2 Examples %A Hsieh, Alexander %D February 1992 %X Rapide-0.2 is an executable language for prototyping distributed, time-sensitive systems. We present in this report a series of simple, working example programs in the language. In each example we present one or more new concepts or constructs of the Rapide-0.2 language with later examples drawing on previously presented material. The examples are written for both those who wish to use the Rapide-0.2 language to do serious prototyping and for those who just wish to be familiar with it. The examples were not written for someone who wishes to learn prototyping in general. CSL-TN-92-387 is an informal reference manual, describing the Rapide-0.2 language and tools, which might be helpful to have in conjunction with CSL-TR-92-510 (p. 191). %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/510/CSL-TR-92-510.pdf %R CSL-TR-92-515 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Partial orderings of event sets and their application to prototyping concurrent timed systems %A Luckham, David C. %A Vera, James %A Bryan, Doug %A Augustin, Larry %A Belz, Frank %D April 1992 %X Rapide is a concurrent object-oriented language specifically designed for prototyping large concurrent systems. One of the principal design goals has been to adopt a computation model in which the synchronization, concurrency, dataflow, and timing aspects of a prototype are explicitly represented and easily accessible both to the prototype itself and to the prototyper. This paper describes the partially ordered event set (poset) computation model, and the features of Rapide for using posets in reactive prototypes and for automatically checking posets. Some critical issues in the implementation of Rapide are described and our experience with them is summarized. An example prototyping scenario illustrates uses of the poset computation model.
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/515/CSL-TR-92-515.pdf %R CSL-TR-92-516 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Opportunities for Online Partial Evaluation %A Ruf, Erik %A Weise, Daniel %D April 1992 %X Partial evaluators can be separated into two classes: offline specializers, which make all of their reduce/residualize decisions before specialization, and online specializers, which make such decisions during specialization. The choice of which method to use is driven by a tradeoff between the efficiency of the specializer and the quality of the residual programs that it produces. Existing research describes some of the inefficiencies of online specializers, and how these are avoided using offline methods, but fails to address the price paid in specialization quality. This paper motivates research in online specialization by describing two fundamental limitations of the offline approach, and explains why the online approach does not encounter the same difficulties. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/516/CSL-TR-92-516.pdf %R CSL-TR-92-517 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Preserving Information during Online Partial Evaluation %A Ruf, Erik %A Weise, Daniel %D April 1992 %X The degree to which a partial evaluator can specialize a source program depends on how accurately the partial evaluator can represent and maintain information about runtime values. Partial evaluators always lose some accuracy due to their use of finite type systems; however, existing partial evaluation techniques lose information about runtime values even when their type systems are capable of representing such information. This paper describes two sources of such loss in existing specializers, solutions for both cases, and the implementation of these solutions in our partial evaluation system, FUSE. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/517/CSL-TR-92-517.pdf %R CSL-TR-92-518 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Avoiding Redundant Specialization during Partial Evaluation %A Ruf, Erik %A Weise, Daniel %D April 1992 %X Existing partial evaluators use a strategy called polyvariant specialization, which involves specializing program points on the known portions of their arguments, and re-using such specializations only when these known portions match exactly. We show that this re-use criterion is overly restrictive, and misses opportunities for sharing in residual programs, thus producing large residual programs containing redundant specializations. We develop a criterion for re-use based on computing the domains of specializations, describe an approximate implementation of this criterion based on types, and show its implementation in our partial evaluation system FUSE. In addition, we describe several extensions to our mechanism to make it compatible with more powerful specialization strategies and to increase its efficiency. After evaluating our algorithm's usefulness, we relate it to existing work in partial evaluation and machine learning. 
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/518/CSL-TR-92-518.pdf %R CSL-TR-92-520 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An Empirical Study of an Abstract Interpretation of Scheme Programs %A Kanamori, Atty %A Weise, Daniel %D April 1992 %X Abstract Interpretation, a powerful and general framework for performing global program analysis, is being applied to problems whose difficulty far surpasses the traditional "bit-vector" dataflow problems for which many of the high-speed abstract interpretation algorithms worked so well. Our experience has been that current methods of large scale abstract interpretation are unacceptably expensive. We studied a typical large-scale abstract interpretation problem: computing the control flow of a higher order program. Researchers have proposed various solutions that are designed primarily to improve the accuracy of the analysis. The cost of the analyses, and its relationship to accuracy, is addressed only cursorily in the literature. Somewhat paradoxically, one can view these strategies as attempts to simultaneously improve the accuracy and reduce the cost. The less accurate strategies explore many spurious control paths because many flowgraph paths represent illegal execution paths. For example, the less accurate strategies violate the LIFO constraints on procedure call and return. More accurate analyses investigate fewer control paths, and therefore may be more efficient despite their increased overhead. We empirically studied this accuracy versus efficiency tradeoff. We implemented two fixpoint algorithms and four semantics (baseline, baseline + stack reasoning, baseline + contour reasoning, baseline + stack reasoning + contour reasoning) for a total of eight control flow analyzers. Our benchmarks test various programming constructs in isolation --- hence, if a certain algorithm exhibits poor performance, the experiment also yields insight into what kind of program behavior results in that poor performance. The results suggest that strategies that increase accuracy in order to eliminate spurious paths often generate unacceptable overhead in the parts of the analysis that do not benefit from the increased accuracy. Furthermore, we found little evidence that the extra effort significantly improves the accuracy of the final result. This suggests that increasing the accuracy of the analysis globally is not a good idea, and that future research should investigate adaptive algorithms that use different amounts of precision on different parts of the problem. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/520/CSL-TR-92-520.pdf %R CSL-TR-92-523 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Architectural and implementation tradeoffs in the design of multiple-context processors %A Laudon, James %A Gupta, Anoop %A Horowitz, Mark %D May 1992 %X Multiple-context processors have been proposed as an architectural technique to mitigate the effects of large memory latency in multiprocessors. We examine two schemes for implementing multiple-context processors. The first scheme switches between contexts only on a cache miss, while the other interleaves the contexts on a cycle-by-cycle basis. Both schemes provide the capability for a single context to fully utilize the pipeline. We show that cycle-by-cycle interleaving of contexts provides a performance advantage over switching contexts only at a cache miss.
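The trade-off can be illustrated with the standard run-length efficiency model for multiple-context processors; the parameter values in this sketch are assumptions chosen for illustration, not measurements from the paper:

    #include <stdio.h>

    /* Run-length model: R = cycles of work between misses, L = miss
       latency, C = context-switch cost, N = number of contexts.  Below
       saturation, efficiency is N*R/(R+L+C); once the latency is fully
       hidden, it is R/(R+C).  Cycle-by-cycle interleaving corresponds
       to C = 0. */
    static double efficiency(double N, double R, double L, double C) {
        double unsat = N * R / (R + L + C);
        double sat   = R / (R + C);
        return unsat < sat ? unsat : sat;
    }

    int main(void) {
        double R = 10, L = 40;
        for (int N = 1; N <= 8; N *= 2)
            printf("N=%d  switch-on-miss %.2f   interleaved %.2f\n",
                   N, efficiency(N, R, L, 4), efficiency(N, R, L, 0));
        return 0;
    }

With these numbers, the switch-on-miss machine saturates at R/(R+C) = 0.71, while the interleaved machine reaches full utilization, which is the flavor of advantage the report quantifies.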
This advantage results from the context interleaving hiding pipeline dependencies and reducing the context switch cost. In addition, we show that while the implementation of the interleaved scheme is more complex, the complexity is not overwhelming. As pipelines get deeper and operate at lower percentages of peak performance, the performance advantage of the interleaved scheme is likely to justify its additional complexity. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/523/CSL-TR-92-523.pdf %R CSL-TR-92-526 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T SPLASH: Stanford parallel applications for shared-memory* %A Singh, Jaswinder Pal %A Weber, Wolf-Dietrich %A Gupta, Anoop %D June 1992 %X We present the Stanford Parallel Applications for Shared-Memory (SPLASH), a set of parallel applications for use in the design and evaluation of shared-memory multiprocessing systems. Our goal is to provide a suite of realistic applications that will serve as a well-documented and consistent basis for evaluation studies. We describe the applications currently in the suite in detail, discuss and compare some of their important characteristics, such as data locality, granularity, synchronization, etc., and explore their behavior by running them on a real multiprocessor as well as on a simulator of an idealized parallel architecture. We expect the current set of applications to act as a nucleus for a suite that will grow with time. This report replaces and updates CSL-TR-91-469, April 1991. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/526/CSL-TR-92-526.pdf %R CSL-TR-92-528 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Binary Multiplication Using Partially Redundant Multiples %A Bewick, Gary %A Flynn, Michael J. %D June 1992 %X This report presents an extension to Booth's algorithm for binary multiplication. Most implementations that utilize Booth's algorithm use the 2-bit version, which reduces the number of partial products required to half that required by a simple add and shift method. Further reduction in the number of partial products can be obtained by using higher order versions of Booth's algorithm, but it is necessary to generate multiples of one of the operands (such as 3 times an operand) by the use of a carry propagate adder. This carry propagate addition introduces significant delay and additional hardware. The algorithm described in this report produces such difficult multiples in a partially redundant form, using a series of small length adders. These adders operate in parallel with no carries propagating between them. As a result, the delay introduced by multiple generation is minimized and the hardware needed for the multiple generation is also reduced, due to the elimination of expensive carry lookahead logic. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/528/CSL-TR-92-528.pdf %R CSL-TR-92-534 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T On the specialization of online program specializers %A Ruf, Erik %A Weise, Daniel %D July 1992 %X Program specializers improve the speed of programs by performing some of the programs' reductions at specialization time rather than at runtime.
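The stock textbook illustration of this idea (not drawn from the report) is specializing a power function on a statically known exponent:

    #include <stdio.h>

    /* General program: both arguments dynamic. */
    static int power(int x, int n) {
        int r = 1;
        while (n-- > 0) r *= x;     /* n iterations decided at runtime */
        return r;
    }

    /* Residual program after specializing on the static value n = 3:
       the loop and the counter have been reduced away at
       specialization time. */
    static int power3(int x) {
        return x * x * x;
    }

    int main(void) {
        printf("%d %d\n", power(5, 3), power3(5));   /* both print 125 */
        return 0;
    }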
This specialization process can be time-consuming; one common technique for improving the speed of the specialization of a particular program is to specialize the specializer itself on that program, creating a custom specializer, or program generator, for that particular program. Much research has been devoted to the problem of generating efficient program generators, which do not perform reductions at program generation time that could instead have been performed when the program generator was constructed. The conventional wisdom holds that only offline program specializers, which use binding time annotations, can be specialized into such efficient program generators. This paper argues that this is not the case, and demonstrates that the specialization of a nontrivial online program specializer similar to the original "naive MIX" can indeed yield an efficient program generator. The key to our argument is that, while the use of binding time information at program generator generation time is necessary for the construction of an efficient custom specializer, the use of explicit binding time approximation techniques is not. This allows us to distinguish the problem at hand (i.e., the use of binding time information during program generator generation) from particular solutions to that problem (i.e., offline specialization). We show that, given a careful choice of specializer data structures, and sufficiently powerful specialization techniques, binding time information can be inferred and utilized without the use of explicit binding time approximation techniques. This allows the construction of efficient, optimizing program generators from online program specializers. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/534/CSL-TR-92-534.pdf %R CSL-TR-92-546 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The accuracy of trace-driven simulations of multiprocessors %A Goldschmidt, Stephen R. %A Hennessy, John L. %D September 1992 %X In trace-driven simulation, traces generated for one set of machine characteristics are used to simulate a machine with different characteristics. However, the execution path of a multiprocessor workload may depend on the ordering of events on different processors, which in turn depends on machine characteristics such as memory system timings. Trace-driven simulations of multiprocessor workloads are inaccurate unless the timing dependencies are eliminated from the traces. We measure such inaccuracies by comparing trace-driven simulations to direct simulations of the same workloads. The results were identical only for workloads whose timing dependencies were eliminated from the traces. The remaining workloads used either first-come first-served scheduling or non-deterministic algorithms; these characteristics resulted in timing dependencies that could not be eliminated from the traces. Workloads which used task-queue scheduling had particularly large discrepancies because task-queue operations, unlike other synchronization operations, were not abstracted. Two types of simulation results had especially large discrepancies: those related to synchronization latency and those derived from relatively small numbers of events. Studies that rely on such results should use timing-independent traces or direct simulation.
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/546/CSL-TR-92-546.pdf %R CSL-TR-92-553 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Branch prediction using large self history %A Johnson, John D. %D December 1992 %X Branch prediction is the main method of providing speculative opportunities for new high performance processors; therefore, the accuracy of branch prediction is becoming very important. Motivated by this desire to achieve high levels of branch prediction accuracy, this study examines methods of using up to 24 bits of branch direction history to determine the probable outcome of the next execution of a conditional branch. Using profiling to train a prediction logic function achieves an average branch prediction accuracy of up to 96.9% for the six benchmarks used in this study. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/553/CSL-TR-92-553.pdf %R CSL-TR-92-548 %Z Tue, 05 Dec 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T System synthesis via hardware-software co-design %A Gupta, Rajesh K. %A DeMicheli, Giovanni %D October 1992 %X Synthesis of circuits containing application-specific as well as re-programmable components such as off-the-shelf microprocessors provides a promising approach to the realization of complex systems using a minimal amount of application-specific hardware while still meeting the required performance constraints. We formulate the synthesis problem of complex behavioral descriptions with performance constraints as a hardware-software co-design problem. The target system architecture consists of a software component as a program running on a re-programmable processor assisted by application-specific hardware components. System synthesis is performed by first partitioning the input system description into hardware and software portions and then by implementing each of them separately. We consider the problem of identifying potential hardware and software components of a system described in a high-level modeling language. Partitioning approaches are presented based on decoupling of data and control flow, and based on communication/synchronization requirements of the resulting system design. Synchronization between various elements of a mixed system design is one of the key issues that any synthesis system must address. We present software and interface synchronization schemes that facilitate communication between system components. We explore the relationship between the non-determinism in the system models and the associated synchronization schemes needed in system implementations. The synthesis of dedicated hardware is achieved by hardware synthesis tools, while the software component is generated using software compiling techniques. We present tools to perform synthesis of a system description into hardware and software components. The resulting software component is assumed to be implemented for the DLX machine, a load/store microprocessor. We present the design of an Ethernet-based network coprocessor to demonstrate the feasibility of mixed system synthesis. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/548/CSL-TR-92-548.pdf %R CSL-TR-92-550 %Z Wed, 23 Dec 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Cache Coherence Directories for Scalable Multiprocessors %A Simoni, Richard %D October 1992 %X Directory-based protocols have been proposed as an efficient means of implementing cache coherence in large-scale shared-memory multiprocessors.
This thesis explores the trade-offs in the design of cache coherence directories by examining the organization of the directory information, the options in the design of the coherency protocol, and the implementation of the directory and protocol. The traditional directory organization that maintains a full valid bit vector per directory entry is unsuitable for large-scale machines due to high storage overhead. This thesis proposes several alternate organizations. Limited pointer directories replace the bit vector with several pointers that indicate those caches containing the data. Although this scheme performs well across a wide range of workloads, its performance does not improve as the read/write ratio becomes very large. To address this drawback, a dynamic pointer allocation directory is proposed. This directory allocates pointers from a pool to particular memory blocks as they are needed. Since the pointers may be allocated to any block on the memory module, the probability of running short of pointers is very small. Among the set of possible organizations, dynamic pointer allocation lies at an attractive cost/performance point. Measuring the performance impact of three coherency protocol features makes the virtues of simplicity clear. Adding a clean/exclusive state to reduce the time required to write a clean block results in only modest performance improvement. Using request forwarding to transfer a dirty block directly to another cache that has requested it yields similar results. For small cache block sizes, write hits to clean blocks can be simply treated as write misses without incurring significant extra network traffic. Protocol features designed to improve performance must be examined carefully, for they often complicate the protocol without offering substantial benefit. Implementing directory-based coherency presents several challenges. Methods are described for preventing deadlock, maintaining a model of parallel execution, handling subtle situations caused by temporary inconsistencies between cache and directory state, and tolerating out-of-order message delivery. Using these techniques, cache coherence can be added to large-scale multiprocessors in an inexpensive yet effective manner. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/550/CSL-TR-92-550.pdf %R CSL-TR-92-532 %Z Mon, 28 Dec 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Piecewise Linear Models for Switch-Level Simulation %A Kao, Russell %D June 1992 %X Rsim is an efficient logic plus timing simulator that employs the switched resistor transistor model and RC tree analysis to efficiently simulate MOS digital circuits at the transistor level. We investigate the incorporation of piecewise linear transistor models and generalized moments matching into this simulation framework. General piecewise linear models allow more accurate MOS models to be used to simulate circuits that are hard for Rsim. Additionally, they enable the simulator to handle circuits containing bipolar transistors such as ECL and BiCMOS. Nonetheless, the switched resistor model has proved to be efficient and accurate for a large class of MOS digital circuits. Therefore, it is retained as just one particular model available for use in this framework. The use of piecewise linear models requires the generalization of RC tree analysis. Unlike switched resistors, more general models may incorporate gain and floating capacitance. Additionally, we extend the analysis to handle non-tree topologies and feedback.
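For reference, and not taken from the report: the first-moment (Elmore) delay that classical RC-tree analysis computes for node i of a tree driven by a step input is

$$ T_{D_i} \;=\; \sum_{k} R_{ki}\, C_k , $$

where C_k is the capacitance at node k and R_{ki} is the resistance of the portion of the path from the source to node i that is shared with the path to node k. Gain, floating capacitance, non-tree topologies, and feedback all fall outside this formula, which is what the generalization described above must address.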
Despite the increased generality, for many common MOS and ECL circuits the complexity remains linear. Thus, this timing analysis can be used to simulate, efficiently, those portions of the circuit that are well described by traditional switch level models, while simultaneously simulating, more accurately, those portions that are not. We present preliminary results from a prototype simulator, Mom. We demonstrate its use on a number of MOS, ECL, and BiCMOS circuits. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/92/532/CSL-TR-92-532.pdf %R CSL-TR-93-564 %Z Thu, 17 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Case Study in Prototyping With Rapide: Shared Memory Multiprocessor System %A Santoro, Alexandre %D March 1993 %X Rapide is a concurrent object-oriented language designed for prototyping distributed systems. This paper describes the creation of such a prototype, more specifically a shared memory multiprocessor system. The design is presented in an evolutionary manner, starting with a simple CPU + memory model. The paper also presents some simulation results and shows how the partially ordered event sets that Rapide produces can be used both for performance analysis and for an in-depth understanding of the model's behavior. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/564/CSL-TR-93-564.pdf %R CSL-TR-93-580 %Z Wed, 09 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Automatic Technology Mapping for Generalized Fundamental-Mode Asynchronous Designs %A Siegel, Polly %A DeMicheli, Giovanni %A Dill, David %D June 1993 %X The generalized fundamental-mode asynchronous design style is one in which the combinational portions of the circuit design are separated from the storage elements, as with synchronous design styles. Synchronous technology mapping techniques can be adapted to work for this asynchronous design style if hazards are taken into account. First, we examine each step of algorithmic technology mapping for its influence on the hazard behavior of the modified network. We then present modifications to an existing synchronous technology mapper to work for this asynchronous design style. We present efficient algorithms for hazard analysis that are used during the mapping process. These algorithms have been implemented and incorporated into the program CERES to produce a technology mapper suitable for asynchronous designs. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/580/CSL-TR-93-580.pdf %R CSL-TR-93-584 %Z Mon, 07 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Optimization of Combinational Logic Circuits Based on Compatible Gates %A Damiani, Maurizio %A Yang, Jerry Chih-Yuan %A DeMicheli, Giovanni %D June 1993 %X This paper presents a set of new techniques for the optimization of multiple-level combinational Boolean networks. We first describe a technique based upon the selection of appropriate "multiple-output" subnetworks (consisting of so-called "compatible gates") whose local functions can be optimized simultaneously. We then generalize the method to larger and more arbitrary subsets of gates. Because simultaneous optimization of local functions can take place, our methods are more powerful and general than Boolean optimization methods using "don't cares", where only single-gate optimization can be performed.
In addition, our methods represent a more efficient alternative to optimization procedures based on Boolean relations because the problem can be modeled by a "unate" covering problem instead of the more difficult "binate" covering problem. The method is implemented in the program ACHILLES and compares favorably to SIS. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/584/CSL-TR-93-584.pdf %R CSL-TR-93-585 %Z Tue, 15 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A Rapide-1.0 Definition of the ADAGE Avionics System %A Mann, Walter %A Belz, Frank C. %A Cornell, Paul %D September 1993 %X We have used the Rapide prototyping languages, developed by Stanford and TRW under the ARPA ProtoTech Program, in a series of exercises to model an early version of IBM's ADAGE software architecture for helicopter avionics systems. These exercises, conducted under the ARPA Domain Specific Software Architectures (DSSA) Program, also assisted the evolution of the Rapide languages. The resulting Rapide-1.0 model of the ADAGE architecture in this paper is substantially more succinct and illuminating than the original models, developed in Rapide-0.2 and Preliminary Rapide-1.0. All Rapide versions include these key features: interfaces, by which types of components and their possible interactions with other components are defined; actions, by which the events that can be observed or generated by such components are defined; and pattern-based constraints, which define properties of the computation of interacting components in terms of partially ordered sets of events. Key features of Rapide-1.0 include services, which abstract whole communication patterns between components; behavior rules, which provide a state-transition oriented specification of component behavior and from which computational component instances can be synthesized; and architectures, which describe implementations of components with a particular interface, by showing a composition of subordinate components and their interconnections. The Rapide-1.0 model is illustrated with corresponding diagrammatic representations. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/585/CSL-TR-93-585.pdf %R CSL-TR-93-588 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Update-Based Cache Coherence Protocols for Scalable Shared-Memory Multiprocessors %A Glasco, David B. %A Delagi, Bruce A. %A Flynn, Michael J. %D November 1993 %X In this paper, two hardware-controlled update-based cache coherence protocols are presented. The paper discusses the two major disadvantages of the update protocols: inefficiency of updates and the mismatch between the granularity of synchronization and the data transfer. The paper presents two enhancements to the update-based protocols, a write combining scheme and a finer grain synchronization, to overcome these disadvantages. The results demonstrate the effectiveness of these enhancements, which, when used together, allow the update-based protocols to significantly improve the execution time of a set of scientific applications when compared to three invalidate-based protocols. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/588/CSL-TR-93-588.pdf %R CSL-TR-93-593 %Z Mon, 12 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The Performance Advantages of Integrating Message Passing in Cache-Coherent Multiprocessors %A Woo, Steven Cameron %A Singh, Jaswinder Pal %A Hennessy, John L.
%D November 1993 %X We examine the performance benefits of integrating a mechanism for block data transfer (message passing) in a cache-coherent shared address space multiprocessor. We do this through a detailed study of five important computations that appear to be likely candidates for block transfer. We find that while the benefits on a realistic architecture are significant in some cases, they are not as substantial as one might initially expect. The main reasons for this are (i) the relatively modest fraction of time that applications spend in communication that is amenable to block transfer, (ii) the difficulty of finding enough independent computation to overlap with the communication latency that remains even after block transfer, and (iii) the fact that long cache lines often capture many of the benefits of block transfer. Of the three primary advantages of block transfer, fast pipelined data transfer appears to be the most successful, followed by the ability to overlap computation and communication at a coarse granularity, and finally the benefits of replicating communicated data in main memory. We also examine the impact of varying important network parameters and processor speed on the relative effectiveness of block transfer, and comment on useful features that a block transfer engine should support for real applications. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/593/CSL-TR-93-593.pdf %R CSL-TR-93-590 %Z Thu, 17 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The Effect of Fault Dropping on Fault Simulation Time %A Pan, Rong %A Touba, Nur A. %A McCluskey, Edward J. %D November 1993 %X The effect of fault dropping on fault simulation time is studied in this paper. An experiment was performed in which fault simulation times, with and without fault dropping, were measured for three different simulators. A speedup of approximately 8 to 50 for random test sets and 1.5 to 9 for deterministic test sets was observed. The results give some indication about how much fault dropping speeds up fault simulation. These results also show the overhead of an application requiring a complete fault dictionary. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/590/CSL-TR-93-590.pdf %R CSL-TR-93-591 %Z Thu, 17 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Logic Synthesis for Concurrent Error Detection %A Touba, Nur A. %A McCluskey, Edward J. %D November 1993 %X The structure of a circuit determines how the effects of a fault can propagate and hence affects the cost of concurrent error detection. By considering circuit structure during logic optimization, the overall cost of a concurrently checked circuit can be minimized. This report presents a new technique called structure-constrained logic optimization (SCLO) that optimizes a circuit under the constraint that faults in the resulting circuit can produce only a prescribed set of errors. Using SCLO, circuits can be optimized for various concurrent error detection schemes allowing the overall cost for each scheme to be compared. A technique for quickly estimating the size of a circuit under different structural constraints is described. This technique enables rapid exploration of the design space for concurrently checked circuits. A new method for the automated synthesis of self-checking circuit implementations for arbitrary combinational circuits is also presented.
It consists of an algorithm that determines the best parity-check code for encoding the output of a given circuit, and then uses SCLO to produce the functional circuit, which is augmented with a checker to form a self-checking circuit. This synthesis method provides fully automated design, explores a larger design space than other methods, and uses simple checkers. It has been implemented by making modifications to SIS (an updated version of MIS [Brayton 87a]), and results for several MCNC combinational benchmark circuits are given. In most cases, a substantial reduction in overhead compared to a duplicate-and-compare implementation is achieved. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/591/CSL-TR-93-591.pdf %R CSL-TR-93-596 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Models of Communication Latency in Shared Memory Multiprocessors %A Byrd, Gregory T. %D December 1993 %X We evaluate various mechanisms for data communication in large-scale shared memory multiprocessors. Data communication involves both data transmission and synchronization, resulting in the transfer of data between computational threads. We use simple analytical models to evaluate the communication latency for each of the mechanisms. The models show that efficient and opportunistic synchronization is the most important determinant of latency, followed by efficient transmission. Producer-initiated mechanisms, in which data is sent by its producer as it is produced, generally achieve lower latencies than consumer-initiated mechanisms, in which data is retrieved as and when it is needed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/596/CSL-TR-93-596.pdf %R CSL-TR-93-554 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Using a Floating-Point Multiplier's Internals for High-Radix Division and Square Root %A Schwarz, Eric M. %A Flynn, Michael J. %D January 1993 %X A method for obtaining high-precision approximations of high-order arithmetic operations at low cost is presented in this study. Specifically, high-precision approximations of the reciprocal (12 bits worst case) and square root (16 bits) operations are obtained using the internal hardware of a floating-point multiplier without the use of look-up tables. The additional combinatorial logic necessary is very small due to the reuse of existing hardware. These low-cost high-precision approximations are used by iterative algorithms to perform the operations of division and square root. The method presented also applies to several other high-order arithmetic operations. Thus, high-radix algorithms for high-order arithmetic operations such as division and square root are possible at low cost. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/554/CSL-TR-93-554.pdf %R CSL-TR-93-560 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The Cramer Rao Bound for Discrete-Time Edge Position %A Gatherer, Alan %D February 1993 %X The problem of estimating the position of an edge from a series of samples often occurs in the fields of machine vision and signal processing. It is therefore of interest to assess the accuracy of any estimation algorithm. Previous work in this area has produced bounds for the continuous time estimator. In this paper we derive a closed form for the minimum variance bound (or Cramer Rao bound) for estimating the position of an arbitrarily shaped edge in white Gaussian noise for the discrete samples case.
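For orientation, the standard textbook form of the bound, recalled here rather than quoted from the report: for samples y_k = s_k(theta) + w_k, with w_k white Gaussian noise of variance sigma^2 and theta the edge position, any unbiased estimator satisfies

$$ \operatorname{var}(\hat{\theta}) \;\ge\; \frac{\sigma^{2}}{\sum_{k} \left( \dfrac{\partial s_k(\theta)}{\partial \theta} \right)^{2}} . $$

The report's contribution is evaluating this bound in closed form for arbitrarily shaped edges under discrete sampling.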
We quantify the effects of the sampling rate, the bandwidth of the edge, the shape of the edge and the size of the observation window on the variance of the estimator. We describe a maximum likelihood estimator and show that in practice this estimator requires fewer computations than standard correlation. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/560/CSL-TR-93-560.pdf %R CSL-TR-93-561 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Fetch Caches %A Bray, Brian K. %A Flynn, Michael J. %D February 1993 %X For high performance, data caches must have a low miss rate and provide high bandwidth, while maintaining low latency. Larger and more complex set associative caches provide lower miss rates but at the cost of increased latency. Interleaved data caches can improve the available bandwidth, but the improvement is limited by bank conflicts and increased latency due to the switching networks required to distribute cache addresses and to route the data. We propose using a small buffer to reduce the data read latency or improve the read bandwidth of an on-chip data cache. We call the small read-only buffer a fetch cache. The fetch cache attempts to capture the immediate spatial locality of the data read reference stream by utilizing the large number of bits that can be fetched in a single access of an on-chip cache. There are two ways a processor can issue multiple instructions per cache access: the cache access can require multiple cycles (i.e. superpipelined), or multiple instructions are issued per cycle (i.e. superscalar). In the first section, we show the use of fetch caches with multi-cycle per access data caches. When there is a read hit in the fetch cache, the read request can be serviced in one cycle; otherwise, the latency is that of the primary data cache. For a four line, 16 byte wide fetch cache, the hit rate ranged from 40 to 60 percent depending on the application. In the second part, we show the use of fetch caches when multiple accesses per cycle are requested. When there is a read hit in the fetch cache, a read can be satisfied by the fetch cache, while the primary cache performs another read or write request. For a four line, 16 byte wide fetch cache, the cache bandwidth increased by 20 to 30 percent depending on the application. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/561/CSL-TR-93-561.pdf %R CSL-TR-93-562 %Z Thu, 26 Oct 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An Efficient Top-Down Parsing Algorithm for General Context-Free Grammars %A Sankar, Sriram %D February 1993 %X This report describes a new algorithm for top-down parsing of general context-free grammars. The algorithm does not require any changes to be made to the grammar, and can parse with respect to any grammar non-terminal as the start symbol. It is possible to generate all possible parse trees of the input string in the presence of ambiguous grammars. The algorithm reduces to recursive descent parsing on LL grammars. This algorithm is ideal for use in software development environments which include tools such as syntax-directed editors and incremental parsers, where the language syntax is an integral part of the user-interface. General context-free grammars can describe the language syntax more intuitively than, for example, LALR(1) grammars.
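A minimal sketch, not the report's algorithm: counting every parse of an input under the ambiguous grammar E -> E '+' E | 'n' by top-down recursion over input spans. Recursing on spans is one simple way to avoid looping on the left-recursive rule; the algorithm described above needs no such workaround, and a practical version would also memoize.

    #include <stdio.h>
    #include <string.h>

    static const char *s;  /* input string */

    /* Number of parse trees deriving s[i..j) from E. */
    static int count(int i, int j)
    {
        int n = 0;
        if (j - i == 1 && s[i] == 'n')      /* E -> 'n'     */
            n++;
        for (int k = i + 1; k < j - 1; k++) /* E -> E '+' E */
            if (s[k] == '+')
                n += count(i, k) * count(k + 1, j);
        return n;
    }

    int main(void)
    {
        s = "n+n+n";
        /* Prints 2: the trees (n+n)+n and n+(n+n). */
        printf("%d parse trees\n", count(0, (int)strlen(s)));
        return 0;
    }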
This algorithm is also applicable to batch-oriented language processors, especially during the development of new languages, where frequent changes are made to the language syntax and new prototype parsers need to be developed quickly. A prototype implementation of a parser generator that generates parsers based on this algorithm has been built. Parsing speeds of around 1000 lines per second have been achieved on a Sun SparcStation 2. This demonstrated performance is more than adequate for syntax-directed editors and incremental parsers, and in most cases, is perfectly acceptable for batch-oriented language processors. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/562/CSL-TR-93-562.pdf %R CSL-TR-93-566 %Z Thu, 26 Oct 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Software Testing Using Algebraic Specification Based Test Oracles %A Sankar, Sriram %A Goyal, Anoop %A Sikchi, Prakash %D April 1993 %X In TAV4, the first author presented a paper describing an algorithm to perform run-time consistency checking of abstract data types specified using algebraic specifications. This algorithm has subsequently been incorporated into a run-time consistency checking tool for the Anna specification language for Ada, and works on a subset of all possible algebraic specifications. The algorithm implementation can be considered a test oracle for algebraic specifications that performs its activities while the formally specified program is running. This paper presents empirical results on the use of this test oracle on a real-life symbol table implementation. Various issues that arise due to the use of algebraic specifications and the test oracle are discussed. Fifty different errors were introduced into the symbol table implementation. On testing using the oracle, 60% of the errors were detected by the oracle, 35% of the errors caused Ada exceptions to be raised, and the remaining 5% went undetected. These results are remarkable, especially since the test input was simply one sequence of symbol table operations performed by a typical client. The cases that went undetected contained errors that required very specific boundary conditions to be met --- an indication that white-box test-data generation techniques may be required to detect them. Hence, white-box test-data generation combined with a specification-based test oracle may be extremely versatile in detecting errors. This paper does not address test-data generation; rather, it illustrates the usefulness of algebraic specification based test oracles during run-time consistency checking. Run-time consistency checking should be considered a complementary approach to unit testing using generated test-data. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/566/CSL-TR-93-566.pdf %R CSL-TR-93-570 %Z Mon, 28 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Frequency Domain Volume Rendering %A Totsuka, Takashi %A Levoy, Marc %D April 1993 %X The Fourier projection-slice theorem allows projections of volume data to be generated in O(n^2 log n) time for a volume of size n^3. The method operates by extracting and inverse Fourier transforming 2D slices from a 3D frequency domain representation of the volume. Unfortunately, these projections do not exhibit the occlusion that is characteristic of conventional volume renderings.
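For reference, the standard statement of the theorem underlying this approach (paraphrased, not quoted from the report): projecting a volume f(x,y,z) along z and taking the 2D Fourier transform of the projection yields a slice of the 3D transform,

$$ \mathcal{F}_{2D}\!\left[ \int f(x,y,z)\, dz \right](u,v) \;=\; \mathcal{F}_{3D}[f](u,v,0) , $$

so once the 3D transform is precomputed, each view costs only an n x n slice extraction plus an inverse 2D FFT, which is where the O(n^2 log n) figure comes from.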
We present a new frequency domain volume rendering algorithm that restores much of the missing depth and shape cues by performing shading calculations in the frequency domain during slice extraction. In particular, we demonstrate frequency domain methods for computing linear or nonlinear depth cueing and directional diffuse reflection. The resulting images can be generated an order of magnitude faster than conventional volume renderings and may be more useful for many applications. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/570/CSL-TR-93-570.pdf %R CSL-TR-93-573 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Performance of a Three-Stage Banyan-Based Architecture with Input and Output Buffers for Large Fast Packet Switches %A Chiussi, Fabio M. %A Tobagi, Fouad A. %D June 1993 %X Fast packet switching, also referred to as Asynchronous Transfer Mode (ATM), has emerged as the most appropriate switching technique to handle the high data rates and the wide diversity of traffic requirements envisioned in Broadband Integrated Services Digital Networks (B-ISDN). ATM switches capable of meeting the challenges posed by a successful deployment of B-ISDN must be designed and implemented. Such switches should be nonblocking and capable of handling the highly-bursty traffic conditions that anticipated future applications will generate; they should be scalable to the large sizes expected when B-ISDN becomes widely deployed; accordingly, their complexity should be as low as possible; they should be simple to operate; namely, their architecture should facilitate the determination of whether or not a call can be accepted, and the assignment of a route to a call. In this paper, we describe an architecture, referred to as the Memory/Space/Memory switching fabric, which meets these challenges. It combines input and output shared-memory buffer components with space-division banyan networks, making it possible to build a switch with several hundred I/O ports. The MSM achieves output buffering, thus performing very well under a wide variety of traffic conditions, and is self-routing, thus adapting easily to different traffic mixes. Under bursty traffic, by implementing a backpressure mechanism to control the packet flow from input to output queues, and by properly managing the buffers, we can increase the average buffer occupancy; in this way, we can achieve important reductions in total buffer requirements with respect to output-buffer switches (e.g., up to 70% reduction with bursts of average length equal to 100 packets), use input and output buffers of equal sizes, and achieve sublinear increase of the buffer requirements with the burst length. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/573/CSL-TR-93-573.pdf %R CSL-TR-93-577 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Implementation of a Three-Stage Banyan-Based Architecture with Input and Output Buffers for Large Fast Packet Switches %A Chiussi, Fabio M. %A Tobagi, Fouad A. %D June 1993 %X Fast packet switching, also referred to as Asynchronous Transfer Mode (ATM), has emerged as the most appropriate switching technique for future Broadband Integrated Services Digital Networks (B-ISDN). A three-stage banyan-based switch architecture with input and output buffers has been recently described [Chi93].
This architecture, also referred to as the Memory/Space/Memory (MSM) switching fabric, is capable of meeting the challenges posed by a successful deployment of B-ISDN; namely, it is made nonblocking with low complexity, and is scalable to large sizes (>1000 input/output ports); it supports a wide diversity of traffic patterns, including highly-bursty traffic; it maintains packet sequence, is self-routing, and is simple to operate. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/577/CSL-TR-93-577.pdf %R CSL-TR-93-579 %Z Wed, 02 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Comparative Studies of Pipelined Circuits %A Klass, Fabian %A Flynn, Michael J. %D July 1993 %X Wave pipelining is an attractive technique used in high-speed computer systems to increase the pipeline rate without partitioning a system into pipeline stages. Although recent implementations have reported very high-speed operation rates, a real evaluation of the advantages and disadvantages of wave pipelining requires a comparative study with other techniques; in particular, understanding the trade-offs between conventional and wave pipelining is very important. This study is an attempt to provide approximate models which can be used as first-order tools for comparative study or sensitivity analysis of conventional and wave pipelined systems with different overheads. The models presented here are for subsystem-level pipelines. The product Latency x Cycle-Time is used as a measure of performance and is evaluated as a function of all the parameters of a design, such as the propagation delay of the combinational logic, the data skew resulting from the difference between maximum and minimum propagation delays through various logic paths, rise and fall time, the setup time, hold time, and propagation delay through registers, and the uncontrollable clock skew. In this way, an analytical basis is provided for a comparison between different approaches and for optimizations. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/579/CSL-TR-93-579.pdf %R CSL-TR-93-556 %Z Wed, 23 Dec 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Support for Speculative Execution in High-Performance Processors %A Smith, Michael David %D November 1992 %X Superscalar and superpipelining techniques increase the overlap between the instructions in a pipelined processor, and thus these techniques have the potential to improve processor performance by decreasing the average number of cycles between the execution of adjacent instructions. Yet, to obtain this potential performance benefit, an instruction scheduler for this high-performance processor must find the independent instructions within the instruction stream of an application to execute in parallel. For non-numerical applications, there is an insufficient number of independent instructions within a basic block, and consequently the instruction scheduler must search across the basic block boundaries for the extra instruction-level parallelism required by the superscalar and superpipelining techniques. To exploit instruction-level parallelism across a conditional branch, the instruction scheduler must support the movement of instructions above a conditional branch, and the processor must support the speculative execution of these instructions.
We define boosting, an architectural mechanism for speculative execution, that allows us to uncover the instruction-level parallelism across conditional branches without adversely affecting the instruction count of the application or the cycle time of the processor. Under boosting, the compiler is responsible for analyzing and scheduling instructions, while the hardware is responsible for ensuring that the effects of a speculatively-executed instruction do not corrupt the program state when the compiler is incorrect in its speculation. To experiment with boosting, we built a global instruction scheduler, which is specifically tailored for the non-numerical environment, and a simulator, which determines the cycle-count performance of our globally-scheduled programs. We also analyzed the hardware requirements for boosting in a typical load/store architecture. Through the cycle-count simulations and an understanding of the cycle-time impact of the hardware support for boosting, we found that only a small amount of hardware support for speculative execution is necessary to achieve good performance in a small-issue, superscalar processor. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/93/556/CSL-TR-93-556.pdf %R CSL-TR-94-599 %Z Wed, 28 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T The Design and Implementation of a High-Performance Floating-Point Divider %A Oberman, Stuart %A Quach, Nhon %A Flynn, Michael J. %D January 1994 %X The increasing computation requirements of modern computer applications have stimulated a large interest in developing extremely high-performance floating-point dividers. A variety of division algorithms are available, with SRT being utilized in many computer systems. A careful analysis of SRT divider topologies has demonstrated that a relatively simple divider designed in an aggressive circuit style can achieve extremely high performance. Further, an aggressive circuit implementation can minimize many of the performance advantages of more complex divider algorithms. This paper presents the tradeoffs of the different divider topologies, the design of the divider, and performance results. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/599/CSL-TR-94-599.pdf %R CSL-TR-94-600 %Z Mon, 07 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Environmental Limits on the Performance of CMOS Wave-Pipelined Circuits %A Nowka, Kevin J. %A Flynn, Michael J. %D January 1994 %X Wave-pipelining is a circuit design technique which allows digital synchronous systems to be clocked at rates higher than can be achieved with conventional pipelining techniques. Wave-pipelining has been successfully applied to the design of SSI processor functional units, a Bipolar Population Counter, a CMOS adder, CMOS multipliers, and several simple CMOS circuits. For controlled operating environments, speed-ups of 2 to 10 have been reported for these designs. This report details the effects of temperature variation, supply voltage variation, and process variation on wave-pipelined static CMOS designs, derives limits for the performance of wave-pipelined circuits due to these variations, and compares the performance effects with those of traditional pipelined circuits. This study finds that wave-pipelined circuits designed for commercial operating environments are limited to 2 to 3 waves per pipeline stage when clocked from a fixed frequency source.
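As background, and only as a rough first-order sketch rather than the report's derivation: with maximum and minimum combinational path delays D_max and D_min, register and sampling overhead t_ovh, and clock skew t_skew, wave-pipelined operation requires approximately

$$ T_{clk} \;\gtrsim\; (D_{\max} - D_{\min}) + t_{ovh} + t_{skew}, \qquad N_{waves} \;\approx\; \frac{D_{\max}}{T_{clk}} , $$

so environmental variation, which widens D_max - D_min, directly caps the achievable number of waves; this is consistent with the 2-to-3-wave limit quoted above.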
Variable-rate internal clocking can approach the theoretical limit on the number of waves at a cost of interface complexity. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/600/CSL-TR-94-600.pdf %R CSL-TR-94-601 %Z Mon, 21 Mar 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory. %T Efficient Scheduling on Multiprogrammed Shared-Memory Multiprocessors %A Tucker, Andrew %D March 1994 %X Shared-memory multiprocessors are often used as compute servers, with multiple users running applications in a multiprogrammed style. On such systems, naive time-sharing scheduling policies can result in poor performance for parallel applications. Most parallel applications are written with the model of a stable computing environment, where applications are running uninterrupted on a fixed number of processors. On a time-sharing system, processes are interrupted periodically and the number of processors running an application continually varies. The result is a decrease in performance for a number of reasons, including processes being obliviously preempted inside critical sections and cached data being replaced by intervening processes. This thesis explores using more sophisticated scheduling systems to avoid these problems. Robust implementations of previously proposed approaches involving cache affinity scheduling and gang scheduling are developed and evaluated. The thesis then presents the design, implementation, and performance of process control, a novel scheduling approach using explicit cooperation between the application and kernel to minimize context switching. Performance results from a suite of workloads containing both serial and parallel applications, run on a 4-processor Silicon Graphics workstation, confirm the effectiveness of the process control approach. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/601/CSL-TR-94-601.pdf %R CSL-TR-94-604 %Z Thu, 22 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Integrating multiple communication paradigms in high performance multiprocessors %A Heinlein, John %A Gharachorloo, Kourosh %A Gupta, Anoop %D February 1994 %X In the design of FLASH, the successor to the Stanford DASH multiprocessor, we are exploring architectural mechanisms for efficiently supporting both the shared memory and message passing communication models in a single system. The unique feature in the FLASH (FLexible Architecture for SHared memory) system is the use of a programmable controller at each node that replaces the functionality of hardwired cache coherence state machines in systems like DASH. The base coherence protocol is supported by executing appropriate software handlers on the programmable controller to service memory and coherence operations. The same programmable controller is also used to support message passing. This approach is attractive because of the flexibility software provides for implementing different coherence and message passing protocols, and because of the simplification in system design and debugging that arises from the shift of complexity from hardware to software. This paper focuses on the use of the programmable controller to support message passing. Our goal is to provide message passing performance that is comparable to an aggressive hardware implementation dedicated to this task. In FLASH, message data is transferred as a sequence of cache line sized units, thus exploiting the datapath support already present for cache coherence.
In addition, we avoid costly interrupts to the main processor by having the programmable engine handle the control for message transfers. Furthermore, in contrast to most earlier work, we provide an integrated solution that handles the interaction of message data with virtual memory, protected multiprogramming, and cache coherence. Our preliminary performance studies indicate that this system can sustain message transfers at a rate of several hundred megabytes per second, efficiently utilizing the available network bandwidth. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/604/CSL-TR-94-604.pdf %R CSL-TR-94-613 %Z Wed, 26 Oct 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Design and Validation of Update-Based Cache Coherence Protocols %A Glasco, David B. %A Delagi, Bruce A. %A Flynn, Michael J. %D March 1994 %X In this paper, we present the details of the two update-based cache coherence protocols for scalable shared-memory multiprocessors that were studied in our previous work. First, the directory structures required for the protocols are briefly reviewed. Next, the state diagrams and some examples of the two update-based protocols are presented; one of the protocols is based on a centralized directory, and the other is based on a singly-linked distributed directory. Protocol deadlock and the additional requirements placed on the protocols to avoid such deadlock are also examined. Finally, protocol validation using an exhaustive validation tool known as Murphi is discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/613/CSL-TR-94-613.pdf %R CSL-TR-94-614 %Z Mon, 21 Mar 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory. %T Co-Synthesis of Hardware and Software for Digital Embedded Systems %A Gupta, Rajesh Kumar %D December 1993 %X As the complexity of systems subject to computer-aided synthesis and optimization techniques increases, so does the need to find ways to incorporate predesigned components into the final system implementation. In this context, a general-purpose microprocessor provides a sophisticated low-cost component that can be tailored to realize most system functions through appropriate software. This approach is particularly useful in the design of embedded systems that have a relatively simple target architecture, when compared to general-purpose computing systems such as workstations. In embedded systems the processor is used as a resource dedicated to implementing specific functions. However, the design issues in embedded systems are complicated since most of these systems operate in a time-constrained environment. Recent advances in chip-level synthesis have made it possible to synthesize application-specific circuits under strict timing constraints. This dissertation formulates the problem of computer-aided design of embedded systems using both application-specific as well as general-purpose reprogrammable components under timing constraints. Given a specification of system functionality and constraints in a hardware description language, we model the system as a set of bilogic flow graphs, and formulate the co-synthesis problem as a partitioning problem under constraints. Timing constraints are used to determine the parts of the system functionality that are delegated to application-specific hardware and the software that runs on the processor. The software component of such a 'mixed' system poses an interesting problem due to its interaction with concurrently operating hardware.
We address this problem by generating software as a set of concurrent fixed-latency serialized operations called threads. The satisfaction of the imposed performance constraints is then ensured by exploiting concurrency between program threads, achieved by an interleaved execution on a single processor system. This co-synthesis of hardware and software from behavioral specifications makes it possible to build time-constrained embedded systems by using off-the-shelf parts and application-specific circuitry. Because less application-specific hardware is needed than in an all-hardware solution, the hardware component can be easily mapped to semicustom VLSI such as gate arrays, thus shortening the design time. In addition, the ability to perform a detailed analysis of timing performance provides an opportunity to improve the system definition by creating better prototypes. The algorithms and techniques described have been implemented in a framework called Vulcan, which is integrated with the Stanford Olympus Synthesis System and provides a path from chip-level synthesis to system-level synthesis. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/614/CSL-TR-94-614.pdf %R CSL-TR-94-618 %Z Wed, 05 Oct 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Optimum Routing of Multicast Audio and Video Streams in Communications Networks %A Noronha, Ciro A., Jr. %A Tobagi, Fouad A. %D April 1994 %X In this report, we consider the problem of routing multicast audio and video streams in a communications network. After describing the previous work in the area and identifying its shortcomings, we show that the problem of optimally routing multicast streams can be formulated as an integer programming problem. We propose an efficient solution technique, composed of two parts: (i) an extension to the decomposition principle, to speed up the linear relaxation of the problem, and (ii) enhanced value-fixing rules, to prune the search space for the integer problem. We characterize the reduction in run time gained using these techniques. Finally, we compare the run times for the optimum multicast routing algorithm and for existing heuristic algorithms. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/618/CSL-TR-94-618.pdf %R CSL-TR-94-619 %Z Wed, 05 Oct 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Evaluation of Multicast Routing Algorithms for Multimedia Streams %A Noronha, Ciro A., Jr. %A Tobagi, Fouad A. %D April 1994 %X Multimedia applications place new requirements on networks as compared to traditional data applications: (i) they require relatively high bandwidths on a continuous basis for long periods of time; (ii) involve multipoint communications and thus are expected to make heavy use of multicasting; and (iii) tend to be interactive and thus require low latency. These requirements must be taken into account when routing multimedia traffic in a network. This report presents a performance evaluation of routing algorithms in the multimedia environment, where the requirements of multipoint communications, bandwidth and latency must be satisfied. We present an exact solution to the optimum multicast routing problem, based on integer programming, and use this solution as a benchmark to evaluate existing heuristic algorithms, considering both performance and cost of implementation (as measured by the average run time), under realistic network and traffic scenarios.
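For orientation, a generic cut-based formulation of the kind alluded to above (a Steiner-style sketch; the reports' exact models may differ): with binary variables x_e selecting the edges of the multicast tree, edge costs c_e, source s, destination set D, and delta(S) the set of edges leaving a node set S, one minimizes

$$ \min \sum_{e} c_e x_e \quad \text{subject to} \quad \sum_{e \in \delta(S)} x_e \ge 1 \;\; \text{for all } S \text{ with } s \in S \text{ and } D \setminus S \ne \emptyset, \qquad x_e \in \{0,1\} . $$

The exponentially many cut constraints are precisely why speeding up the linear relaxation and pruning the integer search, as described above, matter in practice.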
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/619/CSL-TR-94-619.pdf %R CSL-TR-94-620 %Z Thu, 10 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The SUIF Compiler System: a Parallelizing and Optimizing Research Compiler %A Wilson, Robert %A French, Robert %A Wilson, Christopher %A Amarasinghe, Saman %A Anderson, Jennifer %A Tjiang, Steve %A Liao, Shih-Wei %A Tseng, Chau-Wen %A Hall, Mary %A Lam, Monica %A Hennessy, John %D May 1994 %X Compiler infrastructures that support experimental research are crucial to the advancement of high-performance computing. New compiler technology must be implemented and evaluated in the context of a complete compiler, but developing such an infrastructure requires a huge investment in time and resources. We have spent a number of years building the SUIF compiler into a powerful, flexible system, and we would now like to share the results of our efforts. SUIF consists of a small, clearly documented kernel and a toolkit of compiler passes built on top of the kernel. The kernel defines the intermediate representation, provides functions to access and manipulate the intermediate representation, and structures the interface between compiler passes. The toolkit currently includes C and Fortran front ends, a loop-level parallelism and locality optimizer, an optimizing MIPS back end, a set of compiler development tools, and support for instructional use. Although we do not expect SUIF to be suitable for everyone, we think it may be useful for many other researchers. We thus invite you to use SUIF and welcome your contributions to this infrastructure. The SUIF software is freely available via anonymous ftp from suif.Stanford.EDU. Additional information about SUIF can be found on the World-Wide Web at http://suif.Stanford.EDU. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/620/CSL-TR-94-620.pdf %R CSL-TR-94-621 %Z Wed, 09 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Synthesis for Scan Dependence in Built-In Self-Testable Designs %A Avra, LaNae J. %A McCluskey, Edward J. %D May 1994 %X This report introduces new design and synthesis techniques that reduce the area and improve the performance of embedded built-in self-test (BIST) architectures such as circular BIST and parallel BIST. Our goal is to arrange the system bistables into scan paths so that some of the BIST and scan logic is shared with the system logic. Logic sharing is possible when scan dependence is introduced in the design. Other BIST design techniques attempt to avoid all types of scan dependence because it can reduce the fault coverage of embedded, multiple input signature registers (MISRs). We show that introducing certain types of scan dependence in embedded MISRs can result in reduced overhead and improved fault coverage, and we describe synthesis techniques that maximize the amount of this beneficial scan dependence. Finally, we present fault simulation, layout area, and delay results for circular BIST versions of benchmark circuits that have been synthesized with our techniques. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/621/CSL-TR-94-621.pdf %R CSL-TR-94-622 %Z Wed, 09 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A Synthesis-for-Test Design System %A Avra, LaNae J. %A Gerbaux, Laurent %A Giomi, Jean-Charles %A Martinolle, Francoise %A McCluskey, Edward J. 
%D May 1994 %X Hardware synthesis techniques automatically generate a structural hardware implementation given an abstract (e.g., functional, behavioral, register transfer) description of the behavior of the design. Existing hardware synthesis systems typically use cost and performance as the main criteria for selecting the best hardware implementation, and seldom even consider test issues during the synthesis process. We have developed and implemented a computer-aided design tool whose primary objective is to generate the lowest-cost, highest-performance hardware implementation that also meets specified testability requirements. By considering testability during the synthesis process, the tool is able to generate designs that are optimized for specific test techniques. The input to the tool is a behavioral VHDL specification that consists of high-level software language constructs such as conditional statements, assignment statements, and loops, and the output is a structural VHDL description of the design. Implemented synthesis procedures include compiler optimizations, inter-process analysis, high-level synthesis operations (scheduling, allocation, and binding) and control logic generation. The purpose of our design tool is to serve as a platform for experimentation with existing and future synthesis-for-test techniques, and it can currently generate designs optimized for both parallel and circular built-in self-test architectures. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/622/CSL-TR-94-622.pdf %R CSL-TR-94-623 %Z Wed, 12 Oct 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Communication Mechanisms in Shared Memory Multiprocessors %A Byrd, Gregory T. %A Delagi, Bruce A. %A Flynn, Michael J. %D May 1994 %X Shared memory systems generally support consumer-initiated communication; when a process needs data, it is retrieved from the global memory. Systems that were designed around the message passing model, on the other hand, support producer-initiated communication mechanisms; the producer of data sends it directly to the other processes that require it. Parallel applications require both kinds of communication. In this paper, we examine the performance of five shared-memory communication mechanisms -- invalidate-based cache coherence, prefetch, locks, deliver, and StreamLine -- to determine the effectiveness of architectural support for efficient producer-initiated communication. We find that StreamLine, a cache-based message passing mechanism, offers the best performance on our simulated benchmarks. In addition, StreamLine is much less sensitive to system parameters such as cache line size and network performance. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/623/CSL-TR-94-623.pdf %R CSL-TR-94-626 %Z Wed, 12 Oct 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Synthesis and Optimization of Synchronous Logic Circuits %A Damiani, Maurizio %D June 1994 %X The design automation of complex digital circuits offers important benefits. It allows the designer to reduce design time and errors, to explore more thoroughly the design space, and to cope effectively with an ever-increasing project complexity. This dissertation presents new algorithms for the logic optimization of combinational and synchronous digital circuits. These algorithms rely on a common paradigm. Namely, global optimization is achieved by the iterative local optimization of small subcircuits. The dissertation first explores the combinational case.
Chapter 2 presents algorithms for the optimization of subnetworks consisting of a single-output subcircuit. The design space for this subcircuit is described implicitly by a Boolean function, a so-called "don't care" function. Efficient methods for extracting this function are presented. Chapter 3 is devoted to a novel method for the optimization of multiple-output subcircuits. There, we introduce the notion of compatible gates. Compatible gates represent subsets of gates whose optimization is particularly simple. The other three chapters are devoted to the optimization of synchronous circuits. Following the lines of the combinational case, we attempt the optimization of the gate-level (rather than the state-diagram-level) representation. In Chapter 4 we focus on extending combinational techniques to the sequential case. In particular, we present algorithms for finding a synchronous function that can be used in the optimization process. Unlike the combinational case, however, this approach is exact only for pipeline-like circuits. Exact approaches for general, acyclic circuits are presented in Chapter 5. There, we introduce the notion of a synchronous recurrence equation. Finally, Chapter 6 presents methods for handling feedback interconnection. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/626/CSL-TR-94-626.pdf %R CSL-TR-94-627 %Z Wed, 12 Oct 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An Efficient Shared Memory Layer for Distributed Memory Machines. %A Scales, Daniel J. %A Lam, Monica S. %D July 1994 %X This paper describes a system called SAM that simplifies the task of programming machines with distributed address spaces by providing a shared name space and dynamic caching of remotely accessed data. SAM makes it possible to utilize the computational power available in networks of workstations and distributed memory machines, while getting the ease of programming associated with a single address space model. The global name space and caching are especially important for complex scientific applications with irregular communication and parallelism. SAM is based on the principle of tying synchronization with data accesses. Precedence constraints are expressed by accesses to single-assignment values, and mutual exclusion constraints are represented by access to data items called accumulators. Programmers easily express the communication and synchronization between processes using these operations; they can also use alternate paradigms that are built with the SAM primitives. Operations for prefetching data and explicitly sending data to another processor integrate cleanly with SAM's shared memory model and allow the user to obtain the efficiency of message passing when necessary. We have built implementations of SAM for the CM-5, the Intel iPSC/860, the Intel Paragon, the IBM SP1, and heterogeneous networks of Sun, SGI, and DEC workstations (using PVM). In this report, we describe the basic functionality provided by SAM, discuss our experience in using it to program a variety of scientific applications and distributed data structures, and provide performance results for these complex applications on a range of machines. Our experience indicates that SAM significantly simplifies the programming of these parallel systems, supports the necessary functionality for developing efficient implementations of sophisticated applications, and provides portability across a range of distributed memory environments.
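An illustrative sketch only, not SAM's actual API (the names and the pthread-based implementation are assumptions): a write-once cell in which reads block until the single write arrives captures the principle of tying synchronization to data access that underlies SAM's single-assignment values.

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t mu;
        pthread_cond_t  filled;
        int             full;   /* set once, never cleared */
        double          value;
    } sa_cell;

    void sa_init(sa_cell *c)
    {
        pthread_mutex_init(&c->mu, NULL);
        pthread_cond_init(&c->filled, NULL);
        c->full = 0;
    }

    void sa_write(sa_cell *c, double v)  /* called exactly once */
    {
        pthread_mutex_lock(&c->mu);
        c->value = v;
        c->full = 1;
        pthread_cond_broadcast(&c->filled);
        pthread_mutex_unlock(&c->mu);
    }

    double sa_read(sa_cell *c)           /* blocks until written */
    {
        pthread_mutex_lock(&c->mu);
        while (!c->full)
            pthread_cond_wait(&c->filled, &c->mu);
        double v = c->value;
        pthread_mutex_unlock(&c->mu);
        return v;
    }

A consumer simply calls sa_read and implicitly waits for the producer, so a precedence constraint needs no separate flag or barrier.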
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/627/CSL-TR-94-627.pdf %R CSL-TR-94-628 %Z Mon, 09 Jan 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Tolerating Latency Through Software-Controlled Data Prefetching %A Mowry, Todd C. %D June 1994 %X The large latency of memory accesses in modern computer systems is a key obstacle to achieving high processor utilization. Furthermore, technology trends indicate that this gap between processor and memory speeds is likely to increase in the future. While increased latency affects all computer systems, the problem is magnified in large-scale shared-memory multiprocessors, where physical dimensions cause latency to be an inherent problem. To cope with the memory latency problem, the basic solution that nearly all computer systems rely on is their cache hierarchy. While caches are useful, they are not a panacea. Software-controlled prefetching is a technique for tolerating memory latency by explicitly executing prefetch instructions to move data close to the processor before it is actually needed. This technique is attractive because it can hide both read and write latency within a single thread of execution while requiring relatively little hardware support. Software-controlled prefetching, however, presents two major challenges. First, some sophistication is required on the part of either the programmer, runtime system, or (preferably) the compiler to insert prefetches into the code. Second, care must be taken that the overheads of prefetching, which include additional instructions and increased memory queueing delays, do not outweigh the benefits. This dissertation proposes and evaluates a new compiler algorithm for inserting prefetches into code. The proposed algorithm attempts to minimize overheads by only issuing prefetches for references that are predicted to suffer cache misses. The algorithm can prefetch both dense-matrix and sparse-matrix codes, thus covering a large fraction of scientific applications. It also works for both uniprocessor and large-scale shared-memory multiprocessor architectures. We have implemented our algorithm in the SUIF (Stanford University Intermediate Form) optimizing compiler. The results of our detailed architectural simulations demonstrate that the speed of some applications can be improved by as much as a factor of two, both on uniprocessor and multiprocessor systems. This dissertation also compares software-controlled prefetching with other latency-hiding techniques (e.g., locality optimizations, relaxed consistency models, and multithreading), and investigates the architectural support necessary to make prefetching effective. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/628/CSL-TR-94-628.pdf %R CSL-TR-94-629 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Precise Delay Generation Using Coupled Oscillators %A Maneatis, John George %D June 1994 %X This thesis describes a new class of delay generation structures which can produce precise delays with sub-gate-delay resolution. These structures are based on coupled ring oscillators which oscillate at the same frequency. One such structure, called an array oscillator, consists of a linear array of ring oscillators. A unique coupling arrangement forces the outputs of the ring oscillators to be uniformly offset in phase by a precise fraction of a buffer delay.
%R CSL-TR-94-629 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Precise Delay Generation Using Coupled Oscillators %A Maneatis, John George %D June 1994 %X This thesis describes a new class of delay generation structures which can produce precise delays with sub-gate delay resolution. These structures are based on coupled ring oscillators which oscillate at the same frequency. One such structure, called an array oscillator, consists of a linear array of ring oscillators. A unique coupling arrangement forces the outputs of the ring oscillators to be uniformly offset in phase by a precise fraction of a buffer delay. This arrangement enables the array oscillator to achieve a delay resolution equal to a buffer delay divided by the number of rings. Another structure, called a delay line oscillator, consists of a series of delay stages, each based on a single coupled ring oscillator. These delay stages uniformly span the delay interval to which they are phase locked. Each delay stage is capable of generating a phase shift that varies over a positive and negative range. These characteristics allow the structure to precisely subdivide delays into arbitrarily small intervals. The buffer stages used in the ring oscillators must have high supply noise rejection to avoid losing precision to output jitter. This thesis presents several types of buffer stage designs for achieving high supply noise rejection and low supply voltage operation. These include a differential buffer stage design based on a source coupled pair using load elements with symmetric I-V characteristics and a single-ended buffer stage design based on a diode clamped common source device. The thesis also discusses techniques for achieving low jitter phase-locked loop performance, which is important to achieving high precision. Based on the concepts developed in this thesis, an experimental differential array oscillator delay generator was designed and fabricated in a 1.2-um N-well CMOS technology. The delay generator achieved a delay resolution of 43ps while operating at 331MHz with a peak delay error of 47ps. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/629/CSL-TR-94-629.pdf %R CSL-TR-94-630 %Z Thu, 27 Oct 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Expansion Caches For Superscalar Processors %A Johnson, John D. %D June 1994 %X Superscalar implementations present increased demands on instruction caches as well as instruction decoding and issuing mechanisms, leading to very complex hardware requirements. This work proposes utilizing an expanded instruction cache to reduce and simplify the hardware required to implement a superscalar machine. Trace-driven simulation is used for evaluating the presented Expanded Parallel Instruction Cache (EPIC) machine, and its performance is found to be comparable to a dynamically scheduled superscalar model. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/630/CSL-TR-94-630.pdf %R CSL-TR-94-632 %Z Mon, 07 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The Benefits of Clustering in Shared Address Space Multiprocessors: An Applications-Driven Investigation %A Erlichson, Andrew %A Nayfeh, Basem A. %A Singh, Jaswinder Pal %A Olukotun, Kunle %D October 1994 %X Clustering processors together at a level of the memory hierarchy in shared address space multiprocessors appears to be an attractive technique from several standpoints: Resources are shared, packaging technologies are exploited, and processors within a cluster can share data more effectively. We investigate the performance benefits that can be obtained by clustering on a range of important scientific and engineering applications. We find that in general clustering is not very effective in reducing the inherent communication to computation ratios. Clustering is more useful in reducing working set requirements in unstructured applications, and can improve performance substantially when small first-level caches are clustered in these cases. This suggests that clustering at the first-level cache might be useful in highly integrated, relatively fine-grained environments.
For less integrated machines such as current distributed shared memory multiprocessors, our results suggest that clustering is not very useful in improving application performance, and the decision about whether or not to cluster should be made on the basis of engineering and packaging constraints. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/632/CSL-TR-94-632.pdf %R CSL-TR-94-633 %Z Wed, 09 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Synthesis Techniques for Built-In Self-Testable Designs %A Avra, LaNae Joy %D July 1994 %X This technical report contains the text of LaNae Joy Avra's thesis "Synthesis Techniques for Built-In Self-Testable Designs." %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/633/CSL-TR-94-633.pdf %R CSL-TR-94-634 %Z Wed, 05 Oct 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Architectural and Implementation Tradeoffs for Multiple-Context Processors %A Laudon, James P. %D September 1994 %X Tolerating memory latency is essential to achieving high performance in scalable shared-memory multiprocessors. In addition, tolerating instruction (pipeline dependency) latency is essential to maximize the performance of individual processors. Multiple-context processors have been proposed as a universal mechanism to mitigate the negative effects of latency. These processors tolerate latency by switching to a concurrent thread of execution whenever one of the threads blocks due to a high-latency operation. Multiple-context processors built so far, however, either have a high context-switch cost, which prevents them from tolerating short latencies (e.g., due to pipeline dependencies), or alternatively they require excessive concurrency from the software. We propose a multiple-context architecture that combines full single-thread support with cycle-by-cycle context interleaving to provide lower switch costs and the ability to tolerate short latencies. We compare the performance of our proposal with that of earlier approaches, showing that our approach offers substantially better performance for parallel applications. We also explore using our approach for uniprocessor workstations --- an important environment for commodity microprocessors. We show that our approach also offers much better performance for multiprogrammed uniprocessor workloads. Finally, we explore the implementation issues for both our proposed and existing multiple-context architectures. One of the larger costs for a multiple-context processor arises in providing a cache capable of handling multiple outstanding requests, and we propose a lockup-free cache which provides high performance at a reasonable cost. We also show that the amount of processor state that needs to be replicated to support multiple contexts is modest and that the extra complexity required to control the multiple contexts under both our proposed and existing approaches is manageable. The performance benefits and reasonable implementation cost of our approach make it a promising candidate for addition to future microprocessors. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/634/CSL-TR-94-634.pdf
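A back-of-the-envelope model suggests why switching among contexts tolerates latency at all. The figures below are assumptions chosen for illustration (20 cycles of useful work between misses, a 100-cycle miss, a 1-cycle switch), not measurements or formulas from the report:

    # Idealized utilization of a processor with n contexts: a context that
    # misses stalls, but the processor runs the other contexts meanwhile;
    # only the miss latency not covered by their execution remains exposed.
    def utilization(contexts, run_cycles=20, miss_latency=100, switch_cost=1):
        busy = contexts * run_cycles     # useful work per round of contexts
        exposed = max(0, miss_latency
                         - (contexts - 1) * (run_cycles + switch_cost))
        total = busy + contexts * switch_cost + exposed
        return busy / total

    for n in (1, 2, 4, 8):
        print(n, round(utilization(n), 2))   # 0.17, 0.33, 0.66, 0.95

With one context most cycles are exposed miss latency; with eight, nearly all of the latency is overlapped, at the cost of replicated processor state.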
%R CSL-TR-94-635 %Z Thu, 01 Sep 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T A Performance/Area Workbench for Cache Memory Design %A Okuzawa, Osamu %A Flynn, Michael J. %D August 1994 %X For high performance processor design, cache memory size is an important parameter that directly affects both performance and chip area. Modeling performance and area is therefore required for evaluating cache memory design tradeoffs. This paper describes a tool that calculates cache memory performance and area. A designer can try a variety of cache parameters to complete the specification of a cache memory. Data examples calculated using this tool are shown. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/635/CSL-TR-94-635.pdf %R CSL-TR-94-636 %Z Thu, 27 Oct 94 00:00:00 GMT %I Stanford University, Department of Computer Science %T Mable: A Technique for Efficient Machine Simulation %A Davies, Peter %A Lacroute, Philippe %A Heinlein, John %A Horowitz, Mark %D October 1994 %X We present a framework for an efficient instruction-level machine simulator which can be used with existing software tools to develop and analyze programs for a proposed processor architecture. The simulator exploits similarities between the instruction sets of the emulated machine and the host machine to provide fast simulation. Furthermore, existing program development tools on the host machine such as debuggers and profilers can be used without modification on the emulated program running under the simulator. The simulator can therefore be used to debug and tune application code for the new processor without building a whole new set of program development tools. The technique has applicability to a diverse set of simulation problems. We show how the framework has been used to build simulators for a shared-memory multiprocessor, a superscalar processor with support for speculative execution, and a dual-issue embedded processor. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/636/CSL-TR-94-636.pdf %R CSL-TR-94-637 %Z Mon, 28 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Testing Digital Circuits for Timing Failures by Output Waveform Analysis %A Franco, Piero %D September 1994 %X Delay testing is done to ensure that a digital circuit functions at the designed speed. Delay testing is complicated by test invalidation and fault detection size. Furthermore, we show that simple delay models are not sufficient to provoke the longest delay through a circuit. Even if all paths are robustly tested, path delay testing cannot guarantee that the circuit functions at the desired speed. Output Waveform Analysis is a new approach for detecting timing failures in digital circuits. Unlike conventional testing where the circuit outputs are sampled, the waveform between samples is analyzed. The motivation is that delay changes affect the shape of the output waveform, and information can be extracted from the waveform to detect timing failures. This is especially useful as a Design-for-Testability technique for Built-In Self-Test or pseudo-random testing environments, where delay tests are difficult to apply and test invalidation is a problem. Stability Checking is a simple form of Output Waveform Analysis. In a fault-free circuit, the outputs are expected to have reached the desired logic values by the time they are sampled, so delay faults can be detected by observing the outputs for any changes after the sampling time. Apart from traditional delay testing, Stability Checking is also useful for on-line or concurrent testing under certain timing restrictions. A padding algorithm was implemented to show that circuits can be efficiently modified to meet the required timing constraints. By analyzing the output waveform before the sampling time, circuits with timing flaws can be detected even before the circuit fails.
This is useful in high-reliability applications as a screening technique that does not stress the circuit, and for wear-out prediction. A symbolic waveform simulator has been implemented to show the benefits of the proposed Output Waveform Analysis techniques. Practical test architectures have been designed, and various waveform analyzers have been manufactured and tested. These include circuits implemented using the Stanford BiCMOS process, and a design implemented in a 25k gate Test Evaluation Chip Experiment. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/637/CSL-TR-94-637.pdf %R CSL-TR-94-638 %Z Wed, 12 Oct 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The Design, Implementation and Evaluation of Jade: A Portable, Implicitly Parallel Programming Language %A Rinard, Martin C. %D August 1994 %X Over the last decade, research in parallel computer architecture has led to the development of many new parallel machines. These machines have the potential to dramatically increase the resources available for solving important computational problems. The widespread use of these machines, however, has been limited by the difficulty of developing useful parallel software. This thesis presents the design, implementation and evaluation of Jade, a new programming language for parallel computations that exploit task-level concurrency. Jade is structured as a set of constructs that programmers use to specify how a program written in a standard sequential, imperative language accesses data. The implementation dynamically analyzes these specifications to automatically extract the concurrency and map the computation onto the parallel machine. The resulting parallel execution preserves the semantics of the original serial program. We have implemented Jade on a wide variety of parallel computing platforms: shared-memory multiprocessors such as the Stanford DASH machine, homogeneous message-passing machines such as the Intel iPSC/860, and heterogeneous networks of workstations. Jade programs port without modification between all of these platforms. We evaluate the design and implementation of Jade by using it to parallelize several complete scientific and engineering applications and executing them on several computational platforms. We analyze how well Jade supports the process of developing these applications and present results that characterize how well they perform. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/638/CSL-TR-94-638.pdf %R CSL-TR-94-639 %Z Wed, 05 Oct 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Two Case Studies in Latency Tolerant Architectures %A Bennett, James E. %A Flynn, Michael J. %D October 1994 %X Researchers have proposed a variety of techniques for dealing with memory latency, such as dynamic scheduling, hardware prefetching, software prefetching, and multiple contexts. This paper presents the results of two case studies on the usefulness of some simple techniques for latency tolerance. These techniques are nonblocking caches, reordering of loads and stores, and basic block scheduling for the expected latency of loads. The effectiveness of these techniques was found to vary according to the type of application. While nonblocking caches and load/store reordering consistently improved performance, scheduling based on expected latency was found to decrease performance in most cases.
This result shows that the assumption of a uniform miss rate used by the scheduler is incorrect, and suggests that techniques for estimating the miss rates of individual loads are needed. These results were obtained using a new simulation environment, MXS, currently under development. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/639/CSL-TR-94-639.pdf %R CSL-TR-94-640 %Z Mon, 28 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Transformed Pseudo-Random Patterns for BIST %A Touba, Nur A. %A McCluskey, Edward J. %D October 1994 %X This paper presents a new approach for on-chip test pattern generation. The set of test patterns generated by a pseudo-random pattern generator (e.g., an LFSR) is transformed into a new set of patterns that provides the desired fault coverage. The transformation is performed by a small amount of mapping logic that decodes sets of patterns that don't detect any new faults and maps them into patterns that detect the hard-to-detect faults. The mapping logic is purely combinational and is placed between the pseudo-random pattern generator and the circuit under test (CUT). A procedure for designing the mapping logic so that it satisfies test length and fault coverage requirements is described. Results are shown for benchmark circuits which indicate that an LFSR plus a small amount of mapping logic reduces the test length required for a particular fault coverage by orders of magnitude compared with using an LFSR alone. These results are compared with previously published results for other methods, and it is shown that the proposed method requires much less overhead to achieve the same fault coverage for the same test length. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/640/CSL-TR-94-640.pdf %R CSL-TR-94-642 %Z Wed, 09 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An Apparatus for Pseudo-Deterministic Testing %A Mukund, Shridhar K. %A McCluskey, Edward J. %A Rao, T.R.N. %D October 1994 %X Pseudo-random testing is popularly used, particularly in Built-In Self Test (BIST) applications. To achieve a desired fault coverage, pseudo-random patterns are often supplemented with a few deterministic patterns. When positions of deterministic patterns in the pseudo-random sequence are known a priori, pseudo-random sub-sequences can be chosen such that they also cover these deterministic patterns. We call this method of test application pseudo-deterministic testing. The theory of discrete logarithm has been applied to determine positions of bit-patterns in the pseudo-random sequence generated by a modular form or internal-XOR Linear Feedback Shift Register (LFSR) [5,7]. However, the scheme requires that all the inputs of the combinational logic block (CLB), under test, come from the same LFSR source. This constraint in circuit configuration severely limits its application. In this paper, we propose a practical and cost-effective technique for pseudo-deterministic testing. For the most part, the problem of circuit configuration has been simplified to one of scan path insertion, by employing LFSR/SR (an arbitrary-length shift register driven by a standard-form or external-XOR LFSR). To enable the usage of LFSR/SR as a pseudo-deterministic pattern source, we propose a method to determine positions of bit-patterns, at arbitrarily chosen tap configurations, in the LFSR/SR sequence. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/642/CSL-TR-94-642.pdf
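The question underlying pseudo-deterministic testing, at what position in an LFSR's sequence a needed deterministic pattern appears, can be illustrated by brute force; the report's contribution is computing such positions analytically, via discrete logarithms, for LFSR/SR structures. A small Python sketch with a 4-bit example:

    # Brute-force position search in an external-XOR (Fibonacci) LFSR sequence.
    def lfsr_positions(taps, state, width, targets, limit=1 << 16):
        found = {}
        for pos in range(limit):
            if state in targets and state not in found:
                found[state] = pos
            fb = bin(state & taps).count("1") & 1        # XOR of tapped bits
            state = ((state << 1) | fb) & ((1 << width) - 1)
        return found

    # x^4 + x^3 + 1 (taps 0b1100) is maximal length: period 15 over 4-bit states
    print(lfsr_positions(taps=0b1100, state=0b0001, width=4,
                         targets={0b1011, 0b0111}))      # {11: 9, 7: 10}

Knowing these positions, one can start a pseudo-random burst a few cycles before position 9 and let the free-running sequence deliver both deterministic patterns.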
%R CSL-TR-94-643 %Z Tue, 20 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design-for-Current-Testability (DFCT) for Dynamic CMOS Logic %A Ma, Siyad C. %A McCluskey, Edward J. %D November 1994 %X The applicability of quiescent current monitoring (IDDQ testing) to dynamic logic is discussed here. IDDQ testing is very useful in detecting some defects that can escape functional and delay tests; however, we show that some defects in domino logic cannot be detected by either voltage or current measurements. A design-for-current-testability (DFCT) modification for dynamic logic is presented and shown to enable detection of these defects. The DFCT circuitry is designed with a negligible performance impact during normal operation. This is particularly important since the main reason for using dynamic logic is its speed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/643/CSL-TR-94-643.pdf %R CSL-TR-94-644 %Z Wed, 21 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Synthesis of Asynchronous Controllers for Heterogeneous Systems %A Yun, Kenneth Yi %D August 1994 %X There are two synchronization mechanisms used in digital systems: synchronous and asynchronous. Synchronous or asynchronous refers to whether or not system events occur in lock-step with a clock. Today's system components typically employ the synchronous paradigm primarily because of the availability of the rich set of design tools and algorithms and, perhaps, because of the designers' perception of ``ease of design'' and the lack of alternatives. Even so, the interfaces among the system components do not strictly adhere to the synchronous paradigm because of the cost benefit of mixing modules operating at different clock rates and modules with asynchronous interfaces. This thesis addresses the problem of how to synthesize controllers operating in heterogeneous systems - systems with components employing different synchronization mechanisms. We introduce a new design style called extended-burst-mode. The extended-burst-mode design style covers a wide spectrum of sequential circuits ranging from delay-insensitive to synchronous. We can synthesize multiple-input change asynchronous finite state machines, and many circuits that fall in the gray area between synchronous and asynchronous that are difficult or impossible to synthesize automatically using existing methods. Our implementation of extended-burst-mode machines uses standard combinational logic, generates low-latency outputs and guarantees freedom from hazards at the gate level. We present a complete set of automated sequential synthesis algorithms: hazard-free state assignment, hazard-free state minimization, and critical-race-free state encoding. We also describe two radically different hazard-free combinational synthesis methods: two-level sum-of-products implementation and multiplexer-tree implementation. Existing theories for hazard-free combinational synthesis are extended to handle non-monotonic input changes. A set of requirements for freedom from logic hazards is presented for each combinational synthesis method. Experimental data from a large set of examples are presented and compared to competing methods, whenever possible. To demonstrate the effectiveness of the design style and the synthesis tool, the design of a commercial-scale SCSI controller data path is presented.
This design is functionally compatible with an existing high performance commercial chip and meets the ANSI SCSI-2 standard. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/644/CSL-TR-94-644.pdf %R CSL-TR-94-646 %Z Thu, 08 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Technology Mapping for VLSI Circuits Exploiting Boolean Properties and Operations %A Mailhot, Frederic %D December 1994 %X Automatic synthesis of digital circuits has gained increasing importance. The synthesis process consists of transforming an abstract representation of a system into an implementation in a target technology. The set of transformations has traditionally been broken into three steps: high-level synthesis, logic synthesis and physical design. This dissertation is concerned with logic synthesis. More specifically, we study technology mapping, which is the link between logic synthesis and physical design. The object of technology mapping is to transform a technology-independent logic description into an implementation in a target technology. One of the key operations during technology mapping is to recognize logic equivalence between a portion of the initial logic description and an element of the target technology. We introduce new methods for establishing logic equivalence between two logic functions. The techniques, based on Boolean comparisons, use Binary Decision Diagrams (BDDs). An algorithm for dealing with completely specified functions is first presented. Then we introduce a second algorithm, which is applicable to incompletely specified functions. We also present an ensemble of techniques for optimizing delay, which rely on an iterative approach. All these methods have proven to be efficient in both run-time and quality of results, when compared to other existing technology mapping systems. The algorithms presented have been implemented in a technology mapping program, Ceres. Results are shown that highlight the application of the different algorithms. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/646/CSL-TR-94-646.pdf %R CSL-TR-94-647 %Z Tue, 06 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design Issues in Floating-Point Division %A Oberman, Stuart F. %A Flynn, Michael J. %D December 1994 %X Floating-point division is generally regarded as a low-frequency, high-latency operation in typical floating-point applications. However, the increasing emphasis on high performance graphics and the industry-wide usage of performance benchmarks, such as SPECmarks, forces processor designers to pay close attention to all aspects of floating-point computation. This paper presents the algorithms often utilized for floating-point division, and it also presents implementation alternatives available for designers. Using a system level study as a basis, it is shown how typical floating-point applications can guide the designer in making implementation decisions and trade-offs. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/647/CSL-TR-94-647.pdf
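As an illustration of one classic algorithm family such a study weighs, here is a minimal Newton-Raphson functional-iteration sketch in Python. The initial-estimate coefficients are the standard 48/17 - 32/17 * b approximation for a significand normalized to [0.5, 1); the paper itself surveys several alternatives and is not limited to this method:

    # Newton-Raphson division: x_{i+1} = x_i * (2 - b * x_i) converges
    # quadratically to 1/b, so a/b costs a few multiplies per iteration.
    def nr_divide(a, b, iterations=4):
        assert 0.5 <= b < 1.0, "assume significand normalized to [0.5, 1)"
        x = 48.0 / 17.0 - (32.0 / 17.0) * b   # rough initial estimate of 1/b
        for _ in range(iterations):
            x = x * (2.0 - b * x)             # each step roughly doubles the
                                              # number of correct bits
        return a * x

    print(nr_divide(1.0, 0.75), 1.0 / 0.75)

Quadratic convergence is why multiply-based division competes with digit-recurrence (SRT) schemes when a fast multiplier is already present in the datapath.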
%R CSL-TR-94-648 %Z Thu, 08 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Automatic Synthesis of Gate-Level Speed-Independent Circuits %A Beerel, Peter A. %A Myers, Chris J. %A Meng, Teresa H.-Y. %D December 1994 %X This paper presents a CAD tool for the synthesis of robust asynchronous control circuits using limited-fanin basic gates such as AND gates, OR gates, and C-elements. The synthesized circuits are speed-independent; that is, they work correctly regardless of individual gate delays. Included in our synthesis procedure is an efficient procedure for logic optimizations using {\em observability don't cares} and {\em incremental verification}. We apply the procedure to a variety of specifications taken from industry and previously published examples and compare our speed-independent implementations to those generated using a non-speed-independent synthesis procedure included in Berkeley's SIS. Our implementations are not only more robust to delay variations than those produced by SIS, which rely on bounded delay lines to avoid circuit hazards, but are also on average 13 percent faster, with an area penalty of only 14 percent. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/648/CSL-TR-94-648.pdf %R CSL-TR-94-649 %Z Thu, 08 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Routing of Streams in WDM Reconfigurable Networks %A Noronha, Ciro A., Jr. %A Tobagi, Fouad A. %D December 1994 %X Due to its low attenuation, fiber has become the medium of choice for point-to-point links. Using Wavelength-Division Multiplexing (WDM), many channels can be created in the same fiber. A network node equipped with a tunable optical transmitter can select any of these channels for sending data. An optical interconnection combines the signals from the various transmitters in the network and makes them available to the optical receivers, which may also be tunable. By properly tuning transmitters and/or receivers, point-to-point links can be dynamically created and destroyed. Therefore, in a WDM network, the routing algorithm has an additional degree of freedom compared to traditional networks: it can modify the network topology to create the routes. In this report, we consider the problem of routing audio/video streams in WDM networks. We present a general linear integer programming formulation for the problem. However, since this exact formulation is expensive to solve, we propose simpler heuristic algorithms, both for the unicast case and for the multicast case. The performance of these heuristics is evaluated in a number of scenarios, with a realistic traffic model, and from the evaluation we derive guidelines for usage of the heuristic algorithms. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/649/CSL-TR-94-649.pdf %R CSL-TR-94-653 %Z Wed, 21 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Routing of Video/Audio Streams In Packet-Switched Networks %A Noronha, Ciro A., Jr. %D December 1994 %X The transport of multimedia streams in computer communication networks raises issues at all layers of the OSI model. This thesis considers some of the issues related to supporting multimedia streams at the network layer; in particular, the issue of appropriate routing algorithms. New routing algorithms, capable of efficiently meeting multimedia requirements, are needed. We formulate the optimum multipoint stream routing problem as a linear integer programming problem and propose an efficient solution technique. We show that the proposed solution technique significantly decreases the time to compute the solution, when compared to traditional methods. We use the optimum multicast stream routing problem as a benchmark to characterize the performance of existing heuristic algorithms under realistic network and traffic scenarios, and derive guidelines for their usage and for upgrading the network capacity.
We also consider the problem of routing multimedia streams in a Wavelength-Division Multiplexing (WDM) optical network, which has an additional degree of freedom over traditional networks - its topology can be changed by the routing algorithm to create routes as needed, by tuning optical transmitters and/or receivers. We show that the optimum reconfiguration and routing problem can be formulated as a linear integer programming problem. Since this formulation is expensive to solve, we also propose a set of heuristic algorithms, both for unicast and multicast routing. We evaluate the performance of the proposed heuristics and derive guidelines for their usage. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/653/CSL-TR-94-653.pdf %R CSL-TR-94-654 %Z Mon, 19 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Multipliers and Datapaths %A Al-Twaijry, Hesham %A Flynn, Michael J. %D December 1994 %X People traditionally have considered the number of counters in the critical path as the metric for the performance of a multiplier. This report presents the view that tree topologies which have the least number of levels do not always give the fastest possible multiplier when constrained to be part of a microprocessor. It proposes two new topologies, hybrid structures and higher-order arrays, which are faster than conventional tree topologies for typical datapaths. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/654/CSL-TR-94-654.pdf %R CSL-TR-94-655 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T I/O Characterization and Attribute Caches for Improved I/O System Performance %A Richardson, Kathy J. %D December 1994 %X Workloads generate a variety of disk I/O requests to access file information, execute programs, and perform computation. I/O caches capture most of these requests, reducing execution time, providing high I/O rates, and decreasing the disk bandwidth needed by each workload. A cache has difficulty capturing the full range of I/O behavior, however, when it treats the requests as a single stream of uniform tasks. The single stream contains I/O requests for data with vastly different reuse rates and access patterns. Disk files can be classified as accesses to inodes, directories, datafiles or executables. The combined cache behavior of all four taken together provides few clues for improving performance of the I/O cache. But individually, the cache behavior of each reveals the distinct components that make up aggregate I/O behavior. Inodes and directories turn out to be small, highly reused files. Datafiles and executable files have more diverse characteristics. The smaller ones exhibit moderate reuse and have little sequential access, while the larger files tend to be accessed sequentially and not reused. Properly used, file type and file size information improves cache performance. The dissertation introduces attribute caches to improve I/O cache performance. Attribute caches use file attributes to selectively cache I/O data with a cache scheme tailored to the expected behavior of the file type. Inodes and directories are cached in very small blocks, capitalizing on their high reuse rates and small space requirements. Large files are cached in large cache blocks, capitalizing on their sequential access patterns. Small and medium-sized files are cached in average 4-kbyte blocks, which minimizes the memory required to service the bulk of requests. The portion of cache dedicated to each group varies with total cache size. This allows the important features of the workload to be captured at the appropriate cache size, and increases the total cache utilization. For a set of 11 measured workloads an attribute cache scheme reduced the miss ratio 25--60\% depending on cache size, and required only about 1/8 as much memory as a typical I/O cache implementation achieving the same miss ratio. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/655/CSL-TR-94-655.pdf
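A minimal sketch of the attribute-cache policy just described, with hypothetical block sizes and thresholds standing in for the dissertation's tuned values:

    # Choose the caching granularity from file attributes, so that small,
    # highly reused metadata and large sequential files each get a block
    # size matched to their expected behavior.
    def block_size(file_type, file_size):
        if file_type in ("inode", "directory"):
            return 512                  # tiny blocks: high reuse, small objects
        if file_size >= 64 * 1024:
            return 64 * 1024            # large blocks: sequential, rarely reused
        return 4 * 1024                 # default serving the bulk of requests

    class AttributeCache:
        def __init__(self):
            self.blocks = {}            # (name, block number) -> cached data
        def read(self, name, ftype, fsize, offset):
            bs = block_size(ftype, fsize)
            key = (name, offset // bs)
            hit = key in self.blocks
            self.blocks.setdefault(key, f"<{bs}-byte block>")
            return hit

The point of the per-type granularity is memory efficiency: metadata reuse is captured without dedicating large blocks to it, while sequential large-file traffic amortizes one miss over a long run of accesses.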
%R CSL-TR-94-656 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T I/O Characterization and Attribute Cache Data for Eleven Measured Workloads %A Richardson, Kathy J. %D December 1994 %X Workloads generate a variety of disk I/O requests to access file information, execute programs, and perform computation. Workload characterization is crucial to optimizing I/O system performance. This report contains detailed workload characterization data for eleven measured workloads. It includes numerous tables and cache-behavior plots for each workload. The workload I/O traces, from which the characterization is derived, include both file system information and I/O system information, where previous traces only included one or the other. The additional information allows I/O characterization at the system level, and greatly increases the body of knowledge about the make-up and type of disk I/O requested. The new information shows that the I/O request stream contains statistically diverse components that can be separated. This allows the important features of the workload to be captured at the appropriate cache size, and increases the total cache utilization. Note: This technical report is a companion report to the dissertation "I/O Characterization and Attribute Caches for Improved I/O System Performance" (CSL-TR-94-655). While the dissertation is self-contained, this report is not; it presents data that is analyzed and discussed only in the dissertation. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/656/CSL-TR-94-656.pdf %R CSL-TR-94-624 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T WSIM: A Symbolic Waveform Simulator %A Franco, Piero %A McCluskey, Edward J. %D June 1994 %X A symbolic waveform simulator is proposed in this report. The delay of a faulty element is treated as a variable in the generation of the output waveform. Therefore, many timing simulations with different delay values do not have to be done to analyze the behavior of the circuit-under-test with the timing fault. The motivation for this work was to investigate delay testing by Output Waveform Analysis, where an accurate representation of the actual waveforms is required, although the simulator can be used for other applications as well (such as power analysis). Output Waveform Analysis will be briefly reviewed, followed by a description of both a simplified and a complete implementation of the waveform simulator, and simulation results. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/624/CSL-TR-94-624.pdf %R CSL-TR-94-625 %Z Tue, 08 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An Experimental Chip to Evaluate Test Techniques Part 1: Description of Experiment %A Franco, Piero %A Stokes, Robert L. %A Farwell, William D. %A McCluskey, Edward J. %D June 1994 %X A Test Chip has been designed and manufactured to evaluate different testing techniques for combinational or full-scan circuits. The Test Chip is a 25k gate CMOS gate-array using LSI Logic's LFT150K technology, and includes support (design for testability) circuitry and five types of circuits-under-test (CUT). Over 5,000 die have been manufactured. The five circuits-under-test include both datapath and synthesized control logic. The tests include design verification (simulation), exhaustive, pseudo-random, and deterministic vectors for various fault models (stuck-at, transition, delay faults, and IDDQ Testing). The chip will also be tested using the CrossCheck methodology, as well as other new techniques, including Stability Checking and Very-Low-Voltage Testing. The experiment includes an investigation of both serial and parallel signature analysis. This report describes the Test Evaluation Chip Experiment, including the design of the Test Chip and the tests applied. A future report will cover the experimental results and data analysis. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/625/CSL-TR-94-625.pdf %R CSL-TR-94-631 %Z Mon, 05 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T SimOS: A Fast Operating System Simulation Environment %A Rosenblum, Mendel %A Varadarajan, Mani %D July 1994 %X In this paper we describe techniques for building a software development environment for operating system software. These techniques allow an operating system to be run at user-level on a general-purpose operating system such as System V R4 Unix. The approach used in this work is to simulate a machine's hardware using services provided by the underlying operating system. We describe how to simulate the CPU using the operating system's process abstraction, the memory management unit using file mapping operations, and the I/O devices using separate processes. The techniques we present allow the simulator to run with sufficient speed and detail that workloads that exercise bugs on the real machine can be transferred and run in near real-time on the simulated machine. The speed of the simulation depends on the quantity and the cost of the simulated operations. Real programs usually run in the simulated environment at between 50% and 100% of the speed of the underlying machine. The simulation detail we provide allows an operating system running in the simulated environment to be nearly indistinguishable from the real machine from a user perspective. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/631/CSL-TR-94-631.pdf %R CSL-TR-94-617 %Z Thu, 05 Jan 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Fast Multiplication: Algorithms and Implementations %A Bewick, Gary W. %D April 1994 %X This thesis investigates methods of implementing binary multiplication with the smallest possible latency. The principal area of concentration is on multipliers with lengths of 53 bits, which makes the results suitable for IEEE-754 double precision multiplication. Low latency demands high-performance circuitry and small physical size to limit propagation delays. VLSI implementations are the only available means for meeting these two requirements, but efficient algorithms are also crucial. An extension to Booth's algorithm for multiplication (redundant Booth) has been developed, which represents partial products in a partially redundant form. This redundant representation can reduce or eliminate the time required to produce "hard" multiples (multiples that require a carry propagate addition) required by the traditional higher order Booth algorithms.
This extension reduces the area and power requirements of fully parallel implementations, while remaining as fast as any multiplication method yet reported. In order to evaluate various multiplication algorithms, a software tool has been developed which automates the layout and optimization of parallel multiplier trees. The tool takes into consideration wire and asymmetric input delays, as well as gate delays, as the tree is built. The tool is used to design multipliers based upon various algorithms, using Booth-encoded, non-Booth-encoded, and the new extended Booth algorithms. The designs are then compared on the basis of delay, power, and area. For maximum speed, the designs are based upon a 0.6-um BiCMOS process using emitter coupled logic (ECL). The algorithms developed in this thesis make possible 53x53 multipliers with a latency of less than 2.6 nanoseconds @ 10.5 Watts and a layout area of 13 mm^2. Smaller and lower power designs are also possible, as illustrated by an example with a latency of 3.6 nanoseconds @ 5.8 W, and an area of 8.9 mm^2. The conclusions based upon ECL designs are extended where possible to other technologies (CMOS). Crucial to the performance of multipliers are high speed carry propagate adders. A number of high speed adder designs have been developed, and the algorithms and design of these adders are discussed. The implementations developed for this study indicate that traditional Booth encoded multipliers are superior in layout area, power, and delay to non-Booth encoded multipliers. Redundant Booth encoding further reduces the area and power requirements. Finally, only half of the total multiplier delay was found to be due to the summation of the partial products. The remaining delay was due to wires and carry propagate adder delays. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/617/CSL-TR-94-617.pdf
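For readers unfamiliar with the baseline that redundant Booth extends, here is a small Python sketch of standard radix-4 Booth recoding (the thesis's partially redundant partial-product representation itself is not shown). Each overlapping 3-bit group of the multiplier selects a multiple in {-2, -1, 0, +1, +2}, halving the number of partial products:

    # digit = -2*b[i+1] + b[i] + b[i-1] for each overlapping 3-bit group
    BOOTH4 = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
              0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

    def booth4_multiply(a, b, bits=16):
        """Multiply signed a by unsigned b (width `bits`) via Booth digits."""
        total = 0
        prev = 0                                  # implicit bit to the right
        for i in range(0, bits, 2):
            group = ((b >> i) & 0b11) << 1 | prev
            total += BOOTH4[group] * a << i       # add a shifted multiple of a
            prev = (b >> (i + 1)) & 1
        return total

    assert booth4_multiply(53, 53) == 53 * 53     # sanity check

All the multiples here are "easy" (shifts and negations of a); the higher-order Booth algorithms the thesis targets need hard multiples such as 3a, which is exactly the carry-propagate cost that the redundant representation attacks.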
%R CSL-TR-94-650 %Z Mon, 12 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A Uniform Approach to the Synthesis of Synchronous and Asynchronous Circuits %A Myers, Chris J. %A Meng, Teresa H.-Y. %D December 1994 %X In this paper we illustrate the application of a synthesis procedure used for timed asynchronous circuits to the design of synchronous circuits. In addition to providing a uniform synthesis approach, our procedure results in circuits that are significantly smaller and faster than those designed using the synchronous design tool SIS. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/650/CSL-TR-94-650.pdf %R CSL-TR-94-651 %Z Wed, 21 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Automatic Hazard-Free Decomposition of High-Fanin Gates in Asynchronous Circuit Synthesis %A Myers, Chris J. %A Meng, Teresa H.-Y. %D December 1994 %X In this paper we present an automated procedure to decompose high-fanin gates generated by asynchronous circuit synthesis procedures for technology mapping to practical gate libraries. Our procedure begins with a specification in the form of an event-rule system, a circuit implementation in the form of a production rule set, and a given gate library. For each gate in the implementation that has a fanin larger than the maximum in the library, a new signal is added to the specification. Each valid decomposition of the high-fanin gates using these new signals is examined by resynthesis until all gates have been successfully decomposed, or it has been determined that a solution does not exist. The procedure has been automated and used to decompose high-fanin gates from several examples generated by the synthesis tools ATACS and SYN. Our resulting implementations using ATACS, when compared with SIS, which uses synchronous technology mapping and adds delay elements to remove hazards, are up to 50 percent smaller and have less than half the latency using library delays generated by HSPICE. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/651/CSL-TR-94-651.pdf %R CSL-TR-94-605 %Z Thu, 16 Mar 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Performance and Area Analysis of Processor Configurations with Scaling of Technology %A Fu, Steve %A Flynn, Michael J. %D March 1994 %X The increasing density of transistors on integrated circuits and the increasing sensitivity toward costs have stimulated interest in developing techniques for relating transistor count to performance. This paper maps different processor configurations to transistor-level area models and proposes an optimum evolution path of processor design as the minimum feature size of the technology is scaled. A parameter for measuring incremental performance improvement with respect to increasing transistor count is proposed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/605/CSL-TR-94-605.pdf %R CSL-TR-94-657 %Z Mon, 13 Mar 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Instruction Level Parallel Processors---A New Architectural Model for Simulation and Analysis %A Rudd, Kevin W. %D December 1994 %X Trends in high-performance computer architecture have led to the development of increased clock-rate and dynamic multiple-instruction issue processor designs. There have been problems combining these techniques due to the pressure that the complex scheduling and issue logic puts on the cycle time. This problem has limited the performance of multiple-instruction issue architectures. The alternative approach of static multiple-operation issue avoids the clock-rate problem by allowing the hardware to concurrently issue only those operations that the compiler scheduled to be issued concurrently. Since there is no hardware support required to achieve multiple-operation issue (there are multiple operations in a single instruction and the hardware issues a single instruction at a time), these designs can be effectively scaled to high clock rates. However, these designs have the problem that the scheduling of operations into instructions is rigid, and to increase the performance of the system, the entire system must be scaled uniformly so that the static schedule is not compromised. This report describes an architectural model that allows a range of hybrid architectures to be studied. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/657/CSL-TR-94-657.pdf %R CSL-TR-94-652 %Z Wed, 29 Mar 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Automatic Synthesis and Verification of Gate-Level Timed Circuits %A Myers, Chris J. %A Rokicki, Tomas G. %A Meng, Teresa H.-Y. %D December 1994 %X This paper presents a CAD system for the automatic synthesis and verification of gate-level timed circuits. Timed circuits are a class of asynchronous circuits which incorporate explicit timing information in the specification which is used throughout the synthesis procedure to optimize the design. This system accepts a textual specification capable of specifying general circuit behavior and timing requirements.
This specification is systematically transformed to a graphical representation that can be analyzed using an exact and efficient timing analysis algorithm to find the reachable state space. From this state space, our synthesis procedure derives a timed circuit that is hazard-free using only basic gates to facilitate the mapping to semi-custom components, such as standard-cells and gate-arrays. The resulting gate-level timed circuit implementations are up to 40 percent smaller and 50 percent faster than those produced using other asynchronous design methodologies. We also demonstrate that our timed designs can be smaller and faster than their synchronous counterparts. To address verification, we have applied our timing analysis algorithm to verify efficiently not only our synthesized circuits but also a wide collection of reasonably sized, highly concurrent timed circuits that could not previously be verified using traditional techniques. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/652/CSL-TR-94-652.pdf %R CSL-TR-94-645 %Z Thu, 09 Feb 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Rationale, Design and Performance of the Hydra Multiprocessor %A Olukotun, Kunle %A Bergmann, Jules %A Chang, Kun-Yung %A Nayfeh, Basem A. %D November 1994 %X In Hydra, four high-performance processors communicate via a shared secondary cache. The shared cache is implemented using multichip module (MCM) packaging technology. The Hydra multiprocessor is designed to efficiently support automatically parallelized programs that have high degrees of fine-grained sharing. This paper motivates the Hydra multiprocessor design by reviewing current trends in architecture and developments in parallelizing compiler and implementation technology. The design of the Hydra multiprocessor is described and explained. Initial estimates of the interprocessor communication latencies show them to be much lower than those of current bus-based multiprocessors. These lower latencies result in higher performance on applications with fine-grained parallelism. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/645/CSL-TR-94-645.pdf %R CSL-TR-94-602 %Z Mon, 24 Apr 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Analyzing and Tuning Memory Performance in Sequential and Parallel Programs %A Martonosi, Margaret Rose %D January 1994 %X Recent architecture and technology trends have led to a significant gap between processor and main memory speeds. When cache misses are common, memory stalls can significantly degrade execution time. To help identify and fix such memory bottlenecks, this work presents techniques to efficiently collect detailed information about program memory performance and effectively organize the data collected. These techniques help guide programmers or compilers to memory bottlenecks. They apply to both sequential and parallel applications and are embodied in the MemSpy performance monitoring system. This thesis contends that the natural interrelationship between program memory bottlenecks and program data structures mandates the use of data-oriented statistics, a novel approach that associates program performance information with application data structures. Data-oriented statistics, viewed alone or paired with traditional code-oriented statistics, offer a powerful, new dimension for performance analysis. I develop techniques for aggregating statistics on similarly-used data structures and for extracting intuitive source-code names for statistics.
The thesis also argues that MemSpy's detailed statistics on the frequency and causes of cache misses are crucial in understanding memory bottlenecks. Common memory performance bugs are often most easily distinguished by noting the causes of their resulting cache misses. Since collecting such detailed information seems, at first glance, to require large execution time slowdowns, this dissertation also evaluates techniques to improve the performance of MemSpy's simulation-based monitoring. The first optimization, hit bypassing, improves simulation performance by specializing processing of cache hits. The second optimization, reference trace sampling, improves performance by simulating only sampled portions of the full reference trace. Together, these optimizations reduce simulation time by nearly an order of magnitude. Overall, our experience using MemSpy to tune several applications demonstrates that it generates effective memory performance profiles at speeds competitive with previous, less detailed approaches. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/602/CSL-TR-94-602.pdf %R CSL-TR-94-607 %Z Mon, 28 Nov 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Spreadsheets for Images %A Levoy, Marc %D February 1994 %X We describe a data visualization system based on spreadsheets. Cells in our spreadsheet contain graphical objects such as images, volumes, or movies. Cells may also contain graphical widgets such as buttons, sliders, or movie viewers. Objects are displayed in miniature inside each cell. Formulas for cells are written in a programming language that includes operators for array manipulation, image processing, and rendering. Formulas may also contain control structures, procedure calls, and assignment operators with side effects. Compared to flow chart visualization systems, spreadsheets are more expressive, more scalable, and easier to program. Compared to numerical spreadsheets, spreadsheets for images pose several unique design problems: larger formulas, longer computation times, and more complicated intercell dependencies. We describe an implementation based on the Tcl programming language and the Tk widget set, and we discuss our solutions to these design problems. We also point out some unexpected uses for our spreadsheets: as a visual database browser, as a graphical user interface builder, as a smart clipboard for the desktop, and as a presentation tool. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/607/CSL-TR-94-607.pdf
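The cell/formula model in the spreadsheet abstract above can be sketched in a few lines of Python (a toy stand-in; the actual system is built on Tcl/Tk and tracks intercell dependencies far more precisely than the coarse invalidation used here):

    # Cells hold graphical objects; formulas are real code over other cells.
    class Sheet:
        def __init__(self):
            self.formulas, self.cache = {}, {}
        def set(self, name, formula):
            """formula: a function of the sheet, e.g. lambda s: blur(s['A1'])"""
            self.formulas[name] = formula
            self.cache.clear()               # coarse invalidation of dependents
        def __getitem__(self, name):
            if name not in self.cache:       # recompute on demand, then memoize
                self.cache[name] = self.formulas[name](self)
            return self.cache[name]

    s = Sheet()
    s.set("A1", lambda s: [[0, 1], [2, 3]])          # a tiny "image"
    s.set("B1", lambda s: [[2 * p for p in row] for row in s["A1"]])
    print(s["B1"])                                    # [[0, 2], [4, 6]]

Because formulas are ordinary code rather than flow-chart wiring, loops, conditionals, and procedure calls come for free, which is the expressiveness argument the abstract makes.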
%R CSL-TR-94-616 %Z Wed, 03 May 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Reuse of High Precision Arithmetic Hardware to Perform Multiple Low Precision Calculations %A Zucker, Daniel %A Lee, Ruby %D April 1994 %X Many increasingly important applications, such as video compression, graphics, or multimedia, require only low-precision arithmetic. However, because the widespread adoption of the IEEE floating point standard has led to the ubiquity of IEEE double precision hardware, this double precision hardware is frequently used to do the low precision calculations. Naturally, it seems an inefficient use of resources to use 54 bits of hardware to perform an 8- or 12-bit calculation. This paper presents a method for packing operands to perform multiple low precision arithmetic operations using regular high precision hardware. Using only source-level software modification, a speedup of 15% is illustrated for the Discrete Cosine Transform. Since no machine-specific optimizations are required, this method will work on any machine that supports IEEE arithmetic. Finally, an analysis of speedup and suggestions for future work are presented. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/616/CSL-TR-94-616.pdf
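The packing idea admits a tiny runnable illustration. The field spacing below is an assumption for exposition, not the paper's exact operand layout; the point is that both packed results stay within the 53-bit double-precision significand, so a single hardware multiply performs two exact low-precision multiplies:

    # Two small non-negative integers ride in one number, separated by enough
    # zero bits that multiplying by a small constant scales both without the
    # fields overlapping; each result is then extracted exactly.
    SHIFT = 1 << 24                       # assumed spacing between fields

    def pack(x, y):
        return x * SHIFT + y              # fits easily in a 53-bit significand

    def unpack(p):
        x = int(p // SHIFT)
        return x, int(p - x * SHIFT)

    a = pack(200, 117)                    # two 8-bit operands
    b = 3 * a                             # one multiply does two multiplies
    print(unpack(b))                      # (600, 351)

The same arithmetic carried out in IEEE doubles remains exact as long as every intermediate value stays below 2^53, which is what makes the trick portable to any machine with IEEE arithmetic.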
%R CSL-TR-94-641 %Z Thu, 07 Dec 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Using Checking Experiments to Test Two-State Latches %A Makar, Samy R. %A McCluskey, Edward J. %D November 1995 %X Necessary and sufficient conditions for an exhaustive functional test (checking experiment) of various latches are derived. These conditions are used to derive minimum-length checking experiments. The checking experiment for the D-latch is simulated using an HSpice implementation of the transmission gate latch. All detectable stuck-at, stuck-open, stuck-on, and bridging faults are detected. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/94/641/CSL-TR-94-641.pdf %R CSL-TR-80-182 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design for autonomous test %A McCluskey, Edward J. %A Bozorgui-Nesbat, Saied %D June 1981 %X A technique for modifying networks so that they are capable of self-test is presented. The major innovation is partitioning the network into subnetworks with sufficiently few inputs that exhaustive testing of the subnetworks is possible. Procedures for reconfiguring the existing registers into modified linear feedback shift registers (LFSRs) which apply the exhaustive (not pseudo-random) test patterns or convert the responses into signatures are described. No fault models or test pattern generation programs are required. A method to modify CMOS circuits so that exhaustive testing can be used even when stuck-open faults must be detected is described. A detailed example using the 74181 ALU is presented. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/80/182/CSL-TR-80-182.pdf %R CSL-TR-80-184 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design automation at Stanford II %A vanCleemput, Willem M. %D February 1980 %X This report contains a copy of the visual aids used by the authors during the presentation of their work at the Second Workshop on Design Automation at Stanford, held on Feb. 19, 1980. The topics covered range from circuit level simulation and integrated circuit process modelling to high level languages and design techniques. The presentations are a survey of the activities in design automation at Stanford University. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/80/184/CSL-TR-80-184.pdf %R CSL-TR-80-189 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Center-based broadcasting %A Wall, David W. %A Owicki, Susan S. %D June 1980 %X We consider the problem of routing broadcast messages in a loosely-coupled store-and-forward network like the ARPANET. Dalal discussed a solution to this problem that minimizes the cost of a broadcast; in contrast, we are interested in performing broadcast with small delay. Existing algorithms can minimize the delay but seem unsuitable for use in a distributed environment because they involve a high degree of overhead in the form of redundant messages or data-structure space. We propose the schemes of center-based forwarding: the routing of all broadcasts via the shortest-path tree for some selected node called the center. These algorithms have small delay and also are easy to implement in a distributed system. To evaluate center-based forwarding, we define four measures of the delay associated with a given broadcast mechanism, and then propose three ways of selecting a center node. For each of the three forms of center-based forwarding we compare the delay to the minimum delay for any broadcasting scheme and also to the minimum delay for any single tree. In most cases, a given measure of the delay on the centered tree is bounded by a small constant factor relative to either of these two minimum delays. When it is possible, we give a tight bound on the ratio between the center-based delay and the minimum delay; otherwise we demonstrate that no bound is possible. These results give corollary bounds on how bad the three centered trees can be with respect to each other; most of these bounds are immediately tight, and the rest are replaced by better bounds that are also shown to be tight. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/80/189/CSL-TR-80-189.pdf
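Computationally, center-based forwarding reduces to one shortest-path tree shared by all broadcasts. A compact Python sketch, using Dijkstra's algorithm for the tree construction (a standard choice; the report itself does not prescribe a particular tree algorithm):

    # Build the shortest-path tree rooted at the chosen center; every
    # broadcast, from any source, is then forwarded along these tree edges.
    import heapq

    def shortest_path_tree(graph, center):
        """graph: {node: {neighbor: weight}} -> {node: parent} tree edges."""
        dist, parent, heap = {center: 0}, {center: None}, [(0, center)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                        # stale queue entry
            for v, w in graph[u].items():
                if d + w < dist.get(v, float("inf")):
                    dist[v], parent[v] = d + w, u
                    heapq.heappush(heap, (d + w, v))
        return parent

    g = {"a": {"b": 1, "c": 4}, "b": {"a": 1, "c": 1}, "c": {"a": 4, "b": 1}}
    print(shortest_path_tree(g, "a"))           # {'a': None, 'b': 'a', 'c': 'b'}

Since every node needs to store only its tree neighbors, the per-node state and message overhead stay small, which is the distributed-implementation advantage the abstract claims.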
Progress on these research problems is discussed, and a research program for the next two years is proposed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/81/201/CSL-TR-81-201.pdf %R CSL-TR-81-209 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Dynamic detection of concurrency in do-loops using ordering matrices %A Wedig, Robert G. %D May 1981 %X This paper describes the data structures and techniques used in dynamically detecting concurrency in Directly Executed Language (DEL) instruction streams. By dynamic detection, it is meant that these techniques are designed to be used at run time with no special source manipulation or preprocessing required to perform the detection. An abstract model of a concurrency detection structure called an ordering matrix is presented. This structure is used, with two other execution vectors, to represent the dependencies between instructions and indicate where potential concurrency exists. An algorithm is developed which utilizes the ordering matrix to detect concurrency within determinate DO-loops. It is then generalized to detect concurrency in arbitrary DEL instruction streams. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/81/209/CSL-TR-81-209.pdf %R CSL-TR-81-214 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An exponential failure/load relationship: results of a multi-computer statistical study %A Iyer, Ravishankar K. %A Butner, Steven E. %A McCluskey, Edward J. %D July 1981 %X In this paper we present an exponential statistical model which relates computer failure rates to the level of system activity. Our analysis reveals a strong statistical dependency of both hardware and software component failure rates on several common measures of utilization (specifically CPU utilization, I/O initiation, paging, and job-step initiation rates). We establish that this effect is not dominated by a specific component type, but exists across the board in the two systems studied. Our data covers three years of normal operation (including significant upgrades and reconfigurations) for two large Stanford University computer complexes. The complexes, which are composed of IBM mainframe equipment of differing models and vintage, run similar operating systems and provide the same interface and capability to their users. The empirical data comes from identically-structured and maintained failure logs at the two sites along with IBM OS/VS2 operating system performance/load records. The statistically strong relationship between failures and load is evident for many equipment types, including electronic, mechanical, and software components. This is in opposition to the commonly-held belief that systems which are primarily electronic in nature exhibit no such effect to any significant degree. The exponential character of our statistical model is significant not only in its simplicity, but also in its compatibility with classical reliability techniques. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/81/214/CSL-TR-81-214.pdf %R CSL-TR-81-219 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Consistency in interprocessor communications for fault-tolerant multiprocessors %A Fu, Peter Lincoln %D September 1981 %X Consistency among processors is vital for fault-tolerant multiprocessors.
This report describes modular interprocessor communication interface units which implement distributed consistency schemes such that failures within a single processor module cannot affect the consistency of data transferred among the remaining processors. Furthermore, one scheme provides concurrent and consistent self-diagnosis data on the integrity of the units themselves. Another scheme is tolerant to almost all failures within two processor modules. The theory of the schemes is explained and their implementations in LSI circuits are described in detail. The interprocessor communication structure defined by any of these schemes serves well as a critical element in highly reliable multiprocessor systems. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/81/219/CSL-TR-81-219.pdf %R CSL-TR-81-221 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Parametric curves, surfaces and volumes in computer graphics and computer-aided geometric design %A Clark, James H. %D November 1981 %X This document has four purposes. It is a tutorial in parametric curve and surface representations, it describes a number of algorithms for generating both shaded and line-drawn pictures of bivariate surfaces and trivariate volumes, it explicitly gives transformations between all of the widely used curve and surface representations, and it proposes a solution to the problem of displaying the results of three-dimensional flow-field calculations. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/81/221/CSL-TR-81-221.pdf %R CSL-TR-81-223 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T MIPS: a VLSI processor architecture %A Hennessy, John L. %A Jouppi, Norman %A Baskett, Forest %A Gill, John %D November 1981 %X MIPS is a new single chip VLSI processor architecture. It attempts to achieve high performance with the use of a simplified instruction set, similar to those found in microengines. The processor is a fast pipelined engine without pipeline interlocks. Software solutions to several traditional hardware problems, such as providing pipeline interlocks, are used. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/81/223/CSL-TR-81-223.pdf %R CSL-TR-81-224 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Code generation and reorganization in the presence of pipeline constraints %A Hennessy, John L. %A Gross, Thomas %D November 1981 %X Pipeline interlocks are used in a pipelined architecture to prevent the execution of a machine instruction before its operands are available. An alternative to this complex piece of hardware is to rearrange the instructions at compile-time to avoid pipeline interlocks. This problem, called code reorganization, is studied. The basic problem of reorganization of machine level instructions at compile-time is shown to be NP-complete. A heuristic algorithm is proposed and its properties and effectiveness are explored. The impact of code reorganization techniques on the rest of a compiler system is discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/81/224/CSL-TR-81-224.pdf %R CSL-TR-81-225 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Automatic compiler code generation %A Ganapathi, Mahadevan %A Fischer, Charles N. %A Hennessy, John L. %D November 1981 %X A classification of automatic code generation techniques and a survey of the work on these techniques are presented.
Automatic code-generation research is classified into three categories: formal treatments, interpretive approaches and descriptive approaches. An analysis of these approaches and a critique of automatic code-generation algorithms are presented. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/81/225/CSL-TR-81-225.pdf %R CSL-TR-81-226 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T SILT: a VLSI design language %A Davis, Tom %A Clark, James %D October 1982 %X SILT is an efficient, medium-level language to describe VLSI layout. Layout features are described in terms of a coordinate system based on the concept of relative geometry. SILT provides hierarchical cell description, a library format for parameterized cells with defaults for the parameters, constraint checking (but not enforcement), and some name control. It is designed to be used with a graphical interface, but can be used by itself. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/81/226/CSL-TR-81-226.pdf %R CSL-TR-81-228 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Hardware/software tradeoffs for increased performance %A Hennessy, John L. %A Jouppi, Norman %A Baskett, Forest %A Gross, Thomas %A Gill, John %D February 1983 %X Most new computer architectures are concerned with maximizing performance by providing suitable instruction sets for compiled code, and support for systems functions. We argue that the most effective design methodology must make simultaneous tradeoffs across all three areas: hardware, software support, and systems support. Recent trends lean toward extensive hardware support for both the compiler and operating systems software. However, consideration of all possible design tradeoffs may often lead to less hardware support. Several examples of this approach are presented, including: omission of condition codes, word-addressed machines, and imposing pipeline interlocks in software. The specifics and performance of these approaches are examined with respect to the MIPS processor. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/81/228/CSL-TR-81-228.pdf %R CSL-TR-82-229 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The SUN workstation architecture %A Bechtolsheim, Andrew %D March 1982 %X The Sun workstation is a personal computer system that combines graphics and networking capabilities with powerful local processing. The workstation has been developed for research in VLSI design automation, text processing, distributed operating systems and programming environments. Clusters of Sun workstations are connected via a local network sharing a network-based file system. The Sun workstation is based on a Motorola 68000 processor, has a 1024 by 800 pixel bitmap display, and uses Ethernet as its local network. The hardware supports virtual memory management, a "RasterOP" mechanism for high-speed display updates, and data-link-control for the Ethernet. The entire workstation electronics consists of 260 chips mounted on three 6.75 by 12 inch PC boards compatible with the IEEE 796 Bus (Intel Multibus). In addition to implementing a workstation, the boards have been configured to serve as network nodes for file servers, printer servers, network gateways, and terminal concentrators.
The report discusses the architecture and implementation of the Sun workstation, gives the background and goals of the project, contemplates future developments, and describes in detail its three main components: the processor, graphics, and Ethernet boards. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/82/229/CSL-TR-82-229.pdf %R CSL-TR-82-230 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Packet-voice communication on an Ethernet local computer network: an experimental study %A Gonsalves, Timothy A. %D February 1982 %X Local computer networks have been used successfully for data applications such as file transfers for several years. Recently, there have been several proposals for using these networks for voice applications. This paper describes a simple voice protocol for use on a packet-switching local network. This protocol is used in an experimental study of the feasibility of using a 3 Mbps experimental Ethernet network for packet-voice communications. This study shows that with appropriately chosen parameters the experimental Ethernet is capable of supporting about 40 simultaneous 64 Kbps voice conversations with acceptable quality. This corresponds to a utilization of 95% of the network capacity. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/82/230/CSL-TR-82-230.pdf %R CSL-TR-82-231 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Dynamic detection of concurrency in DEL instruction streams %A Wedig, Robert G. %D February 1982 %X Detection of concurrency in Directly Executed Languages (DEL) is investigated. It is theorized that if DELs provide a minimal time-space execution of serial programs, then concurrency detection of such instruction streams approaches the minimum execution time possible for a single task without resorting to algorithm restructuring or source manipulation. It is shown how DEL encodings facilitate the detection of concurrency by allowing early decoding and explicit detection of dependency information. The decoding and dependency algorithms as applied to DELs are developed in detail. Concurrency structures are presented which facilitate the detection process. Since all concurrency is capable of exploitation as soon as it is known that the code is to be executed, i.e., the result of the branch is known, it is proven that all explicit parallelism can be detected and exploited using the techniques developed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/82/231/CSL-TR-82-231.pdf %R CSL-TR-82-232 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Studies in microprocessor design %A Alpert, Donald %D June 1982 %X Microprocessor design practice is briefly surveyed. Examples are given for high-level and low-level tradeoffs in specific designs with emphasis on integrated memory functions. Some relations between architectural complexity and design are discussed, and a simple model is presented for implementing a RISC-like architecture. A direction for microprocessor architecture is proposed to allow flexibility for designing with varying processing technologies, cost goals, and performance goals.
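The dependence test underlying the ordering matrices of the two concurrency-detection reports above can be sketched in a few lines of Python; the instruction encoding below (plain read/write register sets) is a hypothetical simplification for illustration, not the reports' DEL encodings or execution vectors.

    # Sketch: instruction j must follow instruction i (i < j) when they
    # conflict through a register -- read-after-write, write-after-read, or
    # write-after-write.  Unordered pairs may execute concurrently.

    def depends(earlier, later):
        """earlier/later: (reads, writes) as sets of register names."""
        r1, w1 = earlier
        r2, w2 = later
        return bool((w1 & r2) | (r1 & w2) | (w1 & w2))

    def ordering_matrix(instrs):
        """M[i][j] is True iff instruction j must wait for instruction i."""
        n = len(instrs)
        return [[i < j and depends(instrs[i], instrs[j]) for j in range(n)]
                for i in range(n)]

    # i0 writes r1, which i1 reads (ordered); i2 touches only r3/r4 and may
    # run concurrently with both.
    instrs = [({"r0"}, {"r1"}), ({"r1"}, {"r2"}), ({"r3"}, {"r4"})]
    M = ordering_matrix(instrs)
    assert M[0][1] and not M[0][2] and not M[1][2]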
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/82/232/CSL-TR-82-232.pdf %R CSL-TR-82-233 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Yale user's guide: a SILT-based layout editor %A Davis, Tom %A Clark, James %D October 1982 %X YALE is a layout editor which runs on SUN workstations, and deals with cells expressed in the SILT language. It provides graphical hooks into many features describable in SILT. YALE runs under the V kernel, and makes use of a window manager that provides a multiple viewpoint capability. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/82/233/CSL-TR-82-233.pdf %R CSL-TR-68-1 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T On the computational complexity of finite functions %A Spira, Philip M. %D May 1968 %X One of the most rapidly expanding fields of applied mathematics and engineering is automata theory. Although the term "automaton" is derived from "self-moving thing," the prime concern of automata theory is the study of information-processing devices. A specific example of information processing is computation, and thus the mathematical properties of devices which perform computations are of interest to automata theorists. In this thesis we investigate the computation by logic circuits of a certain class of functions having finite domain. To a given function $f$ a number of so-called complexity criteria can be assigned relative to that class, e.g., the minimum computation time of, or the minimum number of elements contained in, any circuit of the class which is capable of computing $f$. Our prime criterion of interest will be computation time. The type of circuit investigated in this thesis is called a $(d,r)$ circuit. A $(d,r)$ circuit is composed of logical elements each having at most $r$ inputs and one output. Each input value and output value is an element from the set $Z_d = \{0, 1, \ldots, d-1\}$, and each element has unit delay in computing its output. Thus a given element computes a function from $Z_d^k$ to $Z_d$, for some $k \leq r$, in unit time. The output of one element can be connected to inputs of any number of elements (including itself) and can also comprise one of the outputs of the circuit; an element receives a given one of its inputs either from the output of some element or from the inputs to the circuit. When individual elements are interconnected to form a $(d,r)$ circuit, we can associate a computation time with the entire circuit. Specifically, let $f : X_1 \times \cdots \times X_n \to Y$ be any function on finite sets $X_1, \ldots, X_n$. Let $C$ be a $(d,r)$ circuit whose input lines are partitioned into $n$ sets. Let $I_{C,j}$ be the set of configurations of values from $Z_d$ on the $j$th set $(j = 1, 2, \ldots, n)$, and let $O_C$ be the set of output configurations of the circuit. Then $C$ is said to compute $f$ in time $t$ if there are maps $g_j : X_j \to I_{C,j}$ $(j = 1, 2, \ldots, n)$ and a one-to-one function $h : Y \to O_C$ such that, if the input from time $0$ through time $t-1$ is $[g_1(x_1), \ldots, g_n(x_n)]$, then the output of $C$ at time $t$ will be $h(f(x_1, \ldots, x_n))$. Winograd has done pioneering work on the time of computation of finite functions by $(d,r)$ circuits. He has derived lower bounds on computation time and has constructed near optimal circuits for many classes of finite functions. A principal contribution of this thesis is a complete determination of the time necessary to compute multiplication in a finite group with a $(d,r)$ circuit.
A new group theoretic quantity $d(G)$ is defined whose reciprocal is the proper generalization of Winograd's $a(G)$ to nonabelian groups. Then a novel method of circuit synthesis for group multiplication is given. In contrast to previous procedures, it is valid for any finite group--abelian or not. It is completely algebraic in character and is based upon our result that any finite group has a family of subgroups having a trivial intersection and minimum order $d(G)$. The computation time achieved is, in all cases, at most one unit greater than our lower bound. In particular, if $G$ is abelian our computation time is never greater--and often considerably less--than Winograd's. We then generalize the group multiplication procedure to a method to compute any finite function. For given sets $X_1$, $X_2$ and $Y$ and any family of subsets of $Y$ having a certain property called completeness, a corresponding hierarchy of functions having domain $X_1 \times X_2$ and range $Y$ is established -- the position of a function depending upon its computation time with our method. For reasons which we explain in the text, this appears to be a very natural classification criterion. At the bottom of the hierarchy are invertible functions such as numerical addition and multiplication, and the position of a function in the hierarchy depends essentially upon how far it is from being invertible. For large $|X_1|$ and $|X_2|$ almost all functions are near the top, corresponding to the fact that nearly all $f : X_1 \times X_2 \to Y$ require computation time equal to the maximum required for any such function. The new method is then applied to the case of finite semigroup multiplication. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/68/1/CSL-TR-68-1.pdf %R CSL-TR-70-11 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The SPOOF: a new technique for analyzing the effects of faults on logic networks %A Clegg, Frederick W. %D August 1970 %X In general, one cannot predict the effects of possible failures on the functional characteristics of a logic network without knowledge of the structure of that network. The SPOOF or structure- and parity-observing output function described in this report provides a new and convenient means of characterizing both network structure and output function in a single algebraic expression. A straightforward method for the determination of a SPOOF for any logic network is demonstrated. Similarities between SPOOF's and other means of characterizing network structure are discussed. Examples are presented which illustrate the ease with which the effects of any "stuck-at" fault - single or multiple - on the functional characteristics of a logic network are determined using SPOOF's. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/70/11/CSL-TR-70-11.pdf %R CSL-TR-71-15 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Fault equivalence in sequential machines %A Boute, Raymond %A McCluskey, Edward J. %D June 1971 %X This paper is concerned with the relationships among faults as they affect sequential machine behavior. Of particular interest are equivalence and dominance relations. It is shown that for output faults (i.e., faults that do not affect state behavior), fault equivalence is related to the existence of an automorphism of the state table. For the same class of faults, the relation between dominance and equivalence is considered and some properties are pointed out.
Another class of possible faults is also considered, namely, memory faults (i.e., faults in the logic feedback lines). These clearly affect the state behavior of the machine, and their influence on machine properties, such as being strongly connected, is discussed. It is proven that there exist classes of machines for which this property of being strongly connected is destroyed by every possible single fault. Further results on both memory and output faults are also presented. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/71/15/CSL-TR-71-15.pdf %R CSL-TR-71-24 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An improved reliability model for NMR %A Siewiorek, Daniel P. %D December 1971 %X The classical reliability model for N-modular redundancy (NMR) assumes the network to be failed when a majority of modules which drive the same voter fail. It has long been known that this model is pessimistic since there are instances, termed compensating module failures, where a majority of the modules fail but the network is nonfailed. A different module reliability model based on lead reliability is proposed which has the classical NMR reliability model as a special case. It is shown that the standard procedure for altering the classical model to take compensating module failures into account may predict a network reliability which is too low in some cases and too high in others. It is also demonstrated that the improved model can increase the predicted mission time (the time the system is to operate at or above a given reliability) by 50% over the classical model prediction for a simple network. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/71/24/CSL-TR-71-24.pdf %R CSL-TR-72-30 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Adaptive design methods for checking sequences %A Boute, Raymond T. %D July 1972 %X The length of checking sequences for sequential machines can be considerably reduced if, instead of preset distinguishing sequences, one uses so-called distinguishing sets of sequences, which serve the same purpose, but are generally shorter. The design of such a set turns out to be equivalent to the design of an adaptive distinguishing experiment, though a checking sequence, using a distinguishing set, remains essentially preset. This property also explains the title. All machines having preset distinguishing sequences also have distinguishing sets. In case no preset distinguishing sequences exist, most of the earlier methods call for the use of locating sequences, which result in long checking experiments. However, in many of these cases, a distinguishing set can be found, thus resulting in even more savings in length. Finally, the characterizing sequences used in locating sequences can also be adaptively designed, and thus the basic idea presented below is advantageous even when no distinguishing sets exist. By "experiment" we mean the application of sequence(s) to the machine while observing the output. In some instances, the words "experiment" and "sequence" can be used interchangeably. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/72/30/CSL-TR-72-30.pdf %R CSL-TR-72-35 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Separate non-homomorphic checking codes for binary addition %A Kolupaev, Stephen G. %D July 1972 %X In this paper, necessary and sufficient conditions for successful detection of errors in a binary adder by any separate code are developed.
We demonstrate the existence of separate checking codes for addition modulo $2^n$ ($n \geq 4$) and modulo $2^n - 1$ ($n > 5$, $n$ even), which are not homomorphic images of the addition being checked. A non-homomorphic code is constructed in a regular fashion from a single check symbol with special properties. Finding all such initial check symbols requires an exhaustive search of a large tree, and results indicate that the number of distinct codes for a particular modulus grows rapidly with $n$. In an appendix, we examine a modulo $2^n$ adder where the carry out of the high position is also presented to a checker. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/72/35/CSL-TR-72-35.pdf %R CSL-TR-72-36 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design of a parallel encoder/decoder for the Hamming code, using ROM %A Mitarai, H. %A McCluskey, E. J. %D June 1972 %X ROM implementation of logic circuits which have a large number of inputs is generally considered unwise. However, in the design of an encoder/decoder for the Hamming code, ROM implementation is found to yield many advantages over SSI and MSI implementation. There is a one-to-one correspondence between the partition of the H matrix into submatrices and the partition of the set of the inputs to the encoder into subsets of the inputs to the ROM modules. Hence, several methods of partitioning the H matrix for the Hamming code are devised. The resulting ROM implementation is shown to reduce the package count compared with other implementations. However, at the present state of technology, there is a trade-off between speed and package count. In the applications where speed is of the utmost importance, the SSI implementation using ECL logic is the most attractive. The disadvantage of ROM in speed should diminish in the near future when semiconductor memory technology will progress to the point where the slow DTL/TTL gates in the input buffer, the address decoder, and the output buffer of ROM, can be replaced by faster gates. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/72/36/CSL-TR-72-36.pdf %R CSL-TR-73-49 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Self-testing residue trees %A Kolupaev, Stephen G. %D August 1973 %X Error detection and correction in binary adders often require computing the residue modulo A of a binary number. We present here a totally self-checking network which extracts the residue of a binary input number of arbitrary width, with respect to any odd modulus A. This network has the tree structure commonly used for residue extraction: a binary tree of circuit blocks, where each block outputs the residue of its inputs. The network we describe differs from previous designs in that the signals between blocks of the tree are not binary-coded. Instead, the 1-out-of-A code is used, where A is the modulus desired. Use of this code permits the network to be free of inverters, giving it an advantage in speed. The network output is also coded 1-out-of-A, and with respect to this code, the residue tree is totally self-checking in the sense of Anderson. The residue tree described here requires logic gates with A inputs, when the modulus desired is A. This makes the basic design somewhat impractical for a large modulus, because gates with large fan-in are undesirable. To extend the usefulness of this network, we present a technique which uses several residue trees of this design, each for a different modulus.
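A software analogue of the tree computation is easy to state; the Python sketch below illustrates only the arithmetic recurrence each tree block implements, not the report's 1-out-of-A signal coding or its self-checking properties, and the multiple-residue recombination shown is a Chinese-remainder stand-in for the report's translator network.

    # Sketch: each tree block combines the residues of its two halves, using
    # value = hi * 2^k + lo  ==>  value mod A = (hi_res * (2^k mod A) + lo_res) mod A.

    def residue_tree(bits, A):
        """Residue mod A of the big-endian bit list `bits`, via a binary tree."""
        n = len(bits)
        if n == 1:
            return bits[0] % A
        k = n // 2                       # the low half holds k bits
        hi = residue_tree(bits[:-k], A)
        lo = residue_tree(bits[-k:], A)
        return (hi * pow(2, k, A) + lo) % A

    # Multiple-residue variant: keep each tree's modulus small and recombine
    # (for pairwise coprime moduli, e.g. 15 = 3 * 5, the pair of small
    # residues determines the residue mod 15 by the Chinese remainder theorem).
    value = 733
    bits = [int(b) for b in f"{value:b}"]
    assert residue_tree(bits, 15) == value % 15
    assert (residue_tree(bits, 3), residue_tree(bits, 5)) == (value % 3, value % 5)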
The outputs of these residue trees are combined by a totally self-checking translator from the code of multiple residues to the 1-out-of-A code. Using this multiple residue scheme, the modulus of each residue tree can be made much smaller than the desired modulus A. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/73/49/CSL-TR-73-49.pdf %R CSL-TR-73-52 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Hazards in asynchronous systems %A Chewning, D. R. %A Bredt, Thomas H. %D September 1972 %X Necessary and sufficient conditions are given for the existence of static and dynamic hazards in combinational circuits that undergo multiple input changes. These theorems are applied in the analysis of modules, such as the wye module, that have been proposed for asynchronous systems. We show that unless internal module delays are strictly less than delays between modules, incorrect operation can occur due to hazards in module implementations. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/73/52/CSL-TR-73-52.pdf %R CSL-TR-73-56 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Reliability modeling of NMR networks %A Abraham, Jacob %A Siewiorek, Daniel P. %D June 1973 %X A survey of the literature in the area of redundant system reliability modeling is presented with special emphasis on Triple Modular Redundancy (TMR). Areas where the classical method of TMR reliability prediction may prove inadequate are identified, such as the interdependence of fault patterns at points of network fan-in and fan-out. This is especially true if the assumption of highly reliable subsystems, which is frequently made by the modeling techniques, is dropped. It is also not clear if the methods give an upper or a lower bound to the reliability. As a solution, a method of partitioning an arbitrary network into cells so that the faults in a cell are independent of faults in other cells is proposed. An algorithm is then given to calculate a tight lower bound on the reliability of any such cell, by considering only the structure of the interconnections within the cells. The value of reliability found is exact if TMR is assumed to be a coherent system. An approximation to the algorithm is also described; this can be used to find a lower bound to the reliability without extensive calculation. Modifications to the algorithm to improve it and to take care of special cases are given. Finally, the algorithm is extended to N-Modular Redundant (NMR) networks. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/73/56/CSL-TR-73-56.pdf %R CSL-TR-73-62 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A highly efficient redundancy scheme: self-purging redundancy %A Losq, Jacques %D July 1975 %X The goals of this paper are to present an efficient redundancy scheme for highly reliable systems, to give a method to compute the exact reliability of such schemes and to compare this scheme with other redundancy schemes. This redundancy scheme is self-purging redundancy: a scheme that uses a threshold voter and that purges the failed modules. Switches for self-purging systems are extremely simple: there is no replacement of failed modules and module purging is quite simply implemented. Because of switch simplicity, exact reliability calculations are possible. The effects of switch reliability are quantitatively examined.
For short mission times, switch reliability is the most important factor: self-purging systems have a probability of failure several times larger than the figure obtained when switches are assumed to be perfect. The influence of the relative frequency of the diverse types of failures (permanent, intermittent, stuck-at, ...) is also investigated. Reliability functions, mission time improvements and switch efficiency are displayed. Self-purging systems are compared with other redundant systems, such as hybrid or NMR, for their relative merits in reliability gain, simplicity, cost and confidence in the reliability estimation. The high confidence in the reliability evaluation of self-purging systems makes them a standard for the validation of several models that have been proposed to take into account switch reliability. The accuracy of models using coverage factors can be evaluated that way. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/73/62/CSL-TR-73-62.pdf %R CSL-TR-74-66 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Computer system performance measurement: instruction set processor level and microcode level %A Svobodova, Liba %D June 1974 %X Techniques based on hardware monitoring were developed to measure computer system performance on the instruction set processor level and the microcode level. Knowledge of system behavior and system utilization at these two levels is extremely valuable for design of new processors. The reasons why such information is needed are discussed and applicable measurement techniques for obtaining necessary data are reviewed. A hardware monitor is a preferable measurement tool since it can trace most of the significant events attributed to these two levels without introducing any artifact. The described hardware monitoring techniques were implemented on the S/370 Model 145 at Stanford University. Measurements performed on the instruction set processor level were concerned with determining execution frequencies of individual instructions under normal system workload. The microcode-level measurements determined the number and type of S/370 Model 145 microwords executed in the process of interpretation of an individual S/370 instruction and the average execution time of each such instruction. Implementation of each technique is described and results based on the measurements performed are presented. Finally, effectiveness and ease of use of the discussed techniques are considered. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/74/66/CSL-TR-74-66.pdf %R CSL-TR-74-75 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Influence of fault-detection and switching mechanisms on the reliability of stand-by systems %A Losq, Jacques %D July 1975 %X This paper concerns the reliability of stand-by systems when switch reliability is taken into account. It is assumed that failures obey a Poisson distribution for modules and switches. A very detailed method is given to model stand-by systems. Several cases are investigated: ideal systems, real systems with fault-detection mechanisms that can detect any module error and systems for which the fault-detection mechanisms detect only some of the module errors. The reliability versus time curves are determined for each value of the number of spares. It is shown that the best number of spares increases as the length of the mission increases.
Systems with extremely short mission time have the best reliability when they have only one spare. The limit when the number of spares increases is the reliability obtained with simplex systems. Whatever the number of spares is, the reliability of stand-by systems goes to zero as time goes to infinity. For a given mission time, it is possible to determine the best number of spares and the best possible reliability. For a given reliability, it is possible to compute the number of spares that gives the longest mission time. These models can be used to determine whether or not there exists a stand-by system that meets the requirements of a given reliability and a given mission time. If such a stand-by system exists, its characteristics (minimum number of spares and reliability) can be derived. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/74/75/CSL-TR-74-75.pdf %R CSL-TR-74-77 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Parallel solution methods for triangular linear systems of equations %A Orcutt, Samuel E. %D June 1974 %X In this paper we consider developing parallel solution methods for triangular linear systems of equations. For a system of $N$ equations in $N$ unknowns the serial method requires $O(N^2)$ steps, and the straightforward parallel method requires $O(N)$ steps and $O(N)$ processors. In this paper we develop methods that require $O(\log N)$ time when used with $O(N^3)$ processors and $O(\sqrt{N} \log N)$ time when used with $O(N^2)$ processors. We also consider solutions to band triangular systems and develop a method that requires $O((\log N)(\log m))$ time and $O(Nm^2)$ processors, where $m$ is the bandwidth of the system. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/74/77/CSL-TR-74-77.pdf %R CSL-TR-74-85 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The solution of large multi-dimensional Poisson problems %A Stone, Harold S. %D May 1974 %X The Buneman algorithm for solving Poisson problems can be adapted to solve large Poisson problems on computers with a rotating drum memory so that the computation is done with very little time lost due to rotational latency of the drum. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/74/85/CSL-TR-74-85.pdf %R CSL-TR-75-92 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The complexity of control structures and program validation %A Davison, Joseph W. %D May 1975 %X A preliminary examination of the influence of control structures on the complexity of proofs of correctness of computer programs is presented. A block structured proof technique is defined and studied. Two parameters affecting the complexity of the proof are defined: the number of exits from a block, and the cycle rank of a block, a measure of loop complexity. Proof complexity classes of flowcharts are defined, with maximum values for these parameters. The question investigated is: How does restricting the complexity affect the class of functions realizable, assuming a given set of primitive actions and predicates? It is found that loop complexity may be traded for exits, and that for a given number of exits there are functions requiring any specific loop complexity. Further, it is shown that blocks with two exits are considerably more powerful than those with only one.
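For orientation on the triangular-systems entry above, a baseline sketch only (the report's $O(\log N)$-time constructions are not reproduced): the serial method is forward substitution, and the straightforward parallel method can be read as the column-sweep reordering, whose per-step right-hand-side update is fully parallel.

    # Sketch: O(N^2) serial forward substitution vs. the column-sweep
    # reordering, whose inner update is one fully parallel vector operation
    # per step (O(N) steps on O(N) processors in the idealized count).
    import numpy as np

    def forward_subst(L, b):
        """Serial: solve one unknown at a time."""
        n = len(b)
        x = np.zeros(n)
        for i in range(n):
            x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
        return x

    def column_sweep(L, b):
        """N steps; each right-hand-side update is parallel across rows."""
        n = len(b)
        x, r = np.zeros(n), b.astype(float).copy()
        for j in range(n):
            x[j] = r[j] / L[j, j]
            r[j + 1:] -= L[j + 1:, j] * x[j]   # one parallel axpy update
        return x

    rng = np.random.default_rng(0)
    L = np.tril(rng.random((6, 6))) + np.eye(6)
    b = rng.random(6)
    assert np.allclose(forward_subst(L, b), np.linalg.solve(L, b))
    assert np.allclose(column_sweep(L, b), np.linalg.solve(L, b))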
In fact, for a given maximal loop complexity, there are functions that cannot be realized with one-exit blocks, but can be realized with two-exit blocks, even if the loop complexity is restricted to essentially one internal loop per block. Looking at it the other way around, the addition of a second exit to a block allows construction of flowcharts with any specified loop complexity. This result appears to be extendable to blocks with more exits, but this has not been completed. The work is primarily of a graph theoretical nature, and may also be interpreted as an examination of sequential control structures from the point of view of feedback loop complexity. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/75/92/CSL-TR-75-92.pdf %R CSL-TR-75-93 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Sequential circuit output probabilities from regular expressions %A Parker, Kenneth P. %A McCluskey, Edward J. %D June 1975 %X This paper presents a number of methods for finding sequential circuit output probabilities using regular expressions. Various classes of regular expressions, based on their form, are defined and it is shown how to easily find multistep transition probabilities directly from the regular expressions. A new procedure for finding steady state probabilities is given which proceeds either from a regular expression or a state diagram description. This procedure is based on the concept of synchronization of the related machine, and is useful for those problems where synchronization sequences exist. In the cases where these techniques can be utilized, substantial savings in computation can be realized. Further, application to other areas such as multinomial Markov processes is immediate. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/75/93/CSL-TR-75-93.pdf %R CSL-TR-75-95 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The stack working set: a characterization of spatial locality %A Rau, B. Ramakrishna %D July 1975 %X Multilevel memory hierarchies are attractive from the point of view of cost-performance. However, they present far greater problems than two-level hierarchies when it comes to analytic performance evaluation. This may be attributed to two factors: firstly, the page size (or the unit of information transfer between two levels) varies with the level in the hierarchy; secondly, the request streams that the lower (slower) levels see are the fault streams out of the immediately higher levels. Therefore, the request stream seen by each level is not necessarily the same as the one generated by the processor. Since the performance depends directly upon the properties of the request stream, this poses a problem. A model for program behavior, which explicitly characterizes the spatial locality of the program, is proposed and validated. It is shown that the spatial locality of a program is an invariant of the hierarchy when characterized in this manner. This invariance is used to solve the first problem stated - that of the varying page sizes. An approximate technique is advanced for the characterization of the fault stream as a function of the request stream and the capacity of the level. A procedure is then outlined for evaluating the performance of a multilevel hierarchy analytically. 
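The stack idea behind such locality characterizations can be illustrated with the classic LRU stack-distance computation; this is a generic sketch, and the report's stack working set additionally parameterizes page size, which is not modelled here.

    # Sketch: the depth at which each reference finds its page in the LRU
    # stack.  A level of capacity c hits exactly when distance <= c, so one
    # pass yields miss ratios for every capacity at once.

    def stack_distances(refs):
        """Distance per reference (None on a first reference); small
        distances indicate strong locality."""
        stack, out = [], []
        for page in refs:
            if page in stack:
                d = stack.index(page) + 1   # 1 = re-reference of the MRU page
                stack.remove(page)
            else:
                d = None                    # cold miss
            stack.insert(0, page)
            out.append(d)
        return out

    print(stack_distances(list("ABCABDABE")))
    # -> [None, None, None, 3, 3, None, 3, 3, None]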
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/75/95/CSL-TR-75-95.pdf %R CSL-TR-75-96 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A rollback interval for networks with an imperfect self-checking property %A Shedletsky, John J. %D December 1975 %X Dynamic self-checking is a technique used in computers to detect a fault quickly before extensive data contamination caused by the fault can occur. When the self-checking properties of the computer circuits are not perfect, as is the case with self-testing-only and partially self-checking circuits, the recovery procedure may be required to roll back program execution to a point prior to the first undetected data error caused by the detected fault. This paper presents a method by which the rollback distance required to achieve a given probability of successful data restoration may be calculated. To facilitate this method, operational interpretations are given to familiar network properties such as the self-testing, secureness, and self-checking properties. An arithmetic and logic unit with imperfect self-checking capability is analyzed to determine the minimum required rollback distance for the recovery procedure. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/75/96/CSL-TR-75-96.pdf %R CSL-TR-75-97 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Deterministic sequential networks under random control %A Varszegi, Sandor %D September 1975 %X This paper presents a network-oriented approach for the treatment of deterministic sequential networks under random control. Considered are the cases of multinomial, stationary Markov and arbitrary input processes. Probabilities of the state and output processes are directly derived from the primary information of the network and the source. Coded networks are treated using logic circuits or Boolean functions. The isomorphism between Boolean and event algebras is made use of, and the probabilities of the response processes are obtained in the form of algebraic probability expressions interpreted over the determining (i.e., input and initial state) minterm or signal joint probabilities. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/75/97/CSL-TR-75-97.pdf %R CSL-TR-75-98 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A tale of three emulators %A Hoevel, Lee W. %A Wallach, Walter A. Jr. %D November 1975 %X This is a preliminary report on the development of emulator code for the Stanford EMMY. Emulation is introduced as an interpretive computing technique. Various classes of emulation and their correlation to the image machine are presented. Functional and structural overviews of three emulators for the Stanford EMMY are presented. These are IBM System/360; CRIL; and DELtran. Performance estimates are included for each of these systems. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/75/98/CSL-TR-75-98.pdf %R CSL-TR-75-101 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A new philosophy for wire routing %A Rau, B. Ramakrishna %D November 1975 %X A number of interconnection algorithms exist and have been used quite successfully. However, most of them, though differing in detail, appear to subscribe to the same underlying philosophy which has developed from that for single layer boards. Arguments are advanced which question the validity of this philosophy in the environment of multilayer board technology.
A new philosophy is developed in this report, which, it is hoped, will be more suited for use with multilayer boards. Based on this philosophy, an interconnection algorithm is then developed in a step-by-step fashion. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/75/101/CSL-TR-75-101.pdf %R CSL-TR-75-102 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T High performance emulation %A Wallach, Walter A. Jr. %D November 1975 %X The Stanford EMMY is examined as an emulation engine. Using the 360 emulator and the DELtran interpreter as examples, the performance of the current EMMY architecture is examined as a high performance emulation vehicle. The problems of using a sequential, vertically organized processor for high speed emulation are developed and discussed. A flexible control structure for high speed emulation studies is derived from an existing high performance processor. This structure issues a stream of microinstructions to a central command bus, allowing user-defined execution resources to execute them in overlapped fashion. These execution resources may be added or deleted with little or no processor rewiring. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/75/102/CSL-TR-75-102.pdf %R CSL-TR-76-106 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Mathematical models for the circuit layout problem %A vanCleemput, Willem M. %D February 1976 %X In the first part of this paper the basic differences between the classical (placement, routing) and the topological approach to solving the circuit layout problem are outlined. After a brief survey of some existing mathematical models for the problem, an improved model is suggested. This model is based on the concept of a partially oriented graph and contains more topological information than earlier models. This reduces the need for special constraints on the graph embedding algorithm. The models also allow pin and gate assignment as a function of the layout, under certain conditions. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/106/CSL-TR-76-106.pdf %R CSL-TR-76-108 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Cascade structure in totally self-checking networks %A Kolupaev, Stephen %D April 1976 %X In the well-known totally self-checking (TSC) network, a failure must not change one output codeword into another. Called the fault-secure property, this permits a receiver of the net's output to assume that any codeword it receives is correct. Further, the self-testing property requires that each possible failure in the net must produce at least one non-code output. Thus a receiver can monitor the health of the network by watching for non-code outputs. In this paper we propose modifications of these two properties. The self-testing property is made more stringent. Each possible failure in the net is required to produce an output which is in a distinguished subset of the non-code outputs. The fault-secure requirement is modified to permit a fault to interchange certain output codewords. In particular, all outputs not in the distinguished subset are partitioned into equivalence classes, and a fault is permitted to change the output from one codeword to another codeword in the same class. However, a fault is not permitted to change the output from a codeword to any member of a different equivalence class (one not containing the correct output). These modified properties define a generalization of the TSC network.
A network which meets the modified properties is called a generalized self-checking (GSC) network. Self-checking and self-testing (Morphic) networks and TSC networks are special cases of the GSC network. Examining TSC networks, we find a further connection with the GSC network. It has been known for some time that not every subnetwork of a TSC network need be TSC. We show that every subnetwork of a TSC network is GSC, and every TSC network is a cascade of GSC networks. This establishes the GSC network as the basic building block from which every TSC network is constructed. We explore a brute-force method for constructing a desired TSC network by cascading GSC subnetworks. The method resorts to enumeration at many points of decision and thus is not a practical design tool. However, it does yield a very nice alternate realization of the Morphic OR, and suggests specializations which merit further study. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/108/CSL-TR-76-108.pdf %R CSL-TR-76-111 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A distributed algorithm for constructing minimal spanning trees in computer-communication networks %A Dalal, Yogen K. %D June 1976 %X This paper presents a distributed algorithm for constructing minimal spanning trees in computer-communication networks. The algorithm can be executed concurrently and asynchronously by the different computers of the network. This algorithm is also suitable for constructing minimal spanning trees using a multiprocessor computer system. There are many reasons for constructing minimal spanning trees in computer-communication networks since minimal spanning tree routing is useful in distributed operating systems for performing broadcasts, in adaptive routing algorithms for transmitting delay estimates, and in other networks like the Packet Radio Network. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/111/CSL-TR-76-111.pdf %R CSL-TR-76-113 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Error correction by alternate-data retry %A Shedletsky, John J. %D May 1976 %X A new technique for low-cost error correction in computers is the alternate-data retry (ADR). An ADR is initiated by the detection of an error in the initial execution of an operation. The ADR is a re-execution of the operation, but with an alternate representation of the initial data. The choice of the alternate representation and the design of the processing circuits combine to insure that even an error due to a permanent fault is not repeated during retry. Error-correction is provided at a hardware cost comparable to that of a conventional retry capability. Sufficient conditions are given for the design of circuits with an ADR capability. The application of an ADR capability to memory and to the data paths of a processor is illustrated. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/113/CSL-TR-76-113.pdf %R CSL-TR-76-114 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T EMMY/360 functional characteristics %A Wallach, Walter A. Jr. %D June 1976 %X An emulation of the IBM System/360 architecture is presented - the EMMY/360. Problem state code which executes correctly on an IBM 360 will also execute correctly on the EMMY/360. Code producing execution exceptions will, in most cases, produce the same results on the two systems.
Certain exceptions occurring on IBM 360 cannot occur on the EMMY/360, such as address specification exceptions for main store operands, and certain precise interrupts on IBM 360 will be imprecise on the EMMY/360, such as address exceptions. The EMMY/360 supports the Standard 360 instruction set with single precision floating point. The 360 input/output structure is not supported; I/O on the EMMY system is done by a Function Call instruction, rather than channel program and Start-Test I/O. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/114/CSL-TR-76-114.pdf %R CSL-TR-76-115 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Principles of self-checking processor design and an example %A Wakerly, John F. %D December 1975 %X A self-checking processor has redundant hardware to insure that no likely failure can cause undetected errors and all likely failures are detected in normal operation. We show how error-detecting codes and self-checking circuits can be used to achieve these properties in a microprogrammed processor. The choice of error-detecting codes and the placement of checkers to monitor coded data paths are discussed. The use of codes to detect errors in arithmetic and logic operations and microprogram control units is described. An example processor design is given and some observations on the diagnosis and repair of such a processor are made. From the example design it appears that somewhat less than 50% overall redundancy is required to guarantee the detection of all failures that affect a single medium- or large-scale integration circuit package. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/115/CSL-TR-76-115.pdf %R CSL-TR-76-116 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Asynchronous serial interface for connecting a PDP-11 to the ARPANET (BBN 1822) %A Crane, Ronald C. %D July 1976 %X This report describes an interface to permit the connection of any PDP-11 to either the Packet Radio Network or the ARPAnet. The interface connects on one side to an IMP, meeting the specifications published in BBN report number 1822, and on the other side to a 16-bit parallel interface (DRV-11 or DR11-C) as described in the DEC peripherals and interfacing handbook. The interface card itself is a double height board (5.2" x 8.5") which can be plugged into any peripheral slot in a PDP-11 backplane. The interface card is connected to the parallel interface card via two cables with Berg 40 pin connectors (DEC H-856) and to the IMP via an Amphenol bayonet connector (48-10R-18-315). All 3 cables and connectors are supplied with the I/O interface card. The parallel interface card (DEC DR11-C or DRV-11) together with the special I/O interface card described in this report comprise the 1822 interface. The report includes descriptions of the operation of circuits, programming, and diagnostics for the 1822 interface. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/116/CSL-TR-76-116.pdf %R CSL-TR-76-117 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An "almost-exact" solution to the N-processor, M-memory bandwidth problem %A Rau, B. Ramakrishna %D June 1976 %X A closed-form expression is derived for the memory bandwidth obtained when $N$ processors are permitted to generate requests to $M$ memory modules. Use of generating functions is made, in a rather unusual fashion, to obtain this expression.
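For context, the textbook first-order estimate for this problem (not the report's asymptotically exact expression) is the expected number of distinct modules hit when the $N$ requests are independent and uniform over the $M$ modules; a quick simulation confirms it.

    # Sketch: baseline estimate  B(N, M) = M * (1 - (1 - 1/M)**N),
    # i.e. M times the probability that a given module gets >= 1 request.
    import random

    def bandwidth_estimate(N, M):
        return M * (1 - (1 - 1 / M) ** N)

    def bandwidth_simulated(N, M, cycles=100_000, seed=1):
        rng = random.Random(seed)
        busy = sum(len({rng.randrange(M) for _ in range(N)})
                   for _ in range(cycles))
        return busy / cycles

    for N, M in [(4, 4), (8, 16)]:
        print(N, M, round(bandwidth_estimate(N, M), 3),
              round(bandwidth_simulated(N, M), 3))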
The one approximation involved is shown to result in only a very small error -- and that, too, only for small values of $M$ and $N$. This expression, which is asymptotically exact, is shown to be more accurate than existing closed form approximations. Lastly, a family of asymptotically exact solutions is presented; these solutions are easier to evaluate than the first one. Although these expressions are less accurate than the previously derived closed-form solution, they are, nevertheless, better than existing solutions. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/117/CSL-TR-76-117.pdf %R CSL-TR-76-118 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The Stanford emulation laboratory %A Flynn, Michael J. %A Hoevel, Lee W. %A Neuhauser, Charles J. %D June 1976 %X The Stanford Emulation Laboratory is designed to support general research in the area of emulation. Central to the laboratory is a universal host machine, the EMMY, which has been designed specifically to be an unbiased, yet efficient host for a wide range of target machine architectures. Microstore in the EMMY is dynamically microprogrammable and thus is used as the primary data storage resource of the emulator. Other laboratory equipment includes a reconfigurable main memory system and an independent control processor to monitor emulation experiments. Laboratory software, including two microassemblers, is briefly described. Three laboratory applications are described: (1) a conventional target machine emulation (a System/360), (2) 'microscopic' examination of emulated target machine I-streams, and (3) direct execution of a high level language (Fortran II). %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/118/CSL-TR-76-118.pdf %R CSL-TR-76-119 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A simulator for the evaluation of digital system reliability %A Thompson, Peter Alan %D August 1977 %X This report describes a simulation package designed to evaluate the reliability of digital systems. The simulator can be used to model many different types of systems, at varying levels of detail. The user is given much freedom to use the elements of the model in the way best suited to simulating the operation of a system in the presence of faults. The simulation package then generates random faults in the model, and uses a Monte Carlo analysis to obtain curves of reliability. Three examples are given of simulations of digital systems which have redundancy. The difference between this type of simulation and other simulation techniques is discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/119/CSL-TR-76-119.pdf %R CSL-TR-76-120 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Detection of intermittent faults in sequential circuits %A Savir, Jacob %D March 1978 %X Testing for intermittent faults in digital circuits has been given significant attention in the past few years. However, very little theoretical work has been done regarding their detection in sequential circuits. This paper shows that the testing properties of intermittent faults in sequential circuits can be studied by means of a probabilistic automaton. The evaluation and derivation of optimal intermittent fault detection experiments in sequential circuits are done by creating a product state table from the faulty and fault-free versions of the circuit under test. Both deterministic and random test procedures are discussed.
The underlying optimality criterion maximizes the probability of fault detection. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/120/CSL-TR-76-120.pdf %R CSL-TR-76-123 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Research in the Digital Systems Laboratory %A Faculty, The %D June 1976 %X This report summarizes the research carried out in the Digital Systems Laboratory* at Stanford University during the period August 1975 through July 1976. Research investigations were concentrated into the following major areas: Computer Performance; Computer Reliability Studies, including fault-tolerant computing, evaluation of dual-computer configurations, and implementation of reliable software systems; Computer Architecture, including organization of computer systems, feasibility of real-time emulation, and directly executed languages; Design Automation of Digital Systems; Computer Networks, including network interconnection protocols, the 2000 terminal computing system, and packet-switched network technology/cost studies; LSI Multiprocessors; Compiler Implementation; and Parallel Computer Systems. *Renamed Computer Systems Laboratory in 1978. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/123/CSL-TR-76-123.pdf %R CSL-TR-76-125 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Performance bounds for parallel processors %A Lee, Ruby Bei-Loh %D November 1976 %X A general model of computation on a p-parallel processor is proposed, distinguishing clearly between the logical parallelism (p* processes) inherent in a computation, and the physical parallelism (p processors) available in the computer organization. This shows the dependence of performance bounds on both the computation being executed and the computer architecture. We formally derive necessary and sufficient conditions for the maximum attainable speedup of a p-parallel processor over a uniprocessor to be Sp >= min(p/ln p, p*/ln p*), where ln p approximates Hp, the pth harmonic number. We also verify that empirically-derived speedups are O(p*/ln p*). Finally, we discuss related performance measures of minimum execution time, maximum efficiency and minimum space-time product. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/125/CSL-TR-76-125.pdf %R CSL-TR-76-126 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The optimal placement of dynamic recovery checkpoints in recoverable computer systems %A Warren-Angelucci, Wayne %D December 1976 %X Reliability is an important concern of any computer system. No matter how carefully designed and constructed, computer systems fail. The rapid and systematic restoration of service after an error or malfunction is always a major design and operational goal. In order to overcome the effects of a failure, recovery must be performed to go from the failed state to an operational state. This thesis describes a recovery method which guarantees that a computer system, its associated data bases and communication transactions will be restored to an operational and consistent state within a given time and cost bound after the occurrence of a system failure. This thesis considers the optimization of a specific software strategy - the rollback and recovery strategy, within the framework of a graph model of program flow which encompasses communication interfaces and data base transactions. Algorithms are developed which optimize the placement of dynamic recovery checkpoints.
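A classical first-order result for the simpler, fixed-interval version of this problem (Young's approximation; not the graph-model algorithms of the thesis) illustrates the trade-off being optimized: with checkpointing cost C and mean time between failures M, the checkpoint interval minimizing expected lost work plus checkpoint overhead is approximately

    T_{opt} \approx \sqrt{2 C M},

so cheaper checkpoints or less reliable hardware both argue for checkpointing more often.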
Presented is a method for statically pre-computing a set of optimal decision parameters for the associated program model, and a run-time technique for dynamically determining the optimal placement of program recovery checkpoints. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/76/126/CSL-TR-76-126.pdf %R CSL-TR-77-129 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T On accuracy improvement and applicability conditions of diffusion approximation with applications to modelling of computer systems %A Yu, Philip S. %D January 1977 %X Starting with single server queueing systems, we find a different way to estimate the diffusion parameters. The boundary condition is handled using Feller's elementary return process. Extensive comparisons by asymptotic, simulation and numerical techniques have been conducted to establish the superiority of the proposed method compared with conventional methods. The limitation of the diffusion approximation is also investigated. When the coefficient of variation of interarrival time is larger than one, the mean queue length may vary over a wide range even if the mean and variance of interarrival time are kept unchanged. The diffusion approximation is applicable under a regularity condition on the interarrival time distribution; the investigation of high variation of interarrival time is conducted on 2-stage hyperexponential distributions. A similar anomaly is observed in two server closed queueing networks when the service time of any server has a large coefficient of variation. Again, a similar regularity condition on the service time distribution is required in order for the diffusion approximation to be applicable. For general queueing networks, the problems become more complicated. A simple way to estimate the coefficient of variation of interarrival time (when the network is decomposable) is proposed. Besides the anomalies cited before, networks under certain topologies, such as networks with feedback loops, especially self loops, cannot be decomposed into separate single servers when the coefficient of variation of service time distributions becomes large, even if the large variations are due to a large number of short service times. Nevertheless, the decomposability of a network can be improved by replacing each server with a self loop by an equivalent server without a self loop. Finally, we consider the service center with a queue dependent service rate or arrival rate. Generalization to two server closed queueing networks where each server may have a self loop is also considered. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/129/CSL-TR-77-129.pdf %R CSL-TR-77-130 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The structure of directly executed languages: a new theory of interpretive system design %A Hoevel, Lee W. %A Flynn, Michael J. %D March 1977 %X This paper concerns two important issues in the design of optimal languages for direct execution in an interpretive system: binding the operand identifiers in an executable instruction unit to the arguments of the routine implementing the operator defined by that instruction; and binding operand identifiers to execution variables. These issues are central to the performance of a system both in space and time. Historically, some form of "machine language" is used as the directly executable medium for a computing system.
These languages traditionally are constrained to a single "n-address" instruction format; this leads to an excessive number of "overhead" instructions, which do nothing but move values from one storage resource to another, being imbedded in the executable instruction stream. We propose to reduce this overhead by increasing the number of instruction formats available at the directly executed language level. Machine languages are also constricted with respect to the manner in which operands can be "addressed" within an instruction. Usually, some form of indexed base-register scheme is available, along with a direct addressing mechanism for a few, "special" storage cells (i.e., registers and perhaps the zeroth page of main store). We propose a different identification mechanism--based on the Contour Model of Johnston. Using our scheme, only N bits are needed to encode any identifier in a scope containing fewer than 2**N distinct identifiers. Together, these two results lead to directly executed language designs which are optimal in the sense that: (1) k executable instructions are required to implement a source statement containing k functional operators; (2) the space required to represent the executable form of a source statement containing k distinct functional operators and v distinct variables approaches F*k + N*v -- where there are fewer than 2**F distinct functional operators in the scope of definition for the source statement, and fewer than 2**N distinct variables in this scope. (3) the time needed to execute the representation of a source statement containing k functional operators, d distinct variables in its domain, and r distinct variables in its range approaches d + r + k, where time is measured in memory references. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/130/CSL-TR-77-130.pdf %R CSL-TR-77-131 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Sequential prefetch strategies for instructions and data %A Rau, B. Ramakrishna %D January 1977 %X An investigation of sequential prefetch as a means of reducing the average access time is conducted. The use of a target instruction buffer is shown to enhance the performance of instruction prefetch. The concept of generalized sequentiality is developed to enable the study of sequentiality in data streams. Generalized sequentiality is shown to be present to a significant degree in data streams from measurements on representative programs. This result is utilized to develop a data prefetch mechanism which is found to be capable of anticipating, on the average, about 75% of all data requests. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/131/CSL-TR-77-131.pdf %R CSL-TR-77-132 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Manual for a general purpose simulator used to evaluate reliability of digital systems %A Thompson, Peter A. %D August 1977 %X A simulation technique has been developed for the reliability evaluation of arbitrarily defined computer systems. The main simulation program is written in FORTRAN IV, and requires no changes to simulate many different systems. The user defines a model for a particular system by supplying a set of short FORTRAN subroutines, and a specially formatted block of numerical parameters. The subroutines specify the functional behavior of various subsystems comprising the model, while the numerical parameters describe how the subsystems are interconnected, their time delays, what faults occur in each one, etc.
The main simulation program uses this model to perform a Monte-Carlo type evaluation of the system's reliability. This report supplements a basic description of the technique by supplying all the details necessary for writing subroutines, specifying numerical parameters, and using the main simulation program. The simulation is event-driven, and automatically generates pseudo-random faults and time delays according to parameters given by the user. Some problems typical of event simulators, such as ambiguities arising from random time-delay generation, can be solved by taking advantage of special facilities built into the simulation package. A complete source listing of the main program is included for reference. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/132/CSL-TR-77-132.pdf %R CSL-TR-77-134 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design of two-level fault-tolerant networks from threshold elements %A Butakov, Evguenij A. %A Posherstnick, Marat S. %D March 1977 %X Only a small part of all Boolean functions of n variables can be realized by one threshold element (T.E.). For all other functions the net must be built with at least two T.E.'s. The problem of constructing a fault-tolerant two-level network from T.E. is investigated. The notion of limiting function is introduced. It is shown that the use of these limiting functions induces a reduction in the number of possible candidates during the process of finding a realization of an arbitrary function by threshold functions. The method is based on the two-asummability property of threshold functions and therefore is applicable to completely specified Boolean functions with fewer than nine variables. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/134/CSL-TR-77-134.pdf %R CSL-TR-77-135 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Passage time distributions for a class of queueing networks: closed, open, or mixed, with different classes of customers with applications to computer system modeling %A Yu, Philip S. %D March 1977 %X Networks of queues are important models of multiprogrammed time-shared computer systems and computer communication networks. Although equilibrium state probabilities of a broad class of network models have been derived in the past, analytic or approximate solutions for response time distributions or more general passage time distributions are still open problems. In this paper we formulate the passage time problem as a "hitting time" or "first passage time" problem in a Markov system and derive the analytic solution to passage time distributions of closed queueing networks. Efficient numerical approximation is also proposed. The result for closed queueing networks is further extended to obtain approximate passage time distributions for open queueing networks. Finally, we employ the techniques derived in this paper to study interfault time and response time distribution and density functions; the effects of the degree of multiprogramming, size of main memory, service time of paging devices, and rate of file I/O requests on the shape of the distribution and density functions have been examined. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/135/CSL-TR-77-135.pdf %R CSL-TR-77-136 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A structural design language for computer aided design of digital systems %A vanCleemput, Willem M.
%D April 1977 %X In this report a language (SDL) for describing structural properties of digital systems is presented. SDL can be used at all levels of the design process, i.e. from the system level down to the circuit level. The language is intended as a complement to existing computer hardware description languages, which emphasize behavioral description. The language was motivated partly by the nature of the design process. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/136/CSL-TR-77-136.pdf %R CSL-TR-77-137 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Performance analysis of computer communication networks via random access channels %A Yu, Philip S. %D April 1977 %X The field of computer communication networks has grown very rapidly in the past few years. One way to communicate is via multiple access broadcast channels. A new class of random access schemes referred to as the Mp-persistent CSMA scheme is proposed. It incorporates the nonpersistent CSMA scheme and the 1-persistent CSMA scheme, both slotted and unslotted versions, as its special cases with p=0 and 1, respectively. The performance of the Mp-persistent CSMA scheme under packet switching is analyzed and compared with other random access schemes. By dynamically adjusting p, the unslotted version can achieve better performance in both throughput and delay than the currently available unslotted CSMA schemes under packet switching. Furthermore, the performance of various random access schemes under message switching is analyzed and compared with that under packet switching. In both slotted and unslotted versions of the M0-persistent CSMA scheme, the performance under message switching is superior to that under packet switching in the sense that not only is the channel capacity larger but the average number of retransmissions per successful message under message switching is also smaller than that per successful packet under packet switching. In dynamic reservation schemes, message switching leads to larger channel capacity. However, in both slotted and unslotted versions of the ALOHA scheme, the channel capacity is reduced when message switching is used instead of packet switching. This phenomenon may also happen in the Mp-persistent CSMA scheme as p deviates from 0 to 1 for certain distributions of message length. Hence, the performance under message switching may be superior or inferior to that under packet switching, depending upon the random access scheme being used and the distribution of message length (usually a large coefficient of variation of message length implies a large degradation of channel capacity in this case). Nevertheless, for radio channels, message switching can achieve larger channel capacity if appropriate CSMA schemes are used. A mixed strategy which is a combination of message switching and packet switching is proposed to improve the performance of a point to point computer communication network when its terminal access networks communicate via highly utilized radio channels. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/137/CSL-TR-77-137.pdf %R CSL-TR-77-138 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Program behavior and the performance of interleaved memories %A Rau, B. Ramakrishna %D May 1977 %X One of the major factors influencing the performance of an interleaved memory system is the behavior of the request sequence, but this is normally ignored.
This report examines this issue. Using trace driven simulations it is shown that the commonly used assumption, that all requests are equally likely to be to any module, is not valid. The duality of memory interference with paging is noted and this suggests the use of the Least-Recently-Used Stack Model to model program behavior. Simulation shows that this model is quite successful. An accurate expression for the bandwidth is derived based upon this model. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/138/CSL-TR-77-138.pdf %R CSL-TR-77-139 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Properties and applications of the least-recently-used stack model %A Rau, B. Ramakrishna %D May 1977 %X The Least-Recently-Used Stack Model (LRUSM) is known to be a good model of temporal locality. Yet, little analysis of this model has been performed and documented. Certain properties of the LRUSM are developed here. In particular, the concept of the Stack Working Set is introduced and expressions are derived for the forward recurrence time to the next reference to a page, for the time that a page spends in a cache of a given size and for the time from last reference to the page being replaced. The fault stream out of a cache memory is modelled and it is shown how this can be used to partially analyze a multilevel memory hierarchy. In addition, the Set Associative Buffer is analyzed and a necessary and sufficient condition for the optimality of the LRU replacement algorithm is advanced. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/139/CSL-TR-77-139.pdf %R CSL-TR-77-142 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Optimal layout of CMOS functional arrays %A Uehara, T. %A vanCleemput, Willem M. %D March 1978 %X Designers of MOS LSI circuits can take advantage of complex functional cells in order to achieve better performance. This paper discusses the implementation of a random logic function on an array of CMOS transistors. A graph-theoretical algorithm which minimizes the size of an array is presented. This method is useful for the design of cells used in conventional design automation systems. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/142/CSL-TR-77-142.pdf %R CSL-TR-77-143 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T SPRINT - an interactive system for printed circuit board design user's guide %A vanCleemput, Willem M. %A Bennett, T. C. %A Hupp, J. A. %A Stevens, K. R. %D June 1977 %X The SPRINT system for the design of printed circuit boards is a collection of programs that allows designers to interactively design two-sided boards using a Tektronix 4013 graphics terminal. The major parts of the system are: a compiler for SDL, the Structural Design Language, an interactive component placement program, an interactive manual conductor routing program, an automatic batch router, a via elimination program and a set of artwork generation programs. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/143/CSL-TR-77-143.pdf %R CSL-TR-77-147 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Verifying concurrent programs with shared data classes %A Owicki, Susan S. %D August 1977 %X Monitors are a valuable tool for organizing operations on shared data in concurrent programs. In some cases, however, the mutually exclusive procedure calls provided by monitors are overly restrictive.
Such applications can be programmed using shared classes, which do not enforce mutual exclusion. This paper presents a method of verifying parallel programs containing shared classes. One first proves that each class procedure performs correctly when executed by itself, then shows that simultaneous execution of other class procedures cannot interfere with its correct operation. Once a class has been verified, calls to its procedures may be treated as uninterruptable actions; this simplifies the proof of higher-level program components. Proof rules for classes and procedure calls are given in Hoare's axiomatic style. Several examples are verified, including two versions of the readers and writers problem and a dynamic resource allocator. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/147/CSL-TR-77-147.pdf %R CSL-TR-77-149 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Interpretive machines %A Iliffe, John K. %D June 1977 %X These lectures survey attempts to apply computers directly to high level languages using microprogrammed interpreters. The motivation for such work is to achieve language implementations that are more effective in some measure of translation, execution or response to the user than would otherwise be obtained. The implied comparison is with the established technique of compiling into a fixed general-purpose machine code prior to execution. It is argued that while substantial benefits can be expected from microprogramming, it does not represent the best approach to design when the contributing factors are analyzed in a general system context, that is to say when a wide performance range, multiple source languages, and stringent security requirements have to be satisfied. An alternative is suggested, using a combination of interpretation and a primitive instruction set and providing security at the microprogram level. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/149/CSL-TR-77-149.pdf %R CSL-TR-77-150 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Research in the Digital Systems Laboratory: August 1976-July 1977 %A Staff %D July 1977 %X This report summarizes the research carried out in the Digital Systems Laboratory* at Stanford University during the period from August 1976 through July 1977.
Research investigations were concentrated in the following areas: Computer Reliability and Testing, including detection of intermittent failures, testing for sequential circuits, self-checking linear feedback shift registers, simulation analysis of high-reliability systems, effects of failures on gracefully degradable systems, fault diagnosis in digital systems, and software reliability; Critical Fault-Pattern Determination; Computer Architecture, including trace facility, memory interleaving, and monitors for signal activity; Organization of Computer Systems, including an emulation research laboratory, emulators, and memory performance; Feasibility of Real-Time Emulation, including directly executable languages; Distributed Data Processing for Ballistic Missile Defense; Description Languages and Design for General-Purpose Computer Architectures, including evaluation of existing hardware description languages, development of a structural description language, applications of the structural design language, bounds for maximal parallelism, and parallel information processing in biological systems; Computer Networks, including broadcast protocols in packet-switched computer networks and the optimal placement of dynamic-recovery checkpoints in recoverable computer systems; Design and Verification of Reliable Software, including specifications and proofs for abstract data types in concurrent programs, specification and verification of monitors, and operating system design; Design Automation, including a language for describing the structure of digital systems, the SPRINT printed-circuit design system, computer-aided layout of large-scale integrated circuits, and an interactive system for design capture; Database, including studies in distributed processing and problem solving, a database maintenance system, and the implementation of databases in medicine; and Digital Incremental Computers. *Renamed Computer Systems Laboratory in 1978. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/77/150/CSL-TR-77-150.pdf %R CSL-TR-78-154 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Notes on modelling of computer systems and networks %A Yu, Philip S. %D April 1978 %X Formulation of given computer system or network problems into abstract stochastic models is considered. Generally speaking, model formulation is an art. While analytic results are clearly not powerful enough to provide a "cookbook" approach to modelling, general methodology and difficulties of model formulation are discussed through examination of various computer system and network models. These models are presented in a systematic way based on the hierarchical approach. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/78/154/CSL-TR-78-154.pdf %R CSL-TR-78-155 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The formal definition of a real-time language %A Hennessy, John L. %A Kieburtz, Richard B. %D July 1978 %X This paper presents the formal definition of TOMAL (Task-Oriented Microprocessor Application Language), a programming language intended for real-time systems running on small processors. The formal definition addresses all aspects of the language. Because some modes of semantic definition seem particularly well-suited to certain aspects of a language, and not as suitable for others, the formal definition employs several complementary modes of definition.
The primary definition is axiomatic in the notation of Hoare; it is employed to define most of the transformations of data control states affected by statements of the language. Simple, denotational (but not lattice-theoretic) semantics complement the axiomatic semantics to define type-related features, such as the binding of names to types, data type coercions, and the evaluation of expressions. Together, the axiomatic and denotational semantics define all the features of the sequential language. An operational definition, not included in the paper, is used to define real-time execution, and to extend the axiomatic definition to account for all aspects of concurrent execution. Semantic constraints, sufficient to guarantee conformity of a program with the axiomatic definition, can be checked by analysis of a TOMAL program at compilation. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/78/155/CSL-TR-78-155.pdf %R CSL-TR-78-156 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Optimal program control structures based on the concept of decision entropy %A Lee, Ruby Bei-Loh %D July 1978 %X The ability to make decisions dynamically during program execution is a very powerful and valuable tool. Unfortunately, it also causes severe performance degradations in high-speed computer organizations which use parallel, pipelined or lookahead techniques to speed up program execution. An optimal control structure is one where the average number of decisions to be made during program execution is minimal among all control structures for the program. Since decisions are usually represented by conditional branch instructions, finding an optimal control structure is equivalent to minimizing the expected number of conditional branch instructions to be encountered per program execution. By decision entropy, we mean a quantitative characterization of the uncertainty in the instruction stream due to dynamic decisions imbedded in the program. We define this concept of decision entropy in the Shannon information-theoretic sense. We show that a program's intrinsic decision entropy is an absolute lower bound on the expected number of decisions, or conditional branch instructions, per program execution. We show that this lower bound is achieved if each decision has maximum uncertainty. We also indicate how optimal control structures may be constructed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/78/156/CSL-TR-78-156.pdf %R CSL-TR-78-157 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Syndrome-testable design of combinational circuits %A Savir, Jacob %D October 1978 %X Classical testing of combinational circuits requires a list of the fault-free responses of the circuit to the test set. For most practical circuits implemented today, the large storage requirement for such a list makes such a test procedure very expensive. In this paper we describe a method of designing combinational circuits in such a way that their test procedure will require the knowledge of only one characteristic of the fault-free circuit, called the syndrome. This solves the storage problem associated with the test procedure. It is shown that the syndrome-testable design is inexpensive and can be easily implemented by the logic designer.
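A minimal sketch of the syndrome idea, under the usual definition in the syndrome-testing literature: the syndrome of an n-input combinational function is the fraction of its 2^n input vectors that produce a 1 at the output, so the tester need store only this single number for the fault-free circuit and compare it against the syndrome measured from the circuit under test. The example function below is hypothetical.

    from itertools import product

    def syndrome(f, n):
        """Fraction of the 2**n input vectors for which f evaluates to 1."""
        ones = sum(f(*bits) for bits in product((0, 1), repeat=n))
        return ones / 2 ** n

    # f = (a AND b) OR c: 5 of the 8 input vectors yield 1, so S = 5/8.
    print(syndrome(lambda a, b, c: (a & b) | c, 3))  # 0.625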
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/78/157/CSL-TR-78-157.pdf %R CSL-TR-78-158 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Performance characterization of parallel computations %A Lee, Ruby Bei-Loh %D September 1978 %X This paper defines and interprets quantitative measures by which we may characterize the absolute and relative performance of a parallel computation, compared with an equivalent serial computation. The absolute performance measures are the Parallelism Index, PI(P), the Utilization, U(P), and the maximum Quality, Q(P). The corresponding relative performance measures are the Speedup, S(P,1), the Efficiency, E(P,1), and the Quality, Q(P,1). We show how the corresponding absolute and relative performance measures are related via the Redundancy measure, R(P,1). We also examine the range of permissible values for each performance measure. Ideally, we would like to compare an optimal parallel computation with an optimal equivalent serial computation, in order to determine the performance improvements due solely to parallel versus serial processing. Toward this end, we define optimal parallel and serial computations, and show such optimality may be approximated in practice. In order to facilitate the calculation of the above performance measures, we show how the complexity of modelling an arbitrary parallel computation may be reduced substantially to two simple canonical forms, which we denote the computation's Parallelism Profile and TOP-form. Finally we show how all the canonical forms and performance measures may be generalized from one computation to a set of computations, to arrive at aggregate canonical and performance descriptions. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/78/158/CSL-TR-78-158.pdf %R CSL-TR-78-159 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Specification and verification of network mail system %A Owicki, Susan S. %D November 1978 %X Techniques for describing and verifying modular systems are illustrated using a simple network mail problem. The design is presented in a top-down style. At each level of refinement, the specifications of the higher level are verified from the specifications of lower level components. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/78/159/CSL-TR-78-159.pdf %R CSL-TR-78-163 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An introduction to the DDL-P language %A Cory, Wendell E. %A Duley, J. R. %A vanCleemput, Willem M. %D March 1979 %X This report describes the Pascal-based implementation of DDL (Digital Design Language) and its simulator. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/78/163/CSL-TR-78-163.pdf %R CSL-TR-78-164 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T DDL-P command language manual %A Cory, Wendell E. %A Duley, J. R. %A vanCleemput, Willem M. %D March 1979 %X This report describes the command language for the simulator, associated with DDL (Digital Design Language). %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/78/164/CSL-TR-78-164.pdf %R CSL-TR-78-165 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Partitioning of digital systems %A Mei, K. %A vanCleemput, Willem M. %A Blount, M. %A Hanson, D. L. %A Payne, Thomas S. %A Savir, Jacob %A Scheffer, L. K. 
%D April 1979 %X The aim of this study is to develop concepts and tools for understanding the influence of partitioning on the life-cycle cost of a system. Throughout this study three types of boards will be considered as examples to illustrate the concepts being developed. These three board types are being used by the U.S. Navy for various types of equipment. The types considered are: Type 1A: A small PC card with space for up to 8 IC's and a single 40-pin connector. Type 2A: A PC card with space for up to 18 IC's and a single 100-pin connector. Type 5X: A PC card with space for up to 55 IC's and two connectors: a 100-pin connector for the back plane connection and a 30-pin test point connector to be used for diagnostic purposes only. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/78/165/CSL-TR-78-165.pdf %R CSL-TR-79-168 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T UFORT: a Fortran-to-Universal PCODE Translator (FIXFOR-2) %A Chow, Frederick %A Nye, Peter %A Wiederhold, Gio %D January 1980 %X The Fortran compiler described in this document, UFORT, was written specifically to serve in a Pascal environment using the Universal P-Code as an intermediate pseudomachine. The need for implementation of Fortran these days is due to the great volume of existing Fortran programs, rather than to a desire to have this language available to develop new programs. We have hence implemented the full but traditional Fortran standard, rather than the recently adopted augmented Fortran standard. All aspects of Fortran which are commonly used in large scientific programs are available, including such features as SUBROUTINES, labelled COMMON, and COMPLEX arithmetic. In addition, a few common extensions, such as integers of different lengths and assignment of strings to variables, have been added. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/79/168/CSL-TR-79-168.pdf %R CSL-TR-79-170 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Interpretive architectures: a theory of ideal language machines %A Flynn, Michael J. %A Hoevel, Lee %D February 1979 %X This paper is a study in ideal computer architectures or program representations. An ideal architecture can be defined with respect to the representation that was used to originally describe a program, i.e. the higher level language. Traditional machine architectures name operations and objects which are presumed to be present in the host machine: a memory space of certain size, ALU operations, etc. An ideal machine framed about a specific higher level language assumes operations present in that language and uses these operations to describe relationships between objects described in the source representation. The notion of ideal is carefully constrained. The object program representation must be easily decompilable (i.e., the source is readily reconstructable). It is simply assumed that the source itself is a good representation for the original problem, thus any nonassignment operation present in the source program statement will appear as a single instruction (operation) in the ideal representation. All named objects are defined with respect to the natural scope of definition of the source program. For simplicity of discussion, statistical behavior of the program or the language is assumed to be unknown; that is, Huffman codes are not used. From the above, a canonic interpretive form (CIF) or measure of a higher level language program is developed.
CIF measures both static space to represent the program and dynamic time measurements of the number of instructions to be interpreted and the number of memory references these instructions will require. The CIF or ideal program representation is then compared, using the Whetstone benchmark, in its characteristics to several contemporary architectural approaches: IBM 370, Honeywell Level 66, Burroughs S-Language Fortran, and DELtran, a quasi-ideal Fortran architecture based on CIF principles. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/79/170/CSL-TR-79-170.pdf %R CSL-TR-79-171 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A theory of interpretive architectures: some notes on DEL design and a Fortran case study %A Hoevel, Lee %A Flynn, Michael J. %D February 1979 %X An interpretive architecture is a program representation that peculiarly suits a particular high level language or class of languages. The architecture is a program representation which we call a directly executed language (DEL). In a companion paper we have explored the theory involved in the creation of ideal DEL forms and have analyzed how some traditional instruction sets compare to this measure. This paper is an attempt to develop a reasonably comprehensive theory of DEL synthesis. By assuming a flexible interpretation oriented host machine, synthesis involves three particular areas: (1) sequencing, both between image machine instructions and within the host interpreter, (2) action rules including both format for transformation and operation invoked, and finally, (3) the name space which includes both name structure and name environment. A complete implementation of a simple version of FORTRAN is described in the appendix of the paper. This DEL for FORTRAN, called DELtran, comes close to achieving the ideal program measures. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/79/171/CSL-TR-79-171.pdf %R CSL-TR-79-174 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Pascal*: a Pascal based systems programming language %A Hennessy, John L. %D June 1980 %X Pascal* (Pascal-star) is a new programming language which is upward compatible with standard Pascal and suitable for systems programming. Although there are several additions to the language, simplicity remains a major design goal. The major additions reflect trends evident in newer languages such as Euclid, Mesa, and Ada, including: modules, simple parametric types, structured constants and values, several minor extensions to the control structures of the language, random access files, arbitrary return types for functions, and an exception handling mechanism. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/79/174/CSL-TR-79-174.pdf %R CSL-TR-79-175 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Symbolic debugging of optimized code %A Hennessy, John L. %D July 1979 %X The long-standing conflict between the optimization of code and the ability to symbolically debug the code is examined. The effects of local and global optimizations on the variables of a program are categorized and models for representing the effect of optimizations are given. These models are used by algorithms which determine the subset of variables whose values do not correspond to those in the original program. Algorithms for restoring these variables to their correct values are also developed.
Empirical results from the application of these algorithms to local optimization are presented. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/79/175/CSL-TR-79-175.pdf %R CSL-TR-79-176 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T SYNDIA user's guide %A Cory, Wendell E. %D August 1979 %X This report describes how to use the Syndia/Syngra system available at SU-SCORE. This system accepts BNF-like grammar specifications and automatically generates syntax diagrams on a Tektronix graphics terminal. Syndia is the major component of this system; Syngra acts as an interface between Syndia and the SUDS2 graphics editor. Syndia performs no ambiguity or consistency checks on the BNF input. This report assumes that the reader is familiar with BNF and syntax diagram representations of grammars. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/79/176/CSL-TR-79-176.pdf %R CSL-TR-79-177 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T ADLIB user's manual %A Hill, Dwight D. %D August 1979 %X ADLIB (A Design Language for Indicating Behavior) is a new computer design language recently developed at Stanford. ADLIB is a superset of PASCAL with special facilities for concurrency and interprocess communication. It is normally used under the SABLE simulation system. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/79/177/CSL-TR-79-177.pdf %R CSL-TR-79-178 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design automation at Stanford %A vanCleemput, Willem M. %D July 1979 %X This report contains a copy of the visual aids used by the authors during the presentation of their work at the First Workshop on Design Automation at Stanford, held July 3-4, 1979. The topics covered range from circuit level simulation and integrated circuit process modelling to high level languages and design techniques. The presentations are a survey of the activities in design automation at Stanford University. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/79/178/CSL-TR-79-178.pdf %R CSL-TR-79-179 %Z Thu, 01 Dec 94 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Testability considerations in microprocessor-based design %A Hayes, John P. %A McCluskey, Edward J. %D November 1979 %X This report contains a survey of testability considerations in microprocessor-based design. General issues of testability, testing methods, and fault modeling are presented. Specific techniques of testing and designing for testable microprocessor-based systems are discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/79/179/CSL-TR-79-179.pdf %R CSL-TR-95-661 %Z Thu, 02 Mar 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Performance Factors for Superscalar Processors %A Bennett, James E. %A Flynn, Michael J. %D February 1995 %X This paper introduces three performance factors for dynamically scheduled superscalar processors. These factors, availability, efficiency, and utility, are then used to explain the variations in performance that occur with different processor and memory system features. The processor features that are investigated are branch prediction depth and following multiple branch paths. The memory system features that are investigated are cache size, associativity, miss penalty, and memory bus bandwidth.
Dynamic scheduling with appropriate levels of bus bandwidth and branch prediction is shown to be remarkably effective at achieving good performance over a range of differing application types and over a range of cache miss rates. These results were obtained using a new simulation environment, MXS, which directly executes the benchmarks. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/661/CSL-TR-95-661.pdf %R CSL-TR-95-662 %Z Thu, 02 Mar 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Limits of Scaling MOSFETs %A McFarland, Grant %A Flynn, Michael J. %D January 1995 %X The fundamental electrical limits of MOSFETs are discussed and modeled to predict the scaling limits of digital bulk CMOS circuits. Limits discussed include subthreshold currents, time dependent dielectric breakdown (TDDB), hot electron effects, and drain induced barrier lowering (DIBL). This paper predicts the scaling of bulk CMOS MOSFETs to reach its limits at drawn dimensions of approximately 0.1um. These electrical limits are used to find scaling factors for SPICE Level 3 model parameters, and a scalable Level 3 device model is presented. Current trends in scaling interconnects are also discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/662/CSL-TR-95-662.pdf %R CSL-TR-95-659 %Z Tue, 28 Mar 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T High-Speed BiCMOS Memories %A Wingard, Drew Eric %D December 1994 %X Existing BiCMOS static memories do not simultaneously combine the speed of bipolar memories with the low power and density of CMOS memories. Beginning with fundamentally fast low-swing bipolar circuits and zero-power CMOS storage latches, we introduce CMOS devices into the bipolar circuits to reduce the power dissipation without compromising speed and insert bipolar transistors into CMOS storage arrays to improve the speed without power or density penalties. Replacing passive load resistors with switched PMOS transistors reduces the amount of power required to keep bipolar decoder outputs low. The access delay need not increase because the load resistance is quickly reduced via a low-swing signal when the decoder could switch. For ECL NOR decoders, we apply a variable BiCMOS current source that is simplified by carefully regulating the negative supply. We also develop techniques that improve the reading and writing characteristics of the CMOS-storage, emitter-access memory cell. The 16K-word 4-bit asynchronous CSEA memory was fabricated in a 0.8-micron BiCMOS technology and accesses in 3.7ns while using 1.75 W. An improved 64Kx4 design is simulated to run at 3.4ns and 2.3W. Finally, a synchronous 4Kx64 CSEA memory is estimated to operate at 2.5ns and 2.4W in the same process technology. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/659/CSL-TR-95-659.pdf %R CSL-TR-95-658 %Z Wed, 29 Mar 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T RYO: a Versatile Instruction Instrumentation Tool for PA-RISC %A Zucker, Daniel F. %A Karp, Alan H. %D January 1995 %X RYO (Roll Your Own) is actually a family of novel instrumentation tools for the PA-RISC family of processors. Relatively simple awk scripts, these tools instrument PA-RISC assembly instruction sequences by replacing individual machine instructions with calls to user written routines. Examples are presented showing how to generate address traces by replacing memory instructions, and how to analyze floating point arithmetic by replacing floating point instructions.
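A toy rendition of the substitution idea just described (not RYO itself, which is a set of awk scripts): scan an assembly listing and replace each memory instruction with a call to a user-written trace routine. The mnemonics, operand syntax, and call sequence below are simplified placeholders rather than faithful PA-RISC conventions.

    import re

    # Match simplified load/store mnemonics at the start of a line.
    MEM_OPS = re.compile(r"^\s*(ldw|stw)\b", re.IGNORECASE)

    def instrument(asm_lines):
        out = []
        for line in asm_lines:
            if MEM_OPS.match(line):
                # Keep the replaced instruction visible as a comment, then
                # substitute a call to a hypothetical user trace routine.
                out.append("; replaced: " + line.strip())
                out.append("    bl trace_mem,%r2    ; hypothetical trace call")
            else:
                out.append(line)
        return out

    print("\n".join(instrument(["    ldw 0(%r26),%r28",
                                "    add %r1,%r2,%r3"])))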
This paper introduces the overall structure and design of RYO, as well as giving detailed instructions on its use. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/658/CSL-TR-95-658.pdf %R CSL-TR-95-660 %Z Tue, 28 Mar 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The Effects of Latency, Occupancy, and Bandwidth in Distributed Shared Memory Multiprocessors %A Holt, Chris %A Heinrich, Mark %A Singh, Jaswinder Pal %A Rothberg, Edward %A Hennessy, John %D January 1995 %X Distributed shared memory (DSM) machines can be characterized by four parameters, based on a slightly modified version of the logP model. The l (latency) and o (occupancy of the communication controller) parameters are the keys to performance in these machines, and are largely determined by major architectural decisions about the aggressiveness and customization of the node and network. For recent and upcoming machines, the g (gap) parameter that measures node-to-network bandwidth does not appear to be a bottleneck. Conventional wisdom is that latency is the dominant factor in determining the performance of a DSM machine. We show, however, that controller occupancy--which causes contention even in highly optimized applications--plays a major role, especially at low latencies. When latency hiding is used, occupancy becomes more critical, even in machines with high latency networks. Scaling the problem size is often used as a technique to overcome limitations in communication latency and bandwidth. We show that in many structured computations occupancy-induced contention is not alleviated by increasing problem size, and that there are important classes of applications for which the performance lost by using higher latency networks or higher occupancy controllers cannot be regained easily, if at all, by scaling the problem size. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/660/CSL-TR-95-660.pdf %R CSL-TR-95-663 %Z Tue, 28 Mar 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Automatic Technology Mapping for Asynchronous Designs %A Siegel, Polly Sara Kay %D March 1995 %X Asynchronous design styles have been increasing in popularity as device sizes shrink and concurrency is exploited to increase system performance. However, asynchronous designs are difficult to implement correctly because the presence of hazards, which are of little consequence to most parts of synchronous systems, can cause improper circuit operation. Many asynchronous design styles, together with accompanying automated synthesis algorithms, address the issues of design complexity and correctness. Typically, these synthesis systems take a high-level description of an asynchronous system and produce a logic-level description of the resultant design that is hazard-free for transitions of interest. The designer then must manually translate this logic-level description into a technology-specific implementation composed of an interconnection of elements from a semi-custom cell library. At this stage, the designer must be careful not to introduce new hazards into the design. The size of designs is limited in part by the inability to safely (and reliably) map the technology-independent description into an implementation. In this thesis, we address the problem of technology mapping for two different asynchronous design styles. We first address the problem for burst-mode designs.
We developed theorems and algorithms for hazard-free mapping of burst-mode designs, and implemented these algorithms on top of an existing synchronous technology mapper. We incorporated this mapper into a toolkit for asynchronous design, and used the toolkit to implement a low-power infrared communications chip. We then extended this work to apply to the problem of hazard-free technology mapping of speed-independent designs. The difficulty in this design style is in the decomposition phase of the mapping algorithm, and we developed theory and algorithms for correct hazard-free decomposition of this design style. We also developed an exact covering algorithm which takes advantage of logic sharing within the design. These algorithms were then applied to benchmark circuits. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/663/CSL-TR-95-663.pdf %R CSL-TR-95-664 %Z Wed, 19 Apr 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Nondeterministic Operators in Algebraic Frameworks %A Meldal, Sigurd %A Walicki, Michal Antonin %D March 1995 %X A major motivating force behind research into abstract data types is the realization that software should be described in an abstract manner - on the one hand leaving open decisions regarding further refinement and on the other allowing for substitutivity of modules as long as they satisfy a particular specification. The use of nondeterministic operators is a useful abstraction tool: nondeterminism represents a natural abstraction whenever there is a hidden state or other components of a system description which are, methodologically, conceptually or technically, inaccessible at a particular level of specification granularity. In this report we explore the various approaches to dealing with nondeterminism within the framework of algebraic specifications. The basic concepts involved in the study of nondeterminism are introduced. The main alternatives for the interpretation of nondeterministic operations, homomorphisms between nondeterministic structures and equivalence of nondeterministic terms are sketched, and we discuss various proposals for initial and terminal semantics. We offer some comments on the continuous semantics of nondeterminism and the problem of solving recursive equations over signatures with binary nondeterministic choice. We also present the attempts at reducing reasoning about nondeterminism to reasoning in first order logic, and present a calculus dealing directly with nondeterministic terms. Finally, rewriting with nondeterminism is discussed: primarily as a means of reasoning, but also as a means of assigning operational semantics to nondeterministic specifications. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/664/CSL-TR-95-664.pdf %R CSL-TR-95-666 %Z Thu, 13 Apr 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T On Division and Reciprocal Caches %A Oberman, Stuart F. %A Flynn, Michael J. %D April 1995 %X Floating-point division is generally regarded as a high latency operation in typical floating-point applications. Many techniques exist for increasing division performance, often at the cost of increasing either chip area, cycle time, or both. This paper presents two methods for decreasing the latency of division. Using applications from the SPECfp92 and NAS benchmark suites, these methods are evaluated to determine their effects on overall system performance.
The notion of recurring computation is presented, and it is shown how recurring division can be exploited using an additional, dedicated division cache. Additionally, for multiplication-based division algorithms, reciprocal caches can be utilized to store recurring reciprocals. Due to the similarity between the algorithms typically used to compute division and square root, the performance of square root caches is also investigated. Results show that reciprocal caches can achieve nearly a 2X reduction in effective division latency for reasonable cache sizes. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/666/CSL-TR-95-666.pdf %R CSL-TR-95-667 %Z Mon, 05 Jun 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Better Optical Triangulation through Spacetime Analysis %A Curless, Brian %A Levoy, Marc %D April 1995 %X The standard methods for extracting range data from optical triangulation scanners are accurate only for planar objects of uniform reflectance illuminated by an incoherent source. Using these methods, curved surfaces, discontinuous surfaces, and surfaces of varying reflectance cause systematic distortions of the range data. Coherent light sources such as lasers introduce speckle artifacts that further degrade the data. We present a new ranging method based on analyzing the time evolution of the structured light reflections. Using our spacetime analysis, we can correct for each of these artifacts, thereby attaining significantly higher accuracy using existing technology. We present results that demonstrate the validity of our method using a commercial laser stripe triangulation scanner. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/667/CSL-TR-95-667.pdf %R CSL-TR-95-668 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Architecture Evaluator's Work Bench and its Application to Microprocessor Floating Point Units %A Fu, Steve %A Quach, Nhon %A Flynn, Michael %D June 1995 %X This paper introduces the Architecture Evaluator's Workbench (AEWB), a high level design space exploration methodology, and its application to floating point units (FPUs). In applying AEWB to FPUs, a metric for optimizing and comparing FPU implementations is developed. The metric, FUPA, incorporates four aspects of AEWB: latency, cost, technology and profiles of target applications. FUPA models latency in terms of delay, cost in terms of area, and profile in terms of percentage of different floating point operations. We utilize sub-micron device models, interconnect models, and actual microprocessor scaling data to develop models used to normalize both latency and area, enabling technology-independent comparison of implementations. This report also surveys most of the state-of-the-art microprocessors, and compares them utilizing FUPA. Finally, we correlate the FUPA results to reported SPECfp92 results, and demonstrate the effect of circuit density on FUPA implementations. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/668/CSL-TR-95-668.pdf %R CSL-TR-95-669 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Testing BiCMOS and Dynamic CMOS Logic %A Ma, Siyad %D June 1995 %X In a normal integrated circuit (IC) production cycle, manufactured ICs are tested to remove defective parts. The purpose of this research is to study the effects of real defects in BiCMOS and Dynamic CMOS circuits, and propose better test solutions to detect these defects.
BiCMOS and Dynamic CMOS circuits are used in many new high performance VLSI ICs. Fault models for BiCMOS and Dynamic CMOS circuits are discussed first. Shorted and open transistor terminals, the most common failure modes in MOS and bipolar transistors, are simulated for BiCMOS and Dynamic CMOS logic gates. Simulations show that a faulty behavior similar to data retention faults in memory cells can occur in BiCMOS and Dynamic CMOS logic gates. We explain here why it is important to test for these faults, and present test techniques that can detect these faults. Simulation results also show that shorts and opens in Dynamic CMOS and BiCMOS circuits are harder to test than their counterparts in Static CMOS circuits. They also show that the testability of opens in BiCMOS gates can be predicted without time-consuming transistor-level simulations. We present a prediction method based on an extended switch-level model for BiCMOS gates. To improve the testability of dynamic CMOS circuits, design-for-testability circuitry is proposed. Scan cell designs add scan capabilities to dynamic latches and flip-flops with negligible performance overhead, while design-for-current-testability circuitry allows quiescent supply current (IDDQ) measurements for dynamic CMOS circuits. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/669/CSL-TR-95-669.pdf %R CSL-TR-95-670 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design and Analysis of Update-Based Cache Coherence Protocols for Scalable Shared-Memory Multiprocessors %A Glasco, David Brian %D June 1995 %X This dissertation examines the performance difference between invalidate-based and update-based cache coherence protocols for scalable shared-memory multiprocessors. The first portion of the dissertation reviews cache coherence. First, chapter 1 describes the cache coherence problem and identifies the two classes of cache coherence protocols, invalidate-based and update-based. The chapter also reviews bus-based protocols and reviews the additional requirements placed on the protocols to extend them to scalable systems. Next, chapter 2 reviews two latency tolerating techniques, relaxed memory consistency models and software-controlled data prefetch, and examines their impact on the cache coherence protocols. Finally, chapter 3 reviews the details of three invalidate-based protocols defined in the literature and defines two new update-based protocols. The second portion of this dissertation examines the performance differences between invalidate-based and update-based protocols. First, chapter 4 presents the methodology used to examine the performance of the protocols. This presentation includes a discussion of the simulation environment, the simulated architecture and the scientific applications. Next, chapter 5 describes and analyzes the performance of two enhancements to the update-based cache coherence protocols. The first enhancement, a fine-grain or word-based synchronization scheme, combines data synchronization with the data. This allows the system to take advantage of the fine-grain data updates which result from the update-based protocols. The second enhancement, a write grouping scheme, is necessary to reduce the network traffic generated by the update-based protocols.
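(Illustrative aside: a toy C analogue of the write-grouping idea just described; the line size, buffer layout, and send_update stub are assumptions standing in for the hardware mechanism studied in the dissertation.)

    #include <stdio.h>

    #define LINE_WORDS 8                     /* assumed words per coherence line */

    typedef struct { long tag; unsigned mask; double w[LINE_WORDS]; } UpdateBuf;

    /* Stand-in for one update message on the network. */
    static void send_update(long tag, unsigned mask, const double *w)
    {
        (void)w;
        printf("update line %ld, word mask 0x%x\n", tag, mask);
    }

    /* Group writes to the same line into a single update message rather
       than sending one message per written word. */
    void grouped_write(UpdateBuf *b, long addr, double v)
    {
        long tag = addr / LINE_WORDS;
        if (b->mask && b->tag != tag) {      /* switching lines: flush the group */
            send_update(b->tag, b->mask, b->w);
            b->mask = 0;
        }
        b->tag = tag;
        b->w[addr % LINE_WORDS] = v;
        b->mask |= 1u << (addr % LINE_WORDS);
    }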
Next, chapter 6 presents and discusses the simulated results that demonstrate that update-based protocols, with the two enhancements, can significantly improve the performance of the fine-grain scientific applications examined, relative to invalidate-based protocols. Chapter 7 examines the sensitivity of the protocols to changes in the architectural parameters and to migratory data. Finally, chapter 8 discusses how the choice of protocol affects the correctness, cost, and efficiency of the cache coherence mechanism. Overall, this work demonstrates that update-based protocols can be used not only as a coherence mechanism, but also as a latency reducing and tolerating technique to improve the performance of a set of fine-grain scientific applications. But as with other latency reducing techniques, such as data prefetch, the technique must be used with an understanding of its consequences. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/670/CSL-TR-95-670.pdf %R CSL-TR-95-671 %Z Wed, 21 Jun 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Characterization and reduction of metastability errors in CMOS interface circuits %A Portmann, Clemenz Lenard %D June 1995 %X In synchronous digital logic systems, asynchronous external signals must be referenced to the system clock or synchronized. Synchronization of asynchronous signals, however, inevitably leads to metastability errors. Metastability error rates can increase by orders of magnitude as clock frequencies increase in high performance designs, and supply voltages decrease in low-power designs. This research focuses on the characterization of metastability parameters and error reduction with no penalty in circuit performance. Two applications, high-speed flash analog-to-digital conversion and synchronization of asynchronous binary signals in application-specific integrated circuits, have been investigated. Applications such as telecommunications and instrumentation for time-domain analysis require analog-to-digital converters with metastability error probabilities on the order of 10^-10 errors/cycle, achievable in high performance designs only through the use of dedicated circuitry for error reduction. A power and area efficient externally pipelined metastability error reduction technique for flash converters has been developed. Unresolved comparator outputs are held valid, causing the encode logic to fail benignly in the presence of metastability. In an n bit converter, errors are passed as a single unsettled bit to the converter output and are reduced with an external pipeline of only n latches per stage rather than an internal pipeline of 2^n-1 latches per stage. An 80-MHz, externally pipelined, 7-bit flash analog-to-digital converter was fabricated in 1.2-um CMOS. Measured error rates were less than 10^-12 errors/cycle. Using internal pipelining with two levels of 127 latches to achieve equivalent performance would require 3.48 times more power for the error reduction circuitry with a Nyquist frequency input. This corresponds to a reduction in the total power for the implemented converter of 1.24 times compared with the internally pipelined converter. In synchronizers and arbiters, general purpose applications require mean time between failures on the order of a year or tens of years. Comparison of previous designs has been difficult due to varying technologies, test setups, and test conditions. To address this problem, a test circuit for synchronizers was implemented in 2-um and 1.2-um CMOS technologies.
The test setup makes possible the evaluation and comparison of synchronizer performance in varying environments and technologies. The effects of loading, output buffering, supply scaling, supply noise, and technology scaling on synchronizer performance are discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/671/CSL-TR-95-671.pdf %R CSL-TR-95-672 %Z Wed, 28 Jun 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Delay Models for CMOS Circuits %A McFarland, Grant %A Flynn, Michael %D June 1995 %X Four different CMOS inverter delay models are derived and compared. It is shown that inverter delay can be estimated with fair accuracy over a wide range of input rise times and loads as the sum of two terms, one proportional to the input rise time, and one proportional to the capacitive load. Methods for estimating device capacitance from HSPICE parameters are presented, as well as means of including added delay due to wire resistance and the use of series transistors. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/672/CSL-TR-95-672.pdf %R CSL-TR-95-665 %Z Wed, 02 Aug 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Interprocedural Parallelization Analysis: Preliminary Results %A Hall, Mary W. %A Amarasinghe, Saman P. %A Murphy, Brian R. %A Liao, Shih-Wei %A Lam, Monica S. %D March 1995 %X This paper describes a fully interprocedural automatic parallelization system for Fortran programs, and presents the results of extensive experiments obtained using this system. The system incorporates a comprehensive and integrated collection of analyses including dependence, privatization and reduction recognition for both array and scalar variables, and scalar symbolic analysis to support these. All the analyses have been implemented in the SUIF (Stanford University Intermediate Format) compiler system, with the aid of an interprocedural analysis construction tool known as FIAT. Our interprocedural analysis is uniquely designed to provide the same quality of information as if the program were analyzed as a single procedure, while managing the complexity of the analysis. We have implemented a robust system that has parallelized, completely automatically, loops containing over a thousand lines of code. This work makes possible the first comprehensive empirical evaluation of state-of-the-art automatic parallelization technology. This paper reports evaluation numbers on programs from standard benchmark suites. The results demonstrate that all the interprocedural analyses taken together can substantially advance the capability of current automatic parallelization technology. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/665/CSL-TR-95-665.pdf %R CSL-TR-95-673 %Z Thu, 27 Jul 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Informing Loads: Enabling Software to Observe and React to Memory Behavior %A Horowitz, Mark %A Martonosi, Margaret %A Mowry, Todd C. %A Smith, Michael D. %D July 1995 %X Memory latency is an important bottleneck in system performance that cannot be adequately solved by hardware alone. Several promising software techniques have been shown to address this problem successfully in specific situations. However, the generality of these software approaches has been limited because current architectures do not provide a fine-grained, low-overhead mechanism to observe memory behavior directly.
To fill this need, we propose a new set of memory operations called informing memory operations, and in particular, we describe the design and functionality of an informing load instruction. This instruction serves as a primitive that allows the software to observe cache misses and to act upon this information inexpensively (i.e., under the miss, when the processor would typically be idle) within the current software context. Informing loads enable new solutions to several important software problems. We demonstrate this through examples that show their usefulness in (i) the collection of fine-grained memory profiles with high precision and low overhead and (ii) the automatic improvement of memory system performance through compiler techniques that take advantage of cache-miss information. Overall, we find that the apparent benefit of an informing load instruction is quite high, while the hardware cost of this functionality is quite modest. In fact, the bulk of the required hardware support is already present in today's high-performance processors. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/673/CSL-TR-95-673.pdf %R CSL-TR-95-674 %Z Wed, 26 Jul 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Three Concepts of System Architecture %A Luckham, David C. %A Vera, James %A Meldal, Sigurd %D July 1995 %X An architecture is a specification of the components of a system and the communication between them. Systems are constrained to conform to an architecture. An architecture should guarantee certain behavioral properties of a conforming system, i.e., one whose components are configured according to the architecture. An architecture should also be useful in various ways during the process of building a system. This paper presents three alternative concepts of architecture: object connection architecture, interface connection architecture, and plug and socket architecture. We describe different concepts of interface and connection that are needed for each of the three kinds of architecture, and different conformance requirements of each kind. Simple examples are used to compare the usefulness of each kind of architecture in guaranteeing properties of conforming systems, and in correctly modifying a conforming system. In comparing the three architecture concepts, the principle of communication integrity becomes central, and two new architecture concepts, duality of sub-interfaces (services) and connections of dual services (service connection), are introduced to define plug and socket architecture. We describe how these concepts reduce the complexity of architecture definitions, and can in many cases help guarantee that the components of a conforming system communicate correctly. The paper is presented independently of any particular formalism, since the concepts can be represented in widely differing architecture definition formalisms, varying from graphical languages to event-based simulation languages. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/674/CSL-TR-95-674.pdf %R CSL-TR-95-675 %Z Thu, 27 Jul 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An Analysis of Division Algorithms and Implementations %A Oberman, Stuart F. %A Flynn, Michael J. %D July 1995 %X Floating-point division is generally regarded as a low frequency, high latency operation in typical floating-point applications.
However, the increasing emphasis on high performance graphics and the industry-wide usage of performance benchmarks forces processor designers to pay close attention to all aspects of floating-point computation. Many algorithms are suitable for implementing division in hardware. This paper presents four major classes of algorithms in a unified framework, namely digit recurrence, functional iteration, very high radix, and variable latency. Digit recurrence algorithms, the most common of which is SRT, use subtraction as the fundamental operator, and they converge to a quotient linearly. Division by functional iteration converges to a quotient quadratically using multiplication. Very high radix division algorithms are similar to digit recurrence algorithms, but they incorporate multiplication to reduce the latency. Variable latency division algorithms reduce the average latency required to form the quotient. These algorithms are explained and compared in this work. It is found that for low-cost implementations where chip area must be minimized, digit recurrence algorithms are suitable. An implementation of division by functional iteration can provide the lowest latency for typical multiplier latencies. Variable latency algorithms show promise for simultaneously minimizing average latency while also minimizing area. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/675/CSL-TR-95-675.pdf %R CSL-TR-95-676 %Z Thu, 31 Aug 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The COOL Parallel Programming Language: Design, Implementation, and Performance %A Chandra, Rohit %D January 1995 %X Effective utilization of multiprocessors requires that a program be partitioned for parallel execution, and that it execute with good data locality and load balance. Although automatic compiler-based techniques to address these concerns are attractive, they are often limited by insufficient information about the application. Explicit programmer participation is therefore necessary for programs that exploit unstructured task-level parallelism. However, support for such intervention must address the tradeoff between ease of use and providing a sufficient degree of control to the programmer. In this thesis we present the programming language COOL, which extends C++ with simple and efficient constructs for writing parallel programs. COOL is targeted towards programming shared-memory multiprocessors. Our approach emphasizes the integration of concurrency and synchronization with data abstraction. Concurrent execution is expressed through parallel functions that execute asynchronously when invoked. Synchronization for shared objects is expressed through monitors, and event synchronization is expressed through condition variables. This approach provides several benefits. First, integrating concurrency with data abstraction allows construction of concurrent data structures that have most of the complex details suitably encapsulated. Second, monitors and condition variables integrated with objects offer a flexible set of building blocks that can be used to build more complex synchronization abstractions. Synchronization operations are clearly identified through attributes and can be optimized by the compiler to reduce synchronization overhead. Finally, the object framework supports abstractions to improve the load distribution and data locality of the program.
Besides these mechanisms for exploiting parallelism, COOL also provides support for the programmer to address performance issues, in the form of abstractions that can be used to supply hints about the objects referenced by parallel tasks. These hints are used by the runtime system to schedule tasks close to the objects they reference, and thereby improve data locality. The hints are easily supplied by the programmer in terms of the objects in the program, while the details of task creation and scheduling are managed transparently within the runtime system. Furthermore, the hints do not affect the semantics of the program and allow the programmer to easily experiment with different optimizations. COOL has been implemented on several shared-memory machines, including the Stanford DASH multiprocessor. We have programmed a variety of applications in COOL, including many from the SPLASH parallel benchmark suite. Our experience has been promising: the applications are easily expressed in COOL, and perform as well as hand-tuned codes using lower-level primitives. Furthermore, supplying hints has proven to be an easy and effective way of improving program performance. This thesis therefore demonstrates that (a) the simple but powerful constructs in COOL can effectively exploit task-level parallelism across a variety of application programs, (b) an object-based approach improves both the expressiveness and the performance of parallel programs, and (c) improving data locality can be made simple through a combination of programmer abstractions and smart scheduling mechanisms. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/676/CSL-TR-95-676.pdf %R CSL-TR-95-677 %Z Mon, 11 Sep 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T SPARC-V9 Architecture Specification with Rapide %A Santoro, Alexandre %A Park, Woosang %A Luckham, David %D September 1995 %X This report presents an approach to creating an executable standard for the SPARC-V9 instruction set architecture using Rapide-1.0, a language for modeling and prototyping distributed systems. It describes the desired characteristics of a formal specification of the architecture and shows how Rapide can be used to build a model with these characteristics. This is followed by the description of a simple prototype of the proposed model, and a discussion of the issues involved in building and testing the complete specification (with emphasis on some Rapide-specific features such as constraints, causality and mapping). The report concludes with a brief evaluation of the proposed model and suggestions on future areas of research. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/677/CSL-TR-95-677.pdf %R CSL-TR-95-679 %Z Thu, 30 Nov 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Measuring the Complexity of SRT Tables %A Oberman, Stuart F. %A Flynn, Michael J. %D November 1995 %X This paper presents an analysis of the complexity of quotient-digit selection tables in SRT division implementations. SRT dividers use a fixed number of partial remainder and divisor bits to consult a table to select the next quotient-digit in each iteration. The complexity of these tables is a function of the radix, the redundancy, and the number of bits in the estimates of the divisor and partial remainder.
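(Illustrative aside: a schematic radix-2 SRT loop in C; at radix 2 the selection "table" degenerates to two comparisons on the shifted partial remainder, whereas the higher-radix tables measured in this paper are indexed by truncated estimates of both the partial remainder and the divisor. Operand ranges and iteration count are assumptions.)

    /* Schematic radix-2 SRT: x in [0,1), d in [0.5,1); returns approx x/d.
       Each iteration selects a quotient digit in {-1,0,1} from the shifted
       partial remainder -- the degenerate form of a selection table. */
    double srt2_divide(double x, double d, int bits)
    {
        double p = x / 2.0;                  /* scale so |p| <= d holds */
        double q = 0.0, w = 0.5;             /* quotient, current digit weight */
        for (int i = 0; i < bits; i++) {
            p *= 2.0;                        /* shift the partial remainder */
            int qd = (p >= 0.5) ? 1 : (p < -0.5 ? -1 : 0);
            p -= qd * d;                     /* subtract qd*d, keeping |p| <= d */
            q += qd * w;
            w /= 2.0;
        }
        return 2.0 * q;                      /* undo the initial scaling */
    }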
This analysis derives the allowable divisor and partial remainder truncations for radix 2 through radix 32, and it quantifies the relationship between table parameters and the number of product terms in the logic equations defining the tables. By mapping the tables to a library of standard-cells, delay and area values were measured and are presented for table configurations through radix 32. The results show that: 1) Gray-coding of the quotient-digits allows for the automatic minimization of the quotient-digit selection logic equations. 2) Using a short carry-assimilating adder with a few more input bits than output bits can reduce table complexity. 3) Reducing the number of bits in the partial remainder estimate and increasing the length of the divisor estimate increases the size and delay of the table, offsetting any performance gain due to the shorter external adder. 4) While delay increases nearly linearly with radix, area increases quadratically, limiting practical table implementations to radix 2 and radix 4. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/679/CSL-TR-95-679.pdf %R CSL-TR-95-681 %Z Wed, 24 Jan 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Netlist Processing for Custom VLSI via Pattern Matching %A Chanak, Thomas Stephen %D November 1995 %X A vast array of CAD tools is available to support the design of integrated circuits. Unfortunately, tool development lags advances in technology and design methodology - the newest, most aggressive custom chips confront design issues that were not anticipated by the currently available set of tools. When existing tools cannot fill a custom design's needs, a new tool must be developed, often in a hurry. This situation arises fairly often, and many of the tools created use, or imply, some method of netlist pattern recognition. If the pattern-oriented facet of these tools could be isolated and unified among a variety of tools, custom tool writers would have a useful building block to start with when confronted with the urgent need for a new tool. Starting with the UNIX pattern-matching, text-processing tool AWK as a model, a pattern-action processing environment was built to test the concept of writing CAD tools by specifying patterns and actions. After implementing a wide variety of netlist processing applications, the refined pattern-action system proved to be a useful and fast way to implement new tools. Previous work in this area had reached the same conclusion, demonstrating the usefulness of pattern recognition for electrical rules checking, simulation, database conversion, and more. Our experiments identified a software building block, the "pattern object", that can construct the operators proposed in other works while maintaining flexibility in the face of changing requirements through the decoupling of global control from a pattern-matching engine. The implicit computation of subgraph isomorphism common to pattern-matching systems was thought to be a potential runtime performance issue. Our experience contradicts this concern. VLSI netlists tend to be sparse enough that runtimes do not grow unreasonably when a sensible amount of care is taken. Difficulties with the verification of pattern-based tools, not performance, present the greatest obstacle to pattern-matching tools. Pattern objects that modify netlists raise the prospect of order dependencies and subtle interactions among patterns, and this interaction is what causes the most difficult verification problems.
To combat this problem, a technique that considers an application's entire set of pattern objects and a specific target netlist together can perform analyses that expose otherwise subtle errors. This technique, along with debugging tools built specifically for pattern objects and netlists, allows the construction of trustworthy applications. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/681/CSL-TR-95-681.pdf %R CSL-TR-95-682 %Z Wed, 07 Feb 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T High Performance Cache Architectures to Support Dynamic Superscalar Microprocessors %A Wilson, Kenneth M. %A Olukotun, Kunle %D June 1995 %X Simple cache structures are not sufficient to provide the memory bandwidth needed by a dynamic superscalar computer, so more sophisticated memory hierarchies such as non-blocking and pipelined caches are required. To provide direction for the designers of modern high performance microprocessors, we investigate the performance tradeoffs of the combinations of cache size, blocking and non-blocking caches, and pipeline depth of caches within the memory subsystem of a dynamic superscalar processor for integer applications. The results show that the dynamic superscalar processor can hide about two-thirds of the additional latency of two and three pipelined caches, and that a non-blocking cache is always beneficial. A pipelined cache will only outperform a non-pipelined cache if the miss penalty and miss rates are large. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/682/CSL-TR-95-682.pdf %R CSL-TR-95-683 %Z Mon, 22 Jan 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A Comparison of Hardware Prefetching Techniques For Multimedia Benchmarks %A Zucker, Daniel F. %A Flynn, Michael J. %A Lee, Ruby B. %D December 1995 %X Data prefetching is a well known technique for improving cache performance. While several studies have examined prefetch strategies for scientific and commercial applications, no published work has studied the special memory requirements of multimedia applications. This paper presents data for three types of hardware prefetching schemes: stream buffers, stride prediction tables, and a hybrid combination of the two, the stream cache. Use of the stride prediction table is shown to eliminate up to 90% of the misses that would otherwise be incurred in a moderate or large sized cache with no prefetching hardware. The stream cache, proposed for the first time in this paper, has the potential to cut execution times by more than half through the addition of a relatively small amount of additional hardware. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/683/CSL-TR-95-683.pdf %R CSL-TR-95-684 %Z Mon, 22 Jan 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Performance/Area Tradeoffs in Booth Multipliers %A Al-Twaijry, Hesham %A Flynn, Michael J. %D November 1995 %X Booth encoding is a method of reducing the number of summands required to produce the multiplication result. This paper compares the performance/area tradeoffs for the different Booth algorithms when trees are used as the summation network. This paper shows that the simple non-Booth algorithm is not a viable design, and that currently Booth 2 is the best design. It also points out that in the future Booth 3 may offer the best performance/area ratio.
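(Illustrative aside: a minimal C sketch of the radix-4 "Booth 2" recoding compared above; it scans overlapping 3-bit groups of a two's-complement multiplier and emits digits in {-2,...,2}, halving the number of summands. The fixed 32-bit width is an assumption.)

    #include <stdint.h>

    /* Recode a 32-bit two's-complement multiplier into 16 radix-4 Booth
       digits; digit k has weight 4^k and selects a summand of 0, +/-d,
       or +/-2d in the summation network. */
    void booth2_recode(int32_t m, int digit[16])
    {
        static const int dig[8] = { 0, 1, 1, 2, -2, -1, -1, 0 };
        uint64_t x = ((uint64_t)(uint32_t)m) << 1;  /* implicit 0 below bit 0 */
        for (int i = 0; i < 16; i++)
            digit[i] = dig[(x >> (2 * i)) & 0x7];   /* overlapping 3-bit group */
    }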
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/684/CSL-TR-95-684.pdf %R CSL-TR-95-686 %Z Wed, 14 Feb 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Automatic Synthesis of Burst-Mode Asynchronous Controllers %A Nowick, Steven Mark %D December 1995 %X Asynchronous design has enjoyed a revival of interest recently, as designers seek to eliminate penalties of traditional synchronous design. In principle, asynchronous methods promise to avoid overhead due to clock skew, worst-case design assumptions and resynchronization of asynchronous external inputs. In practice, however, many asynchronous design methods suffer from a number of problems: unsound algorithms (implementations may have hazards), harsh restrictions on the range of designs that can be handled (single-input changes only), incompatibility with existing design styles, and inefficiency in the resulting circuits. This thesis presents a new locally-clocked design method for the synthesis of asynchronous controllers. The method has been automated, is proven correct and produces high-performance implementations which are hazard-free at the gate level. Implementations allow multiple-input changes and handle a relatively unconstrained class of behaviors (called "burst-mode" specifications). The method produces state-machine implementations with a minimal or near-minimal number of states. Implementations can be easily built in such common VLSI design styles as gate-array, standard cell and full-custom. Realizations typically have the latency of their combinational logic. A complete set of state and logic minimization algorithms has been developed and automated for the synthesis method. The logic minimization algorithm differs from existing algorithms since it generates two-level minimized logic which is also hazard-free. The synthesis program is used to produce competitive implementations for several published designs. In addition, a large real-world controller is designed as a case study: an asynchronous second-level cache controller for a new RISC processor. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/686/CSL-TR-95-686.pdf %R CSL-TR-95-678 %Z Mon, 11 Sep 95 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Fast Volume Rendering Using a Shear-Warp Factorization of the Viewing Transformation %A Lacroute, Philippe %D September 1995 %X Volume rendering is a technique for visualizing 3D arrays of sampled data. It has applications in areas such as medical imaging and scientific visualization, but its use has been limited by its high computational expense. Early implementations of volume rendering used brute-force techniques that require on the order of 100 seconds to render typical data sets on a workstation. Algorithms with optimizations that exploit coherence in the data have reduced rendering times to the range of ten seconds but are still not fast enough for interactive visualization applications. In this thesis we present a family of volume rendering algorithms that reduces rendering times to one second. First, we present a scanline-order volume rendering algorithm that exploits coherence in both the volume data and the image. We show that scanline-order algorithms are fundamentally more efficient than commonly-used ray casting algorithms because the latter must perform analytic geometry calculations (e.g. intersecting rays with axis-aligned boxes). The new scanline-order algorithm simply streams through the volume and the image in storage order.
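(Illustrative aside: a toy C loop conveying the storage-order traversal just described; real shear-warp rendering adds the shear, color compositing, warp, and coherence data structures of the thesis. The array layout is an assumption.)

    /* Composite axis-aligned voxel slices front-to-back into an
       intermediate image, walking volume and image in memory order;
       no per-ray analytic geometry is computed. */
    void composite_slices(const float *vox,  /* nz*ny*nx opacities in [0,1] */
                          int nx, int ny, int nz,
                          float *img)        /* ny*nx accumulated opacity, zeroed */
    {
        for (int z = 0; z < nz; z++)
            for (int y = 0; y < ny; y++)
                for (int x = 0; x < nx; x++) {
                    float a = vox[(z * ny + y) * nx + x];
                    float *p = &img[y * nx + x];
                    *p += (1.0f - *p) * a;   /* "over" operator on opacity */
                }
    }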
We describe variants of the algorithm for both parallel and perspective projections and a multiprocessor implementation that achieves frame rates of over 10 Hz. Second, we present a solution to a limitation of existing volume rendering algorithms that use coherence accelerations: they require an expensive preprocessing step every time the volume is classified (i.e., when opacities are assigned to the samples), thereby limiting the usefulness of the algorithms for interactive applications. We introduce a data structure for encoding spatial coherence in unclassified volumes. When combined with our rendering algorithm, this data structure allows us to build a fully-interactive volume visualization system. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/678/CSL-TR-95-678.pdf %R CSL-TR-95-680 %Z Tue, 25 Jun 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Designing a Multicast Switch Scheduler %A Prabhakar, Balaji %A McKeown, Nick W. %D November 1995 %X This paper presents the design of the scheduler for an M x N input-queued switch. It is assumed that each input maintains a single queue for arriving multicast cells and that only the cell at the head of line (HOL) can be observed and scheduled at one time. The scheduler is required to be work-conserving, which means that no output port may be idle as long as there is an input cell destined to it. Furthermore, the scheduler is required to be fair, which means that no input cell may be held at HOL for more than M cell times (M is the number of input ports). The aim is to find a work-conserving, fair policy that delivers maximum throughput and minimizes input queue latency. When a scheduling policy decides which cells to schedule, contention may require that it leave a residue of cells to be scheduled in the next cell time. The selection of where to place the residue uniquely defines the scheduling policy. It is demonstrated that a policy which concentrates the residue, subject to our fairness constraint, always outperforms all other policies. We present one such policy, called TATRA, and analyze it geometrically. We also present a heuristic round-robin policy called mRRM that is simple to implement in hardware, is fair, and performs quite well when compared to a concentrating algorithm. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/680/CSL-TR-95-680.pdf %R CSL-TR-95-685 %Z Tue, 25 Jun 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Memory Consistency Models for Shared-Memory Multiprocessors %A Gharachorloo, Kourosh %D December 1995 %X The memory consistency model for a shared-memory multiprocessor specifies the behavior of memory with respect to read and write operations from multiple processors. As such, the memory model influences many aspects of system design, including the design of programming languages, compilers, and the underlying hardware. Relaxed models that impose fewer memory ordering constraints offer the potential for higher performance by allowing hardware and software to overlap and reorder memory operations. However, fewer ordering guarantees can compromise programmability and portability. Many of the previously proposed models either fail to provide reasonable programming semantics or are biased toward programming ease at the cost of sacrificing performance. Furthermore, the lack of consensus on an acceptable model hinders software portability across different systems.
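(Illustrative aside: the classic flag-synchronization example, in C11, of the programmability issue just raised; under a relaxed model the code is correct only because the synchronization accesses are identified and ordered, which is the kind of programmer-supplied information the dissertation builds on. The example is ours, not the dissertation's.)

    #include <stdatomic.h>

    int payload;                             /* ordinary data */
    _Atomic int ready = 0;                   /* identified synchronization flag */

    void producer(void)
    {
        payload = 42;                        /* data write ordered before... */
        atomic_store_explicit(&ready, 1, memory_order_release); /* ...the flag */
    }

    int consumer(void)
    {
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                                /* acquire orders the flag read first */
        return payload;                      /* guaranteed to observe 42 */
    }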
This dissertation focuses on providing a balanced solution that directly addresses the trade-off between programming ease and performance. To address programmability, we propose an alternative method for specifying memory behavior that presents a higher level abstraction to the programmer. We show that with only a few types of information supplied by the programmer, an implementation can exploit the full range of optimizations enabled by previous models. Furthermore, the same information enables automatic and efficient portability across a wide range of implementations. To expose the optimizations enabled by a model, we have developed a formal framework for specifying the low-level ordering constraints that must be enforced by an implementation. Based on these specifications, we present a wide range of architecture and compiler implementation techniques for efficiently supporting a given model. Finally, we evaluate the performance benefits of exploiting relaxed models based on detailed simulations of realistic parallel applications. Our results show that the optimizations enabled by relaxed models are extremely effective in hiding virtually the full latency of writes in architectures with blocking reads (i.e., processor stalls on reads), with gains as high as 80%. Architectures with non-blocking reads can further exploit relaxed models to hide a substantial fraction of the read latency as well, leading to a larger overall performance benefit. Furthermore, these optimizations complement gains from other latency hiding techniques such as prefetching and multiple contexts. We believe that the combined benefits in hardware and software will make relaxed models universal in future multiprocessors, as is already evidenced by their adoption in several commercial systems. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/95/685/CSL-TR-95-685.pdf %R CSL-TR-96-687 %Z Thu, 08 Feb 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Latency Tolerance for Dynamic Processors %A Bennett, James E. %A Flynn, Michael J. %D January 1996 %X While a number of dynamically scheduled processors have recently been brought to market, work on hardware techniques for tolerating memory latency has mostly targeted statically scheduled processors. This paper attempts to remedy this situation by examining the applicability of hardware latency tolerance techniques to dynamically scheduled processors. The results so far indicate that the inherent ability of the dynamically scheduled processor to tolerate memory latency reduces the need for additional hardware such as stream buffers or stride prediction tables. However, the technique of victim caching, while not usually considered as a latency tolerating technique, proves to be quite effective in aiding the dynamically scheduled processor in tolerating memory latency. For a fixed size investment in microprocessor chip area, the victim cache outperforms both stream buffers and stride prediction. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/687/CSL-TR-96-687.pdf %R CSL-TR-96-688 %Z Tue, 13 Feb 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T OS Support for Improving Data Locality on CC-NUMA Compute Servers %A Verghese, Ben %A Devine, Scott %A Gupta, Anoop %A Rosenblum, Mendel %D February 1996 %X The dominant architecture for the next generation of cache-coherent shared-memory multiprocessors is CC-NUMA (cache-coherent non-uniform memory architecture).
These machines are attractive as compute servers, because they provide transparent access to local and remote memory. However, the access latency to remote memory is 3 to 5 times the latency to local memory. Given the large remote access latencies, data locality is potentially the most important performance issue. In compute-server workloads, when processes are moved between nodes for load balancing, the OS needs to perform page-migration and page-replication to maintain data locality. Through trace-analysis and actual runs of realistic workloads, we study the potential improvements in performance provided by OS-supported dynamic migration and replication. Analyzing our kernel-based implementation of the policy, we provide a detailed breakdown of the costs and point out the functions using the most time. We study alternatives to using full-cache miss information to drive the policy, and show that sampling of cache misses can be used to reduce cost without compromising performance, and that TLB misses are inconsistent as an approximation for cache misses. Finally, our workload runs show that OS-supported dynamic page-migration and page-replication can substantially increase performance, by as much as 29% in some workloads. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/688/CSL-TR-96-688.pdf %R CSL-TR-96-689 %Z Tue, 20 Feb 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A Variable Latency Pipelined Floating-Point Adder %A Oberman, Stuart F. %A Flynn, Michael J. %D February 1996 %X Addition is the most frequent floating-point operation in modern microprocessors. Due to its complex shift-add-shift-round dataflow, floating-point addition can have a long latency. To achieve maximum system performance, it is necessary to design the floating-point adder to have minimum latency, while still providing maximum throughput. This paper proposes a new floating-point addition algorithm which exploits the ability of dynamically-scheduled processors to utilize functional units which complete in variable time. By recognizing that certain operand combinations do not require all of the steps in the complex addition dataflow, the average latency is reduced. Simulation on SPECfp92 applications demonstrates that a speedup in average addition latency of 1.33 can be achieved using this algorithm, while still maintaining single cycle throughput. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/689/CSL-TR-96-689.pdf %R CSL-TR-96-691 %Z Wed, 13 Mar 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T PPP: A Gate-Level Power Simulator - A World Wide Web Application %A Bogliolo, Alessandro %A Benini, Luca %A DeMicheli, Giovanni %A Ricco, Bruno %D March 1996 %X Power consumption is an increasingly important constraint for complex ICs. Accurate and efficient power estimations are required at any level of abstraction to steer the design process. PPP is a Web-based integrated environment for synthesis and simulation of low-power CMOS circuits. We describe the simulation engine of PPP and we propose a new paradigm for tool integration. The simulation engine of PPP is a gate-level simulator that achieves accuracy comparable with electrical simulation, while keeping performance competitive with traditional gate-level techniques. This is done by using advanced symbolic models of the basic library cells that exploit the understanding of the main phenomena involved in power consumption.
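(Illustrative aside: a first-order C sketch of gate-level dynamic power estimation from switching activity, P = sum over gates of a_i * C_i * Vdd^2 * f; PPP's symbolic cell models are far more detailed, so the formula and parameter names here are assumptions for scale only.)

    /* First-order dynamic power: per-gate switching activity times switched
       capacitance times Vdd^2 times clock frequency, summed over all gates;
       short-circuit and leakage components are ignored. */
    double dynamic_power(const double *activity, /* toggles per cycle, per gate */
                         const double *cap,      /* switched capacitance (F) */
                         int ngates, double vdd, double freq)
    {
        double p = 0.0;
        for (int i = 0; i < ngates; i++)
            p += activity[i] * cap[i] * vdd * vdd * freq;
        return p;                                /* watts */
    }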
In order to maintain full compatibility with gate-level design tools, we use VERILOG-XL as the simulation platform. The accuracy obtained on benchmark circuits is always within 6% of SPICE, even for single-gate/single-pattern power analysis, thus providing the local information needed to optimize the design. Interface and tool integration issues have been addressed using a Web-based approach. The graphical interface of PPP is a dynamically generated tree of interactive HTML pages that allow the user to access and execute the tool through the Internet by using his/her own Web-browser. No software installation is required and all the details of data transfer and tool communication are hidden from the user. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/691/CSL-TR-96-691.pdf %R CSL-TR-96-690 %Z Mon, 01 Apr 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Analysis and Synthesis of Concurrent Digital Systems Using Control-Flow Expressions %A Coelho, Claudionor Jose Nunes Jr. %D March 1996 %X We present in this thesis a modeling style and control synthesis technique for system-level specifications that are better described as a set of concurrent descriptions, their synchronizations and complex constraints. For these types of specifications, conventional synthesis tools will not be able to enforce design constraints because these tools are targeted to sequential components with simple design constraints. In order to generate controllers satisfying the constraints of system-level specifications, we propose a synthesis tool called Thalia that considers the degrees of freedom introduced by the concurrent models and by the system's environment. The synthesis procedure will be subdivided into the following steps: We first model the specification in an algebraic formalism called control-flow expressions, which considers most of the language constructs used to model systems reacting to their environment, i.e. sequential, alternative, concurrent, iterative, and exception handling behaviors. Such constructs are found in languages such as C, Verilog HDL, VHDL, Esterel and StateCharts. Then, we convert this model and a suitable representation for the environment into a finite-state machine, where the system is analyzed, and design constraints such as timing, resource and synchronization are incorporated. In order to generate the control-units for the design, we present two scheduling procedures. The first procedure, called static scheduling, attempts to find fixed schedules for operations satisfying system-level constraints. The second procedure, called dynamic scheduling, attempts to synchronize concurrent parts of a circuit description by dynamically selecting schedules according to a global view of the system. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/690/CSL-TR-96-690.pdf %R CSL-TR-96-694 %Z Mon, 22 Apr 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Analysis and Synthesis of Concurrent Digital Circuits Using Control-Flow Expressions %A Coelho, Claudionor Nunes Jr. %A DeMicheli, Giovanni %D April 1996 %X We present in this paper a novel modeling style and control synthesis technique for system-level specifications that are better described as a set of concurrent descriptions, their synchronizations and constraints. The proposed synthesis procedure considers the degrees of freedom introduced by the concurrent models and by the environment in order to satisfy the design constraints. Synthesis is divided into two phases.
In the first phase, the original specification is translated into an algebraic system, for which complex control-flow constraints and quantifiers of the design are introduced. In the second phase, we translate the algebraic formulation into a finite-state representation, and we derive an optimal control-unit implementation for each individual concurrent part. In the implementation of the controllers from the finite-state representation, we use flexible objective functions, which allow designers to better control the goals of the synthesis tool, and thus incorporate as much as possible their knowledge about the environment and the design. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/694/CSL-TR-96-694.pdf %R CSL-TR-96-696 %Z Mon, 10 Jun 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Computer Assisted Analysis of Multiprocessor Memory Systems %A Park, Seungjoon %D June 1996 %X In a shared memory multiprocessor architecture, a memory model describes the behavior of the memory system as observed at the user level. A cache coherence protocol aims to conform to a memory model by maintaining consistency among the multiple copies of cached data and the data in main memory. Memory models and cache coherence protocols can be quite complex and subtle, creating a real possibility of misunderstandings and actual design errors. In this thesis, we will present solutions to these problems. Though weaker memory models for multiprocessor systems allow higher-performance implementation techniques, they are also very subtle. Hence, it is vital to specify memory models precisely and to verify that the programs running under a memory model satisfy desired properties. Our approach to these problems is to write an executable specification of the memory model using a high-level description language for concurrent systems. This executable description provides a precise specification of the machine architecture for implementors and programmers. Moreover, the availability of formal verification tools allows users to experiment with the effects of the memory model on small assembly-language routines. Running the verifier can be very effective at clarifying the subtle details of the models and synchronization routines. Cache coherence protocols, like other protocols for distributed systems, simulate atomic transactions in environments where atomic implementations are impossible. Based on this observation, we propose a verification method which compares an implementation with a specification representing the desired abstract behavior. The comparison is done through an aggregation function, which maps the sequence of implementation steps for each transaction to the corresponding transaction step in the specification. The aggregation approach is applied to verification of the cache coherence protocol in the FLASH multiprocessor system. The protocol, consisting of more than a hundred implementation steps, is proved to conform to a reduced description with six kinds of atomic transactions. From the reduced behavior, it is very easy to prove crucial properties of the protocol including data consistency of cached copies. The aggregation method is also used to prove that the reduced protocol satisfies a desired memory consistency model.
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/696/CSL-TR-96-696.pdf %R CSL-TR-96-692 %Z Wed, 12 Jun 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Delay Balancing of Wave Pipelined Multiplier Counter Trees Using Pass Transistor Multiplexers %A Kishigami, Hidechika %A Nowka, Kevin J. %A Flynn, Michael J. %D January 1996 %X Wave pipelining is an attractive technique used in high-speed digital circuits to speed up the pipeline clock-rate by eliminating the synchronizing elements between pipeline stages. Wave-pipelining has been successfully applied to the design of CMOS multipliers which have demonstrated clock-rate speed-ups of 4 to 7 times over their non-pipelined designs. In order to achieve high clock-rate by using wave-pipelining techniques, it is necessary to equalize (balance) all signal path delays of the circuit. In an earlier study, a multiplier was designed by using only 2-input NAND gates and inverters as primitives in order to reduce delay variations of the circuit. Alternatively, there are several reports that use pass-transistor logic as primitives for multipliers to achieve very low latency. Pass-transistor logic seems attractive for reducing circuit delay variations. In this report we describe a design of a wave-pipelined counter tree, which is a central part of a parallel multiplier, and detail a method to balance the delay of a (4,2) counter using pass-transistor multiplexers (PTMs) as primitives to achieve both higher clock-rate and smaller latency. Simulations of the wave-pipelined counter tree demonstrated 0.8ns clock-rate and 2.33ns latency through the use of pass-transistor multiplexers (PTMs) for a 0.8-um CMOS process. This data suggests that using pass-transistor multiplexers as primitives for wave-pipelined circuits is useful to achieve both higher clock-rate and lower latency. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/692/CSL-TR-96-692.pdf %R CSL-TR-96-693 %Z Wed, 12 Jun 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T High-Performance CMOS System Design Using Wave Pipelining %A Nowka, Kevin J. %D January 1996 %X Wave pipelining, or maximum rate pipelining, is a circuit design technique that allows digital synchronous systems to be clocked at rates higher than can be achieved with conventional pipelining techniques. It relies on the predictable finite signal propagation delay through combinational logic for virtual data storage. Wave pipelining of combinational circuits has been shown to achieve clock rates 2 to 7 times those possible for the same circuits with conventional pipelining. Conventional pipelined systems allow data to propagate from a register through the combinational network to another register prior to initiating the subsequent data transfer. Thus, the maximum operating frequency is determined by the maximum propagation delay through the longest pipeline stage. Wave pipeline systems apply the subsequent data to the network as soon as it can be guaranteed that it will not interfere with the current data wave. The maximum operating frequency of a wave pipeline is therefore determined by the difference between the maximum propagation delay and the minimum propagation delay through the combinational logic. By minimizing variations in delay, the performance of wave pipelining is maximized.
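(Illustrative aside: the first-order timing relation behind this point, as a small C sketch; register and sampling overheads are folded into a single assumed term t_ovh, and the thesis refines these bounds for process and environmental variation.)

    /* First-order minimum clock periods: a conventional pipeline is limited
       by its longest path, a wave pipeline by the spread between longest and
       shortest paths -- hence the payoff of delay balancing. */
    double min_period_conventional(double t_max, double t_ovh)
    {
        return t_max + t_ovh;
    }

    double min_period_wave(double t_max, double t_min, double t_ovh)
    {
        return (t_max - t_min) + t_ovh;      /* raising t_min toward t_max helps */
    }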
Data wave interference in CMOS VLSI circuits is the result of the variation in the propagation delay due to path length differences, differences in the state of the network inputs and intermediate nodes, and differences in fabrication and environmental conditions. To maximize the performance of wave pipelined circuits, the path length variations through the combinational logic must be minimized. A method of modifying the transistor geometries of individual static CMOS gates so as to tune their delays has been developed. This method is used by CAD tools that minimize the path length variation. These tools are used to equalize delays within a wave pipelined logic block and to synchronize separate wave pipelined units which share a common reference clock. This method has been demonstrated to limit the variation in delay of CMOS circuits to less than 20%. Delay models have demonstrated that temperature variation, power supply variation, and noise limit the number of concurrent waves in CMOS wave pipelined systems to three or fewer. Run-to-run process variation can have a significant impact on CMOS VLSI signal propagation delay. The ratio of maximum to minimum delay along the same path for seven different runs of a 0.8-micron feature size fabrication process was found to be 1.35. Unless this variation is controlled, the speedup of wave pipelining is limited to a factor of two to three to ensure that devices from any of these runs will operate. When aggregated with variations due to environmental factors, the maximum speed-up of a wave pipeline is less than two. To counteract the effects of process variation, an adaptive supply voltage technique has been developed. An on-chip detector circuit determines when delays are faster than nominal, and the power supply is lowered accordingly. In this manner, ICs fabricated with fast processes are run at a lower supply voltage to ensure correct operation at the design target frequency. To demonstrate that wave pipeline technology can be applied to VLSI system design, a CMOS wave pipelined vector unit has been developed. Extensive use of wave pipelining was employed to achieve high clock rates in the functional units. The VLSI processor consists of a wave pipelined vector register file, a wave pipelined adder, a wave pipelined multiplier, load and store units, an instruction buffer, a scoreboard, and control logic. The VLSI vector unit contains approximately 47000 transistors and occupies an area of 43 sq mm. It has been fabricated in a 0.8-micron CMOS technology. Tests indicate wave pipelined operation at a maximum rate of 303MHz. An equivalent vector unit using traditional latch-based pipelining was designed and simulated. The latch-based design occupied 2% more die area, operated with a 35% longer clock period, and had multiply latency 8% longer and add latency 11% longer than the wave pipelined vector unit. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/693/CSL-TR-96-693.pdf %R CSL-TR-96-695 %Z Wed, 12 Jun 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Producer-Oriented versus Consumer-Oriented Prefetching: a Comparison and Analysis of Parallel Application Programs %A Ohara, Moriyoshi %D June 1996 %X Due to large remote-memory latencies, reducing the impact of cache misses is critical for large scale shared-memory multiprocessors. This thesis quantitatively compares two classes of software-controlled prefetch schemes for reducing the impact: consumer-oriented and producer-oriented schemes.
Examining the behavior of these schemes leads us to characterize the communication behavior of parallel application programs. Consumer-oriented prefetch has been shown to be effective for hiding large memory latencies. Producer-oriented prefetch (called deliver), on the other hand, has not been extensively studied. Our implementation of deliver uses a hardware mechanism that tracks the set of potential consumers based on past sharing patterns. Qualitatively, deliver has an advantage since the producer sends the datum as soon as, but not before, it is ready for use. In contrast, prefetch may fetch the datum too early so that it is invalidated before use, or may fetch it too late so that the datum is not yet available when it is needed by the consumer. Our simulation results indeed show that the qualitative advantage of deliver can yield a slight performance advantage when the cache size and the memory latency are very large. Overall, however, deliver turns out to be less effective than prefetch for two reasons. First, prefetch benefits from a "filtering effect," and thus generates less traffic than deliver. Second, deliver suffers more from cache interference than prefetch. The sharing and temporal characteristics of a set of parallel applications are shown to account for the different behavior of the two prefetch schemes. This analysis shows the inherent difficulties in predicting future communication behavior of parallel applications from the recent history of that behavior. This suggests that cache accesses involved in coherence are, in general, much less predictable from past behavior than other types of cache behavior. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/695/CSL-TR-96-695.pdf %R CSL-TR-96-698 %Z Tue, 16 Jul 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Technology Scaling Effects on Multipliers %A Al-Twaijry, Hesham %A Flynn, Michael J. %D July 1996 %X Booth encoding is a method of reducing the number of summands required to produce the multiplication result. This paper compares the performance/area tradeoffs for the different Booth algorithms when trees are used as the summation network. This paper shows that the simple non-Booth algorithm is not an efficient design, and that for small feature sizes the performance of the different Booth encoding schemes is comparable in terms of delay. The report also quantifies the effects of wires on the multiplier. As the feature size continues to decrease, wires will contribute an ever-increasing portion of the total delay. Booth 3 becomes more attractive since it is smaller. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/698/CSL-TR-96-698.pdf %R CSL-TR-96-700 %Z Tue, 23 Jul 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Fast IEEE Rounding for Division by Functional Iteration %A Oberman, Stuart F. %A Flynn, Michael J. %D July 1996 %X A class of high performance division algorithms is functional iteration. Division by functional iteration uses multiplication as the fundamental operator. The main advantage of division by functional iteration is quadratic convergence to the quotient. However, unlike non-restoring division algorithms such as SRT division, functional iteration does not directly provide a final remainder. This makes fast and exact rounding difficult. This paper clarifies the methodology for correct IEEE-compliant rounding for quadratically-converging division algorithms.
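(Illustrative aside: a textbook C sketch of the quadratically-converging iteration class discussed here, for a divisor normalized to [0.5,1); the linear seed and iteration count are standard assumptions, and the concluding back multiplication is the step whose frequency the paper's techniques reduce.)

    /* Newton-Raphson division: x <- x*(2 - d*x) doubles the number of
       correct reciprocal bits per step (quadratic convergence). */
    double nr_divide(double a, double d)     /* assumes d in [0.5, 1) */
    {
        double x = 48.0/17.0 - (32.0/17.0) * d;  /* seed, rel. error <= 1/17 */
        for (int k = 0; k < 4; k++)
            x = x * (2.0 - d * x);           /* 4 steps reach ~double precision */
        double q = a * x;                    /* quotient, not yet IEEE-rounded */
        double r = a - q * d;                /* back multiplication: sign of r */
        (void)r;                             /* would drive the final rounding */
        return q;
    }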
It proposes an extension to previously reported techniques of using extended precision in the computation to reduce the frequency of back multiplications required to obtain the final remainder. Further, a technique applicable to all IEEE rounding modes is presented which replaces the final subtraction for remainder computation with very simple combinational logic. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/700/CSL-TR-96-700.pdf %R CSL-TR-96-699 %Z Mon, 09 Sep 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Efficient Multiprocessor Communications: Networks, Algorithms, Simulation, and Implementation %A Lu, Yen-Wen %D July 1996 %X As technology and processing power continue to improve, inter-processor communication becomes a performance bottleneck in a multiprocessor network. In this dissertation, an enhanced 2-D torus with a segmented reconfigurable bus (SRB) was proposed and analyzed to overcome the delay due to long-distance communications. A procedure for selecting an optimal segment length and segment alignment, based on minimizing the lifetime of a packet and reducing the interaction between segments, was developed to design an SRB network. Simulation shows that a torus with SRB is more than twice as efficient as a traditional torus. Efficient use of channel bandwidth is an important issue in improving network performance. The communication links between two adjacent nodes can be organized as a pair of opposite uni-directional channels, or combined into a single bi-directional channel. A modified channel arbitration scheme with hidden delay, called ``token-exchange,'' was designed for the bi-directional channel configuration. In spite of the overhead of channel arbitration, simulation shows that bi-directional channels have significantly better latency-throughput performance and can sustain higher data bandwidth relative to uni-directional channels of the same channel width. For example, under 2% hot-spot traffic, bi-directional channels can support 80% more bandwidth without saturation compared with uni-directional channels. An efficient, low-power, wormhole data router chip for 2-D mesh and torus networks with bi-directional channels and token-exchange arbitration was designed and implemented. The token-exchange delay is fully hidden and no latency penalty occurs when there is no traffic contention; the token-exchange delay is also negligible when the contention is high. Distributed decoders and arbiters are provided for each of four I/O ports, and a fully-connected 5x6 crossbar switch increases parallelism of data routing. The router also provides special hardware such as flexible header decoding and switching to support path-based multicasting. From measured results, multicasting with two destinations used only 1/3 of the energy required for unicasting. The wormhole router was fabricated using MOSIS/HP 0.6-micron technology. It delivers 1.6 Gb/s (50 MHz) at Vdd = 2.1 V, consuming an average power of 15 mW. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/699/CSL-TR-96-699.pdf %R CSL-TR-96-701 %Z Thu, 29 Aug 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Characterization of Quality and Traffic for Various Video Encoding Schemes and Various Encoder Control Schemes %A Dalgic, Ismail %A Tobagi, Fouad A. %D August 1996 %X Lossy video compression algorithms, such as those used in the H.261, MPEG, and JPEG standards, result in quality degradation seen in the form of digital tiling, edge busyness, and mosquito noise.
The encoder parameters (typically, the so-called quantizer scale) can be adjusted to trade off encoded video quality and bit rate. Clearly, when more bits are used to represent a given scene, the quality improves. However, for a given set of encoder parameter values, both the generated traffic and the resulting quality depend on the scene content. Therefore, in order to achieve certain quality and traffic objectives at all times, the encoder parameters must be appropriately adjusted according to the scene content. Currently, two schemes exist for setting the encoder parameters. The most commonly used scheme today is called Constant Bit Rate (CBR), where the encoder parameters are controlled to achieve a target bit rate over time by considering a hypothetical rate control buffer at the encoder's output which is drained at the target bit rate; the buffer occupancy level is used as feedback to control the quantizer scale. In a CBR encoded video stream, the quality varies in time, since the quantizer scale is controlled to achieve a constant bit rate regardless of the scene complexity. In the other existing scheme, called Open-Loop Variable Bit Rate (OL-VBR), all encoder parameters are simply kept fixed at all times. The motivation behind this scheme is presumably to provide more consistent video quality than CBR encoding. In this report, we characterize the traffic and quality for the CBR and OL-VBR schemes by using several video sequences of different spatial and temporal characteristics, encoded using the H.261, MPEG, and motion-JPEG standards. We investigate the effect of the controller parameters (i.e., for CBR, target bit rate and rate control buffer size, and for OL-VBR, the fixed quantizer scale) and video content on the resulting traffic and quality. We show that with the CBR and OL-VBR schemes, the encoder control parameters can be chosen so as to achieve or exceed a given quality objective at all times; however, this can only be done by producing more bits than needed during some of the scenes. In order to produce only as many bits as needed to achieve a given quality objective, we propose a video encoder control scheme which maintains the quality of the encoded video at a constant level, referred to as Constant Quality VBR (CQ-VBR). This scheme is based on a quantitative video quality metric which is used in a feedback control mechanism to adjust the encoder parameters. We determine the appropriate feedback functions for the H.261, MPEG, and motion-JPEG standards. We show that this scheme is indeed able to achieve a constant quality at all times; however, the resulting traffic occasionally contains bursts of relatively high magnitude (5-10 times the average) but short duration (5-15 frames). We then introduce a modification to this scheme, where in addition to the quality, the peak rate of the traffic is also controlled. We show that with the modified scheme, it is possible to achieve nearly constant video quality while keeping the peak rate within 2-3 times the average. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/701/CSL-TR-96-701.pdf %R CSL-TR-96-702 %Z Thu, 05 Sep 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Performance Evaluation of Ethernets and ATM Networks Carrying Video Traffic %A Dalgic, Ismail %A Tobagi, Fouad A. %D August 1996 %X In this report, the performance of Ethernets (10Base-T and 100Base-T) and ATM networks carrying multimedia traffic is presented.
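The CBR feedback loop of CSL-TR-96-701 above can be sketched as follows; the control law and all constants are invented for illustration and stand in for whatever a real encoder uses.

    # Illustrative CBR encoder control loop (constants are made up, and the
    # linear control law is a stand-in for a real encoder's rate control).
    def cbr_control(frame_bits, target_bps, fps=30, buffer_bits=500_000, gain=30.0):
        """Yield a quantizer scale per frame from a virtual buffer's occupancy."""
        drain_per_frame = target_bps / fps
        occupancy = buffer_bits / 2            # start half full
        for bits in frame_bits:
            occupancy += bits - drain_per_frame
            occupancy = max(0.0, min(buffer_bits, occupancy))
            # Fuller buffer -> coarser quantization -> fewer bits next frame.
            yield max(1, min(31, round(1 + gain * occupancy / buffer_bits)))

Feeding it a burst of large frames pushes the quantizer scale upward, which is exactly the mechanism behind the quality dips CBR exhibits on complex scenes.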
End-to-end delay requirements suitable for a wide range of multimedia applications are considered (ranging from 20 ms to 500 ms). Given the nature of the network considered and the maximum latency requirement, some data is lost. Data loss at the receiver causes quality degradations in the displayed video in the form of discontinuities, referred to as glitches. We define various quantities characterizing the glitches, namely, the total amount of information lost in glitches, their duration, and the rate at which glitches occur. We study these quantities for various network and traffic scenarios, using a computer simulation model driven by real video traffic generated by encoding video sequences. We also determine the maximum number of video streams that can be supported for a given maximum delay requirement and glitch rate. We consider and compare the results for various types of video contents (video conferencing, motion pictures, commercials), two encoding schemes (H.261 and MPEG-1), and two encoder control schemes [Constant Bit Rate (CBR) and Constant-Quality Variable Bit Rate (CQ-VBR)], considering also scenarios where the traffic consists of various mixtures of the above. We show that when the video content is highly variable, both 100Base-T Ethernet and ATM can support many more CQ-VBR streams than CBR streams. When the video content is not very variable, as in a videoconferencing sequence, the numbers of CBR and CQ-VBR streams that can be supported are comparable. For low values of end-to-end delay requirement, we show that ATM networks can support up to twice as many video streams of a given type as Ethernets for a channel capacity of 100Mb/s. For relaxed end-to-end delay requirements, both networks can support about the same number of video streams of a given type. We also determine the number of streams supportable for traffic scenarios consisting of mixtures of heterogeneous video traffic sources in terms of the video content, video encoding scheme and encoder control scheme, as well as the end-to-end delay requirement. We then consider multihop ATM network scenarios, and provide admission control guidelines for video when the network topology is an arbitrary mesh. Finally, we consider scenarios with mixtures of video and data traffic (with various degrees of burstiness), and determine the effect of one traffic type on the other. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/702/CSL-TR-96-702.pdf %R CSL-TR-96-705 %Z Mon, 09 Sep 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Rapide: A Language and Toolset for Simulation of Distributed Systems by Partial Orderings of Events. %A Luckham, David C. %D September 1996 %X This paper describes the RAPIDE concepts of system architecture, causal event simulation, and some of the tools for viewing and analysis of causal event simulations. The language and tools are illustrated by a small, detailed example. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/705/CSL-TR-96-705.pdf %R CSL-TR-96-706 %Z Wed, 25 Sep 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Optimum Placement and Routing of Multiplier Partial Product Trees %A Al-Twaijry, Hesham %A Flynn, Michael J. %D September 1996 %X An algorithm that builds a multiplier under the constraint of a limited number of wiring tracks has been designed and implemented. The resulting program is used to compare several designs of an IEEE floating point multiplier under several delay models.
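As a concrete instance of the Booth recoding evaluated in CSL-TR-96-698 and CSL-TR-96-706 above, here is a textbook radix-4 (Booth 2) recoder, which roughly halves the number of summands; this is standard recoding, not the reports' implementation.

    def booth2_digits(m, bits):
        """Radix-4 Booth recoding of a two's-complement multiplier.

        Returns digits in {-2,-1,0,1,2}, least-significant first. An n-bit
        multiplier yields about n/2 digits, i.e. about n/2 partial products
        instead of n -- the summand reduction the reports evaluate.
        """
        m &= (1 << bits) - 1
        padded = m << 1                     # implicit 0 below the LSB
        digits = []
        for i in range(0, bits, 2):
            window = (padded >> i) & 0b111  # overlapping 3-bit windows
            digits.append({0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
                           0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}[window])
        return digits

    def booth2_value(digits):
        return sum(d * 4**i for i, d in enumerate(digits))

    assert booth2_value(booth2_digits(93, 8)) == 93   # 4 digits, not 8 summands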
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/706/CSL-TR-96-706.pdf %R CSL-TR-96-703 %Z Thu, 10 Oct 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Test Point Insertion for Non-Feedback Bridging Faults %A Touba, Nur A. %A McCluskey, Edward J. %D August 1996 %X This paper studies pseudo-random pattern testing of bridging faults. Although bridging faults are generally more random pattern testable than stuck-at faults, examples are shown to illustrate that some bridging faults can be much less random pattern testable than stuck-at faults. A fast method for identifying these random-pattern-resistant bridging faults is described. It is shown that state-of-the-art test point insertion techniques, which are based on the stuck-at fault model, are inadequate. Data is presented which indicates that even after inserting test points that result in 100% single stuck-at fault coverage, many bridging faults are still not detected. A test point insertion procedure that targets both single stuck-at faults and non-feedback bridging faults is presented. It is shown that by considering both types of faults when selecting the location for test points, higher fault coverage can be obtained with little or no increase in overhead. Thus, the test point insertion procedure described here is a low-cost way to improve the quality of built-in self-test. While this paper considers only non-feedback bridging faults, the techniques that are described can be applied to feedback bridging faults in a straightforward manner. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/703/CSL-TR-96-703.pdf %R CSL-TR-96-704 %Z Thu, 10 Oct 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Synthesis Techniques for Pseudo-Random Built-In Self-Test %A Touba, Nur A. %D August 1996 %X Built-in self-test (BIST) techniques enable an integrated circuit (IC) to test itself. BIST reduces test and maintenance costs for an IC by eliminating the need for expensive test equipment and by allowing fast location of failed ICs in a system. BIST also allows an IC to be tested at its normal operating speed which is very important for detecting timing faults. Despite all of these advantages, BIST has seen limited use in industry because of area and performance overhead and increased design time. This dissertation presents automated techniques for implementing BIST in a way that minimizes area and performance overhead. A low-overhead approach for BIST is to use a linear feedback shift register (LFSR) to apply pseudorandom test patterns to the circuit-under-test. Unfortunately, many circuits contain random-pattern-resistant faults which limit the fault coverage that can be obtained for pseudo-random BIST. Several different approaches for solving this problem are presented. A logic synthesis procedure that performs testability-driven factoring to generate a random pattern testable design is presented. By considering random pattern testability during the factoring process, the overhead can be minimized. For hand-designed circuits or circuits that are not synthesizable, an innovative test point insertion procedure is described for inserting test points to make the circuit random pattern testable. A path tracing procedure is used for test point placement. A few of the existing primary inputs are ANDed together to form signals that drive the control points. These innovations result in fewer test points than previous methods. 
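The pseudo-random pattern source assumed throughout CSL-TR-96-703 and CSL-TR-96-704 is an LFSR; a minimal Fibonacci-style sketch follows, with an illustrative width and a known maximal-length tap set.

    def lfsr_patterns(seed=0b1010_1010, taps=(7, 5, 4, 3), width=8):
        """Generate pseudo-random test patterns from a Fibonacci LFSR.

        Taps (7, 5, 4, 3) correspond to x^8 + x^6 + x^5 + x^4 + 1, a
        maximal-length polynomial cycling through 2^8 - 1 states. This is
        the cheap on-chip pattern source that random-pattern-resistant
        faults defeat, motivating the test points and mapping logic above.
        """
        state = seed
        while True:
            yield state
            fb = 0
            for t in taps:
                fb ^= (state >> t) & 1
            state = ((state << 1) | fb) & ((1 << width) - 1)

    gen = lfsr_patterns()
    patterns = [next(gen) for _ in range(5)]   # first five test vectors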
If it is not possible or not desirable to modify the circuit-under-test, then a procedure is described for synthesizing mapping logic that can be placed at the output of the LFSR to transform the pseudorandom patterns so that they provide the required fault coverage. Much less overhead is required compared with weighted pattern testing methods. Lastly, a technique is described for placing bit-fixing logic at the serial output of an LFSR to embed deterministic test patterns for the random pattern resistant faults in the pseudorandom bit sequence. This method does not require any performance overhead beyond what is needed for scan. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/704/CSL-TR-96-704.pdf %R CSL-TR-96-711 %Z Thu, 12 Dec 96 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design Issues in High Performance Floating Point Arithmetic Units %A Oberman, Stuart Franklin %D December 1996 %X In recent years computer applications have increased in their computational complexity. The industry-wide usage of performance benchmarks, such as SPECmarks, forces processor designers to pay particular attention to implementation of the floating point unit, or FPU. Special purpose applications, such as high performance graphics rendering systems, have placed further demands on processors. High speed floating point hardware is a requirement to meet these increasing demands. This work examines the state-of-the-art in FPU design and proposes techniques for improving the performance and the performance/area ratio of future FPUs. In recent FPUs, emphasis has been placed on designing ever-faster adders and multipliers, with division receiving less attention. The design space of FP dividers is large, comprising five different classes of division algorithms: digit recurrence, functional iteration, very high radix, table look-up, and variable latency. While division is an infrequent operation even in floating point intensive applications, it is shown that ignoring its implementation can result in system performance degradation. A high performance FPU requires a fast and efficient adder, multiplier, and divider. The design question becomes how to best implement the FPU in order to maximize performance given the constraints of silicon die area. The system performance and area impact of functional unit latency is examined for varying instruction issue rates in the context of the SPECfp92 application suite. Performance implications are investigated for shared multiplication hardware, shared square root, on-the-fly rounding and conversion, and fused functional units. Due to the importance of low latency FP addition, a variable latency FP addition algorithm has been developed which improves average addition latency by 33% while maintaining single-cycle throughput. To improve the performance and area of linearly converging division algorithms, an automated process is proposed for minimizing the complexity of SRT tables. To reduce the average latency of quadratically-converging division algorithms, the technique of reciprocal caching is proposed, along with a method to reduce the latency penalty for exact rounding. A combination of the proposed techniques provides a basis for future high performance floating point units. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/711/CSL-TR-96-711.pdf %R CSL-TR-96-697 %Z Thu, 02 Jan 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The Design of SMART: A Scheduler for Multimedia Applications %A Nieh, Jason %A Lam, Monica S.
%D June 1996 %X We have created SMART, a Scheduler for Multimedia And Real-Time applications. SMART supports both real-time and conventional computations and provides flexible and accurate control over the sharing of processor time. SMART is able to satisfy real-time constraints in an optimal manner and provide proportional sharing across all real-time and conventional tasks. Furthermore, when not all real-time constraints can be met, SMART satisfies each real-time task's proportional share of deadlines, and adjusts its execution rate dynamically. This technique is especially important for multimedia applications that can operate at different rates depending on the loading condition. This paper presents the design of SMART and provides measured performance results of its effectiveness based on a prototype implementation in the Solaris operating system. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/697/CSL-TR-96-697.pdf %R CSL-TR-96-707 %Z Tue, 11 Feb 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Reducing Cache Miss Rates Using Prediction Caches %A Bennett, James E. %A Flynn, Michael J. %D October 1996 %X Processor cycle times are currently much faster than memory cycle times, and the trend has been for this gap to increase over time. The problem of increasing memory latency, relative to processor speed, has been dealt with by adding high speed cache memory. However, it is difficult to make a cache both large and fast, so that cache misses are expected to continue to have a significant performance impact. Prediction caches use a history of recent cache misses to predict future misses, and to reduce the overall cache miss rate. This paper describes several prediction caches, and introduces a new kind of prediction cache, which combines the features of prefetching and victim caching. This new cache is shown to be more effective at reducing miss rate and improving performance than existing prediction caches. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/707/CSL-TR-96-707.pdf %R CSL-TR-96-708 %Z Mon, 31 Mar 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Validation Tools for Complex Digital Designs %A Ho, Chian-Min Richard %D December 1996 %X The functional validation of a complex digital design is a laborious, ad-hoc and open-ended task. Many circuits are too complex to be formally verified in their entirety. Instead, simulation of a register transfer level (RTL) model is used. This research explores techniques to make the validation task more systematic, automated and efficient. This can be accomplished by using information embedded in the RTL model to extract the set of "interesting behaviors" of the design, represented as interacting finite state machines (FSM). If all such interesting behaviors of the RTL could be tested in simulation, the degree of confidence that the design is correct would be substantially higher. This work provides two tools towards this goal. First, a test vector generator is described that uses this information to produce a series of test vectors that exercise all the implemented behaviors of the design in RTL simulation. Secondly, the information can be used as the basis for coverage analysis of a pre-existing test vector suite. 
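The prediction-cache idea of CSL-TR-96-707 above can be sketched as a miss-history table; the table organization and size here are invented for illustration.

    from collections import OrderedDict

    class MissPredictor:
        """Toy prediction cache: maps a miss address to the next miss seen.

        On a miss, the predicted successor (if any) is returned as a
        prefetch candidate; the table is a small LRU standing in for the
        bounded miss history that hardware would keep.
        """
        def __init__(self, entries=64):
            self.table = OrderedDict()
            self.entries = entries
            self.last_miss = None

        def on_miss(self, addr):
            if self.last_miss is not None:
                self.table[self.last_miss] = addr      # record the transition
                self.table.move_to_end(self.last_miss)
                if len(self.table) > self.entries:
                    self.table.popitem(last=False)     # evict the LRU entry
            self.last_miss = addr
            return self.table.get(addr)                # prefetch hint or None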
Previous coverage metrics, such as toggles on a node in the circuit or code block execution counts, often give good first order indications of how thoroughly a circuit has been exercised but do not usually give an accurate picture of whether multiple or concurrent events have been exercised. In this thesis, a new method is proposed for analyzing test vector suite coverage based on projecting a minimized control state graph onto control signals that enter the datapath part of the design. The fundamental problem facing any technique that uses state exploration is state space explosion. Two techniques are proposed to minimize this problem: first, a dynamic state graph pruning algorithm based on static analysis of the model structure to provide an exact minimization; and second, approximation of the state graph with an estimation of the state space in a more compact representation. These techniques help delay the onset of state explosion, allowing useful information to be obtained and utilized, even for complex designs. Results and practical experiences of applying these techniques to the design of the node controller (MAGIC) of the Stanford FLASH Multiprocessor project are given. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/708/CSL-TR-96-708.pdf %R CSL-TR-96-710 %Z Mon, 14 Apr 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Executable Formal Models of Distributed Transaction Systems Based on Event Processing %A Kenney, John %D November 1996 %X This dissertation presents formal models of distributed transaction processing (DTP) that are executable and testable. These models apply a new technology, Rapide, an object-oriented executable architecture description language designed for specifying and prototyping distributed, time-sensitive systems. This dissertation shows how the Rapide technology can be applied to specify, prototype, and test DTP models. In particular, this dissertation specifies a reference architecture for the X/Open DTP industry standard. The reference architecture, written in Rapide, defines architectures and behaviors of systems that comply with the X/Open standard. This dissertation also applies a technique developed previously by Gennart and Luckham for testing applications for conformance with reference architectures. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/96/710/CSL-TR-96-710.pdf %R CSL-TR-97-717 %Z Mon, 07 Apr 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Automatic Synthesis of Sequential Circuits for Low Power Dissipation %A Benini, Luca %D February 1997 %X In high-performance digital CMOS systems, excessive power dissipation reduces reliability and increases the cost imposed by cooling systems and packaging. Power is obviously the primary concern for portable applications, since battery technology cannot keep up with the fast pace imposed by Moore's Law, and there is a large demand for devices with light batteries and long times between recharges. Computer-Aided Engineering is probably the only viable paradigm for designing state-of-the-art VLSI and ULSI systems, because it allows the designer to focus on the high-level trade-offs and to concentrate the human effort on the most critical parts of the design. We present a framework for the computer-aided design of low-power digital circuits. We propose several techniques for automatic power reduction based on paradigms which are widely used by designers.
Our main purpose is to provide the foundation for a new generation of CAD tools for power optimization under performance constraints. In the last decade, the automatic synthesis and optimization of digital circuits for minimum area and maximum performance has been extensively investigated. We leverage the knowledge base created by such research, but we acknowledge the distinctive characteristics of power as an optimization target. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/717/CSL-TR-97-717.pdf %R CSL-TR-97-713 %Z Thu, 17 Apr 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T From the Valley of Heart's Delight to Silicon Valley: A Study of Stanford University's Role in the Transformation %A Tajnai, Carolyn %D January 1997 %X This study examines the role of Stanford University in the transformation from the Valley of Heart's Delight to the Silicon Valley. At the dawn of the Twentieth Century, California's Santa Clara County was an agricultural paradise. Because of the benign climate and thousands of acres of fruit orchards, the area became known as the Valley of Heart's Delight. In the early 1890s, Leland and Jane Stanford donated land in the valley to build a university in memory of their son. Thus, Leland Stanford, Jr., University was founded. In the early 1930s, there were almost no jobs for young Stanford engineering graduates. This was about to change. Although there was no organized plan to help develop the economic base of the area around Stanford University, the concern about the lack of job opportunities for their graduates motivated Stanford faculty to begin the chain of events that led to the birth of Silicon Valley. Stanford University's role in the transformation of the Valley of Heart's Delight into Silicon Valley is history, but it is enduring history. Stanford continues to affect the local economy by spawning new and creative ideas, dreams, and ambitions. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/713/CSL-TR-97-713.pdf %R CSL-TR-97-714 %Z Thu, 17 Apr 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Parallelizing Compiler Techniques Based on Linear Inequalities %A Amarasinghe, Saman Prabhath %D January 1997 %X Shared-memory multiprocessors, built out of the latest microprocessors, are becoming a widely available class of computationally powerful machines. These affordable multiprocessors can potentially deliver supercomputer-like performance to the general public. To effectively harness the power of these machines, it is important to find all the available parallelism in programs. The Stanford SUIF interprocedural parallelizer we have developed is capable of detecting coarser granularity of parallelism in sequential scientific applications than previously possible. Specifically, it can parallelize loops that span numerous procedures and hundreds of lines of code, frequently requiring modifications to array data structures such as array privatization. Measurements from several standard benchmark suites demonstrate that aggressive interprocedural analyses can substantially advance the capability of automatic parallelization technology. However, locating parallelism is not sufficient for achieving high performance. It is critical to make effective use of the memory hierarchy. In parallel applications, false sharing and cache conflicts between processors can significantly reduce performance.
We have developed the first compiler that automatically performs a full suite of data transformations (a combination of transposing, strip-mining, and padding). The performance of many benchmarks improves drastically after the data transformations. We introduce a framework based on systems of linear inequalities for developing compiler algorithms. Many of the whole program analyses and aggressive optimizations in our compiler employ this framework. Using this framework, general solutions to many compiler problems can be found systematically. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/714/CSL-TR-97-714.pdf %R CSL-TR-97-719 %Z Tue, 15 Apr 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Automatic Computation and Data Decomposition for Multiprocessors %A Anderson, Jennifer-Ann Monique %D March 1997 %X Memory subsystem efficiency is critical to achieving high performance on parallel machines. The memory subsystem organization of modern multiprocessor architectures makes their performance highly sensitive to both the distribution of the computation and the layout of the data. A key issue in programming these machines is selecting the computation and data decomposition, the mapping of the computation and data, respectively, across the processors of the machine. A popular approach to the decomposition problem is to require programmers to perform the decomposition analysis themselves, and to communicate that information to the compiler using language extensions. This thesis presents a new compiler algorithm that automatically calculates computation and data decompositions for dense-matrix scientific codes. The core of the algorithm is based on a linear algebra framework for expressing and calculating decompositions. Since the best decompositions may change as different phases of the program are executed, the algorithm also considers re-organizing the data dynamically. The analysis is performed both within and across procedure boundaries so that entire programs can be analyzed. We evaluated the effectiveness of the algorithm by applying it to a suite of benchmark programs. We found that our decomposition analysis and optimization can lead to significant increases in program performance. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/719/CSL-TR-97-719.pdf %R CSL-TR-97-720 %Z Thu, 17 Apr 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A Simulation Study of IP Switching %A Lin, Steven %A McKeown, Nick %D April 1997 %X Recently there has been much interest in combining the speed of layer-2 switching with the features of layer-3 routing. This has been prompted by numerous proposals, including: IP Switching, Tag Switching, ARIS, CSR, and IP over ATM. In this paper, we study IP Switching and evaluate the performance claims made by Newman et al. In particular, using nine network traces, we study how well IP Switching performs with traffic found in campus, corporate, and Internet Service Provider (ISP) environments. Our main finding is that IP Switching will lead to a high proportion of datagrams that are switched: over 75% in all of the environments we studied. We also investigate the effects that different flow classifiers and various timer values have on performance, and note that some choices can result in a large VC space requirement. Finally, we present recommendations for the flow classifier and timer values, as a function of the VC space of the switch and the network environment being served.
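The flow classifier and timer studied in CSL-TR-97-720 above can be sketched as follows; the packet-count trigger and idle timeout are tunable parameters of the kind the paper evaluates, with the values here chosen arbitrarily.

    import time

    class FlowClassifier:
        """Toy IP-switching classifier: promote a flow to a layer-2 VC once
        it has shown enough packets, and reclaim the VC after idleness."""

        def __init__(self, trigger_pkts=10, idle_timeout_s=60.0):
            self.trigger = trigger_pkts
            self.timeout = idle_timeout_s
            self.flows = {}                      # flow id -> (pkt count, last seen)

        def packet(self, flow_id, now=None):
            now = time.monotonic() if now is None else now
            count, _ = self.flows.get(flow_id, (0, now))
            self.flows[flow_id] = (count + 1, now)
            return count + 1 >= self.trigger     # True: switch it; False: route it

        def expire(self, now):
            """Reclaim VC space: drop flows idle longer than the timeout."""
            self.flows = {f: (c, t) for f, (c, t) in self.flows.items()
                          if now - t <= self.timeout}

A longer timeout switches more datagrams but holds VCs longer, which is the VC-space trade-off the report quantifies.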
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/720/CSL-TR-97-720.pdf %R CSL-TR-97-723 %Z Tue, 22 Apr 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Hierarchical Storage Systems for Interactive Video-On-Demand %A Chan, Shueng-Han Gary %A Tobagi, Fouad A. %D April 1997 %X On-demand video servers based on hierarchical storage systems are able to offer high-capacity and low-cost video storage. In such a system, video files are stored in the tertiary level and transferred to the secondary level to be displayed. Designing such servers to allow user interaction with the played-back video is of great interest. We have conducted a comprehensive study on the architecture and operation of such a VOD server. Our objective is to understand its performance characteristics, so as to design a video server to meet specific application requirements. Applications of interest include distance-learning, movie-on-demand, interactive news, home-shopping, etc. The design of such a server actually involves many design choices pertaining to both architecture and operational procedures. We first study through simulation a baseline system which captures the essential performance characteristics of a hierarchical storage system. Then we extend our study beyond the baseline, covering numerous other system variations in terms of architectural parameters and operational procedures. We have also examined the effect of various application characteristics, such as file size and video popularity, on system performance. We demonstrate the usefulness of our results by applying them to the design of a video server taking into account current storage technologies. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/723/CSL-TR-97-723.pdf %R CSL-TR-97-718 %Z Wed, 17 Feb 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Fault Tolerance: Methods of Rollback Recovery %A Sunada, Dwight %A Glasco, David %A Flynn, Michael %D March 1997 %X This paper describes the latest methods of rollback recovery for fault-tolerant distributed shared memory (DSM) multiprocessors. This report discusses (1) the theoretical issues that rollback recovery addresses, (2) the three major classes of methods for recovery, and (3) the relative merits of each class. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/718/CSL-TR-97-718.pdf %R CSL-TR-97-724 %Z Thu, 19 Jun 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T State Reduction Methods for Automatic Formal Verification %A Ip, C. Norris %D December 1996 %X Validation of industrial designs is becoming more challenging as technology advances. One of the most suitable debugging aids is automatic formal verification. This thesis presents several techniques for reducing the state explosion problem, that is, reducing the number of states that are examined. A major contribution of this thesis is the design of simple extensions to the Murphi description language, which enable us to convert two existing abstraction strategies into two fully automatic algorithms, making these strategies easy to use and safe to apply. These two algorithms rely on two facts about high-level designs: they frequently exhibit structural symmetry, and their behavior is often independent of the exact number of replicated components they contain. Another contribution is the design of a new state reduction algorithm, which relies on reversible rules (transitions that do not lose information) in a system description.
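A minimal sketch of the symmetry reduction that the Murphi extensions of CSL-TR-97-724 automate: canonicalize each state before storing it, so that all permutations of interchangeable replicated components collapse to one representative. The state encoding below is invented for illustration.

    def canonical(state):
        """Map a state to its symmetry-class representative by sorting the
        local states of replicated, interchangeable components."""
        return tuple(sorted(state))

    def reachable(initial, successors):
        """Explicit-state search storing only canonical representatives,
        which can shrink the stored graph by up to n! for n replicas."""
        seen = {canonical(initial)}
        frontier = [initial]
        while frontier:
            s = frontier.pop()
            for t in successors(s):
                c = canonical(t)
                if c not in seen:
                    seen.add(c)
                    frontier.append(t)
        return seen

    def succ(state):
        """One cache line moves from 'I' to 'S'; lines are interchangeable."""
        return [state[:i] + ('S',) + state[i + 1:]
                for i, v in enumerate(state) if v == 'I']

    print(len(reachable(('I', 'I'), succ)))  # 3 canonical states instead of 4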
This new reduction algorithm can be used simultaneously with the other two algorithms. These techniques, implemented in the Murphi verification system, have been applied to many applications, such as cache coherence protocols and distributed algorithms. In the cases of two important classes of infinite systems, infinite state graphs can be automatically converted to small finite state graphs. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/724/CSL-TR-97-724.pdf %R CSL-TR-97-712 %Z Wed, 09 Jul 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Hive: Operating System Fault Containment for Shared-Memory Multiprocessors %A Chapin, John %D July 1997 %X Reliability and scalability are major concerns when designing general-purpose operating systems for large-scale shared-memory multiprocessors. This dissertation describes Hive, an operating system with a novel kernel architecture that addresses these issues. Hive is structured as an internal distributed system of independent kernels called cells. This architecture improves reliability because a hardware or software error damages only one cell rather than the whole system. The architecture improves scalability because few kernel resources are shared by processes running on different cells. The Hive prototype is a complete implementation of UNIX SVR4 and is targeted to run on the Stanford FLASH multiprocessor. The research described in the dissertation makes three primary contributions: (1) it demonstrates that distributed system mechanisms can be used to provide fault containment inside a shared-memory multiprocessor; (2) it provides a specification for a set of hardware features, implemented in the Stanford FLASH, that are sufficient to support fault containment; and (3) it demonstrates how to take advantage of shared-memory hardware across cell boundaries at both application and kernel levels while preserving fault containment. The dissertation also analyzes the architectural and performance tradeoffs of multicellular kernels. Fault injection experiments conducted using the SimOS machine simulator demonstrate the reliability of the Hive prototype. Studies using both general-purpose and scientific workloads illustrate the performance tradeoffs of the multicellular kernel architecture. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/712/CSL-TR-97-712.pdf %R CSL-TR-97-730 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Performance Isolation and Resource Sharing on Shared-Memory Multiprocessors %A Verghese, Ben %A Gupta, Anoop %A Rosenblum, Mendel %D July 1997 %X Shared-memory multiprocessors are attractive as general-purpose compute servers. On the software side, they present programmers with the same programming paradigm as uniprocessors, and they can run unmodified uniprocessor binaries. On the hardware side, the tight coupling of multiple processors, memory, and I/O enables efficient fine-grain sharing of resources on these systems. This fine-grain sharing is important in compute servers because it allows idle resources to be easily utilized by active jobs, leading to better system throughput. However, current SMP operating systems do not provide an important feature that users of workstations enjoy, namely freedom from interference by the jobs of unrelated users. We show that this lack of isolation is caused by the resource allocation model carried over from single-user workstations, which is inappropriate for multi-user multiprocessor systems.
We propose "performance isolation", a new resource allocation model for multi-user multiprocessor compute servers. This model allows the isolation of the performance of groups of processes from the load on the rest of the system, provides performance comparable to a smaller system that corresponds to the resources used, and allows the sharing of idle resources for throughput comparable to a SMP OS. We implement the performance isolation model in the IRIX5.3 operating system for three important system resources: CPU time, memory, and disk bandwidth. Our implementation of fairness for disk bandwidth is novel. Running a number of workloads we show that this model is very successful at providing workstation-like latencies under heavy load and SMP-like latencies under light load. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/730/CSL-TR-97-730.pdf %R CSL-TR-97-729 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Remote Memory Access in Workstation Clusters %A Verghese, Ben %A Rosenblum, Mendel %D July 1997 %X Efficient sharing of memory resources in a cluster of workstations has the promise of greatly improving the performance and cost-effectiveness of the cluster when running large memory- intensive jobs. A point of interest is the hardware support required for good memory sharing performance. We evaluate the performance of two models: the software-only model that runs on a traditional distributed system configuration, and requires support from the operating system to access remote memory; and the hardware-intensive model that uses a specialized network interface to extend the memory system to allow direct access to remote memory. Using SimOS, we do a fair comparison of the performance of the two memory-sharing models for a set of interesting compute-server workloads. We find that the software-only model, with current remote page-fault latencies, does not provide acceptable memory-sharing performance. The hardware shared-memory system is able to provide stable performance across a range of latencies. If the remote page-fault latency can be reduced to 100 microseconds, the performance of the software- only model becomes acceptable for many, though not all, workloads. Considering the interconnection bandwidth required to sustain the software-only page-level memory sharing, our experiments show that a gigabit network is necessary for good performance. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/729/CSL-TR-97-729.pdf %R CSL-TR-97-725 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Designing reliable programs with RAPIDE %A Madhav, Neel %A Luckham, David C. %D April 1997 %X Rapide is a language for prototyping large, distributed systems. Rapide allows the scale design of a system to be constructed and analyzed before resources are applied to the construction of the actual system. Two important facets of designing reliable systems are (1) system architecture -- the components in the system and the communication paths between the componnts, and (2) system behavior -- the requirements on the components and the communication. Rapide facilitates the design of system architecture and behavior by (1) providing language features to realize system designs, (2) providing an expressive model for capturing the execution behavior of systems, and (3) providing techniques and tools for analyzing system execution behavior. 
This paper introduces the essential concepts of Rapide and gives an example of system design using Rapide. Rapide has four sublanguages -- (1) a type language, (2) an architecture definition language, (3) a constraint language, and (4) an executable language. The paper introduces the Rapide architecture sublanguage and the Rapide constraint sublanguage. The Rapide model of system execution is a set of significant events partially ordered by causality (also called posets). This paper discusses Rapide execution models and compares them with totally ordered, event-based models. Rapide provides tools to check constraints on posets, to browse posets, and to animate events on a system architecture. This paper briefly discusses the Rapide analysis tools. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/725/CSL-TR-97-725.pdf %R CSL-TR-97-715 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Software and Hardware for Exploiting Speculative Parallelism with a Multiprocessor %A Oplinger, Jeffrey %A Heine, David %A Liao, Shih-Wei %A Nayfeh, Basem A. %A Lam, Monica S. %A Olukotun, Kunle %D February 1997 %X Thread-level speculation (TLS) makes it possible to parallelize general purpose C programs. This paper proposes software and hardware mechanisms that support speculative thread-level execution on a single-chip multiprocessor. A detailed analysis of programs using the TLS execution model shows a promising bound on the performance of a TLS machine. In particular, TLS makes it feasible to find speculative doacross parallelism in outer loops that can greatly improve the performance of general-purpose applications. Exploiting speculative thread-level parallelism on a multiprocessor requires the compiler to determine where to speculate, and to generate SPMD (single program multiple data) code. We have developed a fully automatic compiler system that uses profile information to determine the best loops to execute speculatively, and to generate the synchronization code that improves the performance of speculative execution. The hardware mechanisms required to support speculation are simple extensions to the cache hierarchy of a single chip multiprocessor. We show that with our proposed mechanisms, thread-level speculation provides significant performance benefits. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/715/CSL-TR-97-715.pdf %R CSL-TR-97-728 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Defining a Security Reference Architecture %A Meldal, Sigurd %A Luckham, David %D June 1997 %X This report discusses the definition and modeling of reference architectures that specify the security aspects of distributed systems. NSA's MISSI (Multilevel Information System Security Initiative) security reference architecture is used as an illustrative example. We show how one would define such a reference architecture, and how one could use such a definition to model as well as check implementations for compliance with the reference. We demonstrate that an ADL should have the capability not only to specify interfaces, connections, and operational constraints, but also to specify how an architecture is related to other architectures or to implementations. A reference architecture such as MISSI is defined in Rapide [10] as a set of hierarchical interface connection architectures [9].
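The causal partial orders (posets) underlying Rapide, as described in CSL-TR-97-725 above, can be made concrete with vector timestamps, a standard stand-in rather than Rapide's own representation: one event causally precedes another iff its clock is componentwise no greater.

    def leq(u, v):
        """Componentwise comparison of vector timestamps."""
        return all(a <= b for a, b in zip(u, v))

    def causally_ordered(e1, e2):
        """In a poset model, e1 -> e2 iff e1's clock <= e2's and they differ;
        otherwise the events are independent (concurrent), a distinction a
        totally ordered trace cannot express."""
        return leq(e1, e2) and e1 != e2

    send  = (1, 0)   # event on process 0
    recv  = (1, 1)   # event on process 1 after receiving from process 0
    other = (0, 1)   # independent event on process 1

    assert causally_ordered(send, recv)        # send happens-before receive
    assert not causally_ordered(send, other)   # concurrent: neither precedes
    assert not causally_ordered(other, send)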
Each Rapide interface connection architecture is a reference architecture - an abstract architecture that allows a number of different implementations, but which enforces common structure and communication rules. The hierarchical reference architecture defines the MISSI policies at different levels - at the level of enclaves communicating through a network, at the level of each enclave being a local area network with firewalls and workstations, and at the level of the individual workstations. The reference architecture defines standard components, communication patterns, and policies common to MISSI compliant networks of computer systems. A network of computers may be checked for conformance against the reference architecture. The report also shows how one can generate architecture scenarios of networks of communicating computers. The scenarios are constructed as Rapide executable models, and the behaviors of the models can be checked for conformance with the reference architecture in these scenarios. The executable models demonstrate how the structure and security policies in the reference architecture may apply to networks of computers. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/728/CSL-TR-97-728.pdf %R CSL-TR-97-735 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Flexible Connectivity Management for Mobile Hosts %A Zhao, Xinhua %A Baker, Mary G. %D September 1997 %X Powerful lightweight portable computers, the availability of wireless networks, and the popularity of the Internet are driving the need for better networking support for mobile hosts. Users should be able to connect their portable computers to the Internet at any time and in any place, but the dynamic nature of such connectivity requires more flexible network management than has typically been available for stationary workstations. This report proposes techniques to address a unique feature of connectivity management on mobile hosts: its multiplicity, i.e., the need to support multiple packet delivery methods simultaneously and to support the use of multiple network devices for both availability and efficiency reasons. We have developed a set of techniques in the context of mobile IP for flexible, automatic network connectivity management for mobile hosts. We augment the routing layer of the network protocol stack with a Mobile Policy Table (MPT) to support multiple packet delivery mechanisms for different simultaneous flows based on the nature of the traffic. We also devise a set of mechanisms, including a backwards-compatible extension to the routing table, to facilitate the use of multiple network devices. We include performance results showing some of the potential benefits such increased flexibility provides for mobile hosts. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/735/CSL-TR-97-735.pdf %R CSL-TR-97-731 %Z Tue, 30 Sep 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A Single Chip Multiprocessor Integrated with High Density DRAM %A Yamauchi, Tadaaki %A Hammond, Lance %A Olukotun, Kunle %D August 1997 %X A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing memory latency and improving memory bandwidth. In this paper we evaluate the performance of a single chip multiprocessor integrated with DRAM when the DRAM is organized as on-chip main memory and as on-chip cache.
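The Mobile Policy Table (MPT) of CSL-TR-97-735 above can be sketched as an ordered per-flow policy lookup consulted before the normal routing table; the match keys and delivery-method names below are illustrative assumptions, not the report's actual table format.

    # Illustrative Mobile Policy Table (MPT): pick a packet-delivery method
    # per traffic class before the ordinary routing-table lookup runs.
    MPT = [
        # (predicate on flow, delivery method)
        (lambda f: f["proto"] == "tcp" and f["dport"] == 22, "reverse-tunnel"),
        (lambda f: f["interactive"],                          "route-optimized"),
        (lambda f: True,                                      "via-home-agent"),
    ]

    def delivery_method(flow):
        """First matching policy wins, mirroring ordered policy routing."""
        for match, method in MPT:
            if match(flow):
                return method

    print(delivery_method({"proto": "tcp", "dport": 22, "interactive": True}))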
We compare the performance of this architecture with that of a more conventional chip which only has SRAM-based on-chip cache. The DRAM-based architecture with four processors outperforms the SRAM-based architecture on floating point applications which are effectively parallelized and have large working sets. This performance difference is significantly better than that possible in a uniprocessor DRAM-based architecture, which performs only slightly faster than an SRAM-based architecture on the same applications. In addition, on multiprogrammed workloads, in which independent processes are assigned to every processor in a single chip multiprocessor, the large bandwidth of on-chip DRAM can handle the inter-access contention better. These results demonstrate that a multiprocessor takes better advantage of the large bandwidth provided by the on-chip DRAM than a uniprocessor. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/731/CSL-TR-97-731.pdf %R CSL-TR-97-726 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Low-Power Processor Design %A Gonzalez, Ricardo E. %D June 1997 %X Power has become an important aspect in the design of general purpose processors. This thesis explores how design tradeoffs affect the power and performance of the processor. Scaling the technology is an attractive way to improve the energy efficiency of the processor. In a scaled technology, a processor would dissipate less power for the same performance, or deliver higher performance for the same power. Some micro-architectural changes, such as pipelining and caching, can significantly improve efficiency. Unfortunately, many other architectural tradeoffs leave efficiency unchanged. This is because a large fraction of the energy is dissipated in essential functions and is unaffected by the internal organization of the processor. Another attractive technique for reducing power dissipation is scaling the supply and threshold voltages. Unfortunately, this makes the processor more sensitive to variations in process and operating conditions. Design margins must increase to guarantee operation, which reduces the efficiency of the processor. One way to shrink these design margins is to use feedback control to regulate the supply and threshold voltages. Adaptive techniques can also be used to dynamically trade excess performance for lower power. This results in lower average power and therefore longer battery life. Improvements are limited, however, by the energy dissipation of the rest of the system. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/726/CSL-TR-97-726.pdf %R CSL-TR-97-737 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Stochastic Congestion Model for VLSI Systems %A Hung, Patrick %A Flynn, Michael J. %D October 1997 %X Designing with deep submicron feature size presents new challenges in complexity, performance, and productivity. Information on routing congestion and interconnect area is critical in the pre-RTL stage in order to forecast the whole die size, define the timing specifications, and evaluate the chip power consumption. In this report, we propose a stochastic model for VLSI interconnect routing, which can be used to estimate the routing congestion and the interconnect area in the pre-RTL stage. First, we define the uniform and geometric routing distributions, and introduce a simple and efficient algorithm to calculate the routing probabilities.
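One concrete reading of the uniform routing distribution of CSL-TR-97-737 (an assumption for illustration, not necessarily the report's exact definition): all monotone shortest grid routes between two pins are equally likely, so the probability that a route crosses a given cell is a ratio of lattice-path counts.

    from math import comb

    def crossing_prob(w, h, i, j):
        """P(a uniformly chosen monotone route from (0,0) to (w,h) passes (i,j)).

        Monotone routes are lattice paths; routes through (i,j) factor into
        paths reaching it times paths continuing on, giving a ratio of
        binomial coefficients.
        """
        if not (0 <= i <= w and 0 <= j <= h):
            return 0.0
        return comb(i + j, i) * comb(w - i + h - j, w - i) / comb(w + h, w)

    # Demand concentrates mid-region: the center of a 6x6 routing region is
    # crossed far more often than a cell near the edge.
    print(crossing_prob(6, 6, 3, 3))  # ~0.433
    print(crossing_prob(6, 6, 1, 5))  # ~0.039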
We then derive the routing probabilities among multiple functional blocks, and investigate the effects of routing obstacles. Finally, we map the chip to a Cartesian coordinate system, and model routability based on the supply and demand distributions of routing channels. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/737/CSL-TR-97-737.pdf %R CSL-TR-97-727 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Towards an Abstraction Hierarchy for CAETI Architectures, and Possible Applications %A Luckham, David %A Vera, James %A Belz, Frank %D April 1997 %X This document proposes a four level abstraction hierarchy for CAETI systems architectures for review and discussion by the CAETI community. Some possible applications are described briefly. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/727/CSL-TR-97-727.pdf %R CSL-TR-97-732 %Z Thu, 20 Nov 97 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Efficient Exception Handling Techniques for High-Performance Processor Architectures %A Rudd, Kevin W. %D October 1997 %X Providing precise exceptions has driven much of the complexity in modern processor designs. While this complexity is required to maintain the illusion of a processor based on a sequential architectural model, it also results in reduced performance during normal execution. The existing notion of precise exceptions is limited to processors based on a sequential architectural model and there have been few techniques developed that are applicable to processors that are not based on this model. Processors with exposed pipelines (typical of VLIW processors) do not conform to the sequential execution model. These processors have explicit overlaps in operation execution and thus cannot support the traditional notion of precise exceptions; most exception handling techniques for these processors require restrictive software scheduling. In this report, we generalize the notion of a precise exception and extend the applicability of precise exceptions to a wider range of architectures. We propose precise exception handling techniques that solve the problem of efficient exception handling for both sequential architectures as well as exposed pipeline architectures. We also show how these techniques can provide efficient support for speculative execution past multiple branches for both architectures as well as latency tolerance for exposed pipeline architectures. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/732/CSL-TR-97-732.pdf %R CSL-TR-97-738 %Z Mon, 12 Jan 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T On the Speedup Required for Combined Input and Output Queued Switching %A Prabhakar, Balaji %A McKeown, Nick %D November 1997 %X Architectures based on a non-blocking fabric, such as a crosspoint switch, are attractive for use in high-speed LAN switches, ATM switches and IP routers. These fabrics, coupled with memory bandwidth limitations, dictate that queues be placed at the input of the switch. But it is well known that input-queueing can lead to low throughput, and does not allow the control of latency through the switch. This is in contrast to output-queueing, which maximizes throughput, and permits the accurate control of packet latency through scheduling. We ask the question: Can a switch with combined input and output queueing be designed to behave identically to an output-queued switch? 
In this paper, we prove that if the switch uses virtual output queueing, and has an internal speedup of just four, it is possible for it to behave identically to an output-queued switch, regardless of the nature of the arriving traffic. Our proof is based on a novel scheduling algorithm, known as Most Urgent Cell First. This result makes possible switches that perform as if they were output-queued, yet use memories that run more slowly. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/738/CSL-TR-97-738.pdf %R CSL-TR-97-734 %Z Mon, 12 Jan 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Packet Switching Photonic Network Switch Design and Routing Algorithm %A Lee, Hyuk-Jun %A Morf, Martin %A Flynn, Michael %D December 1997 %X The maturity of photonic technology makes it possible to construct an all-optical network switch that avoids optical-to-electrical signal conversion for routing. To realize all-optical packet switching, current network topologies and routing algorithms have to be reexamined and modified to satisfy the requirements of all-optical network switching, such as fast routing decisions, feasibility of hardware implementation, and buffering. In this paper, we first review various switching architectures, including crossbar, Benes, and Batcher/Banyan. Second, an optical implementation of a multiple-output-port network switch is presented. At many levels of networking, from multiprocessor interconnection to wide-area networking, the multiple latencies resulting from this scheme could improve overall performance when combined with smart routing schemes. Finally, we present an interpretation of multistage networks using a symmetric group. A Cayley graph for a symmetric group and its coset graphs suggest an interesting alternative way to construct a new multistage interconnection network. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/734/CSL-TR-97-734.pdf %R CSL-TR-97-748 %Z Wed, 21 Jan 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Decision Diagrams and Pass Transistor Logic Synthesis %A Bertacco, V. %A Minato, S. %A Verplaetse, P. %A Benini, L. %A Micheli, and G. De %D December 1997 %X Since the relative importance of interconnections increases as feature size decreases, standard-cell-based synthesis becomes less effective when deep-submicron technologies become available. Intra-cell connectivity can be decreased by the use of macro-cells. In this work, we present methods for the automatic generation of macro-cells using pass transistors and domino logic. The synthesis of these cells is based on BDD and ZBDD representations of the logic functions. We address specific problems associated with the BDD approach (level degradation, long paths) and the ZBDD approach (sneak paths, charge sharing, long paths). We compare the performance of the macro-cell approach with that of the conventional standard-cell approach, based on accurate electrical simulation. This shows that the macro-cells perform well up to a certain complexity of the logic function. Functions of high complexity must be decomposed into smaller logic blocks that can be directly mapped to macro-cells.
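The BDD-to-pass-transistor mapping of CSL-TR-97-748 rests on Shannon expansion: each BDD node is a two-way select between cofactors, which is exactly what a complementary pass-transistor pair implements. A minimal cofactor sketch follows, without the sharing, reduction, and variable-ordering machinery a real BDD package needs.

    def shannon(f, var):
        """Return (f with var=0, f with var=1): the two cofactors a BDD node
        selects between -- in pass-transistor logic, the two branches a
        complementary transistor pair steers to the output."""
        return (lambda env: f({**env, var: 0}),
                lambda env: f({**env, var: 1}))

    def build_tree(f, order):
        """Unreduced decision tree over 'order'; a real package would hash
        and share isomorphic subgraphs to obtain a canonical, reduced BDD."""
        if not order:
            return f({})                     # terminal: 0 or 1
        var, rest = order[0], order[1:]
        lo, hi = shannon(f, var)
        return (var, build_tree(lo, rest), build_tree(hi, rest))

    maj = lambda e: int(e["a"] + e["b"] + e["c"] >= 2)   # 3-input majority
    print(build_tree(maj, ["a", "b", "c"]))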
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/748/CSL-TR-97-748.pdf %R CSL-TR-97-739 %Z Wed, 18 Feb 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Hardware/Software Co-Design of Run-Time Schedulers for Real-Time Systems %A Mooney, Vincent John III %A Micheli, Giovanni De %D November 1997 %X We present the SERRA Run-Time Scheduler Synthesis and Analysis Tool, which automatically generates a run-time scheduler from a heterogeneous system-level specification in both Verilog HDL and C. Part of the run-time scheduler is implemented in hardware, which allows the scheduler to be predictable in meeting hard real-time constraints, while part is implemented in software, thus supporting features typical of software schedulers. SERRA's real-time analysis generates a priority assignment for the software tasks in the mixed hardware-software system. The tasks in hardware and software have precedence constraints, resource constraints, relative timing constraints, and a rate constraint. A heuristic scheduling algorithm assigns the static priorities such that a hard real-time rate constraint can be predictably met. SERRA supports the specification of critical regions in software, thus providing the same functionality as semaphores. We describe the task control/data-flow extraction, the synthesis of the control portion of the run-time scheduler in hardware, the real-time analysis, and the priority scheduler template. We also show how our approach fits into an overall tool flow and target architecture. Finally, we conclude with a sample application of the novel run-time scheduler synthesis and analysis tool to a robotics design example. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/739/CSL-TR-97-739.pdf %R CSL-TR-97-745 %Z Tue, 03 Aug 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Selection of Recent Advances in Computer Systems %A Mencer, Oskar %A Flynn, Michael %D July 1999 %X This paper presents a selection of recent research results in computer systems. The roadmap for CMOS technology for the next ten years shows a theoretical limit of 0.1 um for the channel of a MOSFET transistor, reached by 2007. Mainstream processors are adapting to multimedia applications with subword parallel instructions like Intel's MMX or HP's MAX instruction set extensions. Coprocessors and embedded processors are moving towards VLIW in order to save hardware costs. The memory system of the future is going to be the next generation of Rambus/RDRAM. Finally, Custom Computing Machines based on Field Programmable Gate Arrays are one of the promising future technologies for computing -- offering very high performance for highly parallelizable and pipelinable applications. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/745/CSL-TR-97-745.pdf %R CSL-TR-97-744 %Z Tue, 03 Mar 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The FLASH Multiprocessor: Designing a Flexible and Scalable System %A Kuskin, Jeffrey Scott %D November 1997 %X The choice of a communication paradigm, or protocol, is central to the design of a large-scale multiprocessor system. Unlike traditional multiprocessors, the FLASH machine uses a programmable node controller, called MAGIC, to implement all protocol processing. The architecture of the MAGIC chip allows FLASH to support multiple communication paradigms - in particular, cache-coherent shared memory and high-performance message passing - while minimizing both hardware and software overhead.
Each node in FLASH contains a microprocessor, a portion of the machine's global memory, a port to the interconnection network, an I/O interface, and MAGIC, the custom node controller. The MAGIC chip handles all communication both within the node and among nodes, using hardwired data paths for efficient data movement and a programmable processor optimized for executing protocol operations. The result is a system that is flexible and scalable, yet competitive in performance with a traditional multiprocessor that implements a single communication paradigm completely in hardware. The focus of this dissertation is the architecture, design, and performance of FLASH. Much of the motivation behind the FLASH system and the MAGIC node controller design stems from an examination of the characteristics of protocol code and the architecture of the DASH system, the predecessor to FLASH. This examination led to two major design goals: developing a node controller architecture that can attain high protocol processing performance while still maintaining flexibility, and reducing the logic and memory overheads associated with cache coherence. The MAGIC design achieves these goals by implementing on a single chip a programmable protocol engine with an instruction set optimized for the characteristics of protocol code, along with dedicated support logic to alleviate the most serious protocol processing performance bottlenecks - data movement, message dispatch, and lack of close coupling to the node board components. The design of the FLASH node complements the MAGIC design, matching the close coupling and high bandwidth support in MAGIC to provide a balanced node architecture. Next, the dissertation investigates the performance of cache coherence on FLASH. Performance results are presented from microbenchmarks run on the Verilog RTL of the MAGIC chip and from complete applications run on FlashLite, the FLASH system-level simulator. The microbenchmarks demonstrate that the architectural extensions added to the MAGIC design - particularly the instruction set optimizations to the programmable protocol processor - yield significantly lower latencies and protocol processor occupancies for servicing the most common types of memory operations. The application results are used to evaluate the performance costs of flexibility by comparing the performance of FLASH to that of a hardwired machine on representative parallel applications and multiprogramming workloads. These results show that poor application memory reference or load balancing characteristics cause the performance of the FLASH system to degrade more rapidly than the performance of the hardwired system; that is, FLASH's performance is less robust. For applications that incur a large number of remote misses or exhibit substantial hot-spotting, the increased remote access latencies or the occupancy of MAGIC lead to lower performance for the flexible design. Overall, however, the performance of FLASH can be competitive with the performance of the hardwired machine. Specifically, for a range of optimized parallel applications, the performance differences between the hardwired machine and FLASH are small, typically less than 10% at 32 processors and less than 15% at 64 processors. For these programs, either the processor cache miss rates are small or the latency of the programmable protocol processing can be hidden behind the memory access time.
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/744/CSL-TR-97-744.pdf %R CSL-TR-97-733 %Z Thu, 05 Mar 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T New Methods for Surface Reconstruction from Range Images %A Curless, Brian Lee %D June 1997 %X The digitization and reconstruction of 3D shapes has numerous applications in areas that include manufacturing, virtual simulation, science, medicine, and consumer marketing. In this thesis, we address the problem of acquiring accurate range data through optical triangulation, and we present a method for reconstructing surfaces from sets of data known as range images. The standard methods for extracting range data from optical triangulation scanners are accurate only for planar objects of uniform reflectance. Using these methods, curved surfaces, discontinuous surfaces, and surfaces of varying reflectance cause systematic distortions of the range data. We present a new ranging method based on analysis of the time evolution of the structured light reflections. Using this spacetime analysis, we can correct for each of these artifacts, thereby attaining significantly higher accuracy using existing technology. When using coherent illumination such as lasers, however, we show that laser speckle places a fundamental limit on accuracy for both traditional and spacetime triangulation. The range data acquired by 3D digitizers such as optical triangulation scanners commonly consists of depths sampled on a regular grid, a sample set known as a range image. A number of techniques have been developed for reconstructing surfaces by integrating groups of aligned range images. A desirable set of properties for such algorithms includes: incremental updating, representation of directional uncertainty, the ability to fill gaps in the reconstruction, and robustness in the presence of outliers and distortions. Prior algorithms possess subsets of these properties. In this thesis, we present an efficient volumetric method for merging range images that possesses all of these properties. Using this method, we are able to merge a large number of range images (as many as 70) yielding seamless, high-detail models of up to 2.6 million triangles. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/733/CSL-TR-97-733.pdf %R CSL-TR-97-716 %Z Tue, 10 Mar 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Checking Experiments for Scan Chain Latches and Flip-Flops %A Makar, Samy %D August 1997 %X New digital designs often include scan chains; the reason is high-quality, economical testing. A scan chain allows easy access to internal combinational logic by converting bistable elements, latches and flip-flops, into a shift register. Test patterns are scanned in, applied to the internal circuitry, and the results are scanned out for comparison. While many techniques exist for testing the combinational circuitry, little attention has been paid to testing the bistable elements themselves. The bistable elements are typically tested by shifting in a sequence of zeroes and ones. This test can miss many defects inside the bistable elements. A checking experiment is a sequence of inputs and outputs that contains enough information to extract the functionality of the circuit. A new approach, based on such sequences, can significantly reduce the number of defects missed.
Simulation results show that as many as 20 percent of the faults in bistable elements can be missed by typical tests; essentially all of these missed faults are detected by checking experiments. Since the checking experiment is a functional test, it is independent of the implementation of the bistable element. This is especially useful since designers often use different implementations of bistable elements to optimize their circuits for area and performance. Another benefit of a functional test is that it avoids the need for generating test patterns at the transistor level. Applying a complete checking experiment to a bistable element embedded inside a circuit can be very difficult, if not impossible. The new approach breaks up the checking experiment into a set of small sub-sequences. For each of these sub-sequences a test pattern is generated. These test patterns are scanned in, as in the case of the tests for combinational logic, appropriate changes to the control inputs of the bistable elements are applied, and the results are scanned out. The process of generating the patterns is automated by modifying an existing stuck-at test generator. A designer or test engineer need only provide a gate level description of the circuit to generate tests that guarantee a checking experiment for each bistable element in the design. Test size is an important economic factor in circuit design. The size of the checking-experiment-based test increases with circuit size at about the same rate as the traditional test, indicating that it is practical for large circuits. Checking-experiment-based tests are an effective economic means for testing the bistable elements in scan chain designs. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/97/716/CSL-TR-97-716.pdf %R CSL-TR-98-751 %Z Wed, 18 Feb 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Vitis Propulsion: Theory and Practice %A Baker, Mary %A Honig, Sue %A Kercheval, Berry %A Seltzer, Margo %D February 1998 %X We have proof that red grapes scoot around more than green grapes when microwaved. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/751/CSL-TR-98-751.pdf %R CSL-TR-98-749 %Z Mon, 09 Mar 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Considerations in the Design of Hydra: A Multiprocessor-on-a-Chip Microarchitecture %A Hammond, Lance %A Olukotun, Kunle %D February 1998 %X As more transistors are integrated onto larger dies, single-chip multiprocessors integrated with large amounts of cache memory will soon become a feasible alternative to the large, monolithic uniprocessors that dominate today's microprocessor marketplace. Hydra offers a promising way to build a small-scale MP-on-a-chip using a fairly simple design that still maintains excellent performance on a wide variety of applications. This report examines key parts of the Hydra design -- the memory hierarchy, the on-chip buses, and the control and arbitration mechanisms -- and explains the rationale for some of the decisions made in the course of finalizing the design of this memory system, with particular emphasis given to applications that stress the memory system with numerous memory accesses. With the balance between complexity and performance that we obtain, we feel Hydra offers a promising model for future MP-on-a-chip designs. 
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/749/CSL-TR-98-749.pdf %R CSL-TR-98-758 %Z Fri, 26 Feb 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Matching Output Queueing with a Combined Input Output Queued Switch %A Chuang, Shang-Tse %A Goel, Ashish %A McKeown, Nick %A Prabhakar, Balaji %D April 1998 %X The Internet is facing two problems simultaneously: we need a faster switching/routing infrastructure, and we need to introduce guaranteed qualities of service (QoS). As a community, we have solutions to each: we can make the routers faster by using input-queued crossbars instead of shared memory systems; and we can introduce QoS using WFQ-based packet scheduling. But we don't know how to do both at the same time. Until now, the two solutions have been mutually exclusive - all of the work on WFQ-based scheduling algorithms has required that switches/routers use output-queueing, or centralized shared memory. We demonstrate that a Combined Input Output Queueing (CIOQ) switch running twice as fast as an input-queued switch can provide precise emulation of a broad class of packet scheduling algorithms, including WFQ and strict priorities. More precisely, we show that a "speedup" of 2 - 1/N is both necessary and sufficient for this precise emulation. We introduce a variety of algorithms that configure the crossbar so that emulation is achieved with a speedup of two, and consider their running time and implementation complexity. We believe that, in the future, these results will make possible the support of QoS in very high bandwidth routers. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/758/CSL-TR-98-758.pdf %R CSL-TR-98-753 %Z Fri, 26 Feb 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Resource Management Issues for Shared-Memory Multiprocessors %A Verghese, Ben %D March 1998 %X Shared-memory multiprocessors (SMPs) are attractive as general-purpose compute servers. On the software side, they present the same programming paradigm as uniprocessors, and they can run unmodified uniprocessor binaries. On the hardware side, the tight coupling of multiple processors, memory, and I/O provides enormous computing power in a single system, and enables the efficient sharing of these resources. As a compute server, this power can be exploited both by a collection of uniprocessor programs and by explicitly or automatically parallelized applications. This thesis addresses two important performance-related issues encountered in such systems: performance isolation and data locality. The solutions presented in this dissertation address these issues through careful resource management in the operating system. Current shared-memory multiprocessor operating systems provide very few controls for sharing the resources of the system among the active tasks or users. This is a serious limitation for a compute server that is to be used for multiple tasks or by multiple users. The current unconstrained sharing scheme allows the load placed by one user or task to adversely affect the performance seen by another. We show that this lack of isolation is caused by the resource allocation scheme (or lack thereof) carried over from single-user workstations. Multi-user multiprocessor systems require more sophisticated resource management, and we propose "performance isolation", a new resource management scheme for such systems.
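The "performance isolation" scheme proposed in CSL-TR-98-753 above can be pictured with a small allocation computation. The following Python sketch is a toy stand-in, not the thesis's mechanism: each user holds a guaranteed share of a resource, and capacity left idle by one user is redistributed to busy users in proportion to their shares, so an overloaded user can never shrink another user's guarantee. All names and the redistribution loop are assumptions of this illustration.

    def isolated_allocation(shares, demands, capacity=1.0):
        # shares/demands: dicts keyed by user. Grant guaranteed fractions
        # first; recycle whatever satisfied users leave unused.
        alloc = {u: 0.0 for u in shares}
        unmet = dict(demands)
        active, spare = set(shares), capacity
        while spare > 1e-12 and active:
            weight = sum(shares[u] for u in active)
            next_spare = 0.0
            for u in list(active):
                grant = spare * shares[u] / weight
                used = min(grant, unmet[u])
                alloc[u] += used
                unmet[u] -= used
                next_spare += grant - used       # idle capacity to recycle
                if unmet[u] <= 1e-12:
                    active.discard(u)            # user is satisfied
            spare = next_spare
        return alloc

    # An overloaded user cannot take the light user's guaranteed half:
    print(isolated_allocation({"heavy": 1, "light": 1}, {"heavy": 5.0, "light": 0.2}))
    # -> {'heavy': 0.8, 'light': 0.2}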
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/753/CSL-TR-98-753.pdf %R CSL-TR-98-759 %Z Thu, 23 Apr 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Optimized Multiprocessor Communication and Synchronization Using a Programmable Protocol Engine %A Heinlein, John %D March 1998 %X In recent years, multiprocessor designs have converged towards a unified hardware architecture despite supporting different communication abstractions. The implementation of these communication abstractions and the associated protocols in hardware is complex, inflexible, and error prone. For these reasons, some recent designs have employed a programmable controller to manage system communication. One particular focus of these designs is implementing cache coherence protocols in software. This dissertation argues that a programmable communication controller that provides cache coherence can also effectively support block transfer and synchronization protocols. This research is part of the FLASH project, a major focus of which is exploring the integration of multiple communication protocols in a single multiprocessor architecture. In our analysis, we examine the needs of protocols other than cache coherence to identify the requirements they share. The interface between the processor and controller is one critical issue in these protocols, so we propose techniques to export such protocols reliably, at low overhead, and without system calls. Unlike most prior studies, our approach supports a modern operating system with features like multiprogramming, protection, and virtual memory. Our study focuses in detail on two classes of communication that are important for large scale multiprocessors: block transfer and synchronization using locks and barriers. In particular, we attempt to improve the performance of these classes of communication as compared to implementations using only software on top of shared memory. For each protocol we identify the critical metrics of performance, explore the limitations of existing techniques, then present our implementation, which is tailored to leverage the programmable communication controller. We evaluate each protocol in isolation, in the context of microbenchmarks, and within a variety of applications. We find that embedding advanced communication and synchronization features in a programmable controller has a number of advantages. For example, the block transfer protocol improves transfer performance in some cases, enables the processor to perform other work in parallel, and reduces processor cache pollution caused by the transfer. The synchronization protocols reduce overhead and eliminate bottlenecks associated with synchronization primitives implemented using software on top of shared memory. Simulations of scientific applications running on FLASH show that, in many cases, synchronization support improves performance and increases the range of machine sizes over which the applications scale. Our study shows that embedded programmability is a convenient approach for supporting block transfer and synchronization, and that the FLASH system design effectively supports this approach. 
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/759/CSL-TR-98-759.pdf %R CSL-TR-98-760 %Z Mon, 25 May 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T High Performance Inter-Chip Signalling %A Sidiropoulos, Stefanos %D April 1998 %X The achievable off-chip bandwidth of digital ICs is a crucial and often limiting factor in the performance of digital systems. In intra-system interfaces where both latency and bandwidth are important, source-synchronous parallel channels have been adopted as the most effective solution. This work investigates receiver and clocking circuit design techniques for increasing the signalling rate and robustness of such channels. One of the main problems arising in the reception of high speed signals is the adverse effects of high frequency noise. To alleviate these effects, a new class of receiver structures that utilize current integration is proposed. The integration of current on a capacitor based on the incoming signal polarity effectively averages the signal over its valid time period, therefore filtering out high frequency noise. An experimental transceiver prototype utilizing current integrating receivers was designed and fabricated in a 0.8 um CMOS technology. The prototype achieves a signalling rate of 740 Mbps/pin operating from a 3.3-V supply with a bit error rate of less than 10^-14. The second major challenge of inter-chip communication is the design of clock generation and synchronization circuits. Delay locked loops are an attractive alternative to VCO-based phase locked loops due to their simpler design, intrinsic stability, and absence of phase error accumulation. One of their main problems, however, is their limited phase capture range. A dual loop architecture that eliminates this problem is proposed. This architecture employs a core loop to generate finely spaced clock edges, which are then used by a peripheral loop to generate the output clock through phase interpolation. Due to its digital control, the dual loop can offer great flexibility in the implementation of phase acquisition algorithms. A dual DLL prototype was fabricated in a 0.8 um CMOS technology. The prototype achieves an 80 kHz - 400 MHz operating range, 12-ps rms jitter and 0.4-ps/mV jitter supply sensitivity. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/760/CSL-TR-98-760.pdf %R CSL-TR-98-755 %Z Fri, 06 Aug 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T ABSS v2.0: a SPARC Simulator %A Sunada, Dwight %A Glasco, David %A Flynn, Michael %D April 1998 %X This paper describes various aspects of the augmentation-based SPARC simulator (ABSS). We discuss (1) the problems that we solved in porting AugMINT to the SPARC platform to create ABSS, (2) the major sections of ABSS, and (3) the limitations of ABSS. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/755/CSL-TR-98-755.pdf %R CSL-TR-98-762 %Z Thu, 11 Jun 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Hardware/Software Co-Design of Run-Time Systems %A Mooney, Vincent John III %D June 1998 %X Trends in system-level design show a clear move towards core-based design, where processors, controllers and other proprietary cores are reused and constitute essential building blocks. Thus, areas such as embedded system design and system-on-a-chip design are changing dramatically, requiring new design methodologies and Computer-Aided Design (CAD) tools.
This thesis presents a novel system-level scheduling methodology and CAD environment, the SERRA Run-Time Scheduler Synthesis and Analysis Tool. Unlike previous approaches to run-time scheduling, we split our run-time scheduler between hardware and software, as opposed to placing the scheduler all in one or the other. Thus, given an already partitioned input system specification in an HDL and a software language, SERRA automatically generates a run-time scheduler partly in hardware and partly in software, for a target architecture of a microprocessor core together with multiple hardware cores or modules. A heuristic scheduling algorithm solves for priorities of software tasks executing on a single microprocessor with a custom priority scheduler, interrupt service routine, and context switch code. Real-time analysis takes into account the split hardware/software implementation both of the scheduler and of the tasks. The scheduler supports standard requirements of both domains, such as relative timing constraints in hardware and semaphores in software. A designer who uses the SERRA CAD tool gains the advantage of efficient satisfaction of timing constraints for hardware/software systems within a framework that enables different hardware/software partitions to be quickly evaluated. Thus, a hardware/software partitioning tool could easily sit on top of SERRA, which would generate run-time systems for different hardware/software partitions chosen for evaluation. In addition, SERRA's more efficient design space exploration can improve time-to-market for a product. Finally, we present two case studies. First, we show a full analysis, synthesis, and simulation of a hardware/software implementation of a robotics control system for a PUMA arm. Second, we describe a sample prototype of the split run-time scheduler in an actual design, a force-feedback real-time haptic robot. For this application, the hardware part of the scheduler was implemented on programmable logic communicating with software using a standard communication protocol. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/762/CSL-TR-98-762.pdf %R CSL-TR-98-752 %Z Tue, 21 Jul 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Smoothness of Stationary Subdivision on Irregular Meshes %A Zorin, Denis %D January 1998 %X We derive necessary and sufficient conditions for tangent plane and C^k-continuity of stationary subdivision schemes near extraordinary vertices. Our criteria generalize most previously known conditions. We introduce a new approach to analysis of subdivision surfaces based on the idea of the universal surface. Any subdivision surface can be locally represented as a projection of the universal surface, which is uniquely defined by the subdivision scheme. This approach provides us with a more intuitive geometric understanding of subdivision near extraordinary vertices. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/752/CSL-TR-98-752.pdf %R CSL-TR-98-764 %Z Tue, 21 Jul 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A Method for Analysis of C^1-Continuity of Subdivision Surfaces %A Zorin, Denis %D May 1998 %X A sufficient condition for C^1-continuity of subdivision surfaces was proposed by Reif [17] and extended to a more general setting in [22]. In both cases, the analysis of C^1-continuity is reduced to establishing injectivity and regularity of a characteristic map.
In all known proofs of C^1-continuity, explicit representation of the limit surface on an annular region was used to establish injectivity. We propose a new approach to this problem: we show that for a general class of subdivision schemes, regularity can be inferred from the properties of a sufficiently close linear approximation, and injectivity can be verified by computing the index of a curve. An additional advantage of our approach is that it allows us to prove C^1-continuity for all valences of vertices, rather than for an arbitrarily large, but finite number of valences. As an application, we use our method to analyze C^1-continuity of most stationary subdivision schemes known to us, including the interpolating Butterfly and Modified Butterfly schemes, as well as Kobbelt's interpolating scheme for quadrilateral meshes. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/764/CSL-TR-98-764.pdf %R CSL-TR-98-761 %Z Wed, 12 Aug 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An Output Encoding Problem and A Solution Technique %A Mitra, Subhasish %A Avra, LaNae J. %A McCluskey, Edward J. %D November 1997 %X We present a new output encoding problem as follows: Given a specification table, such as a truth table or a finite state machine state table, where some of the outputs are specified in terms of 1s, 0s and don't cares, and others are specified symbolically, determine a binary code for each symbol of the symbolically specified output column such that the total number of output functions to be implemented after encoding the symbolic outputs and compacting the output columns is minimized. There are several applications of this output encoding problem, one of which is to reduce the area overhead while implementing scan or pseudo-random BIST in a circuit with one-hot signals. The algorithm can also be used as a pre-processing step during FSM state encoding. In this paper, we develop an exact algorithm to solve the above problem, prove its correctness, analyze the worst case time complexity of the algorithm and present experimental data to validate the claim that our encoding strategy helps to reduce the area of a synthesized circuit. In addition, we have investigated the possibility of using elementary gates to facilitate further merging of the output functions generated by the encoding bits with the output functions generated by the elementary gates. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/761/CSL-TR-98-761.pdf %R CSL-TR-98-768 %Z Thu, 13 Aug 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T EJAVA - Causal Extensions for Java %A Santoro, Alexandre %A Mann, Walter %A Madhav, Neel %A Luckham, David %D August 1998 %X Programming languages like Java provide designers with a variety of classes that simplify the process of building multithreaded programs. Though useful, especially in the creation of reactive systems, multithreaded programs present challenging problems such as race conditions and synchronization issues. Validating these programs against a specification is also not trivial since Java does not clearly indicate thread interaction. These problems can be solved by modifying Java so that it produces computations, collections of events with both causal and temporal ordering relations defined for them. Specifically, the causal ordering is ideal for identifying thread interaction.
This paper presents eJava, an extension to Java that is both event based and causally aware, and shows how it simplifies the process of understanding and debugging multithreaded programs. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/768/CSL-TR-98-768.pdf %R CSL-TR-98-756 %Z Thu, 13 Aug 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Hardware-assisted Algorithms for Checkpoints %A Sunada, Dwight %A Glasco, David %A Flynn, Michael %D July 1998 %X We can classify the algorithms for establishing checkpoints on distributed-shared-memory multiprocessors (DSMMs) into 3 broad classes: the tightly synchronized method (TSM), the loosely synchronized method (LSM), and the unsynchronized method (USM). TSM-type algorithms force the immediate establishment of a checkpoint whenever a dependency between 2 processors arises. LSM-type algorithms record this dependency and, hence, do not require the immediate establishment of a checkpoint if a dependency does arise; when a processor chooses to establish a checkpoint, the processor will query the dependency records to determine other processors that must also establish a checkpoint. USM-type algorithms allow a processor to establish a checkpoint without regard to any other processor. Within this framework, we developed 4 hardware-based algorithms: distributed recoverable shared memory (DRSM), DRSM for communication checkpoints (DRSM-C), DRSM with a hybrid method (DRSM-H), and DRSM with logs (DRSM-L). DRSM-C is a TSM-type algorithm, and DRSM and DRSM-H are LSM-type algorithms. DRSM-L is a USM-type algorithm and is the first of its kind for a tightly-coupled DSMM where hardware in the form of a directory maintains cache coherence. We find that DRSM has the best performance in terms of minimizing the impact of establishing checkpoints (or logs) on the running applications, but DRSM along with DRSM-C has the most expensive hardware requirements. DRSM-L has the second best performance but has the least expensive hardware requirement. We conclude that DRSM-L is the best algorithm in terms of cost and performance. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/756/CSL-TR-98-756.pdf %R CSL-TR-98-769 %Z Mon, 24 Aug 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Resource Discovery in Ad hoc Networks %A Tang, Diane %A Chang, Chih-Yuan %A Tanaka, Kei %A Baker, Mary %D August 1998 %X Much of the current research in mobile networking investigates how to support a mobile user within an established infrastructure of routers and servers. Ad hoc networks come into play when no such established infrastructure exists. This paper presents a two-stage protocol to solve the resource discovery problem in ad hoc networks: how hosts discover what resources are available in the network and how they discover how to use the resources. This protocol does not require any established servers or other infrastructure. It only requires routing capabilities in the network. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/769/CSL-TR-98-769.pdf %R CSL-TR-98-772 %Z Mon, 23 Nov 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Designing a Partitionable Multiplier %A Lee, Hyuk-Jun %A Flynn, Michael %D October 1998 %X This report presents the design of a 64-bit integer multiplier core that can perform 32-bit or 16-bit parallel integer multiplications (PMUL) and 32-bit or 16-bit parallel integer multiplications followed by additions (PMADD).
The proposed multiplier removes sign and constant bits from its core and projects them to the boundaries to minimize the complexity of base cells. It also adopts an array-of-arrays architecture with unequal array sizes by decoupling partial product generation from carry save addition. This makes it possible to achieve high speed for 64-bit multiplication. Two architectures, both implemented in dual-rail domino logic, are tested for functionality in Verilog and simulated in HSPICE for a TSMC 0.35 um process. The first architecture is capable of both PMUL and PMADD. The estimated delay is 4.9 ns (excluding the final adder) at a 3.3 V supply and 25 C, and its estimated area is 6.5 mm^2. The estimated delay of the second architecture, only capable of PMUL, is 4.5 ns. Its estimated area is 5.2 mm^2. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/772/CSL-TR-98-772.pdf %R CSL-TR-98-773 %Z Wed, 16 Dec 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Scalable Services for Video-On-Demand %A Chan, Shuen-Han Gary %A Tobagi, Fouad A. %D December 1998 %X Video-on-demand (VOD) refers to video services in which users can request any video program from a server at any time. VOD has important applications in entertainment, education, information, and advertising, such as movie-on-demand, distance learning, home shopping, interactive news, etc. In order to provide VOD services accommodating a large number of video titles and concurrent users, a VOD system has to be scalable -- scalable in storage and scalable in streaming capacity. Our goal is to design such a system with low cost, low complexity, and a high level of service quality (in terms of, for example, user delay experienced or user loss rate). Storage scalability is achieved by using a hierarchical storage system, in which video files are stored in tertiary libraries or jukeboxes and transferred to a secondary level (of magnetic or optical disks) for display. We address the design of such a system by specifying the required architectural parameters (the bandwidth and storage capacity in each level) and operational procedures (such as request scheduling and file replacement schemes) in order to meet certain performance goals. Scalability in streaming capacity can be achieved by means of request batching, in which requests for a video arriving within a period of time are grouped together (i.e., "batched") and served with a single multicast stream. The goal here is to trade off the multicasting cost against the user delay in the system. We study a number of batching schemes (in terms of user delay experienced, the number of users collected in each batch, etc.), and how system profit can be maximized given users' reneging behaviour. Both storage and streaming scalabilities can be achieved with a distributed servers architecture, in which video files are accessed from servers distributed in a network. We examine a number of caching schemes in terms of their requirements in storage and streaming bandwidth. Given certain cost functions in storage and streaming, we address when and how much of a video file should be cached in order to minimize the system cost. We show that a distributed servers architecture can achieve great cost savings while offering users low start-up delay.
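The request-batching idea in the CSL-TR-98-773 abstract above (group the requests arriving within a window, serve each group with one multicast stream) can be captured in a few lines. This Python sketch is a simplified illustration under an assumed timing model, not the report's schemes; it exposes the streams-versus-delay trade-off the report studies.

    def batch_requests(arrivals, window):
        # arrivals: sorted request times (seconds) for one video title.
        # A batch opens at the first unserved request and fires `window`
        # seconds later; everyone who arrived by then shares one stream.
        streams, total_delay, i = 0, 0.0, 0
        while i < len(arrivals):
            fire = arrivals[i] + window          # multicast start time
            streams += 1
            while i < len(arrivals) and arrivals[i] <= fire:
                total_delay += fire - arrivals[i]
                i += 1
        return streams, total_delay

    # Five requests served by two multicast streams instead of five unicasts:
    print(batch_requests([0, 1, 2, 30, 31], window=5.0))   # -> (2, 21.0)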
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/773/CSL-TR-98-773.pdf %R CSL-TR-98-775 %Z Tue, 22 Dec 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Design of High-Speed Serial Links in CMOS %A Yang, Chih-Kong Ken %D December 1998 %X Demand for bandwidth in serial links has been increasing as the communications industry demands higher quantity and quality of information. Whereas traditional gigabit-per-second links have been built in bipolar or GaAs technologies, this research aims to push the use of CMOS process technology in such links. Intrinsic gate speed limitations are overcome by parallelizing the data. The on-chip frequency is maintained at a fraction (1/16) of the off-chip data rate. Clocks with carefully controlled phases tapped from a local ring oscillator are driven to a bank of input samplers to convert the serial bit stream into parallel data. Similarly, the overlap of multiple-phased clocks is used to synchronize the multiplexing of the parallel data onto the transmission line. To perform clock/data recovery, data is further oversampled with finer phase separation and passed to digital logic. The digital logic operates upon the samples to detect transitions in the bit stream to track the bit boundaries. This tracking can operate at the cycle rate of the digital logic, allowing robustness to systematic phase noise. The challenge lies in the capturing of the high frequency data stream and generating low jitter, accurately spaced clock edges. A test chip is built demonstrating the transmission and recovery of a 4.0-Gb/s bit stream with a < 10^-14 bit-error rate using a 3x oversampled system in a 0.5-um MOSIS CMOS process. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/775/CSL-TR-98-775.pdf %R CSL-TR-98-774 %Z Wed, 30 Dec 98 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Fault-Tolerant Systems in A Space Environment: The CRC ARGOS Project %A Shirvani, Philip P. %A McCluskey, Edward J. %D December 1998 %X This report describes the ARGOS project at Stanford CRC. The primary goals of this project are to collect data on the errors that occur in digital integrated circuits in a space environment, to determine the tradeoffs between fault-avoidance and fault-tolerance, and to see if radiation hardening can be avoided by using fault tolerance techniques. Our experiments will be carried out on two processor boards on the ARGOS experimental satellite. One of the boards uses radiation-hardened components while the other uses only commercial off-the-shelf (COTS) parts. Programs and data can be uploaded to the boards during the mission. This capability allows us to evaluate different software fault-tolerance techniques. This report reviews various error detection techniques. Software techniques that do not require any special hardware are discussed. The framework of the software that we are developing for error data collection is presented. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/98/774/CSL-TR-98-774.pdf %R CSL-TR-99-776 %Z Thu, 11 Feb 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Novel Checkpointing Algorithm for Fault Tolerance on a Tightly-Coupled Multiprocessor %A Sunada, Dwight %A Glasco, David %A Flynn, Michael %D January 1999 %X The tightly-coupled multiprocessor (TCMP), where specialized hardware maintains the image of a single shared memory, offers the highest performance in a computer system. In order to deploy a TCMP in the commercial world, the TCMP must be fault tolerant.
Researchers have designed various checkpointing algorithms to implement fault tolerance in a TCMP. To date, these algorithms fall into 2 principal classes, in both of which processors can be checkpoint-dependent on each other. We introduce a new apparatus and algorithm that represents a 3rd class of checkpointing scheme. Our algorithm is distributed recoverable shared memory with logs (DRSM-L) and is the first of its kind for TCMPs. DRSM-L has the desirable property that a processor can establish a checkpoint or roll back to the last checkpoint in a manner that is independent of any other processor. In this paper, we describe DRSM-L, show the optimal value of its principal design parameter, and present results indicating its performance under simulation. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/776/CSL-TR-99-776.pdf %R CSL-TR-99-777 %Z Thu, 11 Feb 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The Mobile People Architecture %A Appenzeller, Guido %A Lai, Kevin %A Maniatis, Petros %A Roussopoulos, Mema %A Swierk, Edward %A Zhao, Xinhua %A Baker, Mary %D January 1999 %X People are the outsiders in the current communications revolution. Computer hosts, pager terminals, and telephones are addressable entities throughout the Internet and telephony systems. Human beings, however, still need application-specific tricks to be identified, like email addresses, telephone numbers, and ICQ IDs. The key challenge today is to find people and communicate with them personally, as opposed to communicating merely with their possibly inaccessible machines---cell phones that are turned off, or PCs on faraway desktops. We introduce the Mobile People Architecture, designed to meet this challenge. The main goal of this effort is to put the person, rather than the devices that the person uses, at the endpoints of a communication session. This architecture introduces the concept of routing between people. To that effect, we define the Personal Proxy, which has a dual role: as a Tracking Agent, the proxy maintains the list of devices or applications through which a person is currently accessible; as a Dispatcher, the proxy directs communications and uses Application Drivers to massage communication bits into a format that the recipient can see immediately. It does all this while protecting the location privacy of the recipient from the message sender. Finally, we substantiate our architecture with ideas about a future prototype that allows the easy integration of new application protocols. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/777/CSL-TR-99-777.pdf %R CSL-TR-99-778 %Z Tue, 09 Mar 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Analysis of HTTP/1.1 Performance on a Wireless Network %A Cheng, Stephen %A Lai, Kevin %A Baker, Mary %D February 1999 %X We compare the performance of HTTP/1.0 and 1.1 on a high latency, low bandwidth wireless network. HTTP/1.0 is known to have low throughput and consume excessive network and server resources on today's graphics-intensive web pages. A high latency, low bandwidth network only magnifies these problems. HTTP/1.1 was developed to remedy these problems. We show that on a Ricochet wireless network, HTTP/1.1 doubles throughput over HTTP/1.0 and decreases the number of packets sent by 60%.
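Much of the packet saving reported in CSL-TR-99-778 above comes from HTTP/1.1's persistent connections, which avoid a TCP handshake and slow start per object. The Python sketch below contrasts the two styles using the standard http.client module; the host and object paths are placeholders for illustration, not from the report.

    import http.client

    OBJECTS = ("/", "/style.css")     # placeholder paths on a placeholder host

    # HTTP/1.0 style: a fresh TCP connection per object fetched.
    for path in OBJECTS:
        conn = http.client.HTTPConnection("example.com")
        conn.request("GET", path)
        conn.getresponse().read()
        conn.close()

    # HTTP/1.1 style: one persistent connection carries every object,
    # eliminating the repeated handshakes that dominate on a high-latency link.
    conn = http.client.HTTPConnection("example.com")
    for path in OBJECTS:
        conn.request("GET", path)
        conn.getresponse().read()     # drain each response before reusing
    conn.close()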
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/778/CSL-TR-99-778.pdf %R CSL-TR-99-779 %Z Mon, 22 Mar 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T CHOKe - A simple approach for providing Quality of Service through stateless approximation of fair queueing %A Pan, Rong %A Prabhakar, Balaji %D March 1999 %X We consider the problem of providing a fair bandwidth allocation to each of n flows that share an outgoing link at a congested router. The buffer at the outgoing link is a simple FIFO, commonly shared by packets belonging to the n flows. We devise a simple packet dropping scheme, CHOKe, that discriminates against the flows which submit more packets/sec than is allowed by their fair share. By doing this, the scheme aims to approximate the fair queueing policy. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/779/CSL-TR-99-779.pdf %R CSL-TR-99-780 %Z Mon, 19 Apr 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Coarse Grain Carry Architecture for FPGA %A Lee, Hyuk-Jun %A Flynn, Michael %D February 1999 %X In this report we investigated several methods to improve the performance of FPGAs for general-purpose computing. In the early stage of this research we identified the fine grain size of current FPGAs as the major performance bottleneck. To increase the grain size, we introduced a coarse grain carry architecture that can increase the granularity of arithmetic operations, including addition and multiplication. We used throughput density as a cost/performance metric to justify the benefit of the new architecture. We achieved up to roughly 5 times larger throughput density for selected applications. Along with that, we also introduced a dual-rail carry structure to improve the performance of the carry chain, which usually sets the cycle time of an FPGA design. A carry select adder built from the dual-rail carry structure reduces the carry chain delay by a factor of two. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/780/CSL-TR-99-780.pdf %R CSL-TR-99-781 %Z Mon, 19 Apr 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T An Architecture for Distributed, Interactive, Multi-Stream, Multi-Participant Audio and Video %A Schmidt, Brian K. %D April 1999 %X Today's computer users are becoming increasingly sophisticated, demanding richer and fuller machine interfaces. This is evidenced by the fact that viewing and manipulating a single stream of full-size video along with its associated audio stream is becoming commonplace. However, multiple media streams will become a necessity to meet the increasing demands of future applications. An example which requires multiple media streams is an application that supports multi-viewpoint audio and video, which allows users to observe a remote scene from many different perspectives so that a sense of immersion is experienced. Although desktop audio and video open many exciting possibilities, their use in a computer environment only becomes interesting when computational resources are expended to manipulate them in an interactive manner. We feel that user interaction will also become increasingly complex. In addition, future applications will make significant demands on the network in terms of bandwidth, quality-of-service guarantees, latency, and connection management. Based on these trends we feel that an architecture designed to support future multimedia applications must provide support for several key features.
The need for numerous media streams is clearly the next step forward in terms of creating a richer environment. Support for non-trivial, fine-grain interaction with the media data is another important requirement, and distributing the system across a network is imperative so that multiple participants can become involved. Finally, as a side effect of the network and multi-participant requirements, integral support for and use of multicast will be a prime architectural component. The goal of our work is to design and implement a complete system architecture capable of supporting applications with these requirements. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/781/CSL-TR-99-781.pdf %R CSL-TR-99-782 %Z Fri, 16 Apr 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T A Compiler for Creating Evolutionary Software and Application Experience %A Schmidt, Brian K. %A Lam, Monica S. %D April 1999 %X Recent studies have shown that significant amounts of value repetition occur in modern applications. Due to global initialized data, immediate values, address calculations, redundancy in external input, etc., the same value is used at the same program point as much as 80% of the time. Naturally, attention has begun to focus on how compilers and specialized hardware can take advantage of this value locality. Unfortunately, there is significant overhead associated with dynamically recognizing predictable values and optimizing for them, and all too often this cost dramatically outweighs the benefits. There are various levels at which value locality can be observed and used for optimization, ranging from register value re-use to function memoization. We are concerned with predictability of program variable values across multiple runs of a given program. In this paper we present a complete system that automatically translates ordinary sequential programs into evolutionary software, software that evolves to improve its performance using execution information from previous runs. This concept can have a significant impact on software engineering, as it can be used to replace the manual performance tuning phase in the application development lifecycle. Not only does it relieve the developer of a tedious and error-prone task, but it also has the important side effect of keeping applications free from obscure hand optimizations which muddle the code and make it difficult to maintain or port. This concept can also be used to produce efficient applications where static performance tuning is not adequate. Our system automatically identifies targets for program specialization and instruments the code to gather high-level profiling information. Upon completion, the program automatically re-compiles itself when the new profile information suggests that it is profitable. The programmer is completely unaware of this process, as the software tailors itself to its environment. We have demonstrated the utility of our system by using it to optimize graphics applications that are built upon a general-purpose graphics library. While much of this work is based on well-established techniques, this is the first practical system which takes advantage of predictability in such a way that the overhead does not overwhelm the benefit.
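As a toy analogue of the cross-run value specialization described in CSL-TR-99-782 above, the Python decorator below records the argument values a function sees in a profile file that persists across runs, and installs a cached fast path once a single value dominates. The file name, thresholds, and decorator interface are assumptions of this sketch; the report's system works at the compiler level, not via a wrapper.

    import json, os

    PROFILE = "value_profile.json"        # hypothetical cross-run profile store

    def specializing(fn, hot_fraction=0.8, min_calls=10):
        # Count argument tuples across runs; once one tuple dominates,
        # cache its result so the common case becomes a dictionary lookup
        # (a stand-in for recompiling a specialized version of fn).
        profile = json.load(open(PROFILE)) if os.path.exists(PROFILE) else {}
        counts = profile.setdefault(fn.__name__, {})
        cache = {}

        def wrapper(*args):
            key = repr(args)
            counts[key] = counts.get(key, 0) + 1
            if key in cache:
                return cache[key]             # specialized fast path
            result = fn(*args)
            total = sum(counts.values())
            if total >= min_calls and counts[key] >= hot_fraction * total:
                cache[key] = result           # this value is highly predictable
            with open(PROFILE, "w") as f:     # persist profile for the next run
                json.dump(profile, f)
            return result

        return wrapper

    @specializing
    def render(width):                        # imagine an expensive routine
        return sum(i * i for i in range(width))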
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/782/CSL-TR-99-782.pdf %R CSL-TR-99-784 %Z Tue, 03 Aug 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T The M-log-Fraction Transform (MFT) for Computer Arithmetic %A Mencer, Oskar %A Flynn, Michael J. %A Morf, Martin %D July 1999 %X State-of-the-art continued fraction (CF) arithmetic enables us to compute rational functions so that input and output values are represented as simple continued fractions. The main problem of previous work is the conversion between simple continued fractions and binary numbers. The M-log-Fraction Transform (MFT), introduced in this work, enables us to instantly convert between binary numbers and M-log-Fractions. Conversion is related to the distance between the '1's of the binary number. Applying M-log-Fractions to continued fraction arithmetic algorithms reduces the complexity of the CF algorithm to shift-and-add structures, and more specifically, digit-serial arithmetic algorithms for (homographic) rational functions. We show two applications of the MFT: (1) a high radix rational arithmetic unit computing (ax+b)/(cx+d) in a shift-and-add structure; (2) the evaluation of rational approximations (or continued fraction approximations) in a multiplication-based structure. In (1) we obtain algebraic formulations of the entire computation, including the next-digit-selection function. For high radix operation, we can therefore partition the selection table into arithmetic blocks, making high radix implementations feasible. (2) overlaps the final division of a rational approximation with the multiply-add iterations. The MFT bridges the gap between continued fractions and the binary number representation, enabling the design of a new class of efficient rational arithmetic units and efficient evaluation of rational approximations. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/784/CSL-TR-99-784.pdf %R CSL-TR-99-785 %Z Fri, 06 Aug 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Checkpointing Apparatus and Algorithms for Fault-Tolerant Tightly-Coupled Multiprocessors %A Sunada, Dwight %D July 1999 %X The apparatus and algorithms for establishing checkpoints on a tightly-coupled multiprocessor (TCMP) fall naturally into three broad classes: tightly synchronized method, loosely synchronized method, and unsynchronized method. The algorithms in the class of the tightly synchronized method force the immediate establishment of a checkpoint whenever a dependency between two processors arises. The algorithms in the class of the loosely synchronized method record this dependency and, hence, do not require the immediate establishment of a checkpoint if a dependency does arise; when a processor chooses to establish a checkpoint, the processor will query the dependency records to determine other processors that must also establish a checkpoint. The algorithms in the class of the unsynchronized method allow a processor to establish a checkpoint without regard to any other processor. Within this framework, we develop four apparatus and algorithms: distributed recoverable shared memory (DRSM), DRSM for communication checkpoints (DRSM-C), DRSM with half of the memory (DRSM-H), and DRSM with logs (DRSM-L). DRSM-C is an algorithm in the class of the tightly synchronized method, and DRSM and DRSM-H are algorithms in the class of the loosely synchronized method. DRSM-L is an algorithm in the class of the unsynchronized method and is the first of its kind for a TCMP.
DRSM-L has the best performance in terms of minimizing the impact of establishing checkpoints (or logs) on the running applications and has the least expensive hardware. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/785/CSL-TR-99-785.pdf %R CSL-TR-99-786 %Z Fri, 12 Nov 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T High-Speed Interconnect Schemes for a Pipelined FPGA %A Lee, Hyuk-Jun %A Flynn, Michael J. %D August 1999 %X This paper presents two high-speed interconnect schemes for a pipelined FPGA utilizing a locally synchronized postcharging technique. By avoiding a globally synchronized clock, we reduce the power consumption significantly. By postcharging the interconnect and overlapping the postcharging delay with the logic delay, we successfully hide the postcharge time. Long channel devices significantly reduce the area penalty due to delay elements. The timing simulation is done using HSPICE for a TSMC 0.35 um process, and area is measured by drawing key elements in MAGIC and using the area model developed in [2]. The postcharge scheme shows a 30% delay reduction over the precharge scheme and up to 310% and 230% delay reductions over the conventional NMOS pass transistor scheme and the tri-state buffer scheme. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/786/CSL-TR-99-786.pdf %R CSL-TR-99-788 %Z Fri, 12 Nov 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Managing Event Processing Networks %A Perrochon, Louis %A Kasriel, Stephane %A Luckham, David C. %D October 1999 %X This technical report presents Complex Event Processing (CEP), a fundamental new technology that will enable the next generation of middleware-based distributed applications. CEP gains information on distributed systems and uses this knowledge for monitoring, failure analysis or prediction of activities. A very promising route in CEP research is that of Event Processing Networks, one of the main areas of research of the Program Analysis and Verification Group at Stanford University. Event Processing Networks are one way of describing and building CEP, by successively filtering meaningful information and aggregating the corresponding events into higher levels of abstraction. This report describes in detail the foundations and aims of Complex Event Processing, then introduces the concept of Event Processing Networks, and finally describes the architecture of the CEP system. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/788/CSL-TR-99-788.pdf %R CSL-TR-99-787 %Z Tue, 16 Nov 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T CHOKe - A stateless queue management scheme for approximating fair bandwidth allocation %A Pan, Rong %A Prabhakar, Balaji %A Psounis, Konstantinos %D September 1999 %X We investigate the problem of providing a fair bandwidth allocation to each of n flows that share an outgoing link at a congested router. The buffer at the outgoing link is a simple FIFO, commonly shared by packets belonging to the n flows. We devise a simple packet dropping scheme, CHOKe, that discriminates against the flows which submit more packets/sec than is allowed by their fair share. By doing this, the scheme aims to approximate the fair queueing policy. Since it is stateless and easy to implement, CHOKe controls unresponsive or misbehaving flows with minimal overhead.
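The CHOKe rule summarized in CSL-TR-99-787 above needs no per-flow state: each arrival is compared with a packet drawn at random from the buffer, and a flow occupying many slots is likely to be matched and doubly penalized. A minimal Python sketch follows; the thresholds and the simplified drop rule between them are assumptions of this illustration (the actual scheme drops probabilistically between the two thresholds, as RED does).

    import random
    from collections import deque

    class ChokeQueue:
        # FIFO with CHOKe admission control; pkt = (flow_id, payload).
        def __init__(self, capacity=100, min_th=20, max_th=80):
            self.q = deque()
            self.capacity, self.min_th, self.max_th = capacity, min_th, max_th

        def enqueue(self, pkt):
            if len(self.q) <= self.min_th:    # uncongested: admit freely
                self.q.append(pkt)
                return True
            i = random.randrange(len(self.q))
            if self.q[i][0] == pkt[0]:        # random match, same flow:
                del self.q[i]                 # drop the queued packet...
                return False                  # ...and the arrival
            if len(self.q) >= min(self.max_th, self.capacity):
                return False                  # severe congestion: drop arrival
            self.q.append(pkt)
            return True

        def dequeue(self):
            return self.q.popleft() if self.q else None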
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/787/CSL-TR-99-787.pdf %R CSL-TR-99-789 %Z Tue, 16 Nov 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Flexible Use of Memory for Replication/Migration in Cache-Coherent DSM Multiprocessors %A Soundararajan, Vijayaraghavan %D November 1999 %X Shared-memory multiprocessors are being used increasingly as compute servers. These systems enable efficient usage of computing resources through the aggregation and tight coupling of CPUs, memory, and I/O. One popular design for such machines is a bus-based architecture. However, as processors get faster, the shared bus becomes a bandwidth bottleneck. CC-NUMA (Cache-Coherent with Non-Uniform Memory Access time) machines remove this architectural limitation and provide a scalable shared-memory architecture. One significant characteristic of the CC-NUMA architecture is that the latency to access remote data is considerably larger than the latency to access local data. On such machines, good data locality can reduce memory stall time and is therefore critical for high performance. In this thesis we study the various options available to system designers to transparently decrease the fraction of data misses serviced remotely. This work is done in the context of the Stanford FLASH multiprocessor. We utilize the programmability of the FLASH memory controller to explore a number of techniques for improving data locality: base cache-coherence (CC); a Remote Access Cache (RAC), in which a portion of local memory is used to cache remotely-allocated data at cache-line granularity; a Cache-Only Memory Architecture (COMA-F), in which all of local memory is used as a cache under hardware control; and OS-assisted page migration/replication (MigRep), in which the operating system migrates or replicates pages according to observed cache miss patterns. We then propose a novel hybrid scheme, MIGRAC, that combines the benefits of RAC and MigRep. We evaluate complete implementations of these schemes on the same platform using compute-server workloads (including OS effects), thereby providing a more consistent and detailed evaluation than has been done before. We find that a simple RAC can improve performance significantly over CC (up to 64% gains). COMA-F improves locality but its additional complexity limits its gains versus CC (only 14% improvement). MigRep performs well (up to 33% gains) but does not handle fine-grain sharing as effectively as RAC or COMA-F. Finally, our MIGRAC approach performs well relative to RAC (up to 57% faster) and MigRep (up to 24% faster) and is robust. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/789/CSL-TR-99-789.pdf %R CSL-TR-99-783 %Z Mon, 29 Nov 99 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Optimum Instruction-level Parallelism (ILP) for Superscalar and VLIW Processors %A Hung, Patrick %A Flynn, Michael J. %D July 1999 %X Modern superscalar and VLIW processors fetch, decode, issue, execute, and retire multiple instructions per cycle. By taking advantage of instruction-level parallelism (ILP), processor performance can be improved substantially. However, increasing the level of ILP may eventually result in diminishing and negative returns due to control and data dependencies among subsequent instructions as well as resource conflicts within a processor. Moreover, the additional ILP complexity can impose significant overhead on cycle time and latency.
This technical report uses a generic processor model to investigate the optimum level of ILP for superscalar and VLIW processors. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/99/783/CSL-TR-99-783.pdf %R CSL-TR-00-790 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Reciprocal Approximation Theory with Table Compensation %A Liddicoat, Albert A. %A Flynn, Michael J. %D January 2000 %X [Sch93] demonstrates the reuse of a multiplier partial product array (PPA) to approximate higher order functions such as the reciprocal, division, and square root. Schwarz generalizes this technique to any higher order function that can be expressed as A*B=C. With this technique, the height of the PPA grows exponentially with the desired result precision. Schwarz added compensation terms within the PPA to reduce the worst case error. This work investigates the approximation theory of higher order functions without the bounds of multiplier reuse. Additional techniques are presented to increase the worst case precision for a fixed-height PPA. A compensation table technique is presented in this work; it combines the approximation computation with a compensation table to produce a result with fixed precision. The area-time tradeoff for three design points is studied. Increasing the computation decreases the area needed to implement the function but also increases the latency. Finally, the applicability of this technique to the bipartite ROM reciprocal table is discussed. We expect that this technique can be applied to the bipartite ROM reciprocal table to significantly reduce the hardware area needed at a minimal increase in latency. In addition, this work focuses on hardware reconfigurability and the ability of the hardware unit to perform multiple higher order functions efficiently. The PPA structure can be used to approximate several higher order functions that can be expressed as a multiply. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/00/790/CSL-TR-00-790.pdf %R CSL-TR-00-791 %Z Mon, 14 Feb 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Precision of Semi-Exact Redundant Continued Fraction Arithmetic for VLSI %A Mencer, Oskar %A Morf, Martin %A Flynn, Michael J. %D February 2000 %X Continued fractions (CFs) enable straightforward representation of elementary functions and rational approximations. We improve the positional algebraic algorithm, which computes homographic functions. The improved algorithm for the linear fractional transformation produces exact results, given regular continued fraction input. If the input is in redundant continued fraction form, our improved linear algorithm increases the percentage of exact results with 12-bit state registers from 78% to 98%. The maximal error of non-exact results is also reduced. Indeed, by detecting a small number of cases, we can add a final correction step to improve the guaranteed accuracy of non-exact results. We refer to the fact that a few results may not be exact as "Semi-Exact" arithmetic. We detail the adjustments to the positional algebraic algorithm concerning register overflow, the virtual singularities that occur during the computation, and the errors due to non-regular, redundant CF inputs.
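The abstract does not reproduce the improved positional algebraic algorithm, but the classical Gosper-style digit-serial evaluation of a homographic form (ax+b)/(cx+d) on regular continued fraction input conveys the flavor of the computation. The following Python sketch assumes integer coefficients and positive regular CF terms, and is an illustration rather than the report's algorithm.

    def homographic_cf(x_terms, a, b, c, d, max_terms=32):
        """Emit the regular CF of (a*x + b)/(c*x + d), where x is given
        by its regular continued fraction terms x_terms."""
        out = []
        it = iter(x_terms)
        while len(out) < max_terms:
            if c != 0 and d != 0 and a // c == b // d:
                q = a // c                    # both bounds agree: emit a term
                out.append(q)
                a, b, c, d = c, d, a - q * c, b - q * d
            else:
                t = next(it, None)
                if t is None:                 # input exhausted: value is a/c
                    while c != 0 and len(out) < max_terms:
                        q = a // c
                        out.append(q)
                        a, c = c, a - q * c
                    break
                a, b, c, d = a * t + b, a, c * t + d, c   # absorb one term
        return out

For example, homographic_cf([1, 2], 1, 0, 0, 1) consumes the CF of 3/2 and reproduces [1, 2]; the report's contribution is, in effect, to make absorb/emit steps of this kind work on redundant CF digits in shift-and-add form.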
%U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/00/791/CSL-TR-00-791.pdf %R CSL-TR-00-792 %Z Tue, 15 Feb 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Performance of Data-Intensive Algorithms on FPGAs in an Object-oriented Programming Environment %A Mencer, Oskar %A Morf, Martin %A Flynn, Michael J. %D February 2000 %X Recently, there have been academic and industrial efforts to combine traditional computing environments with reconfigurable logic. Each application, or part of an application, has an optimal implementation within the design space of microprocessors, reconfigurable logic, and hardwired VLSI circuits. Programmability, Performance, and Power are the major metrics that have to be taken into account when deciding between the available technologies. Performance advantages of FPGAs over processors for specific applications have been shown in previous research. We show the potential of current low-power FPGAs to outperform current state-of-the-art processors in Performance over Power by more than half an order of magnitude. Programmability remains a tough issue. As a starting point, we define a hardware object interface in C++, PAM-Blox. PAM-Blox is an open, object-oriented environment for programming FPGAs that encourages design sharing and code reuse. PAM-Blox simplifies the creation of optimized high-performance designs. Encouraging a distributed effort to share hardware objects over the internet in the spirit of open software is a first step towards improving the programmability of FPGAs. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/00/792/CSL-TR-00-792.pdf %R CSL-TR-00-793 %Z Thu, 24 Feb 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T Allocation and Interface Synthesis Algorithms for Component-based Design %A Smith, James %D February 2000 %X Since 1965, the size of transistors has been halved and their speed of operation has been doubled every 18 to 24 months, a phenomenon known as Moore's Law. This has allowed rapid increases in the amount of circuitry that can be included on a single die. However, as the availability of hardware real estate escalates at an exponential rate, the complexity involved in creating circuitry that utilizes that real estate grows at an exponential, or higher, rate. Component-based design methodologies promise to reduce the complexity of this task and the time required to design integrated circuits by raising the level of abstraction at which circuitry is specified, synthesized, verified, or physically implemented. This thesis develops algorithms for synthesizing integrated circuits by mapping high-level specifications onto existing components. To perform this task, word-level polynomial representations are introduced as a mechanism for canonically and compactly representing the functionality of complex components. Polynomial representations can be applied to a broad range of circuits, including combinational, sequential, and datapath-dominated circuits. They provide the basis for efficiently comparing the functionality of a circuit specification and a complex component. Once a set of existing components is determined to be an appropriate implementation of a specification, interfaces between these components must be designed. This thesis also presents an algorithm for automatically deriving an HDL model of an interface between two or more components given an HDL model of those components.
The combination of polynomial representations and interface synthesis algorithms provides the basis for a component-based design methodology. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/00/793/CSL-TR-00-793.pdf %R CSL-TR-00-804 %Z Tue, 05 Sep 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T IdentiScape: Tackling the Personal Online Identity Crisis %A Maniatis, Petros %A Baker, Mary %D June 2000 %X Traditional systems refer to a mobile person using the name or address of that person's communication device. As personal communications become more diverse and popular, this solution is no longer adequate, since mobile people frequently move between different devices and use different communications applications. This lack of identifiers for mobile people causes problems ranging from the inconvenient to the downright dangerous: to locate a person, callers must try multiple email addresses, cell phone numbers, land line phone numbers, or instant messaging IDs; callers leave sensitive messages on shared voicemail boxes; and they send communications intended for the previous owner of a telephone number to the next owner. To solve this naming problem, we should be able to name people as the ultimate endpoints of personal communications, regardless of the applications or devices they use. In this paper, we develop a naming scheme for mobile people: we derive its requirements and describe its design and implementation in the context of personal communications. IdentiScape, our prototype personal naming scheme, includes a name service which provides globally available identifiers that persist over time and an online identity repository service which can be locally owned and managed. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/00/804/CSL-TR-00-804.pdf %R CSL-TR-00-807 %Z Tue, 05 Sep 00 00:00:00 GMT %I Stanford University, Computer Systems Laboratory %T SUIF Explorer: An Interactive and Interprocedural Parallelizer %A Liao, Shih-Wei %D August 2000 %X Shared-memory multiprocessors that use the latest microprocessors are becoming widely used both as compute servers and as desktop computers. But the difficulty of developing parallel software is a major obstacle to the effective use of multiprocessors to solve a single task. To increase the productivity of multiprocessor programmers, we developed an interactive interprocedural parallelizer called SUIF Explorer. Our experience with SUIF Explorer also helps to identify missing interprocedural analyses that can significantly improve an automatic parallelizer. As a parallel programming tool, the Explorer actively guides programmers through the parallelization process using a set of advanced static and dynamic analyses and visualization techniques. Our interprocedural program analyses provide high-quality information that restricts the need for user assistance. The Explorer is also the first tool to apply slicing analysis to aid the programmer in uncovering program properties for interactive parallelization. These static and dynamic analyses minimize the number of lines of code requiring programmer assistance to produce parallel codes for real-world applications. As a tool for finding missing compiler techniques, SUIF Explorer helps compiler researchers design the next-generation parallelizer. Our experience with the Explorer shows that interprocedural array liveness analysis is an enabler of several important optimizations, such as privatization and array contraction.
We developed and evaluated an efficient context-sensitive and flow-sensitive interprocedural array liveness algorithm and integrated it into the parallelizer. We use the liveness information to enable contraction of arrays that are not live at loop exits, which results in a smaller memory footprint and better cache utilization. The resulting codes run faster on both uni- and multi-processors. Another key interprocedural analysis which we developed and evaluated is the array reduction analysis. Our reduction algorithm extends beyond previous approaches in its ability to locate reductions to array regions, even in the presence of arbitrarily complex data dependences. To exploit the multiprocessors effectively, the algorithm can locate interprocedural reductions, reduction operations that span multiple procedures. In summary, we successfully apply the Explorer to help the user develop parallel codes effectively and to help the compiler researcher develop the next-generation parallelizer. %U ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/00/807/CSL-TR-00-807.pdf %R KSL-TR-89-68 %Z Mon, 25 Apr 94 00:00:00 GMT %I Stanford University, Department of Computer Science, Knowledge Systems Laboratory %T The Parallel Solution of Classification Problems %A Maegawa, Hirotoshi %D April 1994 %X We developed a problem solving framework called ConClass capable of classifying continuous real-time problems dynamically and concurrently on a distributed system. ConClass provides an efficient development environment for describing and decomposing a classification problem and synthesizing solutions. In ConClass, the designed concurrency of decomposed subproblems effectively corresponds to the actual distributed computation components. This scheme is useful for designing and implementing efficient distributed processing, making it easier to anticipate and evaluate the system behavior. The ConClass system has an object replication feature in order to prevent a particular object from being overloaded. An efficient execution mechanism is implemented without using schedulers or synchronization schemes liable to become bottlenecks. In order to deal with an indeterminate amount of problem data, ConClass dynamically creates object networks to justify hypothesized solutions and thus achieves dynamic load distribution. We confirmed the efficiency of the parallel distributed processing and load balancing of ConClass with an experimental application. %U ftp://reports.stanford.edu/pub/cstr/reports/ksl/tr/89/68/KSL-TR-89-68.pdf %R NA-M-80-02 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T A generalized eigenvalue approach for solving Riccati equations %A Van Dooren, Paul M. %D July 1980 %X A numerically stable algorithm is derived to compute orthonormal bases for any deflating subspace of a regular pencil $\lambda$B-A. The method is based on an update of the QZ-algorithm, in order to obtain any desired ordering of eigenvalues in the quasi-triangular forms constructed by this algorithm. As applications we discuss a new approach to solve Riccati equations arising in linear system theory. The computation of deflating subspaces with specified spectrum is shown to be of crucial importance here.
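The Riccati application in NA-M-80-02 amounts to reordering the generalized Schur form so that an orthonormal basis for the stable deflating subspace can be read off the decomposition. The sketch below illustrates that use for the continuous-time equation $A^T X + XA - XGX + Q = 0$ via SciPy's reordered QZ applied to the pencil $\lambda$E - H with H the Hamiltonian matrix and E = I; it shows the deflating-subspace idea, not the report's updated QZ algorithm, and the setup is assumed for illustration.

    import numpy as np
    from scipy.linalg import ordqz, solve

    def care_via_deflating_subspace(A, G, Q):
        """Solve A'X + XA - XGX + Q = 0 from the stable deflating
        subspace of the pencil lambda*E - H, with E = I."""
        n = A.shape[0]
        H = np.block([[A, -G], [-Q, -A.T]])
        E = np.eye(2 * n)
        # Reorder so the n eigenvalues in the open left half plane come
        # first; the leading n columns of Z then span the corresponding
        # deflating (here: invariant) subspace.
        HH, EE, alpha, beta, Qz, Z = ordqz(H, E, sort='lhp', output='real')
        U1, U2 = Z[:n, :n], Z[n:, :n]
        return solve(U1.T, U2.T).T            # X = U2 @ inv(U1)

Under the usual stabilizability and detectability assumptions the selected subspace has dimension exactly n and the recovered X is the symmetric stabilizing solution.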
%U ftp://reports.stanford.edu/pub/cstr/reports/na/m/80/02/NA-M-80-02.pdf %R NA-M-80-03 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Computation of zeros of linear multivariable systems %A Emami-Naeini, Abbas %A Van Dooren, Paul M. %D July 1980 %X Several algorithms have been proposed in the literature for the computation of the zeros of a linear system described by a state-space model {$\lambda$I - A,B,C,D}. In this report we discuss the numerical properties of a new algorithm and compare it with some earlier techniques for computing zeros. The new approach is shown to handle both nonsquare and/or degenerate systems without difficulty, whereas earlier methods would either fail or require special treatment for these cases. The method is also shown to be backward stable in a rigorous sense. Several numerical examples are given in order to compare the speed and accuracy of the algorithm with its nearest competitors. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/80/03/NA-M-80-03.pdf %R NA-M-80-05 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Efficient solution of the biharmonic equation %A Bjorstad, Petter E. %D September 1980 %X A new method for the numerical solution of the first biharmonic problem in a rectangular region is outlined. The theoretical complexity of the method is $N^2$ + O(N) storage and O($N^2$) arithmetic operations (in order to achieve a prescribed accuracy on an N by N grid). Numerical results from a computer code that requires $aN^2 + bN^2\log N + O(N)$ operations, with $b \ll a$, are presented using both a scalar and a vector computer. Extensions and some applications of the method for solving eigenvalue problems and certain nonlinear problems are mentioned. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/80/05/NA-M-80-05.pdf %R NA-M-80-06 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T A new implementation of sparse Gaussian elimination %A Schreiber, Robert S. %D September 1980 %X An implementation of sparse ${LDL}^T$ and LU factorization and back-substitution, based on a new scheme for storing sparse matrices, is presented. The new method appears to be as efficient in terms of work and storage as existing schemes. It is more amenable to efficient implementation on fast pipelined scientific computers. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/80/06/NA-M-80-06.pdf %R NA-M-80-08 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Rational Chebyshev approximation on the unit disk %A Trefethen, Lloyd N. %D October 1980 %X In a recent paper we showed that error curves in polynomial Chebyshev approximation of analytic functions on the unit disk tend to approximate perfect circles about the origin. Making use of a theorem of Caratheodory and Fejer, we derived in the process a method for calculating near-best approximations rapidly by finding the principal singular value and corresponding singular vector of a complex Hankel matrix. This paper extends these developments to the problem of Chebyshev approximation by rational functions, where non-principal singular values and vectors of the same matrix turn out to be required.
The theory is based on certain extensions of the Caratheodory-Fejer result which are also currently finding application in the fields of digital signal processing and linear systems theory. It is shown among other things that if f($\epsilon z$) is approximated by a rational function of type (m,n) for $\epsilon$ > 0, then under weak assumptions the corresponding error curves deviate from perfect circles of winding number $m+n+1$ by a relative magnitude O(${\epsilon}^{m+n+2}$) as $\epsilon\ \rightarrow\ 0$. The "CF approximation" that our method computes approximates the true best approximation to the same high relative order. A numerical procedure for computing such approximations is described and shown to give results that confirm the asymptotic theory. Approximation of $e^z$ on the unit disk is taken as a central computational example. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/80/08/NA-M-80-08.pdf %R NA-M-80-09 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Finite-difference methods for singular perturbation and Navier-Stokes problems %A Schreiber, Robert S. %D November 1980 %X The linear equation $\epsilon u_{xx} + xu_x$ = 0, 0 < x < 1, is proposed as a model for investigating interesting features of the behavior of difference methods for realistic multidimensional nonlinear elliptic problems, especially Navier-Stokes problems. We give an analytic and experimental comparison of several difference schemes for this model problem. An unusual scheme for the Navier-Stokes equations is suggested by these results. An experiment shows that this scheme performs better than a more obvious one. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/80/09/NA-M-80-09.pdf %R NA-M-81-11 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Bifurcation problems for discrete variational inequalities %A Mittelmann, Hans Detlef %D April 1981 %X The buckling of a beam or a plate subject to obstacles is typical of the variational inequalities that are considered here. Bifurcation is known to occur from the first eigenvalue of the linearized problem. For a discretization, the bifurcation point and the bifurcating branches may be obtained by solving a constrained optimization problem. An algorithm is proposed and its convergence is proved. The buckling of a clamped beam subject to point obstacles is considered in the continuous case and some numerical results for this problem are presented. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/81/11/NA-M-81-11.pdf %R NA-M-81-12 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Group velocity in finite difference schemes %A Trefethen, Lloyd N. %D April 1981 %X The relevance of group velocity to the behavior of finite difference models of time-dependent partial differential equations is surveyed and illustrated. Applications involve the propagation of wave packets in one and two dimensions, numerical dispersion, the behavior of parasitic waves, and the stability analysis of initial boundary-value problems. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/81/12/NA-M-81-12.pdf %R NA-M-81-13 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Large time step shock-capturing techniques for scalar conservation laws %A LeVeque, Randall J.
%D July 1981 %X For a scalar conservation law $u_t = {f(u)}_x$ with $f''$ of constant sign, the first order upwind difference scheme is a special case of Godunov's method. The method is equivalent to solving a sequence of Riemann problems at each step and averaging the resulting solution over each cell in order to obtain the numerical solution at the next time level. The difference scheme is stable (and the solutions to the associated sequence of Riemann problems do not interact) provided the Courant number $\nu$ is less than 1. By allowing and explicitly handling such interactions, it is possible to obtain a generalized method which is stable for $\nu$ much larger than 1. In many cases the resulting solution is considerably more accurate than solutions obtained by other numerical methods. In particular, shocks can be correctly computed with virtually no smearing. The generalized method is rather unorthodox and still has some problems associated with it. Nonetheless, preliminary results are quite encouraging. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/81/13/NA-M-81-13.pdf %R NA-M-81-14 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T An efficient algorithm for bifurcation problems of variational inequalities %A Mittelmann, Hans Detlef %D September 1981 %X For a class of variational inequalities on a Hilbert space $H$, bifurcating solutions exist and may be characterized as critical points of a functional with respect to the intersection of the level surfaces of another functional and a closed convex subset $K$ of $H$. In a recent paper we have used a gradient-projection type algorithm to obtain the solutions for discretizations of the variational inequalities. A related but Newton-based method is given here. Global and asymptotically quadratic convergence is proved. Numerical results show that it may be used very efficiently in following the bifurcating branches and that it compares favorably with several other algorithms. The method is also attractive for a class of nonlinear eigenvalue problems ($K = H$) for which it reduces to a generalized Rayleigh-quotient iteration, so some results are included for path following in turning point problems. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/81/14/NA-M-81-14.pdf %R NA-M-81-16 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Numerical methods based on additive splittings for hyperbolic partial differential equations %A LeVeque, Randall J. %A Oliger, Joseph E. %D October 1981 %X We derive and analyze several methods for systems of hyperbolic equations with wide ranges of signal speeds. These techniques are also useful for problems whose coefficients have large mean values about which they oscillate with small amplitude. Our methods are based on additive splittings of the operators into components that can be approximated independently on the different time scales, some of which are sometimes treated exactly. The efficiency of the splitting methods is seen to depend on the error incurred in splitting the exact solution operator. This is analyzed and a technique is discussed for reducing this error through a simple change of variables. A procedure for generating the appropriate boundary data for the intermediate solutions is also presented.
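The additive-splitting idea in NA-M-81-16 can be shown in a few lines for a linear semi-discrete system u' = (L_slow + L_fast)u: advance each component with a method matched to its time scale and accept the splitting error that the report analyzes. A minimal first-order (Lie) splitting sketch, with dense matrices and an exact treatment of the fast component; the setup is assumed for illustration.

    import numpy as np
    from scipy.linalg import expm

    def lie_split_step(L_slow, L_fast, u, dt):
        """One first-order splitting step for u' = (L_slow + L_fast) u:
        a cheap explicit step for the slow scale, exact propagation of
        the fast scale. The O(dt) splitting error comes from the two
        components not commuting."""
        u = u + dt * (L_slow @ u)     # forward Euler on the slow part
        u = expm(dt * L_fast) @ u     # exact exponential on the fast part
        return u

The report's analysis quantifies exactly this kind of splitting error and shows how a change of variables can reduce it; the sketch only fixes the shape of the computation.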
%U ftp://reports.stanford.edu/pub/cstr/reports/na/m/81/16/NA-M-81-16.pdf %R NA-M-82-03 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Generalized iterative methods for semidefinite linear systems %A Schreiber, Robert S. %D June 1982 %X In this paper, we consider iterative solution procedures for solving singular linear systems Ax = b, $b \in Range(A)$, where A is an n by n, Hermitian, positive semidefinite matrix. Our aim is to consider variants of the block Jacobi, SOR, and SSOR iterations. The fundamental paper of Keller ([1965]) considers methods based on splittings A = B - C with B a nonsingular matrix. Here we allow B to be singular. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/82/03/NA-M-82-03.pdf %R NA-M-83-01 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Stability analysis of finite difference schemes for the advection-diffusion equation %A Chan, Tony F. %D January 1983 %X We present a collection of stability results for finite difference approximations to the advection-diffusion equation $u_t\ = a u_x\ + b u_{xx}$. The results are for centered difference schemes in space and include explicit and implicit schemes in time up to fourth order and schemes that use different space and time discretizations for the advective and diffusive terms. The results are derived from a uniform framework based on the Schur-Cohn theory of Simple von Neumann Polynomials and are necessary and sufficient for the stability of the Cauchy problem. Some of the results are believed to be new. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/83/01/NA-M-83-01.pdf %R NA-M-83-02 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Adaptive mesh refinement for hyperbolic partial differential equations %A Berger, Marsha J. %A Oliger, Joseph E. %D March 1983 %X We present an adaptive method based on the idea of multiple, component grids for the solution of hyperbolic partial differential equations using finite difference techniques. Based upon Richardson-type estimates of the truncation error, refined grids are created or existing ones removed to attain a given accuracy for a minimum amount of work. Our approach is recursive in that fine grids can themselves contain even finer grids. The grids with finer mesh width in space also have a smaller mesh width in time, making this a mesh refinement algorithm in time and space. We present the algorithm, data structures and grid generation procedure, and conclude with numerical examples in one and two space dimensions. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/83/02/NA-M-83-02.pdf %R NA-M-83-27 %Z Mon, 09 Oct 00 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T The use of pre-conditioning over irregular regions %A Golub, Gene H. %A Mayers, David F. %D June 1983 %X Some ideas and techniques for solving elliptic PDEs over irregular regions are discussed. The basic idea is to break up the domain into subdomains and then to use the pre-conditioned conjugate gradient method for obtaining the solution over the entire domain. The solution of Poisson's equation over a $T$-shaped region is described in some detail and a numerical example is given.
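The abstract of NA-M-83-27 does not spell out the subdomain solves, but the surrounding machinery is the standard preconditioned conjugate gradient iteration, sketched below; apply_Minv stands in for whatever fast solve on the subdomains (for example, a fast Poisson solver on each rectangular piece) the preconditioner provides, and the name is an assumption for illustration.

    import numpy as np

    def pcg(A, b, apply_Minv, tol=1e-8, maxit=200):
        """Preconditioned conjugate gradients for A x = b with A
        symmetric positive definite (all quantities are floats)."""
        x = np.zeros_like(b)
        r = b.copy()
        z = apply_Minv(r)
        p = z.copy()
        rz = r @ z
        for _ in range(maxit):
            Ap = A @ p
            alpha = rz / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            z = apply_Minv(r)
            rz_new = r @ z
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x

The quality of the decomposition shows up entirely through apply_Minv: the closer the preconditioner is to A, the fewer iterations the loop above needs.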
%U ftp://reports.stanford.edu/pub/cstr/reports/na/m/83/27/NA-M-83-27.pdf %R NA-M-84-30 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Mesh-independent spectra in the moving finite element equations %A Wathen, Andrew J. %D August 1984 %X We derive the Moving Finite Element (MFE) equations for the solution of a scalar evolutionary equation in $d$ space dimensions ($d \geq\ 1$) and introduce the elementwise approach to MFE. This approach yields a decomposition of the mesh- and solution-dependent matrix $A$ in the (semi-discretised) non-linear system of ordinary differential equations $A(y)\dot{y} = g(y)$ which forms the basis for proofs of eigenvalue clustering. With a simple, specific block diagonal preconditioner, $D$, it is shown that the eigenvalue spectrum of the preconditioned MFE matrix $D^{-1} A$ is contained in [$\frac{1}{2} , 1 + \frac{d}{2}$] independently of the mesh configuration, the solution and the number of nodes. A more specific result is established for the case $d$ = 1. These results guarantee extremely rapid solution techniques using, for example, conjugate gradient methods. We show how the analysis extends to systems of partial differential equations when a separate moving mesh is used for each component. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/84/30/NA-M-84-30.pdf %R NA-M-85-32 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Simultaneous computation of stationary probabilities with estimates of their sensitivity %A Golub, Gene H. %A Meyer, Carl D. Jr. %D March 1985 %X For an n-state finite, homogeneous, ergodic Markov chain with transition matrix P = [$p_{ij}$], the stationary distribution is the unique row vector $\pi$ satisfying $\pi P = \pi , \sum {\pi}_i\ = 1$. Letting $A_{n\times n}$ and $e_{n\times 1}$ denote the matrices A = I - P and e = ${[1, 1, ..., 1]}^T$, the stationary distribution $\pi$ can be characterized as the unique solution to the linear system of equations defined by $\pi$A = 0 and $\pi$e = 1. The theory of finite Markov chains has long been a fundamental tool in the analysis of social and biological phenomena. More recently the ideas embodied in Markov chain models along with the analysis of a stationary distribution have proven to be useful in applications which do not fall directly into the traditional Markov chain setting. Some of these applications include the analysis of queueing networks (Kaufman [1984]), the analysis of compartmental ecological models (Funderlic and Mankin [1981]), and least squares adjustment of geodetic networks (Brandt [1983]). Recently, the behavior of the numerical solution of systems of nonlinear reaction-diffusion equations has been analyzed by making use of the stationary distribution of a finite Markov chain in conjunction with the concept of group matrix inversion (Galeone [1983]). An ergodic chain manifests itself in the transition matrix P, which must be row stochastic and irreducible. Of central importance is the sensitivity of the stationary distribution $\pi$ to perturbations in the transition probabilities in P. The sensitivity of $\pi$ is most easily gauged by considering the transition probabilities in P to be differentiable functions. One approach, adopted by Conlisk [preprint, 1983], Schweitzer [1968], and Funderlic and Heath [1971], is to examine the partial derivatives $\partial\pi /\partial p_{ij}$.
Our strategy is to consider the transition probabilities $p_{ij}$(t) as differentiable functions of a single parameter t and study the stationary distribution $\pi$(t) as a function of t. We present a new and very simple formulation for the derivative, d$\pi$(t)/dt, of the stationary distribution directly in terms of the derivatives d$p_{ij}$(t)/dt and entries from $\pi$(t) and a matrix $A^{\#}$(t), called the group inverse of A(t) = I - P(t). After the derivative d$\pi$(t)/dt has been obtained, we demonstrate its applicability by using it to deduce the relative sensitivity of a discrete Markov chain. This is followed by a first order perturbation analysis. Finally, it is demonstrated how a QR factorization can be used to simultaneously compute $\pi$ along with estimates which gauge the sensitivity of $\pi$ to perturbations in P. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/85/32/NA-M-85-32.pdf %R NA-M-85-33 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Multitasking the conjugate gradient on the CRAY X-MP/48 %A Meurant, Gerard A. %D August 1985 %X We show how to efficiently implement the preconditioned conjugate gradient method on a four-processor CRAY X-MP/48. We solve block tridiagonal systems using block preconditioners well suited to parallel computation. Numerical results are presented that exhibit nearly optimal speed-up and high Mflops rates. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/85/33/NA-M-85-33.pdf %R NA-M-86-36 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T The truncated SVD as a method for regularization %A Hansen, Per Christian %D October 1986 %X The truncated singular value decomposition (SVD) is considered as a method for regularization of ill-posed linear least squares problems. In particular, the truncated SVD solution is compared with the usual regularized solution. Necessary conditions are given under which the two methods will yield similar results. This investigation suggests the truncated SVD as a favorable alternative to standard-form regularization in the case of ill-conditioned matrices with a well-determined rank. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/86/36/NA-M-86-36.pdf %R NA-M-86-37 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T A survey of matrix inverse eigenvalue problems %A Boley, Daniel L. %A Golub, Gene H. %D November 1986 %X In this paper, we present a survey of some recent results regarding direct methods for solving certain symmetric inverse eigenvalue problems. The problems we discuss in this paper are those of generating a symmetric matrix, either Jacobi, banded, or some variation thereof, given only some information on the eigenvalues of the matrix itself and some of its principal submatrices. Much of the motivation for the problems discussed in this paper came from an interest in the inverse Sturm-Liouville problem. A preliminary version of this report was issued as a technical report of the Computer Science Department, University of Minnesota, TR 86-20, May 1986. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/86/37/NA-M-86-37.pdf %R NA-M-87-01 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T The convergence of inexact Chebyshev and Richardson iterative methods for solving linear systems %A Golub, Gene H. %A Overton, Michael L.
%D February 1987 %X The Chebyshev and second-order Richardson methods are classical iterative schemes for solving linear systems. We consider the convergence analysis of these methods when each step of the iteration is carried out inexactly. This has many applications, since a preconditioned iteration requires, at each step, the solution of a linear system, which may be solved inexactly using an "inner" iteration. We derive an error bound which applies to the general nonsymmetric inexact Chebyshev iteration. We show how this simplifies slightly in the case of a symmetric or skew-symmetric iteration, and we consider both the cases of underestimating and overestimating the spectrum. We show that in the symmetric case, it is actually advantageous to underestimate the spectrum when the spectral radius and the degree of inexactness are both large. This is not true in the case of the skew-symmetric iteration. We show how similar results apply to the Richardson iteration. Finally, we describe numerical experiments which illustrate the results and suggest that the Chebyshev and Richardson methods, with reasonable parameter choices, may be more effective than the conjugate gradient method in the presence of inexactness. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/87/01/NA-M-87-01.pdf %R NA-M-87-02 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Estimates of eigenvalues for iterative methods %A Golub, Gene H. %A Kent, Mark D. %D February 1987 %X We describe procedures for determining estimates of the eigenvalues of operators used in various iterative methods for the solution of linear systems of equations. We also show how to determine upper and lower bounds for the error in the approximate solution of linear equations using essentially the same information as that needed for the eigenvalue calculations. The methods described depend strongly upon the theory of moments and Gauss quadrature. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/87/02/NA-M-87-02.pdf %R NA-M-87-04 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T The convergence rate of inexact preconditioned steepest descent algorithms for solving linear systems %A Munthe-Kaas, Hans %D March 1987 %X The steepest descent algorithm is a classical iterative method for solving a linear system Ax=b, where A is a positive definite symmetric matrix. A common way to accelerate an iterative scheme is to precondition the method, i.e. to solve a simpler system Mz=r in each stage of the iteration. We analyze the effect of solving the preconditioner inexactly. A lower bound for the convergence rate is derived, and we show under what conditions this lower bound is obtained. Finally we describe some numerical experiments which show that in practical situations the lower bound may be too pessimistic. An amusing result is that in some cases small errors may lead to $\underline{higher}$ convergence rates than if the preconditioner is solved exactly! %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/87/04/NA-M-87-04.pdf %R NA-M-87-05 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Some history of the conjugate gradient and Lanczos algorithms: 1948-1976 %A Golub, Gene H. %A O'Leary, Dianne P.
%D June 1987 %X This manuscript gives some of the history of the conjugate gradient and Lanczos algorithms and an annotated bibliography for the period 1948-1976. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/87/05/NA-M-87-05.pdf %R NA-M-87-06 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Numerical assessment of the validity of two-dimensional plate models %A Miara, Bernadette %D June 1987 %X The objective of this paper is to verify numerically the convergence of the solution to the three-dimensional problem of a clamped plate towards the solution to the corresponding "limit" two-dimensional problem when the thickness of the plate goes to zero. Standard finite element discretizations of the three-dimensional problem fail to show this convergence [M. Vidrascu, 1978], as they lead to ill-conditioned linear systems when the discretization parameter is of the order of the thickness. We will therefore use a spectral approximation of the solution of the three-dimensional problem. First, we shall review the three-dimensional and two-dimensional linear models of a clamped plate and give the convergence results obtained by P.-G. Ciarlet and P. Destuynder [1979], [1981]. Then we will discuss two kinds of spectral approximations: the Galerkin and Tau approximations. Finally we give the numerical results obtained by the Tau approximation. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/87/06/NA-M-87-06.pdf %R NA-M-89-01 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Iterative methods for cyclically reduced non-self-adjoint linear systems %A Elman, Howard C. %A Golub, Gene H. %D February 1989 %X We study iterative methods for solving linear systems of the type arising from two-cyclic discretizations of non-self-adjoint two-dimensional elliptic partial differential equations. A prototype is the convection-diffusion equation. The methods consist of applying one step of cyclic reduction, resulting in a "reduced system" of half the order of the original discrete problem, combined with a reordering and a block iterative technique for solving the reduced system. For constant coefficient problems, we present analytic bounds on the spectral radii of the iteration matrices in terms of cell Reynolds numbers that show the methods to be rapidly convergent. In addition, we describe numerical experiments that supplement the analysis and that indicate that the methods compare favorably with methods for solving the "unreduced" system. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/89/01/NA-M-89-01.pdf %R NA-M-89-03 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T The restricted singular value decomposition: properties and applications %A De Moor, Bart L. R. %A Golub, Gene H. %D April 1989 %X The restricted singular value decomposition (RSVD) is the factorization of a given matrix, relative to two other given matrices. It can be interpreted as the ordinary singular value decomposition with different inner products in the row and column spaces. Its properties and structure are investigated in detail, as well as its connection to generalized eigenvalue problems, canonical correlation analysis and other generalizations of the singular value decomposition.
Applications that are discussed include the analysis of the extended shorted operator, unitarily invariant norm minimization with rank constraints, rank minimization in matrix balls, the analysis and solution of linear matrix equations, rank minimization of a partitioned matrix and the connection with generalized Schur complements, and constrained linear and total linear least squares problems with mixed exact and noisy data, including a generalized Gauss-Markov estimation scheme. Two constructive proofs of the RSVD in terms of other generalizations of the ordinary singular value decomposition are provided as well. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/89/03/NA-M-89-03.pdf %R NA-M-89-05 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Generalized singular value decompositions: a proposal for a standardized nomenclature %A De Moor, Bart L. R. %A Golub, Gene H. %D April 1989 %X An alphabetic and mnemonic system of names for several matrix decompositions related to the singular value decomposition is proposed: the OSVD, PSVD, QSVD, RSVD, SSVD, TSVD. The main purpose of this note is to propose a standardization of the nomenclature and the structure of these matrix decompositions. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/89/05/NA-M-89-05.pdf %R NA-M-89-06 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T On the structure and geometry of the product singular value decomposition %A De Moor, Bart L. R. %D May 1989 %X The product singular value decomposition is a factorization of two matrices, which can be considered as a generalization of the ordinary singular value decomposition, at the same level of generality as the quotient (generalized) singular value decomposition. A constructive proof of the product singular value decomposition is provided, which exploits the close relation with a symmetric eigenvalue problem. Several interesting properties are established. The structure and the non-uniqueness properties of the so-called contragredient transformation, which appears as one of the factors in the product singular value decomposition, are investigated in detail. Finally, a geometrical interpretation of the structure is provided in terms of principal angles between subspaces. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/89/06/NA-M-89-06.pdf %R NA-M-89-07 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Iterative methods for cyclically reduced non-self-adjoint linear systems II %A Elman, Howard C. %A Golub, Gene H. %D June 1989 %X We perform an analytic and experimental study of line iterative methods for solving linear systems arising from finite difference discretizations of non-self-adjoint elliptic partial differential equations on two-dimensional domains. The methods consist of performing one step of cyclic reduction, followed by solution of the resulting reduced system by line relaxation. We augment previous analyses of one-line methods, and we derive a new convergence analysis for two-line methods, showing that both classes of methods are highly effective for solving the convection-diffusion equation. In addition, we compare the experimental performance of several variants of these methods, and we show that the methods can be implemented efficiently on parallel architectures.
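The reduction step that NA-M-89-01 and NA-M-89-07 build on is easiest to see in the one-dimensional scalar analogue of their block systems: eliminate the odd unknowns of a tridiagonal system, solve the half-size system in the even unknowns, then recover the odd unknowns locally. A serial sketch, using a dense solve for the reduced system purely for brevity; the reports work with block systems from two-dimensional red-black discretizations.

    import numpy as np

    def cyclic_reduction_solve(a, b, c, f):
        """Solve a[i]x[i-1] + b[i]x[i] + c[i]x[i+1] = f[i]
        (with a[0] = c[-1] = 0) by one step of cyclic reduction."""
        n = len(b)
        x = np.zeros(n)
        ev = np.arange(0, n, 2)
        m = len(ev)
        B = np.zeros((m, m))
        g = np.zeros(m)
        for k, i in enumerate(ev):            # build the reduced system
            B[k, k] = b[i]
            g[k] = f[i]
            if i - 1 >= 0:                    # eliminate odd unknown i-1
                alpha = a[i] / b[i - 1]
                B[k, k] -= alpha * c[i - 1]
                g[k] -= alpha * f[i - 1]
                B[k, k - 1] = -alpha * a[i - 1]
            if i + 1 < n:                     # eliminate odd unknown i+1
                gamma = c[i] / b[i + 1]
                B[k, k] -= gamma * a[i + 1]
                g[k] -= gamma * f[i + 1]
                if k + 1 < m:
                    B[k, k + 1] = -gamma * c[i + 1]
        x[ev] = np.linalg.solve(B, g)         # reduced (half-size) solve
        for i in range(1, n, 2):              # back-substitute odd unknowns
            xp = x[i + 1] if i + 1 < n else 0.0
            x[i] = (f[i] - a[i] * x[i - 1] - c[i] * xp) / b[i]
        return x

In the reports the same elimination is applied blockwise to one color of a red-black ordered grid, and the reduced system is attacked by line relaxation rather than by a direct solve.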
%U ftp://reports.stanford.edu/pub/cstr/reports/na/m/89/07/NA-M-89-07.pdf %R NA-M-89-09 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T On generating polynomials which are orthogonal over several intervals %A Golub, Gene H. %A Fischer, Bernd %D August 1989 %X We consider the problem of generating the recursion coefficients of orthogonal polynomials for a given weight function. The weight function is assumed to be the weighted sum of weight functions, each supported on its own interval. Some of these intervals may coincide, overlap, or be contiguous. We discuss three algorithms. Two of them are based on modified moments, whereas the other is based on an explicit expression for the desired coefficients. Several examples, illustrating the numerical performance of the various methods, are presented. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/89/09/NA-M-89-09.pdf %R NA-M-89-12 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Backward error assertions for checking solutions to systems of linear equations %A Boley, Daniel L. %A Golub, Gene H. %A Makar, Samy R. %A Saxena, Nirmal R. %A McCluskey, Edward J. %D November 1989 %X This paper presents an assertion scheme based on backward error analysis for error detection in algorithms that solve a system of linear equations, Ax = b. This Backward Error Assertion Model can be easily instrumented in a Watchdog processor environment. The complexity of verifying assertions is O($n^2$), compared to the O($n^3$) complexity of algorithms solving Ax = b. Unlike other proposed error detection methods, this assertion model does not require any encoding of the matrix A. Experimental results under various error models are presented to validate the effectiveness of these assertions. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/89/12/NA-M-89-12.pdf %R NA-M-90-01 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Line iterative methods for cyclically reduced discrete convection-diffusion problems %A Elman, Howard C. %A Golub, Gene H. %D February 1990 %X We perform an analytic and empirical study of line iterative methods for solving the discrete convection-diffusion equation. The methodology consists of performing one step of the cyclic reduction method, followed by iteration on the resulting reduced system using line orderings of the reduced grid. Two classes of iterative methods are considered: block stationary methods, such as the block Gauss-Seidel and SOR methods, and preconditioned generalized minimum residual methods with incomplete LU preconditioners. New analysis extends convergence bounds for constant coefficient problems to problems with separable variable coefficients. In addition, analytic results show that iterative methods based on incomplete LU preconditioners have faster convergence rates than block Jacobi relaxation methods. Numerical experiments examine additional properties of the two classes of methods, including the effects of direction of flow, discretization, and grid ordering on performance. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/90/01/NA-M-90-01.pdf %R NA-M-90-06 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T The nonsymmetric Lanczos algorithm and controllability %A Boley, Daniel L. %A Golub, Gene H.
%D May 1990 %X We give a brief description of a non-symmetric Lanczos algorithm that does not require strict bi-orthogonality among the generated vectors. We show how the vectors generated are algebraically related to the "Controllable Space" and "Observable Space" of a related linear dynamical system. The algorithm described is particularly appropriate for large sparse systems. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/90/06/NA-M-90-06.pdf %R NA-M-90-07 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Adaptive Lanczos methods for recursive condition estimation %A Ferng, William R. %A Golub, Gene H. %A Plemmons, Robert J. %D June 1990 %X Estimates for the condition number of a matrix are useful in many areas of scientific computing, including recursive least squares computations, optimization, eigenanalysis, and general nonlinear problems solved by linearization techniques where matrix modification techniques are used. The purpose of this paper is to propose an adaptive Lanczos estimator scheme, which we call ale, for tracking the condition number of the modified matrix over time. Applications to recursive least squares (RLS) computations using the covariance method with sliding data windows are considered. ale is fast for the relatively small n-parameter problems arising in RLS methods in control and signal processing, and is adaptive over time, i.e., estimates at time t are used to produce estimates at time t + 1. Comparisons are made with other adaptive and non-adaptive condition estimators for recursive least squares problems. Numerical experiments are reported indicating that ale yields a very accurate recursive condition estimator. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/90/07/NA-M-90-07.pdf %R NA-M-91-05 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Iterative solution of linear systems %A Freund, Roland W. %A Golub, Gene H. %A Nachtigal, Noel M. %D November 1991 %X Recent advances in the field of iterative methods for solving large linear systems are reviewed. The main focus is on developments in the area of conjugate gradient-type algorithms and Krylov subspace methods for non-Hermitian matrices. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/91/05/NA-M-91-05.pdf %R NA-M-91-06 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T How to generate unknown orthogonal polynomials out of known orthogonal polynomials %A Golub, Gene H. %A Fischer, Bernd %D November 1991 %X We consider the problem of generating the three-term recursion coefficients of orthogonal polynomials for a weight function $v(t) = r(t)w(t)$, obtained by modifying a given weight function $w$ by a rational function $r$. Algorithms for the construction of the orthogonal polynomials for the new weight $v$ in terms of those for the old weight $w$ are presented. All the methods are based on modified moments. As applications we present Gaussian quadrature rules for integrals in which the integrand has singularities close to the interval of integration, and the generation of orthogonal polynomials for the (finite) Hermite weight $e^{-t^{2}}$, supported on a finite interval [$-b,b$].
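For comparison with the modified-moment algorithms of NA-M-91-06, the classical discretized Stieltjes procedure generates the same three-term recursion coefficients directly from inner products; the report's methods are instead based on modified moments. A sketch, assuming a smooth weight on a finite interval and Gauss-Legendre quadrature for the integrals.

    import numpy as np

    def stieltjes_recurrence(weight, lo, hi, n, quad_pts=400):
        """Recursion coefficients alpha_k, beta_k of the monic polynomials
        orthogonal w.r.t. `weight` on [lo, hi], for
        p_{k+1}(t) = (t - alpha_k) p_k(t) - beta_k p_{k-1}(t)."""
        x, w = np.polynomial.legendre.leggauss(quad_pts)
        t = 0.5 * (hi - lo) * x + 0.5 * (hi + lo)    # map nodes to [lo, hi]
        w = 0.5 * (hi - lo) * w * weight(t)          # quadrature times weight
        alpha, beta = np.zeros(n), np.zeros(n)
        p_prev, p = np.zeros_like(t), np.ones_like(t)
        norm_prev = 1.0
        for k in range(n):
            norm = w @ (p * p)                       # <p_k, p_k>
            alpha[k] = (w @ (t * p * p)) / norm      # <t p_k, p_k>/<p_k, p_k>
            beta[k] = w.sum() if k == 0 else norm / norm_prev
            p_prev, p = p, (t - alpha[k]) * p - beta[k] * p_prev
            norm_prev = norm
        return alpha, beta

For the finite Hermite weight $e^{-t^{2}}$ on [$-b,b$] mentioned in the abstract, stieltjes_recurrence(lambda t: np.exp(-t**2), -b, b, n) should return alpha coefficients that vanish by symmetry, a convenient sanity check.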
%U ftp://reports.stanford.edu/pub/cstr/reports/na/m/91/06/NA-M-91-06.pdf %R NA-M-91-03 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Direct block tridiagonalization of single-input single-output systems %A Golub, Gene H. %A Kagstrom, Bo T. %A Dooren, Paul M. Van %D July 1991 %X In this paper we derive a direct method for block tridiagonalizing a single-input single-output system triple $\{A,b,c\}$. The method is connected to the nonsymmetric Lanczos procedure developed in [Wilkinson, 1965], [Boley/Golub, 1990], and [Boley/Elhay/Golub/Gutknecht, 1990], and also leads to canonical representations of such triples. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/91/03/NA-M-91-03.pdf %R NA-M-91-04 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Fast iterative solution of stabilised Stokes systems. Part I: Using simple diagonal preconditioners %A Wathen, Andrew J. %A Silvester, David J. %D October 1991 %X Mixed finite element approximation of the classical Stokes problem describing slow viscous incompressible flow gives rise to symmetric indefinite systems for the discrete velocity and pressure variables. Iterative solution of such indefinite systems is feasible and is an attractive approach for large problems. The use of stabilisation methods for convenient (but unstable) mixed elements introduces stabilisation parameters. We show how these can be chosen to obtain rapid iterative convergence. We propose a conjugate gradient-like method (the method of preconditioned conjugate residuals) which is applicable to symmetric indefinite problems, describe the effects of stabilisation on the algebraic structure of the discrete Stokes operator and derive estimates of the eigenvalue spectrum of this operator on which the convergence rate of the iteration depends. Here we discuss the simple case of diagonal preconditioning. Our results apply to both locally and globally stabilised mixed elements as well as to elements which are inherently stable. We demonstrate that convergence rates comparable to those achieved using the diagonally scaled conjugate gradient method applied to the discrete Laplacian are approachable for the Stokes problem. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/91/04/NA-M-91-04.pdf %R NA-M-92-01 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T A look-ahead algorithm for the solution of general Hankel systems %A Freund, Roland W. %A Zha, Hongyuan %D January 1992 %X The solution of linear systems of equations with Hankel coefficient matrices can be computed with only $O(n^2)$ arithmetic operations, as compared to $O(n^3)$ operations for the general case. However, the classical Hankel solvers require the nonsingularity of all leading principal submatrices of the Hankel matrix. The known extensions of these algorithms to general Hankel systems can handle only exactly singular submatrices, but not ill-conditioned ones, and hence they are numerically unstable. In this paper, a stable procedure for solving general nonsingular Hankel systems is presented, using a look-ahead technique to skip over singular or ill-conditioned submatrices. The proposed approach is based on a look-ahead variant of the nonsymmetric Lanczos process that was recently developed by Freund, Gutknecht, and Nachtigal.
We first derive a somewhat more general formulation of this look-ahead Lanczos algorithm in terms of formally orthogonal polynomials, which then yields the look-ahead Hankel solver as a special case. We prove some general properties of the resulting look-ahead algorithm for formally orthogonal polynomials. These results are then utilized in the implementation of the Hankel solver. We report some numerical experiments for Hankel systems with ill-conditioned submatrices. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/01/NA-M-92-01.pdf %R NA-M-92-02 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Recent advances in Lanczos-based iterative methods for nonsymmetric linear systems %A Freund, Roland W. %A Golub, Gene H. %A Nachtigal, Noel M. %D January 1992 %X In recent years, there has been a true revival of the nonsymmetric Lanczos method. On the one hand, the possible breakdowns in the classical algorithm are now better understood, and so-called look-ahead variants of the Lanczos process have been developed, which remedy this problem. On the other hand, various new Lanczos-based iterative schemes for solving nonsymmetric linear systems have been proposed. This paper gives a survey of some of these recent developments. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/02/NA-M-92-02.pdf %R NA-M-92-04 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T On the convergence of line iterative methods for cyclically reduced non-symmetrizable linear systems %A Elman, Howard C. %A Golub, Gene H. %A Starke, Gerhard C. %D May 1992 %X We derive analytic bounds on the convergence factors associated with block relaxation methods for solving the discrete two-dimensional convection-diffusion equation. The analysis applies to the reduced systems derived when one step of block Gaussian elimination is performed on red-black ordered two-cyclic discretizations. We consider the case where centered finite difference discretization is used and one cell Reynolds number is less than one in absolute value and the other is greater than one. It is shown that line ordered relaxation exhibits very fast rates of convergence. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/04/NA-M-92-04.pdf %R NA-M-92-05 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Adaptive Chebyshev iterative methods for nonsymmetric linear systems based on modified moments %A Calvetti, Daniela %A Golub, Gene H. %A Reichel, Lothar %D May 1992 %X Large, sparse nonsymmetric systems of linear equations with a matrix whose eigenvalues lie in the right half plane may be solved by an iterative method based on Chebyshev polynomials for an interval in the complex plane. Knowledge of the convex hull of the spectrum of the matrix is required in order to choose parameters upon which the iteration depends. Adaptive Chebyshev algorithms, in which these parameters are determined by using eigenvalue estimates computed by the power method or modifications thereof, have been described by Manteuffel [1978]. This paper presents adaptive Chebyshev iterative methods, in which eigenvalue estimates are computed from modified moments determined during the iterations. 
The computation of eigenvalue estimates from modified moments requires less computer storage than when eigenvalue estimates are computed by a power method and yields faster convergence for many problems. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/05/NA-M-92-05.pdf %R NA-M-92-09 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T An implementation of a generalized Lanczos procedure for structural dynamic analysis on distributed memory computers %A Mackay, David R. %A Law, Kincho H. %D August 1992 %X This paper describes a parallel implementation of a generalized Lanczos procedure for structural dynamic analysis on a distributed memory parallel computer. One major cost of the generalized Lanczos procedure is the factorization of the (shifted) stiffness matrix and the forward and backward solution of triangular systems. In this paper, we discuss load assignment of a sparse matrix and propose a strategy for inverting the principal block submatrix factors to facilitate the forward and backward solution of triangular systems. We also discuss the different strategies in the implementation of mass matrix-vector multiplication on parallel computers and how they are used in the Lanczos procedure. The Lanczos procedure implemented includes partial and external selective reorthogonalizations and spectral shifts. Experimental results are presented to illustrate the effectiveness of the parallel generalized Lanczos procedure. The issues of balancing the computations among the basic steps of the Lanczos procedure on distributed memory computers are discussed. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/09/NA-M-92-09.pdf %R NA-M-92-10 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T A parallel row-oriented sparse solution method for finite element structural analysis %A Law, Kincho H. %A Mackay, David R. %D August 1992 %X This paper describes a parallel implementation of $LDL^T$ factorization on a distributed memory parallel computer. Specifically, the parallel $LDL^T$ factorization procedure is based on a row-oriented sparse storage scheme. In addition, a strategy is proposed for the parallel solution of triangular systems of equations. The strategy is to compute the inverses of the dense principal diagonal block submatrices of the factor $L$, stored in a row-oriented structure. Experimental results for a number of finite element models are presented to illustrate the effectiveness of the parallel solution schemes. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/10/NA-M-92-10.pdf %R NA-M-92-11 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T A new approach for solving perturbed symmetric eigenvalue problems %A Carey, Cheryl M. M. %A Chen, Hsin-Chu %A Golub, Gene H. %A Sameh, Ahmed H. %D September 1992 %X In this paper, we present a new approach to the solution of a series of slightly perturbed symmetric eigenvalue problems $(A + BS_{i}B^{T})x = \lambda x$, $0 \leq i \leq m$, where $A = A^{T} \in R^{n\times n}$, $B \in R^{n\times p}$, and $S_{i} = S_{i}^{T} \in R^{p\times p}$, $p \ll n$. The matrix $B$ is assumed to have full column rank.
The main idea of our approach lies in a specific choice of starting vectors used in the block Lanczos algorithm so that the effect of the perturbations is confined to lie in the first diagonal block of the block tridiagonal matrix that is produced by the block Lanczos algorithm. Subsequently, for the perturbed eigenvalue problems under our consideration, the block Lanczos scheme needs to be applied to the original (unperturbed) matrix only once; the first diagonal block is then updated for each perturbation, so that for low-rank perturbations the algorithm presented in this paper results in significant savings. Numerical examples based on finite element vibration analysis illustrate the advantages of this approach. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/11/NA-M-92-11.pdf %R NA-M-92-12 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Matrix shapes invariant under the symmetric QR algorithm %A Arbenz, Peter %A Golub, Gene H. %D September 1992 %X It is shown which zero patterns of symmetric matrices are preserved under the QR algorithm. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/12/NA-M-92-12.pdf %R NA-M-92-13 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T The canonical correlations of matrix pairs and their numerical computation %A Golub, Gene H. %A Zha, Hongyuan %D September 1992 %X This paper is concerned with the analysis of canonical correlations of matrix pairs and their numerical computation. We first develop a decomposition theorem for matrix pairs having the same number of rows which explicitly exhibits the canonical correlations. We then present a perturbation analysis of the canonical correlations, which compares favorably with the classical first order perturbation analysis. Then we propose several numerical algorithms for computing the canonical correlations of general matrix pairs; emphasis is placed on the case of large sparse or structured matrices. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/13/NA-M-92-13.pdf %R NA-M-92-14 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Cyclic reduction/multigrid %A Golub, Gene H. %A Tuminaro, Ray S. %D September 1992 %X We consider the use of the multigrid method in conjunction with a cyclic reduction preconditioner for convection-diffusion equations. This preconditioner corresponds to algebraically eliminating all the unknowns associated with the red points on a standard mesh colored in a checker-board fashion. It is shown that the multigrid method applied to the resulting operator often converges much faster than when applied to the original equations. Fourier analysis of a constant coefficient model problem as well as numerical results for nonconstant coefficient examples are used to validate the conclusions. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/14/NA-M-92-14.pdf %R NA-M-92-15 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Fast solution of the Helmholtz equation with radiation condition by imbedding %A Ernst, Oliver G. %D October 1992 %X No abstract available.
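The red-black elimination that defines the cyclic reduction preconditioner of NA-M-92-14 above amounts to taking a Schur complement with respect to the red unknowns. The Python sketch below performs one such reduction step on a one-dimensional convection-diffusion model problem; the mesh, coefficients, and problem size are illustrative assumptions, not taken from the report.

    import numpy as np

    # One step of cyclic reduction viewed as red-black Schur complement
    # elimination, on a 1-D convection-diffusion model problem.
    n = 8                                    # number of interior grid points
    h, beta = 1.0 / (n + 1), 0.5             # mesh width, convection coefficient
    A = (np.diag(2.0 * np.ones(n))
         + np.diag((-1.0 - beta * h / 2) * np.ones(n - 1), -1)
         + np.diag((-1.0 + beta * h / 2) * np.ones(n - 1), 1))
    f = np.ones(n)

    red, black = np.arange(0, n, 2), np.arange(1, n, 2)   # checker-board split
    Arr, Arb = A[np.ix_(red, red)], A[np.ix_(red, black)]
    Abr, Abb = A[np.ix_(black, red)], A[np.ix_(black, black)]

    # Eliminate the red unknowns: reduced (Schur complement) system in x_black.
    S = Abb - Abr @ np.linalg.solve(Arr, Arb)
    g = f[black] - Abr @ np.linalg.solve(Arr, f[red])
    x = np.empty(n)
    x[black] = np.linalg.solve(S, g)
    x[red] = np.linalg.solve(Arr, f[red] - Arb @ x[black])

    print(np.allclose(A @ x, f))             # True: the reduction is exact

In the multigrid setting of the report, the smoother and coarse-grid correction are then applied to the reduced operator $S$ rather than to $A$ itself.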
%U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/15/NA-M-92-15.pdf %R NA-M-92-16 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Model problems in numerical stability theory for initial value problems %A Stuart, Andrew M. %A Humphries, Antony R. %D November 1992 %X In the past, numerical stability theory for initial value problems in ordinary differential equations has been dominated by the study of problems with essentially trivial dynamics. Whilst this has resulted in a coherent and self-contained body of knowledge, it has not thoroughly addressed the problems of real interest in applications. Recently there have been a number of studies of numerical stability for wider classes of problems admitting more complicated dynamics. This on-going work is unified and possible directions for future work are outlined. In particular, striking similarities between this new developing stability theory and the classical non-linear stability theory are emphasised. The classical theories of $A$, $B$, and algebraic stability for Runge-Kutta methods are briefly reviewed, and it is emphasised that the classes of equations to which these theories apply - linear decay and contractive problems - only admit trivial dynamics. Four other categories of equations - gradient, dissipative, conservative and Hamiltonian systems - are considered. Relationships and differences between the possible dynamics in each category, which range from multiple competing equilibria to fully chaotic solutions, are highlighted and it is stressed that the wide range of possible behaviour allows a large variety of applications. Runge-Kutta schemes which preserve the dynamical structure of the underlying problem are sought, and indications of a strong relationship between the developing stability theory for these new categories and the classical existing stability theory for the older problems are given. Algebraic stability, in particular, is seen to play a central role. The effects of error control are considered, and multi-step methods are discussed briefly. Finally, various open problems are described. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/16/NA-M-92-16.pdf %R NA-M-92-18 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T An analysis of local error control for dissipative, contractive and gradient dynamical systems %A Stuart, Andrew M. %A Humphries, Antony R. %D November 1992 %X The dynamics of numerical methods with local error control are studied for three classes of ordinary differential equations: dissipative, contractive and gradient systems. Dissipative dynamical systems are characterised by having a bounded absorbing set $\cal B$ which all trajectories eventually enter and remain inside. The exponentially contractive problems studied have a unique, globally attracting equilibrium point and thus they are also dissipative since the absorbing set $\cal B$ may be chosen to be a ball of arbitrarily small radius around the equilibrium point. The gradient systems studied are those for which the set of equilibria comprises isolated points and all trajectories are bounded so that each trajectory converges to an equilibrium point as $t \rightarrow \infty$. If the set of equilibria is bounded, then the gradient systems are also dissipative. The aim is to find conditions under which numerical methods with local error control replicate these large-time dynamical features.
The results are proved without recourse to asymptotic expansions for the truncation error. Standard embedded Runge-Kutta pairs are analysed together with several non-standard error control strategies. These non-standard strategies are easy to implement and have desirable properties within certain of the classes of problems studied. Both error per step and error per unit step strategies are considered. Certain embedded pairs are identified for which the sequence generated can be viewed as coming from a small perturbation of an algebraically stable scheme, with the size of the perturbation proportional to the tolerance $\tau$. Such embedded pairs are defined to be algebraically stable, and explicit algebraically stable pairs are identified. Conditions on the tolerance $\tau$ are identified under which appropriate discrete analogues of the properties of the underlying differential equation may be proved for certain algebraically stable embedded pairs. In particular, it is shown that for dissipative problems the discrete dynamical system has an absorbing set ${\cal B}_{\tau}$ and is hence dissipative. For exponentially contractive problems the radius of ${\cal B}_{\tau}$ is proved to be proportional to a positive power of $\tau$. For gradient systems the numerical solution enters and remains in a small ball about one of the equilibria and the radius of the ball $\rightarrow 0$ as $\tau \rightarrow 0$. Thus the local error control mechanisms confer desirable global properties on the numerical solution. It is shown that for error per unit step strategies the conditions on the tolerance $\tau$ are independent of initial data whilst for error per step strategies the conditions are initial data dependent. Thus error per unit step strategies are considerably more robust. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/18/NA-M-92-18.pdf %R NA-M-92-20 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T Use of linear algebra kernels to build an efficient finite element solver %A Elman, Howard C. %A Lee, Dennis K.-Y. %D December 1992 %X For scientific codes to achieve good performance on computers with hierarchical memories, it is necessary that the ratio of memory references to arithmetic operations be low. In this paper, we show that Level 3 BLAS linear algebra kernels can be used to satisfy this requirement to produce an efficient implementation of a parallel finite element solver on a shared memory parallel computer with a fast cache memory. %U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/20/NA-M-92-20.pdf %R NA-M-92-21 %Z Sun, 28 Jan 96 00:00:00 GMT %I Stanford University, Department of Computer Science, Numerical Analysis Project %T On the error computation for polynomial based iteration methods %A Fischer, Bernd %A Golub, Gene H. %D December 1992 %X In this note we investigate the Chebyshev iteration and the conjugate gradient method applied to the system of linear equations $Ax = f$ where $A$ is a symmetric, positive definite matrix. For both methods we present algorithms which approximate during the iteration process the $k$th error $\varepsilon_k = \|x - x_k\|_{A}$. The algorithms are based on the theory of modified moments and Gaussian quadrature. The proposed schemes are also applicable to other polynomial iteration schemes. Several examples, illustrating the performance of the described methods, are presented.
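The quantity that NA-M-92-21 estimates is the A-norm error of the conjugate gradient iterate. The Python sketch below runs plain conjugate gradients and tracks $\varepsilon_k = \|x - x_k\|_{A}$ directly against a precomputed solution; this is only a yardstick for experiments, not the report's moment-based estimator, whose point is to approximate $\varepsilon_k$ during the iteration without knowing $x$. The test matrix and sizes are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)            # an SPD test matrix (an assumption)
    f = rng.standard_normal(n)
    x_true = np.linalg.solve(A, f)         # reference solution for the yardstick

    # Standard conjugate gradient iteration for Ax = f.
    x = np.zeros(n)
    r = f - A @ x
    p = r.copy()
    for k in range(25):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
        e = x_true - x
        print(k, np.sqrt(e @ (A @ e)))     # the kth A-norm error eps_k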
%U ftp://reports.stanford.edu/pub/cstr/reports/na/m/92/21/NA-M-92-21.pdf %R SEL-TR-83-003 %Z Thu, 03 Dec 98 00:00:00 GMT %I Stanford University, Stanford Electronic Laboratories %T Timing Models for MOS Circuits %A Horowitz, Mark A. %D December 1983 %X Performance is an important aspect of integrated circuit design, and depends in part on the speed of the underlying circuits. This thesis presents a new method of analyzing MOS circuit delay, based on a single-time-constant approximation. The timing models characterize the circuit by a single parameter, which depends on the resistance and capacitance of the circuit elements. To ensure the single-time-constant approximation is valid for a particular circuit, the timing models provide both an estimate and bounds for the output waveform. For circuits where the bounds are poor, an improved timing model is derived. These simple models provide insight into circuit performance issues, as well as a means of determining the circuit delay. The timing models are first developed for linear networks and then are extended to model MOS circuits driven by a step input. By using the single-time-constant approximation, the output waveform of a complex MOS circuit can be modelled by the output of a circuit consisting of a single MOS transistor and a single capacitor. Finally, a new circuit model of a gate is used to derive the output waveform of a circuit driven by an arbitrary input. The resulting timing model does not depend strongly on the shape of the input: the output waveform only depends on the input's slope at the gate's switching voltage. %U ftp://reports.stanford.edu/pub/cstr/reports/sel/tr/83/003/SEL-TR-83-003.pdf
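The single-parameter characterization in SEL-TR-83-003 collapses an RC network to one time constant. The familiar Elmore-style sum $\tau = \sum_i C_i (R_1 + \cdots + R_i)$ for an RC ladder is one standard way of computing such a parameter, and the Python sketch below uses it; this is our illustration of the general idea, not the thesis's own model, which also supplies bounds on the output waveform. Component values are illustrative assumptions.

    import itertools

    # Single-time-constant estimate for an RC ladder: driver resistance R[0],
    # then R[1], ..., with node capacitances C[0], C[1], ... along the line.
    R = [1000.0, 500.0, 500.0]          # ohms, from driver toward the load
    C = [10e-15, 20e-15, 30e-15]        # farads at each node

    upstream = list(itertools.accumulate(R))          # R_1 + ... + R_i
    tau = sum(c + 0.0 for c in [0.0]) + sum(c * r for c, r in zip(C, upstream))
    print(f"tau = {tau * 1e12:.1f} ps")               # 50% delay is about 0.69 * tau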