FIRST COURSE IN DATABASE SYSTEMS -- Third Edition Jeffrey D. Ullman Jennifer Widom TABLE OF CONTENTS 1 The Worlds of Database Systems 1.1 The Evolution of Database Systems 1.1.1 Early Database Management Systems 1.1.2 Relational Database Systems 1.1.3 Smaller and Smaller Systems 1.1.4 Bigger and Bigger Systems 1.1.5 Information Integration 1.2 Overview of a Database Management System 1.2.1 Data-Definition Language Commands 1.2.2 Overview of Query Processing 1.2.3 Storage and Buffer Management 1.2.4 Transaction Processing 1.2.5 The Query Processor 1.3 Outline of Database-System Studies 1.4 References for Chapter 1 PART I: Relational Database Modeling 2 The Relational Model of Data 2.1 An Overview of Data Models 2.1.1 What is a Data Model? 2.1.2 Important Data Models 2.1.3 The Relational Model in Brief 2.1.4 The Semistructured Model in Brief 2.1.5 Other Data Models 2.1.6 Comparison of Modeling Approaches 2.2 Basics of the Relational Model 2.2.1 Attributes 2.2.2 Schemas 2.2.3 Tuples 2.2.4 Domains 2.2.5 Equivalent Representations of a Relation 2.2.6 Relation Instances 2.2.7 Keys of Relations 2.2.8 An Example Database Schema 2.2.9 Exercises for Section 2.2 2.3 Defining a Relation Schema in SQL 2.3.1 Relations in SQL 2.3.2 Data Types 2.3.3 Simple Table Declarations 2.3.4 Modifying Relation Schemas 2.3.5 Default Values 2.3.6 Declaring Keys 2.3.7 Exercises for Section 2.3 2.4 An Algebraic Query Language 2.4.1 Why Do We Need a Special Query Language? 2.4.2 What is an Algebra? 2.4.3 Overview of Relational Algebra 2.4.4 Set Operations on Relations 2.4.5 Projection 2.4.6 Selection 2.4.7 Cartesian Product 2.4.8 Natural Joins 2.4.9 Theta-Joins 2.4.10 Combining Operations to Form Queries 2.4.11 Naming and Renaming 2.4.12 Relationships Among Operations 2.4.13 A Linear Notation for Algebraic Expressions 2.4.14 Exercises for Section 2.4 2.5 Constraints on Relations 2.5.1 Relational Algebra as a Constraint Language 2.5.2 Referential Integrity Constraints 2.5.3 Key Constraints 2.5.4 Additional Constraint Examples 2.5.5 Exercises for Section 2.5 2.6 Summary of Chapter 2 2.7 References for Chapter 2 3 Design Theory for Relational Databases 3.1 Functional Dependencies 3.1.1 Definition of Functional Dependency 3.1.2 Keys of Relations 3.1.3 Superkeys 3.1.4 Exercises for Section 3.1 3.2 Rules About Functional Dependencies 3.2.1 Reasoning About Functional Dependencies 3.2.2 The Splitting/Combining Rule 3.2.3 Trivial Functional Dependencies 3.2.4 Computing the Closure of Attributes 3.2.5 Why the Closure Algorithm Works 3.2.6 The Transitive Rule 3.2.7 Closing Sets of Functional Dependencies 3.2.8 Projecting Functional Dependencies 3.2.9 Exercises for Section 3.2 3.3 Design of Relational Database Schemas 3.3.1 Anomalies 3.3.2 Decomposing Relations 3.3.3 Boyce-Codd Normal Form 3.3.4 Decomposition into BCNF 3.3.5 Exercises for Section 3.3 3.4 Decomposition: The Good, Bad, and Ugly 3.4.1 Recovering Information from a Decomposition 3.4.2 The Chase Test for Lossless Join 3.4.3 Why the Chase Works 3.4.4 Dependency Preservation 3.4.5 Exercises for Section 3.4 3.5 Third Normal Form 3.5.1 Definition of Third Normal Form 3.5.2 The Synthesis Algorithm for 3NF Schemas 3.5.3 Why the 3NF Synthesis Algorithm Works 3.5.4 Exercises for Section 3.5 3.6 Multivalued Dependencies 3.6.1 Attribute Independence and Its Consequent Redundancy 3.6.2 Definition of Multivalued Dependencies 3.6.3 Reasoning About Multivalued Dependencies 3.6.4 Fourth Normal Form 3.6.5 Decomposition into Fourth Normal Form 3.6.6 Relationships Among Normal Forms 3.6.7 Exercises for Section 3.6 3.7 An Algorithm for Discovering MVD's 3.7.1 The Closure and the Chase 3.7.2 Extending the Chase to MVD's 3.7.3 Why the Chase Works for MVD's 3.7.4 Projecting MVD's 3.7.5 Exercises for Section 3.7 3.8 Summary of Chapter 3 3.9 References for Chapter 3 4 High-Level Database Models 4.1 The Entity/Relationship Model 4.1.1 Entity Sets 4.1.2 Attributes 4.1.3 Relationships 4.1.4 Entity-Relationship Diagrams 4.1.5 Instances of an E/R Diagram 4.1.6 Multiplicity of Binary E/R Relationships 4.1.7 Multiway Relationships 4.1.8 Roles in Relationships 4.1.9 Attributes on Relationships 4.1.10 Converting Multiway Relationships to Binary 4.1.11 Subclasses in the E/R Model 4.1.12 Exercises for Section 4.1 4.2 Design Principles 4.2.1 Faithfulness 4.2.2 Avoiding Redundancy 4.2.3 Simplicity Counts 4.2.4 Choosing the Right Relationships 4.2.5 Picking the Right Kind of Element 4.2.6 Exercises for Section 4.2 4.3 Constraints in the E/R Model 4.3.1 Keys in the E/R Model 4.3.2 Representing Keys in the E/R Model 4.3.3 Referential Integrity 4.3.4 Degree Constraints 4.3.5 Exercises for Section 4.3 4.4 Weak Entity Sets 4.4.1 Causes of Weak Entity Sets 4.4.2 Requirements for Weak Entity Sets 4.4.3 Weak Entity Set Notation 4.4.4 Exercises for Section 4.4 4.5 From E/R Diagrams to Relational Designs 4.5.1 From Entity Sets to Relations 4.5.2 From E/R Relationships to Relations 4.5.3 Combining Relations 4.5.4 Handling Weak Entity Sets 4.5.5 Exercises for Section 4.5 4.6 Converting Subclass Structures to Relations 4.6.1 E/R-Style Conversion 4.6.2 An Object-Oriented Approach 4.6.3 Using Null Values to Combine Relations 4.6.4 Comparison of Approaches 4.6.5 Exercises for Section 4.6 4.7 Unified Modeling Language 4.7.1 UML Classes 4.7.2 Keys for UML classes 4.7.3 Associations 4.7.4 Self-Associations 4.7.5 Association Classes 4.7.6 Subclasses in UML 4.7.7 Aggregations and Compositions 4.7.8 Exercises for Section 4.7 4.8 From UML Diagrams to Relations 4.8.1 UML-to-Relations Basics 4.8.2 From UML Subclasses to Relations 4.8.3 From Aggregations and Compositions to Relations 4.8.4 The UML Analog of Weak Entity Sets 4.8.5 Exercises for Section 4.8 4.9 Object Definition Language 4.9.1 Class Declarations 4.9.2 Attributes in ODL 4.9.3 Relationships in ODL 4.9.4 Inverse Relationships 4.9.5 Multiplicity of Relationships 4.9.6 Types in ODL 4.9.7 Subclasses in ODL 4.9.8 Declaring Keys in ODL 4.9.9 Exercises for Section 4.9 4.10 From ODL Designs to Relational Designs 4.10.1 From ODL Classes to Relations 4.10.2 Complex Attributes in Classes 4.10.3 Representing Set-Valued Attributes 4.10.4 Representing Other Type Constructors 4.10.5 Representing ODL Relationships 4.10.6 Exercises for Section 4.10 4.11 Summary of Chapter 4 4.12 References for Chapter 4 PART II: Relational Database Programming 5 Algebraic and Logical Query Languages 5.1 Relational Operations on Bags 5.1.1 Why Bags? 5.1.2 Union, Intersection, and Difference of Bags 5.1.3 Projection of Bags 5.1.4 Selection on Bags 5.1.5 Product of Bags 5.1.6 Joins of Bags 5.1.7 Exercises for Section 5.1 5.2 Extended Operators of Relational Algebra 5.2.1 Duplicate Elimination 5.2.2 Aggregation Operators 5.2.3 Grouping 5.2.4 The Grouping Operator 5.2.5 Extending the Projection Operator 5.2.6 The Sorting Operator 5.2.7 Outerjoins 5.2.8 Exercises for Section 5.2 5.3 A Logic for Relations 5.3.1 Predicates and Atoms 5.3.2 Arithmetic Atoms 5.3.3 Datalog Rules and Queries 5.3.4 Meaning of Datalog Rules 5.3.5 Extensional and Intensional Predicates 5.3.6 Datalog Rules Applied to Bags 5.3.7 Exercises for Section 5.3 5.4 Relational Algebra and Datalog 5.4.1 Boolean Operations 5.4.2 Projection 5.4.3 Selection 5.4.4 Product 5.4.5 Joins 5.4.6 Simulating Multiple Operations with Datalog 5.4.7 Comparison Between Datalog and Relational Algebra 5.4.8 Exercises for Section 5.4 5.5 Summary of Chapter 5 5.6 References for Chapter 5 6 The Database Language SQL 6.1 Simple Queries in SQL 6.1.1 Projection in SQL 6.1.2 Selection in SQL 6.1.3 Comparison of Strings 6.1.4 Pattern Matching in SQL 6.1.5 Dates and Times 6.1.6 Null Values and Comparisons Involving {\tt NULL} 6.1.7 The Truth-Value {\tt UNKNOWN} 6.1.8 Ordering the Output 6.1.9 Exercises for Section 6.1 6.2 Queries Involving More Than One Relation 6.2.1 Products and Joins in SQL 6.2.2 Disambiguating Attributes 6.2.3 Tuple Variables 6.2.4 Interpreting Multirelation Queries 6.2.5 Union, Intersection, and Difference of Queries 6.2.6 Exercises for Section 6.2 6.3 Subqueries 6.3.1 Subqueries that Produce Scalar Values 6.3.2 Conditions Involving Relations 6.3.3 Conditions Involving Tuples 6.3.4 Correlated Subqueries 6.3.5 Subqueries in {\tt FROM}\ Clauses 6.3.6 SQL Join Expressions 6.3.7 Natural Joins 6.3.8 Outerjoins 6.3.9 Exercises for Section 6.3 6.4 Full-Relation Operations 6.4.1 Eliminating Duplicates 6.4.2 Duplicates in Unions, Intersections, and Differences 6.4.3 Grouping and Aggregation in SQL 6.4.4 Aggregation Operators 6.4.5 Grouping 6.4.6 Grouping, Aggregation, and Nulls 6.4.7 {\tt HAVING} Clauses 6.4.8 Exercises for Section 6.4 6.5 Database Modifications 6.5.1 Insertion 6.5.2 Deletion 6.5.3 Updates 6.5.4 Exercises for Section 6.5 6.6 Transactions in SQL 6.6.1 Serializability 6.6.2 Atomicity 6.6.3 Transactions 6.6.4 Read-Only Transactions 6.6.5 Dirty Reads 6.6.6 Other Isolation Levels 6.6.7 Exercises for Section 6.6 6.7 Summary of Chapter 6 6.8 References for Chapter 6 7 Constraints and Triggers 7.1 Keys and Foreign Keys 7.1.1 Declaring Foreign-Key Constraints 7.1.2 Maintaining Referential Integrity 7.1.3 Deferred Checking of Constraints 7.1.4 Exercises for Section 7.1 7.2 Constraints on Attributes and Tuples 7.2.1 Not-Null Constraints 7.2.2 Attribute-Based {\tt CHECK} Constraints 7.2.3 Tuple-Based {\tt CHECK} Constraints 7.2.4 Comparison of Tuple- and Attribute-Based Constraints 7.2.5 Exercises for Section 7.2 7.3 Modification of Constraints 7.3.1 Giving Names to Constraints 7.3.2 Altering Constraints on Tables 7.3.3 Exercises for Section 7.3 7.4 Assertions 7.4.1 Creating Assertions 7.4.2 Using Assertions 7.4.3 Exercises for Section 7.4 7.5 Triggers 7.5.1 Triggers in SQL 7.5.2 The Options for Trigger Design 7.5.3 Exercises for Section 7.5 7.6 Summary of Chapter 7 7.7 References for Chapter 7 8 Views and Indexes 8.1 Virtual Views 8.1.1 Declaring Views 8.1.2 Querying Views 8.1.3 Renaming Attributes 8.1.4 Exercises for Section 8.1 8.2 Modifying Views 8.2.1 View Removal 8.2.2 Updatable Views 8.2.3 Instead-Of Triggers on Views 8.2.4 Exercises for Section 8.2 8.3 Indexes in SQL 8.3.1 Motivation for Indexes 8.3.2 Declaring Indexes 8.3.3 Exercises for Section 8.3 8.4 Selection of Indexes 8.4.1 A Simple Cost Model 8.4.2 Some Useful Indexes 8.4.3 Calculating the Best Indexes to Create 8.4.4 Automatic Selection of Indexes to Create 8.4.5 Exercises for Section 8.4 8.5 Materialized Views 8.5.1 Maintaining a Materialized View 8.5.2 Periodic Maintenance of Materialized Views 8.5.3 Rewriting Queries to Use Materialized Views 8.5.4 Automatic Creation of Materialized Views 8.5.5 Exercises for Section 8.5 8.6 Summary of Chapter 8 8.7 References for Chapter 8 9 SQL in a Server Environment 9.1 The Three-Tier Architecture 9.1.1 The Web-Server Tier 9.1.2 The Application Tier 9.1.3 The Database Tier 9.2 The SQL Environment 9.2.1 Environments 9.2.2 Schemas 9.2.3 Catalogs 9.2.4 Clients and Servers in the SQL Environment 9.2.5 Connections 9.2.6 Sessions 9.2.7 Modules 9.3 The SQL/Host-Language Interface 9.3.1 The Impedance Mismatch Problem 9.3.2 Connecting SQL to the Host Language 9.3.3 The {\tt DECLARE} Section 9.3.4 Using Shared Variables 9.3.5 Single-Row Select Statements 9.3.6 Cursors 9.3.7 Modifications by Cursor 9.3.8 Protecting Against Concurrent Updates 9.3.9 Dynamic SQL 9.3.10 Exercises for Section 9.3 9.4 Stored Procedures 9.4.1 Creating PSM Functions and Procedures 9.4.2 Some Simple Statement Forms in PSM 9.4.3 Branching Statements 9.4.4 Queries in PSM 9.4.5 Loops in PSM 9.4.6 For-Loops 9.4.7 Exceptions in PSM 9.4.8 Using PSM Functions and Procedures 9.4.9 Exercises for Section 9.4 9.5 Using a Call-Level Interface 9.5.1 Introduction to SQL/CLI 9.5.2 Processing Statements 9.5.3 Fetching Data From a Query Result 9.5.4 Passing Parameters to Queries 9.5.5 Exercises for Section 9.5 9.6 JDBC 9.6.1 Introduction to JDBC 9.6.2 Creating Statements in JDBC 9.6.3 Cursor Operations in JDBC 9.6.4 Parameter Passing 9.6.5 Exercises for Section 9.6 9.7 PHP 9.7.1 PHP Basics 9.7.2 Arrays 9.7.3 The PEAR DB Library 9.7.4 Creating a Database Connection Using DB 9.7.5 Executing SQL Statements 9.7.6 Cursor Operations in PHP 9.7.7 Dynamic SQL in PHP 9.7.8 Exercises for Section 9.7 9.8 Summary of Chapter 9 9.9 References for Chapter 9 10 Advanced Topics in Relational Databases 10.1 Security and User Authorization in SQL 10.1.1 Privileges 10.1.2 Creating Privileges 10.1.3 The Privilege-Checking Process 10.1.4 Granting Privileges 10.1.5 Grant Diagrams 10.1.6 Revoking Privileges 10.1.7 Exercises for Section 10.1 10.2 Recursion in SQL 10.2.1 Defining Recursive Relations in SQL 10.2.2 Problematic Expressions in Recursive SQL 10.2.3 Exercises for Section 10.2 10.3 The Object-Relational Model 10.3.1 From Relations to Object-Relations 10.3.2 Nested Relations 10.3.3 References 10.3.4 Object-Oriented Versus Object-Relational 10.3.5 Exercises for Section 10.3 10.4 User-Defined Types in SQL 10.4.1 Defining Types in SQL 10.4.2 Method Declarations in UDT's 10.4.3 Method Definitions 10.4.4 Declaring Relations with a UDT 10.4.5 References 10.4.6 Creating Object ID's for Tables 10.4.7 Exercises for Section 10.4 10.5 Operations on Object-Relational Data 10.5.1 Following References 10.5.2 Accessing Components of Tuples with a UDT 10.5.3 Generator and Mutator Functions 10.5.4 Ordering Relationships on UDT's 10.5.5 Exercises for Section 10.5 10.6 On-Line Analytic Processing 10.6.1 OLAP and Data Warehouses 10.6.2 OLAP Applications 10.6.3 A Multidimensional View of OLAP Data 10.6.4 Star Schemas 10.6.5 Slicing and Dicing 10.6.6 Exercises for Section 10.6 10.7 Data Cubes 10.7.1 The Cube Operator 10.7.2 The Cube Operator in SQL 10.7.3 Exercises for Section 10.7 10.8 Summary of Chapter 10 10.9 References for Chapter 10 PART III: Modeling and Programming for Semistructured Data 11 The Semistructured-Data Model 11.1 Semistructured Data 11.1.1 Motivation for the Semistructured-Data Model 11.1.2 Semistructured Data Representation 11.1.3 Information Integration Via Semistructured Data 11.1.4 Exercises for Section 11.1 11.2 XML 11.2.1 Semantic Tags 11.2.2 XML With and Without a Schema 11.2.3 Well-Formed XML 11.2.4 Attributes 11.2.5 Attributes That Connect Elements 11.2.6 Namespaces 11.2.7 XML and Databases 11.2.8 Exercises for Section 11.2 11.3 Document Type Definitions 11.3.1 The Form of a DTD 11.3.2 Using a DTD 11.3.3 Attribute Lists 11.3.4 Identifiers and References 11.3.5 Exercises for Section 11.3 11.4 XML Schema 11.4.1 The Form of an XML Schema 11.4.2 Elements 11.4.3 Complex Types 11.4.4 Attributes 11.4.5 Restricted Simple Types 11.4.6 Keys in XML Schema 11.4.7 Foreign Keys in XML Schema 11.4.8 Exercises for Section 11.4 11.5 Summary of Chapter 11 11.6 References for Chapter 11 12 Programming Languages for XML 12.1 XPath 12.1.1 The XPath Data Model 12.1.2 Document Nodes 12.1.3 Path Expressions 12.1.4 Relative Path Expressions 12.1.5 Attributes in Path Expressions 12.1.6 Axes 12.1.7 Context of Expressions 12.1.8 Wildcards 12.1.9 Conditions in Path Expressions 12.1.10 Exercises for Section 12.1 12.2 XQuery 12.2.1 XQuery Basics 12.2.2 FLWR Expressions 12.2.3 Replacement of Variables by Their Values 12.2.4 Joins in XQuery 12.2.5 XQuery Comparison Operators 12.2.6 Elimination of Duplicates 12.2.7 Quantification in XQuery 12.2.8 Aggregations 12.2.9 Branching in XQuery Expressions 12.2.10 Ordering the Result of a Query 12.2.11 Exercises for Section 12.2 12.3 Extensible Stylesheet Language 12.3.1 XSLT Basics 12.3.2 Templates 12.3.3 Obtaining Values From XML Data 12.3.4 Recursive Use of Templates 12.3.5 Iteration in XSLT 12.3.6 Conditionals in XSLT 12.3.7 Exercises for Section 12.3 12.4 Summary of Chapter 12 12.5 References for Chapter 12