CS-145: Introduction to Databases

Course Information

Announcements

12/14/2002 Finals have been graded. A copy of the final is available in Postscript or PDF. Also, here are the Solutions. Statistics for the midterm and final are available here.
12/8/2002 PDA #6 has been graded. Everybody who submitted PDA #6 should have received an email with their grade. If you did not get your grade, send the TAs an email.
12/2/2002 The Final Exam will be in Room B01, Gates Bldg., on Thursday, Dec. 12, 12:15-3:15PM. (The exam room is next door to the regular classroom.) It will cover everything up to but not including dead week. It will be written, not OTC, and all local TV students will have to come to Stanford. Remote TV students will have the exam delivered to them, and should take it at approximately the same time as the class takes it. For bugs or questions, call 650.906.5142 (this number is only available from 12:15-3:15 PST on 12/12). There will be no early exam, and no make-up exam, except that people with a documented 3 finals in 24 hours may request a 24-hour postponement of the CS145 exam. The exam is open book/notes, but no on-line access or computer use is permitted. Suggestion: Before the final, take a look at the Instructions and familiarize yourself with the ground rules.

Time and Place

Tuesdays and Thursdays, 9:30 - 10:45AM, in B03 Gates.

Course Goals

CS145 is an introduction to the design and use of database systems, that is, systems that manage very large amounts of data. There are two important approaches to organizing and querying (asking questions about) data: the "relational model," which uses a two-dimensional table (relation) as its primary structure, and the "semistructured model," which uses trees as its fundamental structure. The relational model underlies the major commercial database systems. We cover relational design using the entity-relationship model, followed by an overview of the relational model, how to convert E/R models to relations, and how one uses a relational database system to create a database. We shall learn SQL (Structured Query Language), the standard query language for relational databases, and get hands-on experience using it.
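As a small illustration of the relational model described above, here is a minimal sketch of two tables and one SQL query. It uses Python's built-in sqlite3 module rather than the Oracle system used in the course, and the Student and Enrolled tables, their columns, and the sample rows are hypothetical names invented for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical relations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Student (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Enrolled (student_id INTEGER REFERENCES Student(id),
                               course TEXT);
        INSERT INTO Student VALUES (1, 'Ann');
        INSERT INTO Student VALUES (2, 'Bob');
        INSERT INTO Enrolled VALUES (1, 'CS145');
        INSERT INTO Enrolled VALUES (2, 'CS145');
        INSERT INTO Enrolled VALUES (1, 'CS245');
    """)
    return conn


def students_in(conn, course):
    """Return the names of students enrolled in a course, sorted by name."""
    rows = conn.execute(
        "SELECT s.name FROM Student s "
        "JOIN Enrolled e ON e.student_id = s.id "
        "WHERE e.course = ? ORDER BY s.name",
        (course,),
    ).fetchall()
    return [name for (name,) in rows]


conn = build_demo_db()
print(students_in(conn, "CS145"))  # ['Ann', 'Bob']
```

Each relation is a two-dimensional table, and the query combines rows from both tables by matching a student's id against the enrollment records.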

We shall also learn some other database languages, both concrete and abstract, including relational algebra, Datalog, ODL/OQL, PSM (really Oracle's procedural PL/SQL), and JDBC (the Java interface to SQL databases). In addition, we study recent object-oriented influences on the relational model, including the object-oriented database standard ODL/OQL. The semistructured model is newer, but beginning to have significant influence, especially as people try to integrate data and share data over the Web. We shall learn XML, the standard for structuring data as trees. We also shall meet XPATH, a rudimentary query language for XML data, and XQUERY, a new, more SQL-like query language for XML. It is not our goal to study database system implementation (e.g., how to build a system that processes SQL queries efficiently). Study of that very important subject begins in CS245.
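To make the tree-structured view of data concrete, here is a minimal sketch that parses a small XML document and selects nodes with a path expression. It uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPATH; the catalog document, its element names, and its contents are hypothetical and invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: a tree with a root, two course subtrees,
# and topic leaves under each course.
DOC = """
<catalog>
  <course code="CS145">
    <topic>SQL</topic>
    <topic>XML</topic>
  </course>
  <course code="CS245">
    <topic>Query processing</topic>
  </course>
</catalog>
""".strip()

root = ET.fromstring(DOC)


def topics_for(code):
    """Return the topic text under the course element with the given code."""
    path = "./course[@code='%s']/topic" % code
    return [node.text for node in root.findall(path)]


print(topics_for("CS145"))  # ['SQL', 'XML']
```

The path expression walks the tree from the root: it picks the course child whose code attribute matches, then collects that course's topic children.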

Prerequisites

CS107 (programming languages) and CS103 (introductory CS theory) are expected. Please discuss the matter with the instructor if you do not have something like these courses.

Programming assignments will use the Oracle relational database management system and the C or C++ programming language. Java is an alternative. The Oracle system can be accessed via any of the Unix workstations on the second floor of Sweet Hall, e.g., the "elaines" or "epics." To open an account on these machines, type open at the login: prompt and follow the instructions.

We shall assume that students already are proficient with Unix and C.

SITN students can access the Unix workstations remotely via dial-in (try 650-325-1010) or telnet. If you have access to an Oracle-9 system including PL/SQL and Pro*C, you may use that. We have to be strict about what system you use, not because we love Oracle, but because we are going to be exploring some very specific capabilities of this system, and it will present problems for both you and us if you do not have all these features. We cannot make any exceptions for problems incurred by using your own computing facilities rather than those provided by Stanford.

Everyone must have a leland account in order to use the class Oracle database system for the PDA. To obtain a leland ID, telnet to open.stanford.edu and use login name open. If you are an SITN student but do not yet have a Stanford ID, you need to talk to your SITN contact and get one before trying to open a leland account.

Changes from Previous Years

There will be several changes from the usual way CS145 has been offered in the fall.

  1. Jeff Ullman, the usual instructor, is being joined by Anand Rajaraman. Anand was a founder of Junglee, and currently works at Cambrian Ventures. He is a former student of Jeff's.

  2. We will use a set of slides with voiceover that Jeff Ullman has been developing. These slides will let us go fairly quickly through the basic material, leaving us much more time for class questions and some group exercises than in the past.

  3. We have also been developing a system called "on-line testing center" (OTC). It gives you a Web interface to homeworks and tests. This work will include some multiple-choice questions and some SQL programming, all of which is graded automatically. There are safeguards to discourage collaboration, but of course the honor code prevents your doing so anyway.

One thing that will not change is the individual project ("personal database application" or PDA). This project will be done using the Stanford Oracle installation, not the OTC (which also accepts queries in the Oracle version of SQL, so you don't have to learn two different dialects). Some work for this project will be due each week, starting with the beginning of the third week of class, and is distinct from the OTC work.

Course Requirements

1. Project

A feature (or bug?) of CS145 is that everyone writes their own "personal database application" (PDA). You do some work on the project each week: first selecting your application, then designing the database, obtaining and loading your data into a real database management system, and finally writing a number of SQL queries and C programs with embedded SQL queries, and exercising other features of SQL.

The first PDA assignment will be due Thursday, Oct. 10, but must be preceded by a review of your design by one of the course staff. Subsequent parts will generally be due on Thursdays, with the exception of Thanksgiving.

No late work will be accepted. However, each student is allowed one extension of at most 48 hours. This amount of time cannot be divided among assignments; it applies to one assignment only.

2. OTC Homework

We are going to use the On-Line Testing Center (OTC) to give periodic assignments. These will be either multiple-choice questions to answer or, later in the course, SQL queries to write. You will be given a week to log in and do each assignment. We may break weekly assignments into small pieces, so bugs in the system do not wipe out a lot of work.

At least at first, we shall give people as many chances as they like to get a perfect score. You should try to study the material for the question(s) you got wrong and take the assignment again. Note that you will probably get slightly different questions each time you take it.

In order to use OTC, you need to sign up for a user ID. We'll pass around sheets that have a selection of ID's and "tokens" (initial passwords). Pick an unused one, cross it out, and write your name clearly next to it. (If you are concerned about privacy, use a nickname that you can give us if we ever need to remind you of your ID.) You will be allowed to change your password, and should do that. However, we have not yet implemented a system that lets you log in as the ID of your choice, provided it is unique. That may come soon. If you miss signing up for an ID in class, please see Ms. Weden in 419 Gates.

To find out what assignments are due, and when, either log into OTC or check the Assignments Page.

3. Exams

Midterm: We shall also try to use the OTC for exams. Tentatively, the midterm is scheduled for Tuesday, Nov. 5, 2002. You can take the exam from wherever you wish, but it must be during the class period.

Final: Our exam time is Thursday, Dec. 12, 12:15-3:15PM. We may or may not be using OTC. If not, all students will have to come to campus, with the exception of remote SITN students, i.e., those whose place of work is more than about 50 miles away (for example, Livermore is "local"; Santa Rosa is "remote").

Grading Policy

The approximate weights of the four components are:

Component       Weight
Project         35%
OTC Homework    15%
Midterm         15%
Final           35%
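As a rough illustration of how these weights combine, here is a minimal sketch of a weighted average. It assumes each component is scored as a percentage from 0 to 100, which the course page does not state, and it treats the approximate weights as exact; the function name and the sample scores are hypothetical.

```python
# Weights from the table above, in percent. The course describes them as
# approximate, so this is an illustration rather than the grading formula.
WEIGHTS = {
    "project": 35,
    "otc_homework": 15,
    "midterm": 15,
    "final": 35,
}


def course_score(scores):
    """Combine per-component percentages (0-100) into one weighted score."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores for one student.
example = {"project": 90, "otc_homework": 100, "midterm": 70, "final": 80}
print(course_score(example))  # 85.0
```

With these sample scores the contributions are 31.5 + 15 + 10.5 + 28 = 85.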

Honor-Code Policy

The basic presumption is that the work you do is your own. Occasionally, especially when working problem sets or writing programs (but never on exams!), it may be necessary to ask someone for help. You are permitted to do so, provided you meet the following two conditions.
  1. You acknowledge the help on the work you hand in.

  2. You understand the work that you hand in, so that you could explain the reasoning behind the parts of the work done for you by another.

Any other assistance by another person constitutes a violation of the honor code and will be treated as such.

If you have any questions about what this policy means, please discuss the matter with the instructor now. We shall ask everyone to acknowledge that they have read the above material on the first homework.