CSC302 2006S, Class 29: SQL(1): The Relational Model

Admin:
* Neither grading nor answer keys are available.  Sorry.
* Are there questions on the exam?
* The reading for Friday is the preliminary paper on SEQUEL.
* Late: DAVE
Overview:
* Context.
* Codd's Contributions.
* Why Read This Paper?
* Thinking About Relations.
* Operations on Relations.

What was the state of Data bases (Data banks) at the time Codd's paper was written?

* Not Abstract: The programs that uses databases relied upon particular organizations of data
  * Issues of ordering/indices/paths and independence
  * Indices were useful for fast access, but programs relied on knowledge of the index.
* Data are not in memory, they are in files (disk or tape).
* Two primary models of databases:
  * Hierarchical model (traditional)
    * Each "file" contains records
      Each record may contain other records
      * Do records have to be uniform size?
      * If so, easy to access the nth record
      * If not, harder.  How?  Linear search.  Array of indices into the file.
  * Network model (CODASYL, 1969) - Records that belong to other records can be shared
* Non-abstraction and hierchical/network models support faster accessa
* No real DBMS's.
* Generally informal descriptions.
* "Database" not yet a widely accepted term.

Codd's (apparent) goals:

* Abstraction: Client code need not rely on particular implementation details
* A way to formalize things.  (E.g, use of pi and the formal descriptions of the relation operators.)
* A model that persists to today.  (Goal was "better model")
* Criticize existing models

Why read this paper?
* Background for learning about data banks (now databases)
  * Language design goals
* SQL is a declarative language.  Is Codd defining a declarative language?
* Designated "Classic" Paper
  * See original statements rather than interpretations
* Treatment of relations is related to type theory

Model data as relations 
* A set of n-tuples, where each element of the n-tuple is a member of the corresponding domain
  * A relation is a table in a database
  * A subset of of D1 x D2 x D3 x D4 x ... x Dn
    Consider a Names/Grades table
      Names = { Dave, Davis, Dimitar, Dalex }
      Grades = { A, B, C, D, F }
      Names x Grades = { (Dave,A) (Dave,B) (Dave,C) (Dave,D) (Dave,F (Davis,A), ... (Dalex,F) }
      CurrentGradesIn302 = { (Dave,A), (Davis,A), (Dimitar,F), (Dalex,D) }
* A relation can be thought of us a predicate (a function that returns true or false).  T(tuple) = true iff tuple in T.
* A relation can be thought of as a multi-valued function.
  If you provide some of the columns, it returns a tuple of the rest of the columns
    CurrentGradesIn302(Name=Dave) => A
    CurrentGradesIn302(Grade=A) => Dave,Davis
    CurrentGradesIn302(Grade=C) => NOT IN DOMAIN
    * Treatment as function is like "where" clauses

Model data as relationships
* A relationship is like a relation, except 
  * The elements of the elements of a relation are ordered by the order of the domains
  * The elements of the elements of a relationship are ordered by the "name" of the domain.
* Why?
  * More abstract!
* Problem for Codd: What if two of the domains are "the same"?
  Classmates is a subset of StudentxStudent
  TempOnDay is a subset of IntxInt (the first int represents the temp, the second the day)
  * Hack solution: Name the domains
  * In SQL: Name and Type the columns
* Almost every relational language uses relations rather than relationships

Operations on Relations
* Project - (pi) SELECT in SQL
  * What does pi_1,2 do to the relation { (Dave,Ventresca,A), (Davis,Hart,A) (Davis,Hart,B) } => { (Dave,Ventresca), (Davis,Hart) }
  * Select the specified columns and remove duplicate entries. (Well, it's a set)
  * Does project also permute?
    * Certainly an be used to permute, pi_3,2,1 permutes
* Join
* Compose