CSC302 2006S, Class 30: SEQUEL

Admin:
* Questions on the exam?
* Broken record: Time for serious grading seems to be an impossibility.
* Reading for Monday: Kowalski = Algorithm = Logic + Control.
  Famous paper : Program = Algorithms + Data Structures
* Math/CS Pub Night next Thursday at 10 p.m.
* Missing: Dave, Davis

Overview:
* Combining relations.
* Relations as programming language.
* Describing a language.
* General SEQUEL and SQUARE issues.
* What's missing from SEQUEL?
* Other issues

Back to Codd
* Context for current discussion: Think about databases as collections of relations
  * Data types: Core domains (not specified) (int,string in SEQUEL)
    (int,string,real,date,(variants)   in SQL) plus relations (set of records)
* In designing a language (and a formalism), we must consider the core operations on those data types
  * ints: arithmetic (+, -, etc.), relational operators (<, >, =, !=, etc.)
  * strings: equality
  * Most interesting question: Relations!
* What are the operations we do with relations?
  * Project: Select and order columns/domains
  * Natural Join: Put two sets together.  The result set contains values that *combine* the values from the first set and the second set by taking (in effect) the cross product.
    Example
      A = { <a,10>, <b,20>, <c,15> } subset STRING x INT
      B = { <x,y> <q,r> <a,b> } subset STRING x STRING
      A naturaljoin B = { <a,10,x,y>, <a,10,q,r>, <a,10,a,b>, <b,20,x,y> ... , <c,15,a,b> }
  * General join: Some subset of the natural join.
  * Why naturaljoin two relations?
    * Two tables with similar data, make 'em into one.
    * Typically used in addition to selection
      Relation1 = StudentId x Name
      Relation2 = StudentId x ClassId
      Relation1 join Relation2 = something silly
      Then select those whose two StudentId columns are equal
      And project out the duplicate student id
  * cyclic joins
    * Way to combine three (binary) relations
      * Look for combinations of (x,y) in A, (y,z) in B, (z,x) in C
  * Compose relations:
    * Three ways to think of relations: 
      * set of tuples
      * predicates
      * multi-valued functions
    * Composition is a natural operation on functions
      (f o g) x = f (g x)
  * Restriction
    * What I call selection
  * Plus the obvious union, intersection, difference

Relations as programming language
* Define relations
* Apply relation operations
* Apply basic operations to columns

Behind the scenes:
* How do we represent a relation?
* How do we implement the basic operations?
* Representing relations: Combine arrays and pointers
  * Each tuple as an object or record
  * Simple strategy: Relation is an array of tuples
CSC152 = new Vector();
CSC152.add(new Student("Dave", "Ventresca", 10);
CSC152.add(new Student("Dimitar", "Tasev", 2);
CSC152.add(new Student("Michael", "Claveria", -5);
    * Select by for loop O(n)
    * Join by nested for loops: O(n x m)
  * ANother strategy: Hash table, hashed by (something) (primary key) ...
  * Another strategy: Binary search tree
    * Join might be painful
    * But can delay the join, too
  * BUT ... you don't have to worry about it!
  * Really: Multiple solutions

On to SEQUEL
* The IBM database research team wanted to turn Codd's ideas into languages
* SQUARE: One language for expressing operations on relations
* What was proven about SQUARE (and about relational algebra in general)?
  * Equivalent to the first-order predicate calculus with quantification.
    * calculus: formal system
    * predicates: functions that have boolean results
    * first order:  (contrast to higher-order) - predicates cannot be applied to other predicates
  * Therefore a general model of computation.
Claim from paper (approximately): "Declarative languages are easier and more efficient to program in"
* Why do we still program in other ways?