CSC302 2006S, Class 30: SEQUEL
Admin:
* Questions on the exam?
* Broken record: Time for serious grading seems to be an impossibility.
* Reading for Monday: Kowalski = Algorithm = Logic + Control.
Famous paper : Program = Algorithms + Data Structures
* Math/CS Pub Night next Thursday at 10 p.m.
* Missing: Dave, Davis
Overview:
* Combining relations.
* Relations as programming language.
* Describing a language.
* General SEQUEL and SQUARE issues.
* What's missing from SEQUEL?
* Other issues
Back to Codd
* Context for current discussion: Think about databases as collections of relations
* Data types: Core domains (not specified) (int,string in SEQUEL)
(int,string,real,date,(variants) in SQL) plus relations (set of records)
* In designing a language (and a formalism), we must consider the core operations on those data types
* ints: arithmetic (+, -, etc.), relational operators (<, >, =, !=, etc.)
* strings: equality
* Most interesting question: Relations!
* What are the operations we do with relations?
* Project: Select and order columns/domains
* Natural Join: Put two sets together. The result set contains values that *combine* the values from the first set and the second set by taking (in effect) the cross product.
Example
A = { , **, } subset STRING x INT
B = { ** } subset STRING x STRING
A naturaljoin B = { , , , ** ... , }
* General join: Some subset of the natural join.
* Why naturaljoin two relations?
* Two tables with similar data, make 'em into one.
* Typically used in addition to selection
Relation1 = StudentId x Name
Relation2 = StudentId x ClassId
Relation1 join Relation2 = something silly
Then select those whose two StudentId columns are equal
And project out the duplicate student id
* cyclic joins
* Way to combine three (binary) relations
* Look for combinations of (x,y) in A, (y,z) in B, (z,x) in C
* Compose relations:
* Three ways to think of relations:
* set of tuples
* predicates
* multi-valued functions
* Composition is a natural operation on functions
(f o g) x = f (g x)
* Restriction
* What I call selection
* Plus the obvious union, intersection, difference
Relations as programming language
* Define relations
* Apply relation operations
* Apply basic operations to columns
Behind the scenes:
* How do we represent a relation?
* How do we implement the basic operations?
* Representing relations: Combine arrays and pointers
* Each tuple as an object or record
* Simple strategy: Relation is an array of tuples
CSC152 = new Vector();
CSC152.add(new Student("Dave", "Ventresca", 10);
CSC152.add(new Student("Dimitar", "Tasev", 2);
CSC152.add(new Student("Michael", "Claveria", -5);
* Select by for loop O(n)
* Join by nested for loops: O(n x m)
* ANother strategy: Hash table, hashed by (something) (primary key) ...
* Another strategy: Binary search tree
* Join might be painful
* But can delay the join, too
* BUT ... you don't have to worry about it!
* Really: Multiple solutions
On to SEQUEL
* The IBM database research team wanted to turn Codd's ideas into languages
* SQUARE: One language for expressing operations on relations
* What was proven about SQUARE (and about relational algebra in general)?
* Equivalent to the first-order predicate calculus with quantification.
* calculus: formal system
* predicates: functions that have boolean results
* first order: (contrast to higher-order) - predicates cannot be applied to other predicates
* Therefore a general model of computation.
Claim from paper (approximately): "Declarative languages are easier and more efficient to program in"
* Why do we still program in other ways?
**