CSC302 2006S, Class 30: SEQUEL Admin: * Questions on the exam? * Broken record: Time for serious grading seems to be an impossibility. * Reading for Monday: Kowalski = Algorithm = Logic + Control. Famous paper : Program = Algorithms + Data Structures * Math/CS Pub Night next Thursday at 10 p.m. * Missing: Dave, Davis Overview: * Combining relations. * Relations as programming language. * Describing a language. * General SEQUEL and SQUARE issues. * What's missing from SEQUEL? * Other issues Back to Codd * Context for current discussion: Think about databases as collections of relations * Data types: Core domains (not specified) (int,string in SEQUEL) (int,string,real,date,(variants) in SQL) plus relations (set of records) * In designing a language (and a formalism), we must consider the core operations on those data types * ints: arithmetic (+, -, etc.), relational operators (<, >, =, !=, etc.) * strings: equality * Most interesting question: Relations! * What are the operations we do with relations? * Project: Select and order columns/domains * Natural Join: Put two sets together. The result set contains values that *combine* the values from the first set and the second set by taking (in effect) the cross product. Example A = { , , } subset STRING x INT B = { } subset STRING x STRING A naturaljoin B = { , , , ... , } * General join: Some subset of the natural join. * Why naturaljoin two relations? * Two tables with similar data, make 'em into one. * Typically used in addition to selection Relation1 = StudentId x Name Relation2 = StudentId x ClassId Relation1 join Relation2 = something silly Then select those whose two StudentId columns are equal And project out the duplicate student id * cyclic joins * Way to combine three (binary) relations * Look for combinations of (x,y) in A, (y,z) in B, (z,x) in C * Compose relations: * Three ways to think of relations: * set of tuples * predicates * multi-valued functions * Composition is a natural operation on functions (f o g) x = f (g x) * Restriction * What I call selection * Plus the obvious union, intersection, difference Relations as programming language * Define relations * Apply relation operations * Apply basic operations to columns Behind the scenes: * How do we represent a relation? * How do we implement the basic operations? * Representing relations: Combine arrays and pointers * Each tuple as an object or record * Simple strategy: Relation is an array of tuples CSC152 = new Vector(); CSC152.add(new Student("Dave", "Ventresca", 10); CSC152.add(new Student("Dimitar", "Tasev", 2); CSC152.add(new Student("Michael", "Claveria", -5); * Select by for loop O(n) * Join by nested for loops: O(n x m) * ANother strategy: Hash table, hashed by (something) (primary key) ... * Another strategy: Binary search tree * Join might be painful * But can delay the join, too * BUT ... you don't have to worry about it! * Really: Multiple solutions On to SEQUEL * The IBM database research team wanted to turn Codd's ideas into languages * SQUARE: One language for expressing operations on relations * What was proven about SQUARE (and about relational algebra in general)? * Equivalent to the first-order predicate calculus with quantification. * calculus: formal system * predicates: functions that have boolean results * first order: (contrast to higher-order) - predicates cannot be applied to other predicates * Therefore a general model of computation. Claim from paper (approximately): "Declarative languages are easier and more efficient to program in" * Why do we still program in other ways?