CSC302 2006S, Class 29: SQL(1): The Relational Model Admin: * Neither grading nor answer keys are available. Sorry. * Are there questions on the exam? * The reading for Friday is the preliminary paper on SEQUEL. * Late: DAVE Overview: * Context. * Codd's Contributions. * Why Read This Paper? * Thinking About Relations. * Operations on Relations. What was the state of Data bases (Data banks) at the time Codd's paper was written? * Not Abstract: The programs that uses databases relied upon particular organizations of data * Issues of ordering/indices/paths and independence * Indices were useful for fast access, but programs relied on knowledge of the index. * Data are not in memory, they are in files (disk or tape). * Two primary models of databases: * Hierarchical model (traditional) * Each "file" contains records Each record may contain other records * Do records have to be uniform size? * If so, easy to access the nth record * If not, harder. How? Linear search. Array of indices into the file. * Network model (CODASYL, 1969) - Records that belong to other records can be shared * Non-abstraction and hierchical/network models support faster accessa * No real DBMS's. * Generally informal descriptions. * "Database" not yet a widely accepted term. Codd's (apparent) goals: * Abstraction: Client code need not rely on particular implementation details * A way to formalize things. (E.g, use of pi and the formal descriptions of the relation operators.) * A model that persists to today. (Goal was "better model") * Criticize existing models Why read this paper? * Background for learning about data banks (now databases) * Language design goals * SQL is a declarative language. Is Codd defining a declarative language? * Designated "Classic" Paper * See original statements rather than interpretations * Treatment of relations is related to type theory Model data as relations * A set of n-tuples, where each element of the n-tuple is a member of the corresponding domain * A relation is a table in a database * A subset of of D1 x D2 x D3 x D4 x ... x Dn Consider a Names/Grades table Names = { Dave, Davis, Dimitar, Dalex } Grades = { A, B, C, D, F } Names x Grades = { (Dave,A) (Dave,B) (Dave,C) (Dave,D) (Dave,F (Davis,A), ... (Dalex,F) } CurrentGradesIn302 = { (Dave,A), (Davis,A), (Dimitar,F), (Dalex,D) } * A relation can be thought of us a predicate (a function that returns true or false). T(tuple) = true iff tuple in T. * A relation can be thought of as a multi-valued function. If you provide some of the columns, it returns a tuple of the rest of the columns CurrentGradesIn302(Name=Dave) => A CurrentGradesIn302(Grade=A) => Dave,Davis CurrentGradesIn302(Grade=C) => NOT IN DOMAIN * Treatment as function is like "where" clauses Model data as relationships * A relationship is like a relation, except * The elements of the elements of a relation are ordered by the order of the domains * The elements of the elements of a relationship are ordered by the "name" of the domain. * Why? * More abstract! * Problem for Codd: What if two of the domains are "the same"? Classmates is a subset of StudentxStudent TempOnDay is a subset of IntxInt (the first int represents the temp, the second the day) * Hack solution: Name the domains * In SQL: Name and Type the columns * Almost every relational language uses relations rather than relationships Operations on Relations * Project - (pi) SELECT in SQL * What does pi_1,2 do to the relation { (Dave,Ventresca,A), (Davis,Hart,A) (Davis,Hart,B) } => { (Dave,Ventresca), (Davis,Hart) } * Select the specified columns and remove duplicate entries. (Well, it's a set) * Does project also permute? * Certainly an be used to permute, pi_3,2,1 permutes * Join * Compose