DNAunion2000
11-07-2003, 10:42 PM
/*DNAunion*/ Some "random" thoughts I've been gathering and refining on relational databases.
Bit – A single binary digit: a 1 or a 0. The smallest unit of memory/data.
Byte – A group of 8 contiguous bits considered as a unit. Oftentimes represents a single alphanumeric character.
Domain – The set of allowed atomic (i.e., indivisible) values for a given piece of information (such as age or phone number); a domain consists of a data type (such as integer or character)
and may also include a constraining range for that data type (an integer between 18 and 65, or a character string of no more than 10 characters, for example).
Attribute – One or more bytes that together represent a single value, from the appropriate domain, that stores a single fact about a person, place, thing, or event. Attributes actually consist
of an identifying name and a domain: for example, address:varchar(25).
Tuple – A set of attributes that pertains to a single person, place, thing, or event. Technically, a tuple consists of a set of attribute:value pairs. And since each attribute consists of a
name:domain pair (such as address:varchar(25)), a tuple actually consists of a set of name:domain:value triplets. For example, when logically arranged horizontally, the following items would represent a single tuple:
address:varchar(30):”3145 Main Street”,
city:varchar(25):”Little Town”,
state:char(2):”IA”,
zipcode:char(10):”12345-6789”
Relation – A set tuples (all with the same attributes) that stores data about a certain category of people, places, things, or events.
Relational Database - A set of related relations that are linked by interlocking keys that allow values to be cross referenced for the maintenance of the data’s integrity.
RDBMS (Relational Database Management System) – A software system that allows users to define, create, manipulate, access, and control a relational database.
Differences between a relation (a formal, logical concept) and a table or file:
1. Because a relation is a set of tuples:
a. duplicate tuples aren’t permitted (duplicates make a multiset or bag of tuples)
b. the tuples have no particular order
2. Because a tuple is a set of attributes:
a. duplicate attributes aren’t permitted
b. the attributes have no particular order
3. Each value in a tuple must be atomic: neither multi-value attributes nor composite attributes are permitted.
Keys
Superkey - Any set of attributes that can uniquely identify each tuple in a relation. Since each tuple in a relation must be unique (a constraint know as entity integrity), each relation has
at least one superkey: the set of all its attributes.
Candidate Key – An irreducible superkey. That is, a superkey for which removing any attribute from the set would lead to loss of the superkey property.
(1) A candidate key is an attribute (or minimum set of two or more attributes) whose value uniquely and unambiguously identifies each tuple in a relation.
(2) A candidate key is an attribute (or minimum set of two or more attributes) whose value functionally determines each of the other columns in a relation.
(3) Attribute A (or a collection of two or more attributes) is a candidate key for a relation, R, if:
(a) Each of the other attributes in R is functionally dependent on A.
AND
(b) No proper subset of the attributes in A also has property (a)
NOTE: A relation can have multiple candidate keys. Basically, each of these keys is a “candidate” for being promoted to the relation’s single primary key.
Primary Key – The single candidate key (irreducible superkey) that is selected to serve as the relation’s primary key.
NonKey Attribute - An attribute that is not a part of the relation’s primary key.
Alternate Key – Another name for candidate key.
Foreign Key - A foreign key is an attribute (or set of two or more attributes), A1, in one relation, R1, whose value in each tuple, T1, is required to match the value of the primary key (or a candidate key) for some tuple, T2, in a referenced relation, R2 (alternatively, the foreign key’s value for T1 can be NULL, if allowed). The attribute names from the two relations do not have to match, but the attributes' data types must.
Concatenated Key – Any key composed of more than one attribute.
Integrity Rules
Entity Integrity - Entity integrity is the rule that states that no column that participates in the primary key may accept NULL values. This ensures that each tuple can be uniquely identified.
Referential Integrity - Referential integrity is the rule that states that if relation A contains a foreign key that references a key of relation B, then values used for the foreign key in relation A
must either match the value of the key for some tuple in relation B or else be NULL. Referential integrity is enforced and maintained by the use of foreign keys.
Domain Integrity – Domain integrity is the rule that states that the value assigned to each attribute of a tuple must be one of the atomic values in that attribute’s domain.
Functional Dependence
Functional Dependence - Attribute B is functionally dependent on attribute A (or possibly a set of two or more attributes) if a given value for A determines exactly one value for B at any one time. If attribute B is functionally dependent on A, it is represented as A --> B. It is not always possible to accurately determine functional dependencies by looking at tuples; the only way to definitively determine the functional dependencies that exist is to examine the business rules.
Determinant - A determinant is any attribute (or set of two or more attributes) that functionally determines another attribute. If A is a determinant of B, then it is represented as A --> B.
Therefore, the symbolic representation A --> B can be read as any of the following:
1) B is functionally dependent on A.
OR
2) A is a determinant of B
OR
3) A functionally determines B
Note that A-->B does not necessarily mean B-->A.
Think of an individual’s name and Social Security Number (SSN). A given Social Security Number has exactly 1 corresponding name. So SSN functionally determines name (which can also be expressed as either “SSN is a determinant of name” or as “name is functionally dependent on SSN”). However, since a particular name (such as John J. Jones) might be associated with more than one Social Security Number, SSN is not functionally dependent on name (i.e., name is not a determinant of SSN).
Index
Index – Typically a separate file (even if only a temporary one) that contains one or more collections of sorted “keys” for a given table (the “keys” – that is, the column values – in an index are not required to be unique). The “keys” of an index are arranged in an algorithmically efficient manner (binary trees, for example) that allows for extremely rapid searches. Furthermore, since a “key” is typically much smaller than an entire row, many
more keys than rows can be stored in memory.
NOTE: Formal relational terminology is not used when discussing indexes because indexes do not exist in the relational model (remember, a relation is a set of tuples and so there is no particular order to the elements of the set).
Advantages of indexes:
1) Indexes can greatly accelerate retrieval of rows from a table
2) Indexes can accelerate queries that join two or more tables
3) Indexes help enforce referential integrity (and maintain persistent relationships)
Disadvantages of indexes:
1) Indexes occupy space
2) Maintaining indexes (keeping trees balanced “on the fly”, for example) adds overhead.
Concerning disadvantage (2), if many indexes are being maintained for a table, performance hits will occur during insertions, deletions, and any updates that alter the value of one or more columns referenced by at least one of the indexes. But concerning advantage (1), maintaining many indexes on a table will reduce query response times. A balance has to be achieved based on the table’s typical usage. If it is queried frequently but rarely updated, many indexes can be maintained for it: in fact, one could even consider “inverting the table” (which means to create an index on every column). On the other hand, if the table is updated continually and is queried infrequently, then the number of indexes should be kept to a bare minimum.
Views
View – In SQL, a view is a virtual table (as opposed to a base table, which is a physical table that exists as an essential part of the database). Views can provide a measure of security by
logically omitting sensitive tables, records, or columns from a database as specified in the view's defining query. And since queries can be run against views just as they can be run
against base tables, views can also make frequently used or complex queries easier to code, and may even provide a performance benefit (some implementations actually
materialize the virtual table, as explained next).
A RDBMS can handle views in one of two ways: either (1) the view’s contents can be materialized – that is, physically stored (in which case the RDBMS is responsible for keeping its contents updated as changes are made to the underlying base tables), or (2) the view can be truly virtual, having no physical existence itself (in which case the defining query is run each time the view is referenced; or if an ad hoc query is being run against the view, the defining query and the ad hoc query are transparently merged into a single SQL command that is then executed). Regardless whether the RDBMS implements views as (1) or (2), a view is not a static snapshot of data; it’s contents always reflect the current data contained in the underlying base table(s).
There's more I have to add (I already have stuff on normalization, but it doesn't carry over well to discussion forums) such as relational algebra and SQL.
Bit – A single binary digit: a 1 or a 0. The smallest unit of memory/data.
Byte – A group of 8 contiguous bits considered as a unit. Oftentimes represents a single alphanumeric character.
Domain – The set of allowed atomic (i.e., indivisible) values for a given piece of information (such as age or phone number); a domain consists of a data type (such as integer or character)
and may also include a constraining range for that data type (an integer between 18 and 65, or a character string of no more than 10 characters, for example).
Attribute – One or more bytes that together represent a single value, from the appropriate domain, that stores a single fact about a person, place, thing, or event. Attributes actually consist
of an identifying name and a domain: for example, address:varchar(25).
Tuple – A set of attributes that pertains to a single person, place, thing, or event. Technically, a tuple consists of a set of attribute:value pairs. And since each attribute consists of a
name:domain pair (such as address:varchar(25)), a tuple actually consists of a set of name:domain:value triplets. For example, when logically arranged horizontally, the following items would represent a single tuple:
address:varchar(30):”3145 Main Street”,
city:varchar(25):”Little Town”,
state:char(2):”IA”,
zipcode:char(10):”12345-6789”
Relation – A set tuples (all with the same attributes) that stores data about a certain category of people, places, things, or events.
Relational Database - A set of related relations that are linked by interlocking keys that allow values to be cross referenced for the maintenance of the data’s integrity.
RDBMS (Relational Database Management System) – A software system that allows users to define, create, manipulate, access, and control a relational database.
Differences between a relation (a formal, logical concept) and a table or file:
1. Because a relation is a set of tuples:
a. duplicate tuples aren’t permitted (duplicates make a multiset or bag of tuples)
b. the tuples have no particular order
2. Because a tuple is a set of attributes:
a. duplicate attributes aren’t permitted
b. the attributes have no particular order
3. Each value in a tuple must be atomic: neither multi-value attributes nor composite attributes are permitted.
Keys
Superkey - Any set of attributes that can uniquely identify each tuple in a relation. Since each tuple in a relation must be unique (a constraint know as entity integrity), each relation has
at least one superkey: the set of all its attributes.
Candidate Key – An irreducible superkey. That is, a superkey for which removing any attribute from the set would lead to loss of the superkey property.
(1) A candidate key is an attribute (or minimum set of two or more attributes) whose value uniquely and unambiguously identifies each tuple in a relation.
(2) A candidate key is an attribute (or minimum set of two or more attributes) whose value functionally determines each of the other columns in a relation.
(3) Attribute A (or a collection of two or more attributes) is a candidate key for a relation, R, if:
(a) Each of the other attributes in R is functionally dependent on A.
AND
(b) No proper subset of the attributes in A also has property (a)
NOTE: A relation can have multiple candidate keys. Basically, each of these keys is a “candidate” for being promoted to the relation’s single primary key.
Primary Key – The single candidate key (irreducible superkey) that is selected to serve as the relation’s primary key.
NonKey Attribute - An attribute that is not a part of the relation’s primary key.
Alternate Key – Another name for candidate key.
Foreign Key - A foreign key is an attribute (or set of two or more attributes), A1, in one relation, R1, whose value in each tuple, T1, is required to match the value of the primary key (or a candidate key) for some tuple, T2, in a referenced relation, R2 (alternatively, the foreign key’s value for T1 can be NULL, if allowed). The attribute names from the two relations do not have to match, but the attributes' data types must.
Concatenated Key – Any key composed of more than one attribute.
Integrity Rules
Entity Integrity - Entity integrity is the rule that states that no column that participates in the primary key may accept NULL values. This ensures that each tuple can be uniquely identified.
Referential Integrity - Referential integrity is the rule that states that if relation A contains a foreign key that references a key of relation B, then values used for the foreign key in relation A
must either match the value of the key for some tuple in relation B or else be NULL. Referential integrity is enforced and maintained by the use of foreign keys.
Domain Integrity – Domain integrity is the rule that states that the value assigned to each attribute of a tuple must be one of the atomic values in that attribute’s domain.
Functional Dependence
Functional Dependence - Attribute B is functionally dependent on attribute A (or possibly a set of two or more attributes) if a given value for A determines exactly one value for B at any one time. If attribute B is functionally dependent on A, it is represented as A --> B. It is not always possible to accurately determine functional dependencies by looking at tuples; the only way to definitively determine the functional dependencies that exist is to examine the business rules.
Determinant - A determinant is any attribute (or set of two or more attributes) that functionally determines another attribute. If A is a determinant of B, then it is represented as A --> B.
Therefore, the symbolic representation A --> B can be read as any of the following:
1) B is functionally dependent on A.
OR
2) A is a determinant of B
OR
3) A functionally determines B
Note that A-->B does not necessarily mean B-->A.
Think of an individual’s name and Social Security Number (SSN). A given Social Security Number has exactly 1 corresponding name. So SSN functionally determines name (which can also be expressed as either “SSN is a determinant of name” or as “name is functionally dependent on SSN”). However, since a particular name (such as John J. Jones) might be associated with more than one Social Security Number, SSN is not functionally dependent on name (i.e., name is not a determinant of SSN).
Index
Index – Typically a separate file (even if only a temporary one) that contains one or more collections of sorted “keys” for a given table (the “keys” – that is, the column values – in an index are not required to be unique). The “keys” of an index are arranged in an algorithmically efficient manner (binary trees, for example) that allows for extremely rapid searches. Furthermore, since a “key” is typically much smaller than an entire row, many
more keys than rows can be stored in memory.
NOTE: Formal relational terminology is not used when discussing indexes because indexes do not exist in the relational model (remember, a relation is a set of tuples and so there is no particular order to the elements of the set).
Advantages of indexes:
1) Indexes can greatly accelerate retrieval of rows from a table
2) Indexes can accelerate queries that join two or more tables
3) Indexes help enforce referential integrity (and maintain persistent relationships)
Disadvantages of indexes:
1) Indexes occupy space
2) Maintaining indexes (keeping trees balanced “on the fly”, for example) adds overhead.
Concerning disadvantage (2), if many indexes are being maintained for a table, performance hits will occur during insertions, deletions, and any updates that alter the value of one or more columns referenced by at least one of the indexes. But concerning advantage (1), maintaining many indexes on a table will reduce query response times. A balance has to be achieved based on the table’s typical usage. If it is queried frequently but rarely updated, many indexes can be maintained for it: in fact, one could even consider “inverting the table” (which means to create an index on every column). On the other hand, if the table is updated continually and is queried infrequently, then the number of indexes should be kept to a bare minimum.
Views
View – In SQL, a view is a virtual table (as opposed to a base table, which is a physical table that exists as an essential part of the database). Views can provide a measure of security by
logically omitting sensitive tables, records, or columns from a database as specified in the view's defining query. And since queries can be run against views just as they can be run
against base tables, views can also make frequently used or complex queries easier to code, and may even provide a performance benefit (some implementations actually
materialize the virtual table, as explained next).
A RDBMS can handle views in one of two ways: either (1) the view’s contents can be materialized – that is, physically stored (in which case the RDBMS is responsible for keeping its contents updated as changes are made to the underlying base tables), or (2) the view can be truly virtual, having no physical existence itself (in which case the defining query is run each time the view is referenced; or if an ad hoc query is being run against the view, the defining query and the ad hoc query are transparently merged into a single SQL command that is then executed). Regardless whether the RDBMS implements views as (1) or (2), a view is not a static snapshot of data; it’s contents always reflect the current data contained in the underlying base table(s).
There's more I have to add (I already have stuff on normalization, but it doesn't carry over well to discussion forums) such as relational algebra and SQL.