PDA

View Full Version : Are these tables normalized?


DNAunion2000
01-26-2004, 12:15 AM
In a different thread, I slapped together a quick and dirty, simple database consisting of only 2 related tables. Here are some relation schemas for them.

BOOKS(ISBN, bkTitle, bkPrimaryAuthor, bkYear, bkPubID)

PUBLISHERS(pubID, pubName, pubCity, pubState)

where bold indicates the primary keys.

jemfinch asserted that I needed to buy a book and read up on normalization because of the poor design.

Go to the bookstore and read about normalization. You're lacking in your knowledge of databases (no doubt influenced by VFP). This is why "easy" languages oftentimes aren't the best ones to learn first: their users tend to forget the fundamentals (like you have with database normalization).

This is where the human side of me wants to insult you, but I'm restraining myself. Your arguments do a good enough job already.

Well, that gives rise to the question as to whether or not these tables are normalized.

Yes, they are. In fact, not only are they in 3NF, but they are also in the higher BCNF. In fact, not only are they in BCNF, but they are also in the higher 4NF. Here's the check.

1) A table is in 1NF (first normal form) if it meets all of the criteria for being a relation (most often this means that each attribute can contain only one, atomic value for each tuple).

Since both of the above tables meet all of the required conditions for being a relation, they are both in 1NF.

2) A table is in 2NF (second normal form) if it is in 1NF and no non-key column is functionally dependent upon only part of the primary key.

Since any table in 1NF that does not have a composite primary key is automatically in 2NF, both tables are in 2NF.

3) A table is in 3NF (third normal form) if it is in 2NF and it contains no transitive dependencies.

One could try to argue that there are transitive dependencies. For example, if we assume pubName is unique, then pubID functionally determines pubName and pubName functionally determines pubCity: thus, pubID-->pubName-->pubCity. However, this fails because the intermediary link in the chain of attributes is itself a candidate key, and:

"A functional dependency X --> Y in a relation R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X --> Z and Z --> Y hold." (Ramez Elmasri & Shamkant B Navathe, Fundamentals of Database Systems: Fourth Edition, Addison Wesley, 2003, p320)

Since both tables are in 2NF and neither has any transitive dependencies that would prevent it from being in 3NF, both tables are in 3NF.

4) A table is in BCNF (Boyce-Codd normal form) if it is in 3NF and the only determinants it contains are candidate keys.

The only determinants known in the tables are the primary keys (which are, by definition, candidate keys).

But what about pubName? Doesn't a given valud for it yield only one unambigious value for pubCity and pubState? We could probably safely assume so, but if we do, pubName would be unique and thus a candidate key (the alternative is just about nonsensical: for pubName to functionally determine pubCity and pubState and NOT be a candidate key, pubID would not functionally determine pubName, yet the pubName would have to functionally determine pubCity and pubState. In other words, you could have two different publishers with the same name, but if you did, they would have to be located in the same city and state). Since it would be a candidate key, its being a determinant would not prevent the table from being in BCNF.

If we go out of our way to assume that pubName is not unique (to prevent it from being a candidate key), then multiple publishers could have the same name: if that were the case, there would be no valid means of enforcing a rule that any such multiple publishers would have to be located in the same city or the same state. Thus, pubName would no longer functionally determine city or state and so wouldn't be a determinant: again, not preventing the table from being in BCNF.

The same type of logic that showed pubName does not prevent PUBLISHERS from being in BCNF works for bkTitle in TITLES too (also, I later stated explicitly that there was no constraint for bkTitle to be unique, and thus there could be multiple, different books with the same title: there's no valid way to create a constraint that demands that each of the different books in such a set have the same year, publisher, or primary author, and so bkTitle would no longer be a determinant. Thus, it could not be used to exclude the BOOKS table from being in BCNF).

The only remaining possible determinant would be pubCity, as in the functional dependency pubCity-->pubState. But that cannot just be assumed: one would need to know the business rules. Furthermore, since a single value for city -- such as "Philadelphia" -- can be in more than one state, city does NOT functionally determine state. Thus, pubCity is not a determinant and cannot possibly disqualify the PUBLISHER table from being in BCNF.

Thus, both tables are in BCNF.

5) A table is in 4NF (fourth normal form) if it is in BCNF and has no multivalued dependencies.

"...a relation R(A, B, C), a multi-value dependency exists if A determines multiple values of B, A determines multiple values of C, and B and C are independent of each other." (David M Kroenke, Database Processing: Fundamentals, Design, and Implementation, Prentice Hall, 2004, p132)

This is not the case for either of the tables being discussed.

Therefore, both tables are in 4NF.

And thus another of jemfinch's assertions is shown to be wrong

Strike
01-26-2004, 01:00 AM
You are a sad sad man. Really. You should go outside into the Big Blue Room sometime. Wear some sunblock and some sunglasses.

DNAunion2000
01-27-2004, 02:32 AM
Originally posted by Strike
You are a sad sad man. Really. You should go outside into the Big Blue Room sometime. Wear some sunblock and some sunglasses.

What's your problem with me defending my statements? jemfinch doesn't have a problem with it.

Shoot, jemfinch himself said that he admires people who are willing and able to defend their positions.

jemfinch also stated that he admits when he is wrong.

So I expect any minute now that jemfinch will post in this thread, saying how much he admires me and also admitting that he was wrong on this. Isn't that what you expect the honorable jemfinch to do?

Strike
01-27-2004, 04:09 AM
You have a reading comprehension problem if you think I have a problem with you defending your statements, because I never said that. I find you pathetic and sad because of the length of time you spend composing these lengthy posts and provoking arguments. It would be a different story if anyone knew who you actually were, but you have gone to great lengths to keep this online persona completely anonymous. So, this intangible existence you cherish so dearly is one that you refuse to connect to your actual real world self in any tangible way and yet it consumes this inordinately huge amount of your time. It's sad, really.

DNAunion2000
01-28-2004, 11:28 PM
Strike: You have a reading comprehension problem if you think I have a problem with you defending your statements, because I never said that. I find you pathetic and sad because of the length of time you spend composing these lengthy posts and provoking arguments. It would be a different story if anyone knew who you actually were, but you have gone to great lengths to keep this online persona completely anonymous. So, this intangible existence you cherish so dearly is one that you refuse to connect to your actual real world self in any tangible way and yet it consumes this inordinately huge amount of your time. It's sad, really.

In other words, you have nothing at all worthwhile to contribute to the discussion here: you just want to insult me...as usual.

Let me see if I can try to get you to focus on the topic. Do you agree that the tables are normalized? If not, exactly why not?

Strike
01-29-2004, 01:19 AM
There is no discussion here. Just you blathering on and on. Ever hear of the expression "no sense in beating a dead horse"?

DNAunion2000
01-29-2004, 01:58 AM
Strike: There is no discussion here. Just you blathering on and on. Ever hear of the expression "no sense in beating a dead horse"?

So you do agree that the tables are normalized, and that, therefore, jemfinch was wrong. Right?

jemfinch
01-30-2004, 04:51 PM
I missed this paragraph in the original post:


The book table’s primary key will be the ISBN; the publisher table’s primary key will be the publisher’s assigned ID; and a foreign key will be setup where the publisher in the book table references the publisher ID in the publisher table.


And thus assumed each column would have an integer primary key (as you saw implemented in Strike's example implementation in Python).

Note that you generally shouldn't assign an intrinsic value to the primary key for a table. Your publisher id primary key is probably fine, but I don't think ISBN is a good candidate for a primary key. What about second editions, or reprints of the same edition by a different publisher, or things like that?

Jeremy

DNAunion2000
01-31-2004, 12:47 PM
jemfinch: Note that you generally shouldn't assign an intrinsic value to the primary key for a table. Your publisher id primary key is probably fine, but I don't think ISBN is a good candidate for a primary key. What about second editions, or reprints of the same edition by a different publisher, or things like that?

Both of the examples you listed would have different ISBNs.

The International Standard Book Number, or ISBN, is a unique identifier for a book. The ISBN system was created in the United Kingdom in 1966 (then called Standard Book Numbering SBN) and adopted as international standard ISO 2108 in 1970.

Each edition and variation (except reprints) of a book receives its own ISBN. The number consists of four parts:

the country of origin,
the publisher,
the item number, and
a checksum digit.

The different parts can have different lengths and are usually separated by hyphens. These hyphens are not strictly necessary however, since prefix codes are used which ensure that no two codes start the same way.

The country field is 0 or 1 for English speaking countries, 2 for French speaking countries, 3 for German speaking countries etc. The country field can be up to 5 digits long; 99936 for instance is used for Bhutan. See this complete list.

The publisher number is assigned by the national ISBN agency, and the item number is chosen by the publisher.

Publishers receive blocks of ISBNs, with larger blocks going to publishers that are expected to need them; a small publisher might receive ISBNs consisting of a digit for the language, seven digits for the publisher, and a single digit for the individual items. Once that block is used up, the publisher can receive another block of numbers, with a different publisher number. As a consequence, different publisher numbers occasionally correspond to the same publisher.”
(http://en.wikipedia.org/wiki/ISBN)

Strike
01-31-2004, 01:04 PM
Originally posted by DNAunion2000
jemfinch: Note that you generally shouldn't assign an intrinsic value to the primary key for a table. Your publisher id primary key is probably fine, but I don't think ISBN is a good candidate for a primary key. What about second editions, or reprints of the same edition by a different publisher, or things like that?
Both of the examples you listed would have different ISBNs.
(note: super-strong emphasis mine, italic emphasis exists in original)

Did you miss this part of your own paste?

Each edition and variation (except reprints) of a book receives its own ISBN.
(emphasis mine)

DNAunion2000
01-31-2004, 01:29 PM
Strike [starting off with quotes from earlier posts]:

*********************************
jemfinch: Note that you generally shouldn't assign an intrinsic value to the primary key for a table. Your publisher id primary key is probably fine, but I don't think ISBN is a good candidate for a primary key. What about second editions, or reprints of the same edition by a different publisher, or things like that?
*********************************

*********************************
DNAunion: Both of the examples you listed would have different ISBNs.
*********************************

Strike: (note: super-strong emphasis mine, italic emphasis exists in original)

Did you miss this part of your own paste?


***********************
Each edition and variation (except reprints) of a book receives its own ISBN.
***********************
(emphasis mine [Strike])

No, I didn’t miss it: I read it.

But unlike you, I bothered to try to understand what was being said. If you would have done that, you would have seen that your attempt to "nab" me falls flat on its face.

Did you miss this part?

The publisher number is assigned by the national ISBN agency, and the item number is chosen by the publisher.

Publishers receive blocks of ISBNs, with larger blocks going to publishers that are expected to need them; a small publisher might receive ISBNs consisting of a digit for the language, seven digits for the publisher, and a single digit for the individual items. Once that block is used up, the publisher can receive another block of numbers, with a different publisher number. As a consequence, different publisher numbers occasionally correspond to the same publisher.”

A publisher number functionally determines a publisher (but not the other way around). Thus, a REPRINT OF THE SAME EDITION BY A DIFFERENT PUBLISHER will have a DIFFERENT ISBN, because THE PUBLISHER ID PORTION of the ISBN will be DIFFERENT.

******************************
PS: Considering your past reactions to being wrong, I don't expect you to acknowledge your error here. Shoot, you'll probably try to push blame off onto Wikipedia, just like you tried to push blame off onto MSDN. Doesn't matter...we all know the truth.

jemfinch
01-31-2004, 02:38 PM
Eh, Strike may be wrong, but your post from Wikipedia proves my point either way: a publisher may have more than one publisher id. Since you used the publisher id as a primary key in your publisher table, you're going to have problems with that. Primary keys should not have intrinsic meaning.

Jeremy

DNAunion2000
01-31-2004, 03:47 PM
jemfinch: Eh, Strike may be wrong...

No...IS wrong.

jemfinch: ... but your post from Wikipedia proves my point either way: a publisher may have more than one publisher id.

Your confusing 2 types of publisher IDs.

Businesses assign their own internally used ID numbers for external entities, such as vendors/suppliers, customers, and business partners. These are INTERNAL ID numbers, which are guaranteed to be unique within the system they are used in. Same would go for a publisher ID - it would be an internally assigned ID, completely separate from whatever the publisher might have been assigned by an outside organization. To try to actually show the general idea, I am assigned one ID by the US government (my SSN), but the USCF does not use that as their identifier for me, they use a member ID number; neither does Amazon.com use my SSN - they have their own member ID number for me, which differs from both my SSN and my USCF member number.


jemfinch: Since you used the publisher id as a primary key in your publisher table, you're going to have problems with that. Primary keys should not have intrinsic meaning.

That is more of a design philosophy, not an absolute that all must follow.

Some state that it is okay for a primary key to have meaning, such as using the text "ACCOUNTING" for a DEPARTMENT table, instead of using a meaningless, incrementing number, such as 12. For example, in a related table, having "ACCOUNTING" as the foreign key value makes it immediately clear what department the current record is related to. On the other hand, having 12 as the foreign key value forces one to either memorize the various department codes, or to stop and somehow look them up whenever they are referenced.

jemfinch
01-31-2004, 04:03 PM
Originally posted by DNAunion2000
That is more of a design philosophy, not an absolute fact.


Sure, but there are good design philosophies and bad design philosophies. Good ones lead to superior software; bad ones lead to inferior software. What I'm espousing is a good one; using primary keys with intrinsic meaning is a bad one.


Some state that it is okay for a primary key to have meaning, such as using the text "ACCOUNTING" for a DEPARTMENT field, instead of using a meaningless, incrementing number, such as 12.


And some others would say that you'd be screwed when the ACCOUNTING department changes its name to the FINANCIAL SERVICES department. The latter people would be correct.


Having "ACCOUNTING" as a foreign key value makes it immediately clear what department the current record is related to.


As long as that department never changes its name.


On the other hand, having 12 as the foreign key value forces one to either memorize the various codes, or to stop and somehow look them up whenever they are referenced.

Or just to stick it in your query like everything else you're doing.

Jeremy

DNAunion2000
01-31-2004, 04:36 PM
DNAunion: Some state that it is okay for a primary key to have meaning, such as using the text "ACCOUNTING" for a DEPARTMENT field, instead of using a meaningless, incrementing number, such as 12.

jemfinch: And some others would say that you'd be screwed when the ACCOUNTING department changes its name to the FINANCIAL SERVICES department. The latter people would be correct.

Why? Once again you give only half an argument…where’s the rest?

I suppose I am once again going to have to GUESS what you are getting at.

I will guess your problem is that "ACCOUNTING” would appear in various tables and in numerous records in those tables, and so the change would be time consuming and intensive and possibly be error prone too. If so, that is incorrect. That is, many relational database management systems have CASCADING updates for primary keys. The RDBMS, not the users, would take care of changing all occurrences of “ACCOUNTING” to “FINANCIAL SERVICES”, doing it accurately and quickly (and the use of transactions would ensure that all changes were made as a single unit, so no partial updates could occur and the database could not be left in an inconsistent state).

DNAunion: Having "ACCOUNTING" as a foreign key value makes it immediately clear what department the current record is related to.

jemfinch: As long as that department never changes its name.

Cascading updates can take care of the “problems” associated with changing a primary key’s value.

DNAunion: On the other hand, having 12 as the foreign key value forces one to either memorize the various codes, or to stop and somehow look them up whenever they are referenced.

jemfinch: Or just to stick it in your query like everything else you're doing.

Who said all people always look at database tables through queries? Some users (of DBMSs such as Access and VFP and probably others, such as Alpha 4) open up and look at individual tables, where all they see is the foreign key value. “ACCOUNTING” would have immediately ascertainable meaning, the number 12 wouldn’t.

DNAunion2000
01-31-2004, 04:52 PM
Since I like to support my statements...

"As noted in Chapter 2, some tables may have more than one identifier. If so, it is important to choose the best of them for the primary key. For example, Figure 5-3 shows a DEPARTMENT table with DepartmentName as the identifier and with two candidate keys: BudgetCode and the composite {Building, Room}. ...

Given the criteria for short, numeric, and seldom-changing, the attribute BudgetCode in Figure 5-3 is probably the best choice for primary key. One could make the argument, however, that DepartmentName would be a better primary key because it is more natural to the users. Anyone can look at "Accounting" and know what it means. Few people, however, can look at BudgetCode 10445 and know that it represents the accounting department. Therefore, as long as department names are reasonably short, DepartmentName could also be selected as the primary key." (David M. Kroenke, Database Processing: Fundamentals, Design, and Implementation: Ninth Edition, Pearson/Prentice Hall, 2004, p153)

DNAunion2000
01-31-2004, 05:04 PM
And for the other part of my argument...

"Considering updates, the default behavior is not to allow an update of a primary key value if the row has any related child rows. Another possibility, however, is to automatically change the value of the foreign key in all related child rows to the new value.

For example, when the Bookkeeping department changes its name to Accounting, it is possible to change the value of Bookkepping to Accounting in all related child rows as well. In that way, the relationships among the rows are maintained, and the referential integrity constraint is also preserved. Such a policy is called cascading updates. During database design, if we want this behavior, we need to define it as a referential integrity constraint." (David M. Kroenke, Database Processing: Fundamentals, Design, and Implementation: Ninth Edition, Pearson/Prentice Hall, 2004, p157)

One way to accomplish this is to add an ON UPDATE CASCADE (or CASCADE ON UPDATE) clause to an SQL CREATE TABLE statement.