Does This Violate First Normal Form? The discussion below may be academic and have little practical use. It was posted to the Oracle LazyDBA mailing list (www.lazydba.com). Given three designs: -- Design I -- PERSON table has columns: PID, LName, FName, MI, Woodwork, Painting, Mechanical, where the last three columns contain values 'Y' or 'N'. -- Design II -- PERSON table has columns: PID, LName, FName, MI SKILL has columns: SID, SName P_S_XREF has: PID, SID -- Design III -- PERSON table has PID, LName, FName, MI, Skills, where Skills contains values like "Woodwork, Mechanical" or "Painting and Mechanical" (doesn't matter what component separator is used); no other tables. I found a number of books from the local library but didn't find C.J. Date's "Introduction to Database Systems". I also did some research on the Internet. Generally, Web pages discussing first normal form could take either of the following two views. All books I read take View B except G. Koch and K. Loney's "Oracle8 The Complete Reference" which takes View A. View A: Design I violates first normal form. To correct it, it should be changed to Design II. The three skill columns "Woodwork", "Painting" and "Mechanical" are repeating groups. (View A doesn't mention Design III) View B: Design III violates first normal form because the "Skills" column contains non-atomic values and therefore it's a repeating group. To correct it, Design II should be used. (No mention of Design I) C.J. Date says "A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only" ("An Introduction To Database Systems", 1985, p.367, cited from R. Hogan, "A Practical Guide to Data Base Design", p.25). My understanding of "domain" is the mathematical set of all possible values for an attribute. From this definition of first normal form, I have to say only Design III violates first normal form; Design I does not, because the values 'Y' and 'N' stored in the individual skill columns are atomic. Note the word "values" in his definition. That should mean actual values you put in the column. If you can guarantee all values are indivisible, e.g. every person in the PERSON table in Design III can only register one skill (even if you are multi-talented), Design III is also 1NF-normalized. What's wrong with Design I therefore is not violation of 1NF; it's some other design flaw. In fact, normalization does not GUARANTEE a good design. There're many rules with regard to entities, attributes and relationships alone (see M.C. Reingruber and W.W. Gregory, "The Data Modeling Handbook"). For example, synonyms in columns: Item_ID in one table is called Item_Code in another. Homonyms: Company_ID means Customer's ID in CUSTOMER but Seller's ID in SELLER. We know in practical use our Design I is very inflexible when it's time to change a skill name and drop or add a skill, and it makes the table unnecessarily fat. We have yet to come up with a rule to eliminate, and a term to name, the problem associated with Design I. This rule may be: If several attributes can be grouped into one category and they are subject to change in names and numbers (drop and add), there should be a separate entity holding these attributes and a cross reference entity should be built between them. This is hardly a rule in the sense of rules as 1NF, 2NF... and should only be thought of as a theoretical attempt to justify database practitioners' common sense. To illustrate why it's not a real rule, consider Design IV: PERSON: PID, LName, FName, MI, HomePhone, WorkPhone, FAX If "WorkPhone" could be changed to "OfficePhone" and other contacts such as "Email", "Pager" etc. could be added and "FAX" dropped, why not change it to -- Design V -- PERSON: PID, LName, FName, MI CONTACT: CID, ContactType (containing values like "3", "WorkPhone", respectively) P_C_XREF: PID, CID, Value (containing values like "123", "3", "713-555-5555", respectively) You have a strong tendency to change Design I to II but don't want to change Design IV to V even if V works. The only explanation for it is that contact type names "WorkPhone", "Email" as column names are not frequently changed, added or dropped, and of course data modelers have been designing databases this way for the past several decades and it's been proved to be a good design. Yong Huang yong321@yahoo.com