PDA

View Full Version : Old-style vs. New-style joins in SQL


DNAunion2000
01-13-2004, 03:51 PM
In SQL, there is an old way to join tables that uses the WHERE clause, and a new way that uses the FROM clause. Here is an example of the old style.


SELECT publishers.name, publishers.id, books.title
FROM publishers, books
WHERE publishers.id = books.publisher_id
ORDER BY publishers.name, books.title ;


Specifying joins in the WHERE clause is considered, at least by some, to be poor style these days. First, if one or more restrictions are being applied to the rows being retrieved from one or both of the tables, then the WHERE clause gets cluttered, having both the join and restrictions bunched together: clarity is improved by moving the join condition out of the WHERE clause and into the FROM clause.

Here’s how this could be done


SELECT publishers.name, publishers.id, books.title
FROM publishers INNER JOIN books ON publishers.id = books.publisher_id
ORDER BY publishers.name, books.title ;


************************************************
NOTE: Even the above improved command can be improved, by using table aliases, such as:

SELECT p.name, p.id, b.title
FROM publishers p INNER JOIN books b ON p.id = b.publishers_id
ORDER BY p.name, b.title ;

The benefits of using table aliases will be taken up in a separate thread.
************************************************

For several reasons, most sources recommend using the newer syntax for joins.

”The join in Figure 7-2 is accomplished using the ANSI JOIN SQL syntax, introduced in SQL Server version 6.5. Many examples and applications still continue to use the old-style JOIN syntax, which is shown below. (The term “old-style JOIN” is actually used in the SQL-92 specification.) The ANSI JOIN syntax is based on ANSI SQL-92. The main differences between the two types of join formulations are:

[1]The ANSI JOIN actually uses the keyword JOIN.

[2]The ANSI JOIN segregates join conditions from search conditions.

The ANSI JOIN syntax specifies the JOIN conditions in the ON clause (one for each pair of tables), and the search conditions are specified in the WHERE clause – for example, WHERE state <> ‘CA’. Although slightly more verbose, the explicit JOIN syntax is more readable.



One of the most common errors new SQL users make when using the old-style JOIN syntax is not specifying a join condition. Omitting the WHERE clause is still a valid SQL request and causes an answer to be returned. However, the result set is likely not what the user wanted. In Figure 7-3, omitting the WHERE clause would return the Cartesian product of the three tables: it would generate every possible combination of rows between them. Although in a few unusual cases you might want all permutations, usually this is just a user error. The number of rows returned is huge and typically doesn’t represent anything meaningful. For example, the Cartesian product of the three small tables here (authors, titles, and titleauthor each has less than 26 rows) generates 10,350 rows of (probably) meaningless output.

Using the ANSI JOIN syntax, it’s impossible to accidentally return a Cartesian product – one reason to use ANSI JOINs exclusively.” (Inside Microsoft SQL Server 7.0, Ron Soukup and Kalen Delaney, Microsoft Press, 1999, p305-306)


“The concept of a joined table (or joined relation) was incorporated into SQL to permit users to specify a table resulting from a join operation in the FROM clause of a query. This construct may be easier to comprehend than mixing together all the select and join conditions in the WHERE clause.” (Fundamentals of Database Systems: Fourth Edition, Ramez Elmasri & Shamkant B. Navathe, Addison Wesley, 2004, p237)


”When you join tables, you can use either the older SQL ’89 ANSI syntax, or the newer SQL ’99 ANSI join syntax.



SQL ’99 ANSI Join Syntax
The join statements for the newer ANSI syntax show up in the FROM clause of the SELECT statement:



The WHERE clause selects rows from the joined rows to be returned. You can choose three types of ANSI join statements: INNERJOIN, OUTERJOIN, and CROSSJOIN.

SQL ’89 Join Syntax
Although SQL Server 2000 still supports the older SQL ’89 syntax, we suggest you use the newer ANSI standard syntax instead:” (Teach Yourself Microsoft SQL Server 2000 in 21 Days: Second Edition, Richard Waymire and Rick Sawtell, SAMS, 2003, p361-362)

Finally, in the section on joins (pages 195 – 201), the book “Mastering SQL Server 2000” (Mike Gunderloy & Joseph L. Jorden, Sybex, 2000) does not even mention the old-style join using a WHERE clause; all of its joins use the new-style join using the FROM clause.

So while it is not an error to implement joins using the WHERE clause, it (1) is the old style of doing joins, which if nothing else means it may be dropped from the SQL specification in the near future, (2) can lead to a cluttered up WHERE clause, making that clause more difficult to comprehend, (3) has reduced clarity compared to having the join condition on its own, and (4) makes the process of joining tables more error prone (at least for those fairly new to SQL).

sicarius
01-13-2004, 05:53 PM
I agree with most of what you said, but could you please clarify what you meant by saying:


which if nothing else means it may be dropped from the SQL specification in the near future


How can they drop this from the spec? The query doesn't use any keywords other then SELECT, FROM, WHERE, and in this particular case ORDER and BY.

The fact that you can join two tables this way is rooted more in database theory than the syntax for SQL.

I am not saying the old style is better, just that I don't think they can prevent the user from doing it.

stuka
01-13-2004, 06:36 PM
What I don't get is the overwhelming love for table aliases when (at least in the examples given) they merely clutter things up, rather than clearing them up. While the full names may be more verbose, they're easier to understand and less likely to be confused IMHO.

sicarius
01-13-2004, 07:46 PM
I don't know. I could go either way with table aliases. To me it depends on the specific query.

If the query is going to use the same tables over and over again, then I will probably use an alias. Otherwise I will probably stick with the table name.

Also, most complex queries you expect to use a lot should be in stored proceedures (assuming your DBMS supports them) anyway, and at that point these little details start to go out of the window.

DNAunion2000
01-13-2004, 07:59 PM
...which if nothing else means it may be dropped from the SQL specification in the near future

How can they drop this from the spec? The query doesn't use any keywords other then SELECT, FROM, WHERE, and in this particular case ORDER and BY.

The fact that you can join two tables this way is rooted more in database theory than the syntax for SQL.

Good point...guess I didn't think that one through very well. I might have been thrown off by this:

SQL ’89 Join Syntax
Although SQL Server 2000 still supports the older SQL ’89 syntax, we suggest you use the newer ANSI standard syntax instead:” (Teach Yourself Microsoft SQL Server 2000 in 21 Days: Second Edition, Richard Waymire and Rick Sawtell, SAMS, 2003, p361-362)

That can be taken to indicate the while SQL Server still supports the old method, it may not in the future.

The WHERE clause needs to be able to (1) reference multiple tables (one would surely not want to limit restriction criteria to just one table), (2) access whatever columns from those tables the user wants, and (3) handle an equal operator as part of implementing restriction criteria. With those three things in place for a WHERE clause, a table join seems hard to prevent.

sicarius
01-13-2004, 09:52 PM
Ok. I just wanted to make sure I wasn't going crazy :)

DNAunion2000
01-24-2004, 10:51 PM
Picked up a tutee in the computer science course Database Processing. Got the book and checked it out. Here's what it says about joins and table aliases.

"SQL-92 introduced an alternative join syntax that has become very popular because it is easier to interpret. This syntax substitutes the words JOIN and ON for WHERE, as follows:


SELECT Name, HoursWorked
FROM EMPLOYEE JOIN ASSIGNMENT
ON EMPLOYEE.EmployeeNumber = ASSIGNMENT.EmployeeNum;


The results of this are: [table omitted]

In addition, it is possible to improve the readability of a join by using aliases for table names. The expression FROM EMPLOYEE E will assign the alias E to the EMPLOYEE table. Using this syntax the join above is the following:


SELECT Name, HoursWorked
FROM EMPLOYEE AS E JOIN ASSIGNMENT AS A
ON E.EmployeeNumber = A.EmployeeNum;


... note that the keyword AS can be omitted, if desired.

As queries become more complicated, this format is easier to interpret and it is the format we will use for all of the joins in Chapters 7 and 8. ..." (David M Kroenke, Database Processing: Fundamentals, Design, and Implementation, Prentice Hall, 2004, p212-213)