How to Examine the SELECT statement
Transact-SQL or T-SQL, is a dialect of the ANSI standard SQL language used by Microsoft SQL products and services. It is similar to standard SQL. Most of our focus will be on the SELECT statement, which has by far the most options and variations of any DML statement.
Let’s start by taking a high-level look at how a SELECT statement is processed. The order in which a SELECT statement is written is not the order in which it is evaluated and processed by the SQL Server database engine.
Consider the following query:
SQLCopy
SELECT OrderDate, COUNT(OrderID) AS Orders
FROM Sales.SalesOrder
WHERE Status = 'Shipped'
GROUP BY OrderDate
HAVING COUNT(OrderID) > 1
ORDER BY OrderDate DESC;
The query consists of a SELECT statement, which is composed of multiple clauses, each of which defines a specific operation that must be applied to the data being retrieved. Before we examine the run-time order of operations, let’s briefly take a look at what this query does, although the details of the various clauses will not be covered in this module.
The SELECT clause returns the OrderDate column, and the count of OrderID values, to which is assigns the name (or alias) Orders:
SQLCopy
SELECT OrderDate, COUNT(OrderID) AS Orders
The FROM clause identifies which table is the source of the rows for the query; in this case it’s the Sales.SalesOrder table:
SQLCopy
FROM Sales.SalesOrder
The WHERE clause filters rows out of the results, keeping only those rows that satisfy the specified condition; in this case, orders that have a status of “shipped”:
SQLCopy
WHERE Status = 'Shipped'
The GROUP BY clause takes the rows that met the filter condition and groups them by OrderDate, so that all the rows with the same OrderDate are considered as a single group and one row will be returned for each group:
SQLCopy
GROUP BY OrderDate
After the groups are formed, the HAVING clause filters the groups based on its own predicate. Only dates with more than one order will be included in the results:
SQLCopy
HAVING COUNT(OrderID) > 1
For the purposes of previewing this query, the final clause is the ORDER BY, which sorts the output into descending order of OrderDate:
SQLCopy
ORDER BY OrderDate DESC;
Now that you’ve seen what each clause does, let’s look at the order in which SQL Server actually evaluates them:
- The FROM clause is evaluated first, to provide the source rows for the rest of the statement. A virtual table is created and passed to the next step.
- The WHERE clause is next to be evaluated, filtering those rows from the source table that match a predicate. The filtered virtual table is passed to the next step.
- GROUP BY is next, organizing the rows in the virtual table according to unique values found in the GROUP BY list. A new virtual table is created, containing the list of groups, and is passed to the next step. From this point in the flow of operations, only columns in the GROUP BY list or aggregate functions may be referenced by other elements.
- The HAVING clause is evaluated next, filtering out entire groups based on its predicate. The virtual table created in step 3 is filtered and passed to the next step.
- The SELECT clause finally executes, determining which columns will appear in the query results. Because the SELECT clause is evaluated after the other steps, any column aliases (in our example, Orders) created there cannot be used in the GROUP BY or HAVING clause.
- The ORDER BY clause is the last to execute, sorting the rows as determined by its column list.
To apply this understanding to our example query, here is the logical order at run time of the SELECT statement above:
SQLCopy
FROM Sales.SalesOrder
WHERE Status = 'Shipped'
GROUP BY OrderDate
HAVING COUNT(OrderID) > 1
SELECT OrderDate, COUNT(OrderID) AS Orders
ORDER BY OrderDate DESC;
Not all the possible clauses are required in every SELECT statement that you write. The only required clause is the SELECT clause, which can be used on its own in some cases. Usually a FROM clause is also included to identify the table being queried. In addition, Transact-SQL has other clauses that can be added.
As you have seen, you do not write T-SQL queries in the same order in which they are logically evaluated. The run-time order of evaluation determines what data is available to which clauses, as a clause only has access to information already made available from an already processed clause. For this reason, it’s important to understand the true logical processing order when writing queries.