A Visual Guide to T-SQL Set Operators

Following on from my previous post visually showing how T-SQL PIVOT works (here), the following presents a visual guide to how set operators are used in T-SQL and what restrictions there are when creating such queries. I found these diagrams very helpful when revising for my MS 70-761 (Querying Data with Transact-SQL) exam. 

What are the Set Operators?

Results of T-SQL queries are sets, as such set operators can be applied to the results of multiple queries to combine the results in different ways.
There are 4 set operators in T-SQL:
  1. UNION: the concatenation of results from both sets, excluding duplicates.
  2. UNION ALL: the concatenation of results from both sets, including duplicates.
  3. INTERSECT: the results that appear in both sets.
  4. EXCEPT: think of this set operator as the results that appear in one set minus the results that appear in another set.

Basic Set Operator Query Format

The basic format for using a set operator is to have multiple select statements (sub queries) interleaved with the set operator of your choice. Execution of these sub queries and set operator continue in the following manner:
  • Each sub query is evaluated providing an intermediate result set
  • The set operator logic is applied to the intermediate result sets
  • This provides the final result set 



Set Operator Query Restrictions

Now that we've defined the basic format of the set operator query, I'll build up the diagram with the restrictions that need to be known. The first two restrictions we'll add to this diagram are: there has to be the same number of columns in each query and the data types for paired columns between sub queries (by position) must be compatible. 


The third restriction we'll add to this diagram is that no ORDER BY statement can be included in any of the individual sub queries, but an ORDER BY statement can be included at the end of the overall query and will be applied to the final result set of the query. 

The fourth and final restriction we'll add to this diagram to complete the visual guide is that the naming of any columns in the final result set is derived from the column naming in the first sub query and none of the other sub queries. 

Set Operator Query Format Syntax

For completeness, the following lists the syntax used for a set operator query.
SELECT <col1>, <col2>, <col3>
 [FROM ...]
 [WHERE ...]
{ UNION | UNION ALL | INTERSECT | EXCEPT }
SELECT <resA>, <resB>, <resC>
 [FROM ...]
 [WHERE ...] 
[ORDER BY ...]

Summary

  • Multiple select statements interleaved with set operators. 
  • Same number of columns in each SELECT statement. 
  • Convertible data types for each matching column between the sub queries. 
  • No ORDER BY in sub queries, but can have one for the whole query. 
  • The column names in the returned result set come from the first sub query.

The End

Comments

Popular posts from this blog

My First Year as a Data Scientist

PIVOT and UNPIVOT in T-SQL

My First Retro