Monday, February 17, 2014

UNION ALL is possible. How about INTERSECT ALL and EXCEPT ALL?

Not expecting duplicates but a significant improvement in performance, we tend to use UNION ALL when no duplicates are guaranteed in combining two sets. The reason is, UNION ALL does not performing an additional task for filtering (or removing) duplicates, hence gives a better performance than UNION. However, the purpose of UNION ALL is not for improving the performance but for producing the result with duplicates if exist. While this was being taught during my classes, a thought came into my mind: Why other set operators such as INTERSECT and EXCEPT do not offer the same built-in functionality like INTERSECT ALL and EXCEPT all?

A usual quick search did not give me the required answer but proved that it is not available as a built-in functionality. There were few possible solutions but the one given by T-SQL expert Itzik Ben-Gan was quite interesting. Here is a small note on INTERSECT and EXCEPT and implementation suggested by Itzik Ben-Gan for INTERSECT ALL and EXCEPT ALL.

INTERSECT and INTERSECT ALL
INTERSECT allows us to retrieve only rows that are available in both two sets. This removes duplicates if found. It basically behaves as DISTINCT INTERSECT. If duplicates to be included to the result, a workaround is required and it can be easily done with ROW_NUMBER ranking function. All required columns need to be added under PARTITION BY clause and SELECT < constant> needs to be added for ODER BY clause for instructing SQL Server to not consider the order of the window. This generates aN unique number of all records, making them available in the final resultset of INTERSECT. Here is a sample code for this;

  1. USE tempdb
  2. GO
  3.  
  4. -- creating table 1
  5. CREATE TABLE dbo.InternetSales
  6. (
  7.     SaleDate datetime NOT NULL
  8.     , Product varchar(100) NOT NULL
  9.     , Amount money NOT NULL
  10. )
  11. GO
  12.  
  13. -- creating table 2
  14. CREATE TABLE dbo.ResellerSales
  15. (
  16.     SaleDate datetime NOT NULL
  17.     , Product varchar(100) NOT NULL
  18.     , Amount money NOT NULL
  19. )
  20. GO
  21.  
  22. -- inserting values for table 1
  23. INSERT INTO dbo.InternetSales
  24.     (SaleDate, Product, Amount)
  25. VALUES
  26.     ('01/01/2014', 'A', 100)
  27.     , ('02/01/2014', 'B', 100)
  28.     , ('03/01/2014', 'B', 100)
  29.     , ('04/01/2014', 'B', 100)
  30.     , ('05/01/2014', 'C', 100)
  31.     , ('05/01/2014', 'C', 100)
  32.  
  33. -- inserting values for table 2
  34. INSERT INTO dbo.ResellerSales
  35.     (SaleDate, Product, Amount)
  36. VALUES
  37.     ('01/01/2014', 'A', 100)
  38.     , ('02/01/2014', 'B', 100)
  39.     , ('03/01/2014', 'B', 100)
  40.     , ('04/01/2014', 'D', 100)
  41.     , ('05/01/2014', 'D', 100)
  42.  
  43. -- checking INTERSECT, will return only A, B
  44. -- If duplicates were included, should return A, B, B
  45. SELECT Product FROM dbo.InternetSales
  46. INTERSECT
  47. SELECT Product FROM dbo.ResellerSales
  48.  
  49. -- Adding ROW_NUMBER for creating unique numbers
  50. -- And using CTE for removing the number
  51. -- This returns A, B, B
  52. WITH cte
  53. AS
  54. (
  55.     SELECT
  56.         ROW_NUMBER() OVER (PARTITION BY Product ORDER BY (SELECT 0)) AS number
  57.         , Product
  58.     FROM dbo.InternetSales
  59.     INTERSECT
  60.     SELECT
  61.         ROW_NUMBER() OVER (PARTITION BY Product ORDER BY (SELECT 0)) AS number
  62.         , Product
  63.     FROM dbo.ResellerSales
  64. )
  65. SELECT Product FROM cte;

EXCEPT and EXCEPT ALL
EXCEPT returns all distinct rows found in “left” (or first) set that are not found in “right” (or second) set. This discards duplicates too. If duplicates are required, same workaround can be applied. Here is the code for it;

  1. -- checking EXCEPT, will return only C
  2. -- If duplicates were included, should return B, C, C
  3. SELECT Product FROM dbo.InternetSales
  4. EXCEPT
  5. SELECT Product FROM dbo.ResellerSales
  6.  
  7. -- Adding ROW_NUMBER for creating unique numbers
  8. -- And using CTE for removing the number
  9. -- This returns B, C, C (Duplicates)
  10. WITH cte
  11. AS
  12. (
  13.     SELECT
  14.         ROW_NUMBER() OVER (PARTITION BY Product ORDER BY (SELECT 0)) AS number
  15.         , Product
  16.     FROM dbo.InternetSales
  17.     EXCEPT
  18.     SELECT
  19.         ROW_NUMBER() OVER (PARTITION BY Product ORDER BY (SELECT 0)) AS number
  20.         , Product
  21.     FROM dbo.ResellerSales
  22. )
  23. SELECT Product FROM cte;
  24.  
  25. -- Cleaning
  26. DROP TABLE dbo.InternetSales
  27. DROP TABLE dbo.ResellerSales

No comments: