Tuesday, February 23, 2010

Understanding the GROUPING SETS

SQL Server 2008 introduced GROUPING SETS as an extension to the GROUP BY, facilitating us to combine more than one group into a single result set. We can think it is as, combining multiple grouped aggregations into a single result set by using UNION ALL.

This is almost similar to CUBE and ROLLUP operators but easy to implement and offers best performance. Not only that, it is ANSI SQL 2006 compliant too.

Let’s understand this. The below code creates a SalesTransactions table and inserts 6 records. Then query to see the Sales Amount by (1.) Year, (2.) Customer, (3.) Year, Customer and Product. Result is shown too.

   1: USE tempdb
   2: GO
   3:  
   4: CREATE TABLE SalesTransactions
   5: (    [Year] smallint
   6:     , Customer varchar(30)
   7:     , Product varchar(30)
   8:     , Amount money
   9: )
  10: GO
  11: INSERT INTO SalesTransactions VALUES
  12:     (2001, 'Hendergart', 'Mountain-100 Black, 42', 100)
  13:     ,(2001, 'Hendergart', 'AWC Logo Cap, 42', 200)
  14:     ,(2001, 'Hendergart', 'Long-Sleeve Logo Jersey, M', 600)
  15:     ,(2001, 'Collins', 'Long-Sleeve Logo Jersey, M', 400)
  16:     ,(2002, 'Collins', 'Mountain-100 Black, 42', 300)
  17:     ,(2002, 'Collins', 'Mountain-100 Black, 42', 100)
  18:  
  19: SELECT [Year], SUM(Amount) Amount
  20: FROM SalesTransactions
  21: GROUP BY [Year]
  22:  
  23: SELECT Customer, SUM(Amount) Amount
  24: FROM SalesTransactions
  25: GROUP BY Customer
  26:  
  27: SELECT [Year], Customer, Product, SUM(Amount) Amount
  28: FROM SalesTransactions
  29: GROUP BY [Year], Customer, Product

2
If all grouped aggregations need to be compiled into a single result set, we can use UNION ALL as below.

   1: SELECT [Year], NULL AS Customer, NULL AS Product, SUM(Amount) AS Amount
   2: FROM SalesTransactions
   3: GROUP BY [Year]
   4: UNION ALL
   5: SELECT NULL AS [Year], Customer, NULL AS Product, SUM(Amount) AS Amount
   6: FROM SalesTransactions
   7: GROUP BY Customer
   8: UNION ALL
   9: SELECT [Year], Customer, Product, SUM(Amount) AS Amount
  10: FROM SalesTransactions
  11: GROUP BY [Year], Customer, Product
union

Okay, that’s the old technique, see how easily we can generate the same result set by using GROUPING SET. All you have to do is, add required grouping in GROUPING SET.

   1: SELECT [Year], Customer, Product, SUM(Amount) AS Amount
   2: FROM SalesTransactions
   3: GROUP BY GROUPING SETS ([YEAR], (Customer), ([YEAR], Customer, Product))
grouping

I see a small issue with this result set. Too much of NULLs. Do you see it too? These NULLs are not real NULLs, how can we identify whether the NULLs in the result set are real or not? There are two ways, you can use either GROUPING function that accepts one parameter as the column name or GROUPING_ID that accepts multiple columns as a parameter. First function returns 0 or 1 that indicates whether it is a real NULL or result of the GROUPING respectively. The second function returns a bitmask that shows whether passed columns contains NULLs or not.

   1: SELECT [Year], Customer, Product, SUM(Amount) AS Amount
   2:     , GROUPING_ID([Year], Customer, Product) AS Bitmask
   3:     , GROUPING([Year]) AS YearGrouping
   4:     , GROUPING(Customer) AS CustomerGrouping
   5:     , GROUPING(Product) AS ProductGrouping
   6: FROM SalesTransactions
   7: GROUP BY GROUPING SETS ([YEAR], (Customer), ([YEAR], Customer, Product))

1 comment:

Anonymous said...
This comment has been removed by a blog administrator.