Monday, December 28, 2015

GUID or Uniqueidentifier as the key - II - Issues with it

My first post related to GUID discussed the main issue we face when using GUID as the key, specifically as the Clustered Key. Here is another concern we should consider when maintaining a table with GUID as the key. This issue does need to be considered as a critical issue if the table contains low number of records. But if it contains millions of records, this is something we need to consider for sure.

Let's create two tables just as the first post. The below code creates two tables named dbo.Sales_WithGUID and dbo.Sales_WithoutGUID and loads same data set for both tables.

USE tempdb;
GO

IF OBJECT_ID('dbo.Sales_WithGUID') IS NOT NULL
 DROP TABLE dbo.Sales_WithGUID;
GO

-- Creating the table
CREATE TABLE dbo.Sales_WithGUID
(
 SalesKey uniqueidentifier PRIMARY KEY DEFAULT(NewID()),
 ProductKey int NOT NULL,
 OrderDateKey int NOT NULL,
 DueDateKey int NOT NULL,
 ShipDateKey int NOT NULL,
 CustomerKey int NOT NULL,
 PromotionKey int NOT NULL,
 CurrencyKey int NOT NULL,
 SalesTerritoryKey int NOT NULL,
 SalesOrderNumber nvarchar(20) NOT NULL,
 SalesOrderLineNumber tinyint NOT NULL,
 RevisionNumber tinyint NOT NULL,
 OrderQuantity smallint NOT NULL,
 UnitPrice money NOT NULL,
 ExtendedAmount money NOT NULL,
 UnitPriceDiscountPct float NOT NULL,
 DiscountAmount float NOT NULL,
 ProductStandardCost money NOT NULL,
 TotalProductCost money NOT NULL,
 SalesAmount money NOT NULL,
 TaxAmt money NOT NULL,
 Freight money NOT NULL,
 CarrierTrackingNumber nvarchar(25) NULL,
 CustomerPONumber nvarchar(25) NULL,
 OrderDate datetime NULL,
 DueDate datetime NULL,
 ShipDate datetime NULL
);
GO

INSERT INTO dbo.Sales_WithGUID
  (
   ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
    ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
    ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
    ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
    ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
  )
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
      ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
      ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
      ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
      ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
FROM AdventureWorksDW2014.dbo.FactInternetSales;


IF OBJECT_ID('dbo.Sales_WithoutGUID') IS NOT NULL
 DROP TABLE dbo.Sales_WithoutGUID;
GO

-- Create the second table
CREATE TABLE dbo.Sales_WithoutGUID
(
 SalesKey int identity(1,1),
 CompanyCode char(2) Default ('SL'),
 constraint pk_Sales_WithoutGUID primary key (SalesKey, CompanyCode),
 ProductKey int NOT NULL,
 OrderDateKey int NOT NULL,
 DueDateKey int NOT NULL,
 ShipDateKey int NOT NULL,
 CustomerKey int NOT NULL,
 PromotionKey int NOT NULL,
 CurrencyKey int NOT NULL,
 SalesTerritoryKey int NOT NULL,
 SalesOrderNumber nvarchar(20) NOT NULL,
 SalesOrderLineNumber tinyint NOT NULL,
 RevisionNumber tinyint NOT NULL,
 OrderQuantity smallint NOT NULL,
 UnitPrice money NOT NULL,
 ExtendedAmount money NOT NULL,
 UnitPriceDiscountPct float NOT NULL,
 DiscountAmount float NOT NULL,
 ProductStandardCost money NOT NULL,
 TotalProductCost money NOT NULL,
 SalesAmount money NOT NULL,
 TaxAmt money NOT NULL,
 Freight money NOT NULL,
 CarrierTrackingNumber nvarchar(25) NULL,
 CustomerPONumber nvarchar(25) NULL,
 OrderDate datetime NULL,
 DueDate datetime NULL,
 ShipDate datetime NULL
);
GO

INSERT INTO dbo.Sales_WithoutGUID
  (
   ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
    ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
    ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
    ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
    ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
  )
SELECT ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey
      ,CurrencyKey, SalesTerritoryKey, SalesOrderNumber, SalesOrderLineNumber, RevisionNumber
      ,OrderQuantity, UnitPrice, ExtendedAmount, UnitPriceDiscountPct, DiscountAmount
      ,ProductStandardCost, TotalProductCost, SalesAmount, TaxAmt, Freight
      ,CarrierTrackingNumber, CustomerPONumber, OrderDate, DueDate, ShipDate
FROM AdventureWorksDW2014.dbo.FactInternetSales;

Let's check the space used by both tables.


As you see, first table which is based on GUID consumes 10.7MB for data and 80KB for indexes. Second table which is based on int+char(2) as the key consumes 10.0MB for data and 56KB for indexes. Now, let's additional non-clustered indexes for both tables.

CREATE INDEX IX_Sales_WithGUID_ShipDateKey ON dbo.Sales_WithGUID (ShipDateKey);
CREATE INDEX IX_Sales_WithGUID_CurrencyKey ON dbo.Sales_WithGUID (CurrencyKey);
CREATE INDEX IX_Sales_WithGUID_CustomerKey ON dbo.Sales_WithGUID (CustomerKey);
CREATE INDEX IX_Sales_WithGUID_DueDateKey ON dbo.Sales_WithGUID (DueDateKey);
CREATE INDEX IX_Sales_WithGUID_OrderDateKey ON dbo.Sales_WithGUID (OrderDateKey);
CREATE INDEX IX_Sales_WithGUID_ProductKey ON dbo.Sales_WithGUID (ProductKey);
CREATE INDEX IX_Sales_WithGUID_PromotionKey ON dbo.Sales_WithGUID (PromotionKey);

CREATE INDEX IX_Sales_WithoutGUID_ShipDateKey ON dbo.Sales_WithoutGUID (ShipDateKey);
CREATE INDEX IX_Sales_WithoutGUID_CurrencyKey ON dbo.Sales_WithoutGUID (CurrencyKey);
CREATE INDEX IX_Sales_WithoutGUID_CustomerKey ON dbo.Sales_WithoutGUID (CustomerKey);
CREATE INDEX IX_Sales_WithoutGUID_DueDateKey ON dbo.Sales_WithoutGUID (DueDateKey);
CREATE INDEX IX_Sales_WithoutGUID_OrderDateKey ON dbo.Sales_WithoutGUID (OrderDateKey);
CREATE INDEX IX_Sales_WithoutGUID_ProductKey ON dbo.Sales_WithoutGUID (ProductKey);
CREATE INDEX IX_Sales_WithoutGUID_PromotionKey ON dbo.Sales_WithoutGUID (PromotionKey);

Here is the result now.


Focus on the space used for indexes. First table uses 11MB while second table uses only 7MB. Note that this is only for 60,000 records but if the table has more records, gap will be significantly huge.

How can it be happened when we have same number of records and same types of indexes. This is because of every non-clustered key value is maintained with the clustered key. When it comes to the first table, since it is 16 bytes, the size of the non-clustered key value will be 16+4. But with the second table, it is 6+4 bytes. This is the reason for high usage of the spave for non-clustered indexes.

Here is the code that shows the number of data pages used by each index. Large number of pages increases IO operations and usage of resources.

SELECT OBJECT_NAME (i.object_id) AS TableName 
 , i.name AS IndexName
 , ps.index_id AS IndexID 
 , ps.alloc_unit_type_desc AS DataStructure
 , ps.page_count AS Pages
 , ps.record_count AS Rows
FROM sys.indexes AS i
 CROSS APPLY sys.dm_db_index_physical_stats 
  (DB_ID ()
  , i.object_id
  , NULL
  , NULL
  , N'DETAILED') AS ps
WHERE i.object_id = ps.object_id
  AND i.index_id = ps.index_id
  AND i.object_id 
   IN (OBJECT_ID (N'Sales_WithGUID')
    , OBJECT_ID (N'Sales_WithoutGUID'))
  AND ps.index_level = 0 -- only lef level
ORDER BY TableName, IndexID;
GO


Now you see issue we may face when using GUIDs, hence use it knowing these issues and manage it properly if it is really required.

No comments: