Friday, September 10, 2010

SQL Server data compression: PAGE

Although the storage cost is going down, we still think about storage optimization because once optimized, it is not only about the cost, it improves many things. One of the features available with SQL Server 2008 Enterprise Edition (and Developer Edition) is data compression. SQL Server supports compression on both ROW and PAGE; this discusses data compression on pages.

The PAGE compression minimizes the storage requirement for redundant data. This compression is applied for each page. Once enabled, SQL Server goes through two techniques; Prefix Compression and Dictionary Compression.

Prefix Compression: This identifies a value for each column that can be used for compression. Then a row that contains values for each column is created below the page header. Next, values in columns will be replaced with the reference to the value in newly created row. Note that column can only reference a value created for it, not for other columns. See the image taken from BOL. It shows a table with three columns. You can see how the column values are fully and partially linked with the references.
page compression

Dictionary Compression: This compresses the page further, finding the repeated values anywhere in the page and storing them in same way. Unlike prefix compression, any column value can reference any value added, not limiting to the value created for it. See the image taken from BOL.
page compression2

Now let’s see how we enable this. Create the following table and insert records as given.

CREATE TABLE TestTable
(
    Id int IDENTITY(1,1) PRIMARY KEY
    , Value varchar(50) NOT NULL DEFAULT ('This is the default value')
)
GO
 
INSERT INTO TestTable
    (Value) VALUES (DEFAULT)
GO 100000

Retrieve all data, enabling STATISTICS IO on. You will notice that data is held with 523 pages.

sp_estimate_data_compression_savings: This system stored procedure allows us to determine the space can be saved when compression is applied. Execute the code given and analyze the result. First stored procedure can be used to find out the space it currently uses. Second shows the estimate data compression savings.

-- check the space used
sp_spaceused 'TestTable'
 
-- determine data compression savings
EXEC sp_estimate_data_compression_savings 'dbo', 'TestTable', NULL, NULL, 'PAGE'

The sp_spaceused says that 4168KB is required for storing data. Note the column “size_with_requested_compression_settings” coming from second stored procedure. It says that the size after compression would be 1096KB, encouraging us to go ahead with compression.

Enabling PAGE compression: This can be done with either CREATE TABLE or ALTER TABLE. Use below code for compressing the table. You can pass NONE, ROW, or PAGE for DATA_COMPRESSION.

ALTER TABLE TestTable
REBUILD WITH (DATA_COMPRESSION = PAGE)

Now the data is compressed. Check retrieving records with STATISTICS IO on, you will notice that only 139 pages are used now. Check the space used too. Has it gone down to 1096KB?

Disadvantages: Are there? Yes, there is. This increases the CPU cost for both reading and modifying. If you are to enable, consider this factor too.

No comments: