DB2 Compression

Sales Pitch
Deep compression as IBM likes to call it, is a nice little feature available in DB2 UDB since V9.1.  It has grown better and better as each versions goes by.  In DB2 v9.7, the index and the temporary tables are compressed.

So what all does compression bring to the table ?  Is it just space savings ?  Yes, but thats not all.  It indirectly increases your buffer pool capacity, decreases you I/O requirements and give you better over all CPU performance.

With row compression, every data page can now contain more rows. Thus, the workload queries on a data compressed table need fewer I/Os to access the same amount of data. Moreover, buffer pool hit-ratios are improved with more (compressed) data rows residing on a data page. This leads to performance improvements. For queries that handle data in a less sequential fashion because the result set is made up of few rows (as in an OLTP environment), the performance benefits of row compression might not be quite as great.

Row compression performs it’s best in I/O bound environments. CPU cycles are required to perform row compression and expansion. Extra CPU cycle usage can be offset by efficiencies gained in doing fewer I/Os. This feature works very well on decision support workloads composed of complex select queries where I/O access is mostly sequential and less random.

Proof of Concept
Lets take a deep dive into a situation.  I have a DB2 V9.5 warehouse type of database that has some heavy analytical workload. Below is a snapshot of the % of I/O wait by hour we saw on the disk

The storage and system admin tried all sort of stuff that they can.  Once the core workload started, the CPU on the machine was constantly at 98-99 % due to I/O waits.  We determined that we required around 75 additional disks to distribute the data . So it was either that or compression.  I decided to compress the data to see what happened.

We picked all the tables that was more than 1 GB and compressed them. The results were outstanding. We had an average 70% compression ratio , 90% improvement on linear workload timings and overall CPU improvement of 35%   on unit of work . Most of all a mind blowing reduction in I/O contention. Below is a snapshot of % of  I/O waits by hour after the compression.

The real deal, how to do it:
I will tell you what all steps are required to compress tables in a database that compression license enabled.

  1. Identify all the tables that you want compressed. Naturally, the largest tables will be easy candidates for selection to be compressed, but do not overlook smaller tables. If you have hundreds or thousands of smaller tables, you may find a benefit resulting from the aggregate effect of compression over many smaller tables. As a rule of thumb, compress anything that is more than 1 MB.  Use this sql to choose your candidates :
    select tabschema,tabname,data_object_p_size from sysibmadm.admintabinfo where data_object_p_size > 1024 and dictionary_size = 0 and tabschema = ‘DSS’
    This will give you the list of all tables that are more than 1 MB in size that doesn’t have a compression dictionary build for the schema DSS
  2. Estimate compression ratio for the tables. This way you can get an idea on how much storage  you can save by compressing the table.  To get this information you can either run the db2 inspect command followed by db2inspf of run an sql using the administrative view.  I recommend the sql route since thats way too simple.
    select * from table (sysproc.admin_get_tab_compress_info(‘DSS’,’ ‘,’REPORT’)) as TAB;
  3. Create a baseline for comparison.  You should get a baseline performance metrics for BP Hit Ratio,  Logical and Physical reads, log usage, cpu usage, i/o waits, uow runtimes, etc.
  4. Enabling compression for tables.  Once you have identified the tables that you want to compress. You need to turn on compression for the tables. You can do so by altering the table to enable compression.  An example is :
    alter table DSS.ORDER_HISTORY compress yes;
  5. Compressing the table.  In V9.5 there are two choices, either to run an offline reorg or use ADC (Automatic Dictionary Creation).
    You can run the “classic” or offline reorg with RESETDICTIONARY option to build the compression dictionary . When a compression dictionary is being built, a 10 MB temporary in-memory buffer is allocated from the utilities heap in order to hold data records that are sampled by the algorithm used to create the compression dictionary. Once the dictionary is created this memory is released.
    reorg table DSS.ORDER_HISTORY resetdictionary;
    In DB2 Version 9.5, you can take advantage of automatic compression dictionary creation (ADC). You may have a table that already has more than two megabytes of data pages allocated to it when the ALTER TABLE command is used to enable compression and no data compression dictionary exists in the table. In this case, once a table is enabled for compression, the compression dictionary is created the next time the table grows (i.e. when a new data page is allocated to the table). If you are creating a new table and have enabled it for compression as part of the CREATE TABLE statement, it may be some time before there is sufficient data in the table to automatically create the compression dictionary. The actual time the data compression dictionary is created will vary dependent on the amount of data added to the table. When automatically creating a dictionary, a short delay may occur during a table growth operation due to the creation of the compression dictionary.
    I suggest running an offline reorg if you can afford it.  It is the optimal way to build the compression dictionary.
  6. Monitoring the compression.  Now that you have compressed the data, its time to keep tab on the statistics. You should now compare how your database is performing with the baseline you too earlier.  You can also keep track of your compression ratio by running the sql:
    select * from table (sysproc.admin_get_tab_compress_info(‘DSS’,’ ‘,’REPORT’)) as TAB;
  7. Reclaim storage saved from compression.  You can now resize or reduce the tablespaces to release excess saved storage.

Conclusion

Compression is definitely a feature that DBAs must try out.  When wisely used, compression will bring lot of benefits to your database system.   Choose tables that are larger than 1 MB what gets more sequential reads than writes.  From DB2 V9.7, indexes and temporary tables can be compressed.  Compression license comes with the advanced enterprise edition or can be separately bought for the enterprise edition.

About rpillai

I am a technology enthusiasts and love to work with databases and other technology. Learning new things everyday and don't think the path ever ends ...
This entry was posted in database, db2, how to and tagged , , , , , , , , , , , , , . Bookmark the permalink.