
DB2 V10 Adaptive Compression

Last February, I wrote about DB2’s compression. If that was a McLaren, then with DB2 V9.7’s compression IBM upgraded to a Lamborghini Aventador. Apparently IBM’s thirst didn’t stop there; they went one step further and got the Bugatti Veyron with DB2 V10’s adaptive compression. If you don’t like the analogy, then I will put it in simple terms: DB2 V10 adaptive compression is compression on steroids.

On V9.5, I noticed around 70% compression; with V9.7, it increased slightly to 75%, and V9.7 was also able to compress indexes. Galileo (a.k.a. V10) adds some more juice to this: it now has adaptive compression.

So what is adaptive compression? Prior to V10, DB2 used a global pattern-replacement algorithm: if a value is found in the table-level (global) dictionary, it is compressed. With adaptive compression, besides compressing based on the global dictionary, DB2 also looks within each data page for further compression opportunities by finding locally repeating patterns within that page. This greatly increases the chances of compressing the data; I have seen compression ratios of 85% on certain tables.

I am not going to explain how the wheels move and turn and all that stuff. Most of us don’t need to know how the algorithm works, so I am just going to show you how to use it and reap the benefit. After all, that’s what most of us are after. If you need to know how the algorithm works, I am sure there are many sites that explain it.

First of all, in order to use the compression feature you need a license for the Storage Optimization feature or an AESE (Advanced Enterprise Server Edition) license. You can check by running db2licm -l and seeing whether Storage Optimization is licensed or whether you have an AESE edition.
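For example (the exact product-name strings in the db2licm output vary by edition and fix pack, so treat the grep pattern below as illustrative):

   db2licm -l | grep -i -E "storage optimization|advanced enterprise"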

Let’s get started and rev up the engine:

1. If you are migrating from a previous version of DB2 and/or want to see whether you are already compressing tables:

SELECT SUBSTR(CREATOR, 1, 10) AS TABSCHEMA,
       SUBSTR(NAME, 1, 30) AS TABNAME,
       COMPRESSION, ROWCOMPMODE
  FROM SYSIBM.SYSTABLES
 WHERE CREATOR NOT LIKE 'SYS%';
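Just to illustrate (the table names and values below are made up), ROWCOMPMODE is blank when row compression is off, S for static (classic) row compression and A for adaptive compression:

   TABSCHEMA  TABNAME          COMPRESSION ROWCOMPMODE
   ---------- ---------------- ----------- -----------
   RAJU       ORDERS_HISTORY   R           A
   RAJU       ORDERS_STAGE     N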
 

2. To find out how much you will save by compressing in DB2 V10, use the SQL below, running it against the schema you want to evaluate. CURRENTLY_PCTPAGESSAVED shows what percentage of pages is currently saved by compression (if any), CLASSIC_PCT_PAGESSAVED shows what percentage of pages would be saved with classic compression under V10, and ADAPTIVE_PCT_PAGESSAVED shows what percentage of pages would be saved with adaptive compression. STORAGESIZE_MB shows how large the table is:

select tc.TABSCHEMA, tc.TABNAME, tc.ROWCOMPMODE,
       sum(tc.PCTPAGESSAVED_CURRENT) as CURRENTLY_PCTPAGESSAVED,
       sum(tc.PCTPAGESSAVED_STATIC) as CLASSIC_PCT_PAGESSAVED,
       sum(tc.PCTPAGESSAVED_ADAPTIVE) as ADAPTIVE_PCT_PAGESSAVED,
       sum(TI.DATA_OBJECT_P_SIZE)/1024 as STORAGESIZE_MB
from table(sysproc.admin_get_tab_compress_info('RAJU','')) TC
join table(sysproc.admin_get_tab_info('RAJU','')) TI
  on TC.TABSCHEMA = TI.TABSCHEMA and TC.TABNAME = TI.TABNAME
group by tc.TABSCHEMA, tc.TABNAME, tc.ROWCOMPMODE
order by 7 desc;
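If you want to evaluate just one table instead of the whole schema, you can pass the table name as the second argument (ORDERS_HISTORY below is just an example):

select tc.TABSCHEMA, tc.TABNAME, tc.ROWCOMPMODE,
       tc.PCTPAGESSAVED_CURRENT, tc.PCTPAGESSAVED_STATIC, tc.PCTPAGESSAVED_ADAPTIVE
from table(sysproc.admin_get_tab_compress_info('RAJU','ORDERS_HISTORY')) TC;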

3. Starting from DB2 V9.7 you can compress indexes as well. Run the following SQL to find out how much compression you will get on indexes:

SELECT INDSCHEMA, INDNAME, TABNAME, index_compressed,
      pct_pages_saved, num_leaf_pages_saved
    FROM TABLE(sysproc.admin_get_index_compress_info('T', 'RAJU', '', NULL, NULL));
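You can narrow this down to a single table’s indexes by passing the table name instead of the empty string (ORDERS_HISTORY is again just an example); for indexes that are not yet compressed, the savings columns are estimates of what you would gain:

SELECT INDSCHEMA, INDNAME, index_compressed, pct_pages_saved
    FROM TABLE(sysproc.admin_get_index_compress_info('T', 'RAJU', 'ORDERS_HISTORY', NULL, NULL));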

 

Now you have the data to make an informed decision. As I mentioned in my previous article on compression, use due diligence when selecting the tables you want to compress. Generally, tables with a read/write ratio of 70 percent or more reads and 30 percent or less writes are good candidates for compression. If a table gets a lot of updates, you lose CPU cycles on DB2 uncompressing and compressing data. Row compression is a great feature for DSS-type systems, which have a lot of data and more reads than updates. It performs best in I/O- or memory-bound environments where the bottleneck is not the CPU. Extra CPU cycles are required to compress and expand data rows whenever they are accessed or modified, but this overhead can be offset by the efficiency gained from doing fewer I/O operations.
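If you are not sure what your read/write mix looks like, the monitoring table functions can give you a rough idea. Here is a minimal sketch using MON_GET_TABLE (the counters are cumulative since database activation, so read them with that in mind):

select substr(t.tabschema,1,10) as tabschema, substr(t.tabname,1,30) as tabname,
       t.rows_read,
       t.rows_inserted + t.rows_updated + t.rows_deleted as rows_written
from table(mon_get_table('RAJU', '', -2)) t
order by t.rows_read desc;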

Similarly, index compression is another awesome feature, but not all indexes will benefit from it. The best candidates for index compression are multi-column indexes, indexes on character and VARCHAR columns, and indexes on low-cardinality columns.

4. Once you determine which tables and indexes to compress, you need to turn on compression:

   alter table raju.orders_history compress yes;
   alter index raju.orderhist_lkp1 compress yes;
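On V10, COMPRESS YES enables adaptive compression by default, but you can also spell out which flavor you want (my understanding of the syntax; verify against your fix pack):

   alter table raju.orders_history compress yes adaptive;
   alter table raju.orders_history compress yes static;   -- classic row compression only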

5. Now that you have altered the objects to turn on compression, you can either wait for the dictionary to be built automatically by ADC (automatic dictionary creation) or run one of the utilities that builds the dictionary. I recommend running a reorg if you can:

 reorg table raju.orders_history index raju.orderhist_lkp2 resetdictionary;
 reorg indexes all for table raju.orders_history;
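After the reorg, I suggest running runstats so the catalog statistics (including PCTPAGESSAVED) and the optimizer reflect the newly compressed data:

 runstats on table raju.orders_history with distribution and detailed indexes all;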

6. Run the queries from steps 2 and 3 and see how much space you have saved. Prepare to be amazed.

As always, make sure you gather some metrics before and after compressing the tables. Saving storage is great, but what’s even better is saving system resources. I have seen I/O wait time go down by 80% in certain cases and logical reads go up, because DB2 can now fit more rows in the buffer pools. CPU usage does go up a bit, but overall it is lower than before because of fewer I/O operations.
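For those before/after metrics, here is a minimal sketch of the kind of snapshot I take, using MON_GET_BUFFERPOOL to compare logical versus physical data reads (take one snapshot before compressing and one after a comparable workload has run):

select substr(bp_name,1,20) as bufferpool,
       pool_data_l_reads, pool_data_p_reads,
       case when pool_data_l_reads > 0
            then decimal(100.0 * (pool_data_l_reads - pool_data_p_reads) / pool_data_l_reads, 5, 2)
       end as data_hit_ratio_pct
from table(mon_get_bufferpool(NULL, -2));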

They say a picture is worth a thousand words. The graph below shows my % I/O wait before and after I turned on adaptive compression:

You can see how much it has gone down, and in some cases it has gone away entirely. Now do you see why I say it’s the Bugatti Veyron! If you haven’t looked into compression, I strongly suggest you try it out.

As always, the above is my opinion, and I strongly urge you to test it out in your environment to see what benefits you get. I am sure you won’t be disappointed.