Checking on HADR

By now I am sure lot of db2 users have adopted DB2 HADR as one of their high availability solution.  IBM has matured HADR over the last two versions and its becoming more and more robust.  Couple of years ago I had written an article on how to setup HADR which I know many of you used. The instruction hasn’t changed over the versions, but several of you have recently asked me for a script to check HADR status. Here is a simple script that will check on HADR status on each database on the instance and email the status.

As always verify the script  before you implement it in your system. You can download the script here : hadr_checker.sh

Please feel free to comment on the script and ways to improve it.

 

Posted in database, db2, how to | Tagged , , , , , | 1 Comment

A quick overview of DB2′s INGEST

As a DB2 DBA or as data integrator we all face a challenge in loading large amount of data into DB2 database while up keeping the 24×7 availability of the tables.  Traditionally we use LOAD or sometimes IMPORT while loading large amount of data to tables. Either of them has data concurrency and availability issue. While INSERTS gives the highest availability, it can be painfully slow when inserting millions of rows.

Brief Intro :

INGEST from DB2 10 is fast and it enables us to continuously  load  data while giving high availability and recoverability. INGEST is a cross between LOAD and IMPORT. INGEST is optimized to be multithread to process data in parallel.  Unlike LOAD and IMPORT,  INGEST can run continuously reading a data stream through a pipe. This is great, if you are loading live feeds of data from a social media stream or from your  production stream.  Another feature of INGEST is, it can read multiple files as its source to load the data. Apart from the traditional INSERT statement, it can do UPDATE, MERGE and DELETE. This comes in handy if the data you are loading  contain  updates to existing rows .

How does it work ?

INGEST has three threads/parts :

1. Transporter :  This thread reads from the input files or data stream pipe s in parallel and sets the thread to formatter. DB2 spawns one thread for each input files.

2. Formatter: Formatter threads formats and prepares the data to be send to Flusher. Formatter threads make sure the data is set in such a way that INGEST can insert into DB2 in parallel. The number of formatter threads is specified by the num_formatters configuration parameter.

3. Flusher:  Flusher threads insert the data into the DB2 table. If you are using INGEST to load into a partitioned database, then a flusher thread is associated with each partition.  There could be multiple flushers running to increase the throughput of the data ingest. The number of flushers for each partition is specified by the num_flushers_per_partition configuration parameter.

Like I said its completely designed to be run efficiently and give high availability.

Can it recover from a failure ?

Absolutely ! INGEST is capable of restarting from where it failed. It literally uses INSERT statements to load the data. With INGEST, you can specify dumpfile to place any rejected rows due to formatting error or invalid character, exception table to place any rows with constraint violations, warning counts to stop if the threshold is met.  You can also specify commit count in rows or by time. This comes in handy when loading data continuously from a stream. If the INGEST utility gets aborted for any reasons (power failure or somebody accidentally killing the thread, you can restart the INGEST with RESTART option and DB2 will resume the INGEST from the last commit point. If do not want to restart INGEST you can issue the RESTART TERMINATE and DB2 will clean up the failed INGEST. Unlike with the TERMINATE parameter of the LOAD command, data already committed by the failed INGEST command is not removed from the table.

What all format can it handle ?

INGEST utlitiy  supports the following input data formats:

Delimited text
Positional text and binary
Columns in various orders and formats

How about performance ?

Obviously this is one of the most sacred part for the DBA when choosing the option to load lot of data.  The best performance is always LOAD, but it doesn’t give high availability. INGEST even though under the covers issues INSERTS, I have noticed that when loading hundred thousands of rows INGEST was able to perform 40% faster than traditional INSERT. The reason why I compared to INSERT, is because its the only option that gives me high availability.

How is it run ?

Assume we have a coma delimited customer file as follows that needs to be loaded :

10001,”Edward”,”Williams”,”12349 West Ave St”,”Taos”,”NM”,”564-573-4839″
10002,”James”,”Miller”,”43348 Knight St”,”Dumont”,”NJ”,”201-873-4239″
10003,”Jessica”,”Robertson”,”489 River N Rd”,”Austin”,”TX”,”512-534-9859″
:      :     :    :    :   :   :    :    :   :    :   :      :     :    :    :   :   :    :    :   :    :  :      :     :    :
90000,”Kathy”,”Rodriguez”,”8293 Brooksheild Apt, # 201″,”Anahiem”,”CA”,”714-354-775

The customer table is :
CNumber   INT          —> primary key
FName       varchar(50)
LName       varchar(50)
Street         varchar(100)
City             varchar(50)
State           char(2)
Telephone  char(12)

Lets see our options to INGEST the data:

This is the simplest form :
ingest from file customer.del format delimited insert into raju.customer;

If you want to have more control or if the data is in different order than the table column, then you can do this . Note here I have flipped the firstname and lastname on the VALUES:

ingest from file customer.del format delimited
($cid integer external
,$firstname char(50)
,$lastname char(50)
,$streetname char(100)
,$cityname char(50)
,$statename char(2)
,$telephone char(12)
)
dumpfile customer_dump.del
exception table raju.customerexcept
warningcount 100
messages ingest_customer.msg
insert into raju.customer(cnumber,fname,lname,street,city,state,telephone)
values($cid,$lastname,$firstname,$streetname,$cityname,$statename,$telephone);

The exception table should be in place before you run this. You can create the table with :
create table raju.customerexcept like raju.customer  or run the same ddl you used for the customer table.

If your data contains updates to existing rows as well as new rows, you can use MERGE statement on your INGEST :

ingest from file customer_monday.del, customer_tuesday.del format delimited
($cid integer external
,$firstname char(50)
,$lastname char(50)
,$streetname char(100)
,$cityname char(50)
,$statename char(2)
,$telephone char(12)
)
dumpfile customer_dump.del
warningcount 100
messages ingest4.msg
merge into raju.customer on (cnumber = $cid)
when not matched then
insert (cnumber,fname,lname,street,city,state,telephone)
values($cid,$firstname,$lastname,$streetname,$cityname,$statename,$telephone)
when matched then
update set (fname,lname,street,city,state,telephone) =
($firstname,$lastname,$streetname,$cityname,$statename,$telephone);

Note that in the above example, I have two input files.  I have also removed the exception table. DB2 does not allow to use exception table option when using MERGE option. Don’t ask me why they don’t allow ?

If you have a list of rows that you want to delete, then you can do as well :

ingest from file customer1.del format delimited
($cid integer external
,$firstname char(50)
,$lastname char(50)
,$streetname char(100)
,$cityname char(50)
,$statename char(2)
,$telephone char(12)
)
delete from raju.customer
where cnumber = $cid;

How do we restart ?

You can restart a terminated INGEST with the RESTART CONTINUE <job_id> option

If you want the ability to easily restart a terminated INGEST use the option RESTART NEW <job_id> on your original INGEST command like :

ingest from file customer_monday.del, customer_tuesday.del format delimited
($cid integer external
,$firstname char(50)
,$lastname char(50)
,$streetname char(100)
,$cityname char(50)
,$statename char(2)
,$telephone char(12)
)
dumpfile customer_dump.del
warningcount 100
messages ingest4.msg
restart new ‘customeringest1′
merge into raju.customer on (cnumber = $cid)
when not matched then
insert (cnumber,fname,lname,street,city,state,telephone)
values($cid,$firstname,$lastname,$streetname,$cityname,$statename,$telephone)
when matched then
update set (fname,lname,street,city,state,telephone) =
($firstname,$lastname,$streetname,$cityname,$statename,$telephone);

The RESTART NEW option will help you identify the ingest command that got terminated. Once you identify the ingest that got terminated, you can restart it by issuing RESTART CONTINUE

ingest from file customer_monday.del, customer_tuesday.del format delimited
($cid integer external
,$firstname char(50)
,$lastname char(50)
,$streetname char(100)
,$cityname char(50)
,$statename char(2)
,$telephone char(12)
)
dumpfile customer_dump.del
warningcount 100
messages ingest4.msg
restart continue ‘customeringest1′
merge into raju.customer on (cnumber = $cid)
when not matched then
insert (cnumber,fname,lname,street,city,state,telephone)
values($cid,$firstname,$lastname,$streetname,$cityname,$statename,$telephone)
when matched then
update set (fname,lname,street,city,state,telephone) =
($firstname,$lastname,$streetname,$cityname,$statename,$telephone);

The RESTART CONTINUE option will enable us to restart INGEST from the last commit point.

If you want to terminate the utility and want to clean up from the previously failed run, then you can use  RESTART TERMINATE <job_id> option just like the RESTART CONTINUE option.

How can I monitor the utility ?

There are few ways to watch what INGEST utility is doing .  IBM has provided few options:

1. INGEST GET STATS
2. INGEST LIST or
3 you can query TABLE(MON_GET_CONNECTION(NULL,NULL)) and look for the application_name DB2_INGEST

Conclusion

While INGEST has a lot of room for improvement, its a great utility to have when compared to other options. Its easy to use and keeps up the availability while inserting large amount of data. I also like the fact that we can use it to delete rows for a list of values.  I hate to use large IN clause in my delete statements or have to load the key into a temporary table scenario. Few limitation with INGEST you need to aware of , currently INGEST do now allow LOBs, XML, structured types, has lot of restrictions on generated columns. I definitely recommend anyone to try out the INGEST utility and see how you can benefit from it. After all you paid for the product, so the least you can do is try it. Trust me if you use it right, you will love it !

 

Posted in database, db2, how to, unix | Tagged , , , , , , , , , , , , , | Leave a comment

Happy New Year !

To all my readers… Wish y’all a Very Happy New Year !

Posted in db2 | Leave a comment

DB2 V10 Adaptive Compression

Last February, I wrote about DB2′s compression.  If that was McLaren, with DB2 V9.7′s compression, IBM upgraded to Lamborghini Aventador. Apparently IBM’s thirst didn’t stop there, they went one more step ahead and got the Bugatti Veyron with DB2 V10′s adaptive compression.  If you didn’t like the analogy, then I will put it in simple terms – DB2 V10 adaptive compression is compression on steroids.

On V9.5, I have noticed around 70% compression, with V9.7, it slightly increased to 75% compression. But then V9.7 was able to compress the indexes as well.  Galileo (a.k.a V10) adds some more juice to this, it now has adaptive compression.

So what is adaptive compression ?  Prior to V10, IBM used global pattern replacement algorithm. So if the value is found on the global dictionary its compressed. But with adaptive compression, besides compressing based on global pattern, DB2 will now look within a data page to see if there are further compression opportunities by finding locally repeating patterns within a data page. This increased the chance of compressing the data exponentially.  I have seen compression ratio of 85 % on certain tables.

I am not going to explain how the wheels move and turns and all that stuff. Most of us don’t need to know how the algorithm works.  So I am just going to show you how to use it and reap the benefit. After all thats what most of us are after. If you need to know how the algorithm works, I am sure there are many sites that explain it.

First of all  in order to use the compression feature you need to have the license for Storage Optimization or have AESE license. You can do that by doing a db2licm -l and see if the storage optimization is licensed or you have a AESE edition.

Let’s get started and rev up the engine:

1. If you are migrating from previous version of DB2 and /or want to see if you are already compressing tables :

SELECT SUBSTR(CREATOR, 1, 10) AS TABSCHEMA,
SUBSTR(NAME, 1, 30) AS TABNAME,
COMPRESSION, ROWCOMPMODE
FROM SYSIBM.TABLES
WHERE CREATOR NOT LIKE ‘SYS%’;
 

2. To find out how much you will save by compressing in DB2 V10 use the below SQL. Run against the schema you want to evaluate. The currently_pct_pagessaved will show you how many percent of pages are saved currently by compression (if any), the classic_pct_pagessaved shows you how many percent of pages will be saved with classic compression under V10 and the adaptive_pct_pagessaved show you how many percent of pages will be saved with adaptive compression. The storagesize_MB shows you how large the table is :

select tc.TABSCHEMA,tc.TABNAME, tc.ROWCOMPMODE ,
sum(tc.PCTPAGESSAVED_CURRENT) as CURRENTLY_PCTPAGESSAVED ,
sum(tc.PCTPAGESSAVED_STATIC) as CLASSIC_PCT_PAGESSAVED,
sum(tc.PCTPAGESSAVED_ADAPTIVE)as ADPATIVE_PCT_PAGESAVED,
SUM(TI.DATA_OBJECT_P_SIZE)/1024 AS STORAGESIZE_MB
from table(sysproc.admin_get_tab_compress_info(‘RAJU’,”)) TC
join TABLE (sysproc.admin_get_tab_info(‘RAJU’, ”)) TI
ON TC.TABSCHEMA = TI.TABSCHEMA AND TC.TABNAME = TI.TABNAME
group by tc.TABSCHEMA,tc.TABNAME, tc.ROWCOMPMODE
order by 7 desc;

3. Starting from DB2 V9.7 you can compress index as well. Run the following sql to find out how much compression you will get on indexes :

SELECT INDSCHEMA, INDNAME,TABNAME, index_compressed,
      pct_pages_saved, num_leaf_pages_saved 
    FROM TABLE(sysproc.admin_get_index_compress_info(‘T’, ‘RAJU’, ”, NULL, NULL));

 

Now you have the data to take a informed decision. As I mentioned on my previous article on compression , use due diligence before selecting the tables you want to compress.  Generally tables with a read/write ratio of 70 percent or more reads and 30 percent or less writes are good candidates for compression.  If the table gets lot of updates, then you lose cpu cycles on db2 uncompressing and compressing data.   Row compression is a great feature for DSS based systems, where there is lot of data and has more reads than updates.  Row compression performs best in I/O or memory-bound environments where the bottleneck is not on the CPU. Extra CPU cycles are required to perform row compression and expansion of data rows whenever they are accessed or modified. However this overhead can be offset by efficiencies that are gained in doing fewer I/O operations.

Similarly, index compression is another awesome feature , but not all indexes will benefit from compression. Best candidates for index compression are indices with multi column, columns with character data, varchar columns, columns with low cardinality.

4. Once you determine what tables and index to compress , you now need to turn on compression  :

   alter table raju.orders_history compress yes;
   alter index raju.orderhist_lkp1 compress yes;

5. Now that you have altered the objects to turn on compression you can either wait for the dictionary to get build automatically by ADC or run one of utilities that will build the dictionary, I recommend you running a reorg if you can :

 reorg table raju.orders_history index raju.orderhist_lkp2 resetdictionary;
 reorg indexes for all table raju.orders_history;

6. Run the query from step 2 and 3 and see how much space you have saved. Prepared to be amazed.

As always, make sure you get some metrics before and after compressing the tables. Saving storage is great, but whats even more great is saving system resources.  I have noticed I/O wait time go down by 80% in certain cases and logical reads going high because now DB2 can fit more rows in the buffer pools.  CPU cycle does go a bit high,but overall its lower than before because of less I/O operations.

They say a picture says a thousand words.  The below graph shows my % I/O wait before and after I turned on adaptive compression:

You can see how much it has gone down and in some cases it has gone away.  Now do you see why I say its the Bugatti Veyron ! If you haven’t looked into compression, I strongly suggest your try it out.

As always the above is my opinion and I strongly urge you to test it out in your environment to see what benefits you get. I am sure you won’t be disappointed.

Posted in db2, how to | Tagged , , , , , , , , , , , , , , , , , , | Leave a comment