{"id":1388,"date":"2013-01-31T09:15:50","date_gmt":"2013-01-31T03:15:50","guid":{"rendered":"http:\/\/techsatwork.com\/blog\/?p=1388"},"modified":"2016-01-07T11:31:03","modified_gmt":"2016-01-07T05:31:03","slug":"a-quick-overview-of-db2s-ingest","status":"publish","type":"post","link":"https:\/\/techsatwork.com\/?p=1388","title":{"rendered":"A quick overview of DB2&#8217;s INGEST"},"content":{"rendered":"<p>As a DB2 DBA or as data integrator we all face a challenge in loading large amount of data into DB2 database while up keeping the 24&#215;7 availability of the tables. \u00c2\u00a0Traditionally we use LOAD or sometimes IMPORT while loading large amount of data to tables. Either of them has data concurrency and availability issue. While INSERTS gives the highest availability, it can be painfully slow when inserting millions of rows.<\/p>\n<p><span style=\"color: #99cc00;\"><em>Brief Intro<\/em><em> :<\/em><\/span><\/p>\n<p>INGEST from DB2 10 is fast and it enables us to continuously \u00c2\u00a0load \u00c2\u00a0data while giving high availability and recoverability. INGEST is a cross between LOAD and IMPORT. INGEST is optimized to be multithread to process data in parallel. \u00c2\u00a0Unlike LOAD and IMPORT, \u00c2\u00a0INGEST can run continuously reading a data stream through a pipe. This is great, if you are loading live feeds of data from a social media stream or from your \u00c2\u00a0production stream. \u00c2\u00a0Another feature of INGEST is, it can read multiple files as its source to load the data. Apart from the traditional INSERT statement, it can do UPDATE, MERGE and DELETE. This comes in handy if the data you are loading \u00c2\u00a0contain \u00c2\u00a0updates to existing rows .<\/p>\n<p><span style=\"color: #99cc00;\"><em>How does it work ?<\/em><\/span><\/p>\n<p>INGEST has three threads\/parts :<\/p>\n<p>1. Transporter : \u00c2\u00a0This thread reads from the input files or data stream pipe s in parallel and sets the thread to formatter. DB2 spawns one thread for each input files.<\/p>\n<p>2. Formatter: Formatter threads formats and prepares the data to be send to Flusher. Formatter threads make sure the data is set in such a way that INGEST can insert into DB2 in parallel.\u00c2\u00a0The number of formatter threads is specified by the\u00c2\u00a0num_formatters\u00c2\u00a0configuration parameter.<\/p>\n<p>3. Flusher: \u00c2\u00a0Flusher threads insert the data into the DB2 table. If you are using INGEST to load into a partitioned database, then a flusher thread is associated with each partition. \u00c2\u00a0There could be multiple flushers running to increase the throughput of the data ingest.\u00c2\u00a0The number of flushers for each partition is specified by the num_flushers_per_partition\u00c2\u00a0configuration parameter.<\/p>\n<p>Like I said its completely designed to be run efficiently and give high availability.<\/p>\n<p><span style=\"color: #99cc00;\"><em>Can it recover from a failure ?<\/em><\/span><\/p>\n<p>Absolutely ! INGEST is capable of restarting from where it failed. It literally uses INSERT statements to load the data. With INGEST, you can specify dumpfile to place any rejected rows due to formatting error or invalid character, exception table to place any rows with constraint violations, warning counts to stop if the threshold is met. \u00c2\u00a0You can also specify commit count in rows or by time. This comes in handy when loading data continuously from a stream. If the INGEST utility gets aborted for any reasons (power failure or somebody accidentally killing the thread, you can restart the INGEST with RESTART option and DB2 will resume the INGEST from the last commit point. If do not want to restart INGEST you can issue the RESTART TERMINATE and DB2 will clean up the failed INGEST.\u00c2\u00a0Unlike with the\u00c2\u00a0TERMINATE\u00c2\u00a0parameter of the\u00c2\u00a0LOAD\u00c2\u00a0command, data already committed by the failed INGEST\u00c2\u00a0command is not removed from the table.<\/p>\n<p><span style=\"color: #99cc00;\"><em>What all format can it handle ?<\/em><\/span><\/p>\n<p>INGEST\u00c2\u00a0utlitiy \u00c2\u00a0supports the following input data formats:<\/p>\n<p>Delimited text<br \/>\nPositional text and binary<br \/>\nColumns in various orders and formats<\/p>\n<p><span style=\"color: #99cc00;\"><em>How about performance ?<\/em><\/span><\/p>\n<p>Obviously this is one of the most sacred part for the DBA when choosing the option to load lot of data. \u00c2\u00a0The best performance is always LOAD, but it doesn&#8217;t give high availability. INGEST even though under the covers issues INSERTS, I have noticed that when loading hundred thousands of rows INGEST was able to perform 40% faster than traditional INSERT. The reason why I compared to INSERT, is because its the only option that gives me high availability.<\/p>\n<p><span style=\"color: #99cc00;\"><em>How is it run ?<\/em><\/span><\/p>\n<p>Assume we have a coma delimited customer file as follows that needs to be loaded :<\/p>\n<p>10001,&#8221;Edward&#8221;,&#8221;Williams&#8221;,&#8221;12349 West Ave St&#8221;,&#8221;Taos&#8221;,&#8221;NM&#8221;,&#8221;564-573-4839&#8243;<br \/>\n10002,&#8221;James&#8221;,&#8221;Miller&#8221;,&#8221;43348 Knight St&#8221;,&#8221;Dumont&#8221;,&#8221;NJ&#8221;,&#8221;201-873-4239&#8243;<br \/>\n10003,&#8221;Jessica&#8221;,&#8221;Robertson&#8221;,&#8221;489 River N Rd&#8221;,&#8221;Austin&#8221;,&#8221;TX&#8221;,&#8221;512-534-9859&#8243;<br \/>\n: \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0: \u00c2\u00a0 \u00c2\u00a0 : \u00c2\u00a0 \u00c2\u00a0: \u00c2\u00a0 \u00c2\u00a0: \u00c2\u00a0 : \u00c2\u00a0 : \u00c2\u00a0 \u00c2\u00a0: \u00c2\u00a0 \u00c2\u00a0: \u00c2\u00a0 : \u00c2\u00a0 \u00c2\u00a0: \u00c2\u00a0\u00c2\u00a0: \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0: \u00c2\u00a0 \u00c2\u00a0 : \u00c2\u00a0 \u00c2\u00a0: \u00c2\u00a0 \u00c2\u00a0: \u00c2\u00a0 : \u00c2\u00a0 : \u00c2\u00a0 \u00c2\u00a0: \u00c2\u00a0 \u00c2\u00a0: \u00c2\u00a0 : \u00c2\u00a0 \u00c2\u00a0: \u00c2\u00a0: \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0: \u00c2\u00a0 \u00c2\u00a0 : \u00c2\u00a0 \u00c2\u00a0:<br \/>\n90000,&#8221;Kathy&#8221;,&#8221;Rodriguez&#8221;,&#8221;8293 Brooksheild Apt, # 201&#8243;,&#8221;Anahiem&#8221;,&#8221;CA&#8221;,&#8221;714-354-775<\/p>\n<p>The customer table is :<br \/>\nCNumber \u00c2\u00a0 INT \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0&#8212;&gt; primary key<br \/>\nFName \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 varchar(50)<br \/>\nLName \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 varchar(50)<br \/>\nStreet \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 varchar(100)<br \/>\nCity \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 varchar(50)<br \/>\nState \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 \u00c2\u00a0 char(2)<br \/>\nTelephone \u00c2\u00a0char(12)<\/p>\n<p>Lets see our options to INGEST the data:<\/p>\n<p>This is the simplest form :<br \/>\n<span style=\"color: #ff9900;\">ingest from file customer.del format delimited insert into raju.customer;<\/span><\/p>\n<p>If you want to have more control or if the data is in different order than the table column, then you can do this . Note here I have flipped the firstname and lastname on the VALUES:<\/p>\n<p><span style=\"color: #ff9900;\">ingest from file customer.del format delimited<\/span><br \/>\n<span style=\"color: #ff9900;\"> ($cid integer external<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$firstname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$lastname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$streetname char(100)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$cityname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$statename char(2)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$telephone char(12)<\/span><br \/>\n<span style=\"color: #ff9900;\"> )<\/span><br \/>\n<span style=\"color: #ff9900;\"> dumpfile customer_dump.del<\/span><br \/>\n<span style=\"color: #ff9900;\"> exception table raju.customerexcept<\/span><br \/>\n<span style=\"color: #ff9900;\"> warningcount 100<\/span><br \/>\n<span style=\"color: #ff9900;\"> messages ingest_customer.msg<\/span><br \/>\n<span style=\"color: #ff9900;\"> insert into raju.customer(cnumber,fname,lname,street,city,state,telephone)<\/span><br \/>\n<span style=\"color: #ff9900;\"> values($cid,<\/span><strong>$lastname,$firstname<\/strong><span style=\"color: #ff9900;\">,$streetname,$cityname,$statename,$telephone);<\/span><\/p>\n<p>The exception table should be in place before you run this. You can create the table with :<br \/>\ncreate table raju.customerexcept like raju.customer \u00c2\u00a0or run the same ddl you used for the customer table.<\/p>\n<p>If your data contains updates to\u00c2\u00a0existing\u00c2\u00a0rows as well as new rows, you can use MERGE statement on your INGEST :<\/p>\n<p><span style=\"color: #ff9900;\">ingest from file customer_monday.del, customer_tuesday.del format delimited<\/span><br \/>\n<span style=\"color: #ff9900;\"> ($cid integer external<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$firstname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$lastname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$streetname char(100)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$cityname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$statename char(2)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$telephone char(12)<\/span><br \/>\n<span style=\"color: #ff9900;\"> )<\/span><br \/>\n<span style=\"color: #ff9900;\"> dumpfile customer_dump.del<\/span><br \/>\n<span style=\"color: #ff9900;\"> warningcount 100<\/span><br \/>\n<span style=\"color: #ff9900;\"> messages ingest4.msg<\/span><br \/>\n<span style=\"color: #ff9900;\"> merge into raju.customer on (cnumber = $cid)<\/span><br \/>\n<span style=\"color: #ff9900;\"> when not matched then<\/span><br \/>\n<span style=\"color: #ff9900;\"> insert (cnumber,fname,lname,street,city,state,telephone)<\/span><br \/>\n<span style=\"color: #ff9900;\"> values($cid,$firstname,$lastname,$streetname,$cityname,$statename,$telephone)<\/span><br \/>\n<span style=\"color: #ff9900;\"> when matched then<\/span><br \/>\n<span style=\"color: #ff9900;\"> update set (fname,lname,street,city,state,telephone) =<\/span><br \/>\n<span style=\"color: #ff9900;\"> ($firstname,$lastname,$streetname,$cityname,$statename,$telephone);<\/span><\/p>\n<p>Note that in the above example, I have two input files. \u00c2\u00a0I have also removed the exception table. DB2 does not allow to use exception table option when using MERGE option. Don&#8217;t ask me why they don&#8217;t allow ?<\/p>\n<p>If you have a list of rows that you want to delete, then you can do as well :<\/p>\n<p><span style=\"color: #ff9900;\">ingest from file customer1.del format delimited<\/span><br \/>\n<span style=\"color: #ff9900;\"> ($cid integer external<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$firstname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$lastname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$streetname char(100)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$cityname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$statename char(2)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$telephone char(12)<\/span><br \/>\n<span style=\"color: #ff9900;\"> )<\/span><br \/>\n<span style=\"color: #ff9900;\"> delete from raju.customer<\/span><br \/>\n<span style=\"color: #ff9900;\"> where cnumber = $cid;<\/span><\/p>\n<p><span style=\"color: #99cc00;\"><em>How do we restart ?<\/em><\/span><\/p>\n<p>You can restart a terminated INGEST with the RESTART CONTINUE &lt;job_id&gt; option<\/p>\n<p>If you want the ability to easily restart a terminated INGEST use the option RESTART NEW &lt;job_id&gt; on your original INGEST command like :<\/p>\n<p><span style=\"color: #ff9900;\">ingest from file customer_monday.del, customer_tuesday.del format delimited<\/span><br \/>\n<span style=\"color: #ff9900;\"> ($cid integer external<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$firstname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$lastname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$streetname char(100)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$cityname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$statename char(2)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$telephone char(12)<\/span><br \/>\n<span style=\"color: #ff9900;\"> )<\/span><br \/>\n<span style=\"color: #ff9900;\"> dumpfile customer_dump.del<\/span><br \/>\n<span style=\"color: #ff9900;\"> warningcount 100<\/span><br \/>\n<span style=\"color: #ff9900;\"> messages ingest4.msg<\/span><br \/>\n<span style=\"color: #ff9900;\"> <strong>restart new &#8216;customeringest1&#8217;<\/strong><\/span><br \/>\n<span style=\"color: #ff9900;\"> merge into raju.customer on (cnumber = $cid)<\/span><br \/>\n<span style=\"color: #ff9900;\"> when not matched then<\/span><br \/>\n<span style=\"color: #ff9900;\"> insert (cnumber,fname,lname,street,city,state,telephone)<\/span><br \/>\n<span style=\"color: #ff9900;\"> values($cid,$firstname,$lastname,$streetname,$cityname,$statename,$telephone)<\/span><br \/>\n<span style=\"color: #ff9900;\"> when matched then<\/span><br \/>\n<span style=\"color: #ff9900;\"> update set (fname,lname,street,city,state,telephone) =<\/span><br \/>\n<span style=\"color: #ff9900;\"> ($firstname,$lastname,$streetname,$cityname,$statename,$telephone);<\/span><\/p>\n<p>The RESTART NEW option will help you identify the ingest command that got terminated. Once you identify the ingest that got terminated, you can restart it by issuing RESTART CONTINUE<\/p>\n<p><span style=\"color: #ff9900;\">ingest from file customer_monday.del, customer_tuesday.del format delimited<\/span><br \/>\n<span style=\"color: #ff9900;\"> ($cid integer external<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$firstname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$lastname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$streetname char(100)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$cityname char(50)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$statename char(2)<\/span><br \/>\n<span style=\"color: #ff9900;\"> ,$telephone char(12)<\/span><br \/>\n<span style=\"color: #ff9900;\"> )<\/span><br \/>\n<span style=\"color: #ff9900;\"> dumpfile customer_dump.del<\/span><br \/>\n<span style=\"color: #ff9900;\"> warningcount 100<\/span><br \/>\n<span style=\"color: #ff9900;\"> messages ingest4.msg<\/span><br \/>\n<span style=\"color: #ff9900;\"> <strong>restart continue &#8216;customeringest1&#8217;<\/strong><\/span><br \/>\n<span style=\"color: #ff9900;\"> merge into raju.customer on (cnumber = $cid)<\/span><br \/>\n<span style=\"color: #ff9900;\"> when not matched then<\/span><br \/>\n<span style=\"color: #ff9900;\"> insert (cnumber,fname,lname,street,city,state,telephone)<\/span><br \/>\n<span style=\"color: #ff9900;\"> values($cid,$firstname,$lastname,$streetname,$cityname,$statename,$telephone)<\/span><br \/>\n<span style=\"color: #ff9900;\"> when matched then<\/span><br \/>\n<span style=\"color: #ff9900;\"> update set (fname,lname,street,city,state,telephone) =<\/span><br \/>\n<span style=\"color: #ff9900;\"> ($firstname,$lastname,$streetname,$cityname,$statename,$telephone);<\/span><\/p>\n<p>The RESTART CONTINUE option will enable us to restart INGEST from the last commit point.<\/p>\n<p>If you want to terminate the utility and want to clean up from the previously failed run, then you can use \u00c2\u00a0RESTART TERMINATE &lt;job_id&gt; option just like the RESTART CONTINUE option.<\/p>\n<p><span style=\"color: #99cc00;\"><em>How can I monitor the utility ?<\/em><\/span><\/p>\n<p>There are few ways to watch what INGEST utility is doing . \u00c2\u00a0IBM has provided few options:<\/p>\n<p>1. <span style=\"color: #ff9900;\">INGEST GET STATS<\/span><br \/>\n2.<span style=\"color: #ff9900;\"> INGEST LIST<\/span> or<br \/>\n3 you can query <span style=\"color: #ff9900;\">TABLE(MON_GET_CONNECTION(NULL,NULL))<\/span> and look for the application_name DB2_INGEST<\/p>\n<p><span style=\"color: #99cc00;\"><em>Conclusion<\/em><\/span><\/p>\n<p>While INGEST has a lot of room for improvement, its a great utility to have when compared to other options. Its easy to use and keeps up the availability while inserting large amount of data. I also like the fact that we can use it to delete rows for a list of values. \u00c2\u00a0I hate to use large IN clause in my delete statements or have to load the key into a temporary table\u00c2\u00a0scenario. Few limitation with INGEST you need to aware of , currently INGEST do now allow LOBs, XML, structured types, has lot of restrictions on generated columns. I definitely recommend anyone to try out the INGEST utility and see how you can benefit from it. After all you paid for the product, so the least you can do is try it. Trust me if you use it right, you will love it !<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As a DB2 DBA or as data integrator we all face a challenge in loading large amount of data into DB2 database while up keeping the 24&#215;7 availability of the tables. \u00c2\u00a0Traditionally we use LOAD or sometimes IMPORT while loading large amount of data to tables. Either of them has data concurrency and availability issue. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"site-container-style":"default","site-container-layout":"default","site-sidebar-layout":"default","site-transparent-header":"default","disable-article-header":"default","disable-site-header":"default","disable-site-footer":"default","disable-content-area-spacing":"default","footnotes":""},"categories":[303,3,358,275],"tags":[849,722,846,717,782,177,854,853,852,845,847,848,850,851],"class_list":["post-1388","post","type-post","status-publish","format-standard","hentry","category-database","category-db2","category-how-to","category-unix","tag-data-load","tag-db2-import","tag-db2-ingest","tag-db2-load","tag-db2-v10-features","tag-etl","tag-high-performance-data-load","tag-high-performance-load","tag-import","tag-ingest","tag-load-data-with-high-availability","tag-loading-high-volume-of-data","tag-mass-data-insert","tag-restart"],"_links":{"self":[{"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/posts\/1388","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techsatwork.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1388"}],"version-history":[{"count":17,"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/posts\/1388\/revisions"}],"predecessor-version":[{"id":1392,"href":"https:\/\/techsatwork.com\/index.php?rest_route=\/wp\/v2\/posts\/1388\/revisions\/1392"}],"wp:attachment":[{"href":"https:\/\/techsatwork.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1388"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techsatwork.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1388"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techsatwork.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1388"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}