databasedb2how to

Benefits of Data Partitioning and MDC !

Data Partitioning is a way to scale your database structure to improve performance and manageability.  Data Partitioning is a method of  logically and physically segregating data within tables and indexes so that queries can access them individually or concurrently. Pretty much every commercially available RDBMS has data partitioning as one of its features and they come in different varieties too.
IBM has been pioneering data partitioning for years.  DB2 on mainframe has been enjoying partitioning for many years and it rocks !   DB2 UDB started with DPF  (Distributed Partition Facility) on its DB2 EEE days.  The concept was to distribute the instance across multiple nodes (logical or physical) so that the database workload can take advantage of non-shared resources across the nodes.

I had written an article in 2009 that explained a bit about table partitoning, so I am not going to repeat it.  Today we will look into how table partitioning and multi-dimensional cluster can improve your performance.  I did a three part exercise to compare the outcome.  I am going to show you three different scenarios that illustrates the benefits of table partitioning and multi-dimensional custer.

The first scenario is with a regular table, the second scenario is with a partitioned table and the third scenario is a partitioned table with MDC columns.
The sql we are going to run all the three scenarios is :
select customer_id from tableA where year =’2010′  and state = ‘TX’ and  order_date <> ship_date ;

Scenario 1:
Lets just use a table with just 5 columns with 14 millions rows :
CUSTOMER_ID  INT
ORDER_ID     INT
ORDER_DATE  DATE
SHIP_DATE DATE
YEAR INT
STATE   CHAR(2)

The table has an index of YEAR  and STATE .

Result :   When I ran the query, I got the following results :
Estimated Cost = 90865.046875

(    2) Access Table Name = RAJU.TABLEA = 3,4
|  Index Scan:  Name = RAJU.TABLEA_X1  ID = 1
|  |  Regular Index (Not Clustered)
|  |  Index Columns:
|  |  |  1: YEAR (Ascending)
|  |  |  2: STATE (Ascending)
|  #Columns = 1
|  #Key Columns = 2
|  |  Start Key: Inclusive Value
|  |  |  |  1: 2008
|  |  |  |  2: ‘TX’
|  |  Stop Key: Inclusive Value
|  |  |  |  1: 2008
|  |  |  |  2: ‘TX’
|  Data Prefetch: Eligible 11980
|  Index Prefetch: Eligible 11980
|  Isolation Level: Uncommitted Read
|  Lock Intents
|  |  Table: Intent None
|  |  Row  : None
|  Sargable Predicate(s)
|  |  #Predicates = 1
(    2) |  |  Return Data to Application
|  |  |  #Columns = 1
(    1) Return Data Completion
Total Time:                26.740789 seconds

Scenario 2 :
Lets use the same table with YEAR column partitioned by each year on the data, in my case it was 5 partition.
Result :
Estimated Cost = 1841.140381
(    2) Access Table Name = RAJU.TABLEA  ID = -6,-32767
|  Index Scan:  Name = RAJU.TABLEA_X1  ID = 1
|  |  Regular Index (Not Clustered)
|  |  Index Columns:
|  |  |  1: YEAR (Ascending)
|  |  |  2: STATE (Ascending)
|  #Columns = 1
|  Data-Partitioned Table
|  Data Partition Elimination Info:
|  Active Data Partitions: 0-11
|  #Key Columns = 2
|  |  Start Key: Inclusive Value
|  |  |  |  1: 2008
|  |  |  |  2: ‘TX’
|  |  Stop Key: Inclusive Value
|  |  |  |  1: 2008
|  |  |  |  2: ‘TX’
|  Data Prefetch: Eligible 1563
|  Index Prefetch: Eligible 1563
|  Isolation Level: Uncommitted Read
|  Lock Intents
|  |  Table: Intent None
|  |  Row  : None
|  Sargable Predicate(s)
|  |  #Predicates = 1
(    2) |  |  Return Data to Application
|  |  |  #Columns = 1
(    1) Return Data Completion
Total Time:                10.908611 seconds

Scenario 3 :
Lets use the same table with YEAR column partitioned by each year on the data, and have it organized by Year and STATE . This means the table has a MDC of YEAR and STATE and also partitioned by YEAR.

Result :

Estimated Cost = 1650.786011
(    2) Access Table Name = RAJU.TABLEA  ID = -6,-32768
|  Index Scan:  Name = SYSIBM.SQL101019170136940  ID = 1
|  |  Composite Block Index
|  |  Index Columns:
|  |  |  1: YEAR (Ascending)
|  |  |  2: STATE (Ascending)
|  #Columns = 1
|  Clustered by Dimension for Block Index Access
|  Data-Partitioned Table
|  Data Partition Elimination Info:
|  Active Data Partitions: 0-11
|  #Key Columns = 2
|  |  Start Key: Inclusive Value
|  |  |  |  1: 2008
|  |  |  |  2: ‘TX’
|  |  Stop Key: Inclusive Value
|  |  |  |  1: 2008
|  |  |  |  2: ‘TX’
|  Data Prefetch: None
|  Index Prefetch: None
|  Isolation Level: Uncommitted Read
|  Lock Intents
|  |  Table: Intent None
|  |  Block: None
|  |  Row  : None
|  Sargable Predicate(s)
(    2) |  |  Return Data to Application
|  |  |  #Columns = 1
(    1) Return Data Completion
End of section
Total Time:                 1.147453 seconds

From this you can clearly see how efficient the combination of partitioned and MDC is.  Db2 is able to do partition elimination and access the data by block index.
Inorder to partition and create MDC, you need to know your data and how the applications access the data.  One added benefit of MDC is that DB2 will also keep the table organized meaning, you will never have to reorg the table.  Table Partitioning and MDC  are only available in Enterprise Edition and up .