Benefits of Data Partitioning and MDC!

rpillai, Oct 21, 2010

Data partitioning is a way to scale your database structure to improve performance and manageability. It is a method of logically and physically segregating data within tables and indexes so that queries can access the pieces individually or concurrently. Pretty much every commercially available RDBMS has data partitioning as one of its features, and it comes in different varieties too.
IBM has been pioneering data partitioning for years. DB2 on the mainframe has been enjoying partitioning for many years, and it rocks! DB2 UDB started with DPF (Database Partitioning Feature) back in its DB2 EEE days. The concept was to distribute the instance across multiple nodes (logical or physical) so that the database workload can take advantage of non-shared resources across the nodes.

I wrote an article in 2009 that explained a bit about table partitioning, so I am not going to repeat it. Today we will look into how table partitioning and multi-dimensional clustering (MDC) can improve your performance. I did a three-part exercise to compare the outcomes, and I am going to show you three scenarios that illustrate the benefits of table partitioning and multi-dimensional clustering.

The first scenario uses a regular table, the second uses a partitioned table, and the third uses a partitioned table with MDC columns.
The SQL we are going to run in all three scenarios is:
select customer_id from tableA where year = '2010' and state = 'TX' and order_date <> ship_date;

Scenario 1:
Let's use a table with just 6 columns and 14 million rows:
CUSTOMER_ID  INT
ORDER_ID     INT
ORDER_DATE   DATE
SHIP_DATE    DATE
YEAR         INT
STATE        CHAR(2)

The table has an index on YEAR and STATE.
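
For reference, the Scenario 1 table could be declared along the following lines. This is only a sketch: the column names and types come from the list above and the index name TABLEA_X1 comes from the explain output, but nullability, table space placement, and any other options are assumptions.

```sql
-- Scenario 1 sketch: plain (non-partitioned) table.
-- Columns and types are taken from the list above;
-- nullability and table space placement are assumed defaults.
CREATE TABLE tablea (
    customer_id INT,
    order_id    INT,
    order_date  DATE,
    ship_date   DATE,
    year        INT,
    state       CHAR(2)
);

-- Regular composite index on the two filter columns of the test query.
CREATE INDEX tablea_x1 ON tablea (year, state);
```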

Result: When I ran the query, I got the following results:
Estimated Cost = 90865.046875

(   2) Access Table Name = RAJU.TABLEA  ID = 3,4
|  Index Scan:  Name = RAJU.TABLEA_X1  ID = 1
|  |  Regular Index (Not Clustered)
|  |  Index Columns:
|  |  |  1: YEAR (Ascending)
|  |  |  2: STATE (Ascending)
|  #Columns = 1
|  #Key Columns = 2
|  |  Start Key: Inclusive Value
|  |  |  |  1: 2008
|  |  |  |  2: 'TX'
|  |  Stop Key: Inclusive Value
|  |  |  |  1: 2008
|  |  |  |  2: 'TX'
|  Data Prefetch: Eligible 11980
|  Index Prefetch: Eligible 11980
|  Isolation Level: Uncommitted Read
|  Lock Intents
|  |  Table: Intent None
|  |  Row  : None
|  Sargable Predicate(s)
|  |  #Predicates = 1
(   2) |  |  Return Data to Application
|  |  |  #Columns = 1
(   1) Return Data Completion

Total Time: 26.740789 seconds

Scenario 2:
Let's use the same table, this time range-partitioned on the YEAR column with one partition per year of data; in my case that was 5 partitions.
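
A range-partitioned version of the table might look like the sketch below. The year boundaries (2006 through 2010) are an assumption, chosen only because they yield five one-year partitions; the actual ranges used in the test are not given.

```sql
-- Scenario 2 sketch: same columns, range-partitioned by YEAR.
-- The 2006-2010 boundaries are hypothetical; EVERY (1) creates
-- one partition per year between the two bounds (inclusive).
CREATE TABLE tablea (
    customer_id INT,
    order_id    INT,
    order_date  DATE,
    ship_date   DATE,
    year        INT,
    state       CHAR(2)
)
PARTITION BY RANGE (year)
    (STARTING FROM (2006) ENDING AT (2010) EVERY (1));

-- Same composite index as in the non-partitioned case.
CREATE INDEX tablea_x1 ON tablea (year, state);
```

With a predicate on YEAR, the optimizer can skip partitions whose range cannot contain matching rows, which is what the "Data Partition Elimination Info" lines in the explain output report.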

Result:

Estimated Cost = 1841.140381

(   2) Access Table Name = RAJU.TABLEA  ID = -6,-32767
|  Index Scan:  Name = RAJU.TABLEA_X1  ID = 1
|  |  Regular Index (Not Clustered)
|  |  Index Columns:
|  |  |  1: YEAR (Ascending)
|  |  |  2: STATE (Ascending)
|  #Columns = 1
|  Data-Partitioned Table
|  Data Partition Elimination Info:
|  Active Data Partitions: 0-11
|  #Key Columns = 2
|  |  Start Key: Inclusive Value
|  |  |  |  1: 2008
|  |  |  |  2: 'TX'
|  |  Stop Key: Inclusive Value
|  |  |  |  1: 2008
|  |  |  |  2: 'TX'
|  Data Prefetch: Eligible 1563
|  Index Prefetch: Eligible 1563
|  Isolation Level: Uncommitted Read
|  Lock Intents
|  |  Table: Intent None
|  |  Row  : None
|  Sargable Predicate(s)
|  |  #Predicates = 1
(   2) |  |  Return Data to Application
|  |  |  #Columns = 1
(   1) Return Data Completion

Total Time: 10.908611 seconds

Scenario 3:
Let's use the same table, range-partitioned on the YEAR column with one partition per year of data, and also organized by YEAR and STATE. In other words, the table is an MDC table with YEAR and STATE as its dimensions and is also partitioned by YEAR.
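
Combining the two features could be sketched as follows. As before, the 2006 through 2010 partition boundaries are an assumption made for illustration; only the dimension columns (YEAR, STATE) and the partitioning column (YEAR) come from the text.

```sql
-- Scenario 3 sketch: range-partitioned by YEAR and clustered
-- (MDC) on the dimensions YEAR and STATE.
-- Partition boundaries are hypothetical.
CREATE TABLE tablea (
    customer_id INT,
    order_id    INT,
    order_date  DATE,
    ship_date   DATE,
    year        INT,
    state       CHAR(2)
)
PARTITION BY RANGE (year)
    (STARTING FROM (2006) ENDING AT (2010) EVERY (1))
ORGANIZE BY DIMENSIONS (year, state);
```

No explicit CREATE INDEX is needed for the dimensions here: DB2 creates the block indexes for an MDC table itself, which is why the explain output for this scenario shows a system-generated index name under SYSIBM rather than TABLEA_X1.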

Result:

Estimated Cost = 1650.786011

(   2) Access Table Name = RAJU.TABLEA  ID = -6,-32768
|  Index Scan:  Name = SYSIBM.SQL101019170136940  ID = 1
|  |  Composite Block Index
|  |  Index Columns:
|  |  |  1: YEAR (Ascending)
|  |  |  2: STATE (Ascending)
|  #Columns = 1
|  Clustered by Dimension for Block Index Access
|  Data-Partitioned Table
|  Data Partition Elimination Info:
|  Active Data Partitions: 0-11
|  #Key Columns = 2
|  |  Start Key: Inclusive Value
|  |  |  |  1: 2008
|  |  |  |  2: 'TX'
|  |  Stop Key: Inclusive Value
|  |  |  |  1: 2008
|  |  |  |  2: 'TX'
|  Data Prefetch: None
|  Index Prefetch: None
|  Isolation Level: Uncommitted Read
|  Lock Intents
|  |  Table: Intent None
|  |  Block: None
|  |  Row  : None
|  Sargable Predicate(s)
(   2) |  |  Return Data to Application
|  |  |  #Columns = 1
(   1) Return Data Completion

End of section

Total Time: 1.147453 seconds

From these results you can see how efficient the combination of partitioning and MDC is: elapsed time dropped from 26.7 seconds on the regular table to 10.9 seconds with partitioning alone and to 1.1 seconds with partitioning plus MDC, roughly a 23-fold improvement. DB2 is able to do partition elimination and access the data through the block index.
In order to partition a table and create an MDC, you need to know your data and how the applications access it. One added benefit of MDC is that DB2 keeps the table clustered on the dimension columns, so you do not have to reorg the table to restore clustering. Table Partitioning and MDC are only available in Enterprise Edition and up.
