db2

Over and Partition by on SQL for aggregation

The other day  I had a requirement to create a report with total occurrence of an event by year month , running total and then running total within a year all in the same result set.  After trying out different methods, I found the results can be easily obtained by using the DB2’s OLAP function  OVER and PARTITION BY

OVER( ) by lets you do aggregate  to a subset of data.PARTITION BY let you split/segregate the subset of data  Here is a basic example of the OVER(PARTITION BY)

select dept,avg(salary) over (partition by dept) from employee_salary order by dept;

This will give the average salary of each department

An example of the result set I was after and  below is the sql that provided this result set :

YEAR MONTH EVENT_COUNT ROLLING_SUM YEARLY_SUM
2007 1 577 577 577
2007 2 502 1079 1079
2007 3 1042 2121 2121
2007 4 1271 3392 3392
2007 5 3693 7085 7085
2007 6 1634 8719 8719
2007 7 2558 11277 11277
2007 8 3726 15003 15003
2007 9 4267 19270 19270
2007 10 5306 24576 24576
2007 11 6270 30846 30846
2007 12 5017 35863 35863
2008 1 2627 38490 2627
2008 2 2762 41252 5389
2008 3 4735 45987 10124
2008 4 3213 49200 13337
2008 5 2306 51506 15643
2008 6 2845 54351 18488
2008 7 2727 57078 21215
2008 8 2953 60031 24168
2008 9 2744 62775 26912
2008 10 2977 65752 29889
2008 11 1478 67230 31367
2008 12 2371 69601 33738
2009 1 2904 72505 2904
2009 2 2907 75412 5811
2009 3 2193 77605 8004
2009 4 1960 79565 9964
2009 5 472 80037 10436
2009 6 265 80302 10701
2009 7 3518 83820 14219
2009 8 541 84361 14760
2009 9 38 84399 14798
2009 11 1 84400 14799

In this sql I used  the OVER() and then the OVER(PARTITION BY) OLAP function.  The OVER(ORDER BY yearname,monthname) give me the running sum of event_count for the ongoing months. The
over (partition by yearname order by yearname,monthname) give me the running sum of event_count for ongoing month but reset itself at the beginning of every year.

SELECT YEARNAME, MONTHNAME, mytab.count as EVENT_COUNT,
SUM(mytab.count) OVER (ORDER BY yearname,monthname) AS ROLLING_SUM,
SUM(mytab.count) OVER (PARTITION BY yearname ORDER BY yearname,monthname) as YEARLY_SUM
FROM
(select year(A.add_dt) as yearname,
month(A.add_dt) monthname ,
count(1) count
from raju.hack_attempts A
where year(A.add_dt) in (2007, 2008,2009)
group by year(A.add_dt),
month(A.add_dt)
)mytab