Tabibitosan is a technique that I picked up from, guess where? Yup, the OTN Forums! It was Aketi Jyuuzou who first introduced me to the idea, and it's a very useful technique for a certain set of problems. Aketi's forum post has lots of information on it, but I just wanted to demonstrate a simple example here, to give you a taste of what you can use it for.
So, suppose we have a set of data, which is unique for id and dt:
ID DT
---------- ----------
1 01/01/2011
1 02/01/2011
1 03/01/2011
1 04/01/2011
1 05/01/2011
1 10/01/2011
1 11/01/2011
1 12/01/2011
1 14/01/2011
1 15/01/2011
1 16/01/2011
2 01/01/2011
2 02/01/2011
2 03/01/2011
2 04/01/2011
2 06/01/2011
2 07/01/2011
2 11/01/2011
2 12/01/2011
2 13/01/2011
2 29/01/2011
and you want to group the results to find the start and end dates of consecutive rows, eg:
ID MIN_DT MAX_DT
---------- ---------- ----------
1 01/01/2011 05/01/2011
1 10/01/2011 12/01/2011
1 14/01/2011 16/01/2011
2 01/01/2011 04/01/2011
2 06/01/2011 07/01/2011
2 11/01/2011 13/01/2011
2 29/01/2011 29/01/2011
Where do we start? Well, the tabibitosan works by assigning a number to each of the rows in the resultset, either over the whole set or over the partitioned result sets - in our case, we'll be doing this per each id. Then it assigns a different, consecutive number to the rows.
Because we're using dates, we'll label each row by converting those to a number in YYYYMMDD format. We'll also use the row_number() analytic function to label each row with a consecutive number for each id in ascending dt order. Finally, we'll subtract one from the other:
with sample_data as (select 1 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('05/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('10/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('14/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('15/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('16/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('06/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('07/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('13/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('29/01/2011', 'dd/mm/yyyy') dt from dual)
select id,
dt,
to_number(to_char(dt, 'yyyymmdd')) main_rn,
row_number() over (partition by id order by dt) partitioned_rn,
to_number(to_char(dt, 'yyyymmdd'))
- row_number() over (partition by id order by dt) grp
from sample_data
/
ID DT MAIN_RN PARTITIONED_RN GRP
---------- ---------- ---------- -------------- ----------
1 01/01/2011 20110101 1 20110100
1 02/01/2011 20110102 2 20110100
1 03/01/2011 20110103 3 20110100
1 04/01/2011 20110104 4 20110100
1 05/01/2011 20110105 5 20110100
1 10/01/2011 20110110 6 20110104
1 11/01/2011 20110111 7 20110104
1 12/01/2011 20110112 8 20110104
1 14/01/2011 20110114 9 20110105
1 15/01/2011 20110115 10 20110105
1 16/01/2011 20110116 11 20110105
2 01/01/2011 20110101 1 20110100
2 02/01/2011 20110102 2 20110100
2 03/01/2011 20110103 3 20110100
2 04/01/2011 20110104 4 20110100
2 06/01/2011 20110106 5 20110101
2 07/01/2011 20110107 6 20110101
2 11/01/2011 20110111 7 20110104
2 12/01/2011 20110112 8 20110104
2 13/01/2011 20110113 9 20110104
2 29/01/2011 20110129 10 20110119
You can see that we now have the same number (the grp column) for rows with consecutive dt's. Each group has a different grp value per each id. This is because as each dt increases by 1, so does the row number. The difference between the two values of that row and the previous row remains the same. As soon as the dt jumps by more than 1, the difference becomes greater, as the row number will only ever increase by 1.
Having identified the numbers that separate the rows into consecutive chunks, we can group the result set using them, to find the min and max dt's for each set:
with sample_data as (select 1 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('05/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('10/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('14/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('15/01/2011', 'dd/mm/yyyy') dt from dual union all
select 1 id, to_date('16/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('06/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('07/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('13/01/2011', 'dd/mm/yyyy') dt from dual union all
select 2 id, to_date('29/01/2011', 'dd/mm/yyyy') dt from dual),
tabibitosan as (select id,
dt,
to_number(to_char(dt, 'yyyymmdd'))
- row_number() over (partition by id order by dt) grp
from sample_data)
select id,
min(dt) min_dt,
max(dt) max_dt
from tabibitosan
group by id, grp
order by id, grp
ID MIN_DT MAX_DT
---------- ---------- ----------
1 01/01/2011 05/01/2011
1 10/01/2011 12/01/2011
1 14/01/2011 16/01/2011
2 01/01/2011 04/01/2011
2 06/01/2011 07/01/2011
2 11/01/2011 13/01/2011
2 29/01/2011 29/01/2011
Neat, huh?
Whenever I see this kind of "grouping consecutive rows together" problem, the tabibitosan method immediately leaps to mind. Thanks Aketi!
Also, note how I've used both analytic functions and subquery factoring in this - both powerful tools in their own right, you can use them as building blocks to come up with queries that may look fairly complicated, but are easy to break down to see what each constituent part is doing. Pack a few tools in your SQL toolkit and you can easily combine them to build complex queries.