Using SQL to identify periods with specific characteristics

Time: 2015-01-08 06:30:07

Tags: sql oracle time-series

I'm looking for a SQL query that determines the longest period each person has gone without eating meat. Ideally the output would look like

person  periodstart  periodend 

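For each person, the longest period without meat would be identified.

periodstart would be the time of the first non-meat meal in that period.

periodend would be the time of the first meat meal after it.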

The SQL below creates the table and data.

CREATE TABLE MEALS 
(
  PERSON VARCHAR2(20 BYTE) 
, MEALTIME DATE 
, FOODTYPE VARCHAR2(20) 
);

Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Jane',to_date('04-JAN-15 06:09:09','DD-MON-RR HH24:MI:SS'),'fruit');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Jane',to_date('05-JAN-15 06:09:09','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Jane',to_date('07-JAN-15 06:01:24','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Jane',to_date('07-JAN-15 12:03:50','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('02-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('03-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('04-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('05-JAN-15 07:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('05-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('06-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('06-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'fruit');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('John',to_date('06-JAN-15 10:03:23','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('02-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('03-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('05-JAN-15 04:04:25','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('05-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('05-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'meat');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('06-JAN-15 05:01:54','DD-MON-RR HH24:MI:SS'),'veg');
Insert into MEALS (PERSON,MEALTIME,FOODTYPE) 
values ('Mary',to_date('07-JAN-15 06:04:25','DD-MON-RR HH24:MI:SS'),'veg');

commit;

2 Answers:

Answer 0: (score: 2)

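This is a gaps-and-islands problem, and there are various ways to approach it. One is to use an analytic function effect/trick to find the chains of contiguous periods of each type: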

select person, mealtime, foodtype,
  case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
  dense_rank() over (partition by person,
      case when foodtype = 'meat' then 1 else 0 end order by mealtime)
    - dense_rank() over (partition by person order by mealtime) as chain
from meals
order by person, mealtime;

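The pseudocolumns are based on a case expression here, because you want fruit and veg, or anything else that isn't meat, treated the same.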
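To see why the subtraction works, it can help to look at the two rankings side by side for a single person. This is only an illustrative sketch against the MEALS table above; the aliases rank_in_kind and rank_overall are made-up names, not part of the original query:

```sql
-- Show both rankings separately for one person. Within an unbroken run
-- of the same kind of meal (meat or non-meat) both ranks advance
-- together, so their difference stays constant for that run.
select mealtime, foodtype,
  dense_rank() over (partition by person,
      case when foodtype = 'meat' then 1 else 0 end order by mealtime)
    as rank_in_kind,   -- hypothetical alias: position among meals of the same kind
  dense_rank() over (partition by person order by mealtime)
    as rank_overall    -- hypothetical alias: position among all of the person's meals
from meals
where person = 'Jane'
order by mealtime;
```

For Jane's four meals the in-kind ranks are 1, 2, 1, 3 and the overall ranks are 1, 2, 3, 4, so the difference is 0, 0, -2, -1: the first two non-meat meals share a chain value, while the meat meal and the later veg meal each get their own.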

You can then use that as an inner query to find the start time of each meat and non-meat period, taken from the first meal in each chain:

select person, meat, min(mealtime) as first_meal
from (
  select person, mealtime, foodtype,
    case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
    dense_rank() over (partition by person,
        case when foodtype = 'meat' then 1 else 0 end order by mealtime)
      - dense_rank() over (partition by person order by mealtime) as chain
  from meals
)
group by person, meat, chain
order by person, min(mealtime);

PERSON               MEAT FIRST_MEAL       
-------------------- ---- ------------------
Jane                 No   04-JAN-15 06:09:09 
Jane                 Yes  07-JAN-15 06:01:24 
Jane                 No   07-JAN-15 12:03:50 
John                 No   02-JAN-15 10:03:23 
...

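You want the period to run from the first non-meat meal to the next meat meal, so you can use that as another inner query, with lead and lag to peek at the rows on either side: for a veg period you look ahead to the start of the next meat period; for a meat period you look back to the start of the previous veg period: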

select person, meat,
  case when meat = 'Yes' then lag(first_meal) over (partition by person
      order by first_meal) else first_meal end as period_start,
  case when meat = 'No' then lead(first_meal) over (partition by person
      order by first_meal) else first_meal end as period_end
from (
  select person, meat, min(mealtime) as first_meal
  from (
    select person, mealtime, foodtype,
      case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
      dense_rank() over (partition by person,
          case when foodtype = 'meat' then 1 else 0 end order by mealtime)
        - dense_rank() over (partition by person order by mealtime) as chain
    from meals
  )
  group by person, meat, chain
)
order by person, period_start;

PERSON               MEAT PERIOD_START       PERIOD_END       
-------------------- ---- ------------------ ------------------
Jane                 No   04-JAN-15 06:09:09 07-JAN-15 06:01:24 
Jane                 Yes  04-JAN-15 06:09:09 07-JAN-15 06:01:24 
Jane                 No   07-JAN-15 12:03:50                    
John                 No   02-JAN-15 10:03:23 03-JAN-15 10:03:23 
...

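This effectively gives you duplicates, though I've left the 'meat' flag in to make it a bit clearer at this point. Assuming you want to ignore the latest, still-open period, you just need to skip those and eliminate the duplicates: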

select person, period_start, period_end
from (
  select person, meat,
    case when meat = 'Yes' then lag(first_meal) over (partition by person
        order by first_meal) else first_meal end as period_start,
    case when meat = 'No' then lead(first_meal) over (partition by person
        order by first_meal) else first_meal end as period_end
  from (
    select person, meat, min(mealtime) as first_meal
    from (
      select person, mealtime, foodtype,
        case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
        dense_rank() over (partition by person,
            case when foodtype = 'meat' then 1 else 0 end order by mealtime)
          - dense_rank() over (partition by person order by mealtime) as chain
      from meals
    )
    group by person, meat, chain
  )
)
where meat = 'No'
and period_start is not null
and period_end is not null
order by person, period_start;

PERSON               PERIOD_START       PERIOD_END       
-------------------- ------------------ ------------------
Jane                 04-JAN-15 06:09:09 07-JAN-15 06:01:24 
John                 02-JAN-15 10:03:23 03-JAN-15 10:03:23 
John                 04-JAN-15 10:03:23 06-JAN-15 10:03:23 
Mary                 02-JAN-15 05:01:54 03-JAN-15 06:04:25 
Mary                 05-JAN-15 04:04:25 05-JAN-15 06:04:25 

SQL Fiddle with the intermediate steps in full.

Having belatedly realised that you only want the longest period for each person, you can get that with another layer:

select person, period_start, period_end
from (
  select person, period_start, period_end,
    rank() over (partition by person order by period_end - period_start desc) as rnk
  from (
    ...
  )
  where meat = 'No'
  and period_start is not null
  and period_end is not null
)
where rnk = 1
order by person, period_start;

PERSON               PERIOD_START       PERIOD_END       
-------------------- ------------------ ------------------
Jane                 04-JAN-15 06:09:09 07-JAN-15 06:01:24 
John                 04-JAN-15 10:03:23 06-JAN-15 10:03:23 
Mary                 02-JAN-15 05:01:54 03-JAN-15 06:04:25 

Updated SQL Fiddle
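For reference, the nested layers can also be written with subquery factoring (WITH clauses), which some find easier to read. This is a sketch rather than the query from the Fiddle: the names tagged, chains, periods and ranked are invented for illustration, and because only the non-meat chains are kept it uses lead alone instead of the lead/lag pair:

```sql
with tagged as (
  -- flag each meal as meat or not, and assign its chain value
  select person, mealtime,
    case when foodtype = 'meat' then 'Yes' else 'No' end as meat,
    dense_rank() over (partition by person,
        case when foodtype = 'meat' then 1 else 0 end order by mealtime)
      - dense_rank() over (partition by person order by mealtime) as chain
  from meals
),
chains as (
  -- one row per chain, with the time of its first meal
  select person, meat, min(mealtime) as first_meal
  from tagged
  group by person, meat, chain
),
periods as (
  -- chains alternate between meat and non-meat for each person, so the
  -- next chain after a non-meat chain starts with a meat meal
  select person, meat, first_meal as period_start,
    lead(first_meal) over (partition by person order by first_meal)
      as period_end
  from chains
),
ranked as (
  -- keep closed non-meat periods and rank them, longest first
  select person, period_start, period_end,
    rank() over (partition by person
        order by period_end - period_start desc) as rnk
  from periods
  where meat = 'No'
  and period_end is not null
)
select person, period_start, period_end
from ranked
where rnk = 1
order by person, period_start;
```

Against the sample data this should return the same three rows as the output above. Because it uses rank, a person with two equally long periods would get both rows back.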

Answer 1: (score: 1)

This solution is written for SQL Server; I hope you can follow it easily.

-- x numbers every meal per person in time order
with x as (
  select ROW_NUMBER() over (partition by PERSON order by MEALTIME) rowId, *
  from #MEALS
),
-- y numbers only the meat meals per person in time order
y as (
  select ROW_NUMBER() over (partition by PERSON order by MEALTIME) rowID, *
  from #MEALS
  where FOODTYPE = 'meat'
)
-- pair each person's first meal with their first meat meal
select x.PERSON, x.MEALTIME startdate, y.MEALTIME endDate,
  datediff(second, x.MEALTIME, y.MEALTIME) diff
from x
left join y on x.PERSON = y.PERSON
where x.rowId = 1 and y.rowID = 1
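
Since the question is tagged oracle, here is a rough Oracle rendering of the same idea. It is a sketch, not something from the answer itself: it assumes the permanent MEALS table from the question instead of the #MEALS temp table, renames the row-number column to row_id, and replaces datediff with date subtraction:

```sql
-- Oracle sketch: pair each person's first meal with their first meat meal.
with x as (
  -- every meal, numbered per person in time order
  select m.*,
    row_number() over (partition by person order by mealtime) as row_id
  from meals m
),
y as (
  -- only the meat meals, numbered per person in time order
  select m.*,
    row_number() over (partition by person order by mealtime) as row_id
  from meals m
  where foodtype = 'meat'
)
select x.person, x.mealtime as startdate, y.mealtime as enddate,
  -- subtracting two DATE values gives days; 86400 seconds per day
  (y.mealtime - x.mealtime) * 86400 as diff
from x
join y on x.person = y.person
where x.row_id = 1 and y.row_id = 1;
```

Note that this only looks at the stretch from each person's first meal to their first meat meal. On the sample data it pairs John's meal on 02-JAN-15 with his meat meal on 03-JAN-15, which is his first meat-free period, not his longest one.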