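Splitting a datetime range into periods at null values

Time: 2016-10-27 08:14:21

Tags: sql postgresql datetime gaps-and-islands

I want to split a series of timestamps into periods, using null values as the separators. Suppose I have a table like this.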

Date                |  abc
2016-04-18 07:10:00 | 2.3
2016-04-18 07:20:00 | 2.1
2016-04-18 07:30:00 |
2016-04-18 07:40:00 | 
2016-05-01 10:00:00 | 1.9
2016-05-01 10:10:00 | 4.5
2016-05-01 10:20:00 | 3.9

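Some values in the abc column are null, and the rows in the Date column are 10 minutes apart.

I want to split the dates into periods wherever abc is null.

Result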

      start                     end 
2016-04-18 07:10:00 ~ 2016-04-18 07:20:00 
2016-05-01 10:00:00 ~ 2016-05-01 10:20:00

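2 Answers:

Answer 0 (score: 1)

A "gaps and islands" problem is usually solved with window functions that detect changes in the data and assign a group number based on those changes.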
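To try the queries in this answer, a table named data with the question's sample rows is needed. The following is a minimal setup sketch for PostgreSQL; the column types (timestamp and numeric) are assumptions, since the question does not state them.

```sql
-- Assumed setup: the table name "data" and its column names follow the
-- queries in this answer; the column types are a guess.
create table data (
    "date" timestamp,
    abc    numeric
);

-- Sample rows copied from the question; the two middle rows have no abc value.
insert into data ("date", abc) values
    ('2016-04-18 07:10:00', 2.3),
    ('2016-04-18 07:20:00', 2.1),
    ('2016-04-18 07:30:00', null),
    ('2016-04-18 07:40:00', null),
    ('2016-05-01 10:00:00', 1.9),
    ('2016-05-01 10:10:00', 4.5),
    ('2016-05-01 10:20:00', 3.9);
```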

First, each value has to be compared with the previous one, using the sort order defined by the timestamp column.

This statement:

select *,
       case
          when abc is null or lag(abc) over (order by "date") is not null then null
          else 1
        end as group_flag
from data
order by "date";

returns this result:

date                | abc | group_flag
--------------------+-----+-----------
2016-04-18 07:10:00 | 2.3 |          1
2016-04-18 07:20:00 | 2.1 |           
2016-04-18 07:30:00 |     |           
2016-04-18 07:40:00 |     |          
2016-05-01 10:00:00 | 1.9 |          1
2016-05-01 10:10:00 | 4.5 |           
2016-05-01 10:20:00 | 3.9 |           

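As you can see, we get a flag each time a new "group" starts, that is, on every non-null row whose previous row is null or missing.

The next step is to turn the flags into actual group numbers using a running sum. Because sum() ignores null flags, the total only increases on the first row of each group: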

select *,
       sum(group_flag) over (order by date) as group_nr
from (
  select *,
         case
            when abc is null or lag(abc) over (order by "date") is not null then null
            else 1
          end as group_flag
  from data
) t1
order by "date";

This returns:

date                | abc | group_flag | group_nr
--------------------+-----+------------+---------
2016-04-18 07:10:00 | 2.3 |          1 |        1
2016-04-18 07:20:00 | 2.1 |            |        1
2016-04-18 07:30:00 |     |            |        1
2016-04-18 07:40:00 |     |            |        1
2016-05-01 10:00:00 | 1.9 |          1 |        2
2016-05-01 10:10:00 | 4.5 |            |        2
2016-05-01 10:20:00 | 3.9 |            |        2
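The running sum works as a group counter because sum() skips null inputs. The following is a small standalone sketch of that behavior in PostgreSQL; the inline rows and the column names i, flag and running_total are made up for illustration.

```sql
-- Illustrative values only: flag is 1 where a group starts and null elsewhere.
select i,
       flag,
       sum(flag) over (order by i) as running_total
from (values (1, 1), (2, null), (3, null), (4, 1), (5, null)) as t(i, flag)
order by i;
-- running_total for i = 1..5: 1, 1, 1, 2, 2
```

Every row between two flags inherits the total of the most recent flag, which is exactly how group_nr is built above.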

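As you can see, the new column group_nr now identifies the consecutive periods we are interested in. For your result, we only need to filter out the rows where abc is null and take the first and last date of each group: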

select min(date) as period_start, max(date) as period_end
from (
    select *,
           sum(group_flag) over (order by date) as group_nr
    from (
      select *,
             case
                when abc is null or lag(abc) over (order by date) is not null then null
                else 1
              end as group_flag
      from data
    ) t1
    order by "date"
) t2
where abc is not null
group by group_nr;

This returns:

period_start        | period_end         
--------------------+--------------------
2016-04-18 07:10:00 | 2016-04-18 07:20:00
2016-05-01 10:00:00 | 2016-05-01 10:20:00
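A more compact variant of the same idea, offered as a sketch and not as the approach used in this answer, is to use a running count of null rows as the group key: all non-null rows between two nulls share the same count. The CTE below stands in for the data table so the query can be run on its own in PostgreSQL; the name null_count is made up for illustration.

```sql
-- Stand-in for the "data" table, holding the sample rows from the question.
with data("date", abc) as (
    values
        (timestamp '2016-04-18 07:10:00', 2.3),
        (timestamp '2016-04-18 07:20:00', 2.1),
        (timestamp '2016-04-18 07:30:00', null),
        (timestamp '2016-04-18 07:40:00', null),
        (timestamp '2016-05-01 10:00:00', 1.9),
        (timestamp '2016-05-01 10:10:00', 4.5),
        (timestamp '2016-05-01 10:20:00', 3.9)
)
select min("date") as period_start,
       max("date") as period_end
from (
    select "date",
           abc,
           -- Number of null rows seen so far; constant within each island.
           sum(case when abc is null then 1 else 0 end)
               over (order by "date") as null_count
    from data
) t
where abc is not null
group by null_count
order by period_start;
```

On the sample rows the two islands get null_count 0 and 2, so the query should return the same two periods as the result above. It relies on the timestamps being unique, because tied rows in the ordering would share one running total.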

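Answer 1 (score: 0)

A more readable solution that uses a CTE and avoids nested queries.

I am not sure what behavior you expect when NULL and NOT NULL abc values occur for the same Date. Do you want to exclude the NULLs from the data set in that case?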
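Whether that case occurs at all can be checked first. The following is a minimal PostgreSQL sketch; the YourTable CTE is a stand-in holding the question's sample rows, and the output column names are made up for illustration.

```sql
-- Stand-in for the real table, using the sample rows from the question.
WITH YourTable("Date", Abc) AS (
    VALUES
        (TIMESTAMP '2016-04-18 07:10:00', 2.3),
        (TIMESTAMP '2016-04-18 07:20:00', 2.1),
        (TIMESTAMP '2016-04-18 07:30:00', NULL),
        (TIMESTAMP '2016-04-18 07:40:00', NULL),
        (TIMESTAMP '2016-05-01 10:00:00', 1.9),
        (TIMESTAMP '2016-05-01 10:10:00', 4.5),
        (TIMESTAMP '2016-05-01 10:20:00', 3.9)
)
SELECT "Date",
       COUNT(*)   AS row_count,      -- all rows sharing this timestamp
       COUNT(Abc) AS non_null_count  -- COUNT(column) skips NULLs
FROM YourTable
GROUP BY "Date"
HAVING COUNT(*) > 1;
```

For the sample rows this returns nothing, because every timestamp appears only once.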

-- Only if your server supports LAG and LEAD windowed functions
-- See alternative BeginEndFlagedSet below
WITH BeginEndFlagedSet as
(
    SELECT
        Date,
        Abc,
        ROW_NUMBER() OVER(ORDER BY Date ASC) as Num,
        CASE 
            WHEN Abc IS NOT NULL AND LAG(Abc) OVER(ORDER BY Date) IS NULL THEN 'Start'
            WHEN Abc IS NOT NULL AND LEAD(Abc) OVER(ORDER BY Date) IS NULL THEN 'End'
        END as BeginEndFlag
    FROM [YourTable]
)
SELECT 
    MIN(StartRow.Date) as "Start Date",
    CASE 
        WHEN MIN(CASE EndRow.BeginEndFlag WHEN 'End' THEN EndRow.Date END) 
            > MIN(CASE EndRow.BeginEndFlag WHEN 'Start' THEN EndRow.Date END) THEN MIN(StartRow.Date)
        WHEN MIN(CASE EndRow.BeginEndFlag WHEN 'End' THEN EndRow.Date END) IS NULL THEN MIN(StartRow.Date)
        ELSE MIN(CASE EndRow.BeginEndFlag WHEN 'End' THEN EndRow.Date END)
    END as "End Date"
FROM BeginEndFlagedSet StartRow
LEFT JOIN BeginEndFlagedSet EndRow on 
    StartRow.Num < EndRow.Num 
    and EndRow.BeginEndFlag in ('Start', 'End') 
WHERE StartRow.BeginEndFlag = 'Start'
GROUP BY StartRow.Num

First, we assign a number to each row:

Date                | Abc  | Num
--------------------+------+----
2016-04-18 07:10:00 | 2    |   1
2016-04-18 07:20:00 | 2    |   2
2016-04-18 07:30:00 | NULL |   3
2016-04-18 07:40:00 | NULL |   4
2016-05-01 10:00:00 | 2    |   5
2016-05-01 10:10:00 | 5    |   6
2016-05-01 10:20:00 | 4    |   7

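Then, for each row, we compare its value with the previous and next values to set the flag:

Date                | Abc  | Num | LAG(Abc) | LEAD(Abc) | Flag
--------------------+------+-----+----------+-----------+------
2016-04-18 07:10:00 | 2    |   1 | NULL     | 2         | Start
2016-04-18 07:20:00 | 2    |   2 | 2        | NULL      | End
2016-04-18 07:30:00 | NULL |   3 | 2        | NULL      |
2016-04-18 07:40:00 | NULL |   4 | NULL     | 2         |
2016-05-01 10:00:00 | 2    |   5 | NULL     | 5         | Start
2016-05-01 10:10:00 | 5    |   6 | 2        | 4         |
2016-05-01 10:20:00 | 4    |   7 | 5        | NULL      | End

Finally, for each date flagged as Start we look for the corresponding End-flagged date:

Start Date              | End Date
------------------------+------------------------
2016-04-18 07:10:00.000 | 2016-04-18 07:20:00.000
2016-05-01 10:00:00.000 | 2016-05-01 10:20:00.000

An alternative BeginEndFlagedSet in case your server does not support the LAG and LEAD window functions (like mine):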
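One way such a fallback could be written, as a sketch and not necessarily the answer author's version, is to replace LAG and LEAD with self-joins on the row number. It is written in PostgreSQL syntax for consistency with the question; YourTable is a stand-in CTE with the sample rows, and the Numbered CTE and the Cur, Prev and Nxt aliases are made up for illustration.

```sql
-- Stand-in for the real table, using the sample rows from the question.
WITH YourTable("Date", Abc) AS (
    VALUES
        (TIMESTAMP '2016-04-18 07:10:00', 2.3),
        (TIMESTAMP '2016-04-18 07:20:00', 2.1),
        (TIMESTAMP '2016-04-18 07:30:00', NULL),
        (TIMESTAMP '2016-04-18 07:40:00', NULL),
        (TIMESTAMP '2016-05-01 10:00:00', 1.9),
        (TIMESTAMP '2016-05-01 10:10:00', 4.5),
        (TIMESTAMP '2016-05-01 10:20:00', 3.9)
),
Numbered AS (
    -- ROW_NUMBER is still required; only LAG and LEAD are avoided.
    SELECT "Date", Abc, ROW_NUMBER() OVER (ORDER BY "Date") AS Num
    FROM YourTable
),
BeginEndFlagedSet AS (
    SELECT
        Cur."Date",
        Cur.Abc,
        Cur.Num,
        CASE
            -- A missing neighbour yields NULL from the LEFT JOIN,
            -- which mirrors what LAG and LEAD return at the edges.
            WHEN Cur.Abc IS NOT NULL AND Prev.Abc IS NULL THEN 'Start'
            WHEN Cur.Abc IS NOT NULL AND Nxt.Abc IS NULL THEN 'End'
        END AS BeginEndFlag
    FROM Numbered Cur
    LEFT JOIN Numbered Prev ON Prev.Num = Cur.Num - 1
    LEFT JOIN Numbered Nxt  ON Nxt.Num  = Cur.Num + 1
)
SELECT "Date", Abc, Num, BeginEndFlag
FROM BeginEndFlagedSet
ORDER BY Num;
```

The final SELECT only lists the flags so they can be compared with the table above; in the full solution the main query from this answer would follow the CTE instead.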