SQL: counting days with gaps/overlaps

Time: 2018-01-30 22:07:39

Tags: sql oracle proc-sql

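I'm working on a "count days" problem almost identical to this one. I have a list of date ranges and need to count the number of days used, excluding duplicates (overlaps) and handling gaps. Same input and output.

From Markus Jarderot: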

Input
ID   d1           d2
 1   2011-08-01   2011-08-08
 1   2011-08-02   2011-08-06
 1   2011-08-03   2011-08-10
 1   2011-08-12   2011-08-14
 2   2011-08-01   2011-08-03
 2   2011-08-02   2011-08-06
 2   2011-08-05   2011-08-09

Output
ID   hold_days
 1          11
 2           8

SQL to find time elapsed from multiple overlapping intervals
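The following is a minimal Python sketch of what the expected output appears to measure. It is not taken from any of the answers. The sketch sorts each ID's intervals by d1, merges the ones that overlap or touch, and adds up d2 - d1 for each merged range. Reading hold_days as d2 - d1 (the end date is not counted as an extra day) is an inference from the sample output. The function name `hold_days` is hypothetical.

```python
from datetime import date

# Sample rows from the question: (ID, d1, d2)
ROWS = [
    (1, date(2011, 8, 1), date(2011, 8, 8)),
    (1, date(2011, 8, 2), date(2011, 8, 6)),
    (1, date(2011, 8, 3), date(2011, 8, 10)),
    (1, date(2011, 8, 12), date(2011, 8, 14)),
    (2, date(2011, 8, 1), date(2011, 8, 3)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 5), date(2011, 8, 9)),
]


def hold_days(rows):
    """Merge overlapping intervals per ID and sum d2 - d1 over the merged ranges."""
    by_id = {}
    for rid, d1, d2 in rows:
        by_id.setdefault(rid, []).append((d1, d2))
    totals = {}
    for rid, intervals in by_id.items():
        intervals.sort()                        # ascending by d1
        total = 0
        cur_start, cur_end = intervals[0]
        for d1, d2 in intervals[1:]:
            if d1 <= cur_end:                   # overlaps or touches: extend the range
                cur_end = max(cur_end, d2)
            else:                               # gap: close the current range
                total += (cur_end - cur_start).days
                cur_start, cur_end = d1, d2
        total += (cur_end - cur_start).days     # close the last range
        totals[rid] = total
    return totals


print(hold_days(ROWS))  # {1: 11, 2: 8}
```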

But for the life of me, I can't understand Markus Jarderot's solution.

SELECT DISTINCT
    t1.ID,
    t1.d1 AS date,
    -DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) AS n
FROM Orders t1
LEFT JOIN Orders t2                   -- Join for any events occurring while this
    ON t2.ID = t1.ID                  -- is starting. If this is a start point,
    AND t2.d1 <> t1.d1                -- it won't match anything, which is what
    AND t1.d1 BETWEEN t2.d1 AND t2.d2 -- we want.
GROUP BY t1.ID, t1.d1, t1.d2
HAVING COUNT(t2.ID) = 0
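To make the fragment concrete, here is a Python sketch that applies the same test row by row. It uses a hypothetical helper, `start_points`; this is not Markus's code. An interval's d1 is kept only when no other interval of the same ID, with a different d1, covers it (`t2.d1 <= t1.d1 <= t2.d2`). That is what the LEFT JOIN plus `HAVING COUNT(t2.ID) = 0` does: it keeps the rows for which the join found no match. The third value mirrors the negated DATEDIFF from the earliest d1 in the whole table.

```python
from datetime import date

# Sample rows from the question: (ID, d1, d2)
ROWS = [
    (1, date(2011, 8, 1), date(2011, 8, 8)),
    (1, date(2011, 8, 2), date(2011, 8, 6)),
    (1, date(2011, 8, 3), date(2011, 8, 10)),
    (1, date(2011, 8, 12), date(2011, 8, 14)),
    (2, date(2011, 8, 1), date(2011, 8, 3)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 5), date(2011, 8, 9)),
]


def start_points(rows):
    """(ID, d1, n) rows that the quoted fragment would return."""
    ref = min(d1 for _, d1, _ in rows)   # (SELECT MIN(d1) FROM Orders), across all IDs
    result = set()                       # the set plays the role of SELECT DISTINCT
    for id1, d1, _ in rows:
        # COUNT(t2.ID): other intervals of the same ID whose [d1, d2] covers this d1
        matches = sum(
            1
            for id2, e1, e2 in rows
            if id2 == id1 and e1 != d1 and e1 <= d1 <= e2
        )
        if matches == 0:                 # HAVING COUNT(t2.ID) = 0
            result.add((id1, d1, -(d1 - ref).days))
    return sorted(result)


for row in start_points(ROWS):
    print(row)
```

On the question's data, the sketch keeps 2011-08-01 and 2011-08-12 for ID 1 and 2011-08-01 for ID 2. These are the starts of the merged ranges; every other d1 falls inside an earlier interval.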

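Why does DATEDIFF(DAY, (SELECT MIN(d1) FROM Orders), t1.d1) take MIN(d1) from the entire list? That ignores the ID.

What does t1.d1 BETWEEN t2.d1 AND t2.d2 do? Does it make sure that only overlapping intervals are counted?

Same with the GROUP BY: I think it's there so that rows for the same period get discarded? I tried to trace the solution by hand but only got more confused.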
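One way to see why the reference date can come from the whole table is sketched below. This assumes that the rest of Markus's query, which is not quoted above, adds a matching +DATEDIFF(DAY, ref, d2) for every end point and sums per ID. Under that assumption, each start contributes -(start - ref) and each end contributes +(end - ref). With one start per end, the reference cancels and only end - start remains, so MIN(d1) is just a convenient constant. The Python sketch checks this on ID 1's merged ranges with two arbitrary references. The helper `offset_sum` is hypothetical.

```python
from datetime import date


def offset_sum(starts, ends, ref):
    """Sum of -(start - ref) over starts plus (end - ref) over ends, in days."""
    return (sum(-(s - ref).days for s in starts)
            + sum((e - ref).days for e in ends))


# Start and end points of ID 1's merged ranges in the question's data
starts = [date(2011, 8, 1), date(2011, 8, 12)]
ends = [date(2011, 8, 10), date(2011, 8, 14)]

print(offset_sum(starts, ends, date(2011, 8, 1)))  # 11
print(offset_sum(starts, ends, date(2000, 1, 1)))  # 11
```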

4 answers:

Answer 0 (score: 1)

This is mostly a repeat of my answer here (including an explanation), but with grouping on the id column added. It should use a single table scan and does not need a recursive subquery factoring clause (CTE) or self-joins.

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE your_table ( id, usr, start_date, end_date ) AS
  SELECT 1, 'A', DATE '2017-06-01', DATE '2017-06-03' FROM DUAL UNION ALL
  SELECT 1, 'B', DATE '2017-06-02', DATE '2017-06-04' FROM DUAL UNION ALL -- Overlaps previous
  SELECT 1, 'C', DATE '2017-06-06', DATE '2017-06-06' FROM DUAL UNION ALL
  SELECT 1, 'D', DATE '2017-06-07', DATE '2017-06-07' FROM DUAL UNION ALL -- Adjacent to previous
  SELECT 1, 'E', DATE '2017-06-11', DATE '2017-06-20' FROM DUAL UNION ALL
  SELECT 1, 'F', DATE '2017-06-14', DATE '2017-06-15' FROM DUAL UNION ALL -- Within previous
  SELECT 1, 'G', DATE '2017-06-22', DATE '2017-06-25' FROM DUAL UNION ALL
  SELECT 1, 'H', DATE '2017-06-24', DATE '2017-06-28' FROM DUAL UNION ALL -- Overlaps previous and next
  SELECT 1, 'I', DATE '2017-06-27', DATE '2017-06-30' FROM DUAL UNION ALL
  SELECT 1, 'J', DATE '2017-06-27', DATE '2017-06-28' FROM DUAL UNION ALL -- Within H and I
  SELECT 2, 'K', DATE '2011-08-01', DATE '2011-08-08' FROM DUAL UNION ALL -- Your data below
  SELECT 2, 'L', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
  SELECT 2, 'M', DATE '2011-08-03', DATE '2011-08-10' FROM DUAL UNION ALL
  SELECT 2, 'N', DATE '2011-08-12', DATE '2011-08-14' FROM DUAL UNION ALL
  SELECT 3, 'O', DATE '2011-08-01', DATE '2011-08-03' FROM DUAL UNION ALL
  SELECT 3, 'P', DATE '2011-08-02', DATE '2011-08-06' FROM DUAL UNION ALL
  SELECT 3, 'Q', DATE '2011-08-05', DATE '2011-08-09' FROM DUAL;

Query 1:

SELECT id,
       SUM( days ) AS total_days
FROM   (
  SELECT id,
         dt - LAG( dt ) OVER ( PARTITION BY id
                               ORDER BY dt ) + 1 AS days,
         start_end
  FROM   (
    SELECT id,
           dt,
           CASE SUM( value ) OVER ( PARTITION BY id
                                    ORDER BY dt ASC, value DESC, ROWNUM ) * value
             WHEN 1 THEN 'start'
             WHEN 0 THEN 'end'
           END AS start_end
    FROM   your_table
    UNPIVOT ( dt FOR value IN ( start_date AS 1, end_date AS -1 ) )
  )
  WHERE start_end IS NOT NULL
)
WHERE start_end = 'end'
GROUP BY id
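For comparison, the running-sum logic of this query can be mirrored in a Python sketch. It assumes inclusive ranges, as in the query, and `total_days` is a hypothetical name. The sketch works in four steps:

- Each row becomes a +1 event at start_date and a -1 event at end_date (the UNPIVOT).
- Events are ordered by date, with starts before ends on the same day (`value DESC`).
- A merged range opens when the running sum goes from 0 to 1 and closes when it drops back to 0.
- Each closed range adds end - start + 1 days (`dt - LAG(dt) + 1`).

```python
from datetime import date
from itertools import groupby

# (id, start_date, end_date) rows from the your_table setup above
RAW = [
    (1, "2017-06-01", "2017-06-03"), (1, "2017-06-02", "2017-06-04"),
    (1, "2017-06-06", "2017-06-06"), (1, "2017-06-07", "2017-06-07"),
    (1, "2017-06-11", "2017-06-20"), (1, "2017-06-14", "2017-06-15"),
    (1, "2017-06-22", "2017-06-25"), (1, "2017-06-24", "2017-06-28"),
    (1, "2017-06-27", "2017-06-30"), (1, "2017-06-27", "2017-06-28"),
    (2, "2011-08-01", "2011-08-08"), (2, "2011-08-02", "2011-08-06"),
    (2, "2011-08-03", "2011-08-10"), (2, "2011-08-12", "2011-08-14"),
    (3, "2011-08-01", "2011-08-03"), (3, "2011-08-02", "2011-08-06"),
    (3, "2011-08-05", "2011-08-09"),
]
ROWS = [(i, date.fromisoformat(s), date.fromisoformat(e)) for i, s, e in RAW]


def total_days(rows):
    # UNPIVOT: one +1 event per start_date, one -1 event per end_date
    events = [(rid, d1, 1) for rid, d1, _ in rows] + [(rid, d2, -1) for rid, _, d2 in rows]
    # PARTITION BY id ORDER BY dt ASC, value DESC: starts sort before ends on the same day
    events.sort(key=lambda e: (e[0], e[1], -e[2]))
    totals = {}
    for rid, group in groupby(events, key=lambda e: e[0]):
        running, opened = 0, None
        for _, dt, value in group:
            running += value
            if value == 1 and running == 1:        # 'start': first open row
                opened = dt
            elif value == -1 and running == 0:     # 'end': last open row closed
                totals[rid] = totals.get(rid, 0) + (dt - opened).days + 1
    return totals


print(total_days(ROWS))  # {1: 25, 2: 13, 3: 9}
```

Because both ends of each merged range are counted, the question's data (IDs 2 and 3 here) comes out one day per merged range higher than the expected 11 and 8.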

Answer 1 (score: 0)

The brute-force approach is to generate all the days (in a recursive query) and then count them:

with dates(id, day, d2) as
(
  select id, d1 as day, d2 from mytable
  union all
  select id, day + 1, d2 from dates where day < d2
)
select id, count(distinct day)
from dates
group by id
order by id;

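Unfortunately, some Oracle versions have a bug that makes recursive queries over dates fail. So try this code and see whether it works on your system. (I have Oracle 11.2, where the bug is still present, so I suppose you'll need Oracle 12c.)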
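The same brute-force idea is easy to check outside Oracle. The Python sketch below expands every interval into its individual days and counts distinct days per ID. Both ends are included, as in the recursive CTE, which keeps adding `day + 1` while `day < d2`. The name `days_used` is hypothetical. Because d2 counts as a used day, the sketch returns 13 and 9 on the question's data rather than 11 and 8.

```python
from datetime import date, timedelta

# Sample rows from the question: (ID, d1, d2)
ROWS = [
    (1, date(2011, 8, 1), date(2011, 8, 8)),
    (1, date(2011, 8, 2), date(2011, 8, 6)),
    (1, date(2011, 8, 3), date(2011, 8, 10)),
    (1, date(2011, 8, 12), date(2011, 8, 14)),
    (2, date(2011, 8, 1), date(2011, 8, 3)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 5), date(2011, 8, 9)),
]


def days_used(rows):
    days = {}
    for rid, d1, d2 in rows:
        day = d1
        while day <= d2:                 # d1 through d2 inclusive, like the recursive CTE
            days.setdefault(rid, set()).add(day)
            day += timedelta(days=1)
    # count(distinct day) ... group by id order by id
    return {rid: len(s) for rid, s in sorted(days.items())}


print(days_used(ROWS))  # {1: 13, 2: 9}
```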

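Answer 2 (score: 0)

If all intervals start on different dates, consider them in ascending order of d1. For each one, count the days from its d1 to the start of the next interval that overlaps its end. You can discard intervals that are contained in another interval. The last interval has no successor, so it counts up to its own d2.

This query should give you the number of days contributed by each interval: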

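select a.id, a.d1, nvl(min(b.d1), a.d2) - a.d1 as days
from orders a
left join orders b
  on a.id = b.id and a.d1 < b.d1 and a.d2 between b.d1 and b.d2
where not exists (                 -- discard intervals contained in another one
  select * from orders c
  where c.id = a.id and c.d1 < a.d1 and c.d2 >= a.d2
)
group by a.id, a.d1, a.d2          -- a.d2 is used in the select list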

Then group by ID and sum.
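The steps of this answer can be sketched in Python as follows. Like the answer, the sketch assumes that no two intervals of an ID share a start date. It first drops intervals contained in another one. Each remaining interval then contributes the days from its d1 to the earliest later-starting interval that covers its d2, or to its own d2 if there is none, and the contributions are summed per ID. The name `successor_days` is hypothetical.

```python
from datetime import date

# Sample rows from the question: (ID, d1, d2)
ROWS = [
    (1, date(2011, 8, 1), date(2011, 8, 8)),
    (1, date(2011, 8, 2), date(2011, 8, 6)),
    (1, date(2011, 8, 3), date(2011, 8, 10)),
    (1, date(2011, 8, 12), date(2011, 8, 14)),
    (2, date(2011, 8, 1), date(2011, 8, 3)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 5), date(2011, 8, 9)),
]


def successor_days(rows):
    by_id = {}
    for rid, d1, d2 in rows:
        by_id.setdefault(rid, []).append((d1, d2))
    totals = {}
    for rid, ivs in sorted(by_id.items()):
        total = 0
        for a1, a2 in ivs:
            # Discard intervals contained in another one
            if any(c1 < a1 and c2 >= a2 for c1, c2 in ivs):
                continue
            # Joined b rows: start after a.d1 and cover a.d2
            later = [b1 for b1, b2 in ivs if a1 < b1 and b1 <= a2 <= b2]
            # nvl(min(b.d1), a.d2) - a.d1
            total += ((min(later) if later else a2) - a1).days
        totals[rid] = total              # then group by ID and sum
    return totals


print(successor_days(ROWS))  # {1: 11, 2: 8}
```

Without the containment filter, ID 1's interval 2011-08-02 to 2011-08-06 would add one extra day, and the sketch would report 12.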

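Answer 3 (score: 0)

I guess Markus' idea is to find all start points that don't lie inside another range, and all end points that don't lie inside another range. Then take the first start point up to the first end point, the next start point up to the next end point, and so on. Since Markus didn't use window functions to number the start and end points, he had to find a more complicated way to achieve this. Below is a query using ROW_NUMBER. Maybe it gives you a starting point for understanding what Markus' query is looking for.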

select startpoint.id, sum(endpoint.day - startpoint.day)
from
(
  select id, d1 as day, row_number() over (partition by id order by d1) as rn
  from mytable m1
  where not exists
  (
    select *
    from mytable m2
    where m1.id = m2.id 
    and m1.d1 > m2.d1 and m1.d1 <= m2.d2
  )
) startpoint
join
(
  select id, d2 as day, row_number() over (partition by id order by d1) as rn
  from mytable m1
  where not exists
  (
    select *
    from mytable m2
    where m1.id = m2.id 
    and m1.d2 >= m2.d1 and m1.d2 < m2.d2
  )
) endpoint on endpoint.id = startpoint.id and endpoint.rn = startpoint.rn
group by startpoint.id
order by startpoint.id;
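Below is a Python sketch of the same pairing. It mirrors the two NOT EXISTS filters and the join on `rn`, and `paired_days` is a hypothetical name. As in the SQL, duplicate start or end dates within an ID would produce extra points and can break the pairing, so the sketch assumes there are none.

```python
from datetime import date

# Sample rows from the question: (ID, d1, d2)
ROWS = [
    (1, date(2011, 8, 1), date(2011, 8, 8)),
    (1, date(2011, 8, 2), date(2011, 8, 6)),
    (1, date(2011, 8, 3), date(2011, 8, 10)),
    (1, date(2011, 8, 12), date(2011, 8, 14)),
    (2, date(2011, 8, 1), date(2011, 8, 3)),
    (2, date(2011, 8, 2), date(2011, 8, 6)),
    (2, date(2011, 8, 5), date(2011, 8, 9)),
]


def paired_days(rows):
    totals = {}
    for rid in sorted({r[0] for r in rows}):
        ivs = sorted((d1, d2) for i, d1, d2 in rows if i == rid)  # ordered by d1
        # startpoint: no interval has m2.d1 < d1 <= m2.d2
        starts = [d1 for d1, _ in ivs
                  if not any(m1 < d1 <= m2 for m1, m2 in ivs)]
        # endpoint: no interval has m2.d1 <= d2 < m2.d2
        ends = [d2 for _, d2 in ivs
                if not any(m1 <= d2 < m2 for m1, m2 in ivs)]
        # join on rn: pair the k-th start point with the k-th end point
        totals[rid] = sum((e - s).days for s, e in zip(starts, ends))
    return totals


print(paired_days(ROWS))  # {1: 11, 2: 8}
```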