Select rows where end_date >= start_date in SQL for each ID that repeats multiple times

Time: 2018-12-16 21:55:47

Tags: sql sql-server

My table has 3 columns, id, start date and end date, with values like these:

id     start date  end date
-------------------------------
100    2015-01-01  2015-12-31
100    2016-01-10  2018-12-31
200    2015-02-15  2016-03-15
200    2016-03-15  2016-12-31
300    2016-01-01  2016-12-31
400    2017-01-01  2017-12-31
500    2017-02-01  2017-12-31
600    2017-01-15  2017-03-05
600    2017-02-01  2018-12-31

I want the output to be:

id     start date  end date
--------------------------------
100    2015-01-01  2015-12-31
100    2016-01-10  2018-12-31
200    2015-02-15  2016-12-31
300    2016-01-01  2016-12-31
400    2017-01-01  2017-12-31
500    2017-02-01  2017-12-31
600    2017-01-15  2018-12-31

Query:

select 
    id, *
from 
    dbo.test_sl 
where 
    id in (select id
           from dbo.test_sl 
           where end_date >= start_date 
           group by id)

Please help me get the desired output.


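3 Answers:

Answer 0: (score: 1)

Assuming that at most two records ever need to be merged together, you can LEFT JOIN the table to itself and then use CASE to show the end date of the self-joined record, if there is one.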

SELECT
    t1.id,
    MIN(t1.start_date) AS start_date,
    -- use the end date of the overlapping later row, if there is one
    CASE WHEN t2.end_date IS NULL THEN t1.end_date ELSE t2.end_date END AS end_date
FROM
    dbo.test_sl t1
    LEFT JOIN dbo.test_sl t2 
        ON  t1.id = t2.id 
        AND t2.start_date > t1.start_date
        AND t2.start_date <= t1.end_date
GROUP BY 
    t1.id,
    CASE WHEN t2.end_date IS NULL THEN t1.end_date ELSE t2.end_date END
ORDER BY 1

Tested with this SQL Fiddle.
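A quick way to see why the two-record assumption matters is to run the same self-join on a chain of three overlapping rows. The sketch below does that with Python's built-in sqlite3 module rather than SQL Server; the id 700 and its dates are made up for illustration and are not part of the question's data.

```python
import sqlite3

# Hypothetical chain for one id: each row overlaps only the next one.
rows = [
    (700, "2019-01-01", "2019-03-31"),
    (700, "2019-03-01", "2019-06-30"),
    (700, "2019-06-01", "2019-09-30"),
]

conn = sqlite3.connect(":memory:")
# Dates are stored as ISO text, which compares correctly as plain strings.
conn.execute("CREATE TABLE test_sl (id INTEGER, start_date TEXT, end_date TEXT)")
conn.executemany("INSERT INTO test_sl VALUES (?, ?, ?)", rows)

# Same self-join as in the answer, run against the three-row chain.
query = """
SELECT
    t1.id,
    MIN(t1.start_date) AS start_date,
    CASE WHEN t2.end_date IS NULL THEN t1.end_date ELSE t2.end_date END AS end_date
FROM test_sl t1
LEFT JOIN test_sl t2
    ON  t1.id = t2.id
    AND t2.start_date > t1.start_date
    AND t2.start_date <= t1.end_date
GROUP BY
    t1.id,
    CASE WHEN t2.end_date IS NULL THEN t1.end_date ELSE t2.end_date END
ORDER BY 1, 2
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The chain comes back as two rows (2019-01-01 to 2019-06-30 and 2019-03-01 to 2019-09-30) rather than one row spanning 2019-01-01 to 2019-09-30, so this approach fits only data where no more than two rows per id overlap.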

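Answer 1: (score: 1)

This is an example of a gaps-and-islands problem. Here you are looking for rows that do not overlap an earlier row for the same id; each such row is the start of a group. A cumulative sum of these group-start flags gives a group number, which can then be used for aggregation.

In a query, this looks like: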

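select id, min(start_date) as start_date, max(end_date) as end_date
from (select t.*,
             -- running total of the start flags numbers the groups within each id
             sum(isstart) over (partition by id order by start_date) as grp
      from (select t.*,
                   -- 1 when no earlier row for the same id reaches this row's start_date
                   (case when exists (select 1
                                      from test_sl t2
                                      where t2.id = t.id and
                                            t2.start_date < t.start_date and
                                            t2.end_date >= t.start_date
                                     )
                         then 0 else 1
                     end) as isstart
            from test_sl t
           ) t
      ) t
group by id, grp;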
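To make the start-flag and cumulative-sum idea concrete, here is a minimal Python sketch of the same logic applied to the question's sample rows. It is a re-expression for illustration, not the SQL itself: the function names is_start and merge_islands are made up, and it assumes no two rows of the same id share a start_date.

```python
from itertools import groupby

# Sample rows from the question: (id, start_date, end_date) as ISO strings,
# which compare correctly as plain text.
rows = [
    (100, "2015-01-01", "2015-12-31"),
    (100, "2016-01-10", "2018-12-31"),
    (200, "2015-02-15", "2016-03-15"),
    (200, "2016-03-15", "2016-12-31"),
    (300, "2016-01-01", "2016-12-31"),
    (400, "2017-01-01", "2017-12-31"),
    (500, "2017-02-01", "2017-12-31"),
    (600, "2017-01-15", "2017-03-05"),
    (600, "2017-02-01", "2018-12-31"),
]

def is_start(row, all_rows):
    """Return 1 if no earlier-starting row of the same id reaches this row's start."""
    rid, start, _ = row
    overlapped = any(
        oid == rid and ostart < start and oend >= start
        for oid, ostart, oend in all_rows
    )
    return 0 if overlapped else 1

def merge_islands(all_rows):
    """Group rows by (id, running sum of start flags), then take min start / max end."""
    merged = {}  # (id, grp) -> [min start_date, max end_date]
    ordered = sorted(all_rows, key=lambda r: (r[0], r[1]))
    for rid, id_rows in groupby(ordered, key=lambda r: r[0]):
        grp = 0
        for row in id_rows:
            grp += is_start(row, all_rows)  # running sum; assumes unique start_date per id
            _, start, end = row
            if (rid, grp) not in merged:
                merged[(rid, grp)] = [start, end]
            else:
                span = merged[(rid, grp)]
                span[0] = min(span[0], start)
                span[1] = max(span[1], end)
    return [(rid, s, e) for (rid, _), (s, e) in sorted(merged.items())]

for merged_row in merge_islands(rows):
    print(merged_row)
```

On the sample data this yields the seven rows listed as the desired output in the question.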

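Answer 2: (score: 0)

Here is a solution that uses a recursive CTE.

Basically, it loops through the dates of each id in start_date order and, whenever the previous row's end_date overlaps the current row's start_date, carries the earlier start_date forward.

The results are then grouped by that carried start_date, so no overlaps remain.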

Tested online here

WITH SRC AS
(
  SELECT id, start_date, end_date, 
   row_number() over (partition by id order by start_date) as rn
  FROM test_sl
)
, RCTE AS
(
  SELECT id, rn, start_date, end_date
  FROM SRC
  WHERE rn = 1

  UNION ALL

  SELECT t.id, t.rn, iif(r.end_date >= t.start_date, r.start_date, t.start_date), t.end_date
  FROM RCTE r
  JOIN SRC t ON t.id = r.id AND t.rn = r.rn + 1
)
SELECT id, start_date, max(end_date) as end_date
FROM RCTE
GROUP BY id, start_date
ORDER BY id, start_date;
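The carry-forward step of the recursive CTE can also be read as a plain loop. The Python sketch below mirrors it on the question's sample rows: it keeps the previous row's carried start when that row's end reaches the current start, then takes the maximum end per (id, start). The function name carry_start_forward is hypothetical, and this is an illustration of the idea rather than a replacement for the query.

```python
from itertools import groupby

# Sample rows from the question: (id, start_date, end_date) as ISO strings.
rows = [
    (100, "2015-01-01", "2015-12-31"),
    (100, "2016-01-10", "2018-12-31"),
    (200, "2015-02-15", "2016-03-15"),
    (200, "2016-03-15", "2016-12-31"),
    (300, "2016-01-01", "2016-12-31"),
    (400, "2017-01-01", "2017-12-31"),
    (500, "2017-02-01", "2017-12-31"),
    (600, "2017-01-15", "2017-03-05"),
    (600, "2017-02-01", "2018-12-31"),
]

def carry_start_forward(all_rows):
    best_end = {}  # (id, carried start_date) -> max end_date
    ordered = sorted(all_rows, key=lambda r: (r[0], r[1]))  # like row_number() per id
    for rid, id_rows in groupby(ordered, key=lambda r: r[0]):
        prev_start = prev_end = None
        for _, start, end in id_rows:
            # Same test as the iif(): on overlap, keep the previous (carried) start.
            if prev_end is not None and prev_end >= start:
                start = prev_start
            key = (rid, start)
            best_end[key] = max(best_end.get(key, end), end)
            prev_start, prev_end = start, end
    # Equivalent of the final GROUP BY id, start_date with max(end_date).
    return [(rid, s, e) for (rid, s), e in sorted(best_end.items())]

for merged_row in carry_start_forward(rows):
    print(merged_row)
```

As in the CTE, each step compares against the previous row's own end_date only, which is enough for the sample data shown in the question.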