Max in a time range with duplicate dates

Time: 2010-10-28 10:02:13

Tags: sql sql-server-2005 aggregate

I have a table like this:

DateTime start_time not null,
DateTime end_time not null,
Status_Id int not null,
Entry_Id int not null

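I want to get a count of each status within a time period, where only the last status is valid for a given entry_id.

This is what I use now (the dates are dynamic):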

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(*) as cnt from 
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
GROUP BY Status_Id WITH ROLLUP

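The problem is that it goes wrong when an entry_id has several rows with the same start_date. (In that case I don't particularly care which status is chosen, as long as only one is.)

Some test data: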

status_id   Entry_id    Start_date
496         45173       2010-09-29 18:04:33.000
490         45173       2010-09-29 18:48:20.100
495         45173       2010-09-29 19:25:29.300
489         45174       2010-09-29 18:43:01.500
493         45175       2010-09-29 18:48:00.500
493         45175       2010-09-29 21:16:02.700
489         45175       2010-09-30 17:52:12.100
493         45176       2010-09-29 17:55:21.300
492         45176       2010-09-29 18:20:52.200 <------ This is the one that gives the problems
493         45176       2010-09-29 18:20:52.200 <------ This is the one that gives the problems

The result should be:

495 1
489 2
492 1 (or 493 1)
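
The tie can be reproduced from the sample rows alone. The following is a minimal sketch, assuming a table variable named @c (a name introduced here purely for illustration) that stands in for the output of the CTE c. It uses INSERT ... SELECT ... UNION ALL so that it should also run on SQL Server 2005.

```sql
-- @c is a hypothetical stand-in for the rows returned by the CTE c in the question.
DECLARE @c TABLE (
  Status_Id  int      NOT NULL,
  Entry_Id   int      NOT NULL,
  Start_Date datetime NOT NULL
);

-- Sample rows copied from the question (ISO 8601 literals avoid DATEFORMAT issues).
INSERT INTO @c (Status_Id, Entry_Id, Start_Date)
SELECT 496, 45173, '2010-09-29T18:04:33.000' UNION ALL
SELECT 490, 45173, '2010-09-29T18:48:20.100' UNION ALL
SELECT 495, 45173, '2010-09-29T19:25:29.300' UNION ALL
SELECT 489, 45174, '2010-09-29T18:43:01.500' UNION ALL
SELECT 493, 45175, '2010-09-29T18:48:00.500' UNION ALL
SELECT 493, 45175, '2010-09-29T21:16:02.700' UNION ALL
SELECT 489, 45175, '2010-09-30T17:52:12.100' UNION ALL
SELECT 493, 45176, '2010-09-29T17:55:21.300' UNION ALL
SELECT 492, 45176, '2010-09-29T18:20:52.200' UNION ALL
SELECT 493, 45176, '2010-09-29T18:20:52.200';

-- Same join as the original query: both tied rows of entry 45176 survive it.
SELECT c.Status_Id, COUNT(*) AS cnt
FROM (SELECT Entry_Id, MAX(Start_Date) AS Start_Date
      FROM @c
      GROUP BY Entry_Id) AS d
INNER JOIN @c AS c
  ON c.Entry_Id = d.Entry_Id
 AND c.Start_Date = d.Start_Date
GROUP BY c.Status_Id WITH ROLLUP;
```

With these rows, entry 45176 contributes one count to status 492 and one to status 493, so the ROLLUP total comes out as 5 even though there are only 4 entries.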

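3 answers:

Answer 0: (score: 2)

If I understand correctly, you want to count the distinct entries that have a particular status within your time period. If so, use DISTINCT inside count(), changing count(*) to count(distinct Entry_Id):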

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(distinct Entry_Id) as cnt from 
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
GROUP BY Status_Id WITH ROLLUP

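Edit

As long as you don't care which status is returned for a given entry, I think you can modify the inner query to return the first status and then join on the status as well: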

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select c.Status_Id, count(c.Entry_Id) as cnt from 
 (select Entry_Id, Start_Date, (select top 1 Status_id from c where Entry_Id = CC.Entry_Id and Start_Date = CC.Start_Date) as Status_Id
  from (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) as CC) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
and c.status_id = d.status_id
GROUP BY c.Status_Id

Result

Status_id Count
 489       2
 492       1
 495       1

Answer 1: (score: 1)

An alternative answer, based on the OP's lovely comments.

WITH
   [sequenced_data]
AS
(
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY entry_id ORDER BY start_time DESC, status_id DESC) AS [sequence_id]
  FROM
    tbl
  WHERE
    start_time < '21:00' AND end_time > '19:00'
)
SELECT status_id, COUNT(*)
FROM [sequenced_data]
WHERE sequence_id = 1
GROUP BY status_id

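The ROW_NUMBER() function is only needed when no single field uniquely identifies an individual record. Where the data does have a uniquely identifying column, an alternative query can be written. That said, SQL Server is very good at optimizing ROW_NUMBER() queries like the one above, and it should be efficient (assuming the relevant indexes exist).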
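
As a rough sketch of what such an alternative might look like, the query below assumes a unique column named id on tbl. That column is hypothetical and is not part of the schema in the question. The variables @range_start and @range_end are also introduced here for illustration, with placeholder values borrowed from the question's example dates.

```sql
-- Placeholder range bounds; SQL Server 2005 needs DECLARE and SET as separate steps.
DECLARE @range_start datetime, @range_end datetime;
SET @range_start = '19000101';
SET @range_end   = '21000101';

-- Assumes a hypothetical unique column [id] on tbl (not in the original schema).
SELECT t.status_id, COUNT(*) AS cnt
FROM tbl AS t
WHERE t.start_time < @range_end
  AND t.end_time > @range_start
  AND t.id = (
        -- Pick exactly one row per entry: latest start_time, ties broken by status_id, then id.
        SELECT TOP 1 x.id
        FROM tbl AS x
        WHERE x.entry_id = t.entry_id
          AND x.start_time < @range_end
          AND x.end_time > @range_start
        ORDER BY x.start_time DESC, x.status_id DESC, x.id DESC
      )
GROUP BY t.status_id;
```

Because id is unique, the correlated subquery matches a single row per entry_id, so no entry can be counted twice.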

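Edit

Someone just suggested to me that people don't like long code and prefer compact code. So the CTE version has been replaced with an inline version (the CTE really only helped break the query down for the sake of explanation, and it is still in the edit history if needed)...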

Edit

ROW_NUMBER() can't be used in a WHERE clause, as the OP found out. I have updated the query by putting a CTE back in.
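
Window functions such as ROW_NUMBER() are only allowed in the SELECT and ORDER BY clauses, so the row number has to be computed in an inner query before it can be filtered. A derived table works as well as a CTE for this. The sketch below assumes the same tbl columns as the query above, and the variables @range_start and @range_end are placeholders introduced here for illustration.

```sql
-- Placeholder range bounds, using the question's example dates.
DECLARE @range_start datetime, @range_end datetime;
SET @range_start = '19000101';
SET @range_end   = '21000101';

SELECT status_id, COUNT(*) AS cnt
FROM (
       -- The inner query numbers each entry's rows, newest first; status_id breaks ties.
       SELECT status_id,
              ROW_NUMBER() OVER (PARTITION BY entry_id
                                 ORDER BY start_time DESC, status_id DESC) AS sequence_id
       FROM tbl
       WHERE start_time < @range_end
         AND end_time > @range_start
     ) AS sequenced_data
WHERE sequence_id = 1   -- filtering on the alias is legal here, outside the derived table
GROUP BY status_id;
```

Since status_id DESC is the tie-breaker, an entry with two rows at the same start_time is counted under the higher of the two status values.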

Answer 2: (score: 0)

I found a solution myself:

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(*) as cnt from 
(select max(Status_Id) as Status_Id, c.Entry_Id from --<--- ADDED
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
group by c.Entry_Id) y  --<--- ADDED
GROUP BY Status_Id WITH ROLLUP