Question

I have a table like this:

DateTime start_time not null,
DateTime end_time not null,
Status_Id int not null,
Entry_Id int not null

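I want to get a count of each status within a time period, where only the last status for a given entry_id counts.

This is what I use now (the dates are dynamic):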

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(*) as cnt from 
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
GROUP BY Status_Id WITH ROLLUP

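The problem is that this gives wrong results when an entry_id has several rows with the same start_date: that entry is then counted once per tied row. (In that case I don't particularly care which status is chosen, as long as only one is picked.)

Some test data: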

status_id   Entry_id    Start_date
496 45173   2010-09-29 18:04:33.000
490 45173   2010-09-29 18:48:20.100
495 45173   2010-09-29 19:25:29.300
489 45174   2010-09-29 18:43:01.500
493 45175   2010-09-29 18:48:00.500
493 45175   2010-09-29 21:16:02.700
489 45175   2010-09-30 17:52:12.100
493 45176   2010-09-29 17:55:21.300
492 45176   2010-09-29 18:20:52.200 <------ This is the one that gives the problems
493 45176   2010-09-29 18:20:52.200 <------ This is the one that gives the problems

The result should be

495 1
489 2
492 1 (or 493 1)
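To make the failure concrete, here is a minimal sketch that loads the test data above into an in-memory SQLite database and runs the same join on max(start_date). SQLite stands in for SQL Server here purely so the example is runnable; the date filter and WITH ROLLUP are left out as assumptions of the sketch, and the table has only the three columns shown in the test data.

```python
import sqlite3

# Test rows from the question: (status_id, entry_id, start_date).
ROWS = [
    (496, 45173, "2010-09-29 18:04:33.000"),
    (490, 45173, "2010-09-29 18:48:20.100"),
    (495, 45173, "2010-09-29 19:25:29.300"),
    (489, 45174, "2010-09-29 18:43:01.500"),
    (493, 45175, "2010-09-29 18:48:00.500"),
    (493, 45175, "2010-09-29 21:16:02.700"),
    (489, 45175, "2010-09-30 17:52:12.100"),
    (493, 45176, "2010-09-29 17:55:21.300"),
    (492, 45176, "2010-09-29 18:20:52.200"),
    (493, 45176, "2010-09-29 18:20:52.200"),
]

# Same shape as the original query, minus the date filter and WITH ROLLUP.
NAIVE_SQL = """
select c.status_id, count(*) as cnt
from (select entry_id, max(start_date) as start_date
      from tbl group by entry_id) d
inner join tbl c
  on c.entry_id = d.entry_id and c.start_date = d.start_date
group by c.status_id
order by c.status_id
"""


def naive_counts():
    """Return (status_id, cnt) rows produced by the join on max(start_date)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tbl (status_id int, entry_id int, start_date text)")
    conn.executemany("insert into tbl values (?, ?, ?)", ROWS)
    result = conn.execute(NAIVE_SQL).fetchall()
    conn.close()
    return result


counts = naive_counts()
print(counts)
```

Entry 45176 matches the join twice (once for 492 and once for 493), so the counts add up to 5 even though there are only 4 distinct entries.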

Answer 1

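If I understand correctly, you want to count the distinct entries that have a particular status within your time period. If so, use DISTINCT inside count(), changing count(*) to count(distinct Entry_Id):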

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(distinct Entry_Id) as cnt from 
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
GROUP BY Status_Id WITH ROLLUP

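Edit

As long as you don't care which status is returned for a given entry, I think you can modify the inner query to return the first status and join on the status as well: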

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select c.Status_Id, count(c.Entry_Id) as cnt from
 (select Entry_Id, Start_Date,
    (select top 1 Status_id from c
     where Entry_Id = CC.Entry_Id
     and Start_Date = CC.Start_Date) as Status_Id
  from (select Entry_Id, max(start_date) as start_date from c
        group by Entry_Id) as CC) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
and c.status_id = d.status_id
GROUP BY c.Status_Id

Result

Status_id   Count
489 2
492 1
495 1

Answer 2

An alternative answer, based on the OP's lovely comment.

WITH
   [sequenced_data]
AS
(
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY entry_id ORDER BY start_time DESC, status_id DESC) AS [sequence_id]
  FROM
    tbl
  WHERE
    start_time < '21:00' AND end_time > '19:00'
)
SELECT status_id, COUNT(*)
FROM [sequenced_data]
WHERE sequence_id = 1
GROUP BY status_id

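The ROW_NUMBER() function is only needed when no single field uniquely identifies an individual record. If the data has a uniquely identifying column, the query can be written another way. That said, SQL Server is very good at optimizing the ROW_NUMBER() query above, and it should perform well (assuming the relevant indexes exist).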
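As a rough check of how the tie-break behaves, the sketch below runs the same ROW_NUMBER() pattern on the question's test data in an in-memory SQLite database. The assumptions are mine: SQLite (3.25 or later, for window functions) stands in for SQL Server, the time-range filter is dropped, and the column is called start_date as in the test data rather than start_time.

```python
import sqlite3

# Test rows from the question: (status_id, entry_id, start_date).
ROWS = [
    (496, 45173, "2010-09-29 18:04:33.000"),
    (490, 45173, "2010-09-29 18:48:20.100"),
    (495, 45173, "2010-09-29 19:25:29.300"),
    (489, 45174, "2010-09-29 18:43:01.500"),
    (493, 45175, "2010-09-29 18:48:00.500"),
    (493, 45175, "2010-09-29 21:16:02.700"),
    (489, 45175, "2010-09-30 17:52:12.100"),
    (493, 45176, "2010-09-29 17:55:21.300"),
    (492, 45176, "2010-09-29 18:20:52.200"),
    (493, 45176, "2010-09-29 18:20:52.200"),
]

# Number the rows of each entry from newest to oldest; status_id DESC
# breaks ties on start_date, so exactly one row per entry gets sequence_id = 1.
ROW_NUMBER_SQL = """
with sequenced_data as (
  select status_id, entry_id,
         row_number() over (partition by entry_id
                            order by start_date desc, status_id desc)
           as sequence_id
  from tbl
)
select status_id, count(*) as cnt
from sequenced_data
where sequence_id = 1
group by status_id
order by status_id
"""


def latest_status_counts():
    """Count entries per status, using only each entry's latest row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tbl (status_id int, entry_id int, start_date text)")
    conn.executemany("insert into tbl values (?, ?, ?)", ROWS)
    result = conn.execute(ROW_NUMBER_SQL).fetchall()
    conn.close()
    return result


row_number_counts = latest_status_counts()
print(row_number_counts)
```

Because of the status_id DESC tie-break, entry 45176 is counted under 493 rather than 492, which the question says is acceptable.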

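Edit

Someone just suggested to me that people don't like long code and prefer it compact. So the CTE version was replaced with an inline version (the CTE really only helped break the query down for explanation, and it is still in the edit history if needed)...

Edit

ROW_NUMBER() cannot be part of a WHERE clause, as the OP discovered. Query updated by putting the CTE back in.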

Answer 3

I found a solution myself:

with c (Status_Id, Entry_Id, Start_Date) AS (
  select Status_Id, Entry_Id, Start_Date from tbl where
  (End_Date BETWEEN '19000101' AND '21000101')
  AND ((Start_Date BETWEEN '19000101' AND '21000101')
  OR End_Date <= '21000101'))
select Status_Id, count(*) as cnt from 
(select max(Status_Id) as Status_Id, c.Entry_Id from --<--- ADDED
 (select Entry_Id, max(start_date) as start_date from c
  group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
group by c.Entry_Id) y  --<--- ADDED
GROUP BY Status_Id WITH ROLLUP
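
For comparison, here is a small sketch of the same max(Status_Id) tie-break run against the question's test data. It assumes an in-memory SQLite database in place of SQL Server, and it leaves out the date filter and WITH ROLLUP (which SQLite does not support), so it only illustrates the extra grouping step.

```python
import sqlite3

# Test rows from the question: (status_id, entry_id, start_date).
ROWS = [
    (496, 45173, "2010-09-29 18:04:33.000"),
    (490, 45173, "2010-09-29 18:48:20.100"),
    (495, 45173, "2010-09-29 19:25:29.300"),
    (489, 45174, "2010-09-29 18:43:01.500"),
    (493, 45175, "2010-09-29 18:48:00.500"),
    (493, 45175, "2010-09-29 21:16:02.700"),
    (489, 45175, "2010-09-30 17:52:12.100"),
    (493, 45176, "2010-09-29 17:55:21.300"),
    (492, 45176, "2010-09-29 18:20:52.200"),
    (493, 45176, "2010-09-29 18:20:52.200"),
]

# The inner query y collapses tied rows to one status per entry
# (the largest status_id) before the outer query counts per status.
MAX_STATUS_SQL = """
select status_id, count(*) as cnt
from (select max(c.status_id) as status_id, c.entry_id
      from (select entry_id, max(start_date) as start_date
            from tbl group by entry_id) d
      inner join tbl c
        on c.entry_id = d.entry_id and c.start_date = d.start_date
      group by c.entry_id) y
group by status_id
order by status_id
"""


def max_status_counts():
    """Count entries per status after picking max(status_id) among ties."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tbl (status_id int, entry_id int, start_date text)")
    conn.executemany("insert into tbl values (?, ?, ?)", ROWS)
    result = conn.execute(MAX_STATUS_SQL).fetchall()
    conn.close()
    return result


max_counts = max_status_counts()
print(max_counts)
```

Each entry now contributes exactly one row, so the counts add up to the 4 distinct entries, with 45176 counted under 493.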

Max value within a time range with duplicate dates
