I have a table like this:
DateTime start_time not null,
DateTime end_time not null,
Status_Id int not null,
Entry_Id int not null
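I want to get a count of each status within a time period, where only the last status for a given entry_id counts.
This is what I'm using at the moment (the dates are dynamic):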
with c (Status_Id, Entry_Id, Start_Date) AS (
select Status_Id, Entry_Id, Start_Date from tbl where
(End_Date BETWEEN '19000101' AND '21000101')
AND ((Start_Date BETWEEN '19000101' AND '21000101')
OR End_Date <= '21000101'))
select Status_Id, count(*) as cnt from
(select Entry_Id, max(start_date) as start_date from c
group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
GROUP BY Status_Id WITH ROLLUP
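The problem is that it goes wrong when an entry_id has several rows with the same start_date. (In that case I don't particularly care which status gets picked, as long as only one is.)
Some test data: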
status_id Entry_id Start_date
496 45173 2010-09-29 18:04:33.000
490 45173 2010-09-29 18:48:20.100
495 45173 2010-09-29 19:25:29.300
489 45174 2010-09-29 18:43:01.500
493 45175 2010-09-29 18:48:00.500
493 45175 2010-09-29 21:16:02.700
489 45175 2010-09-30 17:52:12.100
493 45176 2010-09-29 17:55:21.300
492 45176 2010-09-29 18:20:52.200 <------ This is the one that gives the problems
493 45176 2010-09-29 18:20:52.200 <------ This is the one that gives the problems
The result should be:
495 1
489 2
492 1 (or 493 1)
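The double counting can be reproduced in isolation. Below is a minimal sketch, assuming a hypothetical table variable `@t` that holds only the three columns shown in the test data (End_Date and the date filter are left out). Joining back on max(start_date) returns two rows for entry 45176, so that entry ends up counted under both 492 and 493.

```sql
-- Hypothetical table variable with the sample rows (End_Date omitted).
DECLARE @t TABLE (
    Status_Id  int      NOT NULL,
    Entry_Id   int      NOT NULL,
    Start_Date datetime NOT NULL
);

INSERT INTO @t (Status_Id, Entry_Id, Start_Date) VALUES
(496, 45173, '2010-09-29T18:04:33.000'),
(490, 45173, '2010-09-29T18:48:20.100'),
(495, 45173, '2010-09-29T19:25:29.300'),
(489, 45174, '2010-09-29T18:43:01.500'),
(493, 45175, '2010-09-29T18:48:00.500'),
(493, 45175, '2010-09-29T21:16:02.700'),
(489, 45175, '2010-09-30T17:52:12.100'),
(493, 45176, '2010-09-29T17:55:21.300'),
(492, 45176, '2010-09-29T18:20:52.200'),
(493, 45176, '2010-09-29T18:20:52.200');

-- List the entries that have more than one row at their latest Start_Date.
SELECT c.Entry_Id, COUNT(*) AS rows_at_latest_start
FROM @t AS c
INNER JOIN (SELECT Entry_Id, MAX(Start_Date) AS Start_Date
            FROM @t
            GROUP BY Entry_Id) AS d
        ON c.Entry_Id = d.Entry_Id
       AND c.Start_Date = d.Start_Date
GROUP BY c.Entry_Id
HAVING COUNT(*) > 1;
-- With these rows: 45176 -> 2
```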
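Answer 0 (score: 2)
If I understand correctly, you want to count the distinct entries that have a particular status within your time period. If so, you should use the DISTINCT clause inside count(): change count(*) to count(distinct Entry_id).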
with c (Status_Id, Entry_Id, Start_Date) AS (
select Status_Id, Entry_Id, Start_Date from tbl where
(End_Date BETWEEN '19000101' AND '21000101')
AND ((Start_Date BETWEEN '19000101' AND '21000101')
OR End_Date <= '21000101'))
select Status_Id, count(distinct Entry_Id) as cnt from
(select Entry_Id, max(start_date) as start_date from c
group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
GROUP BY Status_Id WITH ROLLUP
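Note that this may not be enough when the tie is between two different statuses. A small sketch, assuming a hypothetical table variable `@t` that holds just the rows for entry 45176 (no End_Date, no date filter): because the rows are grouped by Status_Id, count(distinct Entry_Id) still reports the entry once under 492 and once under 493.

```sql
-- Hypothetical table variable with only the rows for entry 45176.
DECLARE @t TABLE (
    Status_Id  int      NOT NULL,
    Entry_Id   int      NOT NULL,
    Start_Date datetime NOT NULL
);

INSERT INTO @t (Status_Id, Entry_Id, Start_Date) VALUES
(493, 45176, '2010-09-29T17:55:21.300'),
(492, 45176, '2010-09-29T18:20:52.200'),
(493, 45176, '2010-09-29T18:20:52.200');

-- DISTINCT removes duplicates within each Status_Id group only,
-- so the tied entry still appears in two groups.
SELECT c.Status_Id, COUNT(DISTINCT c.Entry_Id) AS cnt
FROM @t AS c
INNER JOIN (SELECT Entry_Id, MAX(Start_Date) AS Start_Date
            FROM @t
            GROUP BY Entry_Id) AS d
        ON c.Entry_Id = d.Entry_Id
       AND c.Start_Date = d.Start_Date
GROUP BY c.Status_Id
ORDER BY c.Status_Id;
-- With these rows: 492 -> 1, 493 -> 1
```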
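Edit
As long as you don't care which status is returned for a given entry, I think you can change the inner query to return the first status and join on the status as well: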
with c (Status_Id, Entry_Id, Start_Date) AS (
select Status_Id, Entry_Id, Start_Date from tbl where
(End_Date BETWEEN '19000101' AND '21000101')
AND ((Start_Date BETWEEN '19000101' AND '21000101')
OR End_Date <= '21000101'))
select c.Status_Id, count(c.Entry_Id) as cnt from
(select Entry_Id, Start_Date, (select top 1 Status_id from c where Entry_Id = CC.Entry_Id and Start_Date = CC.Start_Date) as Status_Id
from (select Entry_Id, max(start_date) as start_date from c
group by Entry_Id) as CC) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
and c.status_id = d.status_id
GROUP BY c.Status_Id
Result
Status_id Count
489 2
492 1
495 1
Answer 1 (score: 1)
An alternative answer, based on the OP's lovely comments.
WITH
[sequenced_data]
AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY entry_id ORDER BY start_time DESC, status_id DESC) AS [sequence_id]
FROM
tbl
WHERE
start_time < '21:00' AND end_time > '19:00'
)
SELECT status_id, COUNT(*)
FROM [sequenced_data]
WHERE sequence_id = 1
GROUP BY status_id
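The ROW_NUMBER() function is only needed when no single field uniquely identifies individual records. Where the data has a unique identifying column, an alternate query can be written. However, SQL Server is very good at optimizing ROW_NUMBER() queries like the one above, and it should (assuming relevant indexes) be efficient.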
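For illustration, here is a rough sketch of such an alternate query. It assumes a hypothetical unique column `id` that is not part of the OP's table, uses a table variable `@tbl` loaded with the sample rows, and leaves out the time-period filter (which would have to be repeated inside the subquery). Ties on start_time are broken by the highest `id`.

```sql
DECLARE @tbl TABLE (
    id         int      NOT NULL PRIMARY KEY,  -- hypothetical unique key
    status_id  int      NOT NULL,
    entry_id   int      NOT NULL,
    start_time datetime NOT NULL
);

INSERT INTO @tbl (id, status_id, entry_id, start_time) VALUES
(1,  496, 45173, '2010-09-29T18:04:33.000'),
(2,  490, 45173, '2010-09-29T18:48:20.100'),
(3,  495, 45173, '2010-09-29T19:25:29.300'),
(4,  489, 45174, '2010-09-29T18:43:01.500'),
(5,  493, 45175, '2010-09-29T18:48:00.500'),
(6,  493, 45175, '2010-09-29T21:16:02.700'),
(7,  489, 45175, '2010-09-30T17:52:12.100'),
(8,  493, 45176, '2010-09-29T17:55:21.300'),
(9,  492, 45176, '2010-09-29T18:20:52.200'),
(10, 493, 45176, '2010-09-29T18:20:52.200');

-- Keep exactly one row per entry: the one whose id comes first when
-- ordering by latest start_time, then by highest id.
SELECT t.status_id, COUNT(*) AS cnt
FROM @tbl AS t
WHERE t.id = (SELECT TOP 1 x.id
              FROM @tbl AS x
              WHERE x.entry_id = t.entry_id
              ORDER BY x.start_time DESC, x.id DESC)
GROUP BY t.status_id
ORDER BY t.status_id;
-- With these rows: 489 -> 2, 493 -> 1, 495 -> 1
```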
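Edit
Someone just suggested to me that people don't like long code and prefer compact code, so the CTE version has been replaced with an inline version. (The CTE really only helped break the query down for explanation, and it is still in the edit history if needed.)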
Edit
ROW_NUMBER() can't form part of the WHERE clause, as the OP found out. I've updated the query by putting a CTE back in.
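The restriction exists because SQL Server only allows windowed functions in the SELECT and ORDER BY clauses, so the filter on the row number has to sit one level outside the query that computes it. A derived table does that just as well as a CTE. A minimal sketch, assuming a hypothetical table variable `@tbl` with a few of the sample rows and no time-period filter:

```sql
DECLARE @tbl TABLE (
    status_id  int      NOT NULL,
    entry_id   int      NOT NULL,
    start_time datetime NOT NULL
);

INSERT INTO @tbl (status_id, entry_id, start_time) VALUES
(489, 45174, '2010-09-29T18:43:01.500'),
(493, 45175, '2010-09-29T21:16:02.700'),
(489, 45175, '2010-09-30T17:52:12.100'),
(493, 45176, '2010-09-29T17:55:21.300'),
(492, 45176, '2010-09-29T18:20:52.200'),
(493, 45176, '2010-09-29T18:20:52.200');

-- The inner query numbers the rows; the outer query filters on that number.
SELECT s.status_id, COUNT(*) AS cnt
FROM (
    SELECT status_id,
           ROW_NUMBER() OVER (PARTITION BY entry_id
                              ORDER BY start_time DESC, status_id DESC) AS sequence_id
    FROM @tbl
) AS s
WHERE s.sequence_id = 1
GROUP BY s.status_id
ORDER BY s.status_id;
-- With these rows: 489 -> 2, 493 -> 1
```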
Answer 2 (score: 0)
I found a solution myself:
with c (Status_Id, Entry_Id, Start_Date) AS (
select Status_Id, Entry_Id, Start_Date from tbl where
(End_Date BETWEEN '19000101' AND '21000101')
AND ((Start_Date BETWEEN '19000101' AND '21000101')
OR End_Date <= '21000101'))
select Status_Id, count(*) as cnt from
(select max(Status_Id) as Status_Id, c.Entry_Id from --<--- ADDED
(select Entry_Id, max(start_date) as start_date from c
group by Entry_Id) d inner join
c on c.Entry_Id = d.Entry_Id
and c.start_date = d.start_date
group by c.Entry_Id) y --<--- ADDED
GROUP BY Status_Id WITH ROLLUP