Consider the following data for 4 people:
ID Date (YMD)
1 2014-12-30
2 2014-12-30
3 2014-12-30
4 2014-12-30
1 2014-12-31
2 2014-12-31
3 2015-01-01
1 2015-01-01
3 2015-01-02
1 2015-01-02
3 2015-01-03
1 2015-01-03
4 2015-01-03
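What I want to do is detect, day by day, when the set of IDs changes. When I first thought about it, it looked like a relatively easy problem, but it turns out to be quite hard, because:
So I would like the SQL to return the date ranges 2014-12-30 to 2014-12-31 and 2015-01-01 to 2015-01-03.
In my humble opinion this is very difficult, and I have no idea how to approach it. Can T-SQL even solve this kind of problem?
Thanks!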
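Answer 0 (score: 0)
So, a person counts as being in the data from their first appearance to their last appearance. Here is one approach using a cumulative sum (SQL Fiddle):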
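-- Note: sum() over (order by ...) requires SQL Server 2012 or later.
with persondates as (
    -- +1 on the day each id first appears
    select id, min(date) as dte, 1 as inc
    from data
    group by id
    union all
    -- -1 on the day after each id last appears
    select id, dateadd(day, 1, max(date)) as dte, -1 as inc
    from data
    group by id
)
-- running total of inc = number of active ids from each change date onward
select dte, min(cume) as actives
from (select dte, sum(inc) over (order by dte) as cume
      from persondates
     ) d
group by dte
order by dte;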
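To make the +1/-1 idea concrete, here is a minimal Python sketch of the same sweep run on the sample rows from the question. It is only an illustration of the logic, not a replacement for the query; the function name active_counts and the rows list are made up for this example.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample (id, date) rows copied from the question.
rows = [
    (1, date(2014, 12, 30)), (2, date(2014, 12, 30)),
    (3, date(2014, 12, 30)), (4, date(2014, 12, 30)),
    (1, date(2014, 12, 31)), (2, date(2014, 12, 31)),
    (3, date(2015, 1, 1)), (1, date(2015, 1, 1)),
    (3, date(2015, 1, 2)), (1, date(2015, 1, 2)),
    (3, date(2015, 1, 3)), (1, date(2015, 1, 3)),
    (4, date(2015, 1, 3)),
]

def active_counts(rows):
    """Return [(date, active_ids)] for each date where an id enters or leaves."""
    first, last = {}, {}
    for pid, d in rows:
        first[pid] = min(first.get(pid, d), d)
        last[pid] = max(last.get(pid, d), d)

    # +1 on the first appearance, -1 on the day after the last appearance.
    delta = defaultdict(int)
    for pid in first:
        delta[first[pid]] += 1
        delta[last[pid] + timedelta(days=1)] -= 1

    # Running total over the change dates, in date order.
    result, running = [], 0
    for d in sorted(delta):
        running += delta[d]
        result.append((d, running))
    return result

counts = active_counts(rows)
for d, n in counts:
    print(d, n)
# 2014-12-30 4
# 2015-01-01 3
# 2015-01-04 0
```

Each output row starts a new range that runs until the day before the next row, which gives 2014-12-30 to 2014-12-31 (4 active) and 2015-01-01 to 2015-01-03 (3 active). Note that ID 4 is counted as active for the whole span between its first and last date, even though it has no rows in between.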
Answer 1 (score: 0)
This works in SQL Server 2008 (SQL Fiddle).
I can't tell you how efficient it will be at your data size, but there shouldn't be any problem.
WITH dateGroup(gDate)
AS (
-- SEE HOW MANY DIFFERENT DATES ARE THERE
SELECT DISTINCT DATE
FROM [dbo].[testData]
), userActivity (id, dBegin, dEnd)
AS (
-- SEE THE ACTIVITY WINDOW FOR EACH USER
SELECT ID, MIN(DATE), MAX(DATE)
FROM [dbo].[testData]
GROUP BY ID
), rangeDate ( gDate, users)
AS (
-- SEE WHICH USERS ARE ACTIVE ON EACH DATE
SELECT *
FROM dateGroup as p OUTER APPLY
(SELECT STUFF(( SELECT ';' + CAST(a.id AS VARCHAR(10) )
FROM userActivity AS a
WHERE p.gDate BETWEEN a.dBegin AND a.dEnd
ORDER BY a.id
FOR XML PATH('') ), 1,1,'') AS users ) AS f
), activityWindow (users)
AS (
-- DETECT WHEN THE ACTIVE GROUP CHANGE
SELECT distinct users
FROM rangeDate
)
-- SEE THE RANGE FOR EACH GROUP.
SELECT *
FROM activityWindow as p OUTER APPLY
(SELECT STUFF(( SELECT ' ; ' + CAST(a.gDate AS VARCHAR(10) )
FROM rangeDate AS a
WHERE p.users = a.users
ORDER BY a.gDate
-- strip the full 3-character leading ' ; ' separator
FOR XML PATH('') ), 1,3,'') AS activity_window ) AS f
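You don't just get the date ranges.
You also get which users were active during each range. You can split that list on ';'.
You also see every individual day, so if there is no data for, say, a Sunday, you can spot it.
If you only want the start and end, split the date list on ';' and take the first and last date.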
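As a rough sketch of that last step, here is how the split could look on the client side in Python. It assumes the dates come back as yyyy-mm-dd text (so that string order matches date order); the function name window_bounds is made up for this example.

```python
def window_bounds(activity_window):
    """Return (first, last) date strings from a ';'-separated list of dates."""
    # Strip whitespace and drop empty pieces left by a leading or trailing ';'.
    parts = [p.strip() for p in activity_window.split(";") if p.strip()]
    # yyyy-mm-dd strings sort chronologically, so min/max work
    # even if the list is not in date order.
    return min(parts), max(parts)

bounds = window_bounds("2015-01-01 ; 2015-01-02 ; 2015-01-03")
print(bounds)  # ('2015-01-01', '2015-01-03')
```

Using min and max rather than the first and last element avoids depending on the order in which the dates were concatenated.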
Answer 2 (score: 0)
Try this:
with c as(
select min(d) as d from t group by id
union
select max(d) as d from t group by id),
u as(
select * from c
union all
select dateadd(dd, 1, d) from c
where d <> (select max(d) from c) and d <> (select min(d) from c)),
r as(select d, row_number() over(order by d) rn from u)
select r1.d, r2.d from r r1
join r r2 on r1.rn + 1 = r2.rn
where r2.rn % 2 = 0
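If I have this right, the idea is to select the "peak" dates, i.e. the days on which someone first appears or someone has their last day. That is done in the first CTE. The second CTE adds, for each peak date other than the overall first and last, the day that follows it. The third CTE just numbers the rows so that the final join can pair consecutive rows into intervals.
I'm not entirely sure this is the correct logic, but it works for the test data provided: http://sqlfiddle.com/#!3/2d7a6/6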
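For anyone who wants to check the logic outside the fiddle, here is a minimal Python sketch that mirrors the three CTEs step by step on the question's sample rows. The function name change_intervals and the rows list are made up for this example, and like the query it is only verified against this one data set.

```python
from datetime import date, timedelta

# Sample (id, date) rows copied from the question.
rows = [
    (1, date(2014, 12, 30)), (2, date(2014, 12, 30)),
    (3, date(2014, 12, 30)), (4, date(2014, 12, 30)),
    (1, date(2014, 12, 31)), (2, date(2014, 12, 31)),
    (3, date(2015, 1, 1)), (1, date(2015, 1, 1)),
    (3, date(2015, 1, 2)), (1, date(2015, 1, 2)),
    (3, date(2015, 1, 3)), (1, date(2015, 1, 3)),
    (4, date(2015, 1, 3)),
]

def change_intervals(rows):
    """Pair up boundary dates the same way the c / u / r CTEs do."""
    first, last = {}, {}
    for pid, d in rows:
        first[pid] = min(first.get(pid, d), d)
        last[pid] = max(last.get(pid, d), d)

    # CTE c: distinct first and last dates (UNION removes duplicates).
    peaks = set(first.values()) | set(last.values())
    lo, hi = min(peaks), max(peaks)

    # CTE u: the peaks plus the day after every interior peak (UNION ALL).
    points = list(peaks) + [
        d + timedelta(days=1) for d in peaks if d != lo and d != hi
    ]

    # CTE r and the final join: sort, then pair rows (1,2), (3,4), ...
    points.sort()
    return [(points[i], points[i + 1]) for i in range(0, len(points) - 1, 2)]

intervals = change_intervals(rows)
for start, end in intervals:
    print(start, "to", end)
# 2014-12-30 to 2014-12-31
# 2015-01-01 to 2015-01-03
```

On this data the peaks are 2014-12-30, 2014-12-31 and 2015-01-03; the only interior peak is 2014-12-31, so 2015-01-01 is added, and pairing the four sorted dates gives the two ranges the question asks for.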