I have this table:
Table1
eventid entityid eventdate
----------------------------------------
123 xyz Jan-02-2019
541 xyz Jan-02-2019
234 xyz Jan-03-2019
432 xyz Jan-04-2019
111 xyz Jan-05-2019
124 xyz Jan-06-2019
123 xyz Jan-07-2019
234 xyz Jan-08-2019
432 xyz Jan-09-2019
111 xyz Jan-12-2019
I want the final result to look like this:
entityid interval1 interval2
------------------------------
xyz 2 4
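The intervals here are in days.
The logic for calculating the intervals is:
Example: events 123 and 234 occur multiple times, so the date difference for each 123 → 234 pair (shown below) is added up into interval1. Note that 234 is not necessarily in the row right after 123; there may be other events in between.
The formula is:
interval1 = datediff(day, eventdate of 123, eventdate of 234) + datediff(day, eventdate of 123, eventdate of 234) + ...
interval2 is calculated the same way, but for events 432 and 111. The individual differences would be: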
entityid eventid1 eventid2 event_date_diff
--------------------------------------------
xyz 123 234 1
xyz 123 234 1
xyz 432 111 1
xyz 432 111 3
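Summing the event_date_diff column of that table per pair gives the expected values. Dates are written as in the sample data, and differences are in days:

```latex
\begin{aligned}
\text{interval1} &= (\text{Jan-03} - \text{Jan-02}) + (\text{Jan-08} - \text{Jan-07}) = 1 + 1 = 2 \\
\text{interval2} &= (\text{Jan-05} - \text{Jan-04}) + (\text{Jan-12} - \text{Jan-09}) = 1 + 3 = 4
\end{aligned}
```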
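The challenge here is to find out whether event 123 is followed by a 234 event in the upcoming rows (not necessarily the immediately next row) and, if so, to compute the date difference. If there are other events between 123 and 234, those intervening events must be ignored. Also, if 123 appears twice, the latest event date for 123 should be used.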
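To pin the rules down before writing any SQL, here is a minimal pure-Python sketch. It is not SQL and not from any answer below. It rests on a few assumptions:

- Dates are real dates rather than `Jan-02-2019` strings.
- Rows are scanned per entity in date order, with ties kept in input order.
- The names `intervals`, `ROWS`, and `PAIRS` are hypothetical.

A repeated source event replaces the pending one, so the latest occurrence wins. The first matching follow-up closes the pair, so anything in between is skipped.

```python
from datetime import date

# Sample rows from Table1: (eventid, entityid, eventdate), in the order shown above.
ROWS = [
    (123, "xyz", date(2019, 1, 2)),
    (541, "xyz", date(2019, 1, 2)),
    (234, "xyz", date(2019, 1, 3)),
    (432, "xyz", date(2019, 1, 4)),
    (111, "xyz", date(2019, 1, 5)),
    (124, "xyz", date(2019, 1, 6)),
    (123, "xyz", date(2019, 1, 7)),
    (234, "xyz", date(2019, 1, 8)),
    (432, "xyz", date(2019, 1, 9)),
    (111, "xyz", date(2019, 1, 12)),
]

# Hypothetical mapping: (source event, follow-up event) -> output column.
PAIRS = {(123, 234): "interval1", (432, 111): "interval2"}


def intervals(rows, pairs):
    """Sum the day gaps from each source event to its next follow-up, per entity."""
    result = {}
    # Stable sort by entity, then date; rows sharing a date keep their input order.
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    for (source, follow), column in pairs.items():
        pending = {}  # entityid -> date of the latest unmatched source event
        for eventid, entityid, eventdate in ordered:
            totals = result.setdefault(entityid, {c: 0 for c in pairs.values()})
            if eventid == source:
                # A repeated source event overwrites the earlier one (latest wins).
                pending[entityid] = eventdate
            elif eventid == follow and entityid in pending:
                # The first follow-up closes the pair; events in between are ignored.
                totals[column] += (eventdate - pending.pop(entityid)).days
    return result


print(intervals(ROWS, PAIRS))
```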
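Answer 0 (score: 3)
Let's go through your requirements and build the necessary pieces. I won't handle them in the order you listed them, but in an order that makes them easier to understand.
If 123 appears twice, the latest eventdate for 123 is needed.
This means we need to create a range boundary. That's simple: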
NextOccurence AS (SELECT eventId, entityId, eventDate,
LEAD(eventDate) OVER(PARTITION BY eventId, entityId ORDER BY eventDate) AS nextOccurenceDate
FROM Table1)
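...which gives us every occurrence of each event, along with the next occurrence of the same event (if there is one). These could be limited to just your "source" events, but I won't bother with that here.
The challenge here is to find out whether event 123 is followed by a 234 event in the upcoming rows (not necessarily the immediately next row) and, if so, to compute the date difference. If there are other events between 123 and 234, those intervening events must be ignored.
(You mentioned earlier that if there are multiple follow-up events, the earliest date should be used.)
To do this, we first need to map the events: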
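To see what the NextOccurence CTE produces, here is a rough Python model of LEAD(eventDate) OVER (PARTITION BY eventId, entityId ORDER BY eventDate) on the sample data. The function name `next_occurrence` is hypothetical, and dates are assumed to be real dates. Each row gets the next date of the same event for the same entity, or None (SQL NULL) for the last one.

```python
from datetime import date

ROWS = [
    (123, "xyz", date(2019, 1, 2)),
    (541, "xyz", date(2019, 1, 2)),
    (234, "xyz", date(2019, 1, 3)),
    (432, "xyz", date(2019, 1, 4)),
    (111, "xyz", date(2019, 1, 5)),
    (124, "xyz", date(2019, 1, 6)),
    (123, "xyz", date(2019, 1, 7)),
    (234, "xyz", date(2019, 1, 8)),
    (432, "xyz", date(2019, 1, 9)),
    (111, "xyz", date(2019, 1, 12)),
]


def next_occurrence(rows):
    """Return (eventid, entityid, eventdate, next_occurrence_date) tuples."""
    partitions = {}  # PARTITION BY eventId, entityId
    for eventid, entityid, eventdate in rows:
        partitions.setdefault((eventid, entityid), []).append(eventdate)
    out = []
    for (eventid, entityid), dates in partitions.items():
        dates.sort()  # ORDER BY eventDate
        # Pair each date with the following one; the last date gets None.
        for current, nxt in zip(dates, dates[1:] + [None]):
            out.append((eventid, entityid, current, nxt))
    return out


for row in next_occurrence(ROWS):
    print(row)
```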
EventMap AS (SELECT 123 AS original, 234 AS follow
UNION ALL
SELECT 432, 111)
...and use it to fetch the "next" follow-up event within the range, which is partly a greatest-n-per-group query:
SELECT NextOccurence.entityId, NextOccurence.eventId, DATEDIFF(day, NextOccurence.eventDate, Table1.eventDate) AS diff
FROM NextOccurence
JOIN EventMap
ON EventMap.original = NextOccurence.eventId
CROSS APPLY (SELECT TOP 1 Table1.eventDate
FROM Table1
WHERE Table1.entityId = NextOccurence.entityId
AND Table1.eventId = EventMap.follow
AND Table1.eventDate >= NextOccurence.eventDate
AND (Table1.eventDate < NextOccurence.nextOccurenceDate OR NextOccurence.nextOccurenceDate IS NULL)
ORDER BY Table1.eventDate) AS Table1
...at this point, we're close to your intermediate result table:
| entityId | eventId | diff |
|----------|---------|------|
| xyz | 123 | 1 |
| xyz | 123 | 1 |
| xyz | 432 | 1 |
| xyz | 432 | 3 |
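As a cross-check, the NextOccurence + EventMap + CROSS APPLY (SELECT TOP 1 ...) combination can be modelled in Python. This is a sketch only: `intermediate` and `EVENT_MAP` are hypothetical names, and dates are assumed to be real dates. Each source occurrence looks for the earliest follow-up in the half-open range [eventDate, nextOccurenceDate). When there is no next occurrence, the range is open-ended.

```python
from datetime import date

ROWS = [
    (123, "xyz", date(2019, 1, 2)),
    (541, "xyz", date(2019, 1, 2)),
    (234, "xyz", date(2019, 1, 3)),
    (432, "xyz", date(2019, 1, 4)),
    (111, "xyz", date(2019, 1, 5)),
    (124, "xyz", date(2019, 1, 6)),
    (123, "xyz", date(2019, 1, 7)),
    (234, "xyz", date(2019, 1, 8)),
    (432, "xyz", date(2019, 1, 9)),
    (111, "xyz", date(2019, 1, 12)),
]

# Counterpart of the EventMap CTE: source event -> follow-up event.
EVENT_MAP = {123: 234, 432: 111}


def intermediate(rows, event_map):
    """Return (entityid, eventid, diff) rows like the intermediate table."""
    dates = {}
    for eventid, entityid, eventdate in rows:
        dates.setdefault((eventid, entityid), []).append(eventdate)
    for d in dates.values():
        d.sort()
    out = []
    for (eventid, entityid), occurrences in dates.items():
        follow = event_map.get(eventid)
        if follow is None:  # the inner JOIN to EventMap drops unmapped events
            continue
        # NextOccurence: each occurrence paired with the next one (or None).
        for start, end in zip(occurrences, occurrences[1:] + [None]):
            # TOP 1 ... ORDER BY eventDate: earliest follow-up inside [start, end).
            candidates = [
                d for d in dates.get((follow, entityid), [])
                if d >= start and (end is None or d < end)
            ]
            if candidates:  # CROSS APPLY drops occurrences with no match
                out.append((entityid, eventid, (min(candidates) - start).days))
    return out


for row in intermediate(ROWS, EVENT_MAP):
    print(row)
```

Because each range stops at the next occurrence of the same source event, an earlier 123 with no 234 before the following 123 produces no row. That is how the "latest 123 wins" requirement falls out of this design.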
...after which a standard PIVOT query aggregates the results.
The final query ends up looking like this:
WITH NextOccurence AS (SELECT eventId, entityId, eventDate,
LEAD(eventDate) OVER(PARTITION BY eventId, entityId ORDER BY eventDate) AS nextOccurenceDate
FROM Table1),
EventMap AS (SELECT 123 AS original, 234 AS follow
UNION ALL
SELECT 432, 111)
SELECT entityId, [123] AS '123-234', [432] AS '432-111'
FROM (SELECT NextOccurence.entityId, NextOccurence.eventId, DATEDIFF(day, NextOccurence.eventDate, Table1.eventDate) AS diff
FROM NextOccurence
JOIN EventMap
ON EventMap.original = NextOccurence.eventId
CROSS APPLY (SELECT TOP 1 Table1.eventDate
FROM Table1
WHERE Table1.entityId = NextOccurence.entityId
AND Table1.eventId = EventMap.follow
AND Table1.eventDate >= NextOccurence.eventDate
AND (Table1.eventDate < NextOccurence.nextOccurenceDate OR NextOccurence.nextOccurenceDate IS NULL)
ORDER BY Table1.eventDate) AS Table1) AS d
PIVOT (SUM(diff)
FOR eventId IN ([123], [432])
) AS pvt
...which produces the expected result:
| entityId | 123-234 | 432-111 |
|----------|---------|---------|
| xyz | 2 | 4 |
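The closing PIVOT (SUM(diff) FOR eventId IN ([123], [432])) step boils down to a per-entity sum of diff for each listed eventId. Below is a minimal Python model of it. The intermediate rows are hard-coded from the table above, and `pivot_sum` is a hypothetical name. Columns that receive no values stay None, mirroring SUM over an empty group returning NULL.

```python
from collections import defaultdict

# Intermediate (entityId, eventId, diff) rows, copied from the table above.
INTERMEDIATE = [("xyz", 123, 1), ("xyz", 123, 1), ("xyz", 432, 1), ("xyz", 432, 3)]


def pivot_sum(rows, columns):
    """Rough model of PIVOT (SUM(diff) FOR eventId IN (...)): one dict per entity."""
    # Every entity gets a row, with None (NULL) for columns that receive no values.
    table = defaultdict(lambda: {c: None for c in columns})
    for entityid, eventid, diff in rows:
        row = table[entityid]
        if eventid in row:  # values outside the IN (...) list produce no column
            row[eventid] = diff if row[eventid] is None else row[eventid] + diff
    return dict(table)


print(pivot_sum(INTERMEDIATE, [123, 432]))
```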
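Answer 1 (score: 0)
From what I understand of the question, we are asked to show the occurrences of each eventid per date, but as columns rather than rows.
My approach was to first pivot the data inside a CTE, and then select the distinct non-null value of each column using CROSS APPLY operators. There may be a better way, but this made the most sense to me.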
DECLARE @T TABLE
(
EventId INT,
EntityId NVARCHAR(3),
EventDate DATETIME
);
INSERT INTO @T (EventId, EntityId, EventDate)
SELECT * FROM (VALUES
(123, 'xyz', '2019-01-02'),
(234, 'xyz', '2019-01-03'),
(432, 'xyz', '2019-01-04'),
(111, 'xyz', '2019-01-05'),
(124, 'xyz', '2019-01-06'),
(123, 'xyz', '2019-01-07'),
(234, 'xyz', '2019-01-08'),
(432, 'xyz', '2019-01-09'),
(111, 'xyz', '2019-01-12')
) X (EVENTID, ENTITYID, EVENTDATE);
with cte as (
select EntityId, [123] as Interval1, [234] as Interval2, [432] as Interval3, [111] as Interval4, [124] as Interval5
from
(
select top 5 EntityId, EventId, min(eventdate) as ordering, count(distinct EventDate) as vol
from @T
group by EntityId, EventId
order by ordering
) src
PIVOT
(
max(vol)
for EventId in ([123], [234], [432], [111], [124])
) as pvt)
select distinct EntityId, Interval1, Interval2, Interval3, Interval4, Interval5
from (select EntityId from cte) a
cross apply
(select Interval1 from cte where Interval1 is not null) b
cross apply
(select Interval2 from cte where Interval2 is not null) c
cross apply
(select Interval3 from cte where Interval3 is not null) d
cross apply
(select Interval4 from cte where Interval4 is not null) e
cross apply
(select Interval5 from cte where Interval5 is not null) f;
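To make clear what this query returns on its own sample data (which omits the 541 row), here is a rough Python model of the src subquery. It takes COUNT(DISTINCT EventDate) per EventId and keeps the first five events ordered by their earliest date. It assumes a single entity, as in the sample, and the function name is hypothetical. The resulting values are counts of distinct dates per event, not day differences.

```python
from datetime import date

# The rows inserted into @T in this answer (no 541 row).
ROWS = [
    (123, "xyz", date(2019, 1, 2)),
    (234, "xyz", date(2019, 1, 3)),
    (432, "xyz", date(2019, 1, 4)),
    (111, "xyz", date(2019, 1, 5)),
    (124, "xyz", date(2019, 1, 6)),
    (123, "xyz", date(2019, 1, 7)),
    (234, "xyz", date(2019, 1, 8)),
    (432, "xyz", date(2019, 1, 9)),
    (111, "xyz", date(2019, 1, 12)),
]


def distinct_date_counts(rows, top=5):
    """COUNT(DISTINCT EventDate) per EventId, first `top` events by MIN(eventdate)."""
    dates = {}
    for eventid, _entityid, eventdate in rows:  # single entity assumed
        dates.setdefault(eventid, set()).add(eventdate)
    ordered = sorted(dates, key=lambda e: min(dates[e]))  # ORDER BY ordering
    return {e: len(dates[e]) for e in ordered[:top]}


print(distinct_date_counts(ROWS))
```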
Answer 2 (score: 0)
You can use lead() and conditional aggregation for this:
select sum(case when eventid = 123 and next_eventid = 234
then datediff(day, eventdate, next_eventdate)
end) as interval1,
sum(case when eventid = 432 and next_eventid = 111
then datediff(day, eventdate, next_eventdate)
end) as interval2
from (select t.*,
lead(eventid) over (partition by entityid order by eventdate) as next_eventid,
lead(eventdate) over (partition by entityid order by eventdate) as next_eventdate
from t
) t;
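The lead() version only counts a pair when the follow-up is the very next row for the entity. A rough Python model shows the consequence on the full sample data. `adjacent_intervals` and `PAIRS` are hypothetical names. Python's stable sort keeps input order for the two Jan-02 rows, so 123 comes before 541 and the first 123 → 234 pair is missed. SQL Server does not guarantee an order for such ties, so the real query may return either 1 or 2 for interval1.

```python
from datetime import date

ROWS = [
    (123, "xyz", date(2019, 1, 2)),
    (541, "xyz", date(2019, 1, 2)),
    (234, "xyz", date(2019, 1, 3)),
    (432, "xyz", date(2019, 1, 4)),
    (111, "xyz", date(2019, 1, 5)),
    (124, "xyz", date(2019, 1, 6)),
    (123, "xyz", date(2019, 1, 7)),
    (234, "xyz", date(2019, 1, 8)),
    (432, "xyz", date(2019, 1, 9)),
    (111, "xyz", date(2019, 1, 12)),
]

PAIRS = {(123, 234): "interval1", (432, 111): "interval2"}


def adjacent_intervals(rows):
    """Model the lead() query: sum gaps only for directly adjacent pairs."""
    totals = {"interval1": None, "interval2": None}  # SUM of nothing is NULL
    by_entity = {}
    # sorted() is stable, so rows sharing a date keep their input order here.
    for row in sorted(rows, key=lambda r: (r[1], r[2])):
        by_entity.setdefault(row[1], []).append(row)
    for entity_rows in by_entity.values():
        # Each row next to its lead() row within the same entity.
        for (eid, _, d), (next_eid, _, next_d) in zip(entity_rows, entity_rows[1:]):
            column = PAIRS.get((eid, next_eid))
            if column:
                gap = (next_d - d).days
                totals[column] = gap if totals[column] is None else totals[column] + gap
    return totals


print(adjacent_intervals(ROWS))
```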
Handling the intervening events is probably simplest with a conditional cumulative minimum:
select entityid,
sum(case when eventid = 123
then datediff(day, eventdate, next_eventdate_234)
end) as interval1,
sum(case when eventid = 432
then datediff(day, eventdate, next_eventdate_111)
end) as interval2
from (select t.*,
-- running min over this row and all later rows = date of the next 234 / 111 on or after this row
min(case when eventid = 234 then eventdate end) over (partition by entityid order by eventdate desc) as next_eventdate_234,
min(case when eventid = 111 then eventdate end) over (partition by entityid order by eventdate desc) as next_eventdate_111
from t
) t
where eventid in (123, 432)
group by entityid;
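The cumulative-minimum idea can be modelled in Python as a sketch of the approach, not a translation of any specific query text. The model partitions by entityid and applies no inner filter on eventid; `next_target_date` and `cumulative_min_intervals` are hypothetical names. Walking dates from newest to oldest while carrying the latest target date seen gives, for every row, the earliest target date on or after it. That is what MIN(...) OVER (... ORDER BY eventdate DESC) computes with the default RANGE frame.

```python
from datetime import date

ROWS = [
    (123, "xyz", date(2019, 1, 2)),
    (541, "xyz", date(2019, 1, 2)),
    (234, "xyz", date(2019, 1, 3)),
    (432, "xyz", date(2019, 1, 4)),
    (111, "xyz", date(2019, 1, 5)),
    (124, "xyz", date(2019, 1, 6)),
    (123, "xyz", date(2019, 1, 7)),
    (234, "xyz", date(2019, 1, 8)),
    (432, "xyz", date(2019, 1, 9)),
    (111, "xyz", date(2019, 1, 12)),
]


def next_target_date(rows, target):
    """Map (entityid, eventdate) -> earliest `target` date on or after eventdate."""
    result = {}
    for entity in {r[1] for r in rows}:
        best = None
        # Newest to oldest; rows sharing a date get the same value (RANGE frame peers).
        for d in sorted({r[2] for r in rows if r[1] == entity}, reverse=True):
            if any(r[0] == target and r[1] == entity and r[2] == d for r in rows):
                best = d
            result[(entity, d)] = best
    return result


def cumulative_min_intervals(rows):
    lookups = {
        123: ("interval1", next_target_date(rows, 234)),
        432: ("interval2", next_target_date(rows, 111)),
    }
    totals = {}
    for eventid, entity, d in rows:
        if eventid not in lookups:  # WHERE eventid IN (123, 432)
            continue
        column, next_dates = lookups[eventid]
        row = totals.setdefault(entity, {"interval1": None, "interval2": None})
        nxt = next_dates[(entity, d)]
        if nxt is not None:  # SUM skips NULL datediff values
            gap = (nxt - d).days
            row[column] = gap if row[column] is None else row[column] + gap
    return totals


print(cumulative_min_intervals(ROWS))
```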