How can I get cumulative data from the same table using SQL?

Time: 2019-02-05 20:57:31

Tags: sql sql-server

I have this table:

Table1

eventid  entityid  eventdate
----------------------------------------
123       xyz      Jan-02-2019
541       xyz      Jan-02-2019
234       xyz      Jan-03-2019
432       xyz      Jan-04-2019
111       xyz      Jan-05-2019
124       xyz      Jan-06-2019
123       xyz      Jan-07-2019
234       xyz      Jan-08-2019
432       xyz      Jan-09-2019
111       xyz      Jan-12-2019

I would like the final result to look like this:

entityid  interval1  interval2 
------------------------------
xyz         2            4

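The intervals here are in days.

The logic for calculating the intervals is:

For example, events 123 and 234 occur multiple times, so the date difference for each occurrence (shown below) is added up into interval1. Note that 234 is not necessarily in the row immediately after 123; there may be other events in between.

The formula is:

interval1 = datediff(day, eventdate of 123, eventdate of 234) + datediff(day, eventdate of 123, eventdate of 234) + and so on

interval2 works the same way, but for events 432 and 111.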

entityid eventid1 eventid2  event_date_diff  
--------------------------------------------
xyz        123      234          1
xyz        123      234          1
xyz        432      111          1
xyz        432      111          3
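
As a quick check of the arithmetic behind these numbers, the four date differences can be computed directly. This is only a sketch: the dates are copied from the sample rows and written as ISO literals, which is an assumption about how eventdate is stored.

```sql
-- Dates copied from the sample rows; ISO literal format is an assumption.
SELECT DATEDIFF(day, '2019-01-02', '2019-01-03')      -- first 123 -> 234 pair: 1
     + DATEDIFF(day, '2019-01-07', '2019-01-08')      -- second 123 -> 234 pair: 1
       AS interval1,
       DATEDIFF(day, '2019-01-04', '2019-01-05')      -- first 432 -> 111 pair: 1
     + DATEDIFF(day, '2019-01-09', '2019-01-12')      -- second 432 -> 111 pair: 3
       AS interval2;
-- interval1 = 2, interval2 = 4
```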

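The challenge here is to find out whether event 123 has a 234 event in a later row (not necessarily the immediately following row) and, if so, to find the date difference. If there are other events between 123 and 234, they should be ignored. Also, if 123 appears twice, the latest eventdate for 123 should be used.

3 Answers:

Answer 0: (score: 3)

Let's walk through your requirements and build up the necessary pieces. I won't take them in the order you stated, but in an order that makes them easier to understand.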

  

If 123 appears twice, the latest eventdate for 123 should be used.

This means we need to create a range boundary. That's simple enough:

NextOccurence AS (SELECT eventId, entityId, eventDate, 
                         LEAD(eventDate) OVER(PARTITION BY eventId, entityId ORDER BY eventDate) AS nextOccurenceDate
                  FROM Table1)

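...which gives us every occurrence of an event, along with the next occurrence of that same event if one exists. (This could be restricted to just your "source" events, but I'm not going to bother with that here.)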
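
To see what this CTE body returns, here is a minimal sketch that loads the question's sample rows and runs the same LEAD expression on its own. The column types (a DATE column for eventDate, VARCHAR(10) for entityId) are assumptions, since the question does not state them.

```sql
-- Hypothetical setup: the question's sample rows; column types are assumed.
CREATE TABLE Table1 (eventId INT, entityId VARCHAR(10), eventDate DATE);

INSERT INTO Table1 (eventId, entityId, eventDate) VALUES
    (123, 'xyz', '2019-01-02'), (541, 'xyz', '2019-01-02'),
    (234, 'xyz', '2019-01-03'), (432, 'xyz', '2019-01-04'),
    (111, 'xyz', '2019-01-05'), (124, 'xyz', '2019-01-06'),
    (123, 'xyz', '2019-01-07'), (234, 'xyz', '2019-01-08'),
    (432, 'xyz', '2019-01-09'), (111, 'xyz', '2019-01-12');

-- Same window expression as NextOccurence, limited to the two "source" events.
SELECT eventId, entityId, eventDate,
       LEAD(eventDate) OVER (PARTITION BY eventId, entityId ORDER BY eventDate) AS nextOccurenceDate
FROM Table1
WHERE eventId IN (123, 432)
ORDER BY eventId, eventDate;

-- Result for the sample rows:
-- 123 | xyz | 2019-01-02 | 2019-01-07
-- 123 | xyz | 2019-01-07 | NULL
-- 432 | xyz | 2019-01-04 | 2019-01-09
-- 432 | xyz | 2019-01-09 | NULL
```

The NULL in the last occurrence of each event is why the later range check has to allow for a missing upper bound.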

  

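The challenge here is to find out whether event 123 has a 234 event in a later row (not necessarily the immediately following row) and, if so, to find the date difference. If there are other events between 123 and 234, they should be ignored.

(You mentioned earlier that if there are multiple follow-up events, the one with the lowest date should be used.)

For this, we first need to map the events: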

EventMap AS (SELECT 123 AS original, 234 AS follow
             UNION ALL
             SELECT 432, 111)

...and use that to get the "next" follow-up event within the range, which this part of the query does:

SELECT NextOccurence.entityId, NextOccurence.eventId, DATEDIFF(day, NextOccurence.eventDate, Table1.eventDate) AS diff
  FROM NextOccurence
  JOIN EventMap 
    ON EventMap.original = NextOccurence.eventId
  CROSS APPLY (SELECT TOP 1 Table1.eventDate
               FROM Table1
               WHERE Table1.entityId = NextOccurence.entityId
                     AND Table1.eventId = EventMap.follow
                     AND Table1.eventDate >= NextOccurence.eventDate
                     AND (Table1.eventDate < NextOccurence.nextOccurenceDate OR NextOccurence.nextOccurenceDate IS NULL)
               ORDER BY Table1.eventDate) AS Table1

...at this point we're close to your intermediate results table:

| entityId | eventId | diff |
|----------|---------|------|
| xyz      | 123     | 1    |
| xyz      | 123     | 1    |
| xyz      | 432     | 1    |
| xyz      | 432     | 3    |

...after which it's a standard PIVOT query to aggregate the results.

The final query ends up looking like this:

WITH NextOccurence AS (SELECT eventId, entityId, eventDate, 
                       LEAD(eventDate) OVER(PARTITION BY eventId, entityId ORDER BY eventDate) AS nextOccurenceDate
                   FROM Table1),
     EventMap AS (SELECT 123 AS original, 234 AS follow
                  UNION ALL
                  SELECT 432, 111)
SELECT entityId, [123] AS '123-234', [432] AS '432-111'
FROM (SELECT NextOccurence.entityId, NextOccurence.eventId, DATEDIFF(day, NextOccurence.eventDate, Table1.eventDate) AS diff
      FROM NextOccurence
      JOIN EventMap 
        ON EventMap.original = NextOccurence.eventId
      CROSS APPLY (SELECT TOP 1 Table1.eventDate
                   FROM Table1
                   WHERE Table1.entityId = NextOccurence.entityId
                         AND Table1.eventId = EventMap.follow
                         AND Table1.eventDate >= NextOccurence.eventDate
                         AND (Table1.eventDate < NextOccurence.nextOccurenceDate OR NextOccurence.nextOccurenceDate IS NULL)
                   ORDER BY Table1.eventDate) AS Table1) AS d
PIVOT (SUM(diff)
       FOR eventId IN ([123], [432])
       ) AS pvt

Fiddle example

...which produces the expected result:

| entityId | 123-234 | 432-111 |
|----------|---------|---------|
| xyz      | 2       | 4       |

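Answer 1: (score: 0)

As I understand the question, we are being asked for the occurrences of each eventid per date, presented as columns rather than rows.

My approach was to first pivot the data inside a CTE, and then pick the non-null value from each column with the CROSS APPLY operator. There may be a better way, but this made the most sense to me.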

DECLARE @T TABLE
(
    EventId INT,
    EntityId NVARCHAR(3),
    EventDate DATETIME
);

INSERT INTO @T (EventId, EntityId, EventDate)
SELECT * FROM (VALUES
(123,       'xyz',      '2019-01-02'),
(234,       'xyz',      '2019-01-03'),
(432,       'xyz',      '2019-01-04'),
(111,       'xyz',      '2019-01-05'),
(124,       'xyz',      '2019-01-06'),
(123,       'xyz',      '2019-01-07'),
(234,       'xyz',      '2019-01-08'),
(432,       'xyz',      '2019-01-09'),
(111,       'xyz',      '2019-01-12')
) X (EVENTID, ENTITYID, EVENTDATE);

with cte as (
select EntityId, [123] as Interval1, [234] as Interval2, [432] as Interval3,
       [111] as Interval4, [124] as Interval5
from
(
select top 5 EntityId, EventId, min(eventdate) as ordering,
       count(distinct EventDate) as vol
from @T
group by EntityId, EventId
order by ordering
) src
PIVOT
(
    max(vol)
    for EventId in ([123], [234], [432], [111], [124])
) as pvt)

select distinct EntityId, Interval1, Interval2, Interval3, Interval4, Interval5
from (select EntityId from cte) a
cross apply
(select Interval1 from cte where Interval1 is not null) b
cross apply
(select Interval2 from cte where Interval2 is not null) c
cross apply
(select Interval3 from cte where Interval3 is not null) d
cross apply
(select Interval4 from cte where Interval4 is not null) e
cross apply
(select Interval5 from cte where Interval5 is not null) f; 

Answer 2: (score: 0)

You can use lead() and conditional aggregation for this:

select sum(case when eventid = 123 and next_eventid = 234
                then datediff(day, eventdate, next_eventdate)
           end) as interval1,
       sum(case when eventid = 432 and next_eventid = 111
                then datediff(day, eventdate, next_eventdate)
           end) as interval2
from (select t.*,
             lead(eventid) over (partition by entityid order by eventdate) as next_eventid,
             lead(eventdate) over (partition by entityid order by eventdate) as next_eventdate
      from t
     ) t;

The simplest way to handle intervening events is probably a conditional cumulative aggregate:

select entityid,
       sum(case when eventid = 123
                then datediff(day, eventdate, next_eventdate_234)
           end) as interval1,
       sum(case when eventid = 432
                then datediff(day, eventdate, next_eventdate_111)
           end) as interval2
from (select t.*,
             -- earliest 234 / 111 date on or after the current row, per entity
             min(case when eventid = 234 then eventdate end) over (partition by entityid order by eventdate desc) as next_eventdate_234,
             min(case when eventid = 111 then eventdate end) over (partition by entityid order by eventdate desc) as next_eventdate_111
      from t
      where eventid in (123, 234, 432, 111)
     ) t
where eventid in (123, 432)
group by entityid;
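
One caveat: the cumulative approach counts every 123 row that has a later 234, so two 123 events before a single 234 would both contribute. The sketch below is one possible way to honor the "latest 123" rule from the question; it is not part of the original answer. It borrows the lead()-per-event idea from Answer 0, and the three rows and the next_same_date column are hypothetical, made up for illustration.

```sql
-- Hypothetical data: two 123 events before a single 234.
WITH t AS (
    SELECT * FROM (VALUES
        (123, 'xyz', CAST('2019-01-01' AS date)),
        (123, 'xyz', CAST('2019-01-02' AS date)),
        (234, 'xyz', CAST('2019-01-03' AS date))
    ) AS v (eventid, entityid, eventdate)
)
SELECT entityid,
       -- Count a 123 row only if no later 123 occurs before the next 234.
       SUM(CASE WHEN eventid = 123
                 AND (next_same_date IS NULL OR next_eventdate_234 < next_same_date)
                THEN DATEDIFF(day, eventdate, next_eventdate_234)
           END) AS interval1
FROM (SELECT t.*,
             -- earliest 234 date on or after the current row
             MIN(CASE WHEN eventid = 234 THEN eventdate END)
                 OVER (PARTITION BY entityid ORDER BY eventdate DESC) AS next_eventdate_234,
             -- next occurrence of the same event, used as the range boundary
             LEAD(eventdate)
                 OVER (PARTITION BY entityid, eventid ORDER BY eventdate) AS next_same_date
      FROM t
     ) x
WHERE eventid = 123
GROUP BY entityid;
-- interval1 = 1 (only the 2019-01-02 occurrence of 123 is paired with 234)
```

The same filter could be repeated for the 432/111 pair if that rule should apply to interval2 as well.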