Aggregating with LEAD/LAG

Time: 2018-07-27 13:13:56

Tags: sql sql-server tsql hadoop hive

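I have a table called activities that has a memberId and a timestamp. I want to find out how many members performed an activity in a given month (that is, have a record in the activities table) but had no activity in the preceding 12 months. I think LEAD/LAG would help here, but I can't wrap my head around it.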

(I've tagged both Apache Hadoop and MS SQL Server here because I can do this in either one, and I think I can easily translate a solution from one to the other.)

Any help is appreciated!

Thanks!

2 answers:

Answer 0 (score: 1)

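To use the LAG function, we first need to build one record per member and month, then use LAG to get the previous month in which that member was active, and finally use a WHERE clause to keep only the rows we want: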

DECLARE 
  @year int = 2018, 
  @month int = 7;

WITH
  monthwise (MemberID, FirstOfMonth) AS (
    SELECT DISTINCT  MemberID, DATEADD(month, DATEDIFF(month, 0, ActivityDate), 0)
    FROM Activities
  ),
  prevActivity (MemberID, FirstOfMonth, prevFirstOfMonth) AS (
    SELECT MemberID, FirstOfMonth
    , LAG(FirstOfMonth) OVER (PARTITION BY MemberID ORDER BY FirstOfMonth)
    FROM monthwise
  )
SELECT MemberID
FROM prevActivity
WHERE MONTH(FirstOfMonth) = @month
  AND YEAR(FirstOfMonth) = @year
  AND (prevFirstOfMonth IS NULL OR DATEDIFF(month, prevFirstOfMonth, FirstOfMonth) > 12)
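
The query above returns the matching member IDs for a single month, while the question asks for a count. As a minimal sketch, assuming SQL Server 2012 or later (for LAG), the same two CTEs can be grouped by month to get a count for every month at once. The table variable @Activities and its rows are made-up sample data, and MemberCount is an illustrative column name:

```sql
-- Hypothetical sample data; column names follow the query above.
DECLARE @Activities TABLE (MemberID int, ActivityDate datetime);
INSERT INTO @Activities (MemberID, ActivityDate) VALUES
  (1, '20170115'),  -- member 1: active, then a gap of more than 12 months
  (1, '20180703'),
  (2, '20180620'),  -- member 2: active in the month before
  (2, '20180710'),
  (3, '20180705'),  -- member 3: first activity ever
  (3, '20180725');  -- same month again; collapsed by DISTINCT below

WITH
  monthwise (MemberID, FirstOfMonth) AS (
    -- one row per member and month with activity
    SELECT DISTINCT MemberID, DATEADD(month, DATEDIFF(month, 0, ActivityDate), 0)
    FROM @Activities
  ),
  prevActivity (MemberID, FirstOfMonth, prevFirstOfMonth) AS (
    -- previous active month of the same member, NULL if there is none
    SELECT MemberID, FirstOfMonth,
           LAG(FirstOfMonth) OVER (PARTITION BY MemberID ORDER BY FirstOfMonth)
    FROM monthwise
  )
SELECT FirstOfMonth, COUNT(*) AS MemberCount
FROM prevActivity
WHERE prevFirstOfMonth IS NULL
   OR DATEDIFF(month, prevFirstOfMonth, FirstOfMonth) > 12
GROUP BY FirstOfMonth
ORDER BY FirstOfMonth;
```

With this sample data, July 2018 is counted as 2 (members 1 and 3); member 2 is excluded because of the June 2018 activity. Note that a member's first-ever active month always qualifies, because prevFirstOfMonth is NULL there.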

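You can also do this without the LAG function: use two queries, one for the members who were active in this month and one for the members who were active in the preceding twelve months. Then use an inner join and a left join to find the members who were active this month but had no activity in the preceding months.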

DECLARE 
  @year int = 2018, 
  @month int = 7;

WITH
  this (MemberID) AS (
    SELECT DISTINCT MemberID
    FROM Activities 
    WHERE YEAR(ActivityDate) = @year
      AND MONTH(ActivityDate) = @month
  ),
  prev (MemberID) AS (
    SELECT DISTINCT MemberID
    FROM Activities
    WHERE ActivityDate < DATEADD(month, @month-1 +12*(@year-1900), 0)
      AND ActivityDate >= DATEADD(month, @month-1 +12*(@year-1901), 0)
  )
SELECT m.MemberID
FROM Members m
  INNER JOIN this ON m.MemberID = this.MemberID
  LEFT JOIN prev ON m.MemberID = prev.MemberID
WHERE prev.MemberID IS NULL
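
The same idea can also be written as an anti-join with NOT EXISTS, which does not need a Members table and returns the count directly. This is only a sketch under assumptions: it relies on DATEFROMPARTS (SQL Server 2012 or later), and the table variable @Activities, its rows, the variable @monthStart, and the MemberCount alias are all made up for illustration:

```sql
-- Hypothetical sample data; column names follow the query above.
DECLARE @Activities TABLE (MemberID int, ActivityDate datetime);
INSERT INTO @Activities (MemberID, ActivityDate) VALUES
  (1, '20170115'), (1, '20180703'),  -- gap of more than 12 months
  (2, '20180620'), (2, '20180710'),  -- active in the month before
  (3, '20180705');                   -- first activity ever

DECLARE @year int = 2018, @month int = 7;
DECLARE @monthStart date = DATEFROMPARTS(@year, @month, 1);

SELECT COUNT(DISTINCT a.MemberID) AS MemberCount
FROM @Activities a
WHERE a.ActivityDate >= @monthStart                     -- active in the given month
  AND a.ActivityDate <  DATEADD(month, 1, @monthStart)
  AND NOT EXISTS (                                      -- no activity in the 12 months before
        SELECT 1
        FROM @Activities p
        WHERE p.MemberID = a.MemberID
          AND p.ActivityDate >= DATEADD(month, -12, @monthStart)
          AND p.ActivityDate <  @monthStart
      );
```

For the sample rows this counts members 1 and 3, so MemberCount is 2. Filtering with half-open date ranges rather than YEAR() and MONTH() on the column also leaves room for an index on ActivityDate to be used.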

Answer 1 (score: -1)

You can use lag() for this:

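-- Caveat: lag() runs over individual activity rows, so a member with two
-- activities in the same month has a prev_ts inside that month and is
-- subtracted; reducing the rows to one per member and month first avoids this.
select year(ts), month(ts),
       (count(distinct memberid) -
        count(distinct case when prev_ts > dateadd(year, -1, ts) then memberid end)
       ) as new_members  -- illustrative column alias
from (select memberid, ts,
             lag(ts) over (partition by memberid order by ts) as prev_ts
      from activities a
     ) a
group by year(ts), month(ts);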