Reactivation SQL

Time: 2014-11-20 17:57:36

Tags: sql postgresql window-functions

I have the following:

with t as (
      SELECT advertisable, EXTRACT(YEAR from day) as yy, EXTRACT(MONTH from day) as mon, 
             ROUND(SUM(cost)/1e6) as val
      FROM adcube dac
      WHERE advertisable IN (SELECT advertisable
                                 FROM adcube dac 
                                 GROUP BY advertisable
                                 HAVING SUM(cost)/1e6 > 100
                                )
      GROUP BY advertisable, EXTRACT(YEAR from day), EXTRACT(MONTH from day)
     )
select advertisable, min(yy * 10000 + mon) as yyyymm
from (select t.*,
             (row_number() over (partition by advertisable order by yy, mon) -
              row_number() over (partition by advertisable, val order by yy, mon)
             ) as grp
      from t
     )as foo
group by advertisable, grp, val
having count(*) >= 6 and val = 0
;  

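This tracks the activation date for accounts that stopped spending for 4 months. However, I would like to track the reactivation date. So if an account starts spending again after 4 months, how can I see the new start date for that account?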

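2 answers:

Answer 0 (score: 1)

You want to find accounts where val > 0 and the 4 (or 6) preceding records are 0.

Here is an idea:

  • Compute the groups of similar values, as in your query.
  • Assign a sequence number (val_seqnum) within each group.
  • Then pull the previous value and the previous sequence number for each record.

Now you want the records where the following are true:

  • val > 0
  • prev_val = 0
  • the previous val_seqnum >= 4 (or whatever your threshold is).

The following query should do this (assuming the same definition of t):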

-- assumes the "with t as (...)" definition from the question precedes this statement
select t.*
from (select t.*,
             lag(val) over (partition by advertisable order by yy, mon) as prev_val,
             lag(val_seqnum) over (partition by advertisable order by yy, mon) as prev_val_seqnum
      from (select t.*,
                   -- position of each month within its run of identical values
                   row_number() over (partition by advertisable, val, grp order by yy, mon) as val_seqnum
            from (select t.*,
                         -- constant within each run of consecutive months with the same val
                         (row_number() over (partition by advertisable order by yy, mon) -
                          row_number() over (partition by advertisable, val order by yy, mon)
                         ) as grp
                  from t
                 ) t
           ) t
     ) t
where val > 0 and prev_val = 0 and prev_val_seqnum >= 4;
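
To see what the intermediate columns do, here is a minimal sketch on made-up data. The inline t below is a hypothetical stand-in for the CTE from the question (one advertiser, seven months, val already rounded), and the subquery aliases s, g and r are only there for readability.

```sql
-- Hypothetical toy data: advertiser 1, Jan..Jul 2014, val already rounded.
WITH t(advertisable, yy, mon, val) AS (
   VALUES (1, 2014, 1, 5)
        , (1, 2014, 2, 0)
        , (1, 2014, 3, 0)
        , (1, 2014, 4, 0)
        , (1, 2014, 5, 0)
        , (1, 2014, 6, 3)
        , (1, 2014, 7, 0)
   )
SELECT advertisable, yy, mon, val, prev_val, prev_val_seqnum
FROM  (
   SELECT g.*
          -- value and run position of the month just before this one
        , lag(val)        OVER (PARTITION BY advertisable ORDER BY yy, mon) AS prev_val
        , lag(val_seqnum) OVER (PARTITION BY advertisable ORDER BY yy, mon) AS prev_val_seqnum
   FROM  (
      SELECT s.*
             -- 1, 2, 3, ... within each run of identical consecutive values
           , row_number() OVER (PARTITION BY advertisable, val, grp ORDER BY yy, mon) AS val_seqnum
      FROM  (
         SELECT t.*
                -- difference of the two row numbers is constant within a run
              , row_number() OVER (PARTITION BY advertisable ORDER BY yy, mon)
              - row_number() OVER (PARTITION BY advertisable, val ORDER BY yy, mon) AS grp
         FROM   t
         ) s
      ) g
   ) r
WHERE  val > 0 AND prev_val = 0 AND prev_val_seqnum >= 4;
```

With this data the zero months February to May form one run with val_seqnum 1 to 4, so the only row returned is June 2014 (val = 3, prev_val = 0, prev_val_seqnum = 4). The trailing zero in July starts a new run and does not qualify.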

Answer 1 (score: 1)

I think this can be simpler (and faster):

SELECT advertisable, ym AS reactivation_ym
FROM (
   SELECT advertisable
        , date_trunc('month', day) AS ym
        , SUM(cost) < 500000       AS asleep
        , count(SUM(cost) < 500000 OR NULL)
                OVER (PARTITION BY advertisable
                      ORDER BY date_trunc('month', day)
                      ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) AS ct
   FROM   adcube dac
   JOIN  (
      SELECT advertisable
      FROM   adcube
      GROUP  BY 1
      HAVING SUM(cost) > 1e8   -- really 100000000 ?
      ) x USING (advertisable)
   GROUP BY 1, 2
   ) sub
WHERE  NOT asleep
AND    ct = 4;

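Based on a few assumptions to fill in missing information, I largely untangled your computations and simplified the code, making it shorter and faster than your original.

  • For each advertisable, count how many of the preceding 4 months had a total cost below 500000. A row qualifies only if all 4 (existing) months are below the threshold. (If you do not have rows for all months, you need to decide how to deal with the missing rows. There is no information about that in your question.)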
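
If months without any rows should count as zero spend, one possible way to handle them (an assumption, since the question does not say) is to generate the full month range and left-join the actual totals onto it. The CTE monthly and its columns are hypothetical names for pre-aggregated totals; the date range is hard-coded for illustration only.

```sql
-- Hypothetical pre-aggregated totals; Feb..May 2014 have no rows at all.
WITH monthly(advertisable, ym, total) AS (
   VALUES (1, date '2014-01-01', 900000)
        , (1, date '2014-06-01', 700000)
   )
SELECT a.advertisable
     , g.ym::date           AS ym
     , COALESCE(m.total, 0) AS total   -- assumption: a missing month means zero spend
FROM  (SELECT DISTINCT advertisable FROM monthly) a
CROSS  JOIN generate_series(timestamp '2014-01-01'
                          , timestamp '2014-06-01'
                          , interval '1 month') AS g(ym)
LEFT   JOIN monthly m ON  m.advertisable = a.advertisable
                      AND m.ym = g.ym::date
ORDER  BY 1, 2;
```

This returns six rows for advertiser 1, with totals 900000, 0, 0, 0, 0 and 700000. A frame of 4 preceding rows then really covers 4 calendar months.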

This uses count() as a window aggregate function with a custom frame.
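
The counting trick can be looked at in isolation. In the sketch below, m and total are made-up names standing in for the monthly SUM(cost) of a single advertiser. The expression total < 500000 OR NULL is TRUE for a sleeping month and NULL otherwise (FALSE OR NULL yields NULL), and count() ignores NULL, so only sleeping months inside the frame are counted.

```sql
-- Hypothetical monthly totals for one advertiser.
WITH m(ym, total) AS (
   VALUES (date '2014-01-01', 900000)
        , (date '2014-02-01', 0)
        , (date '2014-03-01', 100000)
        , (date '2014-04-01', 0)
        , (date '2014-05-01', 0)
        , (date '2014-06-01', 700000)
        , (date '2014-07-01', 0)
   )
SELECT ym
     , total
     , total < 500000 AS asleep
       -- frame = up to 4 rows before the current one, current row excluded
     , count(total < 500000 OR NULL)
          OVER (ORDER BY ym
                ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) AS ct
FROM   m
ORDER  BY ym;
```

Here ct comes out as 0, 0, 1, 2, 3, 4, 3 for January to July. June is the only month that is not asleep and has ct = 4, so it is the reactivation month. In PostgreSQL 9.4 or later, count(*) FILTER (WHERE total < 500000) OVER (...) should express the same thing more explicitly.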

How can count() and sum() be "nested"?
They are not really nested: it is a window function applied to the result of an aggregate function.
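
A small sketch of the same pattern, using a made-up inline adcube with only the three columns the queries above rely on: the inner sum(cost) is the aggregate evaluated per GROUP BY group, and the outer sum(...) OVER (...) is a window function that runs over those aggregated rows.

```sql
-- Hypothetical rows; the real adcube table presumably has more columns.
WITH adcube(advertisable, day, cost) AS (
   VALUES (1, date '2014-01-05', 10)
        , (1, date '2014-01-20', 20)
        , (1, date '2014-02-03', 5)
        , (2, date '2014-01-07', 7)
   )
SELECT advertisable
     , date_trunc('month', day) AS ym
     , sum(cost)                AS month_cost     -- aggregate: one value per group
     , sum(sum(cost)) OVER (PARTITION BY advertisable
                            ORDER BY date_trunc('month', day)) AS running_cost
                                                  -- window function over the aggregate
FROM   adcube
GROUP  BY advertisable, date_trunc('month', day)
ORDER  BY 1, 2;
```

Aggregation happens first, window functions afterwards. Advertiser 1 gets month_cost 30 and 5 with running_cost 30 and 35, and advertiser 2 gets 7 and 7.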