Question

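I have the table below, which contains DISTINCT CustomerID and DATE_TRUNC(TIME, MONTH) as shown. If a customer made many transactions within the same month, they appear as only one record in our table.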

Date	CustomerID
2021-01-01	111
2021-01-01	112
2021-02-01	111
2021-03-01	113
2021-03-01	115
2021-04-01	119

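For a given month M, I want to see the distinct CustomerIDs that purchased from us at least once at any time between M-4 (four months before M) and M-2 (two months before M), and that did not purchase in M-1 (the previous month).

Basically, if we are looking at month 6, I want all distinct customers who purchased from us between month 2 and month 4 (a three-month lookback that excludes the previous month) but then did not purchase in the previous month (month 5).

My desired output is a table grouped by DATE (month M) that shows the distinct CustomerIDs that purchased at some point between M-4 and M-2 but did not purchase in the previous month (M-1).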

Date (M)	CustomerID
2021-01-01	111
2021-01-01	114
2021-02-01	118
2021-02-01	113
2021-02-01	115
2021-03-01	119

Answer 1

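Your data already has just one row per month and customer ID, so you can simply use lag() (assuming here that t is your table and yyyymm is its month column):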

select t.*
from (select t.*,
             lag(yyyymm) over (partition by customerid order by yyyymm) as prev_yyyymm
      from t
     ) t
where prev_yyyymm >= date_add(yyyymm, interval -4 month) and
      prev_yyyymm <= date_add(yyyymm, interval -2 month);

Or, more simply, use qualify:

select t.*
from t
where 1=1
qualify lag(yyyymm) over (partition by customerid order by yyyymm) >= date_add(yyyymm, interval -4 month) and
        lag(yyyymm) over (partition by customerid order by yyyymm) <= date_add(yyyymm, interval -2 month);

Answer 2

Use the approach below:

select date, customerid
from (
  select *, 
    array_agg(customerid) over(order by pos range between 4 preceding and 2 preceding) bought_in_3_months_before_prev,
    array_agg(customerid) over(order by pos range between 1 preceding and 1 preceding) bought_in_prev,
  from (
    select *, date_diff(date, '2000-01-01', month) pos
    from `project.dataset.table`
  )
) t, unnest(array(
  select distinct id
  from t.bought_in_3_months_before_prev id
  where not id in (select * from t.bought_in_prev)
)) customerid

Update:
If the table holds so much data that you run into memory- or resource-related problems, use the approach below:

select * from (
  select date_add(date, interval offset month) as date, customerid
  from `project.dataset.table`, unnest([2,3,4]) offset 
  except distinct 
  select date_add(date, interval 1 month) as date, customerid
  from `project.dataset.table`
)
where date <= (select max(date) from `project.dataset.table`)

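In case you are not entirely comfortable with the except distinct operator, you can use the version below, which relies on the more common/traditional union distinct. Both versions are fairly self-explanatory, so it is mostly a matter of preference: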

select date, customerid from (
  select date_add(date, interval offset month) as date, customerid, true flag 
  from `project.dataset.table`, unnest([2,3,4]) offset 
  union distinct
  select date_add(date, interval 1 month) as date, customerid, false flag
  from `project.dataset.table`
)
where date <= (select max(date) from `project.dataset.table`)
group by date, customerid
having logical_and(flag)
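
Both queries in the update rest on the same shift-and-subtract idea: a purchase in month P makes the customer a candidate for months P+2, P+3 and P+4, and a purchase in month P rules the customer out for month P+1. As a rough way to sanity-check that logic outside BigQuery, here is a minimal Python sketch that replays it on the sample input rows from the question. The helper names add_months and lapsed_customers are made up for this illustration and are not part of any answer or library.

```python
from datetime import date


def add_months(d, n):
    """Shift a month-start date forward by n months (hypothetical helper)."""
    idx = d.year * 12 + (d.month - 1) + n
    return date(idx // 12, idx % 12 + 1, 1)


# Sample input rows from the question: (month, customer_id)
rows = [
    (date(2021, 1, 1), 111),
    (date(2021, 1, 1), 112),
    (date(2021, 2, 1), 111),
    (date(2021, 3, 1), 113),
    (date(2021, 3, 1), 115),
    (date(2021, 4, 1), 119),
]


def lapsed_customers(rows):
    """Return sorted (month M, customer) pairs, mirroring the except distinct query."""
    # Bought in M-2, M-3 or M-4: shift each purchase forward by 2, 3 and 4 months.
    candidates = {(add_months(d, k), c) for d, c in rows for k in (2, 3, 4)}
    # Bought in M-1: shift each purchase forward by 1 month.
    bought_prev = {(add_months(d, 1), c) for d, c in rows}
    # Keep only months up to the latest month present in the table.
    last_month = max(d for d, _ in rows)
    return sorted(p for p in candidates - bought_prev if p[0] <= last_month)


result = lapsed_customers(rows)
for month, customer in result:
    print(month.isoformat(), customer)
```

On these six input rows the sketch yields customer 112 for 2021-03-01 and customers 111 and 112 for 2021-04-01. Customer 111 is not reported for March because of the February purchase.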

SQL BigQuery: Identifying customers who stopped buying products

2 answers: