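Order count per ID, and calculating the time between orders in BigQuery

Time: 2020-11-05 10:50:44

Tags: sql datetime count google-bigquery date-arithmetic

I am working with customer purchase data and am trying to write a query in Google BigQuery that sorts all purchases by date and adds a per-customer purchase/order count (order_count). I would also like to calculate the time between a customer's orders, in days (purchase_latency). My query currently looks like this: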

    select
        email,
        first_name,
        last_name,
        order_number,
        purchase_date,
        order_price,
        d.code,
    from my_order_data
    left join unnest(discount_codes) as d

The result, including "order_count" and "purchase_latency", should look like this:

email   | order_number | purchase_date           | order_price | order_count | purchase_latency |
a@a.com | 34874        | 2020-01-02 16:20:12 UTC | 20,-        | 1           | 0                |
a@a.com | 43598        | 2020-01-18 12:00:00 UTC | 30,-        | 2           | 16               |
a@a.com | 47520        | 2020-01-30 08:05:00 UTC | 15,-        | 3           | 12               |
b@b.com | 23598        | 2019-03-25 22:10:00 UTC | 22,-        | 1           | 0                |
b@b.com | 25459        | 2019-03-31 17:35:00 UTC | 55,-        | 2           | 6                |

How can I add the numbering for "order_count" and the calculation of "purchase_latency"?

Thank you very much!

1 answer:

Answer 0 (score: 1)

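You can use window functions:

  • To enumerate each customer's orders by ascending purchase date, you can use row_number().

  • lag() retrieves the date of the "previous" purchase, and date_diff() then computes the difference between that date and the current purchase date.

So: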

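select
    email,
    first_name,
    last_name,
    order_number,
    purchase_date,
    order_price,
    -- number each customer's orders from oldest to newest
    row_number() over(partition by email order by purchase_date) order_count,
    -- days since the customer's previous order; lag() is null for the first
    -- order, so coalesce() falls back to the same date and the result is 0
    date_diff(
        date(purchase_date),
        coalesce(date(lag(purchase_date) over(partition by email order by purchase_date)), date(purchase_date)),
        day
    ) purchase_latency
from my_order_data od
left join unnest(od.discount_codes) as dc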
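
To try the two window expressions without touching the real table, here is a minimal sketch that rebuilds the question's sample rows inline. The CTE reuses the name my_order_data only for illustration and carries just the columns needed; it also assumes purchase_date is a TIMESTAMP, as the "UTC" suffix in the sample suggests. The discount_codes join is left out.

```sql
-- Inline copy of the question's sample rows (illustrative stand-in for the real table).
with my_order_data as (
    select 'a@a.com' as email, 34874 as order_number, timestamp '2020-01-02 16:20:12+00' as purchase_date
    union all select 'a@a.com', 43598, timestamp '2020-01-18 12:00:00+00'
    union all select 'a@a.com', 47520, timestamp '2020-01-30 08:05:00+00'
    union all select 'b@b.com', 23598, timestamp '2019-03-25 22:10:00+00'
    union all select 'b@b.com', 25459, timestamp '2019-03-31 17:35:00+00'
)
select
    od.email,
    od.order_number,
    od.purchase_date,
    -- 1, 2, 3, ... per customer, oldest order first
    row_number() over(partition by od.email order by od.purchase_date) as order_count,
    -- calendar days since the previous order of the same customer, 0 for the first one
    date_diff(
        date(od.purchase_date),
        coalesce(date(lag(od.purchase_date) over(partition by od.email order by od.purchase_date)), date(od.purchase_date)),
        day
    ) as purchase_latency
from my_order_data as od
order by od.email, od.purchase_date
```

This should give order_count 1, 2, 3 with purchase_latency 0, 16, 12 for a@a.com, and 1, 2 with 0, 6 for b@b.com, matching the table in the question. Because both timestamps are converted with date() first, the difference counts calendar-day boundaries rather than elapsed 24-hour periods: the gap between 2020-01-18 12:00 and 2020-01-30 08:05 is reported as 12 even though slightly less than 12 full days have passed.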

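Note: I would strongly recommend prefixing every column name in the query with the alias of the table it belongs to; this makes the query unambiguous and easier to follow.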
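
As a sketch of that advice, the query could be written with every column qualified, as below. The aliases od and dc come from the query above; the CTE name numbered, the alias n, and the output name discount_code are hypothetical. Splitting the query into two steps is an assumption on my part rather than something the question asks for: left join unnest(...) returns one row per array element, so an order carrying several discount codes would otherwise appear several times and be counted several times by row_number(). Computing the window functions before unnesting avoids that. The sketch assumes discount_codes is an array of structs with a code field, as d.code in the question implies.

```sql
-- Step 1: one row per order, with the per-customer numbering and latency.
with numbered as (
    select
        od.email,
        od.first_name,
        od.last_name,
        od.order_number,
        od.purchase_date,
        od.order_price,
        od.discount_codes,
        row_number() over(partition by od.email order by od.purchase_date) as order_count,
        date_diff(
            date(od.purchase_date),
            coalesce(date(lag(od.purchase_date) over(partition by od.email order by od.purchase_date)), date(od.purchase_date)),
            day
        ) as purchase_latency
    from my_order_data as od
)
-- Step 2: expand the discount codes; order_count and purchase_latency
-- are simply repeated on each code row of the same order.
select
    n.email,
    n.first_name,
    n.last_name,
    n.order_number,
    n.purchase_date,
    n.order_price,
    dc.code as discount_code,
    n.order_count,
    n.purchase_latency
from numbered as n
left join unnest(n.discount_codes) as dc
order by n.email, n.purchase_date
```

If each order has at most one discount code, the single-step query above gives the same numbers and the CTE is unnecessary.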