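I am using BigQuery (#standardSQL) to process a table. The table should record a conversion (1) for users who purchased in both month 9 and month 10. For users who did not purchase in month 10, every row should contain 0.
This is my custom_coded expression so far: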
(case when row_number()
        over (partition by customer_id order by purchase_date asc) =
      count(*) over (partition by customer_id)
 then 1 else 0 END) AS custom_coded
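My expectation is that customer_id = 288 has only 0 in custom_coded, because he did not purchase in the following month (month 10). customer_id = 879 is expected to have 1 on his latest purchase_date, because he has a purchase in month 10.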
I asked about this before in another thread (Decode maximum number in rows for sql), but that dataset did not fit the analysis I want to perform.
Answer 0 (score: 1)
Below is for BigQuery Standard SQL:
#standardSQL
SELECT customer_id, item_purchased, purchase_date,
  (CASE WHEN
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date ASC) =
      COUNT(*) OVER (PARTITION BY customer_id)
    AND SUM(DISTINCT (CASE FORMAT_DATE('%Y%m', purchase_date)
          WHEN '201709' THEN 1 WHEN '201710' THEN 2 ELSE 0 END))
        OVER(PARTITION BY customer_id) = 3
    THEN 1 ELSE 0
  END) AS custom_coded
FROM `project.dataset.table`
You can test / play with the above using the dummy data from your question:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 288 customer_id, 'Rice' item_purchased, DATE '2017-09-02' purchase_date UNION ALL
  SELECT 288, 'Rice', DATE '2017-09-02' UNION ALL
  SELECT 288, 'Rice', DATE '2017-09-06' UNION ALL
  SELECT 879, 'Plate', DATE '2017-09-01' UNION ALL
  SELECT 879, 'Plate', DATE '2017-09-25' UNION ALL
  SELECT 879, 'Plate', DATE '2017-10-25' UNION ALL
  SELECT 879, 'Plate', DATE '2017-10-27'
)
SELECT customer_id, item_purchased, purchase_date,
  (CASE WHEN
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date ASC) =
      COUNT(*) OVER (PARTITION BY customer_id)
    AND SUM(DISTINCT (CASE FORMAT_DATE('%Y%m', purchase_date)
          WHEN '201709' THEN 1 WHEN '201710' THEN 2 ELSE 0 END))
        OVER(PARTITION BY customer_id) = 3
    THEN 1 ELSE 0
  END) AS custom_coded
FROM `project.dataset.table`
ORDER BY customer_id, purchase_date
The result is:
customer_id  item_purchased  purchase_date  custom_coded
288          Rice            2017-09-02     0
288          Rice            2017-09-02     0
288          Rice            2017-09-06     0
879          Plate           2017-09-01     0
879          Plate           2017-09-25     0
879          Plate           2017-10-25     0
879          Plate           2017-10-27     1
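A note on how the `= 3` comparison works: the CASE maps a September 2017 purchase to 1, an October 2017 purchase to 2, and anything else to 0, so the distinct sum over a customer's rows equals 3 only when both months are present. If that small bitmask feels too terse, the same condition can probably be spelled out with COUNTIF used as a window function. The following is only a sketch of that alternative phrasing, not the query from the answer; it reuses the dummy rows above and assumes the two months stay hard-coded as '201709' and '201710'.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- Dummy rows copied from the test data above
  SELECT 288 customer_id, 'Rice' item_purchased, DATE '2017-09-02' purchase_date UNION ALL
  SELECT 288, 'Rice', DATE '2017-09-02' UNION ALL
  SELECT 288, 'Rice', DATE '2017-09-06' UNION ALL
  SELECT 879, 'Plate', DATE '2017-09-01' UNION ALL
  SELECT 879, 'Plate', DATE '2017-09-25' UNION ALL
  SELECT 879, 'Plate', DATE '2017-10-25' UNION ALL
  SELECT 879, 'Plate', DATE '2017-10-27'
)
SELECT customer_id, item_purchased, purchase_date,
  IF(
    -- Only the customer's last row (by purchase_date) may be flagged
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date ASC) =
      COUNT(*) OVER (PARTITION BY customer_id)
    -- The customer has at least one purchase in September 2017
    AND COUNTIF(FORMAT_DATE('%Y%m', purchase_date) = '201709')
          OVER (PARTITION BY customer_id) > 0
    -- The customer has at least one purchase in October 2017
    AND COUNTIF(FORMAT_DATE('%Y%m', purchase_date) = '201710')
          OVER (PARTITION BY customer_id) > 0,
    1, 0) AS custom_coded
FROM `project.dataset.table`
ORDER BY customer_id, purchase_date
```

On the dummy data this should flag the same single row as the result above (customer 879 on 2017-10-27), since customer 288 has no October purchase. One caveat that applies to both versions: when a customer's latest purchase_date appears on more than one row, ROW_NUMBER() picks one of the tied rows arbitrarily, so adding a tie-breaking column to the ORDER BY may be worthwhile if the table has one.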