I have a table in BigQuery with the following columns:
user_id  visit_date  referral  transaction
1234     20180101    site2     0
1234     20180102    site3     1
1234     20180103    site2     1
4567     20180104    site4     0
4567     20180105    site5     0
5678     20180101    site2     0
5678     20180102    site3     1
My goal is to get the table into the following format:
path           transactions
site2 > site3  2
site2          1
site4 > site5  0
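What I can't figure out is how to "reset" the path for a user who converts more than once within the same period, as user_id = 1234 does.
So far I've come up with the following query, but it doesn't give the desired output: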
SELECT
  referral_path,
  SUM(transactions) AS transactions
FROM (
  SELECT
    user_id,
    STRING_AGG(DISTINCT(referral), ',') AS referral_path,
    MAX(transactions) AS transactions
  FROM (
    SELECT
      user_id,
      referral,
      transactions
    FROM
      table
    ORDER BY
      user_id )a
  GROUP BY
    user_id )b
GROUP BY
  referral_path
ORDER BY
  transactions DESC
Answer
Below is for BigQuery Standard SQL:
#standardSQL
SELECT
  path,
  SUM(transaction) transactions
FROM (
  SELECT
    -- one path per (user_id, grp); ORDER BY keeps the visits in date order
    STRING_AGG(referral, ' > ' ORDER BY visit_date) path,
    SUM(transaction) transaction
  FROM (
    SELECT
      user_id, visit_date, referral, transaction,
      -- conversions on the user's earlier visits: a new path starts right after each conversion
      IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
    FROM `project.dataset.table`
  )
  GROUP BY user_id, grp
)
GROUP BY path
You can test and play with it using the dummy data from your question:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
  SELECT 1234, '20180102', 'site3', 1 UNION ALL
  SELECT 1234, '20180103', 'site2', 1 UNION ALL
  SELECT 4567, '20180104', 'site4', 0 UNION ALL
  SELECT 4567, '20180105', 'site5', 0 UNION ALL
  SELECT 5678, '20180101', 'site2', 0 UNION ALL
  SELECT 5678, '20180102', 'site3', 1
)
SELECT
  path,
  SUM(transaction) transactions
FROM (
  SELECT
    -- one path per (user_id, grp); ORDER BY keeps the visits in date order
    STRING_AGG(referral, ' > ' ORDER BY visit_date) path,
    SUM(transaction) transaction
  FROM (
    SELECT
      user_id, visit_date, referral, transaction,
      -- conversions on the user's earlier visits: a new path starts right after each conversion
      IFNULL(SUM(transaction) OVER(PARTITION BY user_id ORDER BY visit_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
    FROM `project.dataset.table`
  )
  GROUP BY user_id, grp
)
GROUP BY path
ORDER BY transactions DESC
with result:
Row  path           transactions
1    site2 > site3  2
2    site2          1
3    site4 > site5  0
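To see how grp "resets" the path, it can help to run only the innermost subquery against the same dummy data. This is a sketch that reuses the sample rows from the question unchanged; the only addition is the final ORDER BY, so the rows read in visit order per user.

```sql
#standardSQL
-- Innermost step only: grp = number of conversions on the user's EARLIER visits,
-- so every visit that follows a conversion falls into a new group (a new path).
WITH `project.dataset.table` AS (
  SELECT 1234 user_id, '20180101' visit_date, 'site2' referral, 0 transaction UNION ALL
  SELECT 1234, '20180102', 'site3', 1 UNION ALL
  SELECT 1234, '20180103', 'site2', 1 UNION ALL
  SELECT 4567, '20180104', 'site4', 0 UNION ALL
  SELECT 4567, '20180105', 'site5', 0 UNION ALL
  SELECT 5678, '20180101', 'site2', 0 UNION ALL
  SELECT 5678, '20180102', 'site3', 1
)
SELECT
  user_id, visit_date, referral, transaction,
  -- the frame excludes the current row; for a user's first visit it is empty,
  -- SUM returns NULL, and IFNULL turns that into 0
  IFNULL(SUM(transaction) OVER(
    PARTITION BY user_id ORDER BY visit_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) grp
FROM `project.dataset.table`
ORDER BY user_id, visit_date
```

Expected result: only user 1234's third visit gets grp = 1, because of the conversion on the second visit, so it starts the separate "site2" path.
user_id  visit_date  referral  transaction  grp
1234     20180101    site2     0            0
1234     20180102    site3     1            0
1234     20180103    site2     1            1
4567     20180104    site4     0            0
4567     20180105    site5     0            0
5678     20180101    site2     0            0
5678     20180102    site3     1            0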