I'm doing some simple multi-channel attribution exploration and I'm stuck on grouping user sessions.
For example, I have a simple sessions table:
client channel time converted
1 social 1 0
1 cpc 2 0
1 email 3 1
1 email 4 0
1 cpc 5 1
2 organic 1 0
2 cpc 2 1
3 email 1 0
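Each row is one user session, and the converted column shows whether the user converted in that particular session.
I need to group, per user and per conversion, the sessions that led up to that conversion, so the ideal result would be: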
client channels time converted
1 [social,cpc,email] 3 1
1 [email,cpc] 5 1
2 [organic,cpc] 2 1
3 [email] 1 0
Note user 3: he has not converted, but I still need his sessions in the result.
Answer 0 (score: 0)
You need to assign a group to each session. For that, a reverse cumulative sum of converted looks like the right thing:
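-- grp = number of conversions at or after each session (reverse running sum),
-- so each conversion and the sessions leading up to it share one grp value
select client, array_agg(channel order by time) as channels,
       max(time) as time, max(converted) as converted
from (select t.*,
             sum(t.converted) over (partition by t.client order by t.time desc) as grp
      from t
     ) t
group by client, grp;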
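To see why the reverse cumulative sum produces the right groups, here is a minimal Python sketch that mimics the window function for client 1's rows from the question. It is only an illustration, not part of the query; the helper name `reverse_running_sum` is made up for this example.

```python
# Sessions for client 1 from the question: (time, channel, converted).
sessions = [
    (1, "social", 0),
    (2, "cpc", 0),
    (3, "email", 1),
    (4, "email", 0),
    (5, "cpc", 1),
]


def reverse_running_sum(rows):
    """Mimic sum(converted) over (order by time desc) for a single client."""
    total = 0
    grp_by_time = {}
    # Walk from the latest session back to the earliest, accumulating conversions.
    for time, _channel, converted in sorted(rows, key=lambda r: r[0], reverse=True):
        total += converted
        grp_by_time[time] = total
    return grp_by_time


grp = reverse_running_sum(sessions)
for time, channel, converted in sessions:
    print(time, channel, converted, grp[time])
```

The sessions at times 1 to 3 get grp 2 and the sessions at times 4 and 5 get grp 1, which matches the two paths expected for client 1. A client with no conversions at all, or sessions after a client's last conversion, would get grp 0 and so form their own group.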
Answer 1 (score: 0)
Below is for BigQuery Standard SQL:
#standardSQL
SELECT
client,
STRING_AGG(channel ORDER BY time) channels,
MAX(time) time,
MAX(converted) converted
FROM (
SELECT *, COUNTIF(converted = 1) OVER(PARTITION BY client ORDER BY time DESC) session
FROM `project.dataset.table`
)
GROUP BY client, session
-- ORDER BY client, time
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 client, 'social' channel, 1 time, 0 converted UNION ALL
SELECT 1, 'cpc', 2, 0 UNION ALL
SELECT 1, 'email', 3, 1 UNION ALL
SELECT 1, 'email', 4, 0 UNION ALL
SELECT 1, 'cpc', 5, 1 UNION ALL
SELECT 2, 'organic', 1, 0 UNION ALL
SELECT 2, 'cpc', 2, 1 UNION ALL
SELECT 3, 'email', 1, 0
)
SELECT
client,
STRING_AGG(channel ORDER BY time) channels,
MAX(time) time,
MAX(converted) converted
FROM (
SELECT *, COUNTIF(converted = 1) OVER(PARTITION BY client ORDER BY time DESC) session
FROM `project.dataset.table`
)
GROUP BY client, session
ORDER BY client, time
with the result:
Row client channels time converted
1 1 social,cpc,email 3 1
2 1 email,cpc 5 1
3 2 organic,cpc 2 1
4 3 email 1 0
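As a cross-check outside the database, the same result can be reproduced with a short Python sketch. Note that this uses a different technique from the queries above: a forward pass that closes a group at each conversion, rather than a reverse running count. The function name `group_paths` and the tuple layout are assumptions made for this illustration.

```python
from itertools import groupby

# Sample data from the question: (client, channel, time, converted).
rows = [
    (1, "social", 1, 0),
    (1, "cpc", 2, 0),
    (1, "email", 3, 1),
    (1, "email", 4, 0),
    (1, "cpc", 5, 1),
    (2, "organic", 1, 0),
    (2, "cpc", 2, 1),
    (3, "email", 1, 0),
]


def group_paths(rows):
    """Return (client, channels, last_time, converted) for each path."""
    results = []
    # Order by client, then time, so each client's sessions are contiguous.
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))
    for client, client_rows in groupby(ordered, key=lambda r: r[0]):
        channels = []
        last_time = None
        for _client, channel, time, converted in client_rows:
            channels.append(channel)
            last_time = time
            if converted == 1:
                # A conversion closes the current path.
                results.append((client, channels, last_time, 1))
                channels = []
        if channels:
            # Trailing sessions that never converted are kept as their own path.
            results.append((client, channels, last_time, 0))
    return results


paths = group_paths(rows)
for path in paths:
    print(path)
```

For the sample data this yields the same four rows as the query result, with the channels as a list instead of a comma-separated string.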