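I am analysing Google Analytics data in Google BigQuery using Standard SQL.
I often want to aggregate hit-level information into session-level metrics. This involves trawling through the hits and filtering out the relevant information.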
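For a booking-flow analysis, for example, you end up with a hit-level table that looks like this:
customer, bookingStep
So I need to boil it down to this shape:
customer, step1, step2, etc...
Here is the kind of approach I am currently using: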
WITH normalised AS (
# data created from trawling through hit level GA page views and selecting relevant rows
SELECT 1 AS customer, 'step1' AS step UNION ALL
SELECT 1, 'step1' UNION ALL
SELECT 1, 'step2' UNION ALL
SELECT 1, 'step3' UNION ALL
SELECT 1, 'step4' UNION ALL
SELECT 1, 'step5' UNION ALL
SELECT 2, 'step1' UNION ALL
SELECT 2, 'step2' UNION ALL
SELECT 2, 'step3' UNION ALL
SELECT 2, 'step4' UNION ALL
SELECT 3, 'step1' UNION ALL
SELECT 3, 'step2' UNION ALL
SELECT 3, 'step3' UNION ALL
SELECT 3, 'step4' UNION ALL
SELECT 4, 'step1' UNION ALL
SELECT 4, 'step2' UNION ALL
SELECT 4, 'step3' UNION ALL
SELECT 4, 'step4' UNION ALL
SELECT 5, 'step1' UNION ALL
SELECT 5, 'step2' UNION ALL
SELECT 5, 'step3' UNION ALL
SELECT 6, 'step1' UNION ALL
SELECT 6, 'step2' UNION ALL
SELECT 7, 'step1'
)
Query:
/* aggregate to remove duplicate entries */
SELECT
  customer,
  CASE WHEN SUM(step1) > 0 THEN 1 ELSE 0 END AS step1,
  CASE WHEN SUM(step2) > 0 THEN 1 ELSE 0 END AS step2
  # ... repeat for each step
FROM (
  /* denormalise into one field per step */
  SELECT DISTINCT
    customer,
    CASE WHEN step = 'step1' THEN 1 ELSE 0 END AS step1,
    CASE WHEN step = 'step2' THEN 1 ELSE 0 END AS step2
    # ... repeat for each step
  FROM normalised
)
GROUP BY customer
ORDER BY customer ASC
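Is there a better, more efficient way to do this? My solution seems to work, but given the amount of code involved I can't help thinking there must be a more concise approach.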
Answer 0 (score: 1)
I think you can do what you want in a single step:
SELECT customer,
MAX(CASE WHEN step = 'step1' THEN 1 ELSE 0 END) AS step1,
MAX(CASE WHEN step = 'step2' THEN 1 ELSE 0 END) AS step2
FROM normalised
GROUP BY customer;
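To show the single-step query on the full sample data, here is a self-contained sketch that extends it to all five step names from the question. The typed array literal is only an assumed, compact way of recreating the question's `normalised` rows (same 24 rows, including the duplicate step1 hit for customer 1); it is not part of either post. Because each CASE yields 0 or 1, MAX is 1 whenever at least one hit matched, so duplicates collapse without the inner SELECT DISTINCT or the outer SUM/CASE.

```sql
#standardSQL
WITH normalised AS (
  -- Compact rebuild of the question's sample rows (assumed equivalent data).
  SELECT customer, step
  FROM UNNEST(ARRAY<STRUCT<customer INT64, step STRING>>[
    (1, 'step1'), (1, 'step1'), (1, 'step2'), (1, 'step3'), (1, 'step4'), (1, 'step5'),
    (2, 'step1'), (2, 'step2'), (2, 'step3'), (2, 'step4'),
    (3, 'step1'), (3, 'step2'), (3, 'step3'), (3, 'step4'),
    (4, 'step1'), (4, 'step2'), (4, 'step3'), (4, 'step4'),
    (5, 'step1'), (5, 'step2'), (5, 'step3'),
    (6, 'step1'), (6, 'step2'),
    (7, 'step1')
  ])
)
SELECT
  customer,
  -- One flag per step: 1 if the customer has any hit for that step, else 0.
  MAX(CASE WHEN step = 'step1' THEN 1 ELSE 0 END) AS step1,
  MAX(CASE WHEN step = 'step2' THEN 1 ELSE 0 END) AS step2,
  MAX(CASE WHEN step = 'step3' THEN 1 ELSE 0 END) AS step3,
  MAX(CASE WHEN step = 'step4' THEN 1 ELSE 0 END) AS step4,
  MAX(CASE WHEN step = 'step5' THEN 1 ELSE 0 END) AS step5
FROM normalised
GROUP BY customer
ORDER BY customer;

-- Expected on this sample data:
-- customer | step1 step2 step3 step4 step5
-- 1        |   1     1     1     1     1
-- 2, 3, 4  |   1     1     1     1     0
-- 5        |   1     1     1     0     0
-- 6        |   1     1     0     0     0
-- 7        |   1     0     0     0     0
```

The column list is still hard-coded, so a new step name needs a new line in the SELECT.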
Answer 1 (score: 0)
Below is a less verbose version for BigQuery Standard SQL:
#standardSQL
SELECT
customer,
SIGN(COUNTIF(step = 'step1')) AS step1,
SIGN(COUNTIF(step = 'step2')) AS step2
FROM normalised
GROUP BY customer
ORDER BY customer ASC
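At the same time, for many practical uses you will most likely need to apply further processing on top of this output, and this approach will not be as flexible as you might like because of the hard-coded column names (assuming you actually have more than 2 or 5 steps, or simply want something more dynamic).
I would suggest considering "denormalising" into an array, as below. In my experience this gives you much more flexibility for further processing: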
#standardSQL
SELECT
customer,
ARRAY_AGG(DISTINCT step ORDER BY step) AS steps
FROM normalised
GROUP BY customer
ORDER BY customer ASC
The result is:
customer  steps
--------  -----
1         step1
          step2
          step3
          step4
          step5
2         step1
          step2
          step3
          step4
...       ...
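As an illustration of that flexibility, here is a minimal sketch that derives a few per-customer metrics from the ARRAY_AGG output without adding a column per step. The CTE name `per_customer` and the output column names are hypothetical, and the typed array literal is an assumed compact rebuild of the question's sample rows. Note that ARRAY_AGG(... ORDER BY step) sorts lexically, so a name like step10 would sort before step2.

```sql
#standardSQL
WITH normalised AS (
  -- Compact rebuild of the question's sample rows (assumed equivalent data).
  SELECT customer, step
  FROM UNNEST(ARRAY<STRUCT<customer INT64, step STRING>>[
    (1, 'step1'), (1, 'step1'), (1, 'step2'), (1, 'step3'), (1, 'step4'), (1, 'step5'),
    (2, 'step1'), (2, 'step2'), (2, 'step3'), (2, 'step4'),
    (3, 'step1'), (3, 'step2'), (3, 'step3'), (3, 'step4'),
    (4, 'step1'), (4, 'step2'), (4, 'step3'), (4, 'step4'),
    (5, 'step1'), (5, 'step2'), (5, 'step3'),
    (6, 'step1'), (6, 'step2'),
    (7, 'step1')
  ])
),
per_customer AS (
  -- The array "denormalisation" from the answer.
  SELECT customer, ARRAY_AGG(DISTINCT step ORDER BY step) AS steps
  FROM normalised
  GROUP BY customer
)
SELECT
  customer,
  ARRAY_LENGTH(steps) AS steps_reached,       -- number of distinct steps seen
  'step3' IN UNNEST(steps) AS reached_step3,  -- membership test on demand
  ARRAY_TO_STRING(steps, ' > ') AS path       -- readable funnel path
FROM per_customer
ORDER BY customer;

-- Expected steps_reached / reached_step3 on this sample data:
-- 1 -> 5, true; 2, 3, 4 -> 4, true; 5 -> 3, true; 6 -> 2, false; 7 -> 1, false
```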
Or, in many cases, denormalising into a string works too:
#standardSQL
SELECT customer,
STRING_AGG(DISTINCT step ORDER BY step) AS steps
FROM normalised
GROUP BY customer
ORDER BY customer ASC
The output is simply:
customer steps
-------- -----
1 step1,step2,step3,step4,step5
2 step1,step2,step3,step4
3 step1,step2,step3,step4
4 step1,step2,step3,step4
5 step1,step2,step3
6 step1,step2
7 step1
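One kind of further processing the string form makes easy is counting how many customers followed each distinct path through the booking flow. The sketch below assumes the same compact rebuild of the question's sample rows; the CTE name `paths` and the `customers` column are hypothetical. STRING_AGG uses a comma as its default delimiter, which is why the paths match the output above.

```sql
#standardSQL
WITH normalised AS (
  -- Compact rebuild of the question's sample rows (assumed equivalent data).
  SELECT customer, step
  FROM UNNEST(ARRAY<STRUCT<customer INT64, step STRING>>[
    (1, 'step1'), (1, 'step1'), (1, 'step2'), (1, 'step3'), (1, 'step4'), (1, 'step5'),
    (2, 'step1'), (2, 'step2'), (2, 'step3'), (2, 'step4'),
    (3, 'step1'), (3, 'step2'), (3, 'step3'), (3, 'step4'),
    (4, 'step1'), (4, 'step2'), (4, 'step3'), (4, 'step4'),
    (5, 'step1'), (5, 'step2'), (5, 'step3'),
    (6, 'step1'), (6, 'step2'),
    (7, 'step1')
  ])
),
paths AS (
  -- One comma-separated path string per customer, as in the answer.
  SELECT customer, STRING_AGG(DISTINCT step ORDER BY step) AS steps
  FROM normalised
  GROUP BY customer
)
-- Group identical paths to get a simple funnel-path distribution.
SELECT steps, COUNT(*) AS customers
FROM paths
GROUP BY steps
ORDER BY customers DESC, steps;

-- Expected on this sample data:
-- step1,step2,step3,step4         3
-- step1                           1
-- step1,step2                     1
-- step1,step2,step3               1
-- step1,step2,step3,step4,step5   1
```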