What is the most efficient way to denormalise hit-level data into session-level data?

Asked: 2017-09-15 10:58:33

Tags: sql google-analytics google-bigquery

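I'm analysing Google Analytics data in Google BigQuery using standard SQL.

I often run into situations where I want to roll hit-level information up into session-level metrics. This involves trawling through the unnested hits and filtering for the relevant rows.

Taking a booking-flow analysis as an example, you end up with a hit-level table that looks like this:

customer, bookingStep

I then need to boil this down to the following schema:

customer, step1, step2, etc.

Here is the sort of approach I currently use: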

WITH normalised AS (
  # data created from trawling through hit level GA page views and selecting relevant rows
  SELECT 1 AS customer, 'step1' AS step UNION ALL 
  SELECT 1, 'step1' UNION ALL 
  SELECT 1, 'step2' UNION ALL 
  SELECT 1, 'step3' UNION ALL 
  SELECT 1, 'step4' UNION ALL 
  SELECT 1, 'step5' UNION ALL 
  SELECT 2, 'step1' UNION ALL 
  SELECT 2, 'step2' UNION ALL 
  SELECT 2, 'step3' UNION ALL 
  SELECT 2, 'step4' UNION ALL 
  SELECT 3, 'step1' UNION ALL 
  SELECT 3, 'step2' UNION ALL 
  SELECT 3, 'step3' UNION ALL 
  SELECT 3, 'step4' UNION ALL 
  SELECT 4, 'step1' UNION ALL 
  SELECT 4, 'step2' UNION ALL 
  SELECT 4, 'step3' UNION ALL 
  SELECT 4, 'step4' UNION ALL 
  SELECT 5, 'step1' UNION ALL 
  SELECT 5, 'step2' UNION ALL 
  SELECT 5, 'step3' UNION ALL 
  SELECT 6, 'step1' UNION ALL 
  SELECT 6, 'step2' UNION ALL 
  SELECT 7, 'step1'
)
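
For context, here is a minimal sketch of how a table shaped like `normalised` might be produced from the GA export rather than from literals. The project, dataset, and table name and the `/booking/...` page paths are hypothetical placeholders; `fullVisitorId`, `hits`, `hits.type`, and `hits.page.pagePath` are fields of the standard GA BigQuery export. For session-level rather than visitor-level rows, `visitId` would need to be part of the key as well.

```sql
#standardSQL
-- Sketch only: the table name and page paths below are assumptions.
WITH normalised AS (
  SELECT
    fullVisitorId AS customer,
    CASE hit.page.pagePath
      WHEN '/booking/step1' THEN 'step1'
      WHEN '/booking/step2' THEN 'step2'
    END AS step
  FROM `my-project.my_dataset.ga_sessions_20170915`,
    UNNEST(hits) AS hit            -- one row per hit
  WHERE hit.type = 'PAGE'          -- page views only
    AND hit.page.pagePath IN ('/booking/step1', '/booking/step2')
)
SELECT customer, step
FROM normalised
```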

Query:

/* aggregate to remove duplicate entries */
SELECT
  customer,
  CASE WHEN SUM(step1) > 0 THEN 1 ELSE 0 END AS step1,
  CASE WHEN SUM(step2) > 0 THEN 1 ELSE 0 END AS step2
  # for each step
FROM (
  /* denormalise into multiple fields */
  SELECT DISTINCT
    customer, 
    CASE WHEN step = 'step1' THEN 1 ELSE 0 END AS step1,
    CASE WHEN step = 'step2' THEN 1 ELSE 0 END AS step2
    # for each step
  FROM normalised
)
GROUP BY customer
ORDER BY customer ASC

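Is there a better, more efficient way to do this? My solution seems to work, but given the amount of code involved I can't help thinking there is a more concise approach.

2 answers:

Answer 0 (score: 1)

I think you can do what you want in a single step: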

SELECT customer, 
       MAX(CASE WHEN step = 'step1' THEN 1 ELSE 0 END) AS step1,
       MAX(CASE WHEN step = 'step2' THEN 1 ELSE 0 END) AS step2
FROM normalised
GROUP BY customer;

Answer 1 (score: 0)

Below is a less verbose version for BigQuery Standard SQL:

  
#standardSQL
SELECT 
  customer, 
  SIGN(COUNTIF(step = 'step1')) AS step1,
  SIGN(COUNTIF(step = 'step2')) AS step2
FROM normalised
GROUP BY customer
ORDER BY customer ASC  

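That said, for many practical uses you will most likely need to apply further processing on top of this output, and it will not be as flexible as you would like because of the hard-coded column names (suppose you actually have more than 2 or 5 steps, or simply want something more dynamic).

I would recommend considering "denormalising" into an array, as below. In my experience it gives you much more flexibility for further processing: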

#standardSQL
SELECT 
  customer, 
  ARRAY_AGG(DISTINCT step ORDER BY step) AS steps
FROM normalised
GROUP BY customer
ORDER BY customer ASC   

The result is:

customer    steps    
--------    -----
1           step1    
            step2    
            step3    
            step4    
            step5    
2           step1    
            step2    
            step3    
            step4    
...         ....
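
As a rough illustration of that flexibility (a sketch, not a definitive recipe): once the steps sit in an array, a funnel count needs no per-step columns. The `per_customer` CTE name and the trimmed sample rows are assumptions made for this example.

```sql
#standardSQL
WITH normalised AS (
  -- trimmed sample in the same shape as the question's data
  SELECT 1 AS customer, 'step1' AS step UNION ALL
  SELECT 1, 'step1' UNION ALL
  SELECT 1, 'step2' UNION ALL
  SELECT 2, 'step1'
),
per_customer AS (
  -- one row per customer, distinct steps collected into an array
  SELECT customer, ARRAY_AGG(DISTINCT step ORDER BY step) AS steps
  FROM normalised
  GROUP BY customer
)
SELECT
  step,
  COUNT(*) AS customers  -- customers who reached this step at least once
FROM per_customer, UNNEST(steps) AS step
GROUP BY step
ORDER BY step
```

On the same `per_customer` rows, `ARRAY_LENGTH(steps)` gives the number of distinct steps a customer reached, and `'step3' IN UNNEST(steps)` tests for a single step.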

Or, in many cases, denormalising into a string works as well:

#standardSQL
SELECT customer, 
       STRING_AGG(DISTINCT step ORDER BY step) AS steps
FROM normalised
GROUP BY customer
ORDER BY customer ASC   

The output is as simple as this:

customer    steps    
--------    -----
1           step1,step2,step3,step4,step5    
2           step1,step2,step3,step4  
3           step1,step2,step3,step4  
4           step1,step2,step3,step4  
5           step1,step2,step3    
6           step1,step2  
7           step1
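
One possible follow-up on the string form, sketched under the assumption that you want to know how many customers share each path: group by the aggregated string. The `paths` CTE name and the small sample are hypothetical.

```sql
#standardSQL
WITH normalised AS (
  -- small sample in the same shape as the question's data
  SELECT 1 AS customer, 'step1' AS step UNION ALL
  SELECT 1, 'step2' UNION ALL
  SELECT 2, 'step1' UNION ALL
  SELECT 2, 'step2' UNION ALL
  SELECT 3, 'step1'
),
paths AS (
  -- one comma-separated path string per customer
  SELECT customer, STRING_AGG(DISTINCT step ORDER BY step) AS steps
  FROM normalised
  GROUP BY customer
)
SELECT steps, COUNT(*) AS customers
FROM paths
GROUP BY steps
ORDER BY customers DESC
```

Note that `ORDER BY step` sorts as text, so a hypothetical `step10` would sort before `step2`; zero-padded step names avoid that.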