Google Analytics: conversion rate within time intervals

Time: 2016-06-10 14:55:23

Tags: google-analytics google-bigquery

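I have Google Analytics running on a website and am now trying to determine conversion rates within specific time intervals. To that end, I have a table intervals containing

  • interval_id
  • interval_start_time_utc
  • interval_end_time_utc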

Sadly, the following BigQuery query, which is meant to assign each order to an interval, does not work:

SELECT
totals.transactions,
totals.visits,
i.interval_id
FROM [123456.ga_sessions_20160609]
INNER JOIN intervals i ON i.interval_start_time_utc < visitStartTime AND visitStartTime < i.interval_end_time_utc

It raises the error

ON clause must be AND of = comparisons of one field name from each table [...]

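So I take it BigQuery does not do range joins at all. Is there another way to do this, short of doing a full join and then cutting the result down? Or is there an entirely different, better approach for this kind of thing?

2 Answers:

Answer 0: (score: 1)

BigQuery Standard SQL does not have this limitation; see "Enabling Standard SQL" in the BigQuery documentation.

If you want to stay with BigQuery Legacy SQL, try the following: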

SELECT
  totals.transactions,
  totals.visits,
  i.interval_id
FROM [123456.ga_sessions_20160609]
CROSS JOIN intervals i 
WHERE i.interval_start_time_utc < visitStartTime 
AND visitStartTime < i.interval_end_time_utc
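
Once every session row carries an interval_id, the conversion rate per interval is an aggregation over those rows. A minimal sketch in Python, assuming the rate is defined as the sum of transactions divided by the sum of visits, and that totals.transactions can be NULL (None here) for sessions without a purchase. The function name conversion_rates and the sample rows are made up for illustration:

```python
from collections import defaultdict

def conversion_rates(rows):
    """rows: iterable of (interval_id, transactions, visits) tuples.

    Returns {interval_id: transactions / visits}; None values count as 0.
    Intervals with zero visits are left out to avoid dividing by zero.
    """
    totals = defaultdict(lambda: [0, 0])  # interval_id -> [transactions, visits]
    for interval_id, transactions, visits in rows:
        totals[interval_id][0] += transactions or 0
        totals[interval_id][1] += visits or 0
    return {i: t / v for i, (t, v) in totals.items() if v}

# Hypothetical output of the join above: two sessions in interval 1, one in interval 2.
rows = [(1, 1, 1), (1, None, 1), (2, None, 1)]
print(conversion_rates(rows))
```

In BigQuery itself the same thing would be a GROUP BY on interval_id around the query above.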

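Answer 1: (score: 1)

To present the idea, let's simplify the example.
And let's remember: we do want to use BigQuery Legacy SQL here, not Standard SQL, where this is trivial!

Challenge

Assume we have a visits table: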

SELECT visit_time FROM 
  (SELECT 2 AS visit_time),
  (SELECT 12 AS visit_time),
  (SELECT 22 AS visit_time),
  (SELECT 32 AS visit_time)

and an intervals table:

SELECT before, after, event FROM 
  (SELECT 1 AS before, 5 AS after, 3 AS event),
  (SELECT 6 AS before, 10 AS after, 8 AS event),
  (SELECT 21 AS before, 25 AS after, 23 AS event),
  (SELECT 33 AS before, 37 AS after, 35 AS event)

We want to extract all visits that fall between before and after of an event.

This can be done with a CROSS JOIN, as below:

SELECT
  visit_time, event, before, after
FROM (
  SELECT visit_time FROM 
    (SELECT 2 AS visit_time),
    (SELECT 12 AS visit_time),
    (SELECT 22 AS visit_time),
    (SELECT 32 AS visit_time)
) AS visits
CROSS JOIN (
  SELECT before, after, event FROM 
    (SELECT 1 AS before, 5 AS after, 3 AS event),
    (SELECT 6 AS before, 10 AS after, 8 AS event),
    (SELECT 21 AS before, 25 AS after, 23 AS event),
    (SELECT 33 AS before, 37 AS after, 35 AS event)
) AS intervals
WHERE visit_time BETWEEN before AND after

The result is:

visit_time  event   before  after    
2           3       1       5    
22          23      21      25  
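
To sanity-check that result outside BigQuery, the same cross-join-then-filter logic can be sketched in a few lines of Python. This is only an illustration on the toy data above; range_join is a made-up helper name:

```python
from itertools import product

visits = [2, 12, 22, 32]
# (before, after, event) rows, mirroring the toy intervals table
intervals = [(1, 5, 3), (6, 10, 8), (21, 25, 23), (33, 37, 35)]

def range_join(visits, intervals):
    """Pair every visit with every interval (the CROSS JOIN), then keep
    only the pairs where before <= visit <= after (the BETWEEN filter)."""
    return [
        (visit, event, before, after)
        for visit, (before, after, event) in product(visits, intervals)
        if before <= visit <= after
    ]

print(range_join(visits, intervals))
# [(2, 3, 1, 5), (22, 23, 21, 25)]
```

Note that product() builds len(visits) * len(intervals) pairs before filtering, which is exactly the cost that becomes a problem on large tables.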
  

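Potential problem

When both tables are large enough, this cross join becomes quite expensive!

  

Hint

As it happens (from the asker's comments), the interval always spans x units to the left and to the right of the event.

  

Solution

Below is a suggested solution/option that uses this hint/fact and relies on a JOIN instead of a CROSS JOIN between the two large tables.

The key here is to generate (on the fly) a new table that holds all possible interval values, based on event and x:
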
SELECT event, event + delta AS point 
FROM (
  SELECT event FROM
    (SELECT 1 AS before, 5 AS after, 3 AS event),
    (SELECT 6 AS before, 10 AS after, 8 AS event),
    (SELECT 21 AS before, 25 AS after, 23 AS event),
    (SELECT 33 AS before, 37 AS after, 35 AS event)
) AS events
CROSS JOIN (
  SELECT pos - 1 - 2 AS delta FROM (
       SELECT ROW_NUMBER() OVER() AS pos, * FROM (FLATTEN((
       SELECT SPLIT(RPAD('', 1 + 2 * 2, '.'),'') AS h FROM (SELECT NULL)),h
  )))   
) AS deltas

In the code above x = 2, but you can change it in two places. For example, for x = 5 you would have

SELECT pos - 1 - 5 AS delta FROM (
     SELECT ROW_NUMBER() OVER() AS pos, * FROM (FLATTEN((
     SELECT SPLIT(RPAD('', 1 + 2 * 5, '.'),'') AS h FROM (SELECT NULL)),h
)))   

The CROSS JOIN in the code above is cheap, because the deltas table is very small.
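
If the RPAD/SPLIT/FLATTEN trick is hard to read: all it does is produce 1 + 2 * x rows, which pos - 1 - x turns into the offsets -x..x. A rough Python equivalent, with made-up helper names deltas and expand_points:

```python
def deltas(x):
    """Offsets -x..x, i.e. the 1 + 2 * x values the deltas subquery yields."""
    return list(range(-x, x + 1))

def expand_points(events, x):
    """One (event, point) row for every point within x units of each event."""
    return [(event, event + d) for event in events for d in deltas(x)]

print(deltas(2))
# [-2, -1, 0, 1, 2]
print(expand_points([3], 2))
# [(3, 1), (3, 2), (3, 3), (3, 4), (3, 5)]
```

The expanded table has len(events) * (1 + 2 * x) rows, so it stays manageable as long as x is small.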

So, finally, the query below gives you your result:

SELECT
  visit_time, event 
FROM (
  SELECT visit_time FROM 
    (SELECT 2 AS visit_time),
    (SELECT 12 AS visit_time),
    (SELECT 22 AS visit_time),
    (SELECT 32 AS visit_time)
) AS visits
JOIN (
  SELECT event, event + delta AS point 
  FROM (
    SELECT event FROM
      (SELECT 1 AS before, 5 AS after, 3 AS event),
      (SELECT 6 AS before, 10 AS after, 8 AS event),
      (SELECT 21 AS before, 25 AS after, 23 AS event),
      (SELECT 33 AS before, 37 AS after, 35 AS event)
  ) AS events
  CROSS JOIN (
    SELECT pos - 1 - 2 AS delta FROM (
         SELECT ROW_NUMBER() OVER() AS pos, * FROM (FLATTEN((
         SELECT SPLIT(RPAD('', 1 + 2 * 2, '.'),'') AS h FROM (SELECT NULL)),h
    )))   
  ) AS deltas
) AS points
ON points.point = visits.visit_time

Expected result

visit_time  event    
2           3    
22          23  
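
The reason this is cheaper: after the expansion, the join condition is a plain equality, so each visit only needs a lookup by key instead of a comparison against every interval. A small Python sketch of the same idea on the toy data (point_join is a made-up name, and the dictionary stands in for whatever the engine does for an equality join):

```python
def point_join(visits, events, x):
    """Equality join between visits and the expanded (event, point) rows."""
    # Index every point within x units of an event by its value.
    point_to_events = {}
    for event in events:
        for point in range(event - x, event + x + 1):
            point_to_events.setdefault(point, []).append(event)
    # Each visit is matched with a single lookup on its exact value.
    return [(v, e) for v in visits for e in point_to_events.get(v, [])]

print(point_join([2, 12, 22, 32], [3, 8, 23, 35], 2))
# [(2, 3), (22, 23)]
```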

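I think the above approach can work for you, but you will certainly need to adapt it to your particular case. I think this can be done relatively easily if you round all the times involved to, say, the nearest minute.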
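
A rough sketch of what that rounding could look like, in Python and on made-up numbers. It assumes both the visit times and the event times are POSIX timestamps in seconds (as visitStartTime is in the GA export) and that x is given in minutes. The helper names to_minute and minute_join are made up:

```python
def to_minute(ts_seconds):
    """Floor a POSIX timestamp in seconds to whole minutes since the epoch."""
    return ts_seconds // 60

def minute_join(visit_times, event_times, x_minutes):
    """Match visits to events at minute granularity via an equality lookup."""
    # Map every minute within x_minutes of an event to that event's timestamp.
    minute_to_events = {}
    for event_ts in event_times:
        center = to_minute(event_ts)
        for minute in range(center - x_minutes, center + x_minutes + 1):
            minute_to_events.setdefault(minute, []).append(event_ts)
    return [
        (v, e)
        for v in visit_times
        for e in minute_to_events.get(to_minute(v), [])
    ]

print(minute_join([125, 1000], [180], 1))
# [(125, 180)]
```

Keep in mind that after rounding, the interval boundaries are only accurate to the minute, so a visit a few seconds outside the exact interval can still match.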

Hope this helps. If you get this working, please share the outcome with us :o)