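I have Google Analytics running on my website, and I am now trying to determine conversion rates within specific time intervals. For this I have a table, intervals, with the columns
interval_id
interval_start_time_utc
interval_end_time_utc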
Sadly, the following BigQuery query, which should assign each order to an interval, does not work:
SELECT
totals.transactions,
totals.visits,
i.interval_id
FROM [123456.ga_sessions_20160609]
INNER JOIN intervals i ON i.interval_start_time_utc < visitStartTime AND visitStartTime < i.interval_end_time_utc
It throws the error:
ON clause must be AND of = comparisons of one field name from each table [...]
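So I take it that BigQuery simply doesn't support range joins. Is there another way to do this that doesn't require a full (cross) join followed by filtering? Or is there a completely different, better approach for this kind of problem?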
Answer 0 (score: 1)
BigQuery Standard SQL does not have this limitation; see Enabling Standard SQL.
If you want to stay with BigQuery Legacy SQL, try the following:
SELECT
totals.transactions,
totals.visits,
i.interval_id
FROM [123456.ga_sessions_20160609]
CROSS JOIN intervals i
WHERE i.interval_start_time_utc < visitStartTime
AND visitStartTime < i.interval_end_time_utc
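To check that the CROSS JOIN + WHERE rewrite returns the same rows as the range join the question wanted (the form Standard SQL accepts), here is a small sketch that uses Python's built-in sqlite3 module as a stand-in engine. SQLite is not BigQuery; it is used only because it accepts both forms. The sessions table, its flattened transactions/visits columns, and all sample values are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for BigQuery (assumption: only the
# join semantics matter here, not BigQuery-specific syntax or nested fields).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sessions (visitStartTime INTEGER, transactions INTEGER, visits INTEGER);
    CREATE TABLE intervals (interval_id INTEGER,
                            interval_start_time_utc INTEGER,
                            interval_end_time_utc INTEGER);
    -- Hypothetical sample rows
    INSERT INTO sessions VALUES (2, 1, 1), (12, 0, 1), (22, 2, 1), (32, 0, 1);
    INSERT INTO intervals VALUES (1, 1, 5), (2, 6, 10), (3, 21, 25), (4, 33, 37);
    """
)

# Range condition inside ON: what the question asked for.
range_join = conn.execute(
    """
    SELECT s.transactions, s.visits, i.interval_id
    FROM sessions AS s
    JOIN intervals AS i
      ON i.interval_start_time_utc < s.visitStartTime
     AND s.visitStartTime < i.interval_end_time_utc
    ORDER BY i.interval_id
    """
).fetchall()

# Legacy SQL workaround: CROSS JOIN first, then filter in WHERE.
cross_join = conn.execute(
    """
    SELECT s.transactions, s.visits, i.interval_id
    FROM sessions AS s
    CROSS JOIN intervals AS i
    WHERE i.interval_start_time_utc < s.visitStartTime
      AND s.visitStartTime < i.interval_end_time_utc
    ORDER BY i.interval_id
    """
).fetchall()

print(range_join)                 # [(1, 1, 1), (2, 1, 3)]
print(range_join == cross_join)   # True
```

Both queries keep exactly the pairs that satisfy the strict inequalities; they differ only in where the condition is written.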
Answer 1 (score: 1)
To present the idea, let's simplify the example.
Keep in mind that we deliberately want a BigQuery Legacy SQL solution, not Standard SQL, where this is trivial!
Challenge
Suppose we have a visits table:
SELECT visit_time FROM
(SELECT 2 AS visit_time),
(SELECT 12 AS visit_time),
(SELECT 22 AS visit_time),
(SELECT 32 AS visit_time)
and an intervals table:
SELECT before, after, event FROM
(SELECT 1 AS before, 5 AS after, 3 AS event),
(SELECT 6 AS before, 10 AS after, 8 AS event),
(SELECT 21 AS before, 25 AS after, 23 AS event),
(SELECT 33 AS before, 37 AS after, 35 AS event)
For each visit, we want to find the event whose before/after window contains the visit time.
This can be done with a CROSS JOIN, as follows:
SELECT
visit_time, event, before, after
FROM (
SELECT visit_time FROM
(SELECT 2 AS visit_time),
(SELECT 12 AS visit_time),
(SELECT 22 AS visit_time),
(SELECT 32 AS visit_time)
) AS visits
CROSS JOIN (
SELECT before, after, event FROM
(SELECT 1 AS before, 5 AS after, 3 AS event),
(SELECT 6 AS before, 10 AS after, 8 AS event),
(SELECT 21 AS before, 25 AS after, 23 AS event),
(SELECT 33 AS before, 37 AS after, 35 AS event)
) AS intervals
WHERE visit_time BETWEEN before AND after
The result is:
visit_time event before after
2 3 1 5
22 23 21 25
Potential problem
When both tables are large, this CROSS JOIN becomes quite expensive!
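To make the cost concrete: a CROSS JOIN followed by a filter conceptually produces len(visits) × len(intervals) candidate pairs and only then discards the non-matching ones. The Python sketch below models that on the toy tables. It illustrates row counts only and is not a description of how BigQuery actually executes the query.

```python
from itertools import product

# Toy data from the example above.
visits = [2, 12, 22, 32]
intervals = [(1, 5, 3), (6, 10, 8), (21, 25, 23), (33, 37, 35)]  # (before, after, event)

# CROSS JOIN: every (visit, interval) pair is materialized before filtering.
pairs = list(product(visits, intervals))

# WHERE visit_time BETWEEN before AND after (BETWEEN is inclusive on both ends).
matches = [
    (visit_time, event, before, after)
    for visit_time, (before, after, event) in pairs
    if before <= visit_time <= after
]

print(len(pairs), matches)
# prints: 16 [(2, 3, 1, 5), (22, 23, 21, 25)]
```

Here 16 pairs are built to keep 2 rows; with millions of sessions and thousands of intervals the intermediate result grows multiplicatively.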
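Hint
As it happens (per the asker's comment), the intervals always extend x units to either side of the event, i.e. before = event - x and after = event + x.
Solution
Below is a proposed solution that uses this fact and joins the two big tables with a regular JOIN instead of a CROSS JOIN.
The key is to generate (on the fly) a new table that holds every possible point of each interval, based on event and x:
SELECT event, event + delta AS point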
FROM (
SELECT event FROM
(SELECT 1 AS before, 5 AS after, 3 AS event),
(SELECT 6 AS before, 10 AS after, 8 AS event),
(SELECT 21 AS before, 25 AS after, 23 AS event),
(SELECT 33 AS before, 37 AS after, 35 AS event)
) AS events
CROSS JOIN (
SELECT pos - 1 - 2 AS delta FROM (
SELECT ROW_NUMBER() OVER() AS pos, * FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + 2 * 2, '.'),'') AS h FROM (SELECT NULL)),h
)))
) AS deltas
In the code above x = 2, but you can change it in both places where it appears; for example, for x = 5 the deltas subquery becomes:
SELECT pos - 1 - 5 AS delta FROM (
SELECT ROW_NUMBER() OVER() AS pos, * FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + 2 * 5, '.'),'') AS h FROM (SELECT NULL)),h
)))
The CROSS JOIN in this code is cheap because the deltas table is tiny (it has only 2x + 1 rows).
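The deltas subquery can be hard to read, so here is a Python model of what it computes, assuming (as the answer relies on) that SPLIT with an empty delimiter yields one row per character. RPAD builds a string of 2x + 1 dots, ROW_NUMBER() numbers the rows 1..2x+1, and pos - 1 - x shifts them to -x..x. The function names deltas and points are made up for this sketch.

```python
def deltas(x):
    """Model of the Legacy SQL deltas subquery for half-width x."""
    dots = "." * (1 + 2 * x)                   # RPAD('', 1 + 2 * x, '.')
    positions = range(1, len(dots) + 1)        # ROW_NUMBER() OVER() AS pos, one per character
    return [pos - 1 - x for pos in positions]  # pos - 1 - x AS delta


def points(events, x):
    """CROSS JOIN of events with the small deltas table: event + delta AS point."""
    return [(event, event + delta) for event in events for delta in deltas(x)]


print(deltas(2))                       # [-2, -1, 0, 1, 2]
print(len(points([3, 8, 23, 35], 2)))  # 20
```

The generated table has (2x + 1) rows per event, so it stays small as long as x is small in the chosen time unit. Because the later JOIN is an exact equality, this only works when times are integers in that unit.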
So, finally, you can get your result with the query below:
SELECT
visit_time, event
FROM (
SELECT visit_time FROM
(SELECT 2 AS visit_time),
(SELECT 12 AS visit_time),
(SELECT 22 AS visit_time),
(SELECT 32 AS visit_time)
) AS visits
JOIN (
SELECT event, event + delta AS point
FROM (
SELECT event FROM
(SELECT 1 AS before, 5 AS after, 3 AS event),
(SELECT 6 AS before, 10 AS after, 8 AS event),
(SELECT 21 AS before, 25 AS after, 23 AS event),
(SELECT 33 AS before, 37 AS after, 35 AS event)
) AS events
CROSS JOIN (
SELECT pos - 1 - 2 AS delta FROM (
SELECT ROW_NUMBER() OVER() AS pos, * FROM (FLATTEN((
SELECT SPLIT(RPAD('', 1 + 2 * 2, '.'),'') AS h FROM (SELECT NULL)),h
)))
) AS deltas
) AS points
ON points.point = visits.visit_time
Expected result:
visit_time event
2 3
22 23
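Why this can be cheaper: the final JOIN uses only an equality condition (points.point = visits.visit_time), so an engine can run it as a hash join instead of comparing every visit with every interval. The sketch below models that idea in Python with a dictionary lookup. It is an illustration, not BigQuery's actual execution plan.

```python
from collections import defaultdict

x = 2  # half-width of every interval, as in the example above
visits = [2, 12, 22, 32]
events = [3, 8, 23, 35]

# Build side: index every generated point (event + delta) by its value.
points_by_value = defaultdict(list)
for event in events:
    for delta in range(-x, x + 1):
        points_by_value[event + delta].append(event)

# Probe side: each visit is one dictionary lookup (ON points.point = visits.visit_time).
result = [
    (visit_time, event)
    for visit_time in visits
    for event in points_by_value.get(visit_time, [])
]

print(result)  # [(2, 3), (22, 23)]
```

If intervals overlap, a point maps to several events and a visit matches each of them, which is the same behavior as the CROSS JOIN version.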
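I think the approach above can work for you, but you will need to adapt it to your specific case. I think this can be done relatively easily if you convert all relevant times to whole minutes. Hope this helps, and if you get it working, please share the results with us :o)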
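A sketch of the minute conversion suggested above, assuming visitStartTime holds POSIX seconds (UTC) and that each interval's event time and x are expressed in whole minutes. The timestamps, x = 5, and the to_minute helper are all hypothetical.

```python
from datetime import datetime, timezone


def to_minute(posix_seconds):
    """Truncate a POSIX timestamp in seconds to a whole-minute number."""
    return posix_seconds // 60


# Hypothetical session start and event time, as seconds since the epoch (UTC).
visit_start = int(datetime(2016, 6, 9, 12, 30, 45, tzinfo=timezone.utc).timestamp())
event_time = int(datetime(2016, 6, 9, 12, 28, 0, tzinfo=timezone.utc).timestamp())

x = 5  # hypothetical half-width, in minutes
delta = to_minute(visit_start) - to_minute(event_time)

print(delta, abs(delta) <= x)  # 2 True
```

Truncating to minutes makes the interval boundaries accurate only to within one minute. For example, a visit at 12:33:59 still lands on delta = 5 for an event at 12:28.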