BigQuery UNNEST duplicates values

Date: 2018-06-27 16:00:57

Tags: google-bigquery

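I am trying to build a summary of Google Analytics data: sessions, transactions, and conversion rate, broken down by hour and by source property (a service, in my case). I am querying a table named "ga_realtime_view" in the roll-up property dataset. It is a virtual view created from the "ga_realtime_sessions_" tables, which lets us use standard SQL.

To get the service column, I have to use the UNNEST operation. But when I do, all the session and transaction values get duplicated.

Here is the query: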

SELECT
EXTRACT(HOUR FROM TIMESTAMP_SECONDS(visitStartTime) AT TIME ZONE 'Europe/Paris') AS Hour,
hits.sourcePropertyInfo.sourcePropertyDisplayName AS service,
IFNULL(SUM(totals.visits),0) as sessions,
IFNULL(SUM(totals.transactions),0) as transactions,
IFNULL(ROUND((SUM(totals.transactions)/SUM(totals.visits))*100,2),0) AS conversionRate
FROM `XX.ga_realtime_view` AS session, UNNEST(session.hits) AS hits
GROUP BY
Hour,
service
ORDER BY
Hour

I know there are other threads about this duplication problem, but I can't seem to find a solution that fits my case.

Thanks for your help.

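1 answer:

Answer 0 (score: 0)

It sounds like you want to avoid joining the table with the array (the ", UNNEST(session.hits)" in the FROM clause), since that join repeats each session's totals once per hit. It isn't clear what you want to do with the service names, though: if a session has more than one, do you want to return an array of all of them? Here is one approach: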

SELECT
EXTRACT(HOUR FROM TIMESTAMP_SECONDS(visitStartTime) AT TIME ZONE 'Europe/Paris') AS Hour,
ARRAY(
   SELECT sourcePropertyInfo.sourcePropertyDisplayName
   FROM UNNEST(session.hits) AS hits
) AS service,
IFNULL(SUM(totals.visits),0) as sessions,
IFNULL(SUM(totals.transactions),0) as transactions,
IFNULL(ROUND((SUM(totals.transactions)/SUM(totals.visits))*100,2),0) AS conversionRate
FROM `XX.ga_realtime_view` AS session
GROUP BY
Hour,
service
ORDER BY
Hour

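You'll notice, though, that this query now fails with an error, because you can't group by an array. If you expect only one service among a session's hits, you can extract just that one: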

SELECT
EXTRACT(HOUR FROM TIMESTAMP_SECONDS(visitStartTime) AT TIME ZONE 'Europe/Paris') AS Hour,
(SELECT MAX(sourcePropertyInfo.sourcePropertyDisplayName)
 FROM UNNEST(session.hits) AS hits) AS service,
IFNULL(SUM(totals.visits),0) as sessions,
IFNULL(SUM(totals.transactions),0) as transactions,
IFNULL(ROUND((SUM(totals.transactions)/SUM(totals.visits))*100,2),0) AS conversionRate
FROM `XX.ga_realtime_view` AS session
GROUP BY
Hour,
service
ORDER BY
Hour
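
To see the duplication in isolation, here is a minimal sketch that doesn't touch the real view. The sessions CTE is made up for illustration: a single session with totals.visits = 1, totals.transactions = 1, and three hits, using the same field names as the queries above. The service name 'shop' is a placeholder, not a value from your data.

```sql
-- Hypothetical inline data: one session containing three hits.
WITH sessions AS (
  SELECT
    STRUCT(1 AS visits, 1 AS transactions) AS totals,
    [
      STRUCT(STRUCT('shop' AS sourcePropertyDisplayName) AS sourcePropertyInfo),
      STRUCT(STRUCT('shop' AS sourcePropertyDisplayName) AS sourcePropertyInfo),
      STRUCT(STRUCT('shop' AS sourcePropertyDisplayName) AS sourcePropertyInfo)
    ] AS hits
)
-- Variant 1: the comma join produces one row per hit,
-- and every one of those rows carries the session-level totals.
SELECT
  'comma join with UNNEST' AS variant,
  SUM(totals.visits) AS sessions,
  SUM(totals.transactions) AS transactions
FROM sessions AS session, UNNEST(session.hits) AS hits
UNION ALL
-- Variant 2: no join, so each session contributes its totals once.
SELECT
  'no join' AS variant,
  SUM(totals.visits),
  SUM(totals.transactions)
FROM sessions AS session
```

The comma-join variant should report 3 sessions and 3 transactions for what is really a single session, while the variant without the join reports 1 of each. That is why the last query above reads the service name through a scalar subquery over UNNEST(session.hits) instead of joining against the array.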