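I am trying to show the source property name in a Google Analytics rollup property that I have linked to BigQuery. The problem is that when I try the query below, some of the metrics become heavily inflated. I suspect this has to do with the repeated fields, but I don't know how to handle it. I tried several workarounds, such as wrapping the field in MAX, but that does not show every property name.
All metrics except users and visits appear to be inflated.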
SELECT
date,
MAX(CASE
WHEN EXISTS( SELECT 1 FROM UNNEST(hits) hits WHERE REGEXP_CONTAINS(hits.sourcePropertyInfo.sourcePropertyTrackingId, r'82272640')) THEN 'MUG'
WHEN EXISTS (
SELECT
1
FROM
UNNEST(hits) hits
WHERE
hits.sourcePropertyInfo.sourcePropertyTrackingId = 'Social') THEN 'Social' ELSE 'Website' END) AS Property,
geoNetwork.country AS Country,
COUNT(DISTINCT CONCAT(cast(visitId AS STRING),fullVisitorId)) as visits,
sum(totals.visits) as visits2,
COUNT(DISTINCT(fullVisitorId)) AS Users,
h.sourcePropertyInfo.sourcePropertyDisplayName as display,
SUM((
SELECT
SUM(latencyTracking.pageLoadTime)
FROM
UNNEST(hits)
WHERE
page.pagePath = '/' ))/SUM((
SELECT
SUM(latencyTracking.pageLoadSample)
FROM
UNNEST(hits)
WHERE
page.pagePath = '/')) AS pageloadspeed,
SUM(totals.newVisits) AS new_,
SUM(totals.screenviews) AS PAGEVIEWS,
SUM(totals.bounces) AS BOUNCES,
sum(CASE
WHEN device.isMobile = TRUE THEN (totals.visits)
ELSE 0 END) mobilevisits,
SUM(CASE
WHEN trafficSource.medium = 'organic' THEN (totals.visits)
ELSE 0 END) organicvisits,
SUM(CASE
WHEN EXISTS( SELECT 1 FROM UNNEST(hits) hits WHERE REGEXP_CONTAINS(hits.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro')) THEN 1
ELSE 0 END) AS NewRegistrations,
SUM(CASE
WHEN EXISTS( SELECT 1 FROM UNNEST(hits) hits WHERE REGEXP_CONTAINS(hits.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::')) THEN 1
ELSE 0 END) AS ClickToBuy,
SUM(totals.transactions) AS Transactions
FROM
`project.dataset.ga_sessions_*`, UNNEST(hits) as h
WHERE
1 = 1
AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')) BETWEEN TIMESTAMP('2017-05-01')
AND TIMESTAMP('2017-05-01')
GROUP BY
date,
Country,
display
ORDER BY
visits DESC;
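Edit:
I tried simply removing UNNEST(hits) from the FROM clause, which gave me the following error:
Error: Cannot access field sourcePropertyInfo on a value with type ARRAY<STRUCT<...>> at [16:14]
I also tried using it in a subquery, like this: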
(select h.sourcePropertyInfo.sourcePropertyDisplayName from unnest(hits) h) as displayname,
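and received the error:
Scalar subquery produced more than one element
Answer 0 (score: 1)
You are flattening your table in the outermost FROM clause, i.e. here:
FROM
`project.dataset.ga_sessions_*`
, UNNEST(hits) AS h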
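All session-level fields, such as device.* or totals.* (for example totals.transactions), are already rolled up to the session level, so when you flatten the table by unnesting hits, those totals values are written out once for every hit.
Example: say a session has 30 hits and 2 transactions. Because you flatten/unnest the hits, you are left with 30 rows that each contain totals.transactions = 2, so when you sum them the result says this session had 60 transactions.
Your users and visits are not affected because you count them with DISTINCT, so any duplicates are eliminated.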
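The arithmetic in that example can be reproduced without touching the real export. A minimal sketch, assuming a made-up one-row `sessions` table in which `hits` is just an array of 30 integers standing in for the real hit records:

```sql
-- Hypothetical inline data: one session, 30 hits, totals.transactions = 2.
WITH sessions AS (
  SELECT
    'user_a' AS fullVisitorId,
    1 AS visitId,
    STRUCT(2 AS transactions) AS totals,
    GENERATE_ARRAY(1, 30) AS hits  -- stand-in for the real hits array
)
SELECT
  -- The session-level total is repeated on each of the 30 flattened rows.
  SUM(totals.transactions) AS inflated_transactions,  -- 60
  -- DISTINCT collapses the repeated rows back into one session.
  COUNT(DISTINCT CONCAT(CAST(visitId AS STRING), fullVisitorId)) AS visits  -- 1
FROM
  sessions,
  UNNEST(hits) AS h
```

Dropping `, UNNEST(hits) AS h` from this toy query brings the sum back to 2.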
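I would expect your query to work if you just remove the `, UNNEST(hits) AS h`, as long as you also remove or modify this line:
h.sourcePropertyInfo.sourcePropertyDisplayName as display
because, apart from that one line, you are already unnesting hits wherever you need them inside the SELECT statement.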
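One way to keep the display name while leaving the table at session level is to reduce it to a single value per session inside a subquery, which also avoids the "Scalar subquery produced more than one element" error. This is only a sketch: it assumes every hit in a session carries the same source property, so MAX() merely picks that one name, and it keeps just a few of the original metrics for brevity.

```sql
SELECT
  date,
  geoNetwork.country AS Country,
  -- Assumption: one source property per session, so MAX() returns its name.
  (SELECT MAX(h.sourcePropertyInfo.sourcePropertyDisplayName)
   FROM UNNEST(hits) AS h) AS display,
  SUM(totals.visits) AS visits,
  COUNT(DISTINCT fullVisitorId) AS Users,
  SUM(totals.transactions) AS Transactions
FROM
  `project.dataset.ga_sessions_*`  -- no UNNEST here: one row per session
WHERE
  PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)'))
    BETWEEN TIMESTAMP('2017-05-01') AND TIMESTAMP('2017-05-01')
GROUP BY
  date,
  Country,
  display
```

If sessions in your rollup property can mix hits from several source properties, this assumption breaks and the flattened approach is the safer choice.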
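Answer 1 (score: 1)
Since you need to compute several values at the hit level, unnesting the hits field may well be the best approach. The downside is that you lose the session-level aggregation of the totals fields, but you can still work around that.
As an example: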
SELECT
date,
CASE
WHEN REGEXP_CONTAINS(h.sourcePropertyInfo.sourcePropertyTrackingId, r'82272640') THEN 'MUG'
WHEN h.sourcePropertyInfo.sourcePropertyTrackingId = 'Social' THEN 'Social' ELSE 'Website'
END AS Property,
geoNetwork.country AS Country,
COUNT(DISTINCT CONCAT(CAST(visitId AS STRING),fullVisitorId)) AS visits,
COUNT(DISTINCT(fullVisitorId)) AS Users,
h.sourcePropertyInfo.sourcePropertyDisplayName AS display,
SUM(CASE
WHEN REGEXP_CONTAINS(h.page.pagepath, r'/') THEN h.latencyTracking.pageLoadTime END) / SUM(CASE
WHEN REGEXP_CONTAINS(h.page.pagepath, r'/') THEN h.latencyTracking.pageLoadSample END) AS pageloadspeed,
COUNT(DISTINCT
CASE
WHEN totals.newVisits = 1 THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END) new_visits,
COUNT(CASE
WHEN h.type = 'PAGE' THEN h.page.pagepath END) pageviews,
SUM(CASE
WHEN (h.isentrance = TRUE AND h.isexit = TRUE) THEN 1 END) bounces,
COUNT(DISTINCT (CASE
WHEN device.isMobile = TRUE THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) mobilevisits,
COUNT(DISTINCT (CASE
WHEN trafficSource.medium = 'organic' THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) organicvisits,
SUM(CASE
WHEN REGEXP_CONTAINS(h.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro') THEN 1 END) AS NewRegistrations,
SUM(CASE
WHEN REGEXP_CONTAINS(h.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::') THEN 1 END) AS ClickToBuy,
COUNT(h.transaction.transactionid) transactions
FROM
`project_id.dataset_id.ga_sessions_*`,
UNNEST(hits) AS h
WHERE
1 = 1
AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')) BETWEEN TIMESTAMP('2017-05-01') AND TIMESTAMP('2017-05-01')
GROUP BY
date,
Country,
display,
Property
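I ran it against our dataset and it seems to be working. I made some changes:
- Removed the MAX operation around the Property CASE expression and added Property to the GROUP BY.
Answer 2 (score: 0)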
Use:
- MAX(totals.screenviews) AS PAGEVIEWS,
- MAX(totals.bounces) AS BOUNCES,
- MAX(totals.transactions) AS Transactions
- ...
- ...
instead of:
- SUM(totals.screenviews) AS PAGEVIEWS,
- SUM(totals.bounces) AS BOUNCES,
- SUM(totals.transactions) AS Transactions
This should partly solve your problem. Let me know how it goes.
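Note that MAX over a group containing many sessions returns the largest single-session value, not the group total. A possible two-step variant, sketched here under the assumption that the same table and date filter as in the question are used, first de-duplicates the totals per session with MAX and only then sums across sessions:

```sql
WITH per_session AS (
  SELECT
    date,
    geoNetwork.country AS Country,
    h.sourcePropertyInfo.sourcePropertyDisplayName AS display,
    fullVisitorId,
    visitId,
    -- Every flattened row of a session repeats the same totals,
    -- so MAX() simply collapses them to one value per session.
    MAX(totals.screenviews) AS screenviews,
    MAX(totals.bounces) AS bounces,
    MAX(totals.transactions) AS transactions
  FROM
    `project.dataset.ga_sessions_*`,
    UNNEST(hits) AS h
  WHERE
    PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)'))
      BETWEEN TIMESTAMP('2017-05-01') AND TIMESTAMP('2017-05-01')
  GROUP BY
    date, Country, display, fullVisitorId, visitId
)
SELECT
  date,
  Country,
  display,
  SUM(screenviews) AS PAGEVIEWS,
  SUM(bounces) AS BOUNCES,
  SUM(transactions) AS Transactions
FROM
  per_session
GROUP BY
  date, Country, display
```

A session whose hits span more than one display name would still be counted once per display name in this sketch.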
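Answer 3 (score: 0)
I think the reason the bounce count computed in William Fuks's query is so much higher is this expression:
WHEN (h.isentrance = TRUE AND h.isexit = TRUE) THEN 1 END) bounces
It seems that isEntrance and isExit are only set on PAGE hits, so events are not taken into account. The bounce count is therefore overstated, because a single-pageview session may still have had one or more interaction events on that page.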
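One alternative is to rely on the session-level totals.bounces field (1 for a bounced session, NULL otherwise) and count each bounced session once, however many flattened rows it has. A minimal sketch, assuming the same table and date filter as the queries above and reusing their session key:

```sql
SELECT
  date,
  h.sourcePropertyInfo.sourcePropertyDisplayName AS display,
  -- Count each bounced session once, no matter how many hit rows it has.
  COUNT(DISTINCT CASE
    WHEN totals.bounces = 1 THEN CONCAT(CAST(visitId AS STRING), fullVisitorId)
  END) AS bounces
FROM
  `project.dataset.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)'))
    BETWEEN TIMESTAMP('2017-05-01') AND TIMESTAMP('2017-05-01')
GROUP BY
  date,
  display
```

Because the CASE returns NULL for sessions that did not bounce, COUNT(DISTINCT ...) ignores them.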