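I'm trying to create a master view over a group of properties imported into BigQuery, but the SQL I'm using seems to handle the hits incorrectly and duplicates data, which makes values such as revenue inaccurate.
I've tried to understand why the query below causes this, but I can't figure it out.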
SELECT Date, hostname, channelGrouping, sum(transactionRevenue) as Revenue, sum(Shipping) as Shipping, sum(visits) as Sessions, sum(bounces) as Bounces, sum(transactions) as Transactions
FROM
(SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
UNION ALL
SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
UNION ALL
SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `102674002.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
UNION ALL
SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
UNION ALL
SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509')
Group By Date, hostname, channelGrouping
Order by Date
Answer 0 (score: 3)
This might do the trick:
SELECT
date,
channelGrouping,
SUM(Revenue) Revenue,
SUM(Shipping) Shipping,
SUM(bounces) bounces,
SUM(transactions) transactions,
hostname,
COUNT(date) sessions
FROM(
SELECT
date,
channelGrouping,
totals.totaltransactionrevenue / 1e6 Revenue,
ARRAY((SELECT DISTINCT page.hostname FROM UNNEST(hits) hits WHERE page.hostname IS NOT NULL)) hostnames,
(SELECT SUM(hits.transaction.transactionshipping) / 1e6 FROM UNNEST(hits) hits) Shipping,
totals.bounces bounces,
totals.transactions transactions
FROM `project_id.dataset_id.ga_sessions_*`
WHERE 1 = 1
AND ARRAY_LENGTH(ARRAY((SELECT DISTINCT page.hostname FROM UNNEST(hits) hits WHERE page.hostname IS NOT NULL))) > 0
AND _TABLE_SUFFIX BETWEEN '20170601' AND '20170609'
UNION ALL
(...)
),
UNNEST(hostnames) hostname
GROUP BY
date, channelGrouping, hostname
Note that in this query I avoid applying the UNNEST operation to the hits field in the outer FROM clause; I only do it inside subselects.
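To understand why the duplication happens, you need to know how GA data is stored in BigQuery. There are basically two types of data: session-level data and hit-level data. Each visit a customer makes to your website ends up as one row in BigQuery, like this: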
{fullvisitorid: 1, visitid: 1, date: '20170601', channelGrouping: "search", hits: [{hitNumber: 1, page: {hostname: "yourserverhostname"}}, {hitNumber: 2, page: {hostname: "yourserverhostname"}}], totals: {totalTransactionRevenue: 0, bounces: 0}}
If the same customer comes back a day later, that generates another row in BQ, for example:
{fullvisitorid: 1, visitid: 2, date: '20170602', channelGrouping: "search", hits: [{hitNumber: 1, page: {hostname: "yourserverhostname"}}, {hitNumber: 2, page: {hostname: "yourserverhostname"}}], totals: {totalTransactionRevenue: 50000000, bounces: 2}}
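As you can see, the fields outside the hits key belong to the session level, while each entry inside hits belongs to the hit level (every hit, that is, every interaction the customer has with your website, adds another entry to that array). When you apply UNNEST, you are basically cross-joining every value inside the array with the outer fields of the same row.
This is where the duplication happens!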
Given the example rows above, if we apply UNNEST to the hits field, we end up with the following:
fullvisitorid visitid totals.totalTransactionRevenue hits.hitNumber
1 1 0 1
1 1 0 2
1 2 50000000 1
1 2 50000000 2
Notice that the outer fields (for example totals.totalTransactionRevenue) are repeated once for every hitNumber inside the hits ARRAY.
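So if you later apply an aggregation such as SUM(totals.totalTransactionRevenue), you end up adding this field once for every hit the customer made in that visitid, which multiplies the session's revenue by its number of hits.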
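To make the arithmetic concrete, here is a minimal sketch in plain Python (not BigQuery) that mimics the cross-join on the two example sessions. The dictionaries and variable names are illustrative assumptions, not part of the GA export schema; revenue is kept in micros as in the example rows.

```python
# Two sessions shaped like the example rows (revenue in micros).
sessions = [
    {"visitid": 1, "revenue": 0, "hits": [1, 2]},
    {"visitid": 2, "revenue": 50_000_000, "hits": [1, 2]},
]

# Emulates `FROM sessions, UNNEST(hits) AS h`:
# one output row per (session, hit) pair, with session fields copied.
flattened = [
    {"visitid": s["visitid"], "revenue": s["revenue"], "hitNumber": h}
    for s in sessions
    for h in s["hits"]
]

# Summing after flattening counts each session's revenue once per hit.
inflated_revenue = sum(row["revenue"] for row in flattened)

# Summing at the session level counts it once per session.
true_revenue = sum(s["revenue"] for s in sessions)

print(len(flattened))     # 4
print(inflated_revenue)   # 100000000
print(true_revenue)       # 50000000
```

With two hits per session, the flattened sum is exactly double the real revenue; with more hits the inflation grows accordingly.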
I tend to avoid applying UNNEST to the hits field in the outer query and only do it inside subqueries, where the unnesting happens within a single row and therefore does not duplicate the session-level data.
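The per-row approach can be sketched the same way in plain Python: collapse the hits array inside each session first, then aggregate across sessions. This is only an illustration of the idea, not the BigQuery implementation; the `summarize` helper and the shipping value of 5000000 micros are hypothetical and were added purely for the example.

```python
# Sessions shaped like the example rows; the shipping value is hypothetical.
sessions = [
    {"visitid": 1, "revenue": 0,
     "hits": [{"hostname": "yourserverhostname", "shipping": None},
              {"hostname": "yourserverhostname", "shipping": None}]},
    {"visitid": 2, "revenue": 50_000_000,
     "hits": [{"hostname": "yourserverhostname", "shipping": 5_000_000},
              {"hostname": "yourserverhostname", "shipping": None}]},
]


def summarize(session):
    """Collapse the hits array inside one session, like a per-row subselect."""
    shipping = [h["shipping"] for h in session["hits"] if h["shipping"] is not None]
    hostnames = sorted({h["hostname"] for h in session["hits"] if h["hostname"] is not None})
    return {
        "visitid": session["visitid"],
        "revenue": session["revenue"] / 1e6,   # session-level, untouched by hits
        # Note: unlike SQL SUM over only NULLs, this yields 0.0 rather than NULL.
        "shipping": sum(shipping) / 1e6,
        "hostnames": hostnames,
    }


# Still one row per session, so session-level totals are counted once.
rows = [summarize(s) for s in sessions]

total_revenue = sum(r["revenue"] for r in rows)
total_shipping = sum(r["shipping"] for r in rows)

print(len(rows))        # 2
print(total_revenue)    # 50.0
print(total_shipping)   # 5.0
```

Because the hit-level values are reduced before any cross-session aggregation, the row count stays at one per session and the revenue total matches the real figure.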