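I'm trying to join three tables in BigQuery: table 1 records one kind of event (each row is one record), table 2 records a second kind of event, and table 3 holds the category names.
I want to produce a final table with counts from table 1 and table 2, grouped by category and device platform. However, every time I run the query I get an error saying that joined.t3.category is not a field of either table in the join.
Here is my current code: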
Select count(distinct joined.t1.Id) as t1_events, count(distinct t2.Id) as t2_events, joined.t1.Origin as platform, joined.t3.category as category
from
(
SELECT
Id,
Origin,
CatId
FROM [testing.table_1] as t1
JOIN (SELECT category,
CategoryID
FROM [testing.table_3]) as t3
on t1.CatId = t3.CategoryID
) AS joined
JOIN (SELECT Id,
CategoryId
FROM [testing.table_2]) as t2
ON (joined.t1.CatId = t2.CategoryId)
Group by platform,category;
For reference, here is a simpler join between table 1 and table 2 that works fine:
Select count(distinct t1.Id) as t1_event, count(distinct t2.Id) as t2_events, t1.Origin as platform
from testing.table_1 as t1
JOIN testing.table_2 as t2
on t1.CatId = t2.CategoryId
Group by platform;
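Answer 0 (score: 1)
Could you try using standard SQL for the query instead? It handles aliases better, and COUNT(DISTINCT ...) gives you exact results rather than the approximation you get in legacy SQL. If it helps, the only change you should need to make to the query is to escape the table names with backticks instead of brackets. For example: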
SELECT
COUNT(DISTINCT joined.t1.Id) as t1_events,
COUNT(DISTINCT t2.Id) as t2_events,
joined.t1.Origin as platform,
joined.t3.category as category
FROM (
SELECT
Id,
Origin,
CatId
FROM `testing.table_1` AS t1
JOIN (
SELECT
category,
CategoryID
FROM `testing.table_3`
) AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN (
SELECT
Id,
CategoryId
FROM `testing.table_2`
) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category;
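Answer 1 (score: 1)
The simple fix is to add the category field to the first inner SELECT. Otherwise it is not visible to the outermost SELECT, hence the error.
Also, in BigQuery legacy SQL you can use EXACT_COUNT_DISTINCT; otherwise you will get a statistical approximation. See the documentation for COUNT([DISTINCT]).
So, for legacy SQL, your query could look like this: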
SELECT
EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
joined.t1.Origin AS platform,
joined.t3.category AS category
FROM (
SELECT
Id, Origin, CatId, category
FROM [testing.table_1] AS t1
JOIN (SELECT category, CategoryID FROM [testing.table_3]) AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN (SELECT Id, CategoryId FROM [testing.table_2]) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category
I also think you can simplify it further (assuming there are no ambiguous field names):
SELECT
EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
joined.t1.Origin AS platform,
joined.t3.category AS category
FROM (
SELECT
Id, Origin, CatId, category
FROM [testing.table_1] AS t1
JOIN [testing.table_3] AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN [testing.table_2] AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category
Of course, if you go with the standard SQL version (as Elliott suggested), you will need to make the same fix:
SELECT
COUNT(DISTINCT joined.t1.Id) AS t1_events,
COUNT(DISTINCT t2.Id) AS t2_events,
joined.t1.Origin AS platform,
joined.t3.category AS category
FROM (
SELECT
Id, Origin, CatId, category
FROM `testing.table_1` AS t1
JOIN `testing.table_3` AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN `testing.table_2` AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category
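As a side note on the approximation mentioned above, the following legacy SQL sketch puts the counting variants side by side. It assumes the testing.table_1 table and Id column from the question. The threshold of 50000 is an arbitrary illustrative value, and the optional second argument to COUNT(DISTINCT ...) is the one described in the legacy SQL function reference.

```sql
#legacySQL
SELECT
  -- May be a statistical approximation for larger numbers of distinct values
  COUNT(DISTINCT Id) AS approx_ids,
  -- Second argument raises the threshold below which the result is exact
  -- (50000 is an arbitrary example value)
  COUNT(DISTINCT Id, 50000) AS thresholded_ids,
  -- Exact count
  EXACT_COUNT_DISTINCT(Id) AS exact_ids
FROM [testing.table_1]
```

If the three columns agree on your data, the approximation is probably not affecting your results at that scale.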
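Answer 2 (score: 0)
I don't know google-bigquery, but my SQL knowledge tells me that having two aliases in front of a column name will cause problems. Try removing the t aliases after joined, for example use joined.category instead of joined.t3.category.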
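A minimal sketch of that suggestion in BigQuery standard SQL, where a subquery alias exposes only the columns in its select list, so they are referenced with a single qualifier (joined.category rather than joined.t3.category). It assumes the table and column names from the question and has not been run against the asker's data. It also selects category in the inner query, since the outer query cannot see it otherwise.

```sql
#standardSQL
SELECT
  COUNT(DISTINCT joined.Id) AS t1_events,
  COUNT(DISTINCT t2.Id) AS t2_events,
  joined.Origin AS platform,
  joined.category AS category
FROM (
  -- Qualify columns inside the subquery; only these select-list
  -- names are visible to the outer query, under the alias "joined"
  SELECT t1.Id, t1.Origin, t1.CatId, t3.category
  FROM `testing.table_1` AS t1
  JOIN `testing.table_3` AS t3
    ON t1.CatId = t3.CategoryID
) AS joined
JOIN `testing.table_2` AS t2
  ON joined.CatId = t2.CategoryId
GROUP BY platform, category
```

Qualifying the inner columns with t1 and t3 avoids relying on the field names being unambiguous between the two tables.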