BigQuery: joining three tables

Time: 2016-09-20 15:17:00

Tags: sql google-bigquery

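I am trying to join three tables in BigQuery. Table 1 holds records of one event (each row is one record), table 2 holds records of a second event, and table 3 holds the category names.

I want to produce a final table with the counts from table 1 and table 2, broken down by category and device platform. However, every time I run the query I get an error saying that joined.t3.category is not a field of either table in the join.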

Here is my current code:

SELECT
  COUNT(DISTINCT joined.t1.Id) AS t1_events,
  COUNT(DISTINCT t2.Id) AS t2_events,
  joined.t1.Origin AS platform,
  joined.t3.category AS category
FROM (
  SELECT
    Id,
    Origin,
    CatId
  FROM [testing.table_1] AS t1
  JOIN (
    SELECT
      category,
      CategoryID
    FROM [testing.table_3]
  ) AS t3
  ON t1.CatId = t3.CategoryID
) AS joined
JOIN (
  SELECT
    Id,
    CategoryId
  FROM [testing.table_2]
) AS t2
ON (joined.t1.CatId = t2.CategoryId)
GROUP BY platform, category;

For reference, here is a simpler join between table 1 and table 2, which works fine:

SELECT
  COUNT(DISTINCT t1.Id) AS t1_event,
  COUNT(DISTINCT t2.Id) AS t2_events,
  t1.Origin AS platform
FROM testing.table_1 AS t1
JOIN testing.table_2 AS t2
ON t1.CatId = t2.CategoryId
GROUP BY platform;

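3 Answers:

Answer 0: (score: 1)

Could you try running the query with standard SQL instead? It handles aliases better, and COUNT(DISTINCT ...) returns exact results rather than the approximation you get in legacy SQL. If that helps, the only change you should need to make to the query is to escape table names with backticks instead of square brackets. For example: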

SELECT
  COUNT(DISTINCT joined.t1.Id) as t1_events,
  COUNT(DISTINCT t2.Id) as t2_events,
  joined.t1.Origin as platform,
  joined.t3.category as category
FROM (
  SELECT 
    Id,
    Origin,
    CatId
  FROM `testing.table_1` AS t1
  JOIN (
    SELECT
      category,
      CategoryID
    FROM `testing.table_3`
  ) AS t3
  ON t1.CatId = t3.CategoryID
) AS joined
JOIN (
  SELECT
    Id,
    CategoryId
  FROM `testing.table_2`
) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category;

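Answer 1: (score: 1)

The simple fix is to add the category field to the first inner SELECT. Otherwise it is not visible to the outermost SELECT, hence the error. That is the whole problem.

Also, in BigQuery legacy SQL you can use EXACT_COUNT_DISTINCT; otherwise COUNT(DISTINCT ...) returns a statistical approximation. See the documentation for COUNT([DISTINCT]).

So, for legacy SQL, your query could look like this: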

SELECT
  EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
  EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
  joined.t1.Origin AS platform,
  joined.t3.category AS category
FROM (
  SELECT
    Id, Origin, CatId, category
  FROM [testing.table_1] AS t1
  JOIN (SELECT category, CategoryID FROM [testing.table_3]) AS t3
  ON t1.CatId = t3.CategoryID 
) AS joined
JOIN (SELECT Id, CategoryId FROM [testing.table_2]) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category

I also think you can simplify it further (assuming there are no ambiguous field names):

SELECT
  EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
  EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
  joined.t1.Origin AS platform,
  joined.t3.category AS category
FROM (
  SELECT
    Id, Origin, CatId, category
  FROM [testing.table_1] AS t1
  JOIN [testing.table_3] AS t3
  ON t1.CatId = t3.CategoryID 
) AS joined
JOIN [testing.table_2] AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category

Of course, if you use the standard SQL version of the query (as Elliott suggested), you will need to make the same fix:

SELECT
  COUNT(DISTINCT joined.t1.Id) AS t1_events,
  COUNT(DISTINCT t2.Id) AS t2_events,
  joined.t1.Origin AS platform,
  joined.t3.category AS category
FROM (
  SELECT 
    Id, Origin, CatId, category
  FROM `testing.table_1` AS t1
  JOIN `testing.table_3` AS t3
  ON t1.CatId = t3.CategoryID
) AS joined
JOIN `testing.table_2` AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category 

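Answer 2: (score: 0)

I don't know google-bigquery, but my SQL knowledge tells me that having two aliases in front of a column name will cause problems. Try removing the t aliases that follow joined, e.g. use joined.category instead of joined.t3.category.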
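A minimal sketch of that suggestion in standard SQL, assuming the table and column names from the question (testing.table_1, testing.table_2, testing.table_3 and their Id, Origin, CatId, category, CategoryID and CategoryId columns). It has not been run against the asker's data. In standard SQL a subquery exposes only the columns in its own SELECT list, so the inner aliases t1 and t3 are not part of the outer column path, and category still has to be selected inside the subquery for the outer query to see it.

```sql
-- Sketch only: table and column names are taken from the question.
SELECT
  COUNT(DISTINCT joined.Id) AS t1_events,
  COUNT(DISTINCT t2.Id) AS t2_events,
  joined.Origin AS platform,
  joined.category AS category
FROM (
  -- The subquery must output every column the outer query refers to,
  -- including category from table_3.
  SELECT
    t1.Id,
    t1.Origin,
    t1.CatId,
    t3.category
  FROM `testing.table_1` AS t1
  JOIN `testing.table_3` AS t3
    ON t1.CatId = t3.CategoryID
) AS joined
JOIN `testing.table_2` AS t2
  ON joined.CatId = t2.CategoryId
-- Grouping by the underlying columns avoids relying on alias resolution.
GROUP BY joined.Origin, joined.category;
```

Outside the subquery, columns are referenced with the single alias joined (joined.Id, joined.Origin, joined.category), which is the change this answer proposes.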