BigQuery: Ambiguous column name

Asked: 2018-03-05 13:02:00

Tags: sql google-bigquery reddit legacy-sql

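OK, I'm new to SQL and BigQuery, and I'm getting an ambiguous column name error. I've looked through other answers on Stack Overflow but couldn't find (or understand) an answer to my problem. This is what I get: Error: 2.40 - 2.68: Ambiguous column name subreddit.

It happens for this code (which I adapted from someone else's analysis of something similar):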

    #legacySQL
    # Count the users who authored more than 5 comments in each subreddit of a pair:
    SELECT t1.subreddit, t2.subreddit, SUM(1) as NumOverlaps
    FROM (SELECT subreddit, author, COUNT(1) as cnt 
         FROM (TABLE_QUERY([fh-bigquery:reddit_comments],
     'table_id CONTAINS "2017_" AND length(table_id) >= 5'))
         GROUP BY subreddit, author HAVING cnt > 5) t1

    JOIN (SELECT subreddit, author, COUNT(1) as cnt 
         FROM(TABLE_QUERY([fh-bigquery:reddit_comments],
     'table_id CONTAINS "2017_" AND length(table_id) >= 5'))
         GROUP BY subreddit, author HAVING cnt > 5) t2

    ON t1.author=t2.author
    WHERE t1.subreddit!=t2.subreddit
    GROUP BY t1.subreddit, t2.subreddit

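Thanks for your help!

2 Answers:

Answer 0: (score: 0)

The error you got probably has a transient cause (possibly a BigQuery caching issue), because I ran the same query successfully with Allow Large Results and Flatten Results enabled and a destination table specified. The results are not correct, though, because every subreddit is listed. The likely cause is the JOIN clause, which joins two identical tables (this may also be what triggered the ambiguous column name error): BigQuery ends up performing something like a cross join, which multiplies the results.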

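I suggest you first create a table that holds the results from the public dataset, like this:

    SELECT subreddit, author, COUNT(1) as cnt
    FROM (TABLE_QUERY([fh-bigquery:reddit_comments],
      'table_id CONTAINS "2017_" AND length(table_id) >= 5'))
    GROUP BY subreddit, author HAVING cnt > 5

Then run a second query that uses only that one table (the saved results of the query against the public dataset) to get the result you want. I also recommend that you optimize your query and avoid SQL anti-patterns.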
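As a rough sketch of what that second query might look like: it assumes the results of the query above were saved to a destination table, here given the hypothetical name [my_project:my_dataset.subreddit_authors] (substitute your own). The output aliases subreddit_1 and subreddit_2 are also just illustrative. I have not run this version.

```sql
#legacySQL
# Hypothetical table name: replace with the destination table you saved above.
# Pairs up subreddits that share authors, using the pre-aggregated table
# instead of scanning the public dataset twice.
SELECT t1.subreddit AS subreddit_1, t2.subreddit AS subreddit_2, COUNT(1) AS NumOverlaps
FROM [my_project:my_dataset.subreddit_authors] t1
JOIN [my_project:my_dataset.subreddit_authors] t2
ON t1.author = t2.author
WHERE t1.subreddit != t2.subreddit
GROUP BY subreddit_1, subreddit_2
```

Since the saved table already contains only (subreddit, author) rows with cnt > 5, the expensive aggregation runs once rather than once per side of the join.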

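Answer 1: (score: 0)

Your select statement, SELECT t1.subreddit, t2.subreddit, SUM(1) as NumOverlaps, produces three fields in the output. The first two would have the same name, subreddit, hence the error message: Ambiguous column name subreddit.

To resolve the ambiguity, just give each column its own alias, as in this example:

    SELECT t1.subreddit as t1_subreddit, t2.subreddit as t2_subreddit, SUM(1) as NumOverlaps

It's that simple! So, assuming this is the only problem in your query, it should work now.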
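For reference, here is a sketch of the whole query from the question with only that SELECT line changed. Everything else is copied from the question as-is, and I have not run it against the public dataset, so treat it as a starting point rather than a verified result.

```sql
#legacySQL
# Count the users who authored more than 5 comments in each subreddit of a pair.
# The two output columns now have distinct aliases, so their names no longer collide.
SELECT t1.subreddit as t1_subreddit, t2.subreddit as t2_subreddit, SUM(1) as NumOverlaps
FROM (SELECT subreddit, author, COUNT(1) as cnt
     FROM (TABLE_QUERY([fh-bigquery:reddit_comments],
       'table_id CONTAINS "2017_" AND length(table_id) >= 5'))
     GROUP BY subreddit, author HAVING cnt > 5) t1
JOIN (SELECT subreddit, author, COUNT(1) as cnt
     FROM (TABLE_QUERY([fh-bigquery:reddit_comments],
       'table_id CONTAINS "2017_" AND length(table_id) >= 5'))
     GROUP BY subreddit, author HAVING cnt > 5) t2
ON t1.author=t2.author
WHERE t1.subreddit!=t2.subreddit
GROUP BY t1.subreddit, t2.subreddit
```

If BigQuery then complains about the GROUP BY line, grouping by the aliases (GROUP BY t1_subreddit, t2_subreddit) is the variation to try.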