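OK, I'm new to SQL and BigQuery, and I'm getting an ambiguous column name error. I've checked other answers on Stack Overflow but couldn't find (or understand) an answer to my problem. This is the error I get:
Error: 2.40 - 2.68: Ambiguous column name subreddit.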
It comes from this code, which I adapted from someone else's analysis of something similar:
#legacySQL
# For each pair of subreddits, count the users who authored more than 5 comments in both:
SELECT t1.subreddit, t2.subreddit, SUM(1) as NumOverlaps
FROM (SELECT subreddit, author, COUNT(1) as cnt
FROM (TABLE_QUERY([fh-bigquery:reddit_comments],
'table_id CONTAINS "2017_" AND length(table_id) >= 5'))
GROUP BY subreddit, author HAVING cnt > 5) t1
JOIN (SELECT subreddit, author, COUNT(1) as cnt
FROM(TABLE_QUERY([fh-bigquery:reddit_comments],
'table_id CONTAINS "2017_" AND length(table_id) >= 5'))
GROUP BY subreddit, author HAVING cnt > 5) t2
ON t1.author=t2.author
WHERE t1.subreddit!=t2.subreddit
GROUP BY t1.subreddit, t2.subreddit
Thanks for your help!
Answer 0 (score: 0)
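The error you got must have had a transient cause (possibly a BigQuery caching issue), because I successfully ran the same query with Allow Large Results and Flatten Results enabled and a destination table specified. However, the results are not correct, because every subreddit is listed. The cause must be the JOIN clause, which joins two identical tables (this may also be what triggered the ambiguous column name error), so BigQuery must be performing a cross join, which multiplies the results.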
I suggest you first create a table that holds the results of querying the public dataset, like this:
SELECT subreddit, author, COUNT(1) as cnt
FROM(TABLE_QUERY([fh-bigquery:reddit_comments],
'table_id CONTAINS "2017_" AND length(table_id) >= 5'))
GROUP BY subreddit, author HAVING cnt > 5
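Then run a second query against only that one table (the results of the public-dataset query) to get the result you want. It is also worth taking the time to optimize your queries and avoid SQL anti-patterns.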
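A minimal sketch of this two-step approach, using Python's built-in sqlite3 module as a local stand-in for BigQuery. The toy rows and the table names comments and active_authors are hypothetical, and SQLite has no TABLE_QUERY, so a single local table replaces the wildcard over the 2017 tables. Step 1 materializes the per-(subreddit, author) counts once; step 2 self-joins only that materialized table.

```python
import sqlite3

# Hypothetical toy data standing in for the public reddit_comments tables:
# one row per comment, keeping only the two columns the query uses.
comments = (
    [("askscience", "alice")] * 6
    + [("physics", "alice")] * 6
    + [("askscience", "bob")] * 6
    + [("physics", "bob")] * 2  # bob has too few comments here to qualify
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (subreddit TEXT, author TEXT)")
conn.executemany("INSERT INTO comments VALUES (?, ?)", comments)

# Step 1: materialize the per-(subreddit, author) counts once,
# keeping only authors with more than 5 comments in that subreddit.
conn.execute("""
    CREATE TABLE active_authors AS
    SELECT subreddit, author, COUNT(1) AS cnt
    FROM comments
    GROUP BY subreddit, author
    HAVING COUNT(1) > 5
""")

# Step 2: self-join the single materialized table on author.
# Each output column gets its own alias so no output name is duplicated.
rows = conn.execute("""
    SELECT t1.subreddit AS t1_subreddit,
           t2.subreddit AS t2_subreddit,
           SUM(1) AS NumOverlaps
    FROM active_authors t1
    JOIN active_authors t2 ON t1.author = t2.author
    WHERE t1.subreddit != t2.subreddit
    GROUP BY t1.subreddit, t2.subreddit
    ORDER BY t1_subreddit, t2_subreddit
""").fetchall()

print(rows)
```

With this toy data only alice clears the threshold in both subreddits, so each pair appears with a count of 1. Note that the self-join reports (A, B) and (B, A) as separate rows.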
Answer 1 (score: 0)
Your select statement SELECT t1.subreddit, t2.subreddit, SUM(1) as NumOverlaps
introduces three fields in the output, and the first two have the same name: subreddit.
Hence the error message Ambiguous column name subreddit.
To resolve the ambiguity, simply give those columns aliases, as in the example below:
SELECT t1.subreddit as t1_subreddit, t2.subreddit as t2_subreddit, SUM(1) as NumOverlaps
It's that simple! Assuming this is the only problem in your query, it should work now!
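To check the effect of the aliases locally, here is a small sketch using Python's sqlite3 module. SQLite only stands in for BigQuery here, and the table t with its two rows is hypothetical. It reads the output column names from cursor.description to confirm that each selected column now has a distinct name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (subreddit TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("askscience", "alice"), ("physics", "alice")],
)

# Same shape as the question's query, but each output column is aliased.
cur = conn.execute("""
    SELECT t1.subreddit AS t1_subreddit,
           t2.subreddit AS t2_subreddit,
           SUM(1) AS NumOverlaps
    FROM t t1
    JOIN t t2 ON t1.author = t2.author
    WHERE t1.subreddit != t2.subreddit
    GROUP BY t1.subreddit, t2.subreddit
""")

# cursor.description holds one tuple per output column; index 0 is its name.
column_names = [d[0] for d in cur.description]
print(column_names)  # ['t1_subreddit', 't2_subreddit', 'NumOverlaps']
```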