Problem joining multiple Reddit tables using AS and ON clauses

Date: 2019-01-27 21:38:58

Tags: google-bigquery reddit

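I'm trying to join comments to posts across multiple tables. I need an AS clause because the posts tables and the comments tables both have a column named "score".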

My goal is to find the top comments on the top posts across the data in all of these tables.

#standardSQL
SELECT posts.title, posts.url, posts.score AS postsscore, 
DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH), 
comments.body, comments.score AS commentsscore, comments.id

FROM

fh-bigquery.reddit_posts.2015_12,
fh-bigquery.reddit_posts.2016_01, fh-bigquery.reddit_posts.2016_02, fh-bigquery.reddit_posts.2016_03, fh-bigquery.reddit_posts.2016_04, fh-bigquery.reddit_posts.2016_05, fh-bigquery.reddit_posts.2016_06, fh-bigquery.reddit_posts.2016_07, fh-bigquery.reddit_posts.2016_08, fh-bigquery.reddit_posts.2016_09, fh-bigquery.reddit_posts.2016_10, fh-bigquery.reddit_posts.2016_11, fh-bigquery.reddit_posts.2016_12,
fh-bigquery.reddit_posts.2017_01, fh-bigquery.reddit_posts.2017_02, fh-bigquery.reddit_posts.2017_03, fh-bigquery.reddit_posts.2017_04, fh-bigquery.reddit_posts.2017_05, fh-bigquery.reddit_posts.2017_06, fh-bigquery.reddit_posts.2017_07, fh-bigquery.reddit_posts.2017_08, fh-bigquery.reddit_posts.2017_09, fh-bigquery.reddit_posts.2017_10, fh-bigquery.reddit_posts.2017_11, fh-bigquery.reddit_posts.2017_12,
fh-bigquery.reddit_posts.2018_01, fh-bigquery.reddit_posts.2018_02, fh-bigquery.reddit_posts.2018_03, fh-bigquery.reddit_posts.2018_04, fh-bigquery.reddit_posts.2018_05, fh-bigquery.reddit_posts.2018_06, fh-bigquery.reddit_posts.2018_07, fh-bigquery.reddit_posts.2018_08, fh-bigquery.reddit_posts.2018_09, fh-bigquery.reddit_posts.2018_10

AS posts

JOIN

fh-bigquery.reddit_comments.2015_12,
fh-bigquery.reddit_comments.2016_01, fh-bigquery.reddit_comments.2016_02, fh-bigquery.reddit_comments.2016_03, fh-bigquery.reddit_comments.2016_04, fh-bigquery.reddit_comments.2016_05, fh-bigquery.reddit_comments.2016_06, fh-bigquery.reddit_comments.2016_07, fh-bigquery.reddit_comments.2016_08, fh-bigquery.reddit_comments.2016_09, fh-bigquery.reddit_comments.2016_10, fh-bigquery.reddit_comments.2016_11, fh-bigquery.reddit_comments.2016_12,
fh-bigquery.reddit_comments.2017_01, fh-bigquery.reddit_comments.2017_02, fh-bigquery.reddit_comments.2017_03, fh-bigquery.reddit_comments.2017_04, fh-bigquery.reddit_comments.2017_05, fh-bigquery.reddit_comments.2017_06, fh-bigquery.reddit_comments.2017_07, fh-bigquery.reddit_comments.2017_08, fh-bigquery.reddit_comments.2017_09, fh-bigquery.reddit_comments.2017_10, fh-bigquery.reddit_comments.2017_11, fh-bigquery.reddit_comments.2017_12,
fh-bigquery.reddit_comments.2018_01, fh-bigquery.reddit_comments.2018_02, fh-bigquery.reddit_comments.2018_03, fh-bigquery.reddit_comments.2018_04, fh-bigquery.reddit_comments.2018_05, fh-bigquery.reddit_comments.2018_06, fh-bigquery.reddit_comments.2018_07, fh-bigquery.reddit_comments.2018_08, fh-bigquery.reddit_comments.2018_09, fh-bigquery.reddit_comments.2018_10

AS comments

ON posts.id = SUBSTR(comments.link_id, 4)

WHERE posts.subreddit = 'Showerthoughts' AND posts.score >100 AND comments.score >100
ORDER BY posts.score DESC

1 answer:

Answer 0 (score: 0)

OK, there are several problems with this query:

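  • Careful! This query will process a lot of data. I could re-cluster the tables to make this kind of query more efficient, but I haven't done that yet.
  • In #standardSQL, a comma between tables means a (cross) JOIN, not a UNION. So you need to UNION ALL the tables instead.
  • Shortcut: you can append a * to the end of a table name to expand it to all matching tables.
  • Escape table names with backticks.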
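Applied to the full date range from the question, the wildcard shortcut can be combined with a filter on _TABLE_SUFFIX, BigQuery's pseudo-column for wildcard tables, so that only the wanted months are scanned. This is a sketch, not a tested query: it assumes every monthly table in both datasets is named YYYY_MM (as in the question's list), so that with the prefix 20* the remaining suffix looks like '15_12'. It still processes a lot of data.

```sql
#standardSQL
-- Sketch: read the monthly tables 2015_12 .. 2018_10 without listing them.
-- Assumption: all tables are named YYYY_MM, so after the prefix "20"
-- the suffix is 'YY_MM' (e.g. '15_12'), which sorts correctly as a string.
SELECT posts.title, posts.url, posts.score AS postsscore,
  comments.body, comments.score AS commentsscore, comments.id
FROM `fh-bigquery.reddit_posts.20*` AS posts
JOIN `fh-bigquery.reddit_comments.20*` AS comments
  -- link_id is the post's fullname ('t3_' + post id), so skip the first 3 chars
  ON posts.id = SUBSTR(comments.link_id, 4)
-- With two wildcard tables, qualify _TABLE_SUFFIX with each alias
WHERE posts._TABLE_SUFFIX BETWEEN '15_12' AND '18_10'
  AND comments._TABLE_SUFFIX BETWEEN '15_12' AND '18_10'
  AND posts.subreddit = 'Showerthoughts'
  AND posts.score > 100
  AND comments.score > 100
ORDER BY posts.score DESC
```

Using the shorter prefix 20* rather than a bare * is a deliberate choice: it avoids matching any differently named tables that may exist in the same datasets.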

That said, a working query would be:

#standardSQL
SELECT posts.title, posts.url, posts.score AS postsscore, 
DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH), 
SUBSTR(comments.body, 0, 80), comments.score AS commentsscore, comments.id

FROM `fh-bigquery.reddit_posts.2015*` AS posts
JOIN `fh-bigquery.reddit_comments.2015*` AS comments

ON posts.id = SUBSTR(comments.link_id, 4)

WHERE posts.subreddit = 'Showerthoughts' 
AND posts.score >100 
AND comments.score >100
ORDER BY posts.score DESC
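The query above returns every comment scoring over 100 on each qualifying post. To get closer to the stated goal of only the top comment on each top post, one option is to rank comments per post with ROW_NUMBER() and keep the first. This is a hedged sketch over the same 2015 tables and thresholds; ties between equally scored comments are broken arbitrarily.

```sql
#standardSQL
-- Sketch: keep only the highest-scoring comment for each post.
SELECT title, url, postsscore, body, commentsscore, id
FROM (
  SELECT posts.title, posts.url, posts.score AS postsscore,
    SUBSTR(comments.body, 0, 80) AS body,
    comments.score AS commentsscore, comments.id,
    -- rank each post's comments by score, best first
    ROW_NUMBER() OVER (PARTITION BY posts.id ORDER BY comments.score DESC) AS rn
  FROM `fh-bigquery.reddit_posts.2015*` AS posts
  JOIN `fh-bigquery.reddit_comments.2015*` AS comments
    ON posts.id = SUBSTR(comments.link_id, 4)
  WHERE posts.subreddit = 'Showerthoughts'
    AND posts.score > 100
    AND comments.score > 100
)
WHERE rn = 1  -- one row per post: its top comment
ORDER BY postsscore DESC
```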