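BigQuery - Beginner
Trying to get pairs of users who both commented in the top 10 subreddits, along with the number of subreddits they have in common, using the BigQuery Reddit data
I have just started using BQ and am a beginner with SQL, and I am finding it hard to get this query right. Can someone give me some guidance?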
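Answer 0 (score: 2)
I have never really needed to use the reddit data, so the below is just something to get you started, since no one else seems willing to chip in.
Quick logic:
Step 1: Identify the top 10 most commented subreddits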
SELECT subreddit
FROM [fh-bigquery:reddit_comments.subr_rank_201505]
ORDER BY comments DESC
LIMIT 10
Step 2: For each of those subreddits, identify the [solid] users (more than 50 comments)
SELECT author, subreddit, COUNT(1) AS comments
FROM [fh-bigquery:reddit_comments.2016_01]
WHERE subreddit IN (
SELECT subreddit
FROM [fh-bigquery:reddit_comments.subr_rank_201505]
ORDER BY comments DESC
LIMIT 10)
AND author NOT IN ('AutoModerator', '[deleted]')
GROUP BY author, subreddit
HAVING comments > 50
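To see what the Step 2 filter does without touching BigQuery, here is a minimal sketch that runs the same GROUP BY / HAVING logic in Python's built-in sqlite3 on a few made-up rows. The `comments` table, the sample authors, the `active_users` helper and the threshold of 1 (standing in for 50) are all assumptions for illustration, and SQLite syntax differs slightly from BigQuery legacy SQL.

```python
import sqlite3

# Toy stand-in for the Reddit comments table (hypothetical rows, one per comment).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (author TEXT, subreddit TEXT)")
rows = (
    [("alice", "funny")] * 3
    + [("bob", "funny")] * 1
    + [("alice", "pics")] * 2
    + [("AutoModerator", "funny")] * 5
)
conn.executemany("INSERT INTO comments VALUES (?, ?)", rows)

MIN_COMMENTS = 1  # assumption: stands in for the threshold of 50 used above


def active_users(conn, min_comments):
    """Return (author, subreddit, comment_count) for users above the threshold."""
    query = """
        SELECT author, subreddit, COUNT(*) AS comments
        FROM comments
        WHERE author NOT IN ('AutoModerator', '[deleted]')
        GROUP BY author, subreddit
        HAVING COUNT(*) > ?
        ORDER BY author, subreddit
    """
    return conn.execute(query, (min_comments,)).fetchall()


result = active_users(conn, MIN_COMMENTS)
for row in result:
    print(row)
```

With this toy data only alice survives the filter: bob has a single comment, which is not more than the threshold, and AutoModerator is excluded by name.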
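Step 3: For each subreddit, identify pairs of users it has in common (via a JOIN)
Step 4: Finally, for each pair of users, count the number of subreddits they have in common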
SELECT usera, userb, COUNT(1) AS subreddits
FROM (
  -- Pair up solid users who commented in the same subreddit
  SELECT
    a.author AS usera,
    b.author AS userb,
    a.subreddit AS subreddit
  FROM (
    SELECT author, subreddit, COUNT(1) AS comments
    FROM [fh-bigquery:reddit_comments.2016_01]
    WHERE subreddit IN (
      SELECT subreddit
      FROM [fh-bigquery:reddit_comments.subr_rank_201505]
      ORDER BY comments DESC
      LIMIT 10)
    AND author NOT IN ('AutoModerator', '[deleted]')
    GROUP BY author, subreddit
    HAVING comments > 50 ) AS a
  JOIN (
    SELECT author, subreddit, COUNT(1) AS comments
    FROM [fh-bigquery:reddit_comments.2016_01]
    WHERE subreddit IN (
      SELECT subreddit
      FROM [fh-bigquery:reddit_comments.subr_rank_201505]
      ORDER BY comments DESC
      LIMIT 10)
    AND author NOT IN ('AutoModerator', '[deleted]')
    GROUP BY author, subreddit
    HAVING comments > 50 ) AS b
  ON a.subreddit = b.subreddit
  -- Keep each unordered pair once and drop self-pairs
  WHERE a.author < b.author
)
GROUP BY usera, userb
HAVING subreddits > 3
ORDER BY subreddits DESC, usera, userb
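The self-join in Steps 3 and 4 can be tried out on toy data too. The sketch below again uses Python's sqlite3 rather than BigQuery; the `active` table (standing in for the output of Step 2), the sample users and the `common_subreddit_pairs` helper are hypothetical names made up for illustration.

```python
import sqlite3

# Toy stand-in for the Step 2 result: one row per (solid user, subreddit).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE active (author TEXT, subreddit TEXT)")
conn.executemany(
    "INSERT INTO active VALUES (?, ?)",
    [
        ("alice", "funny"), ("alice", "pics"), ("alice", "news"),
        ("bob", "funny"), ("bob", "pics"),
        ("carol", "news"), ("carol", "funny"),
    ],
)


def common_subreddit_pairs(conn, min_shared):
    """Return (usera, userb, shared_count) for pairs sharing more than min_shared subreddits."""
    query = """
        SELECT a.author AS usera, b.author AS userb, COUNT(*) AS subreddits
        FROM active AS a
        JOIN active AS b
          ON a.subreddit = b.subreddit
        WHERE a.author < b.author   -- each unordered pair once, no self-pairs
        GROUP BY a.author, b.author
        HAVING COUNT(*) > ?
        ORDER BY subreddits DESC, usera, userb
    """
    return conn.execute(query, (min_shared,)).fetchall()


pairs = common_subreddit_pairs(conn, 0)
for row in pairs:
    print(row)
```

The `a.author < b.author` condition is what stops (alice, bob) from also showing up as (bob, alice) or as (alice, alice). Raising `min_shared` plays the same role as `HAVING subreddits > 3` in the query above.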
Hope this helps.