Get the top 5 highest-scoring rows by score

Time: 2019-01-30 01:08:04

Tags: sql google-bigquery reddit

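I am trying to get the top 5 comments by score for each Reddit post. That is, for each post title I want to retrieve only the top N comments, ranked by score.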

Example: with N = 2, I want only comments 1 and 2 for each post.

Post 1 | Comment 1 | Comment Score 10
Post 1 | Comment 2 | Comment Score 9
Post 1 | Comment 3 | Comment Score 8
Post 2 | Comment 1 | Comment Score 10
Post 2 | Comment 2 | Comment Score 9
Post 2 | Comment 3 | Comment Score 8

My current Standard SQL query:

SELECT 
    posts.title, 
    posts.url, 
    posts.score AS postsscore, 
    DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH), 
    SUBSTR(comments.body, 0, 80), 
    comments.score AS commentsscore, 
    comments.id
FROM 
    `fh-bigquery.reddit_posts.2015*` AS posts
    JOIN `fh-bigquery.reddit_comments.2015*` AS comments
        ON posts.id = SUBSTR(comments.link_id, 4)
WHERE 
    posts.subreddit = 'Showerthoughts' 
    AND posts.score >100 
    AND comments.score >100
ORDER BY 
    posts.score DESC, 
    posts.title DESC, 
    comments.score DESC

1 answer:

Answer 0 (score: 3)

The following is for BigQuery Standard SQL:

#standardSQL
SELECT * EXCEPT(pos) FROM (
  SELECT 
    posts.title, 
    posts.url, 
    posts.score AS postsscore, 
    DATE_TRUNC(DATE(TIMESTAMP_SECONDS(posts.created_utc)), MONTH), 
    SUBSTR(comments.body, 0, 80), 
    comments.score AS commentsscore, 
    comments.id,
    ROW_NUMBER() OVER(PARTITION BY posts.url ORDER BY comments.score DESC) pos
  FROM `fh-bigquery.reddit_posts.2015*` AS posts
  JOIN `fh-bigquery.reddit_comments.2015*` AS comments
  ON posts.id = SUBSTR(comments.link_id, 4)
  WHERE posts.subreddit = 'Showerthoughts' 
  AND posts.score >100 
  AND comments.score >100
) 
WHERE pos < 3  -- keeps the top 2 comments per post, as in the example; use pos <= 5 for the top 5
ORDER BY postsscore DESC, title DESC, commentsscore DESC
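
To see the ROW_NUMBER() pattern in isolation, here is a minimal sketch that runs it against the six example rows from the question. It uses Python's built-in sqlite3 module as a stand-in for BigQuery, which is an assumption for illustration only (it needs SQLite 3.25 or newer for window functions). The table name `comments`, its columns, and the helper `top_comments` are hypothetical and do not come from the Reddit dataset.

```python
import sqlite3

# Hypothetical sample data mirroring the example table in the question.
ROWS = [
    ("Post 1", "Comment 1", 10),
    ("Post 1", "Comment 2", 9),
    ("Post 1", "Comment 3", 8),
    ("Post 2", "Comment 1", 10),
    ("Post 2", "Comment 2", 9),
    ("Post 2", "Comment 3", 8),
]


def top_comments(rows, n):
    """Return the n highest-scoring comments for each post title."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (title TEXT, comment TEXT, score INTEGER)")
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)
    query = """
        SELECT title, comment, score
        FROM (
            SELECT title, comment, score,
                   -- Number the comments within each post, best score first.
                   ROW_NUMBER() OVER (PARTITION BY title ORDER BY score DESC) AS pos
            FROM comments
        )
        WHERE pos <= ?
        ORDER BY title, score DESC
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_comments(ROWS, 2):
        print(row)
```

With n = 2 this keeps comments 1 and 2 for each post and drops comment 3. Passing 5 instead gives the top 5 per post, which is the same change as replacing the filter on pos in the BigQuery query.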