BigQuery DISTINCT and GROUP BY

Date: 2018-06-15 09:49:53

Tags: sql google-bigquery reddit

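Following "Select first row in each GROUP BY group?", I am trying to do something very similar in Google BigQuery.

Dataset: fh-bigquery:reddit_comments.2018_01

Goal: for each link_id (Reddit submission), select the first comment according to created_utc.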

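My current query does not work, because it still does not return unique/distinct parent_id values.

Please help, and thank you!

Edit: I misspoke when I said parent_id identifies the submission; it is actually link_id.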

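2 Answers:

Answer 0: (score: 1)

We can use ROW_NUMBER() here. The query below partitions by parent_id, as in the original wording of the question; per the question's edit, substitute link_id to get one row per submission: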

SELECT body, parent_id, created_utc
FROM
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY created_utc) rn
    FROM [fh-bigquery:reddit_comments.2018_01]
    WHERE subreddit_id = 't5_2zkvo'
) t
WHERE rn = 1
ORDER BY parent_id, body, created_utc DESC;
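
To see the ROW_NUMBER() pattern on a small scale, here is a minimal sketch that runs the same shape of query locally. It uses Python's built-in sqlite3 module as a stand-in for BigQuery (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The `comments` table, the sample rows, and the function name `first_comment_per_link` are made up for illustration, and the sketch partitions by link_id, following the goal stated in the question.

```python
import sqlite3


def first_comment_per_link(rows):
    """Return the earliest (link_id, body, created_utc) row for each link_id."""
    # In-memory SQLite database as a local stand-in for the BigQuery table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments (link_id TEXT, body TEXT, created_utc INTEGER)"
    )
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT link_id, body, created_utc
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY link_id ORDER BY created_utc
                   ) AS rn
            FROM comments
        ) AS t
        WHERE rn = 1          -- keep only the earliest row in each partition
        ORDER BY link_id
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data: two comments on one submission, one on another.
sample = [
    ("t3_a", "second", 20),
    ("t3_a", "first", 10),
    ("t3_b", "only", 5),
]
print(first_comment_per_link(sample))
# [('t3_a', 'first', 10), ('t3_b', 'only', 5)]
```

Because ROW_NUMBER() assigns exactly one row the number 1 in each partition, the result always has one row per link_id, even when two comments share the same created_utc.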

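Note that you could also keep your current approach, but you would have to phrase the query as a join between the table and a subquery that finds the earliest entry for each parent_id: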

SELECT t1.*
FROM [fh-bigquery:reddit_comments.2018_01] t1
INNER JOIN
(
    SELECT parent_id, MIN(created_utc) AS first_created_utc
    FROM [fh-bigquery:reddit_comments.2018_01]
    GROUP BY parent_id
) t2
    ON t1.parent_id = t2.parent_id AND t1.created_utc = t2.first_created_utc;
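
One behavioural difference from ROW_NUMBER() is worth checking before relying on the join: if several comments in a group share the minimum created_utc, the join returns all of them. The sketch below illustrates this locally with Python's sqlite3 module as a stand-in for BigQuery. The `comments` table, the sample rows, and the function name `earliest_by_join` are hypothetical, and the grouping column is link_id, following the goal stated in the question.

```python
import sqlite3


def earliest_by_join(rows):
    """Return every row whose created_utc equals its group's minimum."""
    # In-memory SQLite database as a local stand-in for the BigQuery table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE comments (link_id TEXT, body TEXT, created_utc INTEGER)"
    )
    conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT t1.link_id, t1.body, t1.created_utc
        FROM comments AS t1
        INNER JOIN (
            SELECT link_id, MIN(created_utc) AS first_created_utc
            FROM comments
            GROUP BY link_id
        ) AS t2
          ON t1.link_id = t2.link_id
         AND t1.created_utc = t2.first_created_utc
        ORDER BY t1.link_id, t1.body
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data: "x" and "y" tie for the earliest timestamp.
tied = [
    ("t3_a", "x", 10),
    ("t3_a", "y", 10),
    ("t3_a", "z", 30),
    ("t3_b", "only", 5),
]
print(earliest_by_join(tied))
# [('t3_a', 'x', 10), ('t3_a', 'y', 10), ('t3_b', 'only', 5)]
```

If exactly one row per group is required, a tie-breaking column or the ROW_NUMBER() form is the safer choice.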

Answer 1: (score: 1)

The following is for BigQuery Standard SQL:

#standardSQL
SELECT 
  ARRAY_AGG(body ORDER BY created_utc LIMIT 1)[OFFSET(0)] body, 
  link_id
FROM `fh-bigquery.reddit_comments.2018_01`
WHERE subreddit_id = 't5_2zkvo'
GROUP BY link_id
-- ORDER BY link_id
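
For readers less familiar with the ARRAY_AGG(... ORDER BY ... LIMIT 1)[OFFSET(0)] idiom, the following is a rough pure-Python analogue of what the aggregate computes per group: the body of the comment with the smallest created_utc for each link_id. It is only an illustration of the logic, not how BigQuery executes the query; the function name `first_body_per_link` and the sample rows are invented for the example.

```python
def first_body_per_link(rows):
    """Map each link_id to the body of its earliest comment.

    rows is an iterable of (link_id, body, created_utc) tuples.
    """
    earliest = {}  # link_id -> (created_utc, body) of the earliest row so far
    for link_id, body, created_utc in rows:
        # Keep the row only if it is strictly earlier than the one stored.
        # On ties this sketch keeps the first row seen; that tie-breaking
        # rule is a choice made here, not a guarantee of the SQL version.
        if link_id not in earliest or created_utc < earliest[link_id][0]:
            earliest[link_id] = (created_utc, body)
    return {link_id: body for link_id, (_, body) in earliest.items()}


# Hypothetical sample data.
sample = [
    ("t3_a", "second", 20),
    ("t3_a", "first", 10),
    ("t3_b", "only", 5),
]
print(first_body_per_link(sample))
# {'t3_a': 'first', 't3_b': 'only'}
```

Like the GROUP BY query, this yields exactly one value per link_id, which is what the question asks for.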