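Following up on "Select first row in each GROUP BY group?", I'm trying to do something very similar in Google BigQuery.
Dataset: fh-bigquery:reddit_comments.2018_01
Goal: for each link_id (Reddit submission), select the first comment based on created_utc.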
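At the moment it doesn't work: it still doesn't return unique/distinct parent_id(s).
Please help, thanks!
Edit: I misspoke when I said parent_id == submission; it's actually link_id.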
Answer 0 (score: 1)
We can use ROW_NUMBER() here:
SELECT body, link_id, created_utc
FROM
(
    SELECT *, ROW_NUMBER() OVER (PARTITION BY link_id ORDER BY created_utc) AS rn
    FROM [fh-bigquery:reddit_comments.2018_01]
    WHERE subreddit_id = 't5_2zkvo'
) t
WHERE rn = 1
ORDER BY link_id;
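To try the ROW_NUMBER() pattern locally without a BigQuery account, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25+ supports window functions). The comments table, its three columns and its rows are hypothetical toy data standing in for the Reddit table, and the subreddit filter is left out.

```python
import sqlite3

# Hypothetical toy rows: (link_id, body, created_utc). Not real Reddit data.
ROWS = [
    ("t3_a", "second on a", 200),
    ("t3_a", "first on a", 100),
    ("t3_b", "only on b", 150),
    ("t3_c", "later on c", 300),
    ("t3_c", "first on c", 120),
]

# Same shape as the answer's query: number the rows inside each link_id
# by created_utc, then keep only row number 1.
QUERY = """
SELECT body, link_id, created_utc
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY link_id ORDER BY created_utc) AS rn
    FROM comments
) AS t
WHERE rn = 1
ORDER BY link_id
"""


def first_comment_per_link(rows):
    """Return (body, link_id, created_utc) for the earliest comment of each link_id."""
    conn = sqlite3.connect(":memory:")  # in-memory database, discarded on close
    try:
        conn.execute(
            "CREATE TABLE comments (link_id TEXT, body TEXT, created_utc INTEGER)"
        )
        conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_comment_per_link(ROWS):
        print(row)
```

Each link_id comes back exactly once. ROW_NUMBER() always yields a single row per partition, so if two comments share the earliest timestamp, one of them is kept and which one is not specified.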
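Note that you could keep your current GROUP BY approach, but you would have to phrase the query as a join between the table and a subquery that finds the earliest created_utc for each submission. Unlike ROW_NUMBER(), this join returns more than one row for a submission if several comments share the same earliest created_utc: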
SELECT t1.*
FROM [fh-bigquery:reddit_comments.2018_01] t1
INNER JOIN
(
    SELECT link_id, MIN(created_utc) AS first_created_utc
    FROM [fh-bigquery:reddit_comments.2018_01]
    GROUP BY link_id
) t2
ON t1.link_id = t2.link_id AND t1.created_utc = t2.first_created_utc;
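The join variant can be checked the same way. This is again a sketch on hypothetical toy data with sqlite3 standing in for BigQuery; t3_a deliberately has two comments at the same earliest timestamp, to show that the join keeps both of them.

```python
import sqlite3

# Hypothetical toy rows: (link_id, body, created_utc).
# t3_a has two comments tied at the earliest timestamp (100).
ROWS = [
    ("t3_a", "tie one", 100),
    ("t3_a", "tie two", 100),
    ("t3_a", "later", 200),
    ("t3_b", "only on b", 150),
]

# Join each comment to its submission's MIN(created_utc); every comment
# matching that minimum survives, so ties produce duplicates per link_id.
QUERY = """
SELECT t1.link_id, t1.body, t1.created_utc
FROM comments AS t1
INNER JOIN (
    SELECT link_id, MIN(created_utc) AS first_created_utc
    FROM comments
    GROUP BY link_id
) AS t2
  ON t1.link_id = t2.link_id AND t1.created_utc = t2.first_created_utc
ORDER BY t1.link_id, t1.body
"""


def earliest_via_join(rows):
    """Return every (link_id, body, created_utc) that sits at its link_id's minimum."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE comments (link_id TEXT, body TEXT, created_utc INTEGER)"
        )
        conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in earliest_via_join(ROWS):
        print(row)
```

Both tied comments on t3_a are returned, which is why this form does not guarantee one row per link_id.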
Answer 1 (score: 1)
Below is for BigQuery Standard SQL:
#standardSQL
SELECT
  ARRAY_AGG(body ORDER BY created_utc LIMIT 1)[OFFSET(0)] AS body,
  link_id
FROM `fh-bigquery.reddit_comments.2018_01`
WHERE subreddit_id = 't5_2zkvo'
GROUP BY link_id
-- ORDER BY link_id
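ARRAY_AGG(body ORDER BY created_utc LIMIT 1)[OFFSET(0)] acts as an arg-min per group: for each link_id it keeps the body of the row with the smallest created_utc. The plain-Python sketch below mimics that behaviour over hypothetical (link_id, body, created_utc) tuples; it illustrates the semantics only and says nothing about how BigQuery actually executes the query.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, int]  # hypothetical shape: (link_id, body, created_utc)


def first_body_per_link(rows: Iterable[Row]) -> Dict[str, str]:
    """Mimic ARRAY_AGG(body ORDER BY created_utc LIMIT 1)[OFFSET(0)] ... GROUP BY link_id.

    Keeps, per link_id, the body whose created_utc is smallest. On ties the
    first row seen wins; BigQuery does not promise any particular tie order.
    """
    best: Dict[str, Tuple[int, str]] = {}
    for link_id, body, created_utc in rows:
        current = best.get(link_id)
        # Replace only on a strictly earlier timestamp.
        if current is None or created_utc < current[0]:
            best[link_id] = (created_utc, body)
    return {link_id: body for link_id, (_, body) in best.items()}


if __name__ == "__main__":
    sample = [("t3_a", "second", 200), ("t3_a", "first", 100), ("t3_b", "only", 150)]
    print(first_body_per_link(sample))
```

For the sample rows this yields {'t3_a': 'first', 't3_b': 'only'}. Like the SQL, it always returns exactly one body per link_id, unlike the join approach.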