Apologies if the title is a bit wordy; I'll create an example below to highlight what I mean. I have the following table of information:
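date        team  num_val
2017-10-04  ab    7
2017-10-03  ab    6
2017-10-02  ab    8
2017-10-05  ab    3
2017-10-07  ab    12
2017-10-06  ab    3
2017-10-01  ab    5
2017-09-08  cd    4
2017-09-09  cd    8
2017-09-10  cd    2
2017-09-14  cd    1
2017-09-13  cd    5
2017-09-11  cd    6
2017-09-12  cd    13
With this table, I simply want to sum num_val over the 5 most recent dates for each team.
Simple enough. However, there is no rhyme or reason to the dates for each team (I can't simply filter on the 5 most recent dates overall, because the dates may differ from team to team). I currently have the following query skeleton:
...
Any help is greatly appreciated, thanks!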
Answer 0: (score: 1)
Get the latest 5 for each team:
SELECT team, ARRAY_AGG(num_val ORDER BY date DESC LIMIT 5) arr
FROM x
GROUP BY team
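Then UNNEST(arr) and add up those num_vals, using the previous query as the subquery:
SELECT team, (SELECT SUM(num_val) FROM UNNEST(arr) num_val) the_sum
FROM (
  SELECT team, ARRAY_AGG(num_val ORDER BY date DESC LIMIT 5) arr FROM x GROUP BY team
)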
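To make the two steps concrete, here is a minimal plain-Python sketch of the same logic applied to the sample rows from the question. It is only an illustration of what the query computes, not BigQuery code; the helper names `latest_n_per_team` and `sum_per_team` are made up for this example.

```python
from collections import defaultdict

# Sample rows from the question: (date, team, num_val).
rows = [
    ("2017-10-04", "ab", 7), ("2017-10-03", "ab", 6), ("2017-10-02", "ab", 8),
    ("2017-10-05", "ab", 3), ("2017-10-07", "ab", 12), ("2017-10-06", "ab", 3),
    ("2017-10-01", "ab", 5),
    ("2017-09-08", "cd", 4), ("2017-09-09", "cd", 8), ("2017-09-10", "cd", 2),
    ("2017-09-14", "cd", 1), ("2017-09-13", "cd", 5), ("2017-09-11", "cd", 6),
    ("2017-09-12", "cd", 13),
]


def latest_n_per_team(rows, n=5):
    """Step 1 (hypothetical helper): per team, keep num_val for the n newest dates."""
    by_team = defaultdict(list)
    for date, team, num_val in rows:
        by_team[team].append((date, num_val))
    # ISO-8601 date strings sort chronologically, so a plain sort is enough here.
    return {
        team: [num_val for _, num_val in sorted(pairs, reverse=True)[:n]]
        for team, pairs in by_team.items()
    }


def sum_per_team(arrays):
    """Step 2 (hypothetical helper): add up the values kept for each team."""
    return {team: sum(values) for team, values in arrays.items()}


arrays = latest_n_per_team(rows)
sums = sum_per_team(arrays)
print(arrays)  # {'ab': [12, 3, 3, 7, 6], 'cd': [1, 5, 13, 6, 2]}
print(sums)    # {'ab': 31, 'cd': 27}
```

The first function plays the role of ARRAY_AGG(... ORDER BY date DESC LIMIT 5) with GROUP BY team, and the second plays the role of SUM over UNNEST(arr).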
Answer 1: (score: 1)
Below are a few more options for BigQuery Standard SQL, so you can see different approaches.
Option 1
#standardSQL
SELECT team, SUM(num_val) sum_num FROM (
  SELECT team, num_val, ROW_NUMBER() OVER(PARTITION BY team ORDER BY DATE DESC) pos
  FROM `project.dataset.table`
)
WHERE pos <= 5
GROUP BY team
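As a quick local check of the ROW_NUMBER approach, the sketch below runs an equivalent query against the question's sample data using Python's built-in sqlite3 module. This is an assumption-laden stand-in: SQLite is not BigQuery, the table name `t` is made up, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Sample rows from the question: (date, team, num_val).
rows = [
    ("2017-10-04", "ab", 7), ("2017-10-03", "ab", 6), ("2017-10-02", "ab", 8),
    ("2017-10-05", "ab", 3), ("2017-10-07", "ab", 12), ("2017-10-06", "ab", 3),
    ("2017-10-01", "ab", 5),
    ("2017-09-08", "cd", 4), ("2017-09-09", "cd", 8), ("2017-09-10", "cd", 2),
    ("2017-09-14", "cd", 1), ("2017-09-13", "cd", 5), ("2017-09-11", "cd", 6),
    ("2017-09-12", "cd", 13),
]

conn = sqlite3.connect(":memory:")
# "t" is a hypothetical table name standing in for `project.dataset.table`.
conn.execute("CREATE TABLE t (date TEXT, team TEXT, num_val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Number each team's rows from newest to oldest, then sum the first five.
query = """
SELECT team, SUM(num_val) AS sum_num FROM (
  SELECT team, num_val,
         ROW_NUMBER() OVER (PARTITION BY team ORDER BY date DESC) AS pos
  FROM t
)
WHERE pos <= 5
GROUP BY team
ORDER BY team
"""
result = conn.execute(query).fetchall()
print(result)  # [('ab', 31), ('cd', 27)]
```

Because the numbering restarts for every team, it does not matter that the two teams have entirely different date ranges.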
Option 2
#standardSQL
SELECT team, sum_num FROM (
  SELECT team,
    SUM(num_val) OVER(PARTITION BY team ORDER BY DATE DESC ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING) AS sum_num,
    ROW_NUMBER() OVER(PARTITION BY team ORDER BY DATE DESC) pos
  FROM `project.dataset.table`
)
WHERE pos = 1
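To see why filtering on pos = 1 is enough, the following sketch lists the windowed sum for every row before the filter is applied. As above, it uses Python's sqlite3 module (SQLite 3.25 or newer) as a local stand-in for BigQuery, and the table name `t` is an assumption.

```python
import sqlite3

# Sample rows from the question: (date, team, num_val).
rows = [
    ("2017-10-04", "ab", 7), ("2017-10-03", "ab", 6), ("2017-10-02", "ab", 8),
    ("2017-10-05", "ab", 3), ("2017-10-07", "ab", 12), ("2017-10-06", "ab", 3),
    ("2017-10-01", "ab", 5),
    ("2017-09-08", "cd", 4), ("2017-09-09", "cd", 8), ("2017-09-10", "cd", 2),
    ("2017-09-14", "cd", 1), ("2017-09-13", "cd", 5), ("2017-09-11", "cd", 6),
    ("2017-09-12", "cd", 13),
]

conn = sqlite3.connect(":memory:")
# "t" is a hypothetical table name standing in for `project.dataset.table`.
conn.execute("CREATE TABLE t (date TEXT, team TEXT, num_val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# For each row, sum that row plus the next four older rows of the same team.
query = """
SELECT team, date, num_val,
       SUM(num_val) OVER (PARTITION BY team ORDER BY date DESC
                          ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING) AS sum_num,
       ROW_NUMBER() OVER (PARTITION BY team ORDER BY date DESC) AS pos
FROM t
ORDER BY team, pos
"""
windowed = conn.execute(query).fetchall()
for row in windowed:
    print(row)

# Only the newest row of each team (pos = 1) has a frame covering the latest five dates.
latest = {team: sum_num for team, _, _, sum_num, pos in windowed if pos == 1}
print(latest)  # {'ab': 31, 'cd': 27}
```

Rows further down each partition have frames that slide toward older dates and eventually run out of rows, so their sums are not the ones the question asks for.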
If applied to the sample data from your question, both options produce the following result:
Row  team  sum_num
1    ab    31
2    cd    27
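While the options above can be useful in some more complicated cases, in your particular case I would go with the option presented in Felipe's answer, something like the following: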
#standardSQL
SELECT team, (SELECT SUM(num_val) FROM UNNEST(num_values)) sum_num
FROM (
  SELECT team, ARRAY_AGG(STRUCT(num_val) ORDER BY DATE DESC LIMIT 5) num_values
  FROM `project.dataset.table`
  GROUP BY team
)