I need to compute the median of a sequence of numbers efficiently in Google BigQuery. Is that possible?
Answer 0 (score: 10)
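Yes, it's possible with the PERCENTILE_CONT window function.
Returns values that are based upon linear interpolation between the values of the group, after ordering them per the ORDER BY clause.
The percentile must be between 0 and 1.
This window function requires ORDER BY in the OVER clause.
So an example query looks like this. (MAX() is only there so that GROUP BY collapses each room to a single row; every row in a room carries the same median, so it plays no part in the math and shouldn't confuse you.)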
#legacySQL
SELECT room, MAX(median)
FROM (
  SELECT room,
    PERCENTILE_CONT(0.5) OVER (PARTITION BY room ORDER BY temperature) AS median
  FROM
    (SELECT 1 AS room, 11 AS temperature),
    (SELECT 1 AS room, 12 AS temperature),
    (SELECT 1 AS room, 14 AS temperature),
    (SELECT 1 AS room, 19 AS temperature),
    (SELECT 1 AS room, 13 AS temperature),
    (SELECT 2 AS room, 20 AS temperature),
    (SELECT 2 AS room, 21 AS temperature),
    (SELECT 2 AS room, 29 AS temperature),
    (SELECT 3 AS room, 30 AS temperature)
)
GROUP BY room
Returns:
+------+-------------+
| room | temperature |
+------+-------------+
| 1 | 13 |
| 2 | 21 |
| 3 | 30 |
+------+-------------+
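A minimal Python sketch of what PERCENTILE_CONT(0.5) computes for each room, run locally on the same sample rows rather than in BigQuery. It assumes the usual linear-interpolation definition (fractional position p * (n - 1) in the sorted group); percentile_cont and median_by_group are hypothetical helper names, not BigQuery functions.

```python
from collections import defaultdict

# (room, temperature) rows, copied from the example query.
ROWS = [(1, 11), (1, 12), (1, 14), (1, 19), (1, 13),
        (2, 20), (2, 21), (2, 29), (3, 30)]


def percentile_cont(values, p):
    """Percentile by linear interpolation between sorted neighbours."""
    if not 0 <= p <= 1:
        raise ValueError("percentile must be between 0 and 1")
    s = sorted(values)
    if not s:
        raise ValueError("empty group")
    pos = p * (len(s) - 1)        # fractional position in the sorted group
    lo = int(pos)                 # floor, since pos >= 0
    hi = min(lo + 1, len(s) - 1)  # upper neighbour, clamped to the last value
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def median_by_group(rows):
    """One median per room, like PARTITION BY room followed by GROUP BY room."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: percentile_cont(vals, 0.5) for key, vals in sorted(groups.items())}


print(median_by_group(ROWS))  # {1: 13.0, 2: 21.0, 3: 30.0}
```

Rooms 1 and 2 have an odd number of readings, so the interpolation lands exactly on the middle value. With an even count, e.g. percentile_cont([1, 2, 3, 4], 0.5), it returns 2.5, a value that does not appear in the data.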
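Answer 1 (score: 7)
An alternative for when you don't need an exact result and an approximation is good enough: combine the NTH and QUANTILES aggregate functions. This approach scales better than analytic window functions, at the cost of returning approximate results.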
#legacySQL
SELECT room, NTH(51, QUANTILES(temperature, 101))
FROM
  (SELECT 1 AS room, 11 AS temperature),
  (SELECT 1 AS room, 12 AS temperature),
  (SELECT 1 AS room, 14 AS temperature),
  (SELECT 1 AS room, 19 AS temperature),
  (SELECT 1 AS room, 13 AS temperature),
  (SELECT 2 AS room, 20 AS temperature),
  (SELECT 2 AS room, 21 AS temperature),
  (SELECT 2 AS room, 29 AS temperature),
  (SELECT 3 AS room, 30 AS temperature)
GROUP BY room
Returns:
+------+-------------+
| room | temperature |
+------+-------------+
| 1 | 13 |
| 2 | 21 |
| 3 | 30 |
+------+-------------+
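The legacy QUANTILES(expr, buckets) function returns buckets boundary values, with the minimum and maximum included in that count, and NTH counts from 1. So with 101 buckets, position 1 is the minimum, position 51 the 50th percentile and position 101 the maximum. The Python sketch below illustrates that indexing; it computes exact nearest-rank boundaries as a stand-in for BigQuery's approximate algorithm, and quantile_boundaries and nth are hypothetical names.

```python
# Exact nearest-rank stand-in for legacy QUANTILES(expr, buckets) + NTH(n, ...).
# BigQuery's QUANTILES is approximate; this sketch is exact, for illustration only.

ROOM_1 = [11, 12, 14, 19, 13]  # room 1 temperatures from the example


def quantile_boundaries(values, buckets):
    """Return `buckets` values, from the minimum (first) to the maximum (last)."""
    if buckets < 2:
        raise ValueError("need at least 2 buckets (minimum and maximum)")
    s = sorted(values)
    n = len(s)
    out = []
    for k in range(buckets):
        # Nearest rank for percentile k / (buckets - 1): ceil(k * n / (buckets - 1)),
        # computed with integer arithmetic and clamped to at least 1.
        rank = max(1, -(-k * n // (buckets - 1)))
        out.append(s[rank - 1])
    return out


def nth(n, seq):
    """1-based access, like legacy NTH(n, ...)."""
    return seq[n - 1]


b = quantile_boundaries(ROOM_1, 101)
print(len(b), nth(1, b), nth(51, b), nth(101, b))  # 101 11 13 19
```

In this exact stand-in, positions 50 (the 49th percentile) and 51 both return 13 for room 1, and they also agree for rooms 2 and 3, so on such a tiny input an off-by-one position would not show up in the result table; on larger data the two positions can differ.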
Answer 2 (score: 5)
2018 update:
BigQuery SQL: Average, geometric mean, remove outliers, median
For my own future reference, here are the queries against the taxi data.
Approximate quantiles:
#legacySQL
SELECT MONTH(pickup_datetime) month, NTH(51, QUANTILES(tip_amount,101)) median
FROM [nyc-tlc:green.trips_2015]
WHERE tip_amount > 0
GROUP BY 1
ORDER BY 1
This gives the same results as PERCENTILE_DISC:
#legacySQL
SELECT month, FIRST(median) median
FROM (
SELECT MONTH(pickup_datetime) month, tip_amount, PERCENTILE_DISC(0.5) OVER(PARTITION BY month ORDER BY tip_amount) median
FROM [nyc-tlc:green.trips_2015]
WHERE tip_amount > 0
)
GROUP BY 1
ORDER BY 1
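PERCENTILE_DISC returns an actual value from the partition (the first sorted value whose cumulative distribution reaches the requested percentile), whereas PERCENTILE_CONT may interpolate between two neighbouring values. A hedged Python sketch of both on a made-up four-value list shows where they diverge; the helper names are hypothetical and the numbers are illustrative, not taken from the taxi dataset.

```python
import math


def percentile_disc(values, p):
    """First sorted value whose cumulative distribution is >= p."""
    s = sorted(values)
    idx = max(0, math.ceil(p * len(s)) - 1)  # p = 0 maps to the minimum
    return s[idx]


def percentile_cont(values, p):
    """Linear interpolation between neighbouring sorted values."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


tips = [3.0, 1.0, 10.0, 2.0]  # made-up tip amounts, not taxi data
print(percentile_disc(tips, 0.5), percentile_cont(tips, 0.5))  # 2.0 2.5
```

Since every row in a month's partition carries the same PERCENTILE_DISC value, FIRST() in the outer query simply picks that shared value.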
Standard SQL:
#StandardSQL
SELECT DATE_TRUNC(DATE(pickup_datetime), MONTH) month, APPROX_QUANTILES(tip_amount,1000)[OFFSET(500)] median
FROM `nyc-tlc.green.trips_2015`
WHERE tip_amount > 0
GROUP BY 1
ORDER BY 1
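Standard SQL's APPROX_QUANTILES(x, 1000) returns an array of 1,001 approximate boundaries (minimum first, maximum last), and OFFSET is zero-based, so OFFSET(500) is the median. The small Python helpers below, with hypothetical names, spell out the index arithmetic for both this form and the legacy NTH/QUANTILES form used earlier.

```python
def approx_quantiles_offset(p, num_quantiles):
    """Zero-based OFFSET into APPROX_QUANTILES(x, num_quantiles) for percentile p.

    The array has num_quantiles + 1 elements, so element i sits at percentile
    i / num_quantiles.
    """
    return round(p * num_quantiles)


def legacy_nth_position(p, buckets):
    """One-based NTH position into legacy QUANTILES(x, buckets) for percentile p.

    The buckets values span minimum..maximum, so position k + 1 sits at
    percentile k / (buckets - 1).
    """
    return round(p * (buckets - 1)) + 1


print(approx_quantiles_offset(0.5, 1000))  # 500
print(legacy_nth_position(0.5, 101))       # 51
```

The two conventions differ only by the starting index: NTH(k + 1, QUANTILES(x, n + 1)) and APPROX_QUANTILES(x, n)[OFFSET(k)] both target percentile k / n.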