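We are trying to implement a peak-detection algorithm in BigQuery. Basically, we want to track whether a threshold is exceeded and then output only the start and end times.
For example, take the GSOD weather data and pick an arbitrary temperature threshold. For a given GSOD station, find the day the temperature rises above 70 degrees, and treat the following days as part of that same event (not as new events) until the temperature falls back below 70.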
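To make the desired output concrete, here is a minimal Python sketch of the "runs above a threshold" idea, outside BigQuery. It assumes the readings for one station are already sorted by day as `(day, temp)` pairs and that "above 70" means strictly greater than 70. The function name `threshold_events` and the sample values are hypothetical, made up for illustration. Note that it groups consecutive rows, not consecutive calendar days, so missing days in the data are not treated as breaks.

```python
from typing import Iterable, List, Tuple


def threshold_events(
    readings: Iterable[Tuple[int, float]], threshold: float = 70.0
) -> List[Tuple[int, int]]:
    """Return (start_day, end_day) for each run of consecutive rows above threshold.

    Assumes readings are sorted by day. A single row above the threshold
    produces an event whose start and end are the same day.
    """
    events = []
    start = None  # day the current event began, or None if no event is open
    end = None    # most recent day still above the threshold
    for day, temp in readings:
        if temp > threshold:
            if start is None:
                start = day  # crossed upward: a new event begins
            end = day
        elif start is not None:
            events.append((start, end))  # fell back to or below the threshold
            start = None
    if start is not None:
        events.append((start, end))  # event still open when the data ends
    return events


# Hypothetical sample readings (not real GSOD data).
sample = [(1, 65.0), (2, 72.0), (3, 74.5), (4, 68.0), (5, 71.0), (6, 69.0)]
print(threshold_events(sample))  # [(2, 3), (5, 5)]
```

Scanning once and keeping only the open event's start day means each event produces exactly one output pair, which is the "start and end only" output the question asks for.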
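Answer
LAG() and LEAD() let us look at the temperature of the previous and the next day. An event starts on a day above 70 when the previous day was below 70 and the next day is still above 70. The same check, reversed, detects the end: a day above 70 whose previous day was also above 70 and whose next day drops below 70.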
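The LAG()/LEAD() reasoning can be mimicked row by row in plain Python, which makes it easy to check which rows the query's WHERE clause keeps. This is a sketch, not BigQuery itself: `classify_boundaries` is a hypothetical name, the input is assumed to be sorted by day, and `None` stands in for the SQL NULL that LAG/LEAD return at the first and last rows (any comparison with NULL is not true, so those rows are never kept).

```python
from typing import List, Optional, Tuple


def classify_boundaries(
    readings: List[Tuple[int, float]], threshold: float = 70.0
) -> List[Tuple[int, str]]:
    """Apply the query's two WHERE conditions to rows sorted by day."""
    out: List[Tuple[int, str]] = []
    for i, (day, temp) in enumerate(readings):
        # Equivalent of LAG(temp) and LEAD(temp) over rows ordered by day.
        prev: Optional[float] = readings[i - 1][1] if i > 0 else None
        nxt: Optional[float] = readings[i + 1][1] if i + 1 < len(readings) else None
        if prev is None or nxt is None:
            continue  # NULL comparisons are never true, so edge rows drop out
        if temp > threshold and prev < threshold and nxt > threshold:
            out.append((day, "start"))
        elif temp > threshold and prev > threshold and nxt < threshold:
            out.append((day, "end"))
    return out


# Hypothetical sample readings (not real GSOD data).
sample = [(1, 65.0), (2, 72.0), (3, 74.5), (4, 68.0), (5, 71.0), (6, 69.0)]
print(classify_boundaries(sample))  # [(2, 'start'), (3, 'end')]
```

In this sample, day 5 is a one-day excursion above 70: its previous day is not above 70 and its next day is not above 70, so it satisfies neither condition and is not reported. Days exactly at 70 likewise match neither `< 70` nor `> 70`.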
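-- Label each kept row: if the previous day was below 70 it starts an event, otherwise it ends one
SELECT day, prevday, temp, nextday,
IF((temp>prevday and prevday <70), 'start', 'end') period
FROM (
  -- Look one row back (LAG) and one row ahead (LEAD), ordered by date
  SELECT day, temp,
  LEAD(temp) OVER(ORDER BY day) nextday,
  LAG(temp) OVER(ORDER BY day) prevday
  FROM (
    -- Encode the date as a sortable integer YYYYMMDD
    SELECT year*10000+month*100+day day, mean_temp temp
    FROM [bigquery-samples:weather_geo.gsod]
    WHERE station_number = 8404
  )
)
-- start: first day above 70 of a run; end: last day above 70 of a run
WHERE (temp > 70 and prevday < 70 and nextday > 70)
OR (nextday < 70 and temp > 70 and prevday>70)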
Result:
Row  day       prevday  temp  nextday  period
1    20091009  77.0     74.0  68.7     end
2    20091013  69.0     72.0  74.8     start
3    20091016  73.2     70.6  68.9     end
4    20091029  69.2     72.7  75.3     start
5    20091106  73.8     72.7  67.6     end
...
(2.8s elapsed, 4.53 GB processed)
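The same LAG()/LEAD() query can be expressed in standard SQL with window functions. The sketch below runs it against an in-memory SQLite database through Python's `sqlite3` module, which assumes the bundled SQLite is 3.25 or newer (the first version with window functions). The table name `gsod`, its three columns, the station numbers 101 and 202, and all temperatures are hypothetical stand-ins, not the real `bigquery-samples` schema or data. Two choices differ from the original query: the windows use `PARTITION BY station_number`, so one station's first row never looks back at another station's last row when several stations are queried at once, and the `IF(...)` label is written as a `CASE` that only checks `prevday`, which gives the same label on every row the WHERE clause keeps.

```python
import sqlite3

# Hypothetical rows: (station_number, day, mean_temp).
ROWS = [
    (101, 1, 65.0), (101, 2, 72.0), (101, 3, 74.5), (101, 4, 68.0),
    (202, 1, 75.0), (202, 2, 66.0), (202, 3, 71.0), (202, 4, 73.0), (202, 5, 60.0),
]

QUERY = """
SELECT station_number, day, prevday, temp, nextday,
       CASE WHEN prevday < :t THEN 'start' ELSE 'end' END AS period
FROM (
  SELECT station_number, day, mean_temp AS temp,
         LAG(mean_temp)  OVER (PARTITION BY station_number ORDER BY day) AS prevday,
         LEAD(mean_temp) OVER (PARTITION BY station_number ORDER BY day) AS nextday
  FROM gsod
)
WHERE (temp > :t AND prevday < :t AND nextday > :t)
   OR (temp > :t AND prevday > :t AND nextday < :t)
ORDER BY station_number, day
"""


def find_boundaries(rows, threshold=70.0):
    """Load rows into an in-memory table and return the start/end boundary rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE gsod (station_number INTEGER, day INTEGER, mean_temp REAL)"
        )
        conn.executemany("INSERT INTO gsod VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY, {"t": threshold}).fetchall()
    finally:
        conn.close()


for row in find_boundaries(ROWS):
    print(row)
```

Station 202 starts above 70 on its first row, but because LAG() returns NULL there, that row is not reported as a start, mirroring how the original query behaves at the beginning of its data.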