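I have a set of index values that increase and decrease over time. I want to identify the periods where the values rise and fall. The data looks like this:
I tried partitioning the values by range, but I'm fairly sure I'm not doing it right. This is the query I wrote, which at best only gives me the dates in order: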
SELECT
date,
MAX(index) OVER (PARTITION BY MAX(CAST(index AS numeric))
ORDER BY
date)
FROM (
SELECT
(value1 - value2) AS index,
date
FROM
`project.dataset.table` )
GROUP BY
date,
index
ORDER BY
date
My end goal is a query that, when asked for either the minima or the maxima, produces a result like this:
Row | date | minimas
-------------------------------------
1 | 2017-02-04 | 0.3149100257069409
2 | 2017-12-05 | 0.5784622847441183
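Answer 0 (score: 3)
Handling adjacent duplicate values is tricky, and you did not specify what should happen with them. If you only want the first value of each such run, filtering works.
For local minima: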
SELECT Row, date, f0 AS minimal
FROM (SELECT t.*,
LEAD(f0) OVER (ORDER BY DATE) as f0_lead
FROM (SELECT t.*,
LAG(f0) OVER (ORDER BY date) AS f0_lag
FROM `project.dataset.table` t
) t
WHERE f0_lag IS NULL or f0_lag <> f0
) t
WHERE (f0 < f0_lag or f0_lag is null) and
(f0 < f0_lead or f0_lead is null);
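Or, if you prefer, the comparisons can be simplified. Note that LEAD() here is computed only over the rows that pass the inner WHERE, so each kept row is compared with the next kept row rather than with the next row in the table: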
SELECT Row, date, f0 AS minimal
FROM (SELECT t.*,
LEAD(f0) OVER (ORDER BY DATE) as f0_lead
FROM (SELECT t.*,
LAG(f0) OVER (ORDER BY date) AS f0_lag
FROM t
) t
WHERE f0_lag IS NULL or f0 < f0_lag
) t
WHERE f0 < f0_lead or f0_lead is null;
Local maxima follow the same logic, with < changed to >.
Here is a db<>fiddle (it uses Postgres, but that does not matter).
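For anyone who wants to check the first query's logic without a BigQuery project, here is a minimal Python sketch of the same two steps: drop rows that repeat the previous value, then keep rows that are lower than both remaining neighbours. The function name and the `SAMPLE` list are my own illustration (the values are the rounded sample used elsewhere in this thread), not part of the answer.

```python
# Sample rows as (date, f0), already sorted by date. Illustrative data only.
SAMPLE = [
    ("2017-01-19", 0.3904), ("2017-02-04", 0.3149), ("2017-02-05", 0.3149),
    ("2017-03-24", 0.3302), ("2017-04-09", 0.5339), ("2017-05-11", 0.7753),
    ("2017-05-27", 0.8539), ("2017-09-16", 0.8803), ("2017-09-17", 0.8803),
    ("2017-09-18", 0.8803), ("2017-10-02", 0.8570), ("2017-11-03", 0.7744),
    ("2017-11-19", 0.6092), ("2017-12-05", 0.5785),
]


def local_minima_first_of_run(rows):
    """Return local minima, keeping only the first row of each run of equal values."""
    # Step 1: mirror "WHERE f0_lag IS NULL OR f0_lag <> f0".
    deduped = [r for i, r in enumerate(rows) if i == 0 or rows[i - 1][1] != r[1]]

    minima = []
    for i, (day, value) in enumerate(deduped):
        # None plays the role of the NULL that LAG/LEAD return at the edges.
        lag = deduped[i - 1][1] if i > 0 else None
        lead = deduped[i + 1][1] if i + 1 < len(deduped) else None
        # Step 2: mirror the outer WHERE with its two comparisons.
        if (lag is None or value < lag) and (lead is None or value < lead):
            minima.append((day, value))
    return minima


if __name__ == "__main__":
    for row in local_minima_first_of_run(SAMPLE):
        print(row)
```

On this sample it reports 2017-02-04 and 2017-12-05, the two dates the question asks for. Flipping both `<` comparisons to `>` gives the maxima version.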
EDIT:
Returning every row of a run of equal minima/maxima is more challenging. The following works in BigQuery:
WITH t AS (
SELECT 1 AS Row, '2017-01-19' AS date, 0.3904 AS f0 UNION ALL
SELECT 2, '2017-02-04', 0.3149 UNION ALL
SELECT 2.5, '2017-02-05', 0.3149 UNION ALL
SELECT 3, '2017-03-24', 0.3302 UNION ALL
SELECT 4, '2017-04-09', 0.5339 UNION ALL
SELECT 5, '2017-05-11', 0.7753 UNION ALL
SELECT 6, '2017-05-27', 0.8539 UNION ALL
SELECT 7, '2017-09-16', 0.8803 UNION ALL
SELECT 7.5, '2017-09-17', 0.8803 UNION ALL
SELECT 7.7, '2017-09-18', 0.8803 UNION ALL
SELECT 8, '2017-10-02', 0.8570 UNION ALL
SELECT 9, '2017-11-03', 0.7744 UNION ALL
SELECT 10, '2017-11-19', 0.6092 UNION ALL
SELECT 11, '2017-12-05', 0.5785
)
SELECT t.*
FROM (SELECT t.*,
MAX(f0_lag) OVER (PARTITION BY grp) as grp_f0_lag,
MAX(f0_lead) OVER (PARTITION BY grp) as grp_f0_lead
FROM (SELECT t.*,
COUNTIF(f0_lag <> f0) OVER (ORDER BY DATE) as grp,
LEAD(f0) OVER (ORDER BY DATE) as f0_lead
FROM (SELECT t.*,
LAG(f0) OVER (ORDER BY date) AS f0_lag
FROM t
) t
) t
) t
WHERE (f0 < grp_f0_lag or grp_f0_lag is null) and
(f0 < grp_f0_lead or grp_f0_lead is null) ;
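Basically, this identifies groups of adjacent equal values. It then spreads the maximum lag() and lead() values across the whole group (for maxima, you would spread the minimum values instead).
The whole group is then treated as a single unit and placed in the result set.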
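The grouping idea can be sketched in Python as well, in case it helps to see it outside of window functions. This is only an illustration under my own naming (`local_minima_with_plateaus`, `SAMPLE`): `itertools.groupby` stands in for the `grp` column, and each group is compared with the value just before and just after it.

```python
from itertools import groupby

# Same illustrative sample as the WITH t block: (date, f0), sorted by date.
SAMPLE = [
    ("2017-01-19", 0.3904), ("2017-02-04", 0.3149), ("2017-02-05", 0.3149),
    ("2017-03-24", 0.3302), ("2017-04-09", 0.5339), ("2017-05-11", 0.7753),
    ("2017-05-27", 0.8539), ("2017-09-16", 0.8803), ("2017-09-17", 0.8803),
    ("2017-09-18", 0.8803), ("2017-10-02", 0.8570), ("2017-11-03", 0.7744),
    ("2017-11-19", 0.6092), ("2017-12-05", 0.5785),
]


def local_minima_with_plateaus(rows):
    """Return every row of each run of equal values that forms a local minimum."""
    # Adjacent rows with the same value form one group (the SQL's grp).
    groups = [list(g) for _, g in groupby(rows, key=lambda r: r[1])]

    result = []
    for i, group in enumerate(groups):
        value = group[0][1]
        # Value of the neighbouring groups; None at either end of the series.
        before = groups[i - 1][0][1] if i > 0 else None
        after = groups[i + 1][0][1] if i + 1 < len(groups) else None
        if (before is None or value < before) and (after is None or value < after):
            result.extend(group)  # keep the whole group as one unit
    return result


if __name__ == "__main__":
    for row in local_minima_with_plateaus(SAMPLE):
        print(row)
```

On the sample this returns both 0.3149 rows (2017-02-04 and 2017-02-05) plus the final 2017-12-05 row.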
Answer 1 (score: 1)
Below is for BigQuery Standard SQL:
#standardSQL
SELECT * EXCEPT(prev, next),
CASE
WHEN prev < next THEN 'min'
WHEN prev > next THEN 'max'
WHEN prev IS NULL THEN 'start'
WHEN next IS NULL THEN 'finish'
END extremum
FROM (
SELECT *,
SIGN(index - LAG(index) OVER(ORDER BY DAY)) prev,
SIGN(LEAD(index) OVER(ORDER BY DAY) - index) next
FROM `project.dataset.table`
)
WHERE IFNULL(prev != next, TRUE)
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2017-01-19' day, 0.39 index UNION ALL
SELECT '2017-02-04', 0.31 UNION ALL
SELECT '2017-03-24', 0.33 UNION ALL
SELECT '2017-04-09', 0.53 UNION ALL
SELECT '2017-05-11', 0.77 UNION ALL
SELECT '2017-05-27', 0.85 UNION ALL
SELECT '2017-09-16', 0.88 UNION ALL
SELECT '2017-10-02', 0.85 UNION ALL
SELECT '2017-11-03', 0.77 UNION ALL
SELECT '2017-11-19', 0.61 UNION ALL
SELECT '2017-12-05', 0.57
)
SELECT * EXCEPT(prev, next),
CASE
WHEN prev < next THEN 'min'
WHEN prev > next THEN 'max'
WHEN prev IS NULL THEN 'start'
WHEN next IS NULL THEN 'finish'
END extremum
FROM (
SELECT *,
SIGN(index - LAG(index) OVER(ORDER BY DAY)) prev,
SIGN(LEAD(index) OVER(ORDER BY DAY) - index) next
FROM `project.dataset.table`
)
WHERE IFNULL(prev != next, TRUE)
-- ORDER BY day
with result
Row day index extremum
1 2017-01-19 0.39 start
2 2017-02-04 0.31 min
3 2017-09-16 0.88 max
4 2017-12-05 0.57 finish
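To make the SIGN comparison easier to follow, here is a rough Python equivalent. The helper names (`sign`, `classify_extrema`) and the `SAMPLE` list are mine, and `None` stands in for the NULL that LAG/LEAD produce on the first and last rows.

```python
# (day, index) rows from the sample above, sorted by day.
SAMPLE = [
    ("2017-01-19", 0.39), ("2017-02-04", 0.31), ("2017-03-24", 0.33),
    ("2017-04-09", 0.53), ("2017-05-11", 0.77), ("2017-05-27", 0.85),
    ("2017-09-16", 0.88), ("2017-10-02", 0.85), ("2017-11-03", 0.77),
    ("2017-11-19", 0.61), ("2017-12-05", 0.57),
]


def sign(x):
    """Return -1, 0 or 1, like SQL SIGN()."""
    return (x > 0) - (x < 0)


def classify_extrema(rows):
    """Label the rows where the direction of change differs before and after."""
    out = []
    for i, (day, value) in enumerate(rows):
        prev = sign(value - rows[i - 1][1]) if i > 0 else None
        nxt = sign(rows[i + 1][1] - value) if i + 1 < len(rows) else None
        if prev is not None and nxt is not None:
            if prev == nxt:
                continue  # same direction on both sides: filtered out by the WHERE
            label = "min" if prev < nxt else "max"
        elif prev is None:
            label = "start"
        else:
            label = "finish"
        out.append((day, value, label))
    return out


if __name__ == "__main__":
    for row in classify_extrema(SAMPLE):
        print(row)
```

Run on the sample, it yields the same four rows as the result table: start, min, max, finish.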
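Answer 2 (score: 0)
We can define a local minimum as a point on the x (time) axis where the response values immediately before and after it are both greater than the value at that point. If the point sits at either end of the series, only one greater neighbour is needed. We can try using the LEAD and LAG functions here: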
SELECT Row, date, f0 AS minimal
FROM
(
SELECT Row, date, f0,
LAG(f0, 1, f0 + 0.1) OVER (ORDER BY date) AS f0_lag,
LEAD(f0, 1, f0 + 0.1) OVER (ORDER BY date) AS f0_lead
FROM project.dataset.table
) t
WHERE f0 < f0_lag AND f0 < f0_lead;
Here is a demo in SQL Server using your sample data. My answer is based on SQL Server because I don't have access to BigQuery, so you may have to adjust the syntax slightly.
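The default-value trick in LAG(f0, 1, f0 + 0.1) can be mimicked in Python to see how it behaves at the ends and on ties. This is a sketch with hypothetical names (`local_minima_strict`, `pad`), not a port of the demo: a missing neighbour is replaced by a value slightly above the current one, so the strict comparison always passes on that side.

```python
# (date, f0) rows sorted by date; rounded sample values for illustration.
SAMPLE = [
    ("2017-01-19", 0.39), ("2017-02-04", 0.31), ("2017-03-24", 0.33),
    ("2017-04-09", 0.53), ("2017-05-11", 0.77), ("2017-05-27", 0.85),
    ("2017-09-16", 0.88), ("2017-10-02", 0.85), ("2017-11-03", 0.77),
    ("2017-11-19", 0.61), ("2017-12-05", 0.57),
]


def local_minima_strict(rows, pad=0.1):
    """Return rows strictly lower than both neighbours, padding the two ends."""
    minima = []
    for i, (day, value) in enumerate(rows):
        # A missing neighbour defaults to value + pad, like the third LAG/LEAD argument.
        lag = rows[i - 1][1] if i > 0 else value + pad
        lead = rows[i + 1][1] if i + 1 < len(rows) else value + pad
        if value < lag and value < lead:
            minima.append((day, value))
    return minima


if __name__ == "__main__":
    print(local_minima_strict(SAMPLE))
```

Because both comparisons are strict, a flat bottom such as 1.0, 0.5, 0.5, 1.0 produces no rows at all, so this variant suits data without adjacent duplicates.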