Finding the maxima and minima of time-series values with SQL

Time: 2019-04-13 14:45:30

Tags: sql google-bigquery

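I have a set of index values that rise and fall over time. I want to identify the periods during which the value is rising and the periods during which it is falling. The data looks like this: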

[image: sample data]

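I tried partitioning the values by range, but I am fairly sure I did not do it right. This is the query I wrote; at best it only gives me back the predetermined dates: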

SELECT
  date,
  MAX(index) OVER (PARTITION BY MAX(CAST(index AS numeric))
  ORDER BY
    date)
FROM (
  SELECT
    (value1 - value2) AS index,
    date
  FROM
    `project.dataset.table` )
GROUP BY
  date,
  index
ORDER BY
  date

Ultimately, I would like a query that, when asked for either the minima or the maxima, produces a result like this:

Row |    date    |       minimas
-------------------------------------
1   | 2017-02-04 | 0.3149100257069409
2   | 2017-12-05 | 0.5784622847441183

3 answers:

Answer 0: (score: 3)

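Handling adjacent duplicate values is tricky, and you did not specify how they should be treated. If you only want the first value of each such run, filtering works.

For local minima: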

SELECT Row, date, f0 AS minimal
FROM (SELECT t.*,
             LEAD(f0) OVER (ORDER BY DATE) as f0_lead
      FROM (SELECT t.*,
                   LAG(f0) OVER (ORDER BY date) AS f0_lag
            FROM `project.dataset.table` t
           ) t
      WHERE f0_lag IS NULL or f0_lag <> f0
     ) t
WHERE (f0 < f0_lag or f0_lag is null) and
      (f0 < f0_lead or f0_lead is null);

Or, if you prefer, you can simplify the comparisons:

SELECT Row, date, f0 AS minimal
FROM (SELECT t.*,
             LEAD(f0) OVER (ORDER BY DATE) as f0_lead
      FROM (SELECT t.*,
                   LAG(f0) OVER (ORDER BY date) AS f0_lag
            FROM t
           ) t
      WHERE f0_lag IS NULL or f0 < f0_lag
     ) t
WHERE f0 < f0_lead or f0_lead is null;

Local maxima follow the same logic, with < changed to >.
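As a sketch of that swap, the snippet below runs the maxima version of the first query against the sample rows that appear in the edit to this answer. It uses Python's built-in sqlite3 module only as a local stand-in for BigQuery (window functions need SQLite 3.25 or later). The table name t, the subquery aliases, and the output column maximum are assumptions made for illustration; only the SQL string is meant to carry over.

```python
import sqlite3

# Sample rows (date, f0) copied from the test data in this answer's edit.
rows = [
    ("2017-01-19", 0.3904), ("2017-02-04", 0.3149), ("2017-02-05", 0.3149),
    ("2017-03-24", 0.3302), ("2017-04-09", 0.5339), ("2017-05-11", 0.7753),
    ("2017-05-27", 0.8539), ("2017-09-16", 0.8803), ("2017-09-17", 0.8803),
    ("2017-09-18", 0.8803), ("2017-10-02", 0.8570), ("2017-11-03", 0.7744),
    ("2017-11-19", 0.6092), ("2017-12-05", 0.5785),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, f0 REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# Same shape as the local-minima query, with every "<" flipped to ">".
LOCAL_MAXIMA_SQL = """
SELECT date, f0 AS maximum
FROM (SELECT date, f0, f0_lag,
             LEAD(f0) OVER (ORDER BY date) AS f0_lead
      FROM (SELECT date, f0,
                   LAG(f0) OVER (ORDER BY date) AS f0_lag
            FROM t
           ) lagged
      -- keep only the first row of each run of equal values
      WHERE f0_lag IS NULL OR f0_lag <> f0
     ) deduped
WHERE (f0 > f0_lag  OR f0_lag  IS NULL)
  AND (f0 > f0_lead OR f0_lead IS NULL)
ORDER BY date
"""

local_maxima = conn.execute(LOCAL_MAXIMA_SQL).fetchall()
print(local_maxima)
```

Because a missing neighbour counts as passing the comparison, the first row of the series is reported as a maximum here alongside the first row of the 0.8803 plateau.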

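Here is a db<>fiddle (it uses Postgres, but that does not matter).

Edit:

Returning every row of a run of consecutive minima/maxima is more challenging. In BigQuery, the following works: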

WITH t AS (
    SELECT 1 AS Row, '2017-01-19' AS date, 0.3904 AS f0 UNION ALL
    SELECT 2,  '2017-02-04', 0.3149 UNION ALL
    SELECT 2.5,  '2017-02-05', 0.3149 UNION ALL
    SELECT 3,  '2017-03-24', 0.3302 UNION ALL
    SELECT 4,  '2017-04-09', 0.5339 UNION ALL
    SELECT 5,  '2017-05-11', 0.7753 UNION ALL
    SELECT 6,  '2017-05-27', 0.8539 UNION ALL
    SELECT 7,  '2017-09-16', 0.8803 UNION ALL
    SELECT 7.5,  '2017-09-17', 0.8803 UNION ALL
    SELECT 7.7,  '2017-09-18', 0.8803 UNION ALL
    SELECT 8,  '2017-10-02', 0.8570 UNION ALL
    SELECT 9,  '2017-11-03', 0.7744 UNION ALL
    SELECT 10, '2017-11-19', 0.6092 UNION ALL
    SELECT 11, '2017-12-05', 0.5785
)
SELECT t.*
FROM (SELECT t.*,
             MAX(f0_lag) OVER (PARTITION BY grp) as grp_f0_lag,
             MAX(f0_lead) OVER (PARTITION BY grp) as grp_f0_lead
      FROM (SELECT t.*,
                   COUNTIF(f0_lag <> f0) OVER (ORDER BY DATE) as grp,
                   LEAD(f0) OVER (ORDER BY DATE) as f0_lead
            FROM (SELECT t.*,
                         LAG(f0) OVER (ORDER BY date) AS f0_lag
                  FROM t
                 ) t
           ) t
     ) t
WHERE (f0 < grp_f0_lag or grp_f0_lag is null) and
      (f0 < grp_f0_lead or grp_f0_lead is null) ;

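Basically, this identifies groups of adjacent equal values. It then spreads the largest lag() and lead() values across the whole group (for maxima, you would spread the smallest values instead).

The whole group is then treated as a single unit and included in the result set.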
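For the maxima counterpart mentioned in parentheses, a minimal sketch follows: MIN replaces MAX when spreading the lag and lead values over each group, and the comparisons flip. It runs on SQLite through Python's sqlite3 module as a local stand-in for BigQuery, so the BigQuery-specific COUNTIF is written as SUM(CASE ...) here. The table name t and the subquery aliases are assumptions for illustration.

```python
import sqlite3

# Sample rows (date, f0) copied from the WITH clause in the query above.
rows = [
    ("2017-01-19", 0.3904), ("2017-02-04", 0.3149), ("2017-02-05", 0.3149),
    ("2017-03-24", 0.3302), ("2017-04-09", 0.5339), ("2017-05-11", 0.7753),
    ("2017-05-27", 0.8539), ("2017-09-16", 0.8803), ("2017-09-17", 0.8803),
    ("2017-09-18", 0.8803), ("2017-10-02", 0.8570), ("2017-11-03", 0.7744),
    ("2017-11-19", 0.6092), ("2017-12-05", 0.5785),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, f0 REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

ALL_MAXIMA_SQL = """
SELECT date, f0
FROM (SELECT date, f0,
             -- smallest neighbour on each side, shared by the whole group
             MIN(f0_lag)  OVER (PARTITION BY grp) AS grp_f0_lag,
             MIN(f0_lead) OVER (PARTITION BY grp) AS grp_f0_lead
      FROM (SELECT date, f0, f0_lag,
                   -- running count of value changes = group id of each run
                   SUM(CASE WHEN f0_lag <> f0 THEN 1 ELSE 0 END)
                       OVER (ORDER BY date) AS grp,
                   LEAD(f0) OVER (ORDER BY date) AS f0_lead
            FROM (SELECT date, f0,
                         LAG(f0) OVER (ORDER BY date) AS f0_lag
                  FROM t
                 ) lagged
           ) grouped
     ) spread
WHERE (f0 > grp_f0_lag  OR grp_f0_lag  IS NULL)
  AND (f0 > grp_f0_lead OR grp_f0_lead IS NULL)
ORDER BY date
"""

all_maxima = conn.execute(ALL_MAXIMA_SQL).fetchall()
print(all_maxima)
```

Within a plateau the lag values are the preceding distinct value plus copies of the plateau value itself, so taking MIN leaves the preceding value only when it is lower than the plateau; otherwise the strict comparison fails and the whole group is dropped.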

Answer 1: (score: 1)

The following is for BigQuery Standard SQL:

#standardSQL
SELECT * EXCEPT(prev, next), 
  CASE 
    WHEN prev < next THEN 'min'
    WHEN prev > next THEN 'max'
    WHEN prev IS NULL THEN 'start'
    WHEN next IS NULL THEN 'finish'
  END extremum
FROM (
  SELECT *, 
    SIGN(index - LAG(index) OVER(ORDER BY DAY)) prev, 
    SIGN(LEAD(index) OVER(ORDER BY DAY) - index) next
  FROM `project.dataset.table`
)
WHERE IFNULL(prev != next, TRUE)

You can test this with the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT DATE '2017-01-19' day, 0.39 index UNION ALL
  SELECT '2017-02-04', 0.31 UNION ALL
  SELECT '2017-03-24', 0.33 UNION ALL
  SELECT '2017-04-09', 0.53 UNION ALL
  SELECT '2017-05-11', 0.77 UNION ALL
  SELECT '2017-05-27', 0.85 UNION ALL
  SELECT '2017-09-16', 0.88 UNION ALL
  SELECT '2017-10-02', 0.85 UNION ALL
  SELECT '2017-11-03', 0.77 UNION ALL
  SELECT '2017-11-19', 0.61 UNION ALL
  SELECT '2017-12-05', 0.57 
)
SELECT * EXCEPT(prev, next), 
  CASE 
    WHEN prev < next THEN 'min'
    WHEN prev > next THEN 'max'
    WHEN prev IS NULL THEN 'start'
    WHEN next IS NULL THEN 'finish'
  END extremum
FROM (
  SELECT *, 
    SIGN(index - LAG(index) OVER(ORDER BY DAY)) prev, 
    SIGN(LEAD(index) OVER(ORDER BY DAY) - index) next
  FROM `project.dataset.table`
)
WHERE IFNULL(prev != next, TRUE)
-- ORDER BY day

with the result:

Row day         index   extremum     
1   2017-01-19  0.39    start    
2   2017-02-04  0.31    min  
3   2017-09-16  0.88    max  
4   2017-12-05  0.57    finish    

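Answer 2: (score: 0)

We can define a local minimum as a point on the time axis where the values immediately before and after it are both greater than the value at that point. For an endpoint at either end of the series, only the one neighbouring value needs to be greater. We can try using the LEAD and LAG functions here: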

SELECT Row, date, f0 AS minimal
FROM
(
    SELECT Row, date, f0,
        LAG(f0, 1, f0 + 0.1) OVER (ORDER BY date) AS f0_lag,
        LEAD(f0, 1, f0 + 0.1) OVER (ORDER BY date) AS f0_lead
    FROM project.dataset.table
) t
WHERE f0 < f0_lag AND f0 < f0_lead;

Here is a demo in SQL Server using your sample data. My answer is based on SQL Server because I do not have access to BigQuery, so you may have to adjust the syntax slightly.
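One adjustment that may be needed: some engines restrict the default argument of LAG/LEAD to a constant, and whether BigQuery accepts f0 + 0.1 there is something to check rather than something this sketch establishes. The same endpoint rule can be written with explicit NULL checks instead, which is sketched below on SQLite through Python's sqlite3 module as a local stand-in. The table name t, the alias s, and the output column minimum are assumptions for illustration; the rows are the sample values from this thread without the added duplicates.

```python
import sqlite3

# Sample rows (date, f0) from the thread, without adjacent duplicates.
rows = [
    ("2017-01-19", 0.3904), ("2017-02-04", 0.3149), ("2017-03-24", 0.3302),
    ("2017-04-09", 0.5339), ("2017-05-11", 0.7753), ("2017-05-27", 0.8539),
    ("2017-09-16", 0.8803), ("2017-10-02", 0.8570), ("2017-11-03", 0.7744),
    ("2017-11-19", 0.6092), ("2017-12-05", 0.5785),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, f0 REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# A missing neighbour (NULL lag or lead) is treated as "greater",
# which is what the f0 + 0.1 default achieves in the query above.
LOCAL_MINIMA_SQL = """
SELECT date, f0 AS minimum
FROM (SELECT date, f0,
             LAG(f0)  OVER (ORDER BY date) AS f0_lag,
             LEAD(f0) OVER (ORDER BY date) AS f0_lead
      FROM t
     ) s
WHERE (f0_lag  IS NULL OR f0 < f0_lag)
  AND (f0_lead IS NULL OR f0 < f0_lead)
ORDER BY date
"""

local_minima = conn.execute(LOCAL_MINIMA_SQL).fetchall()
print(local_minima)
```

On these rows the query returns the 2017-02-04 and 2017-12-05 rows, the same dates as the output requested in the question. Like the query it adapts, it uses strict comparisons, so a run of equal adjacent values is not reported as a minimum.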