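Finding a range of values (5 points) across a range of time (1 hour)

Time: 2016-11-22 18:51:40

Tags: sql google-bigquery

I am monitoring the effectiveness of temperature controllers in an IoT application. I am trying to find "interesting points" in the time series. These are similar to local minima or maxima, but they also include the points on the curve where a trend begins. It is not just the minimum and the maximum: an interesting point is one where a range of 5 points in temperature is covered within one hour.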

Source

| inMins | unixTime         | temp |  
|--------|------------------|------|   
| 0      | 1479042000000000 | 10.0 |  
| 5      | 1479042300000000 | 11.0 |  
| 10     | 1479042600000000 | 12.0 |  
| 15     | 1479042900000000 | 13.0 |  
| 20     | 1479043200000000 | 14.0 |  
| 25     | 1479043500000000 | 15.0 |  
| 30     | 1479043800000000 | 14.0 |  
| 35     | 1479044100000000 | 13.0 |  
| 40     | 1479044400000000 | 12.0 |  
| 45     | 1479044700000000 | 11.0 |  
| 50     | 1479045000000000 | 10.0 |  
| 55     | 1479045300000000 | 9.0  |  
| 60     | 1479045600000000 | 8.0  |  
| 65     | 1479045900000000 | 9.0  |  
| 70     | 1479046200000000 | 10.0 |  
| 75     | 1479046500000000 | 11.0 |  
| 80     | 1479046800000000 | 12.0 |  
| 85     | 1479047100000000 | 13.0 |  
| 90     | 1479047400000000 | 14.0 |  

Desired shape

| inMins | unixTime         | temp | coldOrHot |  
|--------|------------------|------|-----------|  
| 0      | 1479042000000000 | 10.0 | 1         |  
| 25     | 1479043500000000 | 15.0 | 2         |  
| 30     | 1479043800000000 | 14.0 | 2         |
| 35     | 1479044100000000 | 13.0 | 2         |  
| 60     | 1479045600000000 | 8.0  | 1         |  
| 65     | 1479045900000000 | 9.0  | 1         |  
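
One reading of the criterion that reproduces the desired shape: a row is interesting if, starting from it, the temperature keeps moving in the same direction and covers at least 5 points within the next hour. The Python sketch below is only a way to check that reading against the sample rows outside BigQuery. The function name and constants are hypothetical, and flat steps are treated as the end of a trend, which is an assumption the question does not settle.

```python
# Sample rows from the source table: (inMins, temp).
ROWS = [(0, 10.0), (5, 11.0), (10, 12.0), (15, 13.0), (20, 14.0),
        (25, 15.0), (30, 14.0), (35, 13.0), (40, 12.0), (45, 11.0),
        (50, 10.0), (55, 9.0), (60, 8.0), (65, 9.0), (70, 10.0),
        (75, 11.0), (80, 12.0), (85, 13.0), (90, 14.0)]

WINDOW_MINS = 60   # look-ahead window (1 hour)
MIN_RANGE = 5.0    # required change in temp


def interesting_points(rows, window=WINDOW_MINS, min_range=MIN_RANGE):
    """Return (inMins, temp, coldOrHot) for rows that start a move of at
    least min_range within window minutes without the trend reversing."""
    out = []
    for i in range(len(rows) - 1):
        t0, v0 = rows[i]
        direction = rows[i + 1][1] - v0  # > 0 rising, < 0 falling
        if direction == 0:
            continue  # assumption: a flat step starts no trend
        prev = v0
        for t, v in rows[i + 1:]:
            if t - t0 > window or (v - prev) * direction <= 0:
                break  # left the window, or the trend reversed or flattened
            prev = v
        if abs(prev - v0) >= min_range:
            # 1 = cold (temperature rises from here), 2 = hot (it falls)
            out.append((t0, v0, 1 if direction > 0 else 2))
    return out
```

On the sample rows, `interesting_points(ROWS)` returns the rows at minutes 0, 25, 30, 35, 60 and 65 with the same `coldOrHot` values as the table above.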

My current results, which have some problems

| inMins | unixTime         | temp | coldOrHot |  
|--------|------------------|------|-----------|  
| 25     | 1479043500000000 | 15.0 | 2         |  
| 30     | 1479043800000000 | 14.0 | 2         |  
| 60     | 1479045600000000 | 8.0  | 1         |  
| 65     | 1479045900000000 | 9.0  | 1         |  
| 70     | 1479046200000000 | 10.0 | 1         |  
| 75     | 1479046500000000 | 11.0 | 1         |  
| 80     | 1479046800000000 | 12.0 | 1         |  
| 85     | 1479047100000000 | 13.0 | 1         |  
| 90     | 1479047400000000 | 14.0 | 1         |  

SQL

Select 
  inMins,
  unixTime,
  temp,
  coldOrHot
from 
(Select
  inMins,
  unixTime,
  temp,
  -- 1 means Cold, 2 means Hot, 0 is noise
  if(temp=theLowInWindowDesc,1,
  if(temp=theHighInWindowDesc,2,0)) as coldOrHot,
  theHighInWindowDesc,
  theLowInWindowDesc
FROM
  (SELECT
  inMins,
  unixTime,
  temp,
  theHighInWindowDesc,
  theLowInWindowDesc
  FROM
    (Select
        inMins,
        unixTime,
        temp,
        MAX(temp) OVER(ORDER BY
          unixTime desc RANGE BETWEEN 60 * 60 * 1000000 PRECEDING
          AND CURRENT ROW) AS theHighInWindowDesc,
        MIN(temp) OVER(ORDER BY
         unixTime desc RANGE BETWEEN 60 * 60 * 1000000 PRECEDING
         AND CURRENT ROW) AS theLowInWindowDesc
        FROM
        [esheetzbq:findingLocalExtrema.timeSeriesForKevin]
        ORDER BY
        inMins asc
     )
  )
)
where coldOrHot=1 or coldOrHot=2 

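Problems

  1. I do not get a "cold" at minute 0, where the temperature is 10 and then rises by 5 points within 25 minutes.
  2. I do not get a "hot" value at minute 35.
  3. The results from minute 70 through 90 do not respect my 5-point range criterion. They appear because my current logic is based on extrema rather than on a range: in the last hour of the data set, the SQL window function (`OVER`) sees less than an hour's worth of rows. That is expected behavior, but I am not sure what logic is best for excluding records that raise an alert without a 5-point range having been seen.
  4. Will this scale? I will be running this logic on a record set of roughly 34M rows.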
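
To make problems 1 to 3 concrete, here is a small Python sketch of what the query above appears to compute. It assumes that ordering by `unixTime desc` with `PRECEDING` makes each window cover the row itself plus the following hour, which is the reading that reproduces the posted results; the function name is hypothetical.

```python
# Sample rows from the source table: (inMins, temp).
ROWS = [(0, 10.0), (5, 11.0), (10, 12.0), (15, 13.0), (20, 14.0),
        (25, 15.0), (30, 14.0), (35, 13.0), (40, 12.0), (45, 11.0),
        (50, 10.0), (55, 9.0), (60, 8.0), (65, 9.0), (70, 10.0),
        (75, 11.0), (80, 12.0), (85, 13.0), (90, 14.0)]

WINDOW_MINS = 60


def current_logic(rows, window=WINDOW_MINS):
    """Flag a row as cold (1) if it is the minimum of its window, or hot (2)
    if it is the maximum, mirroring the nested IF in the query."""
    out = []
    for t0, v0 in rows:
        # Assumed window: this row and every row up to one hour later.
        in_window = [v for t, v in rows if t0 <= t <= t0 + window]
        if v0 == min(in_window):
            out.append((t0, v0, 1))
        elif v0 == max(in_window):
            out.append((t0, v0, 2))
    return out
```

`current_logic(ROWS)` returns minutes 25 and 30 as hot and minutes 60 through 90 as cold, matching the current results. Minute 0 is missed because 10 is neither the minimum nor the maximum of its hour, minute 35 is missed because the 14 at minute 90 is higher, and near the end of the data every rising row is the minimum of its own shrinking window.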

1 answer:

Answer 0 (score: 2)

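Here we go. The query below is for BigQuery Standard SQL.
I have not made any attempt to improve or optimize the query. Instead, I deliberately kept it broken down into subqueries exactly the way I wrote it, so that the logic is easy to trace and therefore to understand. I have included the data below for ease of testing; if you want to test on real data, just comment out the data section.

Have fun :o)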

#standardSQL
WITH `esheetzbq.findingLocalExtrema.timeSeriesForKevin` AS (
  -- Test data, one row per line; column names come from the first SELECT.
  SELECT 0 AS inMins, 1479042000000000 AS unixTime, 10.0 AS temp UNION ALL
  SELECT 5, 1479042300000000, 11.0 UNION ALL
  SELECT 10, 1479042600000000, 12.0 UNION ALL
  SELECT 15, 1479042900000000, 13.0 UNION ALL
  SELECT 20, 1479043200000000, 14.0 UNION ALL
  SELECT 25, 1479043500000000, 15.0 UNION ALL
  SELECT 30, 1479043800000000, 14.0 UNION ALL
  SELECT 35, 1479044100000000, 13.0 UNION ALL
  SELECT 40, 1479044400000000, 12.0 UNION ALL
  SELECT 45, 1479044700000000, 11.0 UNION ALL
  SELECT 50, 1479045000000000, 10.0 UNION ALL
  SELECT 55, 1479045300000000, 9.0 UNION ALL
  SELECT 60, 1479045600000000, 8.0 UNION ALL
  SELECT 65, 1479045900000000, 9.0 UNION ALL
  SELECT 70, 1479046200000000, 10.0 UNION ALL
  SELECT 75, 1479046500000000, 11.0 UNION ALL
  SELECT 80, 1479046800000000, 12.0 UNION ALL
  SELECT 85, 1479047100000000, 13.0 UNION ALL
  SELECT 90, 1479047400000000, 14.0 UNION ALL
  SELECT 95, 1479047700000000, 15 UNION ALL
  SELECT 100, 1479048000000000, 16 UNION ALL
  SELECT 105, 1479048300000000, 17 UNION ALL
  SELECT 110, 1479048600000000, 18 UNION ALL
  SELECT 115, 1479048900000000, 19 UNION ALL
  SELECT 120, 1479049200000000, 20 UNION ALL
  SELECT 125, 1479049500000000, 21 UNION ALL
  SELECT 130, 1479049800000000, 22 UNION ALL
  SELECT 135, 1479050100000000, 23 UNION ALL
  SELECT 140, 1479050400000000, 24 UNION ALL
  SELECT 145, 1479050700000000, 25 UNION ALL
  SELECT 150, 1479051000000000, 26 UNION ALL
  SELECT 155, 1479051300000000, 27 UNION ALL
  SELECT 160, 1479051600000000, 28 UNION ALL
  SELECT 165, 1479051900000000, 29 UNION ALL
  SELECT 170, 1479052200000000, 30 UNION ALL
  SELECT 175, 1479052500000000, 31 UNION ALL
  SELECT 180, 1479052800000000, 32 UNION ALL
  SELECT 185, 1479053100000000, 33 UNION ALL
  SELECT 190, 1479053400000000, 34 UNION ALL
  SELECT 195, 1479053700000000, 35 UNION ALL
  SELECT 200, 1479054000000000, 36 UNION ALL
  SELECT 205, 1479054300000000, 37 UNION ALL
  SELECT 210, 1479054600000000, 38
), y AS (
  SELECT inMins, unixTime, temp, delta,
    IFNULL(SUM(new_group_flag) OVER(ORDER BY unixTime ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING), 0) AS grp
  FROM (
    SELECT inMins, unixTime, temp,
      LEAD(temp) OVER(ORDER BY unixTime) - temp AS delta,
      CAST(SIGN(LEAD(temp) OVER(ORDER BY unixTime) - temp) != IFNULL(SIGN(temp - LAG(temp) OVER(ORDER BY unixTime)), SIGN(LEAD(temp) OVER(ORDER BY unixTime) - temp)) AS INT64) AS new_group_flag
    FROM `esheetzbq.findingLocalExtrema.timeSeriesForKevin`
  )
), yy AS (
  SELECT inMins, unixTime, temp, delta, grp FROM y UNION ALL
  SELECT inMins, unixTime, temp, delta, grp + 1 AS grp
  FROM (
    SELECT inMins, unixTime, temp, delta, grp, 
      unixTime - MAX(unixTime) OVER(PARTITION BY grp ORDER BY unixTime DESC) AS qq
    FROM y
  ) WHERE qq = 0
), v AS (
  SELECT inMins, unixTime, temp, delta,
     MIN(temp) OVER(PARTITION BY grp ORDER BY unixTime RANGE BETWEEN CURRENT ROW AND 3600000000 FOLLOWING) AS min_temp,
     MAX(temp) OVER(PARTITION BY grp ORDER BY unixTime RANGE BETWEEN CURRENT ROW AND 3600000000 FOLLOWING) AS max_temp
  FROM yy
)
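-- 1 = cold (temp rises on the next row), 2 = hot; delta > 0 also covers steps that are not exactly 1
SELECT inMins, unixTime, temp, IF(delta > 0, 1, 2) AS coldOrHot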
FROM v
WHERE  ABS(max_temp - temp) >= 5 OR ABS(min_temp - temp) >= 5
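
The grouping done by the `y` and `yy` steps is the least obvious part of the query, so here is a rough Python trace of it on the 19 rows from the question. This is a sketch of how I read those two steps, not something BigQuery runs, and the function names are hypothetical: `y` starts a new group after every row where the direction of change flips, and `yy` copies the last row of each group into the next group so that a turning point also opens the trend that follows it.

```python
# The 19 rows from the question: (inMins, temp).
ROWS = [(0, 10.0), (5, 11.0), (10, 12.0), (15, 13.0), (20, 14.0),
        (25, 15.0), (30, 14.0), (35, 13.0), (40, 12.0), (45, 11.0),
        (50, 10.0), (55, 9.0), (60, 8.0), (65, 9.0), (70, 10.0),
        (75, 11.0), (80, 12.0), (85, 13.0), (90, 14.0)]


def sign(x):
    return (x > 0) - (x < 0)


def y_groups(rows):
    """Mirror of y: grp counts the direction changes seen on earlier rows."""
    out, grp = [], 0
    for i, (t, v) in enumerate(rows):
        out.append((t, grp))
        interior = 0 < i < len(rows) - 1  # first and last rows never flag
        if interior and sign(rows[i + 1][1] - v) != sign(v - rows[i - 1][1]):
            grp += 1  # rows after a turning point fall into the next group
    return out


def yy_groups(y):
    """Mirror of yy: the last row of each group is also added to grp + 1."""
    last_in_grp = {}
    for t, grp in y:
        last_in_grp[grp] = max(t, last_in_grp.get(grp, t))
    extra = [(t, grp + 1) for grp, t in last_in_grp.items()]
    return sorted(y + extra, key=lambda r: (r[1], r[0]))


def group_spans(yy):
    """First and last inMins of each group, for a quick look at the result."""
    spans = {}
    for t, grp in yy:
        lo, hi = spans.get(grp, (t, t))
        spans[grp] = (min(lo, t), max(hi, t))
    return spans
```

For the question's rows, `group_spans(yy_groups(y_groups(ROWS)))` gives groups covering minutes 0 to 25, 25 to 60 and 60 to 90, plus a single-row group holding minute 90. The one-hour `MIN`/`MAX` windows in `v` are then evaluated inside each group, which is why minute 15 is not flagged even though the temperature is 5 points lower an hour later.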

If you go in this direction, see Enabling Standard SQL and Migrating from legacy SQL for more details.