Fill null values with the average of neighboring rows, with a restriction on another column

Time: 2019-03-30 18:21:44

Tags: mysql sql partition-by

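I have a table with columns named "id", "time", and "value". When "value" is null, I want to fill it with the average of its nearest neighbors, ordered by the "time" column, within the same id.

My problem is exactly the one described in select nearest neighbours, but the answers there do not explain how to find the nearest neighbors with a restriction on another column (the id should be the same).

Example: "value" is missing in the second row.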

id       | time  | value
-------------------------
11111    | 1     | 5.0
11111    | 10    | 
22222    | 7     | 32.6
33333    | 11    | 15.88
11111    | 15    | 20.0

I want it to be

id       | time  | value
-------------------------
11111    | 1     | 5.0
11111    | 10    | 12.5*
22222    | 7     | 32.6
33333    | 11    | 15.88
11111    | 15    | 20.0

since (20.0 + 5.0) / 2 = 12.5

How can I get this in MySQL?

3 answers:

Answer 0: (score: 0)

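Assuming that time defines the order and is unique within an id (this approach needs a column that defines the order and a way to identify rows uniquely), one way is to use two correlated subqueries: one fetches value from the closest record with a smaller time (ORDER BY time DESC, LIMIT 1), and the other fetches value from the closest record with a greater time (ORDER BY time ASC, LIMIT 1).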

SELECT t1.id,
       t1.time,
       coalesce(t1.value,
                ((SELECT t2.value
                         FROM elbat t2
                         WHERE t2.id = t1.id
                               AND t2.time < t1.time
                         ORDER BY t2.time DESC
                         LIMIT 1)
                 +
                 (SELECT t2.value
                         FROM elbat t2
                         WHERE t2.id = t1.id
                               AND t2.time > t1.time
                         ORDER BY t2.time ASC
                         LIMIT 1)
                )
                /
                2) value
       FROM elbat t1;


But this only fills gaps that are one row wide. If larger gaps are possible, you have to define what the next non-null neighbor of those rows is.
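One possible definition, offered here only as a sketch, is "the closest row of the same id whose value is not null". That amounts to adding `t2.value IS NOT NULL` to both subqueries. The snippet below tries this on an in-memory SQLite database through Python's `sqlite3` module rather than on MySQL, so it assumes both engines treat this query the same way. The extra row at time 12 is hypothetical and was added to create a two-row gap.

```python
import sqlite3

# In-memory copy of the question's table. The row at time 12 is an
# assumption added for illustration: it widens the gap to two rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE elbat (id INTEGER, time INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO elbat VALUES (?, ?, ?)",
    [
        (11111, 1, 5.0),
        (11111, 10, None),
        (11111, 12, None),  # hypothetical extra NULL row
        (22222, 7, 32.6),
        (33333, 11, 15.88),
        (11111, 15, 20.0),
    ],
)

# Same shape as the query above, but each subquery skips NULL neighbors
# so that every row in a gap sees the same two non-null endpoints.
QUERY = """
SELECT t1.id,
       t1.time,
       coalesce(t1.value,
                ((SELECT t2.value
                    FROM elbat t2
                   WHERE t2.id = t1.id
                     AND t2.time < t1.time
                     AND t2.value IS NOT NULL
                   ORDER BY t2.time DESC
                   LIMIT 1)
                 +
                 (SELECT t2.value
                    FROM elbat t2
                   WHERE t2.id = t1.id
                     AND t2.time > t1.time
                     AND t2.value IS NOT NULL
                   ORDER BY t2.time ASC
                   LIMIT 1)
                ) / 2) AS value
  FROM elbat t1
 ORDER BY t1.id, t1.time
"""

filled = conn.execute(QUERY).fetchall()
for row in filled:
    print(row)
```

With this definition every row inside a gap receives the same average of the two endpoints. If a gap touches the first or last row of an id, one subquery returns NULL and the value stays NULL.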

Answer 1: (score: 0)

Just use self-joins, but don't worry about NEXT_VALUE

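-- Requires window functions (MySQL 8.0+). MySQL needs an alias on every
-- derived table, and LAST_VALUE is a reserved word there, so the previous
-- value is exposed as PREV_VALUE instead.
SELECT CUR.ID_,
   CUR.TIME_,
   CASE
      WHEN CUR.VALUE_ IS NULL THEN (PRV.PREV_VALUE + NXT.NEXT_VALUE) / 2
      ELSE CUR.VALUE_
   END AS REAL_VALUE
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY ID_ ORDER BY TIME_ DESC)
              AS NOW_ROW_NUM,
           ID_,
           TIME_,
           VALUE_
      FROM TESTTABLE) CUR
   -- Row numbers run from latest to earliest, so the earlier row is rn + 1.
   LEFT JOIN (SELECT (ROW_NUMBER()
                         OVER (PARTITION BY ID_ ORDER BY TIME_ DESC))
                     - 1
                        AS PREV_ROW_NUM,
                     ID_ AS PREV_ID,
                     VALUE_ AS PREV_VALUE
                FROM TESTTABLE) PRV
      ON CUR.ID_ = PRV.PREV_ID AND CUR.NOW_ROW_NUM = PRV.PREV_ROW_NUM
   LEFT JOIN (SELECT (ROW_NUMBER()
                         OVER (PARTITION BY ID_ ORDER BY TIME_ DESC))
                     + 1
                        AS NEXT_ROW_NUM,
                     ID_ AS NEXT_ID,
                     VALUE_ AS NEXT_VALUE
                FROM TESTTABLE) NXT
      ON CUR.ID_ = NXT.NEXT_ID AND CUR.NOW_ROW_NUM = NXT.NEXT_ROW_NUM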

Answer 2: (score: 0)

Just use lead() and lag(). The simplest answer is:

select t.*,
       (case when value is null
             then ( lag(value) over (partition by id order by time) + lead(value) over (partition by id order by time) ) / 2
             else value
        end) as new_value
from t;

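This does not work when the null is in the first or last row of an id, because lag() or lead() then returns null. You can use this instead: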

select t.*,
       (case when value is null
             then avg(value) over (partition by id order by time rows between 1 preceding and 1 following)
             else value
        end) as new_value
from t;

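This computes the average from whatever data is available in the previous and following rows; avg() ignores nulls, so a single neighbor is enough.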
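To see how the two variants differ at the edges, the following sketch runs both expressions side by side. It uses Python's `sqlite3` module with an in-memory database instead of MySQL, and assumes a SQLite build with window-function support (3.25 or newer). The sample rows are hypothetical and chosen so that one null sits in the first row of its id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, time INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (11111, 1, None),   # hypothetical: NULL with no earlier neighbor
        (11111, 10, 5.0),
        (11111, 15, None),  # NULL with a neighbor on both sides
        (11111, 20, 20.0),
    ],
)

# via_lag_lead: (previous + next) / 2, which is NULL if either is missing.
# via_avg: average over the 3-row frame; avg() skips NULLs.
rows = conn.execute("""
SELECT time,
       CASE WHEN value IS NULL
            THEN (lag(value)  OVER (PARTITION BY id ORDER BY time)
                + lead(value) OVER (PARTITION BY id ORDER BY time)) / 2
            ELSE value
       END AS via_lag_lead,
       CASE WHEN value IS NULL
            THEN avg(value) OVER (PARTITION BY id ORDER BY time
                                  ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
            ELSE value
       END AS via_avg
  FROM t
 ORDER BY time
""").fetchall()

for row in rows:
    print(row)
```

The row at time 15 gets 12.5 from both expressions. The row at time 1 stays null with lag()/lead() but is filled with 5.0 by the avg() version, which may or may not be the behavior you want at the boundaries.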