Removing consecutive duplicate data from a MySQL table

Time: 2013-12-21 19:53:28

Tags: mysql sql

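I have a MySQL table that stores time-series data: essentially temperature and humidity samples taken from multiple sensors at (relatively) regular intervals.

The values for each sensor were stored to the table at regular intervals (together with the sensor's ID), regardless of whether the recorded temperature and humidity had changed. This produced a relatively large table, so I updated the application to store a value for a sensor only when the recorded value changes. Now, when a value changes, it stores two rows, holding the sensor values before and after the change (this allows the charting code to remain unchanged).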

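Now I would like to tidy up the old data by removing any consecutive duplicate readings for a given sensor, keeping only the rows that represent the first time a new sensor value was recorded or the last time a sensor value was recorded. In effect, this just removes redundant data.

I have tried to build a query for this, but because the primary key is not sequential for a given individual sensor, I have not been able to identify the duplicate rows that can be deleted.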

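This is better illustrated with an extract of the data. The rows I want to keep are annotated with a note describing why I want to keep that particular row.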

+-----+----------+---------------------+-------------+----------+
| id  | sensorid | datetime            | temperature | humidity |
+-----+----------+---------------------+-------------+----------+
| 818 | E9       | 2012-10-23 20:59:03 |       20.00 |       72 | First val for E9
| 819 | C3       | 2012-10-23 20:59:19 |       19.50 |       69 | First val for C3
| 820 | E9       | 2012-10-23 20:59:47 |       20.00 |       72 |
| 821 | C3       | 2012-10-23 21:00:00 |       19.50 |       69 |
| 822 | E9       | 2012-10-23 21:00:29 |       20.00 |       72 |
| 823 | C3       | 2012-10-23 21:00:41 |       19.50 |       69 |
| 824 | E9       | 2012-10-23 21:01:12 |       20.00 |       72 |
| 825 | C3       | 2012-10-23 21:01:22 |       19.50 |       69 |
| 826 | E9       | 2012-10-23 21:01:55 |       20.00 |       72 |
| 827 | C3       | 2012-10-23 21:02:03 |       19.50 |       69 |
| 828 | E9       | 2012-10-23 21:02:38 |       20.00 |       72 |
| 829 | C3       | 2012-10-23 21:02:44 |       19.50 |       69 |
| 830 | E9       | 2012-10-23 21:03:21 |       20.00 |       72 |
| 831 | C3       | 2012-10-23 21:03:25 |       19.50 |       69 |
| 832 | E9       | 2012-10-23 21:04:04 |       20.00 |       72 |
| 833 | C3       | 2012-10-23 21:04:06 |       19.50 |       69 |
| 834 | EC       | 2012-10-23 21:04:32 |       13.90 |       91 | First val for EC
| 835 | EC       | 2012-10-23 21:04:32 |       13.90 |       91 |
| 836 | C3       | 2012-10-23 21:04:47 |       19.50 |       69 |
| 837 | E9       | 2012-10-23 21:04:47 |       20.00 |       72 |
| 838 | EC       | 2012-10-23 21:05:11 |       13.90 |       91 |
| 839 | C3       | 2012-10-23 21:05:28 |       19.50 |       69 |
| 840 | E9       | 2012-10-23 21:05:31 |       20.00 |       72 |
| 841 | EC       | 2012-10-23 21:05:50 |       13.90 |       91 |
| 842 | C3       | 2012-10-23 21:06:09 |       19.50 |       69 |
| 843 | E9       | 2012-10-23 21:06:13 |       20.00 |       72 | The last time E9 has a temp of 20
| 844 | EC       | 2012-10-23 21:06:29 |       13.90 |       91 |
| 845 | C3       | 2012-10-23 21:06:50 |       19.50 |       69 |
| 846 | E9       | 2012-10-23 21:06:56 |       19.90 |       72 | The first time E9 has a temp of 19.9
| 847 | EC       | 2012-10-23 21:07:08 |       13.90 |       91 |
| 848 | C3       | 2012-10-23 21:07:31 |       19.50 |       69 |
| 849 | E9       | 2012-10-23 21:07:39 |       19.90 |       72 |
| 850 | EC       | 2012-10-23 21:07:47 |       13.90 |       91 |
| 851 | C3       | 2012-10-23 21:08:12 |       19.50 |       69 |
| 852 | E9       | 2012-10-23 21:08:22 |       19.90 |       72 |
| 853 | EC       | 2012-10-23 21:08:26 |       13.90 |       91 |
| 854 | C3       | 2012-10-23 21:08:53 |       19.50 |       69 |
| 855 | EC       | 2012-10-23 21:09:05 |       13.90 |       91 |
| 856 | E9       | 2012-10-23 21:09:05 |       19.90 |       72 |
| 857 | C3       | 2012-10-23 21:09:34 |       19.50 |       69 | The last time C3 has a temp of 19.5 
| 858 | EC       | 2012-10-23 21:09:44 |       13.90 |       91 |
| 859 | E9       | 2012-10-23 21:09:49 |       19.90 |       72 |
| 860 | C3       | 2012-10-23 21:10:15 |       19.60 |       69 | The first time C3 has a temp of 19.6 
| 861 | EC       | 2012-10-23 21:10:23 |       13.90 |       91 |
| 862 | E9       | 2012-10-23 21:10:32 |       19.90 |       72 |
| 863 | EC       | 2012-10-23 21:11:02 |       13.90 |       91 |
| 864 | C3       | 2012-10-23 21:11:37 |       19.60 |       69 |
| 865 | E9       | 2012-10-23 21:11:58 |       19.90 |       72 | Last val for E9
| 866 | C3       | 2012-10-23 21:12:18 |       19.60 |       69 | Last val for C3
| 867 | EC       | 2012-10-23 21:12:20 |       13.90 |       91 | Last val for EC
+-----+----------+---------------------+-------------+----------+

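2 answers:

Answer 0: (score: 3)

Using user variables to track the "last" values of the sensorid, temperature and humidity columns (as the whole table is ordered by sensorid and datetime), one can identify which "group" each record belongs to and then aggregate on that basis: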

SELECT sensorid, temperature, humidity,
       MIN(datetime) dt_min, MAX(datetime) dt_max
FROM (
  SELECT   datetime,
           @group := @group + IF(
             @last_sensor   <=> sensorid
         AND @last_temp     <=> temperature
         AND @last_humidity <=> humidity
           , 0, 1) gp,
             @last_sensor   :=  sensorid    sensorid,
             @last_temp     :=  temperature temperature,
             @last_humidity :=  humidity    humidity
  FROM     my_table, (SELECT
             @group         :=  0,
             @last_sensor   :=  NULL,
             @last_temp     :=  NULL,
             @last_humidity :=  NULL
           ) init
  ORDER BY sensorid, datetime
) t GROUP BY t.gp

See it on sqlfiddle.

This query can then be used in an anti-join to delete all other records from the original table:
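To make the grouping idea easier to follow, here is a minimal Python sketch of the same logic, not the MySQL implementation itself. It assumes rows are plain tuples of (id, sensorid, datetime, temperature, humidity); the function names `runs` and `ids_to_keep` are hypothetical, and the sample data is a small subset of the rows in the question.

```python
from itertools import groupby

# Subset of the question's rows: (id, sensorid, datetime, temperature, humidity)
rows = [
    (840, "E9", "2012-10-23 21:05:31", 20.00, 72),
    (843, "E9", "2012-10-23 21:06:13", 20.00, 72),
    (846, "E9", "2012-10-23 21:06:56", 19.90, 72),
    (849, "E9", "2012-10-23 21:07:39", 19.90, 72),
    (852, "E9", "2012-10-23 21:08:22", 19.90, 72),
    (857, "C3", "2012-10-23 21:09:34", 19.50, 69),
    (860, "C3", "2012-10-23 21:10:15", 19.60, 69),
]


def runs(rows):
    """Yield (sensorid, temperature, humidity, dt_min, dt_max) for each run."""
    # Equivalent of ORDER BY sensorid, datetime
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    # groupby only merges *adjacent* rows with an equal key, which is exactly
    # what the @group counter does in the SQL version
    for (sensor, temp, hum), grp in groupby(ordered, key=lambda r: (r[1], r[3], r[4])):
        dts = [r[2] for r in grp]
        yield sensor, temp, hum, dts[0], dts[-1]


def ids_to_keep(rows):
    """Return ids of rows that sit at the start or end of a run."""
    boundaries = {(s, t, h, d) for s, t, h, lo, hi in runs(rows) for d in (lo, hi)}
    return sorted(r[0] for r in rows if (r[1], r[3], r[4], r[2]) in boundaries)


print(ids_to_keep(rows))
```

Every row whose id is not returned by `ids_to_keep` lies strictly inside a run of identical readings, which is the set the delete is meant to remove.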

DELETE my_table.*
FROM   my_table LEFT JOIN (
         <above query>
       ) x
   ON  my_table.sensorid    = x.sensorid
   AND my_table.temperature = x.temperature
   AND my_table.humidity    = x.humidity
   AND my_table.datetime IN (x.dt_min, x.dt_max)
WHERE x.sensorid IS NULL

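See it on sqlfiddle.

Note that if two (identical) readings come from the same sensor at the same datetime, it is unclear which record should be kept or deleted (especially since you note that "the primary key is not sequential for a given individual sensor"): the above query will therefore retain the record with id = 835.

Answer 1: (score: 1)

Basically, you need to join each record to its previous and next records for the same sensor in order to compare their temperatures: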

SELECT t.id,
       t.sensorid,
       t.temperature,
       t.comment,
       prev.id prev_id,
       prev.temperature prev_temp,
       next.id next_id,
       next.temperature next_temp
FROM table1 t
LEFT JOIN table1 prev
  ON prev.id = ( SELECT max(id)
                 FROM table1 t1
                 WHERE t1.sensorid = t.sensorid
                   AND t1.id < t.id
                )
LEFT JOIN table1 next
  ON next.id =  ( SELECT min(id)
                  FROM table1 t1
                  WHERE t1.sensorid = t.sensorid
                    AND t1.id > t.id
                )
ORDER BY t.sensorid, t.id
;

Demo -> http://www.sqlfiddle.com/#!2/297ab/4

With this query you can find the records that need to be deleted by checking the following condition:

current-row-temperature = previous-temperature 
   AND
current-row-temperature = next-temperature 

The query is:

SELECT t.id,
       t.sensorid,
       t.temperature,
       t.comment,
       prev.id prev_id,
       prev.temperature prev_temp,
       next.id next_id,
       next.temperature next_temp
FROM table1 t
LEFT JOIN table1 prev
  ON prev.id = ( SELECT max(id)
                 FROM table1 t1
                 WHERE t1.sensorid = t.sensorid
                   AND t1.id < t.id
                )
LEFT JOIN table1 next
  ON next.id =  ( SELECT min(id)
                  FROM table1 t1
                  WHERE t1.sensorid = t.sensorid
                    AND t1.id > t.id
                )
WHERE t.temperature = prev.temperature
  AND t.temperature = next.temperature
ORDER BY t.sensorid, t.id
;
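The neighbour check in the query above can be sketched in Python as follows. This is only an illustration under stated assumptions: rows are (id, sensorid, temperature) tuples, "previous" and "next" are determined by ascending id within a sensor as in the query, and the function name `redundant_ids` is hypothetical. Like the query, it compares temperature only; humidity could be added to the comparison in the same way.

```python
from collections import defaultdict

# Subset of the question's E9 readings: (id, sensorid, temperature)
rows = [
    (840, "E9", 20.0),
    (843, "E9", 20.0),
    (846, "E9", 19.9),
    (849, "E9", 19.9),
    (852, "E9", 19.9),
    (856, "E9", 19.9),
]


def redundant_ids(rows):
    """Return ids whose temperature equals both neighbours' for the same sensor."""
    by_sensor = defaultdict(list)
    for row_id, sensor, temp in sorted(rows):  # ascending id, as in the query
        by_sensor[sensor].append((row_id, temp))

    redundant = []
    for readings in by_sensor.values():
        # Slide a window of three readings; the first and last reading of a
        # sensor never appear in the middle, so they are never flagged
        for prev, cur, nxt in zip(readings, readings[1:], readings[2:]):
            if prev[1] == cur[1] == nxt[1]:
                redundant.append(cur[0])
    return sorted(redundant)


print(redundant_ids(rows))
```

A sensor's first and last readings are never reported, which matches the SQL: a missing previous or next row yields NULL, and the equality test in the WHERE clause then fails.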

This query can be used as a subquery in a multi-table delete:

DELETE t1
FROM table1 t1,
(
   the above query
) x1
WHERE t1.id = x1.id

You can also negate that condition to retrieve only the records you want to keep:

SELECT t.id,
       t.sensorid,
       t.temperature,
       t.comment,
       prev.id prev_id,
       prev.temperature prev_temp,
       next.id next_id,
       next.temperature next_temp
FROM table1 t
LEFT JOIN table1 prev
  ON prev.id = ( SELECT max(id)
                 FROM table1 t1
                 WHERE t1.sensorid = t.sensorid
                   AND t1.id < t.id
                )
LEFT JOIN table1 next
  ON next.id =  ( SELECT min(id)
                  FROM table1 t1
                  WHERE t1.sensorid = t.sensorid
                    AND t1.id > t.id
                )
WHERE t.temperature <> prev.temperature
   OR t.temperature <> next.temperature
   OR prev.temperature IS NULL
   OR next.temperature IS NULL
ORDER BY t.sensorid, t.id
;

You can use this query to copy the selected records to a new table:

CREATE TABLE new_table AS
SELECT t.*
FROM table1 t
LEFT JOIN table1 prev
  ON prev.id = ( SELECT max(id)
                 FROM table1 t1
                 WHERE t1.sensorid = t.sensorid
                   AND t1.id < t.id
                )
LEFT JOIN table1 next
  ON next.id =  ( SELECT min(id)
                  FROM table1 t1
                  WHERE t1.sensorid = t.sensorid
                    AND t1.id > t.id
                )
WHERE t.temperature <> prev.temperature
   OR t.temperature <> next.temperature
   OR prev.temperature IS NULL
   OR next.temperature IS NULL
ORDER BY t.sensorid, t.id
;