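I have a MySQL table storing time series data: essentially temperature and humidity samples from a number of sensors, taken at (relatively) regular intervals.
The values for each sensor used to be stored to the table periodically (along with the sensor's ID), regardless of whether the recorded temperature and humidity had changed. This created a relatively large table, so I updated the application to store a value for a sensor only when the recorded value changes. Now, when the value changes, it stores two rows with the before and after sensor values (this allowed the graphing code to remain the same).
Now I want to tidy up the old data by removing any consecutive duplicate readings for a given sensor, leaving only the rows that represent the first time a new sensor value was recorded or the last time a sensor value was recorded. In effect, this just removes redundant data.
I have tried to build a query for this, but because the primary key is not sequential for a specific individual sensor, I have been unable to identify the duplicate rows that can be deleted.
This is better illustrated with a data extract. Each row I want to keep carries a comment describing why I want to keep that specific row.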
+-----+----------+---------------------+-------------+----------+
| id  | sensorid | datetime            | temperature | humidity |
+-----+----------+---------------------+-------------+----------+
| 818 | E9       | 2012-10-23 20:59:03 |       20.00 |       72 | First val for E9
| 819 | C3       | 2012-10-23 20:59:19 |       19.50 |       69 | First val for C3
| 820 | E9       | 2012-10-23 20:59:47 |       20.00 |       72 |
| 821 | C3       | 2012-10-23 21:00:00 |       19.50 |       69 |
| 822 | E9       | 2012-10-23 21:00:29 |       20.00 |       72 |
| 823 | C3       | 2012-10-23 21:00:41 |       19.50 |       69 |
| 824 | E9       | 2012-10-23 21:01:12 |       20.00 |       72 |
| 825 | C3       | 2012-10-23 21:01:22 |       19.50 |       69 |
| 826 | E9       | 2012-10-23 21:01:55 |       20.00 |       72 |
| 827 | C3       | 2012-10-23 21:02:03 |       19.50 |       69 |
| 828 | E9       | 2012-10-23 21:02:38 |       20.00 |       72 |
| 829 | C3       | 2012-10-23 21:02:44 |       19.50 |       69 |
| 830 | E9       | 2012-10-23 21:03:21 |       20.00 |       72 |
| 831 | C3       | 2012-10-23 21:03:25 |       19.50 |       69 |
| 832 | E9       | 2012-10-23 21:04:04 |       20.00 |       72 |
| 833 | C3       | 2012-10-23 21:04:06 |       19.50 |       69 |
| 834 | EC       | 2012-10-23 21:04:32 |       13.90 |       91 | First val for EC
| 835 | EC       | 2012-10-23 21:04:32 |       13.90 |       91 |
| 836 | C3       | 2012-10-23 21:04:47 |       19.50 |       69 |
| 837 | E9       | 2012-10-23 21:04:47 |       20.00 |       72 |
| 838 | EC       | 2012-10-23 21:05:11 |       13.90 |       91 |
| 839 | C3       | 2012-10-23 21:05:28 |       19.50 |       69 |
| 840 | E9       | 2012-10-23 21:05:31 |       20.00 |       72 |
| 841 | EC       | 2012-10-23 21:05:50 |       13.90 |       91 |
| 842 | C3       | 2012-10-23 21:06:09 |       19.50 |       69 |
| 843 | E9       | 2012-10-23 21:06:13 |       20.00 |       72 | The last time E9 has a temp of 20
| 844 | EC       | 2012-10-23 21:06:29 |       13.90 |       91 |
| 845 | C3       | 2012-10-23 21:06:50 |       19.50 |       69 |
| 846 | E9       | 2012-10-23 21:06:56 |       19.90 |       72 | The first time E9 has a temp of 19.9
| 847 | EC       | 2012-10-23 21:07:08 |       13.90 |       91 |
| 848 | C3       | 2012-10-23 21:07:31 |       19.50 |       69 |
| 849 | E9       | 2012-10-23 21:07:39 |       19.90 |       72 |
| 850 | EC       | 2012-10-23 21:07:47 |       13.90 |       91 |
| 851 | C3       | 2012-10-23 21:08:12 |       19.50 |       69 |
| 852 | E9       | 2012-10-23 21:08:22 |       19.90 |       72 |
| 853 | EC       | 2012-10-23 21:08:26 |       13.90 |       91 |
| 854 | C3       | 2012-10-23 21:08:53 |       19.50 |       69 |
| 855 | EC       | 2012-10-23 21:09:05 |       13.90 |       91 |
| 856 | E9       | 2012-10-23 21:09:05 |       19.90 |       72 |
| 857 | C3       | 2012-10-23 21:09:34 |       19.50 |       69 | The last time C3 has a temp of 19.5
| 858 | EC       | 2012-10-23 21:09:44 |       13.90 |       91 |
| 859 | E9       | 2012-10-23 21:09:49 |       19.90 |       72 |
| 860 | C3       | 2012-10-23 21:10:15 |       19.60 |       69 | The first time C3 has a temp of 19.6
| 861 | EC       | 2012-10-23 21:10:23 |       13.90 |       91 |
| 862 | E9       | 2012-10-23 21:10:32 |       19.90 |       72 |
| 863 | EC       | 2012-10-23 21:11:02 |       13.90 |       91 |
| 864 | C3       | 2012-10-23 21:11:37 |       19.60 |       69 |
| 865 | E9       | 2012-10-23 21:11:58 |       19.90 |       72 | Last val for E9
| 866 | C3       | 2012-10-23 21:12:18 |       19.60 |       69 | Last val for C3
| 867 | EC       | 2012-10-23 21:12:20 |       13.90 |       91 | Last val for EC
+-----+----------+---------------------+-------------+----------+
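Answer 0 (score: 3)
Using user variables to keep track of the "last" values of the sensorid, temperature and humidity columns (when the whole table is sorted by sensorid and datetime), one can identify which "group" each record belongs to and then aggregate on that basis: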
SELECT   sensorid, temperature, humidity,
         MIN(datetime) dt_min, MAX(datetime) dt_max
FROM (
  SELECT   datetime,
           -- start a new group whenever sensor, temperature or humidity
           -- differs from the previous row (<=> is NULL-safe equality)
           @group := @group + IF(
                 @last_sensor   <=> sensorid
             AND @last_temp     <=> temperature
             AND @last_humidity <=> humidity
           , 0, 1) gp,
           -- remember this row's values for the comparison on the next row
           @last_sensor   := sensorid    sensorid,
           @last_temp     := temperature temperature,
           @last_humidity := humidity    humidity
  FROM     my_table, (SELECT
             @group         := 0,
             @last_sensor   := NULL,
             @last_temp     := NULL,
             @last_humidity := NULL
           ) init
  ORDER BY sensorid, datetime
) t GROUP BY t.gp
See it on sqlfiddle.
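To make the grouping idea concrete outside of SQL, here is a minimal Python sketch of the same computation. It is only an illustration under assumptions: the `rows` list is a hand-picked subset of the question's data, the tuple layout and the `summarize_runs` name are made up for this example, and temperatures are kept as strings to stand in for exact decimal values.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (id, sensorid, datetime, temperature, humidity).
rows = [
    (843, "E9", "2012-10-23 21:06:13", "20.00", 72),
    (846, "E9", "2012-10-23 21:06:56", "19.90", 72),
    (818, "E9", "2012-10-23 20:59:03", "20.00", 72),
    (820, "E9", "2012-10-23 20:59:47", "20.00", 72),
    (849, "E9", "2012-10-23 21:07:39", "19.90", 72),
    (819, "C3", "2012-10-23 20:59:19", "19.50", 69),
    (821, "C3", "2012-10-23 21:00:00", "19.50", 69),
]


def summarize_runs(rows):
    """Return (sensorid, temperature, humidity, dt_min, dt_max) per run."""
    # Same ordering as the query: by sensor, then by timestamp.
    ordered = sorted(rows, key=itemgetter(1, 2))
    runs = []
    # groupby opens a new group whenever the key differs from the previous
    # row's key, which is the role the @group counter plays in the SQL.
    for (sensor, temp, hum), run in groupby(ordered, key=itemgetter(1, 3, 4)):
        stamps = [r[2] for r in run]
        runs.append((sensor, temp, hum, min(stamps), max(stamps)))
    return runs


runs = summarize_runs(rows)
for run in runs:
    print(run)
```

Because the groups are consecutive runs rather than distinct values, a reading that returns to an earlier value later on would form a separate run with its own first and last timestamps, which a plain GROUP BY on the value columns would not give you.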
One can then perform an anti-join against this query to delete all other records from the original table:
DELETE my_table.*
FROM   my_table LEFT JOIN (
         <above query>
       ) x
    ON my_table.sensorid    = x.sensorid
   AND my_table.temperature = x.temperature
   AND my_table.humidity    = x.humidity
   AND my_table.datetime IN (x.dt_min, x.dt_max)
WHERE  x.sensorid IS NULL
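See it on sqlfiddle.
Note that if two (identical) readings come from the same sensor with the same datetime, it is not clear which of the records should be kept or deleted (especially since you note that "the primary key is not sequential for a specific individual sensor"): the above query will therefore also keep record id = 835.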
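The effect of that tie can be seen in a small Python sketch. The `run` list below is a hypothetical subset of sensor EC's rows, and the tie-break on id is an assumption made for illustration, not something the answer's query does.

```python
# Hypothetical rows from one run of identical EC readings: (id, datetime).
# Rows 834 and 835 share a timestamp.
run = [
    (834, "2012-10-23 21:04:32"),
    (835, "2012-10-23 21:04:32"),
    (838, "2012-10-23 21:05:11"),
    (867, "2012-10-23 21:12:20"),
]

# Matching on the timestamp alone, as the anti-join does, keeps every row
# whose datetime equals the run's minimum or maximum.
dt_min = min(dt for _, dt in run)
dt_max = max(dt for _, dt in run)
kept_by_datetime = [row_id for row_id, dt in run if dt in (dt_min, dt_max)]

# Assumed tie-break: order by (datetime, id) and keep only the two ends.
ordered = sorted(run, key=lambda r: (r[1], r[0]))
kept_with_tiebreak = sorted({ordered[0][0], ordered[-1][0]})

print(kept_by_datetime)
print(kept_with_tiebreak)
```

Whether the lower id is the right row to keep for a shared timestamp depends on your data; the question's extract marks 834 as the first value for EC, which is what this tie-break would retain.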
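Answer 1 (score: 1)
Basically, you need to join each record to the previous and the next record of the same sensor so that you can compare their temperatures: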
SELECT t.id,
       t.sensorid,
       t.temperature,
       t.comment,
       prev.id          prev_id,
       prev.temperature prev_temp,
       next.id          next_id,
       next.temperature next_temp
FROM table1 t
LEFT JOIN table1 prev
       ON prev.id = ( SELECT max(id)
                      FROM table1 t1
                      WHERE t1.sensorid = t.sensorid
                        AND t1.id < t.id
                    )
LEFT JOIN table1 next
       ON next.id = ( SELECT min(id)
                      FROM table1 t1
                      WHERE t1.sensorid = t.sensorid
                        AND t1.id > t.id
                    )
ORDER BY t.sensorid, t.id
;
Demo: http://www.sqlfiddle.com/#!2/297ab/4
With this query you can find the records that need to be deleted by checking the following condition:
current-row-temperature = previous-temperature
AND
current-row-temperature = next-temperature
The query is:
SELECT t.id,
       t.sensorid,
       t.temperature,
       t.comment,
       prev.id          prev_id,
       prev.temperature prev_temp,
       next.id          next_id,
       next.temperature next_temp
FROM table1 t
LEFT JOIN table1 prev
       ON prev.id = ( SELECT max(id)
                      FROM table1 t1
                      WHERE t1.sensorid = t.sensorid
                        AND t1.id < t.id
                    )
LEFT JOIN table1 next
       ON next.id = ( SELECT min(id)
                      FROM table1 t1
                      WHERE t1.sensorid = t.sensorid
                        AND t1.id > t.id
                    )
WHERE t.temperature = prev.temperature
  AND t.temperature = next.temperature
ORDER BY t.sensorid, t.id
;
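The neighbour check itself can be sketched in a few lines of Python. This is only an illustration: the `rows` sample and the `redundant_ids` helper are hypothetical, and, unlike the query above, the sketch compares humidity as well as temperature, on the assumption that a change in either value should count as a new reading.

```python
from collections import defaultdict

# Hypothetical sample: (id, sensorid, temperature, humidity).
rows = [
    (840, "E9", "20.00", 72),
    (843, "E9", "20.00", 72),
    (846, "E9", "19.90", 72),
    (849, "E9", "19.90", 72),
    (852, "E9", "19.90", 72),
    (857, "C3", "19.50", 69),
    (860, "C3", "19.60", 69),
]


def redundant_ids(rows):
    """Ids of rows whose previous and next reading (same sensor) match them."""
    by_sensor = defaultdict(list)
    for row in sorted(rows):  # ascending id, like max(id) / min(id) in the SQL
        by_sensor[row[1]].append(row)
    result = []
    for readings in by_sensor.values():
        # Slide a window of (previous, current, next) over each sensor's rows.
        # The first and last row never sit in the middle of a window, which
        # mirrors the NULL neighbours produced by the LEFT JOINs.
        for prev, cur, nxt in zip(readings, readings[1:], readings[2:]):
            if prev[2:] == cur[2:] == nxt[2:]:
                result.append(cur[0])
    return sorted(result)


redundant = redundant_ids(rows)
print(redundant)
```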
This query can be used as a subquery in a multi-table delete:
DELETE t1
FROM table1 t1,
     (
       the above query
     ) x1
WHERE t1.id = x1.id
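If you want to try the delete on a throwaway copy first, the following is a rough end-to-end sketch using Python's built-in sqlite3 module instead of MySQL. Everything in it is an assumption made for illustration: the in-memory table holds a small subset of the question's rows, temperature is stored as text, and the join aliases are renamed to `prv` and `nxt`. It collects the redundant ids before deleting anything, so every row is judged against the original neighbours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, sensorid TEXT, temperature TEXT)"
)
# Hypothetical subset of the question's data.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (818, "E9", "20.00"), (819, "C3", "19.50"), (820, "E9", "20.00"),
        (821, "C3", "19.50"), (822, "E9", "20.00"), (843, "E9", "20.00"),
        (846, "E9", "19.90"), (849, "E9", "19.90"), (857, "C3", "19.50"),
        (860, "C3", "19.60"), (865, "E9", "19.90"),
    ],
)

# Step 1: find rows whose previous and next reading for the same sensor
# carry the same temperature (inner joins drop rows lacking a neighbour).
redundant = [
    row[0]
    for row in conn.execute(
        """
        SELECT t.id
        FROM table1 t
        JOIN table1 prv
          ON prv.id = (SELECT max(id) FROM table1 t1
                       WHERE t1.sensorid = t.sensorid AND t1.id < t.id)
        JOIN table1 nxt
          ON nxt.id = (SELECT min(id) FROM table1 t1
                       WHERE t1.sensorid = t.sensorid AND t1.id > t.id)
        WHERE t.temperature = prv.temperature
          AND t.temperature = nxt.temperature
        """
    )
]

# Step 2: delete them in one pass, after the full list is known.
conn.executemany("DELETE FROM table1 WHERE id = ?", [(i,) for i in redundant])

remaining = [row[0] for row in conn.execute("SELECT id FROM table1 ORDER BY id")]
print(remaining)
```

Deleting row by row while re-evaluating neighbours would give a different result, because each removal changes which rows are adjacent; working from a snapshot of the ids avoids that.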
You can also negate that condition to retrieve only the records you want to keep:
SELECT t.id,
       t.sensorid,
       t.temperature,
       t.comment,
       prev.id          prev_id,
       prev.temperature prev_temp,
       next.id          next_id,
       next.temperature next_temp
FROM table1 t
LEFT JOIN table1 prev
       ON prev.id = ( SELECT max(id)
                      FROM table1 t1
                      WHERE t1.sensorid = t.sensorid
                        AND t1.id < t.id
                    )
LEFT JOIN table1 next
       ON next.id = ( SELECT min(id)
                      FROM table1 t1
                      WHERE t1.sensorid = t.sensorid
                        AND t1.id > t.id
                    )
WHERE t.temperature <> prev.temperature
   OR t.temperature <> next.temperature
   OR prev.temperature IS NULL
   OR next.temperature IS NULL
ORDER BY t.sensorid, t.id
;
You can use this query to copy the selected records to a new table:
CREATE TABLE new_table AS
SELECT t.*
FROM table1 t
LEFT JOIN table1 prev
       ON prev.id = ( SELECT max(id)
                      FROM table1 t1
                      WHERE t1.sensorid = t.sensorid
                        AND t1.id < t.id
                    )
LEFT JOIN table1 next
       ON next.id = ( SELECT min(id)
                      FROM table1 t1
                      WHERE t1.sensorid = t.sensorid
                        AND t1.id > t.id
                    )
WHERE t.temperature <> prev.temperature
   OR t.temperature <> next.temperature
   OR prev.temperature IS NULL
   OR next.temperature IS NULL
ORDER BY t.sensorid, t.id
;