I use a home-automation package called OpenHAB, which writes its data to tables like the following:
http://sqlfiddle.com/#!9/5e35e4/1
+---------------------+-------+
| time                | value |
+---------------------+-------+
| 2016-10-31 22:00:00 | 11.1  |
| 2016-10-31 22:07:08 | 10.8  |
| 2016-10-31 22:20:02 | 10.8  |
| 2016-10-31 22:30:28 | 10.8  |
| 2016-10-31 22:39:29 | 10.8  |
| 2016-10-31 22:44:04 | 10.8  |
| 2016-10-31 22:49:02 | 10.5  |
| 2016-10-31 23:00:00 | 10.5  |
| 2016-10-31 23:42:02 | 10    |
| 2016-11-01 00:00:00 | 10    |
| 2016-11-01 00:30:02 | 9.5   |
| 2016-11-01 01:00:00 | 9.5   |
| 2016-11-01 01:11:02 | 9.3   |
| 2016-11-01 01:22:02 | 9.1   |
+---------------------+-------+
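I am now trying to clean up these values: back when I started with OpenHAB and had not set up the logging properly, a lot of duplicates (100k+) accumulated.
If the value (which can be of type double or varchar) does not change across several consecutive rows, every row except the first and the last of that run should be deleted. Given the example above, the ideal output would look like this: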
+---------------------+-------+
| time                | value |
+---------------------+-------+
| 2016-10-31 22:00:00 | 11.1  |
| 2016-10-31 22:07:08 | 10.8  | <-- only the rows after this one were removed
| 2016-10-31 22:44:04 | 10.8  |
| 2016-10-31 22:49:02 | 10.5  |
| 2016-10-31 23:00:00 | 10.5  |
| 2016-10-31 23:42:02 | 10    |
| 2016-11-01 00:00:00 | 10    |
| 2016-11-01 00:30:02 | 9.5   |
| 2016-11-01 01:00:00 | 9.5   |
| 2016-11-01 01:11:02 | 9.3   |
| 2016-11-01 01:22:02 | 9.1   |
+---------------------+-------+
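To pin the rule down independently of SQL, here is a minimal Python sketch of what I am after. The function name `thin_runs` and the in-memory `sample` list are purely illustrative and are not part of OpenHAB or of my database; values are treated as floats here, although a varchar column would behave the same way.

```python
from itertools import groupby

def thin_runs(rows):
    """Keep only the first and last row of every run of equal consecutive values.

    `rows` is a list of (time, value) tuples already sorted by time.
    """
    kept = []
    for _, run in groupby(rows, key=lambda row: row[1]):
        run = list(run)
        kept.append(run[0])           # first row of the run
        if len(run) > 1:
            kept.append(run[-1])      # last row of the run, if distinct
    return kept

# The sample rows from the table above.
sample = [
    ("2016-10-31 22:00:00", 11.1),
    ("2016-10-31 22:07:08", 10.8),
    ("2016-10-31 22:20:02", 10.8),
    ("2016-10-31 22:30:28", 10.8),
    ("2016-10-31 22:39:29", 10.8),
    ("2016-10-31 22:44:04", 10.8),
    ("2016-10-31 22:49:02", 10.5),
    ("2016-10-31 23:00:00", 10.5),
    ("2016-10-31 23:42:02", 10.0),
    ("2016-11-01 00:00:00", 10.0),
    ("2016-11-01 00:30:02", 9.5),
    ("2016-11-01 01:00:00", 9.5),
    ("2016-11-01 01:11:02", 9.3),
    ("2016-11-01 01:22:02", 9.1),
]

kept = thin_runs(sample)
removed = [row for row in sample if row not in kept]
```

On the sample data this keeps 11 of the 14 rows and drops only the three 10.8 readings between 22:07:08 and 22:44:04.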
Answer 0 (score: 0)
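If I understand your question correctly, you are only interested in the first and the last row of each group of duplicates.
I think this query should do the trick: it groups by value and deletes every row whose time is neither the MIN nor the MAX time for that value. Note that the grouping spans the whole table, so this matches the requirement only if each value occurs in a single consecutive run: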
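-- Keep only the earliest and the latest row for each value.
-- The derived tables work around MySQL error 1093 (selecting from the DELETE target).
DELETE FROM Item67
WHERE time NOT IN (SELECT t FROM (SELECT MAX(time) AS t FROM Item67 GROUP BY value) AS latest)
  AND time NOT IN (SELECT t FROM (SELECT MIN(time) AS t FROM Item67 GROUP BY value) AS earliest);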
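One thing worth checking before running this on real data is what happens when a value comes back after a change. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the seven readings are invented for the example, and the query is written as a SELECT so that it only lists the rows the MIN/MAX approach would delete.

```python
import sqlite3

# Hypothetical readings: the value dips to 9.5 and then returns to 10.0.
readings = [
    ("2016-11-01 00:00:00", 10.0),
    ("2016-11-01 00:10:00", 10.0),
    ("2016-11-01 00:20:00", 10.0),
    ("2016-11-01 00:30:00", 9.5),
    ("2016-11-01 00:40:00", 10.0),
    ("2016-11-01 00:50:00", 10.0),
    ("2016-11-01 01:00:00", 10.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Item67 (time TEXT PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO Item67 VALUES (?, ?)", readings)

# Rows that are neither the earliest nor the latest row for their value.
would_delete = [
    t for (t,) in conn.execute(
        """
        SELECT time FROM Item67
        WHERE time NOT IN (SELECT MAX(time) FROM Item67 GROUP BY value)
          AND time NOT IN (SELECT MIN(time) FROM Item67 GROUP BY value)
        ORDER BY time
        """
    )
]
conn.close()
```

Here `would_delete` contains 00:10, 00:20, 00:40 and 00:50. The rows at 00:20 and 00:40 are the end of the first run and the start of the second, so a strictly run-based cleanup would keep them.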
Answer 1 (score: 0)
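I think I have a candidate query. It selects every row whose value equals both the previous and the next row's value, i.e. the rows that should be deleted. I am still checking whether it behaves correctly:
http://sqlfiddle.com/#!9/5e35e4/43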
SELECT t1.time, t1.value
FROM Item67 AS t1
-- value of the closest earlier row
WHERE t1.value = (SELECT t2.value
                  FROM Item67 AS t2
                  WHERE t2.time < t1.time
                  ORDER BY t2.time DESC
                  LIMIT 1)
-- value of the closest later row
  AND t1.value = (SELECT t3.value
                  FROM Item67 AS t3
                  WHERE t3.time > t1.time
                  ORDER BY t3.time
                  LIMIT 1);
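To turn the selection into an actual cleanup, one option is to collect the matching timestamps first and delete them in a second step, which avoids reading from and deleting from the same table in one statement. The following is a rough sketch under assumptions: it uses Python's built-in sqlite3 module with an in-memory copy of the sample data rather than the real MySQL database, and the helper name `delete_interior_rows` is made up for this example.

```python
import sqlite3

ROWS = [
    ("2016-10-31 22:00:00", 11.1), ("2016-10-31 22:07:08", 10.8),
    ("2016-10-31 22:20:02", 10.8), ("2016-10-31 22:30:28", 10.8),
    ("2016-10-31 22:39:29", 10.8), ("2016-10-31 22:44:04", 10.8),
    ("2016-10-31 22:49:02", 10.5), ("2016-10-31 23:00:00", 10.5),
    ("2016-10-31 23:42:02", 10.0), ("2016-11-01 00:00:00", 10.0),
    ("2016-11-01 00:30:02", 9.5),  ("2016-11-01 01:00:00", 9.5),
    ("2016-11-01 01:11:02", 9.3),  ("2016-11-01 01:22:02", 9.1),
]

# Rows whose value matches both the closest earlier and the closest later row.
INTERIOR_ROWS = """
SELECT t1.time
FROM Item67 AS t1
WHERE t1.value = (SELECT t2.value FROM Item67 AS t2
                  WHERE t2.time < t1.time
                  ORDER BY t2.time DESC LIMIT 1)
  AND t1.value = (SELECT t3.value FROM Item67 AS t3
                  WHERE t3.time > t1.time
                  ORDER BY t3.time LIMIT 1)
"""

def delete_interior_rows(conn):
    """Delete the interior rows of every run and return their timestamps."""
    doomed = conn.execute(INTERIOR_ROWS).fetchall()   # step 1: collect
    conn.executemany("DELETE FROM Item67 WHERE time = ?", doomed)  # step 2: delete
    conn.commit()
    return [t for (t,) in doomed]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Item67 (time TEXT PRIMARY KEY, value REAL)")
conn.executemany("INSERT INTO Item67 VALUES (?, ?)", ROWS)

removed = delete_interior_rows(conn)
remaining = conn.execute("SELECT COUNT(*) FROM Item67").fetchone()[0]
conn.close()
```

On the sample data this removes the three 10.8 rows at 22:20:02, 22:30:28 and 22:39:29 and leaves 11 rows. With 100k+ rows the correlated subqueries will probably need an index on time to finish in reasonable time.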