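I wrote the following query to find duplicate timestamps within a date range, with the goal of deleting the duplicates that have the larger IDs. However, this SELECT never finishes.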
SELECT *
FROM data
WHERE id NOT IN (SELECT MIN(id)
                 FROM data
                 WHERE datapoint_name LIKE 'Temp%'
                   AND timestamp BETWEEN '2012-07-31' AND '2012-08-03'
                 GROUP BY timestamp, datapoint_name)
  AND datapoint_name LIKE 'Temp%'
  AND timestamp BETWEEN '2012-07-31' AND '2012-08-03';
I find this strange, because each part runs very fast on its own and there aren't that many rows. Specifically:
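In other words, there are only 14 duplicates, yet the NOT IN() operation seems to take an excessive amount of time. In fact, I have never had the patience to find out whether it finishes at all. What can I do to speed it up? Am I doing something fundamentally wrong?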
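To separate correctness from performance, the NOT IN query above can be run against a tiny in-memory SQLite table. This is a sketch under stated assumptions: the column types, the sample rows, and the helper names `make_db` and `duplicate_ids` are invented for illustration, and SQLite only stands in for the original database (which the question does not name). It therefore says nothing about that engine's speed; it only shows that the query logic keeps the lowest id in each (timestamp, datapoint_name) group and returns the rest.

```python
import sqlite3

# Hypothetical sample rows: (id, datapoint_name, timestamp).
# The schema is inferred from the query and is an assumption.
ROWS = [
    (1, "Temp1", "2012-08-01 10:00:00"),
    (2, "Temp1", "2012-08-01 10:00:00"),      # duplicate of id 1
    (3, "Temp1", "2012-08-01 10:15:00"),
    (4, "Temp2", "2012-08-01 10:00:00"),
    (5, "Temp2", "2012-08-01 10:00:00"),      # duplicate of id 4
    (6, "Temp2", "2012-08-01 10:00:00"),      # duplicate of id 4
    (7, "Humidity", "2012-08-01 10:00:00"),
    (8, "Humidity", "2012-08-01 10:00:00"),   # duplicate, but not 'Temp%'
    (9, "Temp1", "2012-09-01 10:00:00"),
    (10, "Temp1", "2012-09-01 10:00:00"),     # duplicate, but outside the range
]

# The question's query, selecting only the id column and ordered for a stable result.
NOT_IN_QUERY = """
SELECT id
FROM data
WHERE id NOT IN (SELECT MIN(id)
                 FROM data
                 WHERE datapoint_name LIKE 'Temp%'
                   AND timestamp BETWEEN '2012-07-31' AND '2012-08-03'
                 GROUP BY timestamp, datapoint_name)
  AND datapoint_name LIKE 'Temp%'
  AND timestamp BETWEEN '2012-07-31' AND '2012-08-03'
ORDER BY id
"""


def make_db():
    """Create an in-memory table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, datapoint_name TEXT, timestamp TEXT)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", ROWS)
    return conn


def duplicate_ids(conn):
    """Return the ids the NOT IN query flags as surplus duplicates."""
    return [row[0] for row in conn.execute(NOT_IN_QUERY)]


conn = make_db()
print(duplicate_ids(conn))  # [2, 5, 6]
```

If the result matches expectations on such a sample, the problem is how the real database executes the query, not what the query asks for.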
Answer 0 (score: 1)
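The likely cause is that the subquery is being re-run for every row being compared. Try moving the subquery into the FROM clause and using a LEFT JOIN: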
SELECT d.*
FROM data d
LEFT JOIN (SELECT timestamp, datapoint_name, MIN(id) AS minid
           FROM data
           WHERE datapoint_name LIKE 'Temp%'
             AND timestamp BETWEEN '2012-07-31' AND '2012-08-03'
           GROUP BY timestamp, datapoint_name
          ) dd
       ON d.datapoint_name = dd.datapoint_name
      AND d.timestamp = dd.timestamp
      AND d.id = dd.minid
WHERE d.datapoint_name LIKE 'Temp%'
  AND d.timestamp BETWEEN '2012-07-31' AND '2012-08-03'
  -- no match means the row is not the minimum id of its group, i.e. a duplicate
  AND dd.minid IS NULL;
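The LEFT JOIN version can be checked on a small in-memory SQLite table and then used for the deletion the question is ultimately after. This is a sketch under stated assumptions: SQLite stands in for the original database, and the sample rows and the helpers `make_db` and `delete_duplicates` are hypothetical. The anti-join runs once to collect the surplus ids, which are then deleted by primary key.

```python
import sqlite3

# Hypothetical sample rows: (id, datapoint_name, timestamp).
ROWS = [
    (1, "Temp1", "2012-08-01 10:00:00"),
    (2, "Temp1", "2012-08-01 10:00:00"),      # duplicate of id 1
    (3, "Temp1", "2012-08-01 10:15:00"),
    (4, "Temp2", "2012-08-01 10:00:00"),
    (5, "Temp2", "2012-08-01 10:00:00"),      # duplicate of id 4
    (6, "Temp2", "2012-08-01 10:00:00"),      # duplicate of id 4
    (7, "Humidity", "2012-08-01 10:00:00"),   # not 'Temp%'
    (8, "Humidity", "2012-08-01 10:00:00"),   # not 'Temp%'
    (9, "Temp1", "2012-09-01 10:00:00"),      # outside the range
    (10, "Temp1", "2012-09-01 10:00:00"),     # outside the range
]

# The answer's LEFT JOIN anti-join, selecting only ids.
ANTI_JOIN_QUERY = """
SELECT d.id
FROM data d
LEFT JOIN (SELECT timestamp, datapoint_name, MIN(id) AS minid
           FROM data
           WHERE datapoint_name LIKE 'Temp%'
             AND timestamp BETWEEN '2012-07-31' AND '2012-08-03'
           GROUP BY timestamp, datapoint_name) dd
       ON d.datapoint_name = dd.datapoint_name
      AND d.timestamp = dd.timestamp
      AND d.id = dd.minid
WHERE d.datapoint_name LIKE 'Temp%'
  AND d.timestamp BETWEEN '2012-07-31' AND '2012-08-03'
  AND dd.minid IS NULL
ORDER BY d.id
"""


def make_db():
    """Create an in-memory table filled with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data (id INTEGER PRIMARY KEY, datapoint_name TEXT, timestamp TEXT)"
    )
    conn.executemany("INSERT INTO data VALUES (?, ?, ?)", ROWS)
    return conn


def delete_duplicates(conn):
    """Find surplus ids with the anti-join, delete them, and return them."""
    ids = [row[0] for row in conn.execute(ANTI_JOIN_QUERY)]
    conn.executemany("DELETE FROM data WHERE id = ?", [(i,) for i in ids])
    conn.commit()
    return ids


conn = make_db()
print(delete_duplicates(conn))  # [2, 5, 6]
```

Collecting the ids first and deleting by primary key keeps each statement simple. Whether a single `DELETE ... JOIN` statement is preferable depends on the database actually in use.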