Query to find duplicate timestamps in MySQL

Asked: 2015-03-27 14:06:55

Tags: mysql timestamp duplicate-removal

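I wrote the following query to find duplicate timestamps within a date range, with the goal of deleting the duplicates that have the larger IDs. But this SELECT never finishes.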

SELECT
    *
FROM
    data
WHERE
    id NOT IN (SELECT
            MIN(id)
        FROM
            data
        WHERE
            datapoint_name LIKE 'Temp%'
                AND timestamp BETWEEN '2012-07-31' AND '2012-08-03'
        GROUP BY timestamp , datapoint_name)
        AND datapoint_name LIKE 'Temp%'
        AND timestamp BETWEEN '2012-07-31' AND '2012-08-03';

I find this strange, because the individual components run very quickly and there are not that many rows. Specifically:

  • The SELECT MIN(id) ... GROUP BY subquery returns 476 rows in 0.7 seconds.
  • The outer SELECT * without the id NOT IN (...) condition returns 490 rows in 0.001 seconds.

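In other words, there are 14 duplicates, but the NOT IN (...) operation seems to take an inordinate amount of time. In fact, I have never had the patience to see whether it finishes. What can I do to speed this up? Am I doing something fundamentally wrong?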

1 Answer:

Answer 0: (score: 1)

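The likely cause is that the subquery is being re-run for every row that is compared. Try moving the subquery into the FROM clause and using a LEFT JOIN. A row whose id is not the minimum of its (timestamp, datapoint_name) group finds no match in the join, so the dd.minid IS NULL test keeps exactly the duplicates: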

SELECT d.*
FROM data d LEFT JOIN
     (SELECT timestamp, datapoint_name, MIN(id) as minid
      FROM data
      WHERE datapoint_name LIKE 'Temp%' AND
            timestamp BETWEEN '2012-07-31' AND '2012-08-03'
     GROUP BY timestamp , datapoint_name
    ) dd
    ON d.datapoint_name = dd.datapoint_name and
       d.timestamp = dd.timestamp and
       d.id = dd.minid
WHERE d.datapoint_name LIKE 'Temp%' AND
      d.timestamp BETWEEN '2012-07-31' AND '2012-08-03' AND
      dd.minid IS NULL;
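
To check that the LEFT JOIN form selects the same rows as the original NOT IN query, here is a minimal sketch in Python. It uses the standard-library sqlite3 module with an in-memory database as a stand-in for MySQL, and the sample rows are invented for illustration; nothing here measures or reproduces the MySQL performance problem. Because SQLite has no DELETE with a join, the final delete step goes through the list of ids returned by the SELECT.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data ("
    " id INTEGER PRIMARY KEY,"
    " datapoint_name TEXT,"
    " timestamp TEXT)"
)

# Hypothetical sample rows, not taken from the question.
rows = [
    (1, "Temp1", "2012-07-31 10:00:00"),
    (2, "Temp1", "2012-07-31 10:00:00"),     # duplicate of id 1
    (3, "Temp1", "2012-07-31 11:00:00"),
    (4, "Temp2", "2012-08-01 09:00:00"),
    (5, "Temp2", "2012-08-01 09:00:00"),     # duplicate of id 4
    (6, "Humidity", "2012-08-01 09:00:00"),  # excluded by LIKE 'Temp%'
    (7, "Humidity", "2012-08-01 09:00:00"),  # excluded by LIKE 'Temp%'
]
conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)

# The query from the question: NOT IN over the per-group minimum ids.
not_in_sql = """
SELECT id
FROM data
WHERE id NOT IN (SELECT MIN(id)
                 FROM data
                 WHERE datapoint_name LIKE 'Temp%'
                   AND timestamp BETWEEN '2012-07-31' AND '2012-08-03'
                 GROUP BY timestamp, datapoint_name)
  AND datapoint_name LIKE 'Temp%'
  AND timestamp BETWEEN '2012-07-31' AND '2012-08-03'
ORDER BY id
"""

# The rewrite from the answer: LEFT JOIN against the grouped subquery and
# keep only rows that found no matching minimum id.
join_sql = """
SELECT d.id
FROM data d
LEFT JOIN (SELECT timestamp, datapoint_name, MIN(id) AS minid
           FROM data
           WHERE datapoint_name LIKE 'Temp%'
             AND timestamp BETWEEN '2012-07-31' AND '2012-08-03'
           GROUP BY timestamp, datapoint_name) dd
  ON d.datapoint_name = dd.datapoint_name
 AND d.timestamp = dd.timestamp
 AND d.id = dd.minid
WHERE d.datapoint_name LIKE 'Temp%'
  AND d.timestamp BETWEEN '2012-07-31' AND '2012-08-03'
  AND dd.minid IS NULL
ORDER BY d.id
"""

not_in_ids = [r[0] for r in conn.execute(not_in_sql)]
join_ids = [r[0] for r in conn.execute(join_sql)]
print("NOT IN   :", not_in_ids)
print("LEFT JOIN:", join_ids)

# Delete the duplicates with the larger ids, one id at a time.
conn.executemany("DELETE FROM data WHERE id = ?", [(i,) for i in join_ids])
remaining = [r[0] for r in conn.execute("SELECT id FROM data ORDER BY id")]
print("remaining:", remaining)
```

On this sample both queries select ids 2 and 5, and the rows outside the 'Temp%' filter are left alone. In MySQL itself the same join could presumably drive a multi-table delete of the form DELETE d FROM data d LEFT JOIN (...) dd ON ... WHERE ... AND dd.minid IS NULL, but that variant is untested here, so run the SELECT first and confirm the row count before deleting anything.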