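How to find duplicates that match on multiple columns

Time: 2018-01-10 22:40:59

Tags: mysql

I want to find duplicates in a table where X, Y, and Z all match, so that I can eventually clean up the older versions, which are identified by their timestamp.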

+------------+-----+----+-----+
| Created    | X   | Y  | Z   | 
+------------+-----+----+-----+
| 1515622543 | 334 | 72 | 269 |
| 1515622544 | 334 | 72 | 270 | 
| 1515622601 | 334 | 72 | 268 | 
| 1515622953 | 334 | 72 | 268 | 
+------------+-----+----+-----+

In this example there is one duplicate, for X=334, Y=72, Z=268. I would like to list the rows involved, so the result ends up looking like:

+------------+-----+----+-----+
| 1515622601 | 334 | 72 | 268 | 
| 1515622953 | 334 | 72 | 268 | 
+------------+-----+----+-----+

Already tried:

select count(distinct X), count(distinct Y), count(distinct Z) from decayworld; - this only counts, and does not show where all three (X, Y, Z) match.

SELECT X, Y, Z, COUNT(*) FROM decayworld GROUP BY X, Y, Z HAVING COUNT(*) > 1;

+-----+----+-----+----------+
| X   | Y  | Z   | COUNT(*) |
+-----+----+-----+----------+
| 334 | 72 | 268 |        2 |
+-----+----+-----+----------+

- this counts the matching rows, but does not list them.

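5 answers:

Answer 0: (score: 0)

A couple of things:

Using count() means you are close to the answer.

You should group your results (or select distinct) to get the duplicates, then use a HAVING clause to keep only the dupes.

Once that is done, join back to your table to get the IDs.

Don't use Table as a table name; it is a reserved word.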

#test for a temp table for your dupes
select x,y,z, count(*) from mytable group by x, y, z having count(*) >1;

#one type of solution to find your IDs
select mytable.id, dupes.* from 
(select x,y,z, count(*) from mytable group by x, y, z having count(*) >1) dupes
left join mytable on mytable.x = dupes.x and mytable.y = dupes.y and mytable.z = dupes.z
;
Sample on Rextester

You can get whatever output you like by tweaking the queries above.
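The join-back idea above can be tried against the question's own sample data. What follows is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, and since the question's table has no id column it lists the Created timestamp instead. The function name list_duplicates and the constant SAMPLE_ROWS are made up for the illustration; the SQL inside should carry over to MySQL unchanged.

```python
import sqlite3

# Sample rows copied from the question: (Created, X, Y, Z)
SAMPLE_ROWS = [
    (1515622543, 334, 72, 269),
    (1515622544, 334, 72, 270),
    (1515622601, 334, 72, 268),
    (1515622953, 334, 72, 268),
]


def list_duplicates(rows):
    """Return every row whose (X, Y, Z) triple occurs more than once."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL table
    conn.execute(
        "CREATE TABLE decayworld (Created INTEGER, X INTEGER, Y INTEGER, Z INTEGER)"
    )
    conn.executemany("INSERT INTO decayworld VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT d.Created, d.X, d.Y, d.Z
        FROM decayworld AS d
        JOIN (
            -- one row per (X, Y, Z) triple that appears more than once
            SELECT X, Y, Z
            FROM decayworld
            GROUP BY X, Y, Z
            HAVING COUNT(*) > 1
        ) AS dupes
          ON d.X = dupes.X AND d.Y = dupes.Y AND d.Z = dupes.Z
        ORDER BY d.X, d.Y, d.Z, d.Created
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in list_duplicates(SAMPLE_ROWS):
        print(row)
```

With the four sample rows, this should list only the two rows for X=334, Y=72, Z=268, which is the output the question asks for.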

Answer 1: (score: 0)

You need to use GROUP BY to get a count for the matching columns.

In this case the query would look something like:

SELECT X, Y, Z, COUNT(*)
FROM decayworld
GROUP BY X, Y, Z;

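This gives you every combination of X, Y, Z values together with the number of rows that share it. To get the minimum ID for each group (here, the earliest Created timestamp), you can do the following: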

SELECT X, Y, Z, COUNT(*), MIN(Created)
FROM decayworld
GROUP BY X, Y, Z;

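Keep in mind that using MIN to pick the row to delete only works when a duplicate group contains exactly two rows; with three or more, it identifies only the oldest one. To delete the duplicates, you can instead select the row with the MAX ID in each group and delete the rest. Hope that makes sense.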
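To make the "keep the MAX, remove the rest" idea concrete, here is a dry-run sketch that only lists the rows that would be removed. It is an illustration under assumptions, not the answerer's code: it runs on an in-memory SQLite database through Python's sqlite3 module rather than on MySQL, it treats Created as the ID, and the name stale_rows plus the third duplicate row are invented for the example.

```python
import sqlite3

# The question's rows plus one invented extra copy, so the duplicate
# group (334, 72, 268) contains three rows instead of two.
ROWS_WITH_TRIPLE = [
    (1515622543, 334, 72, 269),
    (1515622544, 334, 72, 270),
    (1515622601, 334, 72, 268),
    (1515622953, 334, 72, 268),
    (1515623000, 334, 72, 268),  # hypothetical third copy
]


def stale_rows(rows):
    """Return every row that is older than the newest row of its (X, Y, Z) group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE decayworld (Created INTEGER, X INTEGER, Y INTEGER, Z INTEGER)"
    )
    conn.executemany("INSERT INTO decayworld VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT d.Created, d.X, d.Y, d.Z
        FROM decayworld AS d
        JOIN (
            -- newest timestamp per (X, Y, Z) group: the row to keep
            SELECT X, Y, Z, MAX(Created) AS newest
            FROM decayworld
            GROUP BY X, Y, Z
        ) AS keepers
          ON d.X = keepers.X AND d.Y = keepers.Y AND d.Z = keepers.Z
        WHERE d.Created < keepers.newest
        ORDER BY d.Created
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in stale_rows(ROWS_WITH_TRIPLE):
        print(row)
```

Because the comparison is against MAX(Created), both older copies in the three-row group are reported, whereas a MIN-based query would report only one of them.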

Answer 2: (score: 0)

http://sqlfiddle.com/#!9/f85e0f/3

Query just to get MIN(created):

SELECT MIN(created)
FROM `events`
GROUP BY X, Y, Z
HAVING COUNT(created)>1

If you want to delete them:

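DELETE e FROM `events` e
JOIN (SELECT X, Y, Z, MIN(created) to_delete
FROM `events`
GROUP BY X, Y, Z
HAVING COUNT(created)>1) d
ON e.created = d.to_delete
-- also match on X, Y, Z so an unrelated row with the same timestamp is not deleted
AND e.X = d.X AND e.Y = d.Y AND e.Z = d.Z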
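One thing worth checking before running a delete like this is what the join condition actually matches. The sketch below compares joining on the timestamp alone with joining on the timestamp plus X, Y, Z. It is only an illustration: it uses SQLite through Python's sqlite3 module instead of MySQL, it runs a SELECT rather than a DELETE, and the three rows and the function name matched_rows are invented for the example.

```python
import sqlite3

# Hypothetical data: (created, X, Y, Z)
EVENT_ROWS = [
    (100, 1, 1, 1),  # oldest copy of the duplicate group (1, 1, 1)
    (200, 1, 1, 1),  # newest copy of the duplicate group (1, 1, 1)
    (100, 2, 2, 2),  # unique row that happens to share timestamp 100
]


def matched_rows(join_on_xyz):
    """Return the rows a MIN(created)-based delete would hit.

    join_on_xyz=False joins on the timestamp only;
    join_on_xyz=True additionally requires X, Y and Z to match.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (created INTEGER, X INTEGER, Y INTEGER, Z INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", EVENT_ROWS)
    condition = "e.created = d.to_delete"
    if join_on_xyz:
        condition += " AND e.X = d.X AND e.Y = d.Y AND e.Z = d.Z"
    query = f"""
        SELECT e.created, e.X, e.Y, e.Z
        FROM events AS e
        JOIN (
            -- oldest timestamp of every group that has duplicates
            SELECT X, Y, Z, MIN(created) AS to_delete
            FROM events
            GROUP BY X, Y, Z
            HAVING COUNT(*) > 1
        ) AS d ON {condition}
        ORDER BY e.X, e.Y, e.Z, e.created
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print("timestamp only:   ", matched_rows(join_on_xyz=False))
    print("timestamp + X,Y,Z:", matched_rows(join_on_xyz=True))
```

With the timestamp-only join, the unique (2, 2, 2) row is matched as well, so it seems safer to include X, Y, Z in the join unless timestamps are known to be unique across the table.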

Answer 3: (score: 0)

As a test, to see what would be deleted:

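select not_keep_rows.*
from your_table as not_keep_rows
   inner join (
      select x, y, z, MIN(created) as min_date
      from your_table
      group by x,y,z
      having count(*) > 1
   ) as keep_rows on keep_rows.min_date = not_keep_rows.created
      -- match the group as well, so only rows of the same x,y,z are selected
      and keep_rows.x = not_keep_rows.x
      and keep_rows.y = not_keep_rows.y
      and keep_rows.z = not_keep_rows.z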

The actual delete:

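delete not_keep_rows.*
from your_table as not_keep_rows
   inner join (
      select x, y, z, MIN(created) as min_date
      from your_table
      group by x,y,z
      having count(*) > 1
   ) as keep_rows on keep_rows.min_date = not_keep_rows.created
      -- match the group as well, so only rows of the same x,y,z are deleted
      and keep_rows.x = not_keep_rows.x
      and keep_rows.y = not_keep_rows.y
      and keep_rows.z = not_keep_rows.z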

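Answer 4: (score: 0)

Given that the OP wants to delete all of the older records (as stated in the comments after the original question was asked), this query gives that result. This solution assumes that the column id is unique and ascends from oldest to newest: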

DELETE FROM mytable WHERE NOT EXISTS (
    SELECT * FROM (
        SELECT MAX(id) AS id FROM mytable GROUP BY x, y, z
    ) AS keepers
    WHERE keepers.id = mytable.id
);
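For anyone who wants to try the statement before touching real data, here is a small sketch that runs the same DELETE on an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here, and the table contents, the id values and the function name delete_older_duplicates are invented for the illustration.

```python
import sqlite3

# Hypothetical rows: (id, x, y, z), with id ascending from oldest to newest
MYTABLE_ROWS = [
    (1, 334, 72, 269),
    (2, 334, 72, 270),
    (3, 334, 72, 268),
    (4, 334, 72, 268),
    (5, 334, 72, 268),
]


def delete_older_duplicates(rows):
    """Run the NOT EXISTS delete and return the ids that survive."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, z INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    # Same statement as in the answer: keep only the highest id per (x, y, z)
    conn.execute(
        """
        DELETE FROM mytable WHERE NOT EXISTS (
            SELECT * FROM (
                SELECT MAX(id) AS id FROM mytable GROUP BY x, y, z
            ) AS keepers
            WHERE keepers.id = mytable.id
        )
        """
    )
    survivors = [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]
    conn.close()
    return survivors


if __name__ == "__main__":
    print(delete_older_duplicates(MYTABLE_ROWS))
```

Rows that were never duplicated are their own group maximum, so they are kept; in the three-row group only the newest id remains.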

Rextester link