Find duplicate values in MySQL

Time: 2009-03-27 04:22:12

Tags: mysql

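I have a table with a varchar column, and I would like to find all the records that have duplicate values in this column. What is the best query I can use to find the duplicates?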

26 answers:

Answer 0 (score: 1392)

Do a SELECT with a GROUP BY clause. Let's say name is the column you want to find duplicates in:

SELECT name, COUNT(*) c FROM table GROUP BY name HAVING c > 1;

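This returns one row per duplicated value, with the name value in the first column and the number of times that value appears in the second.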
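A minimal sketch of the same GROUP BY ... HAVING pattern, run against an in-memory SQLite database through Python's built-in sqlite3 module as a stand-in for MySQL. The people table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (name) VALUES (?)",
    [("alice",), ("bob",), ("alice",), ("carol",), ("bob",), ("alice",)],
)

# One row per duplicated name, together with how often it occurs.
duplicates = conn.execute(
    "SELECT name, COUNT(*) AS c FROM people "
    "GROUP BY name HAVING c > 1 ORDER BY name"
).fetchall()
print(duplicates)
```

Names that occur only once (carol here) are filtered out by the HAVING clause.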

Answer 1 (score: 210)

SELECT varchar_col
FROM table
GROUP BY varchar_col
HAVING COUNT(*) > 1;

Answer 2 (score: 155)

SELECT  *
FROM    mytable mto
WHERE   EXISTS
        (
        SELECT  1
        FROM    mytable mti
        WHERE   mti.varchar_column = mto.varchar_column
        LIMIT 1, 1
        )

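This query returns complete records, not just the distinct varchar_column values.

This query doesn't use COUNT(*). If there are lots of duplicates, COUNT(*) is expensive, and you don't need the whole count: you only need to know whether at least two rows share the same value. LIMIT 1, 1 skips the first matching row and asks for a second one, so EXISTS is true only when a second row exists.

Having an index on varchar_column will, of course, speed this query up greatly.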
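A small sketch of the EXISTS approach, assuming an in-memory SQLite database (via Python's sqlite3) in place of MySQL. The sample rows and the index name idx_varchar_column are hypothetical, and LIMIT 1 OFFSET 1 is used as the portable spelling of MySQL's LIMIT 1, 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, varchar_column TEXT)")
conn.executemany(
    "INSERT INTO mytable (varchar_column) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("a",)],
)
# The index lets the correlated subquery find matching rows without a full scan.
conn.execute("CREATE INDEX idx_varchar_column ON mytable (varchar_column)")

# Skip one matching row and ask for a second: EXISTS is true only when
# at least two rows share the value.
rows_with_duplicates = conn.execute(
    """
    SELECT mto.id, mto.varchar_column
    FROM mytable AS mto
    WHERE EXISTS (
        SELECT 1
        FROM mytable AS mti
        WHERE mti.varchar_column = mto.varchar_column
        LIMIT 1 OFFSET 1
    )
    ORDER BY mto.id
    """
).fetchall()
print(rows_with_duplicates)
```

Every row holding the duplicated value is returned in full, which is the difference from the GROUP BY answers.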

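Answer 3 (score: 124)

Building on levik's answer, to get the IDs of the duplicate rows you can use GROUP_CONCAT if your server supports it (this returns a comma-separated list of ids).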

SELECT GROUP_CONCAT(id), name, COUNT(*) c FROM documents GROUP BY name HAVING c > 1;
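As a rough illustration, the sketch below runs the same query shape in SQLite (which also provides GROUP_CONCAT) through Python's sqlite3 and splits the comma-separated list back into ids. The sample document names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO documents (name) VALUES (?)",
    [("report",), ("memo",), ("report",), ("invoice",), ("memo",)],
)

rows = conn.execute(
    "SELECT GROUP_CONCAT(id) AS ids, name, COUNT(*) AS c "
    "FROM documents GROUP BY name HAVING c > 1 ORDER BY name"
).fetchall()

# Turn each comma-separated id list back into a sorted list of integers.
ids_by_name = {name: sorted(int(i) for i in ids.split(",")) for ids, name, c in rows}
print(ids_by_name)
```

In MySQL the concatenated string is cut off at group_concat_max_len (1024 bytes by default), so very large duplicate groups may need that setting raised.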

Answer 4 (score: 12)

SELECT * 
FROM `dps` 
WHERE pid IN (SELECT pid FROM `dps` GROUP BY pid HAVING COUNT(pid)>1)

Answer 5 (score: 11)

Assuming your table is named TableABC, the column you want is Col, and the table's primary key is Key.

SELECT a.Key, b.Key, a.Col 
FROM TableABC a, TableABC b
WHERE a.Col = b.Col 
AND a.Key <> b.Key

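The advantage of this approach over the answers above is that it gives you the keys.

Answer 6 (score: 9)

To find which values in the name column of Employee are duplicated, the query below is helpful: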
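A short sketch of the self-join, using SQLite through Python's sqlite3 with hypothetical names (table_abc, id, col). It swaps <> for < in the key comparison, which is a variation on the query above: each duplicate pair is then reported once instead of in both orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_abc (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany(
    "INSERT INTO table_abc (col) VALUES (?)",
    [("x",), ("y",), ("x",), ("x",)],
)

# a.id < b.id keeps (1, 3) but drops the mirrored (3, 1).
pairs = conn.execute(
    "SELECT a.id, b.id, a.col FROM table_abc AS a "
    "JOIN table_abc AS b ON a.col = b.col AND a.id < b.id "
    "ORDER BY a.id, b.id"
).fetchall()
print(pairs)
```

A value that occurs n times still produces n*(n-1)/2 pairs, so this form suits small duplicate groups best.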

Select name from employee group by name having count(*)>1;

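Answer 7 (score: 7)

I am not seeing any JOIN approaches here, and they have many uses when dealing with duplicates.

This approach gives you the actual duplicated rows.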

SELECT t1.* FROM my_table as t1 
LEFT JOIN my_table as t2 
ON t1.name=t2.name and t1.id!=t2.id 
WHERE t2.id IS NOT NULL 
ORDER BY t1.name

Answer 8 (score: 7)

SELECT t.*,(select count(*) from city as tt
  where tt.name=t.name) as count
  FROM `city` as t
  where (
     select count(*) from city as tt
     where tt.name=t.name
  ) > 1 order by count desc

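Replace city with your table. Replace name with your field name.

Answer 9 (score: 7)

My final query incorporated a few of the answers here that helped, combining GROUP BY, COUNT and GROUP_CONCAT.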

SELECT GROUP_CONCAT(id), `magento_simple`, COUNT(*) c 
FROM product_variant 
GROUP BY `magento_simple` HAVING c > 1;

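This provides the ids of both examples (comma-separated), the barcode I needed, and how many duplicates there are.

Change the table and columns accordingly.

Answer 10 (score: 5)

The queries above work fine if you need to check a single column for duplicate values, for example email.

But if you need to check several columns and want to find duplicated combinations, this query works: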

SELECT COUNT(*) AS tot,
       name,
       email
FROM users
-- Group on the columns themselves: CONCAT(name,email) would merge
-- distinct pairs such as ('ab','c') and ('a','bc').
GROUP BY name, email
-- Keep only the combinations that occur more than once, with their count.
HAVING tot > 1;
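A small sketch of why grouping on the individual columns is safer than grouping on a concatenated string. It uses SQLite through Python's sqlite3, where || plays the role of MySQL's CONCAT; the users rows are invented so that two different (name, email) pairs concatenate to the same text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [
        ("ann", "a@example.com"),
        ("ann", "a@example.com"),  # a genuine duplicate of the row above
        ("ab", "c@example.com"),
        ("a", "bc@example.com"),   # different pair, same concatenated string
    ],
)

# Grouping on the concatenated string reports a false duplicate group.
by_concat = conn.execute(
    "SELECT COUNT(*) FROM users GROUP BY name || email HAVING COUNT(*) > 1"
).fetchall()

# Grouping on both columns reports only the genuine duplicate.
by_columns = conn.execute(
    "SELECT name, email, COUNT(*) AS tot FROM users "
    "GROUP BY name, email HAVING tot > 1"
).fetchall()
print(len(by_concat), by_columns)
```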

Answer 11 (score: 4)

Further to @maxyfc's answer, I needed to find all of the rows that had duplicate values, so that I could edit them in MySQL Workbench.
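The goal described in this answer, returning every full row whose value is duplicated, can be sketched as follows. This is an illustrative query, not the answer's original one; it runs in SQLite through Python's sqlite3, and the items table and field column are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany(
    "INSERT INTO items (field) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",)],
)

# The subquery finds the duplicated values; the outer query fetches whole rows,
# sorted so that rows sharing a value sit next to each other for editing.
full_rows = conn.execute(
    "SELECT * FROM items WHERE field IN ("
    "  SELECT field FROM items GROUP BY field HAVING COUNT(*) > 1"
    ") ORDER BY field, id"
).fetchall()
print(full_rows)
```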

Answer 12 (score: 3)

CREATE TABLE tbl_master
    (`id` int, `email` varchar(15));

INSERT INTO tbl_master
    (`id`, `email`) VALUES
    (1, 'test1@gmail.com'),
    (2, 'test2@gmail.com'),
    (3, 'test1@gmail.com'),
    (4, 'test2@gmail.com'),
    (5, 'test5@gmail.com');

-- Query: return every row whose email appears more than once
SELECT id, email FROM tbl_master
WHERE email IN (SELECT email FROM tbl_master GROUP BY email HAVING COUNT(id) > 1);

Answer 13 (score: 3)

The following finds every product_id that is used more than once. You get only a single record for each product_id.

SELECT product_id FROM oc_product_reward GROUP BY product_id HAVING count( product_id ) >1

Code taken from: http://chandreshrana.blogspot.in/2014/12/find-duplicate-records-based-on-any.html

Answer 14 (score: 3)

SELECT 
    t.*,
    (SELECT COUNT(*) FROM city AS tt WHERE tt.name=t.name) AS count 
FROM `city` AS t 
WHERE 
    (SELECT count(*) FROM city AS tt WHERE tt.name=t.name) > 1 ORDER BY count DESC

Answer 15 (score: 2)

SELECT DISTINCT a.email FROM `users` a LEFT JOIN `users` b ON a.email = b.email WHERE a.id != b.id;

Answer 16 (score: 2)

I prefer to use window functions (MySQL 8.0+) to find duplicates, because I can see the entire row:

WITH cte AS (
  SELECT *
    ,COUNT(*) OVER(PARTITION BY col_name) AS num_of_duplicates_group
    ,ROW_NUMBER() OVER(PARTITION BY col_name ORDER BY col_name2) AS pos_in_group
  FROM table
)
SELECT *
FROM cte
WHERE num_of_duplicates_group > 1;
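A hedged sketch of the window-function approach, assuming SQLite 3.25 or newer (which also supports COUNT(*) OVER and ROW_NUMBER()) accessed through Python's sqlite3. The table t and its rows are illustrative, and the rows are ordered by id within each group rather than by a second column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col_name TEXT)")
conn.executemany(
    "INSERT INTO t (col_name) VALUES (?)",
    [("a",), ("b",), ("a",), ("a",)],
)

rows = conn.execute(
    """
    WITH cte AS (
      SELECT id, col_name,
             COUNT(*) OVER (PARTITION BY col_name) AS num_of_duplicates_group,
             ROW_NUMBER() OVER (PARTITION BY col_name ORDER BY id) AS pos_in_group
      FROM t
    )
    SELECT id, col_name, num_of_duplicates_group, pos_in_group
    FROM cte
    WHERE num_of_duplicates_group > 1
    ORDER BY id
    """
).fetchall()
print(rows)
```

Filtering on pos_in_group > 1 instead would select only the extra copies, leaving the first row of each group out.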


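Answer 17 (score: 1)

To remove duplicate rows across multiple fields, first concatenate the fields into a new key that identifies each distinct row, then use GROUP BY on that key to drop the rows that share it: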

Create TEMPORARY table tmp select concat(f1,f2) as cfs,t1.* from mytable as t1;
Create index x_tmp_cfs on tmp(cfs);
Create table unduptable select f1,f2,... from tmp group by cfs;

Answer 18 (score: 1)

Select column_name, column_name1,column_name2, count(1) as temp from table_name group by column_name having temp > 1

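Answer 19 (score: 1)

A very late contribution, in case it helps anyone down the line. I had a task to find matching pairs of transactions (actually the two sides of account-to-account transfers) in a banking app, in order to identify which was the 'from' and which the 'to' for each inter-account transfer. We ended up with this: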

SELECT 
    LEAST(primaryid, secondaryid) AS transactionid1,
    GREATEST(primaryid, secondaryid) AS transactionid2
FROM (
    SELECT table1.transactionid AS primaryid, 
        table2.transactionid AS secondaryid
    FROM financial_transactions table1
    INNER JOIN financial_transactions table2 
    ON table1.accountid = table2.accountid
    AND table1.transactionid <> table2.transactionid 
    AND table1.transactiondate = table2.transactiondate
    AND table1.sourceref = table2.destinationref
    AND table1.amount = (0 - table2.amount)
) AS DuplicateResultsTable
GROUP BY transactionid1
ORDER BY transactionid1;

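The result is that DuplicateResultsTable provides rows containing matching (i.e. duplicate) transactions, but it also returns the same transaction ids in reverse order the second time it matches the same pair. The outer SELECT is there to group by the first transaction ID: LEAST and GREATEST make sure the two transaction ids always appear in the same order in the results, which makes it safe to GROUP BY the first one and so eliminates all the duplicate matches. This ran through nearly a million records and identified more than 12,000 matches in just under 2 seconds. Of course, transactionid is the primary index, which really helped.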
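The LEAST/GREATEST idea can be shown in a few lines of plain Python: put the smaller id first so that both directions of a match collapse into one pair. The transaction ids below are made up.

```python
# Hypothetical matches as the inner join would return them:
# each pair shows up twice, once in each direction.
raw_matches = [(101, 205), (205, 101), (307, 150), (150, 307)]

# Mirror LEAST()/GREATEST(): order each pair (smaller id first),
# then let the set drop the repeated pairs.
unique_pairs = sorted({(min(a, b), max(a, b)) for a, b in raw_matches})
print(unique_pairs)
```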

Answer 20 (score: 1)

SELECT ColumnA, COUNT( * )
FROM Table
GROUP BY ColumnA
HAVING COUNT( * ) > 1

Answer 21 (score: 1)

If you want to remove duplicates from the result, use DISTINCT.

Otherwise use this query:

SELECT users.*,COUNT(user_ID) as user FROM users GROUP BY user_name HAVING user > 1;

Answer 22 (score: 1)

As a variation on Levik's answer that also lets you find the ids of the duplicate results, I used the following:

SELECT * FROM table1 WHERE column1 IN (SELECT column1 AS duplicate_value FROM table1 GROUP BY column1 HAVING COUNT(*) > 1)

Answer 23 (score: 0)

Try using this query:

SELECT name, COUNT(*) value_count FROM company_master GROUP BY name HAVING value_count > 1;

Answer 24 (score: 0)

To get all the rows that contain duplicated data, I used this:

SELECT * FROM TableName INNER JOIN(
  SELECT DupliactedData FROM TableName GROUP BY DupliactedData HAVING COUNT(DupliactedData) > 1 order by DupliactedData)
  temp ON TableName.DupliactedData = temp.DupliactedData;

TableName = the table you are working with.

DupliactedData = the column holding the duplicated data you are looking for.

Answer 25 (score: 0)

I improved on this:

SELECT 
    col, 
    COUNT(col)
FROM
    table_name
GROUP BY col
HAVING COUNT(col) > 1;