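Finding rows with duplicate/similar column values in MySQL

Asked: 2011-02-08 17:41:15

Tags: mysql select duplicates

I want to select every row in the table below whose fname value is similar to that of an earlier row, where the earlier row is the first of its kind in id order. In other words, from this table I want to retrieve the rows with ids 2, 5, and 7 (because " anna" comes after "anna", and "michaela" and "michaal" come after "michael").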

+----+------------+----------+
| id | fname      | lname    |
+----+------------+----------+
|  1 | anna       | milski   |
|  2 |  anna      | nguyen   |
|  3 | michael    | michaels |
|  4 | james      | bond     |
|  5 | michaela   | king     |
|  6 | bruce      | smart    |
|  7 | michaal    | hardy    |
+----+------------+----------+

What I have so far is:

select *, count(fname) cnt 
from users group by soundex(fname) 
having count(soundex(fname)) > 1;

But because I am grouping, each group collapses to a single row, and the result is:

+----+----------+----------+-----+
| id | fname    | lname    | cnt |
+----+----------+----------+-----+
|  1 | anna     | milski   |   2 |
|  3 | michael  | michaels |   3 |
+----+----------+----------+-----+

What I want to retrieve is:

+----+----------+----------+-----+
| id | fname    | lname    | cnt |
+----+----------+----------+-----+
|  2 |  anna    | nguyen   |   2 |
|  5 | michaela | king     |   3 |
|  7 | michaal  | hardy    |   3 |
+----+----------+----------+-----+

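How should I change the query? I tried removing the GROUP BY, but that changed the results (I may be wrong; I have not tested it extensively).

2 answers:

Answer 0 (score: 2)

I re-read your original question and came up with the following solution: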

SELECT *
FROM   users
WHERE  id IN
       (SELECT id
       FROM    users t4
               INNER JOIN
                       (SELECT  soundex(fname) AS snd,
                                COUNT(*)       AS cnt
                       FROM     users          AS t5
                       GROUP BY snd
                       HAVING   cnt > 1
                       )
                       AS t6
               ON      soundex(t4.fname)=snd
       )
AND    id NOT IN
       (SELECT  MIN(t2.id) AS wanted
       FROM     users t2
                INNER JOIN
                         (SELECT  soundex(fname) AS snd,
                                  COUNT(*)       AS cnt
                         FROM     users          AS t1
                         GROUP BY snd
                         HAVING   cnt > 1
                         )
                         AS t3
                ON       soundex(t2.fname)=snd
       GROUP BY snd
       );

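It is somewhat over-complicated, but it works and returns what you asked for :) The first subquery keeps every row whose Soundex code occurs more than once; the second excludes the lowest id in each of those groups, so only the later "similar" rows remain.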
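For comparison, the same idea can be sketched more compactly with a correlated EXISTS: keep a row only if some earlier row (lower id) has the same Soundex code. This is an illustrative alternative, not the answerer's query. The CREATE TABLE statement and column types are assumptions made so the sketch is self-contained; the data is copied from the question. On that data it should return the rows with ids 2, 5, and 7, since MySQL's SOUNDEX() ignores non-alphabetic characters such as the leading space in " anna".

```sql
-- Assumed schema; the question does not show the real table definition.
CREATE TABLE users (
    id    INT PRIMARY KEY,
    fname VARCHAR(50),
    lname VARCHAR(50)
);

INSERT INTO users (id, fname, lname) VALUES
    (1, 'anna',     'milski'),
    (2, ' anna',    'nguyen'),
    (3, 'michael',  'michaels'),
    (4, 'james',    'bond'),
    (5, 'michaela', 'king'),
    (6, 'bruce',    'smart'),
    (7, 'michaal',  'hardy');

-- Keep rows that have an earlier row with the same Soundex code.
SELECT u.*,
       -- Size of the Soundex group this row belongs to.
       (SELECT COUNT(*)
        FROM   users AS c
        WHERE  SOUNDEX(c.fname) = SOUNDEX(u.fname)) AS cnt
FROM   users AS u
WHERE  EXISTS (SELECT 1
               FROM   users AS e
               WHERE  SOUNDEX(e.fname) = SOUNDEX(u.fname)
               AND    e.id < u.id)
ORDER BY u.id;
```

Because there is no GROUP BY in the outer query, this form does not depend on MySQL's relaxed grouping behaviour. The trade-off is that SOUNDEX() is evaluated per row pair, which may be slow on large tables.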

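Answer 1 (score: 0)

You seem to be getting exactly what you asked for: SOUNDEX(fname) computes the Soundex code of the first name only, not of the whole name. Some options you could investigate: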

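-- Both queries use SELECT * with GROUP BY, so they need ONLY_FULL_GROUP_BY disabled.

-- Option 1: group on the Soundex of first and last name combined.
SELECT *, COUNT(SOUNDEX(CONCAT(fname, lname))) AS cnt
FROM users
GROUP BY SOUNDEX(CONCAT(fname, lname))
HAVING cnt > 1;

-- Option 2: group on the Soundex of each name separately.
SELECT *, COUNT(SOUNDEX(fname)) AS cnt1, COUNT(SOUNDEX(lname)) AS cnt2
FROM users
GROUP BY SOUNDEX(fname), SOUNDEX(lname)
HAVING cnt1 > 1 OR cnt2 > 1;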

It depends on what you are trying to achieve: counting similar first names, similar last names, or some composite hash of both.
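One way to decide between those options is to look at the codes themselves before grouping on them. The query below is a small sketch that assumes the users table from the question; the aliases fname_snd, lname_snd, and full_snd are made up for illustration.

```sql
-- Show the Soundex code each candidate grouping key would produce per row.
SELECT id,
       fname,
       lname,
       SOUNDEX(fname)                AS fname_snd,  -- first name only
       SOUNDEX(lname)                AS lname_snd,  -- last name only
       SOUNDEX(CONCAT(fname, lname)) AS full_snd    -- both names combined
FROM   users
ORDER BY id;
```

Rows that share a value in one of these columns are the rows that would fall into the same group if that expression were used in GROUP BY.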