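From the table below, I want to select every row whose fname is similar to the fname of an earlier row, i.e. every row in a group of similar names except the first one. IOW, from this table I want to retrieve the rows with ids 2, 5 and 7 (because the second "anna" comes after the first "anna", and "michaela" and "michaal" come after "michael").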
+----+------------+----------+
| id | fname | lname |
+----+------------+----------+
| 1 | anna | milski |
| 2 | anna | nguyen |
| 3 | michael | michaels |
| 4 | james | bond |
| 5 | michaela | king |
| 6 | bruce | smart |
| 7 | michaal | hardy |
+----+------------+----------+
What I have so far is:
select *, count(fname) cnt
from users group by soundex(fname)
having count(soundex(fname)) > 1;
But because I am grouping, the result is:
+----+----------+----------+-----+
| id | fname | lname | cnt |
+----+----------+----------+-----+
| 1 | anna | milski | 2 |
| 3 | michael | michaels | 3 |
+----+----------+----------+-----+
What I want to retrieve is:
+----+----------+----------+-----+
| id | fname | lname | cnt |
+----+----------+----------+-----+
| 2 | anna | nguyen | 2 |
| 5 | michaela | king | 3 |
| 7 | michaal | hardy | 3 |
+----+----------+----------+-----+
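How should I change the query? I tried removing the GROUP BY, but that changes the result (I may be wrong, I have not tested it extensively).
Answer 0 (score: 2)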
I have re-read your original question and came up with the following solution:
SELECT *
FROM users
WHERE id IN
      (SELECT id
       FROM users t4
       INNER JOIN
             (SELECT soundex(fname) AS snd,
                     COUNT(*) AS cnt
              FROM users AS t5
              GROUP BY snd
              HAVING cnt > 1
             )
             AS t6
       ON soundex(t4.fname)=snd
      )
AND id NOT IN
      (SELECT MIN(t2.id) AS wanted
       FROM users t2
       INNER JOIN
             (SELECT soundex(fname) AS snd,
                     COUNT(*) AS cnt
              FROM users AS t1
              GROUP BY snd
              HAVING cnt > 1
             )
             AS t3
       ON soundex(t2.fname)=snd
       GROUP BY snd
      );
It is a bit over-complicated, but it works and gives you what you asked for :)
Answer 1 (score: 0)
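If you want to try the idea on the sample rows without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it beyond the users table and its columns is an assumption made for illustration: the helper names (soundex, build_sample_db, rows_after_first), the in-memory database, and the hand-written Soundex function, which follows the classic American Soundex rules and is not guaranteed to match MySQL's SOUNDEX() for every input. The query is also a more compact variant rather than the exact statement above: it joins each row to its Soundex group and skips the row with the smallest id in the group, which additionally gives you the cnt column from the question.

```python
import sqlite3


def soundex(name):
    """Classic 4-character American Soundex (an approximation of MySQL's SOUNDEX)."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    name = "".join(c for c in (name or "").upper() if c.isalpha())
    if not name:
        return ""
    out = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        digit = codes.get(ch, "")
        if digit and digit != prev:
            out += digit
        if ch not in "HW":  # H and W do not separate equal codes
            prev = digit
    return (out + "000")[:4]


def build_sample_db():
    """Create an in-memory copy of the users table from the question."""
    conn = sqlite3.connect(":memory:")
    # SQLite has no SOUNDEX by default, so register the Python helper.
    conn.create_function("SOUNDEX", 1, soundex)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, fname TEXT, lname TEXT)")
    conn.executemany(
        "INSERT INTO users (id, fname, lname) VALUES (?, ?, ?)",
        [(1, "anna", "milski"), (2, "anna", "nguyen"), (3, "michael", "michaels"),
         (4, "james", "bond"), (5, "michaela", "king"), (6, "bruce", "smart"),
         (7, "michaal", "hardy")],
    )
    return conn


QUERY = """
SELECT u.id, u.fname, u.lname, g.cnt
FROM users AS u
INNER JOIN (SELECT SOUNDEX(fname) AS snd,
                   COUNT(*)       AS cnt,
                   MIN(id)        AS first_id
            FROM users
            GROUP BY SOUNDEX(fname)
            HAVING COUNT(*) > 1) AS g
        ON SOUNDEX(u.fname) = g.snd
WHERE u.id <> g.first_id   -- drop the first row of each group
ORDER BY u.id
"""


def rows_after_first(conn):
    """Return every row of a multi-row Soundex group except its lowest id."""
    return conn.execute(QUERY).fetchall()


print(rows_after_first(build_sample_db()))
# [(2, 'anna', 'nguyen', 2), (5, 'michaela', 'king', 3), (7, 'michaal', 'hardy', 3)]
```

The SQL inside QUERY should also run on MySQL as written, where the built-in SOUNDEX() takes the place of the registered helper.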
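You seem to be getting exactly what you asked for: SOUNDEX(fname) only computes the Soundex of the first name, not of the whole string. Some options you could look into: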
SELECT *, COUNT(SOUNDEX(CONCAT(fname, lname))) AS cnt FROM users GROUP BY SOUNDEX(CONCAT(fname, lname)) HAVING cnt > 1;
or
SELECT *, COUNT(SOUNDEX(fname)) AS cnt1, COUNT(SOUNDEX(lname)) AS cnt2
FROM users
GROUP BY SOUNDEX(fname), SOUNDEX(lname)
HAVING cnt1 > 1 OR cnt2 > 1;
It depends on what you are trying to achieve: counting similar first names, similar last names, or some synthetic hash of the two.