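How to find duplicate rows that share a similar part of a string

Asked: 2014-07-14 06:38:59

Tags: mysql sql group-by duplicates

I have thousands of rows in a table. Some rows contain similar keywords and could be classified into the same group. For example: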

Table : Birds_Name
    +-------+---------------------+
    |ID     |Name                 |
    +-------+---------------------+
    |1      |Blue Peckwood        |
    +-------+---------------------+
    |2      |North Peckwood       |
    +-------+---------------------+
    |3      |Northern Peckwood    |
    +-------+---------------------+
    |4      |Northern Peckwood    |
    +-------+---------------------+
    |5      |Red Heron            |
    +-------+---------------------+
    |6      |Red Heron            |
    +-------+---------------------+

The table above should contain two groups of birds: Peckwood and Heron.
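To reproduce the problem, the sample table can be rebuilt with a short setup script. This is only a sketch: the column types (`INT`, `VARCHAR(50)`) and the primary key are assumptions, since the question shows only the column names and values.

```sql
-- Assumed schema: the question does not state column types or keys.
CREATE TABLE birds_name (
    id   INT PRIMARY KEY,
    name VARCHAR(50) NOT NULL
);

-- The six sample rows shown in the table above.
INSERT INTO birds_name (id, name) VALUES
    (1, 'Blue Peckwood'),
    (2, 'North Peckwood'),
    (3, 'Northern Peckwood'),
    (4, 'Northern Peckwood'),
    (5, 'Red Heron'),
    (6, 'Red Heron');
```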

But this is the MySQL query I ran:

SELECT *
FROM birds_name
WHERE name IN (
    SELECT name
    FROM birds_name
    GROUP BY name
    HAVING COUNT(*) > 1
)

After running the query, this is what I get:

    +-------+---------------------+
    |3      |Northern Peckwood    |
    +-------+---------------------+
    |4      |Northern Peckwood    |
    +-------+---------------------+
    |5      |Red Heron            |
    +-------+---------------------+
    |6      |Red Heron            |
    +-------+---------------------+

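What I actually want is to select every row that shares a similar string (in this case, Peckwood). So there should be only two groups: Peckwood and Heron.

Is this possible? If so, how should I adjust the MySQL code to achieve it?

Regards.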

3 Answers:

Answer 0: (score: 2)

Try this:

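-- Group rows by the last space-separated word of each name.
SELECT SUBSTRING_INDEX(name, ' ', -1) AS group_name, COUNT(*) AS row_count
FROM birds_name
GROUP BY SUBSTRING_INDEX(name, ' ', -1)
HAVING COUNT(*) > 0;  -- always true; use > 1 to keep only groups with more than one row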

For details on the SUBSTRING_INDEX function, see the MySQL manual.
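If the goal is to list the original rows rather than only the group counts, the grouped result can be joined back to the table. The following is a minimal sketch, assuming the grouping keyword is always the last space-separated word of `name`; the aliases `b`, `g`, and `group_name` are illustrative and not part of the original answer.

```sql
-- Sketch: return every row whose last word is shared by at least one other row.
SELECT b.id, b.name, g.group_name
FROM birds_name AS b
JOIN (
    -- One row per last word that occurs more than once.
    SELECT SUBSTRING_INDEX(name, ' ', -1) AS group_name
    FROM birds_name
    GROUP BY SUBSTRING_INDEX(name, ' ', -1)
    HAVING COUNT(*) > 1
) AS g
  ON SUBSTRING_INDEX(b.name, ' ', -1) = g.group_name
ORDER BY g.group_name, b.id;
```

With the six sample rows from the question, this should return rows 5 and 6 under Heron followed by rows 1 to 4 under Peckwood. Names where the shared keyword is not the last word would need a different extraction rule.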

Answer 1: (score: 0)

Could you try this?

SELECT count(id),name
  FROM birds_name
 group by name
having count(id) >1

Thanks.

SQL Fiddle

Answer 2: (score: 0)

I think you can use the MySQL string functions to split out the words, like this:

mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
        -> 'www.mysql'
mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
        -> 'mysql.com'

Then use that expression in the GROUP BY clause of your query.
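As a rough illustration of that step, the expression can be used both as the grouping key and as an output column. This sketch assumes the keyword is the last word of `name`; the aliases `group_name`, `row_count`, and `ids` are made up for the example, and `GROUP_CONCAT` is added only to show which rows land in each group.

```sql
-- Sketch: one output row per last word, with the matching ids collected.
SELECT SUBSTRING_INDEX(name, ' ', -1) AS group_name,
       COUNT(*)                       AS row_count,
       GROUP_CONCAT(id ORDER BY id)   AS ids
FROM birds_name
GROUP BY SUBSTRING_INDEX(name, ' ', -1);
```

For the sample data in the question, this should produce two groups: Peckwood with four rows and Heron with two.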

Update:

Here is my SQLFiddle