MySQL: finding records that are similar but not identical

Time: 2016-05-24 13:42:19

Tags: mysql

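All, I have discovered that my users have been entering customer names incorrectly. Below is an example of how they entered them. I guess they thought they needed a separate account for each residence a person owns, so they appended a fake middle initial to the first name. I also have similar entries where the fake middle initial comes before the last name instead. If I want to pull a list of customers that share a name and an email, how should I approach this? I have a query that I have been using, included below my sample data, but it misses results like those in the sample. It returns other duplicates, which I do want, but not records like 1 and 2 below.

Example: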

ID | first Name | last Name | email          | Residence     |
---+------------+-----------+----------------+---------------+
1  | Bill A     | Bob       | bill@bob.com   | 1-2 broad st  |
2  | Bill B     | Bob       | bill@bob.com   | 1-3 broad st  |
3  | Fred       | Jones     | f.jones@me.com | 1 example st  |
4  | Fred       | Jones     | f.jones@me.com | 200 South ave |
5  | Alex       | Man       | Manley@grt.com | 25 N Main st  |
6  | Alex       | Man       | Manley@grt.com | 39 Front st   |

Query:

SELECT C.ID, R.Customer_ID , C.orgName, C.fName, C.lName, C.email, R.hNumber, R.street, R.aNumber, R.city
FROM Customer C
LEFT JOIN Residence R ON C.ID = R.Customer_ID
JOIN (
    SELECT X.fName, X.lName
    FROM Customer X 
    GROUP BY X.fName, X.lName 
    HAVING COUNT(*) > 1
) X ON X.fName = C.fName AND X.lName = C.lName
ORDER BY C.fName, C.lName

2 answers:

Answer 0: (score: 0)

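I don't think there is an automatic way. Every approach will probably involve manually identifying the patterns that were used and correcting for them, for example with one large statement. It is not an "automatic" thing.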

The closest you can get is to use SOUNDEX to tell whether two values sound the same: http://dev.mysql.com/doc/refman/5.7/en/string-functions.html#function_soundex
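As a rough sketch of the SOUNDEX idea, the query below pairs each customer with any other customer whose names sound alike. It assumes the `Customer` table and the `fName` / `lName` columns from the question. Because MySQL's SOUNDEX ignores non-alphabetic characters such as spaces, a trailing initial can change the code, so this sketch compares only the first word of the first name and the last word of the last name (via `SUBSTRING_INDEX`); that trimming is an assumption about how the fake initials were entered, not something the answer prescribes.

```sql
-- Customers that have at least one other customer with a similar-sounding name.
SELECT C.ID, C.fName, C.lName, C.email
FROM Customer C
JOIN Customer C1
  ON C1.ID <> C.ID
  -- compare only the first word of the first name, so 'Bill A' is reduced to 'Bill'
 AND SOUNDEX(SUBSTRING_INDEX(C1.fName, ' ', 1)) = SOUNDEX(SUBSTRING_INDEX(C.fName, ' ', 1))
  -- compare only the last word of the last name, so 'A Bob' is reduced to 'Bob'
 AND SOUNDEX(SUBSTRING_INDEX(C1.lName, ' ', -1)) = SOUNDEX(SUBSTRING_INDEX(C.lName, ' ', -1))
GROUP BY C.ID, C.fName, C.lName, C.email
ORDER BY C.lName, C.fName;
```

SOUNDEX is tuned for English names and will also match names that are genuinely different, so the output is best treated as a list of candidates to review by hand.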

If you can use another programming language, I would suggest something like http://php.net/manual/en/function.similar-text.php, but that will be a significant amount of work.

Answer 1: (score: 0)

You can use (at least for MySQL)

SELECT C.ID, R.Customer_ID, C.orgName, C.fName, C.lName, C.email,
       R.hNumber, R.street, R.aNumber, R.city
FROM Customer C
LEFT JOIN Residence R ON C.ID = R.Customer_ID
-- pair every customer with every other customer
JOIN Customer C1 on C.ID <> C1.id
LEFT JOIN Residence R1 ON C1.ID = R1.Customer_ID
where 
      (C1.fName = C.fName AND C1.lName = C.lName)
   or C1.email = C.email
   -- or <whatever else you like to compare, e.g. same address + same last name>
-- one row per customer; with ONLY_FULL_GROUP_BY enabled (the default since
-- MySQL 5.7.5) the R.* columns need ANY_VALUE() or an aggregate
group by C.ID

Or, more generally,

SELECT C.ID, R.Customer_ID, C.orgName, C.fName, C.lName, C.email,
       R.hNumber, R.street, R.aNumber, R.city
FROM Customer C
LEFT JOIN Residence R ON C.ID = R.Customer_ID
-- keep a customer only if some other customer matches it
where exists (
   select * from 
   Customer C1 
   LEFT JOIN Residence R1 ON C1.ID = R1.Customer_ID
   where 
      C.ID <> C1.id          
      and (
            (C1.fName = C.fName AND C1.lName = C.lName)
            or C1.email = C.email
            -- or <whatever else you like to compare, e.g. same address + same last name>
          )
 )  

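Of course, this only gives you a limited duplicate check, especially if someone is deliberately trying to get around it (as can happen in a shop system, for example, although there are tools and programs to help with that).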
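For reviewing the candidates by hand, one possible sketch is to group on the email address alone and list the accounts that share it. It assumes the `Customer` table and columns from the question; the `LOWER(TRIM(...))` normalization and the output aliases are illustrative assumptions, not part of the answer above.

```sql
-- One row per email address that is used by more than one customer account.
SELECT LOWER(TRIM(C.email)) AS email_key,
       COUNT(*) AS accounts,
       -- IDs of all accounts sharing this email, e.g. for a manual merge
       GROUP_CONCAT(C.ID ORDER BY C.ID) AS customer_ids,
       -- the distinct spellings of the name that were entered
       GROUP_CONCAT(DISTINCT CONCAT(C.fName, ' ', C.lName)
                    ORDER BY C.fName SEPARATOR ' / ') AS names_entered
FROM Customer C
GROUP BY LOWER(TRIM(C.email))
HAVING COUNT(*) > 1
ORDER BY accounts DESC;
```

Grouping on email sidesteps the fake initials entirely, at the cost of missing duplicates that were entered with different email addresses.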