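I am trying to identify potential duplicate users in my system.
Two entries may have been created for the same person:
my.name.totoro@shibly.com and my.name.totoro@shibly.net
How can I find all matching emails where the only difference is the last part of the domain name?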
Answer 0 (score: 0)
This is not very efficient, but it gets the job done:
SELECT *
FROM mytable
WHERE LEFT(email, LENGTH(email) - LOCATE('.', REVERSE(email))) IN
(
SELECT LEFT(email, LENGTH(email) - LOCATE('.', REVERSE(email))) AS common
FROM mytable
GROUP BY LEFT(email, LENGTH(email) - LOCATE('.', REVERSE(email)))
HAVING COUNT(*) > 1
)
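To see what the grouping expression above computes, it can be run against a single literal. This is a minimal sketch: the derived table `t` and the column alias `e` are placeholders introduced here for illustration, not part of the original schema.

```sql
-- REVERSE + LOCATE finds the position of the last '.' counted from the end,
-- so LEFT keeps everything before the final domain label.
SELECT LEFT(e, LENGTH(e) - LOCATE('.', REVERSE(e))) AS common
FROM (SELECT 'my.name.totoro@shibly.com' AS e) AS t;
-- common: my.name.totoro@shibly
```

Note that `LENGTH` counts bytes while `LEFT` counts characters; for plain ASCII addresses the two agree, but `CHAR_LENGTH` would likely be the safer choice if addresses can contain multibyte characters.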
Answer 1 (score: 0)
http://sqlfiddle.com/#!9/2a6fa/2
SELECT
u.email,
GROUP_CONCAT(c.email) clones
FROM users u
INNER JOIN users c
ON u.id <> c.id
AND SUBSTRING(u.email,1,LENGTH(u.email)-LENGTH(SUBSTRING_INDEX(u.email,'.',-1)))
= SUBSTRING(c.email,1,LENGTH(c.email)-LENGTH(SUBSTRING_INDEX(c.email,'.',-1)))
GROUP BY u.id
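This query returns more rows than strictly needed, because every pair is matched in both directions (.com = .net
as well as .net = .com),
for example: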
rec1 = my.name.totoro@shibly.com my.name.totoro@shibly.net
rec2 = my.name.totoro@shibly.net my.name.totoro@shibly.com
This can be resolved by adding a filter before the GROUP BY clause:
WHERE u.email = "my.name.totoro@shibly.com"
which returns all duplicates of one specific email when needed.
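If the goal is to list every duplicate pair in the table exactly once, rather than to look up one address, an alternative is to order the join on `id`. This is only a sketch: it assumes `id` is a unique, comparable key on `users`, as the join condition in the query above suggests, and the `clone` alias is introduced here for illustration.

```sql
-- u.id < c.id keeps one row per pair, so (.com, .net) is listed
-- but the mirrored (.net, .com) row is not.
SELECT
  u.email AS email,
  c.email AS clone
FROM users u
INNER JOIN users c
  ON u.id < c.id
 AND SUBSTRING(u.email, 1, LENGTH(u.email) - LENGTH(SUBSTRING_INDEX(u.email, '.', -1)))
   = SUBSTRING(c.email, 1, LENGTH(c.email) - LENGTH(SUBSTRING_INDEX(c.email, '.', -1)));
```

With three or more addresses sharing the same prefix, this form returns one row per pair rather than one aggregated row per user, so it trades the GROUP_CONCAT summary for a flat pair list.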