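I'm searching an address table for duplicates using SOUNDEX. This works well, but it requires all five SOUNDEX columns to match before rows are grouped.
However, I'd like to GROUP rows where any three of my five SOUNDEX columns match.
This is my current query: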
SELECT `Address`.`id`,
       SOUNDEX(`Address`.`address_company_name`) AS soundex_address_company_name,
       SOUNDEX(`Address`.`contact_name`) AS soundex_contact_name,
       SOUNDEX(`Address`.`street_address`) AS soundex_street_address,
       SOUNDEX(`Address`.`suburb`) AS soundex_suburb,
       SOUNDEX(`Address`.`city`) AS soundex_city,
       `Address`.`address_country_id`,
       `Address`.`address_zone_id`,
       `Address`.`postcode`,
       COUNT(*)
FROM
    `addresses` AS `Address`
WHERE
    ((`Address`.`address_company_name` IS NOT NULL)
     OR (`Address`.`contact_name` IS NOT NULL))
GROUP BY
    SOUNDEX(address_company_name),
    SOUNDEX(contact_name),
    SOUNDEX(street_address),
    SOUNDEX(suburb),
    SOUNDEX(city),
    address_country_id,
    address_zone_id,
    postcode
HAVING
    COUNT(*) > 1
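I understand how to do this with multiple queries, i.e. loop over every address in the database and then re-query the database for addresses that match on any 3 of the 5 columns, but I'd like to do it in fewer queries, because the query above runs very fast.
I also understand that, if this is possible, some records may be grouped more than once. I don't mind if that is the case, but I'm not sure whether this flies in the face of MySQL's logic?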
Answer 0 (score: 0)
You could try something like this:
SELECT a.id, b.id id2, COUNT(*) no_matches
FROM
(
    SELECT id,
           column_id,
           CASE column_id
               WHEN 1 THEN SOUNDEX(address_company_name)
               WHEN 2 THEN SOUNDEX(contact_name)
               WHEN 3 THEN SOUNDEX(street_address)
               WHEN 4 THEN SOUNDEX(suburb)
               WHEN 5 THEN SOUNDEX(city)
           END column_value
    FROM addresses a CROSS JOIN
    (
        SELECT 1 column_id UNION ALL
        SELECT 2 UNION ALL
        SELECT 3 UNION ALL
        SELECT 4 UNION ALL
        SELECT 5
    ) i
    WHERE address_company_name IS NOT NULL
       OR contact_name IS NOT NULL
) a CROSS JOIN
(
    SELECT id,
           column_id,
           CASE column_id
               WHEN 1 THEN SOUNDEX(address_company_name)
               WHEN 2 THEN SOUNDEX(contact_name)
               WHEN 3 THEN SOUNDEX(street_address)
               WHEN 4 THEN SOUNDEX(suburb)
               WHEN 5 THEN SOUNDEX(city)
           END column_value
    FROM addresses a CROSS JOIN
    (
        SELECT 1 column_id UNION ALL
        SELECT 2 UNION ALL
        SELECT 3 UNION ALL
        SELECT 4 UNION ALL
        SELECT 5
    ) i
    WHERE address_company_name IS NOT NULL
       OR contact_name IS NOT NULL
) b
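-- Compare like with like: without the column_id check, a suburb code
-- could be counted as matching a city code, inflating no_matches.
WHERE a.column_id = b.column_id
  AND a.column_value = b.column_value
  AND a.id < b.id
GROUP BY a.id, b.id
HAVING COUNT(*) > 2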
Sample output:
| ID | ID2 | NO_MATCHES |
|----|-----|------------|
| 1  | 2   | 4          |
| 4  | 5   | 3          |
Here is a SQLFiddle demo.
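For comparison, the same "any 3 of 5" idea can probably be written without unpivoting the columns. The following is a minimal sketch, not a tested drop-in: it assumes the `addresses` table and column names from the question, and it relies on MySQL evaluating a comparison to 1, 0, or NULL and on MySQL allowing `HAVING` to reference a select alias. Each pair of rows is compared once (`a.id < b.id`), and a column counts as a match only when the same column has the same SOUNDEX code on both rows.

```sql
-- Sketch: count, per pair of addresses, how many of the five SOUNDEX codes agree.
-- Table and column names are taken from the question; adjust to your schema.
SELECT a.id,
       b.id AS id2,
       -- A comparison yields 1, 0, or NULL in MySQL; COALESCE maps NULL to 0
       -- so a NULL column never counts as a match and never nulls the sum.
       COALESCE(SOUNDEX(a.address_company_name) = SOUNDEX(b.address_company_name), 0)
     + COALESCE(SOUNDEX(a.contact_name)         = SOUNDEX(b.contact_name), 0)
     + COALESCE(SOUNDEX(a.street_address)       = SOUNDEX(b.street_address), 0)
     + COALESCE(SOUNDEX(a.suburb)               = SOUNDEX(b.suburb), 0)
     + COALESCE(SOUNDEX(a.city)                 = SOUNDEX(b.city), 0) AS no_matches
FROM addresses a
JOIN addresses b
  ON a.id < b.id                       -- each unordered pair once, no self-pairs
WHERE (a.address_company_name IS NOT NULL OR a.contact_name IS NOT NULL)
  AND (b.address_company_name IS NOT NULL OR b.contact_name IS NOT NULL)
HAVING no_matches >= 3                 -- at least 3 of the 5 columns match
ORDER BY a.id, b.id;
```

Like the query above, this compares every address with every other address, so the work grows with the square of the table size and it will likely be much slower than the original GROUP BY on a large table. If country, zone, and postcode must still match exactly, as in the question's query, adding those equalities to the `ON` clause should also cut down the number of pairs considered.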