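We are working with a table of about 13 million rows. Our goal is to find duplicates within a single restaurant (about 300,000 rows) in that table. Our criteria for a duplicate are: the same last name, the same first 2 letters of the first name, and the same phone or email. Each of these is its own column. Our current strategy is to build two identical temporary tables holding all of the restaurant's rows, join them on the criteria above, and then return id, firstname, lastname, phone and email from the first table.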
SELECT
DISTINCT t1.id, t1.firstname, t1.lastname, t1.phone, t1.email
FROM
(
SELECT lmoc.id, lmoc.firstname, lmoc.lastname, lmoc.phone, lmoc.email
FROM loyalty_member_opentable_customer lmoc
WHERE lmoc.opentable_restaurant_id=2296
AND lmoc.lastname NOT LIKE '%Tour%'
) AS t1
INNER JOIN
(
SELECT lmoc2.id, lmoc2.firstname, lmoc2.lastname, lmoc2.phone, lmoc2.email
FROM loyalty_member_opentable_customer lmoc2
WHERE lmoc2.opentable_restaurant_id=2296
AND lmoc2.lastname NOT LIKE '%Tour%'
) AS t2
ON STRCMP(t1.lastname,t2.lastname)=0
AND t1.id!=t2.id
AND STRCMP(LEFT(t1.firstname,2),LEFT(t2.firstname,2))=0
AND (STRCMP(t1.phone,t2.phone)=0 OR STRCMP(t1.email,t2.email)=0)
ORDER BY t1.lastname, t1.firstname
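The problem is that this query takes 48 hours to run. Can anyone think of a more efficient way to run it? We need all of the duplicates so that the restaurant can merge them as they see fit.
Answer 0 (score: 1)
Why not simply do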
SELECT lmoc.lastname, lmoc.firstname, lmoc.phone, lmoc.email
FROM loyalty_member_opentable_customer lmoc
WHERE lmoc.opentable_restaurant_id=2296
AND lmoc.lastname NOT LIKE '%Tour%'
GROUP BY lmoc.lastname, LEFT(lmoc.firstname, 2), lmoc.phone, lmoc.email
HAVING COUNT(*) > 1;
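One thing worth checking with the grouped query above: it groups on phone and email together, so it only reports rows that match on both, while the question asks for phone or email. The sketch below is a toy reproduction, not the poster's schema: it uses Python's built-in sqlite3 as a stand-in for MySQL (so `substr` replaces `LEFT`), a hypothetical `customer` table, and made-up rows. It runs one grouping per contact column and unions the matching ids.

```python
import sqlite3

SAMPLE_ROWS = [
    # (id, firstname, lastname, phone, email): made-up data for illustration
    (1, "John", "Smith", "555-0100", "john@example.com"),
    (2, "Jonathan", "Smith", "555-0100", "jsmith@example.com"),  # same phone as 1
    (3, "Jo", "Smith", "555-0199", "john@example.com"),          # same email as 1
    (4, "Mary", "Smith", "555-0100", "mary@example.com"),        # different first-name prefix
    (5, "John", "Jones", "555-0100", "john@example.com"),        # different last name
]

def load(rows):
    """Build an in-memory table (hypothetical name: customer)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, firstname TEXT, "
        "lastname TEXT, phone TEXT, email TEXT)"
    )
    con.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", rows)
    return con

def strict_groups(con):
    """Groups that match on phone AND email, the shape of the query above."""
    return con.execute("""
        SELECT lastname, substr(firstname, 1, 2), phone, email, COUNT(*)
        FROM customer
        GROUP BY lastname, substr(firstname, 1, 2), phone, email
        HAVING COUNT(*) > 1
    """).fetchall()

def ids_sharing(con, column):
    """Ids of rows sharing last name, first-name prefix and one contact column."""
    assert column in ("phone", "email")  # fixed identifiers only, never user input
    sql = f"""
        SELECT c.id
        FROM customer c
        JOIN (SELECT lastname, substr(firstname, 1, 2) AS fn2, {column} AS contact
              FROM customer
              GROUP BY lastname, substr(firstname, 1, 2), {column}
              HAVING COUNT(*) > 1) g
          ON c.lastname = g.lastname
         AND substr(c.firstname, 1, 2) = g.fn2
         AND c.{column} = g.contact
    """
    return {row[0] for row in con.execute(sql)}

def duplicate_ids(con):
    """Union of the phone-based and email-based matches (the OR criterion)."""
    return sorted(ids_sharing(con, "phone") | ids_sharing(con, "email"))
```

On the sample rows, the strict grouping finds no group at all, while the two-pass version flags rows 1, 2 and 3. Each pass is a single aggregation rather than a row-by-row self-join, which is the reason to prefer this shape on a large table.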
Answer 1 (score: 1)
This SQL will help you find the duplicates
SELECT MIN(lmoc.id) AS id, MIN(lmoc.firstname) AS firstname, lmoc.lastname, lmoc.phone, lmoc.email
FROM loyalty_member_opentable_customer lmoc
WHERE lmoc.opentable_restaurant_id=2296
AND lmoc.lastname NOT LIKE '%Tour%'
AND lmoc.lastname BETWEEN 'ha' AND 'i'
-- lmoc.id is left out of the GROUP BY: assuming id is unique per row, grouping on it would make every group a single row
GROUP BY lmoc.opentable_restaurant_id, LEFT(lmoc.firstname,2), lmoc.lastname, lmoc.phone, lmoc.email
HAVING COUNT(*) > 1;
If you have a primary key, you can easily keep the most recent row and purge the older ones with this SQL
-- MySQL multi-table DELETE names the table alias to delete from, not a column
DELETE lmoc
FROM loyalty_member_opentable_customer lmoc
LEFT JOIN
(SELECT
MAX(lmoc.primary_id) AS id
FROM loyalty_member_opentable_customer lmoc
WHERE lmoc.opentable_restaurant_id=2296
AND lmoc.lastname NOT LIKE '%Tour%'
AND lmoc.lastname BETWEEN 'ha' AND 'i'
-- the row id is not part of the grouping, otherwise every row would be its own group
GROUP BY lmoc.opentable_restaurant_id, LEFT(lmoc.firstname,2), lmoc.lastname, lmoc.phone, lmoc.email
) nodup
ON lmoc.primary_id = nodup.id
WHERE lmoc.opentable_restaurant_id=2296
AND lmoc.lastname NOT LIKE '%Tour%'
AND lmoc.lastname BETWEEN 'ha' AND 'i'
-- rows with no match in nodup are the older copies
AND nodup.id IS NULL;
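The keep-the-newest idea can be tried safely on a toy table before touching production data. The sketch below is only an illustration under stated assumptions: it uses Python's sqlite3 instead of MySQL, a hypothetical `customer` table with made-up rows, and a `NOT IN` subquery because SQLite has no multi-table `DELETE ... JOIN`. Like the answer, it treats rows as duplicates only when phone and email both match.

```python
import sqlite3

def make_customer_table(rows):
    """Create an in-memory table (hypothetical name: customer) from sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, firstname TEXT, "
        "lastname TEXT, phone TEXT, email TEXT)"
    )
    con.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", rows)
    return con

def purge_older_duplicates(con):
    """Keep the highest id in each duplicate group, delete the rest.

    Returns the number of rows deleted.
    """
    cur = con.execute("""
        DELETE FROM customer
        WHERE id NOT IN (
            SELECT MAX(id)
            FROM customer
            GROUP BY lastname, substr(firstname, 1, 2), phone, email
        )
    """)
    con.commit()
    return cur.rowcount

def remaining_ids(con):
    """Ids still present in the table, in ascending order."""
    return [row[0] for row in con.execute("SELECT id FROM customer ORDER BY id")]

PURGE_SAMPLE = [
    # made-up data: rows 1 and 2 form one duplicate group, row 3 stands alone
    (1, "John", "Smith", "555-0100", "john@example.com"),
    (2, "Jonathan", "Smith", "555-0100", "john@example.com"),
    (3, "Mary", "Smith", "555-0100", "mary@example.com"),
]
```

Since the question says the restaurant wants to merge duplicates as it sees fit, a destructive purge like this is probably best kept for a copy of the data, or run only after the duplicate list has been reviewed.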
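Answer 2 (score: 1)
You are not creating temporary tables, you are using subqueries, and against 13 million rows at that. Create a real temporary table holding all the data you need (SELECT INTO, or CREATE TABLE ... AS SELECT in MySQL).
Here is what I would try: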
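/* Creating a working table. MySQL has no SELECT ... INTO <table>; CREATE TABLE ... AS SELECT is the equivalent. */
/* A plain table is used because MySQL cannot reference a TEMPORARY table twice in one query, which the self-join needs. */
CREATE TABLE tempRestaurant AS
SELECT lmoc.id, lmoc.firstname, lmoc.lastname, lmoc.phone, lmoc.email
FROM loyalty_member_opentable_customer AS lmoc
WHERE
lmoc.opentable_restaurant_id=2296 AND
lmoc.lastname NOT LIKE '%Tour%';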
/* Select duplicates */
SELECT * FROM
tempRestaurant AS t1
INNER JOIN tempRestaurant AS t2 ON
STRCMP(t1.lastname,t2.lastname)=0
AND t1.id!=t2.id
WHERE
STRCMP(LEFT(t1.firstname,2), LEFT(t2.firstname,2))=0 AND
( STRCMP(t1.phone,t2.phone)=0 OR STRCMP(t1.email,t2.email)=0 )
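The working-table approach can be sketched end to end as follows. This is a toy version under stated assumptions, not a tested fix for the 48-hour query: it runs on Python's sqlite3 rather than MySQL, the `customer` table, the `fn2` prefix column, and the index name are hypothetical, and the rows are made up. It also swaps `STRCMP(a,b)=0` for plain `=` comparisons and adds an index on the join columns, on the general expectation that an optimizer can use an index for a bare column equality but not for a comparison wrapped in a function call.

```python
import sqlite3

WORK_SAMPLE = [
    # (id, restaurant_id, firstname, lastname, phone, email): made-up data
    (1, 2296, "John", "Smith", "555-0100", "john@example.com"),
    (2, 2296, "Jonathan", "Smith", "555-0100", "jsmith@example.com"),  # same phone as 1
    (3, 2296, "Jo", "Smith", "555-0199", "john@example.com"),          # same email as 1
    (4, 2296, "Mary", "Smith", "555-0100", "mary@example.com"),        # different prefix
    (5, 9999, "John", "Smith", "555-0100", "john@example.com"),        # other restaurant
    (6, 2296, "John", "Tours", "555-0100", "john@example.com"),        # excluded last name
    (7, 2296, "Johnny", "Tours", "555-0100", "jt@example.com"),        # excluded last name
]

def duplicates_via_working_table(rows, restaurant_id):
    """Copy one restaurant's rows into a working table, index it, then self-join."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, restaurant_id INTEGER, "
        "firstname TEXT, lastname TEXT, phone TEXT, email TEXT)"
    )
    con.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?, ?)", rows)

    # Working table with the first-name prefix stored once (fn2 is an added column).
    con.execute(
        "CREATE TABLE tempRestaurant (id INTEGER PRIMARY KEY, firstname TEXT, "
        "fn2 TEXT, lastname TEXT, phone TEXT, email TEXT)"
    )
    con.execute(
        """
        INSERT INTO tempRestaurant
        SELECT id, firstname, substr(firstname, 1, 2), lastname, phone, email
        FROM customer
        WHERE restaurant_id = ? AND lastname NOT LIKE '%Tour%'
        """,
        (restaurant_id,),
    )

    # Index on the equality columns of the join.
    con.execute("CREATE INDEX idx_temp_name ON tempRestaurant (lastname, fn2)")

    cur = con.execute("""
        SELECT DISTINCT t1.id
        FROM tempRestaurant AS t1
        JOIN tempRestaurant AS t2
          ON t1.lastname = t2.lastname
         AND t1.fn2 = t2.fn2
         AND t1.id != t2.id
        WHERE t1.phone = t2.phone OR t1.email = t2.email
        ORDER BY t1.id
    """)
    return [row[0] for row in cur]
```

With the sample rows and restaurant 2296, the function returns ids 1, 2 and 3. Storing the prefix in its own column is a design choice made here so the join compares plain columns; in MySQL the same effect could be had by adding a column to the working table when it is created.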