Optimizing a SQL duplicate search

Time: 2014-07-22 14:51:51

Tags: mysql

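We are working with a table of about 13 million rows. Our goal is to find duplicates in this table for a single restaurant only (about 300,000 rows). Two rows count as duplicates when they have the same last name, the same first two letters of the first name, and either the same phone or the same email. Each of these is its own column. Our current strategy is to build two identical temporary tables holding all of the restaurant's rows, join them on the criteria above, and return the id, first name, last name, phone, and email from the first table.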

SELECT 
    DISTINCT t1.id, t1.firstname, t1.lastname, t1.phone, t1.email
FROM 
(
    SELECT lmoc.id, lmoc.firstname, lmoc.lastname, lmoc.phone, lmoc.email
    FROM loyalty_member_opentable_customer lmoc
    WHERE lmoc.opentable_restaurant_id=2296 
      AND lmoc.lastname NOT LIKE '%Tour%' 
) AS t1
INNER JOIN 
(
    SELECT lmoc2.id, lmoc2.firstname, lmoc2.lastname, lmoc2.phone, lmoc2.email
    FROM loyalty_member_opentable_customer lmoc2
    WHERE lmoc2.opentable_restaurant_id=2296 
      AND lmoc2.lastname NOT LIKE '%Tour%' 
) AS t2 
   ON STRCMP(t1.lastname,t2.lastname)=0 
  AND t1.id!=t2.id 
  AND STRCMP(LEFT(t1.firstname,2),LEFT(t2.firstname,2))=0 
  AND (STRCMP(t1.phone,t2.phone)=0 OR STRCMP(t1.email,t2.email)=0)
ORDER BY t1.lastname, t1.firstname

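The problem is that this query takes 48 hours to run. Can anyone think of a more efficient way to run it? We need all of the duplicates so that the restaurant can merge them as they see fit.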

3 answers:

Answer 0: (score: 1)

Why not simply do

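-- One row per group of rows that share last name, first-name prefix, phone and email.
-- The first-name prefix is selected instead of firstname so that every selected
-- column is covered by GROUP BY (required when ONLY_FULL_GROUP_BY is enabled).
SELECT lmoc.lastname, LEFT(lmoc.firstname, 2) AS firstname_prefix, lmoc.phone, lmoc.email,
       COUNT(*) AS duplicate_count
FROM loyalty_member_opentable_customer lmoc
WHERE lmoc.opentable_restaurant_id=2296 
  AND lmoc.lastname NOT LIKE '%Tour%'
GROUP BY lmoc.lastname, LEFT(lmoc.firstname, 2), lmoc.phone, lmoc.email
HAVING COUNT(*) > 1;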
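
Note that grouping by phone and email together only catches rows where both match, and it returns one row per group rather than every duplicate row. The question asks for phone or email, and for all of the rows. A minimal sketch of one way to cover that, assuming the column names from the question: build the duplicated keys once per criterion with GROUP BY, join each key set back to the restaurant's rows, and combine the two results with UNION. The aliases c, dup_phone, dup_email, and fn2 are made up for illustration, and the query has not been run against the real table.

```sql
-- Rows that share (lastname, first two letters of firstname, phone) with another row.
(SELECT c.id, c.firstname, c.lastname, c.phone, c.email
 FROM loyalty_member_opentable_customer c
 INNER JOIN (
     SELECT lastname, LEFT(firstname, 2) AS fn2, phone
     FROM loyalty_member_opentable_customer
     WHERE opentable_restaurant_id = 2296
       AND lastname NOT LIKE '%Tour%'
     GROUP BY lastname, LEFT(firstname, 2), phone
     HAVING COUNT(*) > 1
 ) dup_phone
    ON c.lastname = dup_phone.lastname
   AND LEFT(c.firstname, 2) = dup_phone.fn2
   AND c.phone = dup_phone.phone
 WHERE c.opentable_restaurant_id = 2296
   AND c.lastname NOT LIKE '%Tour%')
UNION
-- Rows that share (lastname, first two letters of firstname, email) with another row.
(SELECT c.id, c.firstname, c.lastname, c.phone, c.email
 FROM loyalty_member_opentable_customer c
 INNER JOIN (
     SELECT lastname, LEFT(firstname, 2) AS fn2, email
     FROM loyalty_member_opentable_customer
     WHERE opentable_restaurant_id = 2296
       AND lastname NOT LIKE '%Tour%'
     GROUP BY lastname, LEFT(firstname, 2), email
     HAVING COUNT(*) > 1
 ) dup_email
    ON c.lastname = dup_email.lastname
   AND LEFT(c.firstname, 2) = dup_email.fn2
   AND c.email = dup_email.email
 WHERE c.opentable_restaurant_id = 2296
   AND c.lastname NOT LIKE '%Tour%')
-- UNION removes rows returned by both branches; the ORDER BY sorts the combined result.
ORDER BY lastname, firstname;
```

Rows whose phone (or email) is NULL drop out of the corresponding branch because the equality join never matches NULL, which is the same behavior as the STRCMP(...)=0 comparisons in the question.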

Answer 1: (score: 1)

This SQL will help you find the duplicates

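-- id is left out of GROUP BY: if id is unique per row, grouping by it puts
-- every row in its own group and HAVING COUNT(*) > 1 matches nothing.
SELECT MIN(lmoc.id) AS first_id, LEFT(lmoc.firstname,2) AS firstname_prefix,
       lmoc.lastname, lmoc.phone, lmoc.email, COUNT(*) AS duplicate_count
FROM loyalty_member_opentable_customer lmoc
WHERE lmoc.opentable_restaurant_id=2296 
  AND lmoc.lastname NOT LIKE '%Tour%' 
  AND lmoc.lastname BETWEEN 'ha' AND 'i'   -- limits this run to one slice of last names
GROUP BY lmoc.opentable_restaurant_id, LEFT(lmoc.firstname,2), lmoc.lastname, lmoc.phone, lmoc.email
HAVING COUNT(*) > 1;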

If you have a primary key, you can easily keep the most recent row of each group and purge the older ones with this SQL

-- primary_id stands for the table's primary key column.
-- Multi-table DELETE names the table alias to delete from, not a column.
DELETE 
        lmoc
FROM loyalty_member_opentable_customer lmoc
LEFT JOIN
    -- nodup: the newest row (highest primary key) of each group, which is kept
    (SELECT 
        MAX(lmoc.primary_id) AS id
    FROM loyalty_member_opentable_customer lmoc
    WHERE lmoc.opentable_restaurant_id=2296 
        AND lmoc.lastname NOT LIKE '%Tour%' 
        AND lmoc.lastname BETWEEN 'ha' AND 'i'
    GROUP BY lmoc.opentable_restaurant_id, LEFT(lmoc.firstname,2), lmoc.lastname, lmoc.phone, lmoc.email
    ) nodup 
    ON lmoc.primary_id = nodup.id
WHERE lmoc.opentable_restaurant_id=2296 
        AND lmoc.lastname NOT LIKE '%Tour%' 
        AND lmoc.lastname BETWEEN 'ha' AND 'i'
        -- rows with no match in nodup are the older copies
        AND nodup.id IS NULL;

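Answer 2: (score: 1)

You are not creating temporary tables; you are using subqueries, and each of them runs against the full 13 million rows. Create a real temporary table (SELECT INTO) that holds all the data you need.

This is what I would try: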

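/* Creating a working table.
   MySQL has no SELECT ... INTO <new table>; CREATE TABLE ... AS SELECT is its equivalent.
   A regular table is used because MySQL cannot refer to a TEMPORARY table
   twice in the same query, which the self-join below does.
   Drop the table when you are done. */
CREATE TABLE tempRestaurant AS
SELECT lmoc.id, lmoc.firstname, lmoc.lastname, lmoc.phone, lmoc.email
FROM loyalty_member_opentable_customer AS lmoc
WHERE
  lmoc.opentable_restaurant_id=2296 AND
  lmoc.lastname NOT LIKE '%Tour%';

/* Select duplicates */
SELECT * FROM 
  tempRestaurant AS t1 
INNER JOIN tempRestaurant AS t2 ON 
  STRCMP(t1.lastname,t2.lastname)=0 
  AND t1.id!=t2.id 
WHERE
  STRCMP(LEFT(t1.firstname,2), LEFT(t2.firstname,2))=0 AND
  ( STRCMP(t1.phone,t2.phone)=0 OR STRCMP(t1.email,t2.email)=0 );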
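
A further step that may help, offered as an untested sketch: index the working table and compare the columns with plain equality. Wrapping both sides in STRCMP() keeps MySQL from using an index for the join, and the OR between phone and email makes a single index hard to use, so the sketch splits the OR into two joins combined with UNION. It assumes tempRestaurant exists as a regular (non-TEMPORARY) table and that lastname, phone, and email are short enough to index in full; the index names are hypothetical.

```sql
-- Hypothetical index names; add prefix lengths if the columns are too long to index in full.
ALTER TABLE tempRestaurant
    ADD INDEX idx_lastname_phone (lastname, phone),
    ADD INDEX idx_lastname_email (lastname, email);

-- Branch 1: same last name, same first-name prefix, same phone.
(SELECT t1.id, t1.firstname, t1.lastname, t1.phone, t1.email
 FROM tempRestaurant AS t1
 INNER JOIN tempRestaurant AS t2
    ON t1.lastname = t2.lastname
   AND t1.phone = t2.phone
   AND t1.id <> t2.id
 WHERE LEFT(t1.firstname, 2) = LEFT(t2.firstname, 2))
UNION
-- Branch 2: same last name, same first-name prefix, same email.
(SELECT t1.id, t1.firstname, t1.lastname, t1.phone, t1.email
 FROM tempRestaurant AS t1
 INNER JOIN tempRestaurant AS t2
    ON t1.lastname = t2.lastname
   AND t1.email = t2.email
   AND t1.id <> t2.id
 WHERE LEFT(t1.firstname, 2) = LEFT(t2.firstname, 2))
-- UNION drops rows found by both branches, taking the place of DISTINCT.
ORDER BY lastname, firstname;
```

Running EXPLAIN on each branch shows whether the optimizer actually picks the new indexes for the join on your data.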