Finding duplicates across two related tables

Asked: 2014-02-26 05:13:28

Tags: mysql

Table 1 : Contacts

id   | name
------------
1    | John
2    | Shawn
3    | Rachael 


Table 2 : emails

id | contact_id | email_addr
----------------------------
1  |     1      | j@gmail.com
2  |     2      | j@gmail.com
3  |     3      | r@gmail.com 

Suppose I look for duplicates on email_addr. I should get this result:

contact_id | name  | email_addr
---------------------------------
     1     | John  | j@gmail.com
     2     | Shawn | j@gmail.com

That is, I should get every contact whose email address is shared with another contact.
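
To reproduce the example, the two tables could be set up roughly as follows. This is only a sketch: the column types, the primary keys, and the foreign key are assumptions, since the question shows only column names and sample rows.

```sql
-- Sample schema and data matching the tables shown in the question.
-- Column types, keys, and the foreign key are assumed, not stated in the question.
CREATE TABLE Contacts (
    id   INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE emails (
    id         INT PRIMARY KEY,
    contact_id INT NOT NULL,
    email_addr VARCHAR(255) NOT NULL,
    FOREIGN KEY (contact_id) REFERENCES Contacts (id)
);

INSERT INTO Contacts (id, name) VALUES
    (1, 'John'),
    (2, 'Shawn'),
    (3, 'Rachael');

INSERT INTO emails (id, contact_id, email_addr) VALUES
    (1, 1, 'j@gmail.com'),
    (2, 2, 'j@gmail.com'),
    (3, 3, 'r@gmail.com');
```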

I used the following query:

SELECT contact_id
  FROM emails
 WHERE email_addr IN (SELECT S.email_addr
                        FROM Contacts R
                       INNER JOIN emails S ON R.id = S.contact_id
                       GROUP BY S.email_addr
                      HAVING COUNT(S.contact_id) > 1
                     );

This query takes a long time to run even on a small table of, say, 1000 records. Please help me optimize it.

5 answers:

Answer 0: (score: 0)

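You should replace the IN with a join, and you should avoid joining inside the subquery. Join Contacts in the outer query only, to pick up the name:

SELECT A.contact_id, N.name, A.email_addr
  FROM emails AS A
  JOIN (SELECT email_addr
          FROM emails
         GROUP BY email_addr
        HAVING COUNT(*) > 1
       ) AS C
    ON C.email_addr = A.email_addr
  JOIN Contacts AS N
    ON N.id = A.contact_id;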

Answer 1: (score: 0)

Try these indexes:

CREATE INDEX idx_email ON emails(email_addr,contact_id);

CREATE INDEX idx_id ON Contacts(id);
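
One way to check whether MySQL actually uses such an index is to prefix a duplicate-finding query with EXPLAIN and read the `key` column of the output. The query below is only an illustrative sketch against the `emails` table from the question; the plan you see depends on your MySQL version and data.

```sql
-- List the indexes that currently exist on the table.
SHOW INDEX FROM emails;

-- Show the execution plan; the `key` column names the index chosen
-- for each table access (NULL means no index is used).
EXPLAIN
SELECT e.contact_id, e.email_addr
  FROM emails AS e
  JOIN (SELECT email_addr
          FROM emails
         GROUP BY email_addr
        HAVING COUNT(*) > 1
       ) AS d
    ON d.email_addr = e.email_addr;
```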

Answer 2: (score: 0)

This query returns every row of the emails table whose address is also used by a different contact:

SELECT DISTINCT tbl2.*
  FROM emails AS tbl1
  JOIN emails AS tbl2
    ON tbl1.email_addr = tbl2.email_addr
   AND tbl1.contact_id <> tbl2.contact_id;

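Answer 3: (score: 0)

This is faster, but note that it returns one row per duplicated address rather than one row per contact, and MySQL rejects it when ONLY_FULL_GROUP_BY is enabled:

SELECT e.contact_id, c.name, e.email_addr
  FROM Contacts AS c
 INNER JOIN emails AS e ON c.id = e.contact_id
 GROUP BY e.email_addr
HAVING COUNT(e.email_addr) > 1;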
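
If one row per duplicated address is enough, the grouping can be made explicit so that every selected column is either grouped or aggregated. The following is a sketch of that variant using MySQL's GROUP_CONCAT; the output column names `contact_ids` and `names` are made up for illustration.

```sql
-- One row per duplicated address, with the contacts that share it
-- collected into comma-separated lists.
SELECT e.email_addr,
       GROUP_CONCAT(e.contact_id ORDER BY e.contact_id) AS contact_ids,
       GROUP_CONCAT(c.name ORDER BY e.contact_id)       AS names
  FROM Contacts AS c
  JOIN emails   AS e ON c.id = e.contact_id
 GROUP BY e.email_addr
HAVING COUNT(*) > 1;
```

With the sample data from the question this yields a single row for j@gmail.com listing contacts 1 and 2. To get one row per contact, as the question asks, the grouped result still has to be joined back to emails.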

Answer 4: (score: 0)

Try the following query:

SELECT a.contact_id
  FROM emails AS a
  JOIN (SELECT S.email_addr
          FROM Contacts AS R
          JOIN emails AS S ON R.id = S.contact_id
         GROUP BY S.email_addr
        HAVING COUNT(S.contact_id) > 1
       ) AS b
    ON a.email_addr = b.email_addr;

Note: for better results, the email_addr column should be indexed.