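Problem with an UPDATE query using JOIN

Time: 2013-11-28 07:44:01

Tags: mysql performance join

I have two tables named TempTable and AnotherTable, with the structures defined below. I have also included some sample rows from each table.

TempTable definition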

CREATE TABLE `TempTable` (
  `ROWNUMBER` bigint(19) NOT NULL DEFAULT '0',
  `email` text,
  `someid` bigint(19) DEFAULT NULL,
  `mappedid` bigint(19) DEFAULT NULL,
  PRIMARY KEY (`ROWNUMBER`),
  KEY `IDX_1` (`email`(100))
) ENGINE=InnoDB DEFAULT CHARSET=utf8    

AnotherTable definition

CREATE TABLE `AnotherTable` (
  `primaryid` bigint(19) NOT NULL DEFAULT '0',
  `email` text,
  PRIMARY KEY (`primaryid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> Select * from TempTable;

+-----------+----------------------+--------+----------+
| ROWNUMBER | email                | someid | mappedid |
+-----------+----------------------+--------+----------+
|         1 | email1@somewhere.com |    101 |     NULL |
|         2 | email1@somewhere.com |    102 |     NULL |
|         3 | email1@somewhere.com |    103 |     NULL |
|         4 | email1@somewhere.com |    104 |     NULL |
|         5 | email2@somewhere.com |    105 |     NULL |
|         6 | email2@somewhere.com |    106 |     NULL |
|         7 | email2@somewhere.com |    107 |     NULL |
|         8 | email3@somewhere.com |    108 |     NULL |
+-----------+----------------------+--------+----------+
8 rows in set (0.00 sec)

mysql> Select * from AnotherTable;

+-----------+----------------------+
| primaryid | email                |
+-----------+----------------------+
|       201 | email1@somewhere.com |
|       202 | email1@somewhere.com |
|       203 | email1@somewhere.com |
|       204 | email2@somewhere.com |
+-----------+----------------------+
4 rows in set (0.00 sec)

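Here, the mappedid column in TempTable refers to primaryid in AnotherTable. My goal is to update mappedid in TempTable by matching emails between TempTable and AnotherTable. I need to match on the email field only. So the result I want looks something like this: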

mysql> Select * from TempTable;

+-----------+----------------------+--------+----------+
| ROWNUMBER | email                | someid | mappedid |
+-----------+----------------------+--------+----------+
|         1 | email1@somewhere.com |    101 |     201  |
|         2 | email1@somewhere.com |    102 |     202  |
|         3 | email1@somewhere.com |    103 |     203  |
|         4 | email1@somewhere.com |    104 |     NULL |
|         5 | email2@somewhere.com |    105 |     204  |
|         6 | email2@somewhere.com |    106 |     NULL |
|         7 | email2@somewhere.com |    107 |     NULL |
|         8 | email3@somewhere.com |    108 |     NULL |
+-----------+----------------------+--------+----------+
8 rows in set (0.00 sec)

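Here, 201, 202, 203, and 204 each appear only once, and the remaining unmapped rows should be NULL. There should be no duplicate mappings in TempTable.

Note: In the real world, I don't think it is advisable to run SELECT queries against AnotherTable, because it will hold millions of records. So I am looking for an alternative, efficient way to update the data in TempTable. TempTable is a temporary table, so any number of operations on it are welcome.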

mysql> update TempTable inner join AnotherTable on TempTable.email= AnotherTable.email and TempTable.email!='' set TempTable.mappedid=AnotherTable.primaryid WHERE TempTable.mappedid is null;

Query OK, 7 rows affected (0.01 sec)
Rows matched: 7  Changed: 7  Warnings: 0

mysql> Select * from TempTable;

+-----------+----------------------+--------+----------+
| ROWNUMBER | email                | someid | mappedid |
+-----------+----------------------+--------+----------+
|         1 | email1@somewhere.com |    101 |      201 |
|         2 | email1@somewhere.com |    102 |      201 |
|         3 | email1@somewhere.com |    103 |      201 |
|         4 | email1@somewhere.com |    104 |      201 |
|         5 | email2@somewhere.com |    105 |      204 |
|         6 | email2@somewhere.com |    106 |      204 |
|         7 | email2@somewhere.com |    107 |      204 |
|         8 | email3@somewhere.com |    108 |     NULL |
+-----------+----------------------+--------+----------+
8 rows in set (0.00 sec)

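I tried the update query above using an inner join, but it created duplicate mappedid entries in TempTable, as shown above. To remove the redundant entries, my current option is to set all the duplicated entries to NULL and then run a SELECT on AnotherTable by email. After removing the redundant entries, the tables look like this: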

mysql> Select * from TempTable;

+-----------+----------------------+--------+----------+
| ROWNUMBER | email                | someid | mappedid |
+-----------+----------------------+--------+----------+
|         1 | email1@somewhere.com |    101 |      201 |
|         2 | email1@somewhere.com |    102 |     NULL |
|         3 | email1@somewhere.com |    103 |     NULL |
|         4 | email1@somewhere.com |    104 |     NULL |
|         5 | email2@somewhere.com |    105 |      204 |
|         6 | email2@somewhere.com |    106 |     NULL |
|         7 | email2@somewhere.com |    107 |     NULL |
|         8 | email3@somewhere.com |    108 |     NULL |
+-----------+----------------------+--------+----------+
8 rows in set (0.00 sec)

mysql> Select * from AnotherTable;

+-----------+----------------------+
| primaryid | email                |
+-----------+----------------------+
|       201 | email1@somewhere.com |
|       202 | email1@somewhere.com |
|       203 | email1@somewhere.com |
|       204 | email2@somewhere.com |
+-----------+----------------------+
4 rows in set (0.00 sec)

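Then I have to run "SELECT primaryid FROM AnotherTable WHERE email = 'email1@somewhere.com'" and update mappedid in TempTable based on the contents of the ResultSet. The problem is that, since I have 2 duplicated emails (email1@somewhere.com and email2@somewhere.com), I need to query AnotherTable 2 times. But if the number of duplicated emails grows to 100, I would have to query AnotherTable, which is already a heavy table, 100 times (BTW, the email column will be indexed in AnotherTable). I know this is not the right solution. Can you help me come up with an efficient solution that can handle a large number of records?

1 Answer:

Answer 0: (score: 1)

The fact is that the email column alone is not enough to join your tables correctly. In addition, you need some kind of position number within each email.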
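To see why email alone is not enough, it can help to count how many AnotherTable rows each TempTable row matches on email. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only to keep the example self-contained) and using the sample rows from the question:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TempTable (ROWNUMBER INTEGER PRIMARY KEY, email TEXT,
                        someid INTEGER, mappedid INTEGER);
CREATE TABLE AnotherTable (primaryid INTEGER PRIMARY KEY, email TEXT);
INSERT INTO TempTable (ROWNUMBER, email, someid) VALUES
  (1, 'email1@somewhere.com', 101), (2, 'email1@somewhere.com', 102),
  (3, 'email1@somewhere.com', 103), (4, 'email1@somewhere.com', 104),
  (5, 'email2@somewhere.com', 105), (6, 'email2@somewhere.com', 106),
  (7, 'email2@somewhere.com', 107), (8, 'email3@somewhere.com', 108);
INSERT INTO AnotherTable (primaryid, email) VALUES
  (201, 'email1@somewhere.com'), (202, 'email1@somewhere.com'),
  (203, 'email1@somewhere.com'), (204, 'email2@somewhere.com');
""")

# For every TempTable row, count the AnotherTable rows sharing its email.
# COUNT(a.primaryid) ignores the NULLs produced by the LEFT JOIN, so
# unmatched rows report 0 candidates.
candidates = conn.execute("""
SELECT t.ROWNUMBER, COUNT(a.primaryid) AS candidates
  FROM TempTable t
  LEFT JOIN AnotherTable a ON a.email = t.email
 GROUP BY t.ROWNUMBER
 ORDER BY t.ROWNUMBER
""").fetchall()

for rownumber, count in candidates:
    print(rownumber, count)
```

With the sample data, each email1 row has three candidates, each email2 row has one, and the email3 row has none. When a row has several candidates, or several rows share one candidate, a join on email has no basis for handing out a different primaryid to each TempTable row, which is the gap the position number fills.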

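-- Running counters (@n1, @n2) and last-seen emails (@g1, @g2) used to number
-- the rows within each email group. Note: MySQL does not guarantee the
-- evaluation order of user-variable assignments within a SELECT.
SET @n1 := 0, @g1 := NULL;
SET @n2 := 0, @g2 := NULL;

UPDATE TempTable t JOIN
(
  SELECT a.rownumber, b.primaryid
    FROM
  (
    -- Position of each TempTable row within its email group
    SELECT rownumber, email, @n1 := IF(@g1 = email, @n1 + 1, 1) rnum, @g1 := email
      FROM TempTable
     ORDER BY email, rownumber
  ) a LEFT JOIN
  (
    -- Position of each AnotherTable row within its email group
    SELECT primaryid, email, @n2 := IF(@g2 = email, @n2 + 1, 1) rnum, @g2 := email
      FROM AnotherTable
     ORDER BY email, primaryid
  ) b
      ON a.email = b.email
     AND a.rnum = b.rnum
   -- Keep only the TempTable rows that found a partner at the same position
   WHERE b.primaryid IS NOT NULL
) s
    ON t.rownumber = s.rownumber
   SET t.mappedid = s.primaryid;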

Here is a SQLFiddle demo
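The per-email position number can also be produced with the ROW_NUMBER() window function, which MySQL supports from version 8.0 onward, instead of user variables. The following is a minimal sketch of that idea, not the answer's own method: it assumes Python's built-in sqlite3 module as a stand-in for MySQL (SQLite 3.25 or later is needed for window functions), loads the sample rows from the question, and applies the computed pairs from Python rather than with UPDATE ... JOIN.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TempTable (ROWNUMBER INTEGER PRIMARY KEY, email TEXT,
                        someid INTEGER, mappedid INTEGER);
CREATE TABLE AnotherTable (primaryid INTEGER PRIMARY KEY, email TEXT);
INSERT INTO TempTable (ROWNUMBER, email, someid) VALUES
  (1, 'email1@somewhere.com', 101), (2, 'email1@somewhere.com', 102),
  (3, 'email1@somewhere.com', 103), (4, 'email1@somewhere.com', 104),
  (5, 'email2@somewhere.com', 105), (6, 'email2@somewhere.com', 106),
  (7, 'email2@somewhere.com', 107), (8, 'email3@somewhere.com', 108);
INSERT INTO AnotherTable (primaryid, email) VALUES
  (201, 'email1@somewhere.com'), (202, 'email1@somewhere.com'),
  (203, 'email1@somewhere.com'), (204, 'email2@somewhere.com');
""")

# Number the rows of each table within every email group, then pair the
# n-th TempTable row with the n-th AnotherTable row for the same email.
pairs = conn.execute("""
SELECT b.primaryid, a.ROWNUMBER
  FROM (SELECT ROWNUMBER, email,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY ROWNUMBER) AS rnum
          FROM TempTable) AS a
  JOIN (SELECT primaryid, email,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY primaryid) AS rnum
          FROM AnotherTable) AS b
    ON a.email = b.email
   AND a.rnum = b.rnum
""").fetchall()

# Write each paired primaryid back; rows without a partner keep mappedid NULL.
conn.executemany("UPDATE TempTable SET mappedid = ? WHERE ROWNUMBER = ?", pairs)

result = conn.execute(
    "SELECT ROWNUMBER, mappedid FROM TempTable ORDER BY ROWNUMBER"
).fetchall()
for row in result:
    print(row)
```

Each table is scanned once to number its rows, so AnotherTable is not queried once per duplicated email. On MySQL 8.0 or later, the same two derived tables could presumably be joined directly inside an UPDATE ... JOIN statement, in the same shape as the query above.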