I have two tables named TempTable and AnotherTable with the structures defined below. I have also included some sample rows for both tables.
TempTable definition
CREATE TABLE `TempTable` (
`ROWNUMBER` bigint(19) NOT NULL DEFAULT '0',
`email` text,
`someid` bigint(19) DEFAULT NULL,
`mappedid` bigint(19) DEFAULT NULL,
PRIMARY KEY (`ROWNUMBER`),
KEY `IDX_1` (`email`(100))
) ENGINE=InnoDB DEFAULT CHARSET=utf8
AnotherTable definition
CREATE TABLE `AnotherTable` (
`primaryid` bigint(19) NOT NULL DEFAULT '0',
`email` text,
PRIMARY KEY (`primaryid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
mysql> Select * from TempTable;
+-----------+----------------------+--------+----------+
| ROWNUMBER | email | someid | mappedid |
+-----------+----------------------+--------+----------+
| 1 | email1@somewhere.com | 101 | NULL |
| 2 | email1@somewhere.com | 102 | NULL |
| 3 | email1@somewhere.com | 103 | NULL |
| 4 | email1@somewhere.com | 104 | NULL |
| 5 | email2@somewhere.com | 105 | NULL |
| 6 | email2@somewhere.com | 106 | NULL |
| 7 | email2@somewhere.com | 107 | NULL |
| 8 | email3@somewhere.com | 108 | NULL |
+-----------+----------------------+--------+----------+
8 rows in set (0.00 sec)
mysql> Select * from AnotherTable;
+-----------+----------------------+
| primaryid | email |
+-----------+----------------------+
| 201 | email1@somewhere.com |
| 202 | email1@somewhere.com |
| 203 | email1@somewhere.com |
| 204 | email2@somewhere.com |
+-----------+----------------------+
4 rows in set (0.00 sec)
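Here, the mappedid column in TempTable relates to primaryid in AnotherTable. My goal is to update mappedid in TempTable based on emails that match between TempTable and AnotherTable. I need to match on the email field only. So the result I want looks like this: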
mysql> Select * from TempTable;
+-----------+----------------------+--------+----------+
| ROWNUMBER | email | someid | mappedid |
+-----------+----------------------+--------+----------+
| 1 | email1@somewhere.com | 101 | 201 |
| 2 | email1@somewhere.com | 102 | 202 |
| 3 | email1@somewhere.com | 103 | 203 |
| 4 | email1@somewhere.com | 104 | NULL |
| 5 | email2@somewhere.com | 105 | 204 |
| 6 | email2@somewhere.com | 106 | NULL |
| 7 | email2@somewhere.com | 107 | NULL |
| 8 | email3@somewhere.com | 108 | NULL |
+-----------+----------------------+--------+----------+
8 rows in set (0.00 sec)
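Here, 201, 202, 203 and 204 each appear only once, and the remaining unmapped rows should be NULL. There should not be any duplicate mappings in TempTable.
Note: In the real world, I don't think it is advisable to run repeated select queries against AnotherTable, because it will hold millions of records. So I am looking for an alternative, efficient way to update the data in TempTable. TempTable is a temporary table, and any number of operations on it is welcome.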
mysql> update TempTable inner join AnotherTable
on TempTable.email= AnotherTable.email and TempTable.email!=''
set TempTable.mappedid=AnotherTable.primaryid
WHERE TempTable.mappedid is null;
Query OK, 7 rows affected (0.01 sec)
Rows matched: 7  Changed: 7  Warnings: 0
mysql> Select * from TempTable;
+-----------+----------------------+--------+----------+
| ROWNUMBER | email | someid | mappedid |
+-----------+----------------------+--------+----------+
| 1 | email1@somewhere.com | 101 | 201 |
| 2 | email1@somewhere.com | 102 | 201 |
| 3 | email1@somewhere.com | 103 | 201 |
| 4 | email1@somewhere.com | 104 | 201 |
| 5 | email2@somewhere.com | 105 | 204 |
| 6 | email2@somewhere.com | 106 | 204 |
| 7 | email2@somewhere.com | 107 | 204 |
| 8 | email3@somewhere.com | 108 | NULL |
+-----------+----------------------+--------+----------+
8 rows in set (0.00 sec)
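I tried the update query above, which uses an inner join. But it creates duplicate mappedid entries in TempTable, as shown above. To remove the redundant entries, my current option is to set all the duplicated entries back to NULL and then run a select on AnotherTable for each email. After removing the redundant entries, the tables look like this: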
mysql> Select * from TempTable;
+-----------+----------------------+--------+----------+
| ROWNUMBER | email | someid | mappedid |
+-----------+----------------------+--------+----------+
| 1 | email1@somewhere.com | 101 | 201 |
| 2 | email1@somewhere.com | 102 | NULL |
| 3 | email1@somewhere.com | 103 | NULL |
| 4 | email1@somewhere.com | 104 | NULL |
| 5 | email2@somewhere.com | 105 | 204 |
| 6 | email2@somewhere.com | 106 | NULL |
| 7 | email2@somewhere.com | 107 | NULL |
| 8 | email3@somewhere.com | 108 | NULL |
+-----------+----------------------+--------+----------+
8 rows in set (0.00 sec)
mysql> Select * from AnotherTable;
+-----------+----------------------+
| primaryid | email |
+-----------+----------------------+
| 201 | email1@somewhere.com |
| 202 | email1@somewhere.com |
| 203 | email1@somewhere.com |
| 204 | email2@somewhere.com |
+-----------+----------------------+
4 rows in set (0.00 sec)
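Then I have to run "select primaryid from AnotherTable where email = 'email1@somewhere.com'" and update mappedid in TempTable based on the ResultSet contents. The problem is that because I have 2 duplicated emails (email1@somewhere.com and email2@somewhere.com), I need to query AnotherTable 2 times. If the number of duplicated emails grows to 100, I have to query AnotherTable, which is already a heavy table, 100 times (BTW the email column will be indexed in AnotherTable). I know this is not the right solution. Can you help me come up with an efficient solution for a large number of records?
Answer 0: (score: 1)
The fact is that the email column alone is not enough to join your tables correctly. In addition, you need some kind of position number per email, so that the n-th TempTable row for an email is paired with the n-th AnotherTable row for the same email.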
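-- @n1/@n2 count rows within one email; @g1/@g2 remember the previous row's email.
SET @n1 := 0, @g1 := NULL;
SET @n2 := 0, @g2 := NULL;
UPDATE TempTable t JOIN
(
  SELECT a.rownumber, b.primaryid
    FROM
  (
    -- Number the TempTable rows 1, 2, 3, ... within each email.
    SELECT rownumber, email, @n1 := IF(@g1 = email, @n1 + 1, 1) rnum, @g1 := email
      FROM TempTable
     ORDER BY email, rownumber
  ) a LEFT JOIN
  (
    -- Number the AnotherTable rows the same way.
    SELECT primaryid, email, @n2 := IF(@g2 = email, @n2 + 1, 1) rnum, @g2 := email
      FROM AnotherTable
     ORDER BY email, primaryid
  ) b
      -- Pair rows that share both the email and the position within that email.
      ON a.email = b.email
     AND a.rnum = b.rnum
   WHERE b.primaryid IS NOT NULL
) s
    ON t.rownumber = s.rownumber
   SET t.mappedid = s.primaryid;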
Here is a SQLFiddle demo.
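As a hedged illustration of the "position number per email" idea, the sketch below reproduces the pairing with the standard ROW_NUMBER() window function instead of user variables. This is not the answer's own method. It assumes an engine with window-function support (MySQL 8.0 or later; here an in-memory SQLite database stands in for MySQL so the example is self-contained). The function names pair_by_position and demo are hypothetical, and the sample rows are the ones from the question.

```python
import sqlite3


def pair_by_position(conn):
    """Pair the n-th TempTable row of each email with the n-th AnotherTable row."""
    pairs = conn.execute(
        """
        SELECT b.primaryid, a.ROWNUMBER
          FROM (SELECT ROWNUMBER, email,
                       ROW_NUMBER() OVER (PARTITION BY email ORDER BY ROWNUMBER) AS rnum
                  FROM TempTable) AS a
          JOIN (SELECT primaryid, email,
                       ROW_NUMBER() OVER (PARTITION BY email ORDER BY primaryid) AS rnum
                  FROM AnotherTable) AS b
            ON a.email = b.email AND a.rnum = b.rnum
        """
    ).fetchall()
    # Each primaryid appears in at most one pair, so no mapping is duplicated.
    conn.executemany("UPDATE TempTable SET mappedid = ? WHERE ROWNUMBER = ?", pairs)
    conn.commit()


def demo():
    """Load the question's sample rows into SQLite (a stand-in for MySQL) and map them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TempTable (ROWNUMBER INTEGER PRIMARY KEY, email TEXT, "
        "someid INTEGER, mappedid INTEGER)"
    )
    conn.execute("CREATE TABLE AnotherTable (primaryid INTEGER PRIMARY KEY, email TEXT)")
    temp_rows = [
        (1, "email1@somewhere.com", 101), (2, "email1@somewhere.com", 102),
        (3, "email1@somewhere.com", 103), (4, "email1@somewhere.com", 104),
        (5, "email2@somewhere.com", 105), (6, "email2@somewhere.com", 106),
        (7, "email2@somewhere.com", 107), (8, "email3@somewhere.com", 108),
    ]
    other_rows = [
        (201, "email1@somewhere.com"), (202, "email1@somewhere.com"),
        (203, "email1@somewhere.com"), (204, "email2@somewhere.com"),
    ]
    conn.executemany(
        "INSERT INTO TempTable (ROWNUMBER, email, someid) VALUES (?, ?, ?)", temp_rows
    )
    conn.executemany("INSERT INTO AnotherTable (primaryid, email) VALUES (?, ?)", other_rows)
    pair_by_position(conn)
    return conn.execute(
        "SELECT ROWNUMBER, mappedid FROM TempTable ORDER BY ROWNUMBER"
    ).fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

With the sample data, rows 1, 2, 3 and 5 receive 201, 202, 203 and 204, and the rest stay NULL, which matches the desired result in the question. On MySQL 8.0 or later the same two numbered subqueries could presumably be joined directly inside a single UPDATE ... JOIN, as in the answer above, so that AnotherTable is scanned once rather than once per email.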