Question

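I have a target table with more than 1 million rows. Each batch I receive contains 50,000 rows, which may include multiple duplicate entries. So I decided to load the CSV data into a temporary table and then move the rows from temp_table to target_table by comparing the rows of the two tables ...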

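If a duplicate entry is found, the data from temp_table should be appended to the existing row in target_table; otherwise the row should be inserted ... I am using partitioning here, so ON DUPLICATE KEY UPDATE does not work .. temp_table has no keys at all.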

I have two tables that look like this:

temp_table

Name |  Type
John |  Civil
John |  Mech

target_table

Name | Type
John | Civil

When I run the query below, the row still ends up with only a single value:

 UPDATE target_table JOIN temp_table
 ON  temp_table.Name = target_table.Name
 SET target_table.Type = IF((LOCATE(temp_table.Type, target_table.Type) > 0),
 target_table.Type,CONCAT(target_table.Type,',',temp_table.Type))


target_table

Name | Type
John | Civil

I expect the output to look like this:

target_table

Name | Type
John | Civil, Mech

Can anyone tell me what is going wrong?

Answer 1

You should use GROUP_CONCAT in a subquery and join against that subquery:

 UPDATE target_table 
 JOIN (
    select name, group_concat(Type) grouped
    from temp_table 
    group by name
 ) t ON t.Name = target_table.Name
 SET target_table.Type = t.grouped
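
Note that the statement above replaces target_table.Type with the grouped value from temp_table, so a value that exists only in target_table would be lost. A minimal sketch of a variant that appends only the missing values, assuming Name is unique in target_table, Type holds a comma-separated list without spaces, and no individual Type value contains a comma (these are assumptions, not something the question states):

```sql
-- Sketch: append only the temp_table values not already present in the target row.
-- The derived table uses GROUP BY, so MySQL materializes it before the update runs.
UPDATE target_table
JOIN (
    SELECT tmp.Name, GROUP_CONCAT(DISTINCT tmp.Type) AS new_types
    FROM temp_table AS tmp
    JOIN target_table AS tgt ON tgt.Name = tmp.Name
    WHERE FIND_IN_SET(tmp.Type, tgt.Type) = 0   -- keep only values missing from the target list
    GROUP BY tmp.Name
) AS t ON t.Name = target_table.Name
SET target_table.Type = CONCAT(target_table.Type, ',', t.new_types);
```

Names with nothing new to add produce no row in the derived table, so their target rows are left untouched. FIND_IN_SET compares whole list elements, unlike LOCATE, which would also match a substring of a longer value.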

Answer 2

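I suspect (but am not sure, and I hope someone who knows will step in and correct me) that this happens because an UPDATE with a JOIN does not create a Cartesian product the way a SELECT does. As evidence, try this: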

truncate table temp_table;
insert into temp_table values
( 'John' , 'mech' ),
( 'John' , 'abc'  );
truncate table target_table;
insert into target_table values
('john', 'civil', 9 );

UPDATE target_table JOIN temp_table
ON  temp_table.Name = target_table.Name
set target_table.type = (concat(target_table.Type,',',temp_table.Type));

select * from target_table;

+------+------------+------+
| Name | Type       | LOC  |
+------+------------+------+
| john | civil,mech |    9 |
+------+------------+------+
1 row in set (0.00 sec)

Note that abc from temp_table is ignored, and that mech was picked purely by chance.

If we change the order of the rows in temp_table

truncate table temp_table;
insert into temp_table values
( 'John' , 'abc' ),
( 'John' , 'mech'  );
truncate table target_table;
insert into target_table values
('john', 'civil', 9 );

UPDATE target_table JOIN temp_table
ON  temp_table.Name = target_table.Name
set target_table.type = (concat(target_table.Type,',',temp_table.Type));

select * from target_table;

we get

+------+-----------+------+
| Name | Type      | LOC  |
+------+-----------+------+
| john | civil,abc |    9 |
+------+-----------+------+
1 row in set (0.02 sec)

This time abc is picked, again purely by chance.
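
This matches what the MySQL reference manual says about the multiple-table UPDATE syntax: each matching row is updated once, even if it matches the conditions multiple times. Running the same join as a SELECT against the test data above makes the difference visible; the column aliases are only for illustration:

```sql
-- The join itself matches the single target row once per temp_table row,
-- so this SELECT returns two rows for john (one with abc, one with mech).
SELECT target_table.Name,
       target_table.Type AS target_type,
       temp_table.Type   AS temp_type
FROM target_table
JOIN temp_table ON temp_table.Name = target_table.Name;
```

The UPDATE sees the same two matches but writes to the target row only once, and which of the two temp_table rows supplies the value is not something to rely on.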

In my opinion, the safest approach is to process the rows one at a time (i.e., with a cursor).
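
A rough sketch of what the cursor approach could look like as a stored procedure. The procedure name merge_temp_into_target and the VARCHAR(255) variable types are assumptions made for illustration, and the duplicate check assumes Type is a comma-separated list without spaces:

```sql
DELIMITER //   -- mysql client command, so the procedure body can contain semicolons

CREATE PROCEDURE merge_temp_into_target()   -- hypothetical name
BEGIN
    DECLARE done INT DEFAULT 0;
    DECLARE v_name VARCHAR(255);            -- assumed column type
    DECLARE v_type VARCHAR(255);            -- assumed column type
    DECLARE cur CURSOR FOR SELECT Name, Type FROM temp_table;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    OPEN cur;
    read_loop: LOOP
        FETCH cur INTO v_name, v_type;
        IF done THEN
            LEAVE read_loop;
        END IF;
        -- Append the value only if it is not already in the target row's list.
        UPDATE target_table
           SET Type = CONCAT(Type, ',', v_type)
         WHERE Name = v_name
           AND FIND_IN_SET(v_type, Type) = 0;
    END LOOP;
    CLOSE cur;
END //

DELIMITER ;
```

After creating it, CALL merge_temp_into_target(); applies every temp_table row in turn, so each update sees the result of the previous one. With 50,000 rows per batch this is likely to be slower than a single set-based statement.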

UPDATE with JOIN appends only the first row's value to the other table. Why?
