Question

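I am trying to import a large dataset (~400 MB, 900,000 rows) from a CSV file. The file contains information for two relational tables. For example:

["primary_key", "name", "surname", "phone", "work_id", "work_name"]

For every row I have to check whether the primary key already exists, and then insert or update it (if needed). I also need to validate the work, because new works can appear in this dataset.

My persons table has more columns than the CSV file, so I can't simply replace the rows with mysqlimport.

Any ideas on how to approach this?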

Answer 1

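Please read the documentation for LOAD DATA INFILE; it is a good choice for loading data, even from very large files. Quoting from the Reference manual: Speed of insert statements:

When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements.

Assuming your table has more columns than the .csv file, you would have to write something like this: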

load data local infile 'path/to/your/file.csv'
into table yourTable
fields terminated by ',' optionally enclosed by '"' lines terminated by '\n'
ignore 1 lines -- if it has column headers
(col1, col2, col3, ...) -- The matching column list goes here

See my own question on the subject and its answer.
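The question also asks for insert-or-update behaviour and for new works to be picked up, which a plain load into the final table does not do by itself. One possible way to combine the two is to load into a staging table first and then upsert from it. This is only a sketch: the table names staging_people, people and works, the column types, and the assumption that people.primary_key and works.work_id are primary keys are all hypothetical and not taken from the question.

```sql
-- Hypothetical staging table that mirrors the CSV layout from the question.
CREATE TABLE staging_people (
  primary_key INT NOT NULL,
  name        VARCHAR(100),
  surname     VARCHAR(100),
  phone       VARCHAR(50),
  work_id     INT,
  work_name   VARCHAR(100)
);

-- Bulk-load the raw file; no checks happen at this stage.
load data local infile 'path/to/your/file.csv'
into table staging_people
fields terminated by ',' optionally enclosed by '"' lines terminated by '\n'
ignore 1 lines
(primary_key, name, surname, phone, work_id, work_name);

-- Add works that are new and refresh the names of existing ones.
-- Assumes works.work_id is the primary key (hypothetical schema).
INSERT INTO works (work_id, work_name)
SELECT DISTINCT work_id, work_name
FROM staging_people
WHERE work_id IS NOT NULL
ON DUPLICATE KEY UPDATE work_name = VALUES(work_name);

-- Insert new people and update existing ones in a single statement.
-- Columns of people that are not in the CSV keep their current values.
INSERT INTO people (primary_key, name, surname, phone, work_id)
SELECT primary_key, name, surname, phone, work_id
FROM staging_people
ON DUPLICATE KEY UPDATE
  name    = VALUES(name),
  surname = VALUES(surname),
  phone   = VALUES(phone),
  work_id = VALUES(work_id);

DROP TABLE staging_people;
```

The works are upserted before the people so that a foreign key from people.work_id to works.work_id, if one exists, is already satisfied. Note that VALUES() inside ON DUPLICATE KEY UPDATE still works but is deprecated in recent MySQL 8.0 releases.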

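Also, if you need faster inserts, you can:

Disable foreign key checks by executing SET foreign_key_checks = 0; before executing load data, and/or

Disable the table's indexes by executing alter table yourTable disable keys; before load data, and then rebuild them afterwards with alter table yourTable enable keys;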
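Put together, the order of the statements could look roughly like the sketch below. The column list (col1, col2, col3) is just a placeholder for your own columns, and re-enabling the foreign key checks at the end is my addition. Keep in mind that disable keys only affects non-unique indexes on MyISAM tables; on InnoDB it has no effect.

```sql
-- Skip foreign key validation for this session while loading.
SET foreign_key_checks = 0;

-- Stop maintaining non-unique indexes during the load (MyISAM only).
alter table yourTable disable keys;

load data local infile 'path/to/your/file.csv'
into table yourTable
fields terminated by ',' optionally enclosed by '"' lines terminated by '\n'
ignore 1 lines -- if it has column headers
(col1, col2, col3); -- placeholder column list

-- Rebuild the indexes in one pass, then restore the checks.
alter table yourTable enable keys;
SET foreign_key_checks = 1;
```

Rows loaded while foreign_key_checks is 0 are not validated after the fact, so this only makes sense when you trust the references in the file.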

Untested: if your .csv file has more columns than your table, I think you can assign the "extra" columns in the file to temp variables:

load data local infile 'path/to/your/file.csv'
into table yourTable
fields terminated by ',' optionally enclosed by '"' lines terminated by '\n'
ignore 1 lines -- if it has column headers
(col1, col2, col3, @dummyVar1, @dummyVar2, col4) -- The @dummyVarX variables
                                                 -- are simply place-holders for
                                                 -- columns in the .csv file that
                                                 -- don't match the columns in 
                                                 -- your table
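The same user-variable mechanism can also be used to clean up a value before it is stored, by adding a SET clause to the statement. A small sketch, where @rawCol3 is a made-up variable name and turning empty strings into NULL is only an example transformation:

```sql
load data local infile 'path/to/your/file.csv'
into table yourTable
fields terminated by ',' optionally enclosed by '"' lines terminated by '\n'
ignore 1 lines -- if it has column headers
(col1, col2, @rawCol3, @dummyVar1) -- @rawCol3 is read but not stored directly
set col3 = nullif(@rawCol3, '');   -- store NULL when the field is empty
```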

Importing a large CSV dataset into MySQL