Question

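I have about 10 million records in an Excel file that have to be parsed in a specific way (I can't just convert to CSV and insert it as-is) and inserted into different tables of a MySQL database. I've gotten the run time down from all night to a couple of hours, but I would like to reduce it further. Does anyone have tips or hints that could help? I'm using Java and JDBC to parse and connect.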

Answer 1

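MySQL lets you load data from a file. Perhaps what you should do is read 10,000 records and write them to a file. Then, while you start reading the next 10,000 records, run load data infile on that file in parallel.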

So this should get you closer to a fast solution:

Parallelize the read and the load
Use a bulk data loading tool
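
A rough Java sketch of the overlap described above, under some assumptions rather than a tuned implementation. ChunkedLoader, ChunkSink, and jdbcSink are made-up names for illustration, and the chunk files are tab-separated with no escaping of tabs or newlines inside values. The JDBC sink assumes that LOAD DATA LOCAL INFILE is enabled on both the server and the driver (for MySQL Connector/J that is the allowLoadLocalInfile connection property).

```java
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ChunkedLoader {

    /** Hypothetical callback that loads one finished chunk file into the database. */
    interface ChunkSink {
        void load(Path chunkFile) throws Exception;
    }

    /**
     * Writes rows into tab-separated chunk files of at most chunkSize rows each and hands
     * every finished file to the sink on a worker thread, so that loading chunk N overlaps
     * with reading and writing chunk N+1. Returns the number of chunk files produced.
     */
    static int run(Iterator<String[]> rows, int chunkSize, Path dir, ChunkSink sink)
            throws Exception {
        ExecutorService loader = Executors.newSingleThreadExecutor();
        List<Future<?>> pending = new ArrayList<>();
        int chunks = 0;
        try {
            while (rows.hasNext()) {
                Path file = dir.resolve("chunk_" + chunks + ".tsv");
                try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    for (int i = 0; i < chunkSize && rows.hasNext(); i++) {
                        // Assumption: values contain no tabs or newlines, so no escaping is done.
                        out.write(String.join("\t", rows.next()));
                        out.write('\n');
                    }
                }
                chunks++;
                // The file is closed at this point, so the worker can load it safely.
                pending.add(loader.submit(() -> { sink.load(file); return null; }));
            }
            // Wait for all loads and surface the first failure, if any.
            for (Future<?> f : pending) {
                f.get();
            }
        } finally {
            loader.shutdown();
        }
        return chunks;
    }

    /**
     * Example sink that runs LOAD DATA LOCAL INFILE over JDBC. The table name is
     * concatenated into the SQL, so it must be a trusted identifier.
     */
    static ChunkSink jdbcSink(Connection conn, String table) {
        return file -> {
            String path = file.toAbsolutePath().toString().replace('\\', '/');
            try (Statement st = conn.createStatement()) {
                st.execute("LOAD DATA LOCAL INFILE '" + path + "' INTO TABLE " + table
                        + " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'");
            }
        };
    }
}
```

A single worker thread keeps the loads in order and lets one connection be used safely. Whether more loader threads help depends on the server and on whether the chunks go to different tables.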

Answer 2

Look into using executeBatch and sending the inserts in blocks of 1,000 or so. That will help a lot.
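
Something along these lines, as a minimal sketch. BatchInserter is a made-up name, and the customers table with cust_id and name columns is only a placeholder borrowed for illustration. With MySQL Connector/J, the rewriteBatchedStatements=true connection property is usually worth trying as well, since it lets the driver combine a batch into multi-row inserts.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInserter {

    /**
     * Inserts (cust_id, name) pairs using one reused PreparedStatement and flushes
     * with executeBatch() every batchSize rows. Table and column names are placeholders.
     * Returns the number of executeBatch() calls that were made.
     */
    static int insertCustomers(Connection conn, List<String[]> rows, int batchSize)
            throws SQLException {
        String sql = "INSERT INTO customers (cust_id, name) VALUES (?, ?)";
        int flushes = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();   // one round trip for the whole block
                    pending = 0;
                    flushes++;
                }
            }
            if (pending > 0) {           // flush the final partial block
                ps.executeBatch();
                flushes++;
            }
        }
        return flushes;
    }
}
```

For example, insertCustomers(conn, rows, 1000) with 2,500 rows makes three executeBatch() calls instead of 2,500 separate inserts.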

Answer 3

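An idea...

Create a staging (temporary) database in MySQL with a table called excel_staging that matches the structure of your Excel file. Use the MyISAM engine for this table.

Use load data infile to load the Excel file (saved as CSV) into the excel_staging table. It shouldn't take more than a few minutes to populate, especially as the table is MyISAM.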

truncate table excel_staging;

load data infile 'excel_staging.csv'
into table excel_staging
fields terminated by...
lines terminated by..
(
field1,
field2,
...
);

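Write a set of select ... into outfile statements that extract data from the excel_staging table into individual CSV files, which you will then load into your individual InnoDB production tables. You can be really creative at this point if necessary: you may even have to load extra data to support joins and so on, so that you can generate nicely formatted CSV output.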

select distinct cust_id, name into outfile 'customers.csv' 
fields terminated by...
lines terminated by...
from
 excel_staging
order by
 cust_id; -- order for innodb import

select distinct dept_id, name into outfile 'departments.csv' 
fields terminated by...
lines terminated by...
from
 excel_staging
order by
 dept_id;

Use load data infile to load the nicely formatted, cleansed CSV files, ordered by primary key, into your production InnoDB tables...

load data infile 'customers.csv'
into table customers
fields terminated by...
lines terminated by..
(
cust_id,
name
);

...

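Excluding the time to code the solution (say 30 minutes), you should be able to load into staging, output to CSV, and load into the production tables in about 2 minutes, end to end.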

Hope this helps.

Answer 4

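Make sure to disable foreign key checks while you are inserting (this only affects InnoDB); the speedup is pretty drastic. And of course re-enable foreign key checks when you're done.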
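
From JDBC this could look roughly like the sketch below. ForeignKeyToggle and SqlWork are made-up names; the only MySQL-specific part is the session variable FOREIGN_KEY_CHECKS. The finally block is there so the checks are switched back on even if the load fails. Keep in mind that turning the checks back on does not re-validate the rows that were inserted while they were off, so the input has to be consistent already.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class ForeignKeyToggle {

    /** Hypothetical unit of work that runs on the given connection. */
    interface SqlWork {
        void run(Connection conn) throws SQLException;
    }

    /**
     * Disables foreign key checks for this session, runs the work, and always
     * re-enables the checks afterwards, even when the work throws.
     */
    static void withoutForeignKeyChecks(Connection conn, SqlWork work) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("SET FOREIGN_KEY_CHECKS = 0");
            try {
                work.run(conn);
            } finally {
                st.execute("SET FOREIGN_KEY_CHECKS = 1");
            }
        }
    }
}
```

The setting is per session, so the inserts have to run on the same connection that disabled the checks.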

Answer 5

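A few JDBC performance tips: set autoCommit to false on your connection object, but make sure to commit after a significant number of inserts (every 100K or more). Also, use and reuse a PreparedStatement object rather than a plain Statement object.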
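
A minimal sketch of those tips put together, assuming a placeholder departments table with dept_id and name columns. PeriodicCommit is a made-up name and commitEvery is whatever interval you settle on (100,000 in the suggestion above).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class PeriodicCommit {

    /**
     * Inserts all rows with autoCommit off, reusing a single PreparedStatement and
     * committing every commitEvery rows. Table and column names are placeholders.
     * Returns the number of commits issued.
     */
    static int insertAll(Connection conn, Iterable<String[]> rows, int commitEvery)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        int commits = 0;
        String sql = "INSERT INTO departments (dept_id, name) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int sinceCommit = 0;
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.executeUpdate();
                if (++sinceCommit == commitEvery) {
                    conn.commit();       // keep each transaction to a bounded size
                    commits++;
                    sinceCommit = 0;
                }
            }
            if (sinceCommit > 0) {       // commit the final partial group
                conn.commit();
                commits++;
            }
        } catch (SQLException e) {
            conn.rollback();             // discard the uncommitted tail on failure
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return commits;
    }
}
```

On failure only the rows since the last commit are rolled back, so a restart needs some way to know where the previous run stopped.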

Parsing and inserting 10 million records into SQL
