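I am trying to speed up inserting 1000s of records from a CSV. I have a contact table that joins to a contact phone table.
Here is my relevant SQL structure: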
Contact Table
+----+-----------+----------+------------------+------------+----------------+
| id | firstName | lastName | primaryEmail | locationId | organizationId |
+----+-----------+----------+------------------+------------+----------------+
| 1 | John | Doe | jdoe@noemail.com | 1 | 1 |
+----+-----------+----------+------------------+------------+----------------+
Contact Phone Table
+----+-----------+--------------+---------+----------------+
| id | contactId | number | primary | organizationId |
+----+-----------+--------------+---------+----------------+
| 1 | 1 | +15555555555 | 1 | 1 |
+----+-----------+--------------+---------+----------------+
| 2 | 1 | +11231231234 | 0 | 1 |
+----+-----------+--------------+---------+----------------+
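If the phone and/or email does not already exist, I need to insert a new contact with a single phone set as the primary phone. A contact cannot have more than one phone number in the CSV, but it can be updated manually after it is added.
Here is the MySQL stored procedure I came up with: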
DELIMITER $$
CREATE PROCEDURE `save_bulk_contact`(IN last_name VARCHAR(128), IN first_name VARCHAR(128), IN email VARCHAR(320), IN location_id BIGINT, IN organization_id BIGINT, IN phone_number VARCHAR(15))
BEGIN
    DECLARE CheckExists INT;
    DECLARE insert_id BIGINT;
    SELECT COUNT(*) INTO CheckExists FROM contact
        LEFT JOIN contact_phone ON contact.id = contact_phone.contactId
        WHERE contact.organizationId = organization_id
        AND contact.locationId = location_id
        AND ((`primaryEmail` <> '' AND `primaryEmail` = email) OR `number` = phone_number);
    IF (CheckExists = 0) THEN
        INSERT INTO contact
            (`lastName`, `firstName`, `primaryEmail`, `locationId`, `organizationId`)
            VALUE (last_name, first_name, email, location_id, organization_id);
        SET insert_id = LAST_INSERT_ID();
        INSERT INTO contact_phone
            (`contactId`, `number`, `type`, `primary`, `organizationId`)
            VALUE (insert_id, phone_number, 'CELL', 1, organization_id);
    END IF;
END$$
DELIMITER ;
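I am using this stored procedure with Spring JDBC template batch updates. The contact CSV can contain 50,000+ contacts. I have tried many approaches to this problem, but none of them seems to work well. Here is another attempt: Insert 1000s of records with relationship and ignore duplicates using JDBC & MySQL, but I did not receive any answers. Using a Java-heavy approach, I processed a CSV file containing 100,000 contacts while about 5,000 contacts were already in my database, and it took nearly 3 hours.
About 30 minutes ago I started a CSV upload of 50,000 contacts from the web application using the stored procedure above. So far it has added about 23,000.
What can I do to make this process more efficient and finish faster?
Update: the 50,000 inserts just finished, and it took 1.7 hours.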
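Answer 0 (score: 0)
First, if you have not already added indexes on the organization ID and location ID columns in both tables, add them. Then split your check into two statements that use inner joins, and get rid of the OR: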
-- Email check. CheckExistsTwo needs its own DECLARE next to CheckExists.
SELECT COUNT(*) INTO CheckExists FROM contact
    INNER JOIN contact_phone ON contact.id = contact_phone.contactId
    WHERE contact.organizationId = organization_id
    AND contact.locationId = location_id
    AND `primaryEmail` <> '' AND `primaryEmail` = email;
-- Phone check.
SELECT COUNT(*) INTO CheckExistsTwo FROM contact
    INNER JOIN contact_phone ON contact.id = contact_phone.contactId
    WHERE contact.organizationId = organization_id
    AND contact.locationId = location_id
    AND `number` = phone_number;
-- Insert only when neither the email nor the phone already exists.
IF (CheckExists = 0 AND CheckExistsTwo = 0) THEN
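For the indexing suggestion at the start of this answer, here is a minimal sketch of what the indexes might look like. The table and column names come from the question; the index names are made up for illustration, and adding `primaryEmail` and `number` to the composite indexes is my own assumption about what the two lookups would benefit from, not something I have measured against your data. The contact_phone table shown in the question has no locationId column, so only organizationId is indexed there.

```sql
-- Hypothetical index names; adjust to your naming convention.
-- Covers the organization/location filter plus the email comparison
-- (assumes primaryEmail fits within your InnoDB index key length limit).
CREATE INDEX idx_contact_org_loc_email
    ON contact (organizationId, locationId, primaryEmail);

-- Supports the join from contact to contact_phone.
CREATE INDEX idx_contact_phone_contact
    ON contact_phone (contactId);

-- Supports the phone number comparison within an organization.
CREATE INDEX idx_contact_phone_org_number
    ON contact_phone (organizationId, `number`);
```

Running EXPLAIN on each of the two SELECT statements before and after creating the indexes should show whether MySQL actually picks them up.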