Question

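I am trying to speed up inserting 1000s of records from a CSV. I have a contact table that joins to a contact phone table.

Here is my relevant SQL structure: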

Contact Table
+----+-----------+----------+------------------+------------+----------------+
| id | firstName | lastName |     primaryEmail | locationId | organizationId |
+----+-----------+----------+------------------+------------+----------------+
|  1 |      John |      Doe | jdoe@noemail.com |          1 |              1 |
+----+-----------+----------+------------------+------------+----------------+

Contact Phone Table
+----+-----------+--------------+---------+----------------+
| id | contactId |       number | primary | organizationId |
+----+-----------+--------------+---------+----------------+
|  1 |         1 | +15555555555 |       1 |              1 |
+----+-----------+--------------+---------+----------------+
|  2 |         1 | +11231231234 |       0 |              1 |
+----+-----------+--------------+---------+----------------+

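If the phone and/or email does not already exist, I need to insert a new contact with a single phone number set as the primary phone. A contact cannot have more than one phone number in the CSV, but the numbers can be updated manually after the contact is added.

Here is the MySQL stored procedure I came up with: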

DELIMITER $$

CREATE PROCEDURE `save_bulk_contact`(IN last_name VARCHAR(128), IN first_name VARCHAR(128), IN email VARCHAR(320), IN location_id BIGINT, IN organization_id BIGINT, IN phone_number VARCHAR(15))
BEGIN

    DECLARE CheckExists INT;
    DECLARE insert_id BIGINT;

    SELECT COUNT(*) INTO CheckExists FROM contact
    LEFT JOIN contact_phone ON contact.id = contact_phone.contactId
    WHERE contact.organizationId = organization_id 
        AND contact.locationId = location_id
        AND ((`primaryEmail` <> '' AND `primaryEmail` = email) OR `number` = phone_number);

    IF (CheckExists = 0) THEN
        INSERT INTO contact
            (`lastName`, `firstName`, `primaryEmail`, `locationId`, `organizationId`)
        VALUE (last_name, first_name, email, location_id, organization_id);
        SET insert_id = LAST_INSERT_ID();

        INSERT INTO contact_phone
            (`contactId`, `number`, `type`, `primary`, `organizationId`)
        VALUE (insert_id, phone_number, 'CELL', 1, organization_id);
    END IF;

END$$

DELIMITER ;

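I am using this stored procedure with a Spring JDBC template batch update. The contacts CSV can contain 50,000+ contacts. I have tried many approaches to this problem, but none of them seems to work well. Here is another attempt: Insert 1000s of records with relationship and ignore duplicates using JDBC & MySQL, but I did not receive any answers. Using that Java-heavy approach, I processed a CSV file containing 100,000 contacts while my database already held about 5,000 contacts, and it took nearly 3 hours.

About 30 minutes ago, I started a CSV upload of 50,000 contacts from the web application using the stored procedure above. So far, it has added about 23,000.

What can I do to make this process more efficient and finish faster?

Update: I just finished the 50,000 inserts, and it took 1.7 hours.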

Answer 1

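First, if you have not already added indexes on the organization ID and location ID columns in both tables, add them. Then split your check into two statements so that each one can use an inner join and you get rid of the "OR":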

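-- CheckExistsTwo must also be declared at the top of the procedure:
--     DECLARE CheckExistsTwo INT;

-- Check 1: a contact with the same non-empty email already exists.
-- Note: the INNER JOIN only matches contacts that have at least one phone row.
SELECT COUNT(*) INTO CheckExists FROM contact
INNER JOIN contact_phone ON contact.id = contact_phone.contactId
WHERE contact.organizationId = organization_id
    AND contact.locationId = location_id
    AND (`primaryEmail` <> '' AND `primaryEmail` = email);

-- Check 2: a contact with the same phone number already exists.
SELECT COUNT(*) INTO CheckExistsTwo FROM contact
INNER JOIN contact_phone ON contact.id = contact_phone.contactId
WHERE contact.organizationId = organization_id
    AND contact.locationId = location_id
    AND `number` = phone_number;

-- Insert only when neither check found a match.
IF (CheckExists = 0 AND CheckExistsTwo = 0) THEN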
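
The indexing advice at the start of this answer could look roughly like the sketch below. The index names are made up for illustration, and the column choices are assumptions based only on the two tables shown in the question: `contact_phone` as shown has no location column, so the second table is indexed on `organizationId` and `number` instead. If `contactId` is already declared as a foreign key, InnoDB will normally have an index on it and the second statement can be skipped.

```sql
-- Hypothetical index names; column names come from the tables in the question.

-- Supports the organizationId/locationId filter used in both checks.
CREATE INDEX idx_contact_org_loc
    ON contact (organizationId, locationId);

-- Supports the join from contact to contact_phone.
CREATE INDEX idx_contact_phone_contact
    ON contact_phone (contactId);

-- Supports the phone-number lookup within an organization.
CREATE INDEX idx_contact_phone_org_number
    ON contact_phone (organizationId, `number`);
```

Running `EXPLAIN` on each of the two SELECT statements before and after adding the indexes is a reasonable way to confirm that MySQL actually uses them.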
