MariaDB JDBC driver does not batch updates as efficiently as SQL Server

Date: 2018-05-22 17:30:10

Tags: mysql sql-server jdbc mariadb

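I have been benchmarking the insert/update/delete routines in my application, which I am porting from SQL Server to MariaDB.

  • Java 1.8 on a local Win10 workstation with an i7 2.80GHz CPU + 16GB RAM
  • JDBC org.mariadb.jdbc:mariadb-java-client:2.2.4
  • 10.2.12-MariaDB-log MariaDB Server on AWS

The benchmark fires 50,000 inserts, followed by the same number of updates and deletes.

SQL Server, via the net.sourceforge.jtds JDBC driver, processes them in under 1 second.

MariaDB with the mariadb-java-client driver performs the inserts faster than that, but the updates (and deletes) are much slower, at 3.5 seconds.

The schemas in the two databases are identical. Since inserts into MariaDB are fast, I assume this rules out an index problem or a server misconfiguration.

I have tried many variants of the JDBC connection string and ended up with this one as the fastest: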

  ?verifyServerCertificate=true\
  &useSSL=true\
  &requireSSL=true\
  &allowMultiQueries=true\
  &cachePrepStmts=true\
  &cacheResultSetMetadata=true\
  &cacheServerConfiguration=true\
  &elideSetAutoCommits=true\
  &maintainTimeStats=false\
  &prepStmtCacheSize=50000\
  &prepStmtCacheSqlLimit=204800\
  &rewriteBatchedStatements=false\
  &useBatchMultiSend=true\
  &useBatchMultiSendNumber=50000\
  &useBulkStmts=true\
  &useLocalSessionState=true\
  &useLocalTransactionState=true\
  &useServerPrepStmts=true
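When comparing connection-string variants like the one above, it may help to build the URL from a map so that one option can be toggled per benchmark run. The following is only a sketch: the host `db.example.com` and database `benchmark` are hypothetical placeholders, and only two of the options from the list above are shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class JdbcUrlBuilder {
    // Joins the options as key=value pairs separated by '&' and appends
    // them to a jdbc:mariadb:// URL. Insertion order is preserved when a
    // LinkedHashMap is passed in.
    static String buildUrl(String host, int port, String database,
                           Map<String, String> options) {
        String query = options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        String base = "jdbc:mariadb://" + host + ":" + port + "/" + database;
        return query.isEmpty() ? base : base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("useServerPrepStmts", "true");
        options.put("rewriteBatchedStatements", "false");
        // Host and database name are placeholders, not from the question.
        System.out.println(buildUrl("db.example.com", 3306, "benchmark", options));
    }
}
```

Flipping a single entry in the map between runs keeps the rest of the configuration fixed, which makes it easier to attribute a timing change to one option.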

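In all cases, mysql with mysql-connectorj performed worse than mariadb.

I have been looking at this for a week now and am considering the workaround suggested in my earlier question, How do I increase the speed of a large series of UPDATEs in mySQL vs SQL Server?

In case this is a server misconfiguration after all, here are my settings for the key variables: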

key_buffer_size                16MB
innodb_buffer_pool_size        24GB (mem 30GB)
innodb_log_file_size           134MB
innodb_log_buffer_size         8MB
innodb_flush_log_at_trx_commit 0
max_allowed_packet             16MB

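My 50,000 writes amount to only a small volume of data, about 2MB. With the SQL syntax added, though, that is perhaps 10 times as much once it goes over the JDBC connection. Is that right?

Here are the SQL statements and their explain plans: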
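A rough way to check that estimate is to multiply the length of one statement by the number of statements. The sketch below uses the UPDATE statement from this question as the sample. It is an approximation under stated assumptions: it counts only the SQL text, ignores protocol packet headers, and applies to statements sent as plain text. With server-side prepared statements, each execution sends the parameter values rather than the full SQL text, so the real traffic should be smaller.

```java
import java.nio.charset.StandardCharsets;

class WireSizeEstimate {
    // Sample statement copied from the UPDATE shown in this question.
    static final String SAMPLE_UPDATE =
            "UPDATE `data` SET `value` = 4853.16314229298 "
            + "WHERE `parentId` = 52054 "
            + "AND `modifiedDate` = '20-Apr-18' "
            + "AND `valueDate` = '28-Dec-18'";

    // Bytes of SQL text if `count` statements of this size are sent as text.
    static long textPayloadBytes(String statement, int count) {
        return (long) statement.getBytes(StandardCharsets.UTF_8).length * count;
    }

    public static void main(String[] args) {
        long bytes = textPayloadBytes(SAMPLE_UPDATE, 50_000);
        System.out.printf("approx. %.1f MB of SQL text%n", bytes / 1_000_000.0);
    }
}
```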

Describe `data`
+---------------+------------------+------+-----+---------------------+-------------------------------+
| Field         | Type             | Null | Key | Default             | Extra                         |
+---------------+------------------+------+-----+---------------------+-------------------------------+
| parentId      | int(10) unsigned | NO   | PRI | NULL                |                               |
| modifiedDate  | date             | NO   | PRI | NULL                |                               |
| valueDate     | date             | NO   | PRI | NULL                |                               |
| value         | float            | NO   |     | NULL                |                               |
| versionstamp  | int(10) unsigned | NO   |     | 1                   |                               |
| createdDate   | datetime         | YES  |     | current_timestamp() |                               |
| last_modified | datetime         | YES  |     | NULL                | on update current_timestamp() |
+---------------+------------------+------+-----+---------------------+-------------------------------+

INSERT INTO `data` (`value`, `parentId`, `modifiedDate`, `valueDate`) VALUES (4853.16314229298,52054,'20-Apr-18','28-Dec-18')

+------+-------------+-------+------+---------------+------+---------+------+------+-------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra |
+------+-------------+-------+------+---------------+------+---------+------+------+-------+
|    1 | INSERT      | data  | ALL  | NULL          | NULL | NULL    | NULL | NULL | NULL  |
+------+-------------+-------+------+---------------+------+---------+------+------+-------+



UPDATE `data` SET `value` = 4853.16314229298 WHERE `parentId` = 52054 AND `modifiedDate` = '20-Apr-18' AND `valueDate` = '28-Dec-18'

+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|    1 | SIMPLE      | data  | range | PRIMARY       | PRIMARY | 10      | NULL |    1 | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+


DELETE FROM `data` WHERE `parentId` = 52054 AND `modifiedDate` = '20-Apr-18' AND `valueDate` = '29-Jan-16'

+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
|    1 | SIMPLE      | data  | range | PRIMARY       | PRIMARY | 10      | NULL |    1 | Using where |
+------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+

[UPDATE]

JDBC usage. This is a simplified version, so please forgive any glaring errors:

    final Connection connection = dataSource.getConnection();
    connection.setAutoCommit(false);
    try (PreparedStatement statement = connection.prepareStatement(
                 "UPDATE data SET value = ? " +
                         "WHERE parentId = ? " +
                         "AND modifiedDate = ? " +
                         "AND valueDate = ? ")) {
        // timeSeries is an array of 50,000 data points; in the real code,
        // value, parentId, modifiedDate and valueDate are read from each ts
        Arrays.stream(timeSeries)
                .forEach(ts -> {
            try {
                statement.setDouble(1, value);
                statement.setLong(2, parentId);
                statement.setDate(3, new java.sql.Date(
                        modifiedDate.getTime()));
                statement.setDate(4, new java.sql.Date(
                        valueDate.getTime()));
                statement.addBatch();
            } catch (SQLException e) {
                throw new RuntimeException(
                        "Bad batch statement handling", e);
            }
        });
        int[] results = statement.executeBatch();
        connection.commit();
    } catch (SQLException e) {
        connection.rollback();
        throw e;
    } finally {
        connection.close();
    }

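I also have some data from the general_log showing the incoming JDBC calls. It looks quite basic: one 'Prepare' call to set up the statement, followed by individual updates.

This surprised me, because there appears to be no batching at all: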

13/06/2018 15:09    service_user_t[service_user_t] @  [9.177.2.31]  75954   298206495   Query   set autocommit=0
13/06/2018 15:09    service_user_t[service_user_t] @  [9.177.2.31]  75954   298206495   Prepare UPDATE `data` SET `value` = ? WHERE `parentId` = ? AND `modifiedDate` = ? AND `valueDate` = ?
13/06/2018 15:09    service_user_t[service_user_t] @  [9.177.2.31]  75954   298206495   Execute UPDATE `data` SET `value` = ? WHERE `parentId` = ? AND `modifiedDate` = ? AND `valueDate` = ?
13/06/2018 15:09    service_user_t[service_user_t] @  [9.177.2.31]  75954   298206495   Execute UPDATE `data` SET `value` = ? WHERE `parentId` = ? AND `modifiedDate` = ? AND `valueDate` = ?
13/06/2018 15:09    service_user_t[service_user_t] @  [9.177.2.31]  75954   298206495   Execute UPDATE `data` SET `value` = ? WHERE `parentId` = ? AND `modifiedDate` = ? AND `valueDate` = ?
13/06/2018 15:09    service_user_t[service_user_t] @  [9.177.2.31]  75954   298206495   Execute UPDATE `data` SET `value` = ? WHERE `parentId` = ? AND `modifiedDate` = ? AND `valueDate` = ?
13/06/2018 15:09    service_user_t[service_user_t] @  [9.177.2.31]  75954   298206495   Execute UPDATE `data` SET `value` = ? WHERE `parentId` = ? AND `modifiedDate` = ? AND `valueDate` = ?
etc
etc

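1 Answer:

Answer 0 (score: 0)

Add "begin" and "commit" statements around groups of rows in the batch. Alternatively, start a transaction before the batch and commit after it. Either approach will be much faster than thousands of individually committed statements.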
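The JDBC snippet in the question already takes the second approach (setAutoCommit(false) before the batch, commit() after it). The first approach, committing after every group of rows, could look roughly like the sketch below. The `Point` class, the `updateInChunks` and `chunkCount` helpers, and the chunk size are assumptions made for illustration; only the SQL text and the standard java.sql calls come from the question.

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class ChunkedBatchUpdate {
    // Hypothetical row holder; the question does not show the real type.
    static final class Point {
        final double value; final long parentId;
        final Date modifiedDate; final Date valueDate;
        Point(double value, long parentId, Date modifiedDate, Date valueDate) {
            this.value = value; this.parentId = parentId;
            this.modifiedDate = modifiedDate; this.valueDate = valueDate;
        }
    }

    static final String SQL = "UPDATE data SET value = ? WHERE parentId = ? "
            + "AND modifiedDate = ? AND valueDate = ?";

    // Number of commits needed for `rows` rows (ceiling division).
    static int chunkCount(int rows, int chunkSize) {
        return (rows + chunkSize - 1) / chunkSize;
    }

    static void updateInChunks(Connection connection, List<Point> points,
                               int chunkSize) throws SQLException {
        boolean previousAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try (PreparedStatement statement = connection.prepareStatement(SQL)) {
            int pending = 0;
            for (Point p : points) {
                statement.setDouble(1, p.value);
                statement.setLong(2, p.parentId);
                statement.setDate(3, p.modifiedDate);
                statement.setDate(4, p.valueDate);
                statement.addBatch();
                if (++pending == chunkSize) {   // flush and commit a full chunk
                    statement.executeBatch();
                    connection.commit();
                    pending = 0;
                }
            }
            if (pending > 0) {                  // flush the final partial chunk
                statement.executeBatch();
                connection.commit();
            }
        } catch (SQLException e) {
            connection.rollback();              // undoes only the uncommitted chunk
            throw e;
        } finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

Note the trade-off: with per-chunk commits, a failure leaves the earlier chunks committed, whereas the single transaction in the question rolls back everything.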

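If you are only doing inserts, rewriteBatchedStatements=true should speed things up considerably, even without a transaction. You can also raise max_allowed_packet to 1GB, which allows larger batches; your entire batch might then be converted into a single, very large multi-row insert.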
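To illustrate what a multi-row insert looks like, the sketch below builds one by hand for the `data` table from the question. This only shows the general shape of such a statement; the text the driver actually generates when it rewrites a batch may differ, and the `multiRowInsert` helper is hypothetical.

```java
import java.util.Collections;

class MultiRowInsert {
    static final String HEAD =
            "INSERT INTO `data` (`value`, `parentId`, `modifiedDate`, `valueDate`) VALUES ";

    // Builds one INSERT with `rows` placeholder groups, e.g. for rows = 2:
    // INSERT INTO ... VALUES (?, ?, ?, ?), (?, ?, ?, ?)
    static String multiRowInsert(int rows) {
        if (rows < 1) {
            throw new IllegalArgumentException("rows must be >= 1");
        }
        return HEAD + String.join(", ", Collections.nCopies(rows, "(?, ?, ?, ?)"));
    }

    public static void main(String[] args) {
        System.out.println(multiRowInsert(3));
    }
}
```

Whatever form the statement takes, it has to fit within max_allowed_packet, which is why raising that limit permits more rows per statement.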