Question

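I am working on improving the performance of my Java application, and I am currently focusing on an endpoint that has to insert a large amount of data into MySQL.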

I am using plain JDBC with the MariaDB Java client driver:

// Requires java.sql.PreparedStatement; connection is an open java.sql.Connection
try (PreparedStatement stmt = connection.prepareStatement(
            "INSERT INTO data (" +
                    "fId, valueDate, value, modifiedDate" +
                    ") VALUES (?,?,?,?)")) {
    for (DataPoint dp : datapoints) {
        stmt.setLong(1, fId);
        stmt.setDate(2, new java.sql.Date(dp.getDate().getTime()));
        stmt.setDouble(3, dp.getValue());
        stmt.setDate(4, new java.sql.Date(modifiedDate.getTime()));
        stmt.addBatch(); // queue this row; nothing is sent yet
    }
    // Send all queued rows in one batch
    int[] results = stmt.executeBatch();
}

From populating new databases from dump files, I know that max_allowed_packet is important, and I have set it to 536,870,912 bytes.

At https://dev.mysql.com/doc/refman/5.7/en/insert-optimization.html, it states:

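If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements. If you are adding data to a nonempty table, you can tune the bulk_insert_buffer_size variable to make data insertion even faster. See Section 5.1.7, "Server System Variables".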

On my database, this is set to 8MB.

I have also read about key_buffer_size (currently set to 16MB).

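I am worried that these last two might not be enough. I can do some rough calculations on the JSON input to this algorithm, because it looks something like this: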

[{"actualizationDate":null,"data":[{"date":"1999-12-31","value":0},
{"date":"2000-01-07","value":0},{"date":"2000-01-14","value":3144},
{"date":"2000-01-21","value":358},{"date":"2000-01-28","value":1049},
{"date":"2000-02-04","value":-231},{"date":"2000-02-11","value":-2367},
{"date":"2000-02-18","value":-2651},{"date":"2000-02-25","value":-393},
{"date":"2000-03-03","value":1725},{"date":"2000-03-10","value":-896},
{"date":"2000-03-17","value":2210},{"date":"2000-03-24","value":1782},

It looks like the 8MB configured for bulk_insert_buffer_size could easily be exceeded, if not key_buffer_size as well.

However, the MySQL documentation only mentions MyISAM tables for these settings, and I am currently using InnoDB tables.

I could set up some tests, but it would be good to know how this will break or degrade, if at all.
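The rough calculation mentioned above could be sketched as follows. This is only an estimate under stated assumptions: it assumes the driver rewrites the batch into a single multi-row INSERT sent as text, and it uses a deliberately pessimistic, hypothetical row literal (SAMPLE_ROW). The class and method names are illustrative and are not part of any driver API. It measures the size of the statement text, which is what max_allowed_packet limits; it says nothing about how the server uses bulk_insert_buffer_size or key_buffer_size.

```java
import java.nio.charset.StandardCharsets;

public class InsertSizeEstimate {
    // Statement prefix, assumed to be sent once per rewritten batch
    static final String HEADER =
            "INSERT INTO data (fId, valueDate, value, modifiedDate) VALUES ";

    // Hypothetical worst-case tuple: longest long, longest double, two dates
    static final String SAMPLE_ROW =
            "(-9223372036854775808,'2000-01-07',-1.7976931348623157E308,'2000-01-07')";

    /** Estimated size in bytes of one multi-row INSERT carrying the given number of rows. */
    public static long estimateBytes(long rows) {
        if (rows <= 0) {
            return 0;
        }
        long header = HEADER.getBytes(StandardCharsets.UTF_8).length;
        long row = SAMPLE_ROW.getBytes(StandardCharsets.UTF_8).length;
        // One comma between each pair of consecutive tuples
        return header + rows * row + (rows - 1);
    }

    /** Largest row count whose estimated statement size stays within the byte budget. */
    public static long maxRows(long budgetBytes) {
        long header = HEADER.getBytes(StandardCharsets.UTF_8).length;
        long row = SAMPLE_ROW.getBytes(StandardCharsets.UTF_8).length;
        long available = budgetBytes - header + 1;
        return available <= 0 ? 0 : available / (row + 1);
    }

    public static void main(String[] args) {
        long eightMb = 8L * 1024 * 1024;
        System.out.println("Rows that fit in 8MB of statement text: " + maxRows(eightMb));
        System.out.println("Estimated bytes for 1,000,000 rows: " + estimateBytes(1_000_000));
    }
}
```

Comparing estimateBytes for the expected number of data points against the configured limits gives a first idea of whether a single batch is plausible or whether it should be split.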

[Edit] I have --rewriteBatchedStatements=true. In fact, here is my connection string:

jdbc:p6spy:mysql://myhost.com:3306/mydb\
    ?verifyServerCertificate=true\
    &useSSL=true\
    &requireSSL=true\
    &cachePrepStmts=true\
    &cacheResultSetMetadata=true\
    &cacheServerConfiguration=true\
    &elideSetAutoCommits=true\
    &maintainTimeStats=false\
    &prepStmtCacheSize=250\
    &prepStmtCacheSqlLimit=2048\
    &rewriteBatchedStatements=true\
    &useLocalSessionState=true\
    &useLocalTransactionState=true\
    &useServerPrepStmts=true

(from https://github.com/brettwooldridge/HikariCP/wiki/MySQL-Configuration)

Answer 1

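An alternative is to execute the batch from time to time. This keeps the size of each batch bounded and lets you focus on more important problems.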

int batchSize = 0;

for (DataPoint dp : datapoints) {
    stmt.setLong(1, fId);
    stmt.setDate(2, new java.sql.Date(dp.getDate().getTime()));
    stmt.setDouble(3, dp.getValue());
    stmt.setDate(4, new java.sql.Date(modifiedDate.getTime()));
    stmt.addBatch();

    // When the limit is reached, execute the batch and reset the counter.
    // Pre-increment so that each batch holds exactly BATCH_LIMIT rows.
    if (++batchSize >= BATCH_LIMIT) {
        stmt.executeBatch();

        batchSize = 0;
    }
}

// Execute the remaining items, if any
if (batchSize > 0) {
    stmt.executeBatch();
}

I generally use a constant, or a parameter depending on the DAO implementation to be more dynamic, but a batch of 10_000 rows is a good start.

private static final int BATCH_LIMIT = 10_000;

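Note that there is no need to clear the batch after execution. Even though this is not stated in the Statement.executeBatch documentation, it is in the JDBC specification 4.3: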

14 Batch Updates
  14.1 Description of Batch Updates
  14.1.2 Successful Execution

Calling the method executeBatch closes the calling Statement object's current result set if one is open. The statement's batch is reset to empty once executeBatch returns.

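Managing the results is a bit more complicated, but you can still concatenate them if you need them. They can be analyzed at any time, since the ResultSet is no longer needed.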
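Concatenating the results of several executeBatch calls could look roughly like this. It is a minimal sketch: BatchResults and its methods are hypothetical helper names, not part of JDBC, and the only thing taken from JDBC is that executeBatch returns an int[] with one entry per queued command.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchResults {
    // Arrays returned by executeBatch(), in the order they were recorded
    private final List<int[]> chunks = new ArrayList<>();
    private int total = 0;

    /** Record the array returned by one executeBatch() call. */
    public void add(int[] chunkResult) {
        chunks.add(chunkResult);
        total += chunkResult.length;
    }

    /** Concatenate all recorded arrays into one, preserving insertion order. */
    public int[] toArray() {
        int[] all = new int[total];
        int pos = 0;
        for (int[] chunk : chunks) {
            System.arraycopy(chunk, 0, all, pos, chunk.length);
            pos += chunk.length;
        }
        return all;
    }
}
```

In the loop above, each `stmt.executeBatch();` call would become `results.add(stmt.executeBatch());`, with `results` created once before the loop. Entries may be the constant Statement.SUCCESS_NO_INFO instead of a row count, so the combined array is better suited to checking for failures than to summing affected rows.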

Use many VALUES (), (), ();
