Question

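I need to read millions of rows from one database and write them to another. I would like to use PreparedStatement.addBatch to do the writes in large batches (perhaps 1000 rows). I don't need them to be part of a transaction. I am writing the code in Scala 2.9.2.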

One way to do this is as follows:

val sourceResultSet = ...
val targetStatement = targetConnection.prepareStatement(...)
var rowCount = 0
while (sourceResultSet.next()) {
  // Read values from sourceResultSet and write them to targetStatement
  targetStatement.addBatch()
  rowCount += 1
  if (rowCount % 1000 == 0) {
    targetStatement.executeBatch()
    rowCount = 0
  }
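}
// Flush the last, partially filled batch (fewer than 1000 rows), if any
if (rowCount > 0) targetStatement.executeBatch()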

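How can I do this in a more functional way, without using var rowCount? I also need to keep RAM usage in mind: I am reading millions of rows, so any solution that holds all the source rows in memory at once will fail.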

Answer 1

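What is the type of sourceResultSet? Based on your usage I'm assuming an Iterator or a Stream, but either way you can use take from the Scala collections to grab 1000 elements at a time (this works for Lists, Sets, Iterators, Streams, etc.). To make it more functional (although it is still all side effects, so these are not pure functions), define a local function: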

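import scala.annotation.tailrec

// Row stands for whatever element type your source yields (placeholder)
@tailrec
def processSource(sourceResultSet: Iterator[Row]): Unit = {
  if (sourceResultSet.hasNext) {
    sourceResultSet.take(1000).foreach { row =>
      // Copy the values of row into targetStatement's parameters here
      targetStatement.addBatch()
    }
    targetStatement.executeBatch()
    processSource(sourceResultSet) // How you handle the recursion depends on what sourceResultSet is
  }
}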

val sourceResultSet = ...
val targetStatement = targetConnection.prepareStatement(...)
processSource(sourceResultSet)

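As long as sourceResultSet is lazy (an Iterator, or a Stream whose head nothing else holds on to, because a Stream keeps every element it has already evaluated), this avoids loading all the rows into memory at once.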
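A possible variation, offered as a sketch rather than a definitive implementation: java.sql.ResultSet is not itself a Scala Iterator, but it can be wrapped lazily with Iterator.continually(rs).takeWhile(_.next()). Iterator.grouped can then replace the repeated take, since Scala's Iterator documentation says an iterator should be discarded after take has been called on it. The names BatchCopy, rows, writeInBatches and copy, and the two-column row layout, are hypothetical. Only the current group of rows is held in memory, and the last partial group is flushed as well.

```scala
import java.sql.{PreparedStatement, ResultSet}

object BatchCopy {
  // Hypothetical adapter: exposes a JDBC ResultSet as a lazy Iterator.
  // takeWhile calls rs.next() once per row; map then runs `read` on the
  // current row, so `read` must copy the values out before the cursor moves.
  def rows[A](rs: ResultSet)(read: ResultSet => A): Iterator[A] =
    Iterator.continually(rs).takeWhile(_.next()).map(read)

  // Passes each element to addToBatch and calls flush once per group of
  // batchSize elements (the last group may be smaller). grouped keeps only
  // the current group in memory. Returns the number of flushes performed.
  def writeInBatches[A](source: Iterator[A], batchSize: Int)
                       (addToBatch: A => Unit)(flush: () => Unit): Int =
    source.grouped(batchSize).foldLeft(0) { (flushes, group) =>
      group.foreach(addToBatch)
      flush()
      flushes + 1
    }

  // Example wiring, assuming a hypothetical two-column row (BIGINT, VARCHAR)
  // and a target statement with two matching parameters.
  def copy(source: ResultSet, target: PreparedStatement, batchSize: Int = 1000): Int =
    writeInBatches(rows(source)(rs => (rs.getLong(1), rs.getString(2))), batchSize) {
      case (id, name) =>
        target.setLong(1, id)
        target.setString(2, name)
        target.addBatch()
    } { () =>
      target.executeBatch() // the returned update counts are ignored here
      ()
    }
}
```

With real connections this would be called as BatchCopy.copy(sourceResultSet, targetStatement). The foldLeft threads the flush count through the fold instead of keeping a var counter, and the grouping itself replaces the rowCount % 1000 check.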

Using JDBC PreparedStatement.addBatch in functional Scala code
