Question

我最近阅读了很多关于从大量数据填充grails表的文章，但似乎已达到上限。我的代码如下：

class LoadingService {
    def sessionFactory
    def dataSource
    def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP

    def insertFile(fileName) {
        InputStream inputFile = getClass().classLoader.getResourceAsStream(fileName)
        def pCounter = 1
        def mCounter = 1
        Sql sql = new Sql(dataSource)
        inputFile.splitEachLine(/\n|\r|,/) { line -> 
            line.each { value ->
                if(value.equalsIgnoreCase('0') { 
                    pCounter++
                    return
                }
                sql.executeInsert("insert into Patient_MRNA (patient_id, mrna_id, value) values (${pCounter}, ${mCounter}, ${value.toFloat()})")
                pCounter++
            }
            if(mCounter % 100 == 0) {
                cleanUpGorm()
            }
            pCounter = 1
            mCounter++
        }
    }

    def cleanUpGorm() {
        session.currentSession.clear()
        propertyInstanceMap.get().clear()
    }
}

我已禁用二级缓存，我正在使用已分配的ID，我通过域明确处理这种多对多的关系，而不是hasMany和belongsTo。

在应用这些方法后，我的速度有了极大的提升，但过了一段时间后，插入速度减慢到几乎停止的程度，而开始时则为每分钟约623,000。

我是否应该注意其他一些内存泄漏，或者我是否只是在Grails中批量插入时达到了上限？

要明确的是，插入120万行需要大约2分钟，但随后它们开始变慢。

Answer 1

尝试批量插入，效率更高

def updateCounts = sql.withBatch { stmt ->
     stmt.addBatch("insert into TABLENAME ...")
     stmt.addBatch("insert into TABLENAME ...")
     stmt.addBatch("insert into TABLENAME ...")
     ...
 }

Answer 2

我在Grails的早期版本中与此斗争过。那时我只是简单地在适当的块中手动运行批处理，或者使用其他工具进行批量导入，例如Pentaho Data Integration（或其他ETL工具或DIY）。

在grails中插入10,000,000多行

2 个答案: