Question

我使用Groovy Sql.withBatch处理CSV文件并加载Postgres数据库中的所有数据。

这是我的方法：

def processCSV() {
    def logger = Logger.getLogger('groovy.sql')
    logger.level = Level.FINE
    logger.addHandler(new ConsoleHandler(level: Level.FINE))

    def fileName = "file.csv"
    def resource = this.getClass().getResource( '/csv/' + fileName )

    File file = new File(resource.path)

    String year = '2016'

    char separator = ','

    def lines = CSV
            .separator(separator)
            .skipLines(1)
            .quote(CSVParser.DEFAULT_QUOTE_CHARACTER)
            .escape(CSVParser.DEFAULT_ESCAPE_CHARACTER)
            .charset('UTF-8')
            .create()
            .reader(file)
            .readAll()

    def totalLines = lines.size()

    Sql sql = getDatabaseInstance()

    println("Delete existing rows for " + year + " if exists")
    String dQuery = "DELETE FROM table1 WHERE year = ?"
    sql.execute(dQuery, [year])

    def statement = 'INSERT INTO table1 (column1, column2, column3, coulmn4, year) VALUES (?, ?, ?, ?, ?)'

    println("Total lines in the CSV files: " + totalLines)

    def batches = []

    sql.withBatch(BATCH_SIZE, statement) { ps ->
        lines.each { fields ->
            String coulmn1 = fields[0]
            String coulmn2 = fields[1]
            String column3 = fields[2]
            String column4 = fields[3]

            def params = [column1, coulmn2, column3, column4, year]

            def batch = ['params': params, 'error': false]
            try {
                ps.addBatch(params)
            }
            catch (all) {
                batch['error'] = true
                throw all
            }

            batches << batch
        }
    }

    def recordsAddedInDB = sql.firstRow("SELECT count(*) FROM " + tableName + " WHERE year = ?", year)[0]

    sql.close()

    println("")
    println("Processed lines: " + line)
    println("Batches: " + batches.size())
    println("Batches in error: " + batches.findAll{ it.error }.size())
    println("Record in DB for " + year + ": " + recordsAddedInDB)
}

CSV文件中的行（排除标题行）为23758。此方法的输出如下：

Delete existing rows for 2016 if exists
Total lines in the CSV files: 23758
Processed lines: 23758
Batches: 23758
Batches in error: 0
Record in DB for 2016 year: 23580

如果我启用日志记录，BATCH_SIZE为500，我可以看到：

47次，句子＆＃34;用500个命令成功执行批次＆＃34;
1次句子＆＃34;用258个命令成功执行批处理＆＃34;

这意味着已经处理了23758个插入语句。

任何人都知道为什么数据库中的行数少于处理过的行数？

Answer 1

解决。 INSERT语句有一个子查询，当子查询没有返回值时，忽略INSERT语句。

Groovy Sql WithBatch在DB中丢失记录

1 个答案: