How to do bulk (multi-row) inserts with JpaRepository?

Posted: 2018-06-09 08:11:10

Tags: hibernate spring-boot kotlin spring-data-jpa cockroachdb

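When I call the saveAll method of my JpaRepository with a List<Entity> from the service layer, Hibernate's trace logging shows a separate SQL statement being issued for each entity.

Can I force it to do a bulk insert (i.e. multi-row) without manually fiddling with the EntityManager, transactions etc., or even raw SQL statement strings?

By multi-row insert I mean not just transitioning from: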

start transaction
INSERT INTO table VALUES (1, 2)
end transaction
start transaction
INSERT INTO table VALUES (3, 4)
end transaction
start transaction
INSERT INTO table VALUES (5, 6)
end transaction

to:

start transaction
INSERT INTO table VALUES (1, 2)
INSERT INTO table VALUES (3, 4)
INSERT INTO table VALUES (5, 6)
end transaction

but rather to:

start transaction
INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)
end transaction
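To make the target shape concrete, here is a minimal Kotlin sketch that builds the text of the multi-row statement being asked for, using JDBC-style placeholders instead of literal values. The function name `multiRowInsertSql` is hypothetical; it only produces the SQL string and is not something Hibernate or a JDBC driver exposes.

```kotlin
// Hypothetical helper: builds one multi-row INSERT with JDBC placeholders,
// e.g. for 3 rows of (value, id):
// insert into thing (value, id) values (?, ?), (?, ?), (?, ?)
fun multiRowInsertSql(table: String, columns: List<String>, rowCount: Int): String {
    require(columns.isNotEmpty()) { "need at least one column" }
    require(rowCount > 0) { "need at least one row" }
    // One "(?, ?, ...)" group per row, with one placeholder per column
    val rowPlaceholders = columns.joinToString(", ", prefix = "(", postfix = ")") { "?" }
    // Repeat the group once per row, comma-separated
    val allRows = List(rowCount) { rowPlaceholders }.joinToString(", ")
    return "insert into $table (${columns.joinToString(", ")}) values $allRows"
}
```

For the 14 entities in the example below, this would be a single statement with 28 bound parameters (14 rows × 2 columns) instead of 14 separate statements.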

In PROD I'm using CockroachDB, and the performance difference is significant.

Below is a minimal example that reproduces the problem (using H2 for simplicity).

./src/main/kotlin/ThingService.kt

package things

import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import org.springframework.web.bind.annotation.RestController
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.data.jpa.repository.JpaRepository
import javax.persistence.Entity
import javax.persistence.Id
import javax.persistence.GeneratedValue

interface ThingRepository : JpaRepository<Thing, Long> {
}

@RestController
class ThingController(private val repository: ThingRepository) {
    @GetMapping("/test_trigger")
    fun trigger() {
        val things: MutableList<Thing> = mutableListOf()
        for (i in 3000..3013) {
            things.add(Thing(i))
        }
        repository.saveAll(things)
    }
}

@Entity
data class Thing (
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
)

@SpringBootApplication
class Application {
}

fun main(args: Array<String>) {
    runApplication<Application>(*args)
}

./src/main/resources/application.properties

jdbc.driverClassName = org.h2.Driver
jdbc.url = jdbc:h2:mem:db
jdbc.username = sa
jdbc.password = sa

hibernate.dialect=org.hibernate.dialect.H2Dialect
hibernate.hbm2ddl.auto=create

spring.jpa.generate-ddl = true
spring.jpa.show-sql = true

spring.jpa.properties.hibernate.jdbc.batch_size = 10
spring.jpa.properties.hibernate.order_inserts = true
spring.jpa.properties.hibernate.order_updates = true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data = true

./build.gradle.kts

import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

plugins {
    val kotlinVersion = "1.2.30"
    id("org.springframework.boot") version "2.0.2.RELEASE"
    id("org.jetbrains.kotlin.jvm") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.spring") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.jpa") version kotlinVersion
    id("io.spring.dependency-management") version "1.0.5.RELEASE"
}

version = "1.0.0-SNAPSHOT"

tasks.withType<KotlinCompile> {
    kotlinOptions {
        jvmTarget = "1.8"
        freeCompilerArgs = listOf("-Xjsr305=strict")
    }
}

repositories {
    mavenCentral()
}

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web")
    compile("org.springframework.boot:spring-boot-starter-data-jpa")
    compile("org.jetbrains.kotlin:kotlin-stdlib-jdk8")
    compile("org.jetbrains.kotlin:kotlin-reflect")
    compile("org.hibernate:hibernate-core")
    compile("com.h2database:h2")
}

Run the application:

./gradlew bootRun

Trigger the DB INSERT:

curl http://localhost:8080/test_trigger

Log output:

Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)

5 answers:

Answer 0 (score: 31)

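To get batch inserts with Spring Boot and Spring Data JPA you need only two things:

  1. Set the option spring.jpa.properties.hibernate.jdbc.batch_size to an appropriate value (e.g. 20).

  2. Use your repository's saveAll() method with the list of entities prepared for insertion.

  A working example is here.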

    Regarding turning the insert statements into something like this:

    INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)
    

    This is available in PostgreSQL: set the option reWriteBatchedInserts to true in the JDBC connection string:

    jdbc:postgresql://localhost:5432/db?reWriteBatchedInserts=true
    

    The JDBC driver will then perform this transformation.

    More information about batching can be found here.

    UPDATED

    Demo project for Kotlin: sb-kotlin-batch-insert-demo

    UPDATED

    Hibernate disables insert batching at the JDBC level transparently if you use an IDENTITY identifier generator.
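A rough worked example of what `hibernate.jdbc.batch_size` changes, assuming a non-IDENTITY id generator (per the note above) and that all inserts target the same table: Hibernate collects up to batch_size bound INSERTs and sends each group with one JDBC `executeBatch()` call, sending the remainder at flush. Whether the database then receives one multi-row statement or one statement per row is up to the driver, which is why `reWriteBatchedInserts` matters for PostgreSQL. The helper name `jdbcBatchSizes` is hypothetical and only does the arithmetic.

```kotlin
// Hypothetical helper: sizes of the JDBC batches that `rowCount` same-table
// INSERTs would be grouped into for a given hibernate.jdbc.batch_size.
fun jdbcBatchSizes(rowCount: Int, batchSize: Int): List<Int> {
    require(rowCount >= 0) { "rowCount must not be negative" }
    require(batchSize > 0) { "batch_size must be positive" }
    val full = rowCount / batchSize           // completely filled batches
    val rest = rowCount % batchSize           // leftover rows, sent at flush
    val sizes = MutableList(full) { batchSize }
    if (rest > 0) sizes.add(rest)
    return sizes
}
```

For the question's 14 entities (ids 3000..3013) and batch_size = 10, `jdbcBatchSizes(14, 10)` gives `[10, 4]`: two `executeBatch()` calls instead of 14 individual statement executions.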

Answer 1 (score: 6)

The underlying problem is the following code in SimpleJpaRepository:

@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}
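A minimal, self-contained model of the decision `entityInformation.isNew(entity)` makes for an entity without a `@Version` attribute, as I read Spring Data's default implementation: a wrapper (nullable) ID type counts as new when the ID is null, and a primitive numeric ID counts as new when it is 0. `looksNew`, `ThingLike`, `NullableIdThing` and `idFieldIsPrimitive` are hypothetical stand-ins, not Spring Data API. Under this reading, the question's `var id: Long = -1` compiles to a primitive `long` that is not 0, so `save()` takes the `merge()` branch, which would explain the SELECT before each sequence call in the log.

```kotlin
// Hypothetical model of Spring Data's default "is this entity new?" rule
// for entities without @Version. Not the real Spring Data API.
fun looksNew(id: Any?, idTypeIsPrimitive: Boolean): Boolean {
    if (!idTypeIsPrimitive) return id == null         // Long?, UUID?: new iff null
    if (id !is Number) throw IllegalArgumentException("Unsupported primitive id type")
    return id.toLong() == 0L                           // primitive long: new iff 0
}

// Same shape as the question's entity, minus the JPA annotations
class ThingLike(var value: Int, var id: Long = -1)

// Variant with a nullable id
class NullableIdThing(var value: Int, var id: Long? = null)

// Plain Java reflection: how did Kotlin compile the `id` backing field?
fun idFieldIsPrimitive(type: Class<*>): Boolean =
    type.getDeclaredField("id").type.isPrimitive
```

For example, `looksNew(ThingLike(3000).id, idFieldIsPrimitive(ThingLike::class.java))` is false (the merge branch), while the same call with `NullableIdThing(3000)` is true (the persist branch).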

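Besides setting the batch size property, you must make sure that SimpleJpaRepository calls persist rather than merge. There are a few ways to achieve this. Use an @Id generator that does not query a sequence, e.g.: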

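@Id
@GeneratedValue(generator = "uuid2")
@GenericGenerator(name = "uuid2", strategy = "uuid2")  // org.hibernate.annotations.GenericGenerator
var id: UUID? = null  // java.util.UUID; uuid2 generates UUIDs, and null until generated means isNew() is true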

Or force persist by having your entity implement Persistable and override isNew() so that records are treated as new:
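import javax.persistence.Entity
import javax.persistence.GeneratedValue
import javax.persistence.Id
import javax.persistence.PostLoad
import javax.persistence.PostPersist
import javax.persistence.Transient
import org.springframework.data.domain.Persistable

@Entity
class Thing(
    var value: Int,
    @Id
    @GeneratedValue
    private var id: Long? = null
) : Persistable<Long> {

    // Not mapped to a column; true until Hibernate persists or loads this instance
    @Transient
    private var isNewEntity = true

    override fun getId(): Long? = id

    // SimpleJpaRepository.save() calls persist() while this returns true
    override fun isNew(): Boolean = isNewEntity

    @PostPersist
    @PostLoad
    fun markNotNew() {
        isNewEntity = false
    }
}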
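To see what the flag buys, here is a plain-Kotlin simulation (no JPA involved) of the lifecycle in the entity above: the instance reports `isNew() == true` until a persist/load callback flips it, regardless of what the id holds. `FlaggedThing` and `saveBranch` are hypothetical names; `saveBranch` only mirrors the `if (isNew) persist else merge` choice in the SimpleJpaRepository.save() code quoted earlier, and `markNotNew()` is called by hand here where JPA would call it via `@PostPersist`/`@PostLoad`.

```kotlin
// Hypothetical stand-in for the Persistable entity above, without JPA
class FlaggedThing(val value: Int, var id: Long? = null) {
    private var newEntity = true                 // what the @Transient field tracks
    fun isNew(): Boolean = newEntity
    fun markNotNew() { newEntity = false }       // JPA would call this after persist/load
}

// Mirrors the branch in SimpleJpaRepository.save(): persist when new, else merge
fun saveBranch(thing: FlaggedThing): String = if (thing.isNew()) "persist" else "merge"
```

Even with a pre-set id such as `FlaggedThing(3000, -1L)`, the first save goes through persist(); only after the callback does it switch to merge().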

Or override save(List) and call persist() on the entity manager:

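import java.util.List;
import javax.persistence.EntityManager;
import org.springframework.data.jpa.repository.support.SimpleJpaRepository;
import org.springframework.stereotype.Repository;
import org.springframework.transaction.annotation.Transactional;

@Repository
public class ThingRepository extends SimpleJpaRepository<Thing, Long> {
    private final EntityManager entityManager;

    public ThingRepository(EntityManager entityManager) {
        super(Thing.class, entityManager);
        this.entityManager = entityManager;
    }

    // Always persist(), never merge(), so Hibernate can batch the INSERTs
    @Transactional
    public List<Thing> save(List<Thing> things) {
        things.forEach(entityManager::persist);
        return things;
    }
}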

The code above is based on the following links:

Answer 2 (score: 3)

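You can configure Hibernate to batch DML statements. Have a look at Spring Data JPA - concurrent Bulk inserts/updates. I think part 2 of that answer can solve your problem:

Enable batching of DML statements: enabling batch support results in fewer round trips to the database when inserting/updating the same number of records.

Quoting from Batch INSERT and UPDATE statements:

hibernate.jdbc.batch_size = 50
hibernate.order_inserts = true
hibernate.order_updates = true
hibernate.jdbc.batch_versioned_data = true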

Update: in your application.properties file the Hibernate properties have to be set differently. They live under the namespace spring.jpa.properties.*. An example might look like this:

spring.jpa.properties.hibernate.jdbc.batch_size = 50
spring.jpa.properties.hibernate.order_inserts = true
....

Answer 3 (score: 0)

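All the approaches mentioned work, but they will be slow, especially if the source of the inserted data lives in other tables. First, even with batch_size > 1, the insert is still executed as multiple SQL statements. Second, if the source data lives in another table, you need extra queries to fetch it (in the worst case loading all of it into memory) and turn it into a static bulk insert. Third, with a separate persist() call for each entity (even with batching enabled), you bloat the entity manager's first-level cache with all of those entity instances.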
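For the third point, a common mitigation when staying with persist() is to flush and clear the persistence context every batch_size entities, so the first-level cache never holds more than one batch. This is a different technique from the HQL approach that follows. `persistInChunks` is a hypothetical helper that takes the two operations as functions so it can be tested without a database; in a real repository you would pass something like `entityManager::persist` and `{ entityManager.flush(); entityManager.clear() }` and call it inside a transaction.

```kotlin
// Hypothetical helper: persist items one by one, flushing and clearing the
// persistence context after every `batchSize` items (and once for the tail),
// so the first-level cache holds at most one batch of entities at a time.
fun <T> persistInChunks(
    items: List<T>,
    batchSize: Int,
    persist: (T) -> Unit,
    flushAndClear: () -> Unit
) {
    require(batchSize > 0) { "batchSize must be positive" }
    items.forEachIndexed { index, item ->
        persist(item)
        // Matching hibernate.jdbc.batch_size lets each flush send full JDBC batches
        if ((index + 1) % batchSize == 0) flushAndClear()
    }
    // Flush the remainder that did not fill a whole batch
    if (items.size % batchSize != 0) flushAndClear()
}
```

With batchSize = 2 and five items the call sequence is persist, persist, flush/clear, persist, persist, flush/clear, persist, flush/clear.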

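But Hibernate has another option. If you use Hibernate as the JPA provider, you can fall back to HQL, which natively supports bulk inserts with a subselect from another table. Example: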

// executeUpdate() must run inside an active transaction (e.g. a @Transactional method)
val session = entityManager.unwrap(Session::class.java)
session.createQuery("insert into Entity (field1, field2) select [...] from [...]")
    .executeUpdate()

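Whether this works depends on your ID generation strategy. If Entity.id is generated by the database (e.g. MySQL auto-increment), it will execute successfully. If Entity.id is generated by your code (which is especially the case for UUID generators), it will fail with an "unsupported id generation method" exception.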

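In the latter case, however, this can be worked around with a custom SQL function. For example, in PostgreSQL I use the uuid-ossp extension, which provides the uuid_generate_v4() function, and I registered that function in a custom dialect: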

import org.hibernate.dialect.PostgreSQL10Dialect;
import org.hibernate.dialect.function.StandardSQLFunction;
import org.hibernate.type.PostgresUUIDType;

public class MyPostgresDialect extends PostgreSQL10Dialect {

    public MyPostgresDialect() {
        registerFunction( "uuid_generate_v4", 
            new StandardSQLFunction("uuid_generate_v4", PostgresUUIDType.INSTANCE));
    }
}

Then I registered this class as the Hibernate dialect:

hibernate.dialect=MyPostgresDialect

Finally, I can use this function in the bulk insert query:

val session = entityManager.unwrap(Session::class.java)
session.createQuery("insert into Entity (id, field1, field2) " +
    "select uuid_generate_v4(), [...] from [...]")
    .executeUpdate()

Most importantly, the underlying SQL that Hibernate generates for this is just a single query:

insert into entity ( id, [...] ) select uuid_generate_v4(), [...] from [...]

Answer 4 (score: -2)

First, I added two settings:

  1. spring.jpa.properties.hibernate.jdbc.batch_size = 5000
  2. spring.datasource.url = jdbc:mysql://127.0.0.1:3306/test?&reWriteBatchedInserts=true

Then I used the saveAll() method:

userInfoRepository.saveAll(userInfoList);

But the log shows the following:

Hibernate: insert into user_info (time_inst, time_upd, adress, age, education, login_name, login_pwd, phone, sex, user_name) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Hibernate: insert into user_info (time_inst, time_upd, adress, age, education, login_name, login_pwd, phone, sex, user_name) values (?, ?

It doesn't work.