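Need to insert 100,000 rows into MySQL using Hibernate in under 5 seconds

Time: 2017-05-29 13:24:08

Tags: java mysql hibernate jpa batch-insert

I am trying to insert 100,000 rows into a MySQL table in under 5 seconds using Hibernate (JPA). I have tried every trick Hibernate offers and still cannot do better than 35 seconds.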

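First optimization: I started with the IDENTITY generator, which resulted in 60 seconds to insert. I later abandoned the generator and started assigning the @Id field myself, by reading MAX(id) once and then using AtomicInteger.incrementAndGet() to assign each ID. That reduced the insert time to 35 seconds.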
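The self-assigned ID scheme from the first optimization can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the class name `IdAllocator` and the `currentMaxId` parameter are hypothetical, and the starting value is assumed to come from a one-time `SELECT MAX(id)` query run before the inserts begin.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: hands out consecutive IDs starting after a known maximum.
final class IdAllocator {
    private final AtomicInteger last;

    // currentMaxId is assumed to be the result of SELECT MAX(id), read once up front.
    IdAllocator(int currentMaxId) {
        this.last = new AtomicInteger(currentMaxId);
    }

    // Thread-safe: each call returns the next unused ID.
    int nextId() {
        return last.incrementAndGet();
    }
}
```

A counter like this is only safe while a single application instance writes to the table; a second writer would hand out the same IDs.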

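Second optimization: I enabled batch inserts by adding

    <prop key="hibernate.jdbc.batch_size">30</prop>
    <prop key="hibernate.order_inserts">true</prop>
    <prop key="hibernate.current_session_context_class">thread</prop>
    <prop key="hibernate.jdbc.batch_versioned_data">true</prop>

to the configuration. I was shocked to find that batch inserts did absolutely nothing to reduce the insert time. It was still 35 seconds!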

Now I am thinking about trying to insert using multiple threads. Does anyone have any pointers? Should I have chosen MongoDB?

Here is my configuration:

1. Hibernate configuration:

    <bean id="entityManagerFactoryBean" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
        <property name="dataSource" ref="dataSource" />
        <property name="packagesToScan" value="com.progresssoft.manishkr" />
        <property name="jpaVendorAdapter">
            <bean class="org.springframework.orm.jpa.vendor.HibernateJpaVendorAdapter" />
        </property>
        <property name="jpaProperties">
            <props>
                <prop key="hibernate.hbm2ddl.auto">${hibernate.hbm2ddl.auto}</prop>
                <prop key="hibernate.dialect">${hibernate.dialect}</prop>
                <prop key="hibernate.show_sql">${hibernate.show_sql}</prop>
                <prop key="hibernate.format_sql">${hibernate.format_sql}</prop>
                <prop key="hibernate.jdbc.batch_size">30</prop>
                <prop key="hibernate.order_inserts">true</prop>
                <prop key="hibernate.current_session_context_class">thread</prop>
                <prop key="hibernate.jdbc.batch_versioned_data">true</prop>
            </props>
        </property>
    </bean>

    <bean class="org.springframework.jdbc.datasource.DriverManagerDataSource"
          id="dataSource">
        <property name="driverClassName" value="${database.driver}"></property>
        <property name="url" value="${database.url}"></property>
        <property name="username" value="${database.username}"></property>
        <property name="password" value="${database.password}"></property>
    </bean>

    <bean id="transactionManager" class="org.springframework.orm.jpa.JpaTransactionManager">
        <property name="entityManagerFactory" ref="entityManagerFactoryBean" />
    </bean>



    <tx:annotation-driven transaction-manager="transactionManager" />


2. Entity configuration:

    @Entity
    @Table(name = "myEntity")
    public class MyEntity {
    
        @Id
        private Integer id;
    
        @Column(name = "deal_id")
        private String dealId;
    
        ....
        ....
    
        @Temporal(TemporalType.TIMESTAMP)
        @Column(name = "timestamp")
        private Date timestamp;
    
        @Column(name = "amount")
        private BigDecimal amount;
    
        @OneToOne(cascade = CascadeType.ALL)
        @JoinColumn(name = "source_file")
        private MyFile sourceFile;
    
        public Deal(Integer id,String dealId, ....., Timestamp timestamp, BigDecimal amount, SourceFile sourceFile) {
            this.id = id;
            this.dealId = dealId;
            ...
            ...
            ...
            this.amount = amount;
            this.sourceFile = sourceFile;
        }
    
    
        public String getDealId() {
            return dealId;
        }
    
        public void setDealId(String dealId) {
            this.dealId = dealId;
        }
    
       ...
    
       ...
    
    
        ....
    
        public BigDecimal getAmount() {
            return amount;
        }
    
        public void setAmount(BigDecimal amount) {
            this.amount = amount;
        }
    
        ....
    
    
        public Integer getId() {
            return id;
        }
    
        public void setId(Integer id) {
            this.id = id;
        }
    


3. Persistence code (service):

      @Service
      @Transactional
      public class ServiceImpl implements MyService{
      
          @Autowired
          private MyDao dao;
      ....
      
          void foo(){
              for(MyObject d : listOfObjects_100000){
                  dao.persist(d);
              }
          }
      }
      

DAO class:

      @Repository
      public class DaoImpl implements MyDao{
      
          @PersistenceContext
          private EntityManager em;
      
          public void persist(Deal deal){
              em.persist(deal);
          }
      }
      

Log:

      DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
      18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
      18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
      18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
      18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
      18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
      18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
      18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
      18:26:32.906 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
      18:26:32.906 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
      18:26:32.906 [http-nio-8080-exec-2] 
      

      ... ...

      DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
      18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
      18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
      18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
      18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
      18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
      18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.b.internal.AbstractBatchImpl - Reusing batch statement
      18:26:34.002 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - insert into deal (amount, deal_id, timestamp, from_currency, source_file, to_currency, id) values (?, ?, ?, ?, ?, ?, ?)
      18:26:34.002 [http-nio-8080-exec-2] DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 27
      18:26:34.011 [http-nio-8080-exec-2] DEBUG org.hibernate.SQL - update deal_source_file set invalid_rows=?, source_file=?, valid_rows=? where id=?
      18:26:34.015 [http-nio-8080-exec-2] DEBUG o.h.e.j.batch.internal.BatchingBatch - Executing batch size: 1
      18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - committed JDBC Connection
      18:26:34.018 [http-nio-8080-exec-2] DEBUG o.h.e.t.i.jdbc.JdbcTransaction - re-enabling autocommit
      18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.orm.jpa.JpaTransactionManager - Closing JPA EntityManager [org.hibernate.jpa.internal.EntityManagerImpl@2354fb09] after transaction
      18:26:34.032 [http-nio-8080-exec-2] DEBUG o.s.o.jpa.EntityManagerFactoryUtils - Closing JPA EntityManager
      18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.internal.JdbcCoordinatorImpl - HHH000420: Closing un-released batch
      18:26:34.032 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Releasing JDBC connection
      18:26:34.033 [http-nio-8080-exec-2] DEBUG o.h.e.j.i.LogicalConnectionImpl - Released JDBC connection
      

4 Answers:

Answer 0 (score: 10)

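After trying every possible solution, I finally found one that inserts 100,000 rows in under 5 seconds!

Things I tried:

1) Replacing Hibernate/database auto-generated IDs with self-generated IDs, using AtomicInteger

2) Enabling batch inserts with batch_size = 50

3) Flushing the cache after every 'batch_size' number of persist() calls

4) Multithreading (did not actually try this one)

What finally worked was using a native multi-row insert query and inserting 1000 rows per SQL insert statement, instead of calling persist() on every entity. To insert the 100,000 entities, I created native queries like "INSERT into MyTable VALUES (x,x,x),(x,x,x).......(x,x,x)" [1000 rows per SQL insert query].

Now it takes about 3 seconds to insert 100,000 records! So the bottleneck was the ORM itself! For bulk inserts, the only thing that seems to work is native insert queries!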
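As a rough sketch of the multi-row approach described above, the helper below splits the rows into chunks and builds one INSERT statement per chunk. It is an illustration under stated assumptions rather than the answerer's code: the class and method names are hypothetical, and it emits `?` placeholders instead of literal values so that values can be bound as parameters.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for building multi-row INSERT statements.
final class MultiRowInsert {

    // Builds e.g. "INSERT INTO t (a, b) VALUES (?, ?), (?, ?)" for rowCount rows.
    // Table and column names are concatenated as-is, so they must be trusted constants.
    static String buildSql(String table, List<String> columns, int rowCount) {
        if (columns.isEmpty() || rowCount < 1) {
            throw new IllegalArgumentException("need at least one column and one row");
        }
        String oneRow = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        return "INSERT INTO " + table
                + " (" + String.join(", ", columns) + ") VALUES "
                + String.join(", ", Collections.nCopies(rowCount, oneRow));
    }

    // Splits rows into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> chunk(List<T> rows, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += chunkSize) {
            chunks.add(rows.subList(start, Math.min(start + chunkSize, rows.size())));
        }
        return chunks;
    }
}
```

Each generated statement could then be run with `EntityManager.createNativeQuery(sql)`, binding the values with `setParameter(position, value)` before calling `executeUpdate()`. With 100,000 rows and a chunk size of 1000, that is 100 statements in total.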

Answer 1 (score: 2)

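  1. You are using Spring to manage the transaction, but you break it by using thread as the current session context. When Spring manages your transactions, do not touch the hibernate.current_session_context_class property. Remove it.

  2. Do not use DriverManagerDataSource; use a proper connection pool such as HikariCP.

  3. In your for loop you should flush and clear the EntityManager at regular intervals, preferably at the same interval as your batch size. If you don't, each persist takes longer and longer, because on every persist Hibernate checks the first-level cache for dirty objects, and the more objects there are, the longer that takes. With 10 or 100 objects this is acceptable, but checking tens of thousands of objects on every persist takes its toll.

    @Service
    @Transactional
    public class ServiceImpl implements MyService{

        @Autowired
        private MyDao dao;

        @PersistenceContext
        private EntityManager em;

        void foo(){
            int count = 0;
            for(MyObject d : listOfObjects_100000){
                dao.persist(d);
                count++;
                // Flush and clear at the same interval as hibernate.jdbc.batch_size (30)
                if ( (count % 30) == 0) {
                    em.flush();
                    em.clear();
                }
            }
        }
    }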


  For a more in-depth explanation, see this blog and this blog.

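Answer 2 (score: 1)

Another option to consider is StatelessSession:

A command-oriented API for performing bulk operations against a database.

A stateless session does not implement a first-level cache nor interact with any second-level cache, nor does it implement transactional write-behind or automatic dirty checking, nor do operations cascade to associated instances. Collections are ignored by a stateless session. Operations performed via a stateless session bypass Hibernate's event model and interceptors. Stateless sessions are vulnerable to data aliasing effects, due to the lack of a first-level cache.

For certain kinds of transactions, a stateless session may perform slightly faster than a stateful session.

Related discussion: Using StatelessSession for Batch processing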

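Answer 3 (score: -1)

Uff. There are many things you can do to increase speed.

1.) Use @DynamicInsert and @DynamicUpdate so that only non-null columns are inserted and only changed columns are updated.

2.) Try inserting the rows directly into the database (without Hibernate) to see whether Hibernate is really your bottleneck.

3.) Use a SessionFactory and commit your transaction only every x inserts, e.g. every 100 inserts. Or open and close the transaction only once and flush your data every 100 inserts.

4.) Use the ID generation strategy "sequence" and let Hibernate preallocate IDs (via the allocationSize parameter).

5.) Use caches.

Some of these possible solutions can have timing disadvantages when not used properly. But you have plenty of options.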