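I am currently facing the well-known, common Hibernate batch insert problem.
I need to save batches of 5 million rows. I am first trying with a much lighter payload. Since I only have to insert entities of two types (first all records of type A, then all records of type B, all pointing to a common ManyToOne parent of type C), I would like to take full advantage of JDBC batch inserts.
I have already read a lot of documentation, but nothing I have tried has worked.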
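I do not use an AUTO_INCREMENT ID. Instead I set the ID with a trick: SELECT MAX(ID) FROM ENTITIES, and I increment it every time.
hibernate.jdbc.batch_size has to be consistent with my application's batch size, so I set it in the LocalSessionFactoryBean (Spring ORM integration).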
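The ID trick (seeding a counter from SELECT MAX(ID) and incrementing it locally) can be sketched as follows. This is a minimal illustration, not code from my project: the class name `IdAllocator` and its methods are hypothetical, and the seed value stands in for the result of the SELECT MAX(ID) query. It is only safe as long as no other process inserts into the same table at the same time.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: hands out sequential IDs on the client side,
// seeded once with the current maximum ID read from the table.
final class IdAllocator {
    private final AtomicLong last;

    // currentMaxId stands in for the result of SELECT MAX(ID) FROM ENTITIES
    // (use 0 when the table is empty).
    IdAllocator(long currentMaxId) {
        this.last = new AtomicLong(currentMaxId);
    }

    // Returns the next unused ID without touching the database.
    long next() {
        return last.incrementAndGet();
    }

    public static void main(String[] args) {
        IdAllocator loanIds = new IdAllocator(41L);
        System.out.println(loanIds.next()); // 42
        System.out.println(loanIds.next()); // 43
    }
}
```

Because the counter lives in memory, no generator round trip is needed per row, which is the reason for avoiding AUTO_INCREMENT here.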
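Here are my entities.
The common parent entity. It is inserted first, in a single transaction. I do not care about the auto-increment column here. There is only one record per batch job.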
@Entity
@Table(...)
@SequenceGenerator(...)
public class Deal
{
    @Id
    @Column(name = "DEAL_ID", nullable = false)
    @GeneratedValue(strategy = GenerationType.AUTO)
    protected Long id;

    ................
}
One of the children (let's say 2.5M records per batch)
@Entity
@Table(name = "TA_LOANS")
public class Loan
{
    @Id
    @Column(name = "LOAN_ID", nullable = false)
    protected Long id;

    @ManyToOne(optional = false, targetEntity = Deal.class, fetch = FetchType.LAZY)
    @JoinColumn(name = "DEAL_ID", nullable = false)
    protected Deal deal;

    .............
}
The other child type. Let's say the other 2.5M records.
@Entity
@Table(name = "TA_BONDS")
public class Bond
{
    @Id
    @Column(name = "BOND_ID")
    protected Long id; // the ID field was missing from the listing; bond.setId(...) below relies on it

    @ManyToOne(fetch = FetchType.LAZY, optional = false, targetEntity = Deal.class)
    @JoinColumn(name = "DEAL_ID", nullable = false, updatable = false)
    protected Deal deal;
}
Simplified code for inserting the records
long loanIdCounter = loanDao.getMaxId(), bondIdCounter = bondDao.getMaxId(); // Performs SELECT MAX(ID)
Deal deal = null;
List<Bond> bondList = new ArrayList<Bond>(COMMIT_BATCH_SIZE); // Constant value: 500
List<Loan> loanList = new ArrayList<Loan>(COMMIT_BATCH_SIZE);
for (String msg: inputStreamReader)
{
    log.debug(msg.toString());
    if (this is a deal)
    {
        Deal parsedDeal = parseDeal(msg.getMessage());
        // Called in a separate transaction using the Spring annotation @Transactional(propagation = Propagation.REQUIRES_NEW)
        deal = dealManager.persist(parsedDeal);
    }
    else if (this is a loan)
    {
        Loan loan = parseLoan(msg.getMessage());
        loan.setId(++loanIdCounter);
        loan.setDeal(deal);
        loanList.add(loan);
        if (loanList.size() == COMMIT_BATCH_SIZE)
        {
            // Performs a bulk insert in a single transaction, not annotated but handled manually this time
            loanManager.bulkInsert(loanList);
            loanList.clear();
        }
    }
    else if (this is a bond)
    {
        Bond bond = parseBond(msg.getMessage());
        bond.setId(++bondIdCounter);
        bond.setDeal(deal);
        bondList.add(bond);
        if (bondList.size() == COMMIT_BATCH_SIZE) // As above
        {
            bondManager.bulkInsert(bondList);
            bondList.clear();
        }
    }
}
// Flush the remaining items, not important
if (!bondList.isEmpty())
    bondManager.bulkInsert(bondList);
if (!loanList.isEmpty())
    loanManager.bulkInsert(loanList);
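The buffering pattern in the loop above (collect items, hand off a full batch, then flush the remainder) can be factored into a small helper. This is only a sketch: `BatchBuffer` is a hypothetical name, and the `Consumer` passed to it stands in for a call such as `loanManager.bulkInsert(...)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects items and hands them to a sink in
// fixed-size batches, mirroring the loanList/bondList handling above.
final class BatchBuffer<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> items;

    BatchBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.items = new ArrayList<>(batchSize);
    }

    // Adds one item; when the buffer is full, its contents go to the sink.
    void add(T item) {
        items.add(item);
        if (items.size() == batchSize) {
            flush();
        }
    }

    // Passes any remaining items to the sink; call once after the input ends.
    void flush() {
        if (!items.isEmpty()) {
            sink.accept(new ArrayList<>(items)); // copy, so clearing is safe
            items.clear();
        }
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        BatchBuffer<String> buffer =
                new BatchBuffer<>(500, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 1200; i++) {
            buffer.add("row" + i);
        }
        buffer.flush();
        System.out.println(batchSizes); // [500, 500, 200]
    }
}
```

With one buffer per entity type, the final flush of the remaining items becomes a single `flush()` call per buffer.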
The implementation of bulkInsert:
@Override
public void bulkInsert(Collection<Bond> bonds)
{
    // StatelessSession session = sessionFactory.openStatelessSession();
    Session session = sessionFactory.openSession();
    try
    {
        Transaction t = session.beginTransaction();
        try
        {
            for (Bond bond : bonds)
                // session.persist(bond);
                // session.insert(bond);
                session.save(bond);
            t.commit(); // commit only when every save succeeded
        }
        catch (RuntimeException ex)
        {
            t.rollback();
            throw ex; // do not swallow the failure
        }
    }
    finally
    {
        session.close();
    }
}
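As you can see from the comments, I have tried several combinations of stateful and stateless sessions. None of them worked.
My dataSource is a ComboPooledDataSource with the following URL: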
<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&amp;rewriteBatchedStatements=true" />
My SessionFactory
<b:bean id="sessionFactory" class="class.that.extends.org.springframework.orm.hibernate3.LocalSessionFactoryBean" lazy-init="false" depends-on="dataSource">
    <b:property name="dataSource" ref="phoenixDataSource" />
    <b:property name="hibernateProperties">
        <b:props>
            <b:prop key="hibernate.dialect">${hibernate.dialect}</b:prop> <!-- MySQL5InnoDb-->
            <b:prop key="hibernate.show_sql">${hibernate.showSQL}</b:prop>
            <b:prop key="hibernate.jdbc.batch_size">500</b:prop>
            <b:prop key="hibernate.jdbc.use_scrollable_resultset">false</b:prop>
            <b:prop key="hibernate.cache.use_second_level_cache">false</b:prop>
            <b:prop key="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</b:prop>
            <b:prop key="hibernate.cache.use_query_cache">false</b:prop>
            <b:prop key="hibernate.validator.apply_to_ddl">false</b:prop>
            <b:prop key="hibernate.validator.autoregister_listeners">false</b:prop>
            <b:prop key="hibernate.order_inserts">true</b:prop>
            <b:prop key="hibernate.order_updates">true</b:prop>
        </b:props>
    </b:property>
</b:bean>
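Even though my project-wide class extends LocalSessionFactoryBean, it does not override any of its methods (it only adds a few project-wide methods).
[...] @Autowire my classes). All of my attempts produced nothing but many separate INSERT statements.
What am I missing?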
Answer 0 (score: 18)
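Your statements are probably being rewritten, but you will not find that out by looking at the Hibernate SQL log. Hibernate does not rewrite the insert statements; the MySQL driver does. In other words, Hibernate sends multiple insert statements to the driver, and the driver then rewrites them. So the Hibernate log only shows the SQL that Hibernate sent to the driver, not the SQL that the driver sent to the database.
You can verify this by enabling the MySQL driver's profileSQL parameter in the connection URL: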
<b:property name="jdbcUrl" value="jdbc:mysql://server:3306/db?autoReconnect=true&amp;rewriteBatchedStatements=true&amp;profileSQL=true" />
Using an example similar to yours, this is the output I get:
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
insert into Person (firstName, lastName, id) values (?, ?, ?)
Wed Feb 05 13:29:52 MST 2014 INFO: Profiler Event: [QUERY] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) duration: 1 ms, connection-id: 81, statement-id: 33, resultset-id: 0, message: insert into Person (firstName, lastName, id) values ('person1', 'Name', 1),('person2', 'Name', 2),('person3', 'Name', 3),('person4', 'Name', 4),('person5', 'Name', 5),('person6', 'Name', 6),('person7', 'Name', 7),('person8', 'Name', 8),('person9', 'Name', 9),('person10', 'Name', 10)
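Hibernate logs the first 10 lines, but that is not what is actually sent to the MySQL database. The last line comes from the MySQL driver, and it clearly shows a single batched insert with multiple value rows. That is what is actually sent to the MySQL database.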
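To make the rewrite concrete, the sketch below imitates in plain Java the kind of transformation visible in the profiler line: several single-row inserts collapsed into one multi-row statement. It is an illustration only, not the driver's actual code. `InsertRewriteSketch` and `rewrite` are hypothetical names, and the value formatting is deliberately naive (no escaping).

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: mimics the shape of a rewritten batch, not the driver's logic.
final class InsertRewriteSketch {

    // Builds "<prefix> values (row1),(row2),..." from an insert prefix and row values.
    // Strings are single-quoted without escaping; other values use String.valueOf.
    static String rewrite(String insertPrefix, List<Object[]> rows) {
        StringBuilder sql = new StringBuilder(insertPrefix).append(" values ");
        for (int r = 0; r < rows.size(); r++) {
            if (r > 0) {
                sql.append(',');
            }
            sql.append('(');
            Object[] row = rows.get(r);
            for (int c = 0; c < row.length; c++) {
                if (c > 0) {
                    sql.append(", ");
                }
                Object value = row[c];
                sql.append(value instanceof String ? "'" + value + "'" : String.valueOf(value));
            }
            sql.append(')');
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        for (int i = 1; i <= 3; i++) {
            rows.add(new Object[] {"person" + i, "Name", i});
        }
        System.out.println(rewrite("insert into Person (firstName, lastName, id)", rows));
        // insert into Person (firstName, lastName, id) values ('person1', 'Name', 1),('person2', 'Name', 2),('person3', 'Name', 3)
    }
}
```

The point is that one statement travels to the server per batch instead of one per row, even though Hibernate's own log still lists each row's insert separately.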