I have a large number of classes that are generated and annotated with the EMF/Texo combination. I persist them to a SQL Server database using JPA/EclipseLink.
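This works fine, but performance is terrible when a large number of objects has to be persisted. So I wrote two test cases (see TestBulkInserts.java) that compare the performance of a bulk insert through the framework (foo) with a plain JDBC batch insert (bar).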
When inserting 10000 objects, which is a below-average size for a bulk insert in my case, foo() and bar() give the following times:
Duration JPA/Texo: 19,620ms
Duration plain JDBC: 892ms
I'd like to know why there is such a big difference (more than a factor of 20!). The larger the batch, the worse it gets.
The DatabaseObject class extends PersistableObjectClass (see below), and both classes (including the corresponding DAO classes) are generated with Texo + EMF.
Apart from the required connection details, I have not added any specific settings to persistence.xml.
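With no extra settings in persistence.xml, EclipseLink's JDBC batch writing is, as far as I know, left at its default of "None", so each INSERT is sent on its own. A minimal sketch, assuming EclipseLink's `eclipselink.jdbc.batch-writing` and `eclipselink.jdbc.batch-writing.size` persistence-unit properties, of building those settings in Java; the class name, method name and the batch size of 1000 are hypothetical illustration values, not a recommendation.

```java
import java.util.HashMap;
import java.util.Map;

public class BatchWritingConfig {

    // EclipseLink property names (string values of
    // PersistenceUnitProperties.BATCH_WRITING and BATCH_WRITING_SIZE).
    static final String BATCH_WRITING = "eclipselink.jdbc.batch-writing";
    static final String BATCH_WRITING_SIZE = "eclipselink.jdbc.batch-writing.size";

    /**
     * Hypothetical helper: returns the extra properties that switch on
     * JDBC batch writing with the given number of statements per batch.
     */
    static Map<String, String> batchWritingProperties(int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, String> props = new HashMap<>();
        props.put(BATCH_WRITING, "JDBC");
        props.put(BATCH_WRITING_SIZE, Integer.toString(batchSize));
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = batchWritingProperties(1000);
        // In the test setup these would be passed as the second argument:
        // Persistence.createEntityManagerFactory(PERSISTENCE_UNIT_TEST, props)
        System.out.println(props.get(BATCH_WRITING) + " " + props.get(BATCH_WRITING_SIZE));
    }
}
```

The same two keys can instead be declared as `property` elements inside the persistence unit's `properties` section of persistence.xml; passing a map only overrides them for one EntityManagerFactory.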
TestBulkInserts.java:
import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
...
import com.ownproject.loader.generated.DbModelPackage;
import com.ownproject.loader.DatabaseObject;
import com.ownproject.loader.dao.DatabaseObjectDao;
import javax.persistence.Persistence;
import org.eclipse.emf.texo.server.store.EntityManagerProvider;
import org.junit.Test;
public class TestBulkInserts {

    private static final int NUM_LOOPS = 10000;

    @Test
    public void foo() {
        TestMethods.connectTestDBandEMF();
        // basically does this
        // DbModelPackage.initialize();
        // EntityManagerProvider.getInstance().setEntityManagerFactory(Persistence.createEntityManagerFactory(PERSISTENCE_UNIT_TEST));

        Stopwatch sw = Stopwatch.createStarted();
        DatabaseObjectDao dao = new DatabaseObjectDao();
        dao.getEntityManager().getTransaction().begin();
        for (int i = 0; i < NUM_LOOPS; i++) {
            DatabaseObject dbo = new DatabaseObject();
            dbo.setString(UUID.randomUUID().toString());
            dbo.setInsert_time(Date.valueOf(LocalDate.now()));
            dao.insert(dbo);
        }
        dao.getEntityManager().getTransaction().commit();
        sw.stop();
        System.out.println(String.format("Duration JPA/Texo: %,dms", sw.elapsed(TimeUnit.MILLISECONDS)));
    }

    @Test
    public void bar() throws ClassNotFoundException, SQLException {
        Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
        String connectionUrl = "jdbc:sqlserver://hostname:1433;databaseName=local_test;user=sa;password=blablub;";
        Connection con = DriverManager.getConnection(connectionUrl);
        con.setAutoCommit(false);

        Stopwatch sw = Stopwatch.createStarted();
        PreparedStatement insertStatement = con.prepareStatement("INSERT INTO DatabaseObject(b_id, insert_time) VALUES (?, ?)");
        for (int i = 0; i < NUM_LOOPS; i++) {
            insertStatement.setString(1, UUID.randomUUID().toString());
            insertStatement.setDate(2, Date.valueOf(LocalDate.now()));
            insertStatement.addBatch();
        }
        insertStatement.executeBatch();
        con.commit();
        con.close();
        sw.stop();
        System.out.println(String.format("Duration plain JDBC: %,dms", sw.elapsed(TimeUnit.MILLISECONDS)));
    }
}
PersistableObjectClass.java:
import javax.persistence.Basic;
...
import javax.persistence.TemporalType;
@Entity(name = "PersistableObjectClass")
@MappedSuperclass()
@Inheritance(strategy = InheritanceType.TABLE_PER_CLASS)
public abstract class PersistableObjectClass {
@Basic()
@Temporal(TemporalType.TIMESTAMP)
private Date insert_time = null;
@Id()
@GeneratedValue(strategy = GenerationType.IDENTITY)
private int s_id = 0;
...
}
Answer:
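As explained in this article, you not only need to use batch updates, you also need to make sure the transaction is committed periodically. Otherwise you end up with a long-running transaction, which is bad on both 2PL and MVCC database engines.
So this is what a batch job should look like: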
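int entityCount = 50;
int batchSize = 25;

EntityManager entityManager = null;
EntityTransaction transaction = null;

try {
    // entityManagerFactory() is not shown; it is assumed to return the application's EntityManagerFactory
    entityManager = entityManagerFactory()
        .createEntityManager();

    transaction = entityManager.getTransaction();
    transaction.begin();

    for ( int i = 0; i < entityCount; ++i ) {
        // Every batchSize entities: send the pending INSERTs to the database,
        // detach the persisted entities, and start a fresh transaction
        if ( i > 0 && i % batchSize == 0 ) {
            entityManager.flush();
            entityManager.clear();

            transaction.commit();
            transaction.begin();
        }

        Post post = new Post(
            String.format( "Post %d", i + 1 )
        );
        entityManager.persist( post );
    }

    // Commit the last (possibly partial) batch
    transaction.commit();
} catch (RuntimeException e) {
    // Only the current batch is rolled back; earlier batches are already committed
    if ( transaction != null &&
         transaction.isActive()) {
        transaction.rollback();
    }
    throw e;
} finally {
    if (entityManager != null) {
        entityManager.close();
    }
}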
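The batch job above mixes two concerns: persisting each entity and closing a batch every `batchSize` entities. A minimal, framework-free sketch of that control flow, with the JPA calls replaced by callbacks so it runs on its own; `ChunkedPersist` and `persistInBatches` are hypothetical names, not part of JPA, EclipseLink or Texo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChunkedPersist {

    /**
     * Calls persist for every item and runs endOfBatch before item i whenever
     * i is a positive multiple of batchSize, mirroring the flush/clear/commit/begin
     * step of the loop above. The final partial batch is left to the caller's
     * closing commit, as in the original code.
     *
     * @return the number of intermediate batch boundaries crossed
     */
    static <T> int persistInBatches(List<T> items, int batchSize,
                                    Consumer<T> persist, Runnable endOfBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int boundaries = 0;
        for (int i = 0; i < items.size(); i++) {
            if (i > 0 && i % batchSize == 0) {
                endOfBatch.run(); // e.g. flush(); clear(); commit(); begin();
                boundaries++;
            }
            persist.accept(items.get(i));
        }
        return boundaries;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            ids.add(i);
        }
        List<Integer> persisted = new ArrayList<>();
        // Same numbers as the answer: 50 entities, batchSize 25
        int boundaries = persistInBatches(ids, 25, persisted::add, () -> { });
        System.out.println(persisted.size() + " " + boundaries);
    }
}
```

With JPA, `persist` could be `entityManager::persist` and `endOfBatch` a lambda calling `flush()`, `clear()`, `commit()` and `begin()` (the `EntityManager` and `EntityTransaction` variables would then need to be effectively final). For the question's 10000 objects and a batch size of 25, this gives 399 intermediate commits plus the final one.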