Question

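I have a performance problem with a batch job written with JPA (using OpenJPA) that runs as a plain Java application. I am trying to insert a huge list of objects, for example more than 10 million records. I know this design is not right, but I receive this volume of data all at once and cannot split up the overall job.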

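I split the list into sublists of 100,000 records each and call a transactional JPA method for each sublist. Inside each such transaction, I flush every time 2,000 records have accumulated. As I understand it, for 10 million records this makes 100 transactional calls.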
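The arithmetic behind that setup can be sketched as follows. This is only an illustration of the numbers given above: `SublistPlan`, `chunkCount`, and `partition` are hypothetical names, not part of the original code, which is not shown in full.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the names are illustrative and not from the original job.
final class SublistPlan {
    private SublistPlan() {}

    /** Number of chunks needed to cover {@code total} items (ceiling division). */
    static long chunkCount(long total, long chunkSize) {
        if (total < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("total must be >= 0 and chunkSize > 0");
        }
        return (total + chunkSize - 1) / chunkSize;
    }

    /** Splits a list into consecutive views of at most {@code chunkSize} elements. */
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be > 0");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += chunkSize) {
            int to = from + Math.min(chunkSize, items.size() - from);
            chunks.add(items.subList(from, to)); // a view on the source list, not a copy
        }
        return chunks;
    }

    public static void main(String[] args) {
        // One transaction per sublist of 100,000 records.
        System.out.println(chunkCount(10_000_000L, 100_000L)); // prints 100
        // One flush per 2,000 records inside each transaction.
        System.out.println(chunkCount(100_000L, 2_000L));      // prints 50
    }
}
```

So 10 million records give 100 transactions of 50 flushes each, about 5,000 flushes over the whole job.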

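Once the job starts, I can see about 6 million records inserted within roughly 15-20 minutes, an average of about 300K records per minute. But after reaching 6 to 6.5 million, the job becomes extremely slow, for example 10K records in 4-6 minutes, and it feels as if it has stopped. It does keep running, though, and it does not run out of heap memory.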

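Can anyone tell me what is wrong in my code? I have tried different chunk sizes (25K, 50K, 100K) for the sublists. I have no idea what causes this slowdown after the middle of the job. Should I clear the EntityManager after each transaction? I have also increased the connection pool size.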
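For reference, a flush-then-clear loop is a commonly suggested pattern for keeping the persistence context small during bulk inserts. Whether it resolves the slowdown described here is not established. The sketch below uses a hypothetical `PersistenceOps` interface as a stand-in for the three `EntityManager` calls involved (`persist`, `flush`, `clear`), so that it compiles without a JPA dependency. The class and method names are assumptions, not the original code.

```java
import java.util.List;

// Minimal sketch; FlushAndClearLoop and PersistenceOps are hypothetical names.
final class FlushAndClearLoop {
    private FlushAndClearLoop() {}

    /** Stand-in for EntityManager.persist / flush / clear. */
    interface PersistenceOps<T> {
        void persist(T entity);
        void flush();
        void clear();
    }

    /**
     * Persists every element of the chunk, flushing and clearing after each
     * {@code flushSize} elements and once more for any remainder.
     *
     * @return the number of flush/clear cycles performed
     */
    static <T> int persistAll(List<T> chunk, int flushSize, PersistenceOps<T> ops) {
        if (flushSize <= 0) {
            throw new IllegalArgumentException("flushSize must be > 0");
        }
        int pending = 0;
        int cycles = 0;
        for (T entity : chunk) {
            ops.persist(entity);
            pending++;
            if (pending == flushSize) {
                ops.flush(); // push the pending inserts to the database
                ops.clear(); // detach the entities so the context does not keep growing
                cycles++;
                pending = 0;
            }
        }
        if (pending > 0) { // remainder smaller than flushSize
            ops.flush();
            ops.clear();
            cycles++;
        }
        return cycles;
    }
}
```

In real code the three interface methods would simply delegate to the `EntityManager` used inside the transactional method.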

Here is my code:

importFrom

Answer 1

Too much data is often a problem when JPA is used in batch jobs, and here there are ten million rows to insert.

First, once the number of rows goes above 100,000, I would use a better-suited API: JDBC batching.

For example, using JDBC batching:

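import java.io.Serializable;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
// javax.* packages shown here; newer stacks use jakarta.* instead
import javax.ejb.LocalBean;
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

// Person and Constants are application classes that are not shown here
@Stateless
@LocalBean
@TransactionAttribute(TransactionAttributeType.MANDATORY)
public class PersonService implements Serializable {
    private static final long serialVersionUID = 1L;

    @PersistenceContext(unitName = Constants.PERSISTENCE_UNIT_NAME)
    private transient EntityManager entityManager;

    // the list is passed in as a parameter so that the method is self-contained
    public void doIt(List<Person> personList) throws SQLException {
        // get a jdbc connection from the entityManager (unwrap(Connection.class) is openjpa specific)
        // or you may as well get a jdbc connection from a jdbc DataSource
        try (Connection connection = entityManager.unwrap(Connection.class)) {
            // if Postgresql or Oracle DB, you may need to add a nextval for a sequence in the sql
            String sql = "insert into person (name) values (?)";
            try (PreparedStatement statement = connection.prepareStatement(sql)) {
                int i = 0; // rows added to the current batch
                for (Person person : personList) {
                    i++;
                    statement.setString(1, person.getName());
                    statement.addBatch();
                    if (i == 1000) {
                        // send the batch every 1000 rows
                        statement.executeBatch();
                        i = 0;
                    }
                }
                if (i > 0) {
                    // send the remaining rows
                    statement.executeBatch();
                }
            }
        }
    }
}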

If that is still not enough, you can try calling connection.commit() every million rows.
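A minimal sketch of that periodic commit, assuming a connection whose transactions you control yourself (for example one obtained from a plain DataSource in a Java SE application). Inside a container-managed transaction, calling commit() directly on the connection is typically not permitted. The class name `PeriodicCommitInsert` and the `batchSize` and `commitSize` parameters are hypothetical; the `person` table and `name` column are taken from the example above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper; assumes the caller owns the connection and its transactions.
final class PeriodicCommitInsert {
    private PeriodicCommitInsert() {}

    /**
     * Inserts the names in JDBC batches of {@code batchSize} rows and commits
     * after every {@code commitSize} rows, plus once for any remainder.
     *
     * @return the number of commits issued
     */
    static int insertNames(Connection connection, List<String> names,
                           int batchSize, int commitSize) throws SQLException {
        if (batchSize <= 0 || commitSize <= 0) {
            throw new IllegalArgumentException("batchSize and commitSize must be > 0");
        }
        connection.setAutoCommit(false); // commits are issued explicitly below
        int commits = 0;
        String sql = "insert into person (name) values (?)";
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            int pendingInBatch = 0;  // rows added since the last executeBatch
            int pendingInCommit = 0; // rows added since the last commit
            for (String name : names) {
                statement.setString(1, name);
                statement.addBatch();
                pendingInBatch++;
                pendingInCommit++;
                if (pendingInBatch == batchSize) {
                    statement.executeBatch();
                    pendingInBatch = 0;
                }
                if (pendingInCommit >= commitSize) {
                    if (pendingInBatch > 0) { // send a partial batch before committing
                        statement.executeBatch();
                        pendingInBatch = 0;
                    }
                    connection.commit();
                    commits++;
                    pendingInCommit = 0;
                }
            }
            if (pendingInBatch > 0) {
                statement.executeBatch();
            }
            if (pendingInCommit > 0) {
                connection.commit();
                commits++;
            }
        }
        return commits;
    }
}
```

With the numbers from this answer, the call would be `insertNames(connection, names, 1000, 1_000_000)`. The sketch does not roll back on failure, so rows committed before an error stay in the table.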

Batch job using JPA runs very slowly
