Question

I'm using Spring/Hibernate the JPA way, via org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean, configured with Spring XML, persistence.xml and JPA 2 annotations.

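Functionally it works fine and persists correctly. However, I need to store an entity A, which has a bidirectional OneToMany to a large collection of B, as quickly as possible.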

I'm using various options in persistence.xml to try to speed up the inserts and reduce memory usage (the application writes about as much as it reads):

<property name="hibernate.id.new_generator_mappings" value="true" />
<property name="hibernate.jdbc.batch_size" value="50" />
<property name="hibernate.order_inserts" value="true" />
<property name="hibernate.order_updates" value="true" />
<property name="hibernate.cache.use_query_cache" value="false" />
<property name="hibernate.cache.use_second_level_cache" value="false" />

and the persisting is done with

entityManager.persist(instanceOfA)

Edit, additional information:

Each entity has a generated ID, like this:

@Id
@Column(name="ID")
@GeneratedValue(strategy=GenerationType.AUTO, generator="SEQUENCE_GENERATOR")
@SequenceGenerator(name="SEQUENCE_GENERATOR", sequenceName="MY_SEQUENCE", allocationSize=50)
private Long id;

which is backed by an Oracle sequence:

CREATE SEQUENCE MY_SEQUENCE MINVALUE 1 MAXVALUE 999999999999999999999999999 START WITH 1 INCREMENT BY 50 NOCYCLE NOCACHE NOORDER;
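With hibernate.id.new_generator_mappings=true, allocationSize=50 and INCREMENT BY 50, Hibernate should only need one sequence call per 50 ids rather than one per row. The sketch below is a simplified, hypothetical model of that block allocation (not Hibernate's actual optimizer code, which interprets the sequence value slightly differently); it shows why 1000 ids cost only 20 sequence round trips.

```java
import java.util.HashSet;
import java.util.Set;

public class BlockIdAllocatorDemo {

    /** Hypothetical stand-in for MY_SEQUENCE: each nextval() models one database round trip. */
    static final class FakeSequence {
        private long next = 1;          // START WITH 1
        private final long incrementBy; // INCREMENT BY 50
        private int roundTrips = 0;

        FakeSequence(long incrementBy) { this.incrementBy = incrementBy; }

        long nextval() {
            roundTrips++;
            long value = next;
            next += incrementBy;
            return value;
        }

        int roundTrips() { return roundTrips; }
    }

    /** Hands out ids from a cached block; asks the sequence again only when the block is used up. */
    static final class BlockAllocator {
        private final FakeSequence sequence;
        private final int allocationSize;
        private long nextId = 0;
        private long blockEnd = -1; // inclusive end of the current block; -1 means no block yet

        BlockAllocator(FakeSequence sequence, int allocationSize) {
            this.sequence = sequence;
            this.allocationSize = allocationSize;
        }

        long nextId() {
            if (nextId > blockEnd) {
                long start = sequence.nextval();       // one round trip per block
                nextId = start;
                blockEnd = start + allocationSize - 1; // blocks 1..50, 51..100, ...
            }
            return nextId++;
        }
    }

    public static void main(String[] args) {
        FakeSequence seq = new FakeSequence(50);
        BlockAllocator ids = new BlockAllocator(seq, 50);
        Set<Long> seen = new HashSet<>();
        for (int i = 0; i < 1000; i++) {
            seen.add(ids.nextId());
        }
        // prints: ids issued: 1000, sequence round trips: 20
        System.out.println("ids issued: " + seen.size() + ", sequence round trips: " + seq.roundTrips());
    }
}
```

If this model holds, the sequence is unlikely to be the main cost in the question's setup.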

When I run the code with show SQL enabled, I can see lots of insert statements, and they take a long time.

I've read that I need to call entityManager.flush(); entityManager.clear(); after every 50 rows inserted:

http://abramsm.wordpress.com/2008/04/23/hibernate-batch-processing-why-you-may-not-be-using-it-even-if-you-think-you-are/
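A minimal sketch of the flush/clear-every-N pattern described in the linked article. So that it runs without a database, it is written against a hypothetical PersistenceContext interface that mirrors EntityManager's persist/flush/clear; in real code you would pass the entityManager itself. Keeping the chunk size equal to hibernate.jdbc.batch_size (50 here) should let each flush go out as full JDBC batches. Note that clear() detaches everything persisted so far, so this is simplest when each chunk does not depend on cascades from earlier, now-detached entities.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedPersistDemo {

    /** Hypothetical stand-in for the three EntityManager methods this pattern uses. */
    interface PersistenceContext {
        void persist(Object entity);
        void flush();
        void clear();
    }

    /** Fake context that only records calls, so the sketch runs without a database. */
    static final class RecordingContext implements PersistenceContext {
        final List<String> calls = new ArrayList<>();
        public void persist(Object entity) { calls.add("persist"); }
        public void flush() { calls.add("flush"); }
        public void clear() { calls.add("clear"); }
    }

    /**
     * Persists entities in order; every batchSize entities it flushes (sends the pending
     * INSERTs) and clears (detaches managed entities so memory use stays bounded).
     */
    static void persistInChunks(PersistenceContext ctx, List<?> entities, int batchSize) {
        int count = 0;
        for (Object entity : entities) {
            ctx.persist(entity);
            if (++count % batchSize == 0) {
                ctx.flush();
                ctx.clear();
            }
        }
        // Flush the final partial chunk, if any (a transaction commit would also flush it).
        if (count % batchSize != 0) {
            ctx.flush();
            ctx.clear();
        }
    }

    public static void main(String[] args) {
        RecordingContext ctx = new RecordingContext();
        List<Integer> entities = new ArrayList<>();
        for (int i = 0; i < 120; i++) entities.add(i);
        persistInChunks(ctx, entities, 50);
        long flushes = ctx.calls.stream().filter("flush"::equals).count();
        System.out.println("flushes: " + flushes); // 120 entities, chunk size 50 -> 3 flushes
    }
}
```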

Does this mean I need to break the persisting up into

entityManager.persist(instanceOfA);
instanceOfA.addB(instanceOfB);
entityManager.persist(instanceOfB);

and add a flush after every 50 calls to persist()?

Is there a cleaner way to do this? (My actual object hierarchy has about 7 levels of relationships like A and B.)

I was thinking about using JDBC for the inserts, but I hate writing row mappers :)

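I've heard of org.hibernate.StatelessSession, but there's no way to get one from a JPA entity manager without dropping down to the SessionFactory at some point, which again isn't very clean.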

Thanks in advance!

Answer 1

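I had the same problem in one of my projects. I was using Hibernate with a MySQL backend and the identity ID generator. The problem is that Hibernate has to hit the database once for every saved entity to actually obtain its ID. I switched to the increment generator and saw an immediate benefit (all the inserts were batched).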
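A back-of-the-envelope model of why the generator matters. With IDENTITY, Hibernate has to execute each INSERT immediately to read back the generated key, so insert batching is effectively lost; when ids are known before the INSERT, the statements can be grouped into JDBC batches. The two functions below are a hypothetical, simplified model that ignores the one-off cost of obtaining ids.

```java
public class RoundTripEstimate {

    /** IDENTITY: every INSERT is sent on its own so the generated key can be read back. */
    static long identityRoundTrips(long rows) {
        return rows;
    }

    /** Ids known up front (e.g. increment): INSERTs are grouped into batches of batchSize. */
    static long batchedRoundTrips(long rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 10_000;
        System.out.println("identity: " + identityRoundTrips(rows));    // identity: 10000
        System.out.println("batched:  " + batchedRoundTrips(rows, 50)); // batched:  200
    }
}
```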

@Id
@GeneratedValue(generator = "increment")
@GenericGenerator(name = "increment", strategy = "increment")
@Column(name = "id", nullable = false)
private long id;

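The increment generator generates IDs in memory (it reads the current maximum ID once and counts up from there), so it doesn't need to hit the database for each entity. I guess the sequence generator also needs to hit the database, since the sequence is defined there. The catch with increment is that Hibernate must have exclusive insert access to the database; it may fail in a clustered setup.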
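A simplified model of the increment strategy and its clustering problem. The class below is a hypothetical stand-in, not Hibernate's code: like the increment generator, it reads the current maximum id once and then counts up in memory. Two instances started against the same table therefore hand out the same "next" id.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class IncrementGeneratorDemo {

    /** Reads max(id) once at startup, then hands out max+1, max+2, ... without touching the database. */
    static final class InMemoryIncrement {
        private final AtomicLong current;

        InMemoryIncrement(LongSupplier selectMaxId) {
            this.current = new AtomicLong(selectMaxId.getAsLong()); // the one "select max(id)"
        }

        long next() { return current.incrementAndGet(); }
    }

    public static void main(String[] args) {
        LongSupplier maxIdInTable = () -> 100L; // pretend the table already holds ids up to 100

        // Two application nodes start against the same table...
        InMemoryIncrement nodeA = new InMemoryIncrement(maxIdInTable);
        InMemoryIncrement nodeB = new InMemoryIncrement(maxIdInTable);

        // ...and each issues the same id: a primary-key collision on insert.
        System.out.println(nodeA.next() + " vs " + nodeB.next()); // 101 vs 101
    }
}
```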

Another trick I used is appending rewriteBatchedStatements=true to the JDBC URL. This is MySQL-specific, but I think Oracle may have a similar setting.
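For MySQL Connector/J, rewriteBatchedStatements=true lets the driver rewrite a JDBC batch of INSERTs into multi-row INSERT statements. A small, hypothetical helper for adding the parameter to an existing URL; the host and database names in the example are placeholders.

```java
public class JdbcUrlOptions {

    /** Appends key=value to a JDBC URL, using '?' or '&' depending on whether parameters already exist. */
    static String withParam(String url, String key, String value) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + key + "=" + value;
    }

    public static void main(String[] args) {
        String base = "jdbc:mysql://localhost:3306/mydb"; // placeholder host and database
        System.out.println(withParam(base, "rewriteBatchedStatements", "true"));
        // jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true
    }
}
```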

The "call flush after every n inserts" trick also works. Here is some sample code (using Google Guava classes):

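import com.google.common.base.Function;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.Iterables;
import java.util.ArrayList;
import java.util.List;

public List<T> saveInBatches(final Iterable<? extends T> entities, final int batchSize) {
    // ImmutableList.copyOf forces the lazy transform, so every chunk is saved here
    return ImmutableList.copyOf(
        Iterables.concat(
            Iterables.transform(
                Iterables.partition(entities, batchSize),
                new Function<List<? extends T>, Iterable<? extends T>>() {
                    @Override
                    public Iterable<? extends T> apply(final List<? extends T> input) {
                        List<T> saved = save(input); // persist one chunk
                        flush(); // DAO helper, presumably wrapping entityManager.flush()
                        return saved;
                    }
                })));
}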

public List<T> save(Iterable<? extends T> entities) {
    List<T> result = new ArrayList<T>();
    for (T entity : entities) {
        entityManager.persist(entity);
        result.add(entity);
    }
    return result;
}
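If you'd rather not pull in Guava just for this, the partitioning step can be done with the standard library's subList. A sketch; the partition helper is hypothetical and behaves similarly to Guava's Lists.partition, except that it copies each chunk instead of returning views.

```java
import java.util.ArrayList;
import java.util.List;

public class Partitions {

    /** Splits a list into consecutive chunks of at most size elements. */
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be positive");
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            int to = Math.min(from + size, items.size());
            chunks.add(new ArrayList<>(items.subList(from, to))); // copy so chunks are independent
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 120; i++) ids.add(i);
        for (List<Integer> chunk : partition(ids, 50)) {
            // In the code above, each chunk would go to save(chunk) followed by flush().
            System.out.println(chunk.size()); // prints 50, 50, 20
        }
    }
}
```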

Answer 2

Use plain JDBC for bulk/large inserts. Don't use any ORM framework.
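A minimal sketch of what this answer suggests: one PreparedStatement, with addBatch/executeBatch. Inserts only bind parameters, so no row mappers are needed. The table and column names are hypothetical, and the caller has to supply the ID values (for example, fetched from MY_SEQUENCE) and manage the transaction.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

public class JdbcBatchInsert {

    /** Builds "INSERT INTO table (c1, c2, ...) VALUES (?, ?, ...)" for the given columns. */
    static String insertSql(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }

    /** Inserts rows (values in column order) through one PreparedStatement, in batches of batchSize. */
    static void batchInsert(Connection conn, String table, List<String> columns,
                            List<Object[]> rows, int batchSize) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(insertSql(table, columns))) {
            int pending = 0;
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]); // JDBC parameter indexes are 1-based
                }
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // send the remaining partial batch
            }
        }
    }

    public static void main(String[] args) {
        // A real run needs a Connection (e.g. from a DataSource); this only prints the generated SQL.
        System.out.println(insertSql("B", List.of("ID", "A_ID", "NAME")));
        // INSERT INTO B (ID, A_ID, NAME) VALUES (?, ?, ?)
    }
}
```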

How to improve performance when persisting large collections with Spring, EntityManager and Hibernate
