Question

I have a Person table with 10M rows in it, and this data is read-only. My application reads the whole table into a List via Spring Data JPA on its startup, and then uses this List throughout the lifetime of the app without making any more Person queries.

I'm using Postgres 9.6, Java 8, Spring Data JPA 1.11, and Hibernate 5.2. There are a bunch of other tables which are smaller, receive updates, etc., and overall everything works great.

The issue is that loading these 10M Person objects needs 2-3 times as much memory as holding them once they are loaded. During the load, JPA downloads the whole result set and then converts it into my Person objects, duplicating the memory. Hibernate's first-level cache also holds on to these objects.

Hibernate has a StatelessSession which can help me with the caching issue (https://gist.github.com/jelies/5181262), and I can do paging queries of 500k rows at a time or something like that to not duplicate the whole dataset on load, but is there a simpler way of doing this with Spring Data JPA in 2018?

I.e. can I stream the Person table into my Person objects N rows at a time, and disable all caching in the process?

Answer 1

I ended up doing something similar: I made the fetch size a configuration parameter and tested other parameters.

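    // Types used: javax.persistence.EntityManager (em), java.util.ArrayList, java.util.List,
    // org.hibernate.LockMode, ScrollMode, ScrollableResults, Session, StatelessSession,
    // org.hibernate.query.Query

    // A StatelessSession has no first-level cache, so Hibernate does not retain the loaded entities
    StatelessSession session = ((Session) em.getDelegate()).getSessionFactory().openStatelessSession();
    // wherever you want to store them
    List<MyObject> output = new ArrayList<>();
    ScrollableResults results = null;

    try {
        Query query = session.createQuery("SELECT a FROM MyObject a");
        query.setFetchSize(250_000);   // JDBC fetch size; worth making configurable
        query.setReadOnly(true);
        query.setCacheable(false);
        query.setLockMode("a", LockMode.NONE);
        results = query.scroll(ScrollMode.FORWARD_ONLY);
        while (results.next()) {
            MyObject o = (MyObject) results.get(0);
            output.add(o);
        }
    } finally {
        // always release the cursor and the session
        if (results != null) {
            results.close();
        }
        session.close();
    }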
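For comparison, the paging alternative mentioned in the question (fixed-size pages rather than one scroll) can be sketched without any JPA types. This is only an illustration under stated assumptions: `ChunkedLoader` and `PageFetcher` are hypothetical names, not part of Spring Data or Hibernate, and the fetcher in `main` reads from an in-memory list standing in for a real page query. With a real database, each page query would also need a stable `ORDER BY` so that rows are neither skipped nor repeated between pages.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: loads a large read-only data set one page at a time. */
final class ChunkedLoader {

    /** Hypothetical callback; a real implementation would run one page query per call. */
    @FunctionalInterface
    interface PageFetcher<T> {
        List<T> fetch(long offset, int limit);
    }

    /**
     * Fetches pages until a short (or empty) page signals the end.
     * Only one page is held in addition to the accumulated result.
     */
    static <T> List<T> loadAll(PageFetcher<T> fetcher, int pageSize, int expectedRows) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // Pre-size the list so the backing array is not regrown and copied while loading.
        List<T> out = new ArrayList<>(Math.max(expectedRows, 0));
        long offset = 0;
        while (true) {
            List<T> page = fetcher.fetch(offset, pageSize);
            out.addAll(page);
            if (page.size() < pageSize) {
                break; // last page reached
            }
            offset += page.size();
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in data source: 1,050 integers instead of a database table.
        List<Integer> table = new ArrayList<>();
        for (int i = 0; i < 1_050; i++) {
            table.add(i);
        }
        PageFetcher<Integer> fetcher = (offset, limit) -> {
            int from = (int) Math.min(offset, table.size());
            int to = (int) Math.min(offset + limit, table.size());
            return new ArrayList<>(table.subList(from, to));
        };
        List<Integer> loaded = loadAll(fetcher, 500, table.size());
        System.out.println(loaded.size()); // prints 1050
    }
}
```

Compared with the scroll above, this trades a single open cursor for several independent queries, so peak overhead is bounded by the page size rather than by whatever the driver buffers for the cursor.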

Spring Data JPA and loading large volumes of read-only data.
