Question

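I need to perform a clean insert (delete + insert) with a large number of records per request (close to 100K). For the sake of testing, I am running my code with 10K records. Even with 10K, the operation takes 30 seconds, which is not acceptable. I am already doing some level of batch inserts as provided by spring-data-JPA, but the results are not satisfactory.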

My code looks like this:

@Transactional
  public void saveAll(HttpServletRequest httpRequest) throws IOException {
  List<Person> persons = new ArrayList<>();
  try(ServletInputStream sis = httpRequest.getInputStream()){

         deletePersons(); //deletes all persons based on some criteria
         Person p; // declared outside the loop: Java does not allow a declaration inside the condition
         while ((p = nextPerson(sis)) != null) {

                 persons.add(p);
                 if(persons.size() % 2000 == 0){
                        savePersons(persons); //uses Spring repository to perform saveAll() and flush()
                        persons.clear();
                 }
         }
          savePersons(persons); //uses Spring repository to perform saveAll() and flush()
          persons.clear();
  }
}

@Transactional
public void savePersons(List<Person> persons){

     System.out.println(new Date()+" Before save");
     repository.saveAll(persons);
     repository.flush();
     System.out.println(new Date()+" After save");
}

I have also set the following properties:

spring.jpa.properties.hibernate.jdbc.batch_size=40
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data=true
spring.jpa.properties.hibernate.id.new_generator_mappings=false

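Looking at the logs, I noticed that the insert operation takes around 3-4 seconds to save 2000 records, while very little time is spent in the iteration itself. So I believe that reading through the stream is not the bottleneck; the inserts are. I also checked the logs and confirmed that Spring is issuing the inserts in batches of 40, as per the property set.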

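I tried to see whether I could improve performance by using multiple threads (say, 2 threads) that read from a blocking queue and call save once 2000 records have accumulated. I hoped that, in theory, this would give better results. The problem is that, as I have read, Spring manages transactions at the thread level, and a transaction cannot propagate across threads. But I need the whole operation (delete + insert) to be atomic. I have looked through several posts about Spring transaction management and could not find the right direction.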

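Is there a way to achieve this kind of parallelism using Spring transactions? If Spring transactions are not the answer, are there any other techniques that can be used?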
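For illustration, the two-thread idea described above might be sketched as follows. This is a plain-Java sketch only: `ParallelBatchSketch`, `run`, `drain`, `flush` and the `saveBatch` callback are hypothetical names, with `saveBatch` standing in for a call such as `savePersons`. It shows the queue and batching mechanics and does not address atomicity: each `saveBatch` call runs on a worker thread, so under Spring's thread-bound transaction management it would not join the caller's transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from the question's code.
class ParallelBatchSketch {

    // Sentinel that tells a worker no more records will arrive.
    private static final Object END = new Object();

    /**
     * Feeds the records through a shared queue to `workers` threads. Each worker
     * hands batches of at most `batchSize` records to `saveBatch`. Returns the
     * total number of records handed over.
     */
    static <T> int run(List<T> records, int workers, int batchSize,
                       Consumer<List<T>> saveBatch) throws InterruptedException {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger saved = new AtomicInteger();

        for (int i = 0; i < workers; i++) {
            // Exceptions thrown by saveBatch end up in the discarded Future;
            // a real version would keep the Futures and check them.
            pool.submit(() -> drain(queue, batchSize, saveBatch, saved));
        }
        for (T record : records) {
            queue.put(record);
        }
        for (int i = 0; i < workers; i++) {
            queue.put(END); // one sentinel per worker
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return saved.get();
    }

    @SuppressWarnings("unchecked")
    private static <T> void drain(BlockingQueue<Object> queue, int batchSize,
                                  Consumer<List<T>> saveBatch, AtomicInteger saved) {
        List<T> batch = new ArrayList<>(batchSize);
        try {
            Object item;
            while ((item = queue.take()) != END) {
                batch.add((T) item);
                if (batch.size() == batchSize) {
                    flush(batch, saveBatch, saved);
                }
            }
            if (!batch.isEmpty()) {
                flush(batch, saveBatch, saved); // leftover partial batch
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop
        }
    }

    private static <T> void flush(List<T> batch, Consumer<List<T>> saveBatch,
                                  AtomicInteger saved) {
        saveBatch.accept(new ArrayList<>(batch)); // copy, so the callback may keep the list
        saved.addAndGet(batch.size());
        batch.clear();
    }
}
```

A call such as `ParallelBatchSketch.run(persons, 2, 2000, batch -> savePersons(batch))` would mirror the numbers in the question, assuming the records have already been parsed into a list.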

Thanks

Answer 1

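Not sure whether this will help you, but it works well in a test application. I also do not know whether it will be in the "good graces" of senior Spring people, but I am hoping to learn, so I am posting this suggestion.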

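In a Spring Boot test application, the code below injects a JPA repository into the ApplicationRunner, which then passes it to Runnables managed by an ExecutorService. Each Runnable gets a BlockingQueue that is continuously filled by a separate KafkaConsumer (which acts as the producer for the queue). The Runnables use queue.take() to pop from the queue, followed by a repo.save(). (Batch inserts could easily be added to the thread, but that has not been done since the application has not needed it yet...)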

The test application currently uses JPA against a Postgres (or Timescale) database and runs 10 threads with 10 queues fed by 10 consumers.

The JPA repository is provided by

public interface DataRepository extends JpaRepository<DataRecord, Long> {
}

The Spring Boot main program is

@SpringBootApplication
@EntityScan(basePackages = "com.xyz.model")
public class DataApplication {

    private final String[] topics = { "x0", "x1", "x2", "x3", "x4", "x5","x6", "x7", "x8","x9" };
    ExecutorService executor = Executors.newFixedThreadPool(topics.length);


    public static void main(String[] args) {
        SpringApplication.run(DataApplication.class, args);
    }

    @Bean
    ApplicationRunner init(DataRepository dataRepository) {
        return args -> {

            for (String topic : topics) {

                BlockingQueue<DataRecord> queue = new ArrayBlockingQueue<>(1024);
                JKafkaConsumer consumer = new JKafkaConsumer(topic, queue);
                consumer.start();

                JMessageConsumer messageConsumer = new JMessageConsumer(dataRepository, queue);
                executor.submit(messageConsumer);
            }
            executor.shutdown();
        };
    }
}

and the consumer Runnable has a constructor and run() method as follows:

public JMessageConsumer(DataRepository dataRepository, BlockingQueue<DataRecord> queue) {
    this.queue = queue;
    this.dataRepository = dataRepository;
}

@Override
public void run() {
    running.set(true);
    while (running.get()) {
        // remove record from FIFO blocking queue
        DataRecord dataRecord;
        try {
            dataRecord = queue.take();
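        } catch (InterruptedException e) {
            // restore the interrupt flag and exit so the executor can stop this worker
            Thread.currentThread().interrupt();
            logger.error("queue exception: " + e.getMessage());
            break;
        }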
        // write to database 
        dataRepository.save(dataRecord);
    }
}
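The batching mentioned in passing above (adding batch inserts to the consumer thread) could look roughly like the following. This is a sketch under assumptions, not part of the test application: `QueueBatcher` and `nextBatch` are hypothetical names. It blocks for the first record with `take()` and then uses `BlockingQueue.drainTo` to collect whatever else is already queued, up to a maximum batch size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper; not part of the original test application.
class QueueBatcher {

    /**
     * Waits for at least one element, then drains any further elements that
     * are already available, returning at most maxBatch elements in FIFO order.
     */
    static <T> List<T> nextBatch(BlockingQueue<T> queue, int maxBatch)
            throws InterruptedException {
        List<T> batch = new ArrayList<>(maxBatch);
        batch.add(queue.take());            // blocks until a record arrives
        queue.drainTo(batch, maxBatch - 1); // non-blocking: takes only what is queued now
        return batch;
    }
}
```

Inside run(), the take()/save() pair could then become something like `dataRepository.saveAll(QueueBatcher.nextBatch(queue, 40))`, where 40 is only an example value matching the `hibernate.jdbc.batch_size` from the question.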

Love to learn, so any thoughts/concerns/feedback are appreciated...
