Question

Does closing the Lucene IndexWriter after each document addition slow down the indexing process?

I assume that closing and reopening the IndexWriter slows down my indexing process. Or is that not the case with Lucene?

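Basically, I have a Lucene indexer step in a Spring Batch job, and I build the index in an ItemProcessor. The indexer step is a partitioned step. I create the IndexWriter when the ItemProcessor is created and keep it open until the step completes.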

    @Bean
    @StepScope
    public ItemProcessor<InputVO,OutputVO> luceneIndexProcessor(@Value("#{stepExecutionContext[field1]}") String str) throws Exception{
        boolean exists = IndexUtils.checkIndexDir(str);
        String indexDir = IndexUtils.createAndGetIndexPath(str, exists);
        IndexWriterUtils indexWriterUtils = new IndexWriterUtils(indexDir, exists);
        IndexWriter indexWriter = indexWriterUtils.createIndexWriter();
        return new LuceneIndexProcessor(indexWriter);
    }

Is there a way to close this IndexWriter after the step completes?

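Also, I ran into a problem because I search the index in this same step to find duplicate documents, but I fixed it by adding writer.commit(); before opening the reader and searching.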

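Please advise whether I need to close and reopen the writer after each document is added, or whether I can keep it open the whole time. And how do I close it from a StepExecutionListenerSupport?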

Initially I was closing and reopening the writer for every document, but indexing was very slow, so I suspect that was the cause.

Answer 1

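Because the index directory is small during development, we may not see much gain there, but with a large index directory we should avoid unnecessarily creating and closing the IndexWriter and the IndexReader.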
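To make the difference concrete, here is a minimal sketch of the two lifecycles. It does not use Lucene itself: `CountingWriter` is a hypothetical stand-in for an expensive resource such as IndexWriter, and it only counts how often it is opened and closed. With a real IndexWriter, each open and close usually involves acquiring and releasing the index lock and committing, which is why doing it once per document tends to be slow.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an expensive resource such as Lucene's IndexWriter. */
class CountingWriter implements Closeable {
    static int opens = 0;   // number of writers created so far
    static int closes = 0;  // number of writers closed so far
    int added = 0;          // documents added through this writer

    CountingWriter() { opens++; }                 // real code would open the index here

    void addDocument(String doc) { added++; }     // real code would call IndexWriter.addDocument

    @Override
    public void close() { closes++; }             // real code would close the IndexWriter here
}

class WriterLifecycleDemo {
    /** Slow pattern from the question: a fresh writer for every document. */
    static void indexPerDocument(List<String> docs) {
        for (String doc : docs) {
            try (CountingWriter writer = new CountingWriter()) {
                writer.addDocument(doc);
            }
        }
    }

    /** Pattern recommended here: one writer for the whole batch, closed exactly once. */
    static void indexBatch(List<String> docs) {
        try (CountingWriter writer = new CountingWriter()) {
            for (String doc : docs) {
                writer.addDocument(doc);
            }
        }
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a", "b", "c");
        indexPerDocument(docs);
        System.out.println("per-document opens: " + CountingWriter.opens);
        CountingWriter.opens = 0;
        CountingWriter.closes = 0;
        indexBatch(docs);
        System.out.println("batch opens: " + CountingWriter.opens);
    }
}
```

The number of opens grows with the number of documents in the first method and stays at one in the second, which is the behaviour the rest of this answer tries to achieve inside a Spring Batch step.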

In Spring Batch, I did it with these steps:

1. As described in my other question, we first need to solve the serialization problem so that the objects can be put into the ExecutionContext.

2. In the partitioner, we create an instance of a composite serializable object and put it into the ExecutionContext.

3. Pass the value from the ExecutionContext to the step's reader, processor, or writer in the configuration:

    @Bean
    @StepScope
    public ItemProcessor<InputVO,OutputVO> luceneIndexProcessor(
            @Value("#{stepExecutionContext[field1]}") String field1,
            @Value("#{stepExecutionContext[luceneObjects]}") SerializableLuceneObjects luceneObjects) throws Exception {
        LuceneIndexProcessor indexProcessor = new LuceneIndexProcessor(luceneObjects);
        return indexProcessor;
    }

4. Pass this instance to the processor and use a getter method to obtain the index reader or writer: public IndexWriter getLuceneIndexWriter() {return luceneIndexWriter;}

5. Finally, close the writer and the reader in StepExecutionListenerSupport's afterStep(StepExecution stepExecution), retrieving them from the ExecutionContext:

    ExecutionContext executionContext = stepExecution.getExecutionContext();
    SerializableLuceneObjects slObjects = (SerializableLuceneObjects) executionContext.get("luceneObjects");
    IndexWriter luceneIndexWriter = slObjects.getLuceneIndexWriter();
    IndexReader luceneIndexReader = slObjects.getLuceneIndexReader();
    if (luceneIndexWriter != null) luceneIndexWriter.close();
    if (luceneIndexReader != null) luceneIndexReader.close();
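One weakness of the snippet above is that if closing the writer throws, the reader is never closed. The following is a rough sketch of how the holder and the cleanup could be written to avoid that. `LuceneObjectsHolder` and `closeAll` are hypothetical names, not the actual SerializableLuceneObjects class, and the fields are typed as `java.io.Closeable` (which both IndexWriter and IndexReader implement) so the example runs without Lucene on the classpath. Marking the fields `transient` is only one assumed way to get around IndexWriter and IndexReader not being Serializable.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Serializable;

/** Hypothetical holder, loosely modelled on the SerializableLuceneObjects idea above. */
class LuceneObjectsHolder implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: Java serialization skips these fields, so they are null after a round trip
    private transient Closeable writer;
    private transient Closeable reader;

    LuceneObjectsHolder(Closeable writer, Closeable reader) {
        this.writer = writer;
        this.reader = reader;
    }

    Closeable getWriter() { return writer; }

    Closeable getReader() { return reader; }

    /** Closes the writer, then the reader; a failure on the first does not skip the second. */
    void closeAll() throws IOException {
        IOException failure = null;
        for (Closeable resource : new Closeable[] { writer, reader }) {
            if (resource == null) {
                continue; // nothing to close, e.g. never opened or lost in serialization
            }
            try {
                resource.close();
            } catch (IOException e) {
                if (failure == null) {
                    failure = e;              // remember the first failure
                } else {
                    failure.addSuppressed(e); // keep later failures attached to it
                }
            }
        }
        writer = null;
        reader = null;
        if (failure != null) {
            throw failure;
        }
    }
}
```

In afterStep, the call to `closeAll()` would need to sit inside a try/catch, since afterStep does not declare checked exceptions. Logging the IOException and returning a failed ExitStatus is one option.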
