Multiple CFS files are generated in lucene 4.10.2

Asked: 2014-12-29 08:06:41

Tags: java lucene

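I am using lucene 4.10.2 to index 612 records. It creates a large number of CFS files in the index directory: about 626 CFS files were created. Indexing also takes longer. Each CFS file is at most 3 KB.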

ENV: Java 8, Windows 7

// Open the index directory and configure the writer
Directory dir = FSDirectory.open(file);
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_2, new ClassicAnalyzer());
// Use the caller-supplied RAM buffer size unless it is 0 or -1
if (bufferSizeMB != 0 && bufferSizeMB != -1) {
    config.setRAMBufferSizeMB(bufferSizeMB);
} else {
    config.setRAMBufferSizeMB(DEFAULT_RAM_BUFFER_SIZE_MB);
}
config.setMaxBufferedDocs(1000);
config.setMaxBufferedDeleteTerms(1000);
config.setMergePolicy(new LogDocMergePolicy());
IndexWriter iwriter = new IndexWriter(dir, config);
// The same settings are applied again on the live config
iwriter.getConfig().setMaxBufferedDeleteTerms(1000);
iwriter.getConfig().setMaxBufferedDocs(1000);
iwriter.getConfig().setRAMBufferSizeMB(bufferSizeMB);

http://lucene.472066.n3.nabble.com/Multiple-CFS-files-are-generated-in-lucene-4-10-2-td4176336.html

1 Answer:

Answer 0 (score: 0)

From the CHANGES file:

  LUCENE-4462: DocumentsWriter now flushes deletes, segment infos and builds
  CFS files if necessary during segment flush and not during publishing. The latter
  was a single threaded process while now all IO and CPU heavy computation is done
  concurrently in DocumentsWriterPerThread. 

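With segment flushing, merges are triggered according to your merge policy. Ideally, if indexing finishes properly and the writer is closed, only one cfs file remains.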

That is what I observed in my application.
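
One way to check this on your own index is to count the .cfs files left in the index directory after the writer has been closed. The following is only a minimal sketch using the JDK (Java 8); the class name CfsFileCounter and the default directory name "index" are hypothetical and not part of Lucene or of the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene: counts compound (.cfs) files in a directory.
public class CfsFileCounter {

    /** Counts regular files whose name ends in ".cfs" directly inside indexDir. */
    public static long countCfsFiles(Path indexDir) throws IOException {
        // Files.list returns a lazily populated stream that must be closed.
        try (Stream<Path> files = Files.list(indexDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".cfs"))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        // "index" is an assumed default; pass the real index directory as the first argument.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "index");
        System.out.println(countCfsFiles(indexDir) + " .cfs file(s) in " + indexDir);
    }
}
```

Running it before and after closing the writer shows whether the small per-flush files were actually merged away.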

Update in response to a comment

I recently migrated from 2.x to 4.10.2.

Quoting the IndexWriter 4.10.2 documentation for commit():

Commits all pending changes (added & deleted documents, segment merges, added indexes, 
etc.) to the index, and syncs all referenced index files, such that a reader will see 
the changes and the index updates will survive an OS or machine crash or power loss. 
Note that this does not wait for any running background merges to finish. This may
be a costly operation, so you should test the cost in your application and do it only
when really necessary.  

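What you can do is use a single IndexWriter to add all the records without calling commit each time. At the end, once all records have been added, just call indexwriter.close(), which takes care of the merge and commit process.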
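
A rough sketch of that pattern, assuming Lucene 4.10.2 (lucene-core and lucene-analyzers-common) on the classpath. The class name SingleWriterIndexer, the method indexAll, the field name "body", and the use of plain strings as records are all made up for illustration; adapt them to your own documents.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.lucene.analysis.standard.ClassicAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Hypothetical example class: one writer for the whole batch, no commit() per record.
public class SingleWriterIndexer {

    public static void indexAll(File indexDir, List<String> records) throws IOException {
        IndexWriterConfig config =
                new IndexWriterConfig(Version.LUCENE_4_10_2, new ClassicAnalyzer());

        // try-with-resources closes the writer first, then the directory.
        try (Directory dir = FSDirectory.open(indexDir);
             IndexWriter writer = new IndexWriter(dir, config)) {
            for (String record : records) {
                Document doc = new Document();
                // "body" is an assumed field name for this sketch.
                doc.add(new TextField("body", record, Field.Store.YES));
                writer.addDocument(doc); // note: no commit() inside the loop
            }
            // In 4.10.2, closing the writer commits pending changes
            // and waits for running merges to finish.
        }
    }
}
```

The point is only that the writer is created once and closed once; buffer sizes and the merge policy can stay as configured in the question.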