Question

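My Lucene Java implementation is eating up too many files. I followed the instructions in the Lucene Wiki about too many open files, but that only slowed the problem down. Here is my code for adding documents (PTickets) to the index: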

//This gets called when the bean is instantiated
public void initializeIndex() {
    analyzer = new WhitespaceAnalyzer(Version.LUCENE_32);
    config = new IndexWriterConfig(Version.LUCENE_32, analyzer);

}


public void addAllToIndex(Collection<PTicket> records) {  
    IndexWriter indexWriter = null;
    config = new IndexWriterConfig(Version.LUCENE_32, analyzer);

    try{
        indexWriter = new IndexWriter(directory, config);
        for(PTicket record : records) {
            Document doc = new Document();
            StringBuffer documentText = new StringBuffer();
            doc.add(new Field("_id", record.getIdAsString(), Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("_type", record.getType(), Field.Store.YES, Field.Index.ANALYZED));

            for(String key : record.getProps().keySet()) {
                List<String> vals = record.getProps().get(key);

                for(String val : vals) {
                    addToDocument(doc, key, val);
                    documentText.append(val).append(" ");
                }
            }
            addToDocument(doc, DOC_TEXT, documentText.toString());        
            indexWriter.addDocument(doc);    
        }

        indexWriter.optimize();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        cleanup(indexWriter);
    }
}

private void cleanup(IndexWriter iw) {
    if(iw == null) {
        return;
    }

    try{
        iw.close();
    } catch (IOException ioe) {
        logger.error("Error trying to close index writer");
        logger.error("{}", ioe.getClass().getName());
        logger.error("{}", ioe.getMessage());
    }
}

private void addToDocument(Document doc, String field, String value) {
    doc.add(new Field(field, value, Field.Store.YES, Field.Index.ANALYZED));
}

EDIT: added the search code

public Set<Object> searchIndex(AthenaSearch search) {  

    try {
        Query q = new QueryParser(Version.LUCENE_32, DOC_TEXT, analyzer).parse(query);

        //searcher is actually instantiated during initialization, as Lucene recommends.
        //IndexSearcher searcher = new IndexSearcher(directory, true);
        TopDocs topDocs = searcher.search(q, numResults);
        ScoreDoc[] hits = topDocs.scoreDocs;
        for(int i=start;i<hits.length;++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            ids.add(d.get("_id"));
        }
        return ids;
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}

This code runs inside a web application.

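1) Is this the recommended way to use IndexWriter (instantiating a new one every time I add to the index)?

2) I've read that raising the ulimit helps, but that seems like a band-aid that doesn't fix the actual problem.

3) Could the problem be the IndexSearcher?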

Answer 1

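1) Is this the recommended way to use IndexWriter (instantiating a new one on every add to the index)?

I'd suggest not. There are constructors that check whether an index already exists in the directory and open it, or create a new one otherwise. If you reuse the IndexWriter, problem 2 will be solved.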
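A minimal plain-Java sketch of the "open once, reuse" idea. To stay self-contained it has no Lucene dependency: `SharedWriter` and `IOFactory` are hypothetical helpers and the writer is any `Closeable`. With Lucene, the factory would return `new IndexWriter(directory, config)`, and `addAllToIndex()` would call `get()` and then `commit()` instead of opening and closing its own writer.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical factory type: in a real application it would wrap
// something like `new IndexWriter(directory, config)`.
interface IOFactory<T> {
    T create() throws IOException;
}

// Keeps one writer open for the lifetime of the application instead of
// opening and closing a new one on every addAllToIndex() call.
final class SharedWriter<W extends Closeable> {
    private final IOFactory<W> factory;
    private W writer; // guarded by "this"

    SharedWriter(IOFactory<W> factory) {
        this.factory = factory;
    }

    // Opens the writer on first use, then keeps returning the same instance.
    synchronized W get() {
        if (writer == null) {
            try {
                writer = factory.create();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return writer;
    }

    // Call once at application shutdown, not after each batch of documents.
    synchronized void shutdown() {
        if (writer == null) {
            return;
        }
        try {
            writer.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer = null;
        }
    }
}

class SharedWriterDemo {
    public static void main(String[] args) {
        int[] opens = {0};
        SharedWriter<Closeable> shared = new SharedWriter<Closeable>(() -> {
            opens[0]++;
            return () -> System.out.println("writer closed");
        });
        shared.get();
        shared.get(); // reuses the instance opened by the first call
        System.out.println("writers opened: " + opens[0]);
        shared.shutdown();
    }
}
```

In a web application, `shutdown()` would typically be called from a context-shutdown hook so the index is closed exactly once.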

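EDIT:

OK, it seems that in Lucene 3.2 all constructors but one are deprecated, so opening an IndexWriter on an existing index (or creating the index if none exists) is done by setting the enum IndexWriterConfig.OpenMode to CREATE_OR_APPEND.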
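`IndexWriterConfig.OpenMode` has three constants (`CREATE`, `APPEND`, `CREATE_OR_APPEND`), set with `config.setOpenMode(...)`. The plain-Java model below (hypothetical `Mode` and `Action` enums, no Lucene dependency) encodes how the three modes are documented to behave when the writer is constructed, which is why `CREATE_OR_APPEND` suits a writer opened once at startup.

```java
// Mode mirrors the constant names of IndexWriterConfig.OpenMode;
// Action is a hypothetical enum describing what happens at construction.
enum Mode { CREATE, APPEND, CREATE_OR_APPEND }

enum Action { CREATE_NEW_INDEX, OPEN_EXISTING_INDEX, FAIL_NO_INDEX }

final class OpenModeModel {
    static Action resolve(Mode mode, boolean indexExists) {
        switch (mode) {
            case CREATE:
                // Always starts a fresh index, replacing any existing one.
                return Action.CREATE_NEW_INDEX;
            case APPEND:
                // Only opens an existing index; construction fails otherwise.
                return indexExists ? Action.OPEN_EXISTING_INDEX : Action.FAIL_NO_INDEX;
            case CREATE_OR_APPEND:
            default:
                // Opens the existing index, or creates one if none exists yet.
                return indexExists ? Action.OPEN_EXISTING_INDEX : Action.CREATE_NEW_INDEX;
        }
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": empty dir -> " + resolve(mode, false)
                    + ", existing index -> " + resolve(mode, true));
        }
    }
}
```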

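Also, opening a new writer and closing it for every batch of documents is inefficient, so I suggest reusing it. If you want to speed up indexing, tune setRAMBufferSizeMB (the default is 16 MB) by trial and error.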
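The setting is `IndexWriterConfig.setRAMBufferSizeMB(double)`. The arithmetic below is a rough sketch: `FlushEstimate` is a hypothetical helper, it assumes you know roughly how much RAM the buffered documents occupy in total, and it ignores the other flush trigger (`setMaxBufferedDocs`). The buffer size only matters if the writer is reused: closing it after every batch flushes whatever is buffered, regardless of the buffer size.

```java
// Back-of-the-envelope model: while one writer stays open, it flushes a new
// segment each time its RAM buffer fills, so a bigger buffer means fewer flushes.
final class FlushEstimate {
    // bufferedMB: total memory the buffered documents would occupy (assumed input).
    static long flushes(double bufferedMB, double ramBufferMB) {
        if (ramBufferMB <= 0) {
            throw new IllegalArgumentException("ramBufferMB must be positive");
        }
        return (long) Math.ceil(bufferedMB / ramBufferMB);
    }

    public static void main(String[] args) {
        for (double buffer : new double[] {16, 64, 256}) {
            System.out.printf("RAM buffer %.0f MB -> about %d flushes%n",
                    buffer, flushes(1024, buffer));
        }
    }
}
```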

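From the docs:

Note that you can open an index with create=true even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open.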
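A toy plain-Java model of the "point in time" behavior quoted above. `SnapshotIndex` is hypothetical: Lucene gets this effect from immutable segment files rather than by copying a list, and in Lucene 3.x the fresh view is obtained explicitly (for example with `IndexReader.reopen()`). One consequence for this question: an old reader keeps its segment files open until it is closed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of "point-in-time" readers: a reader sees the documents committed
// when it was opened and nothing added afterwards, until a new reader is opened.
final class SnapshotIndex {
    private final List<String> committed = new ArrayList<>();

    synchronized void addAndCommit(String doc) {
        committed.add(doc);
    }

    synchronized SnapshotReader openReader() {
        // Lucene gets this from immutable segment files; here we just copy.
        return new SnapshotReader(Collections.unmodifiableList(new ArrayList<>(committed)));
    }

    static final class SnapshotReader {
        private final List<String> view;

        SnapshotReader(List<String> view) {
            this.view = view;
        }

        int numDocs() {
            return view.size();
        }
    }

    public static void main(String[] args) {
        SnapshotIndex index = new SnapshotIndex();
        index.addAndCommit("ticket-1");
        SnapshotReader old = index.openReader();
        index.addAndCommit("ticket-2");
        System.out.println("old reader: " + old.numDocs() + " docs");
        System.out.println("new reader: " + index.openReader().numDocs() + " docs");
    }
}
```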

Also reuse the IndexSearcher. I can't see your search code, but IndexSearcher is thread-safe and can also be opened read-only.
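One common way to reuse a single searcher while still picking up index changes is reference counting: every search acquires and releases the shared searcher, and a replaced searcher is only closed after its last in-flight search finishes. The classes below are hypothetical plain-Java stand-ins; in Lucene 3.x, `IndexReader` exposes `incRef()` and `decRef()`, which could back the same pattern.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a searcher over one point-in-time reader. It starts
// with one reference (held by the holder) and really closes when the count hits 0.
final class RefCountedSearcher {
    private final AtomicInteger refs = new AtomicInteger(1);
    private volatile boolean closed;

    void incRef() {
        refs.incrementAndGet();
    }

    void decRef() {
        if (refs.decrementAndGet() == 0) {
            closed = true; // a real implementation would close the reader here
        }
    }

    boolean isClosed() {
        return closed;
    }
}

// Shares one searcher across all requests. swap() installs a fresh searcher
// (e.g. after the index changed) without closing the old one under a running search.
final class SearcherHolder {
    private RefCountedSearcher current; // guarded by "this"

    SearcherHolder(RefCountedSearcher initial) {
        current = initial;
    }

    // Each search should pair acquire() with release() in a try/finally.
    synchronized RefCountedSearcher acquire() {
        current.incRef();
        return current;
    }

    void release(RefCountedSearcher searcher) {
        searcher.decRef();
    }

    synchronized void swap(RefCountedSearcher fresh) {
        RefCountedSearcher old = current;
        current = fresh;
        old.decRef(); // drops the holder's reference; closes once searches finish
    }
}

class SearcherHolderDemo {
    public static void main(String[] args) {
        RefCountedSearcher first = new RefCountedSearcher();
        SearcherHolder holder = new SearcherHolder(first);
        RefCountedSearcher inFlight = holder.acquire();
        try {
            holder.swap(new RefCountedSearcher()); // index changed meanwhile
            System.out.println("old closed during search? " + first.isClosed());
        } finally {
            holder.release(inFlight);
        }
        System.out.println("old closed after search? " + first.isClosed());
    }
}
```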

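I'd also suggest setting a MergeFactor on the writer. It isn't required, but it helps limit the number of inverted-index segment files that get created; again, find a good value by trial and error.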
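In Lucene 3.2 the merge factor is a property of a log merge policy (for example `LogByteSizeMergePolicy.setMergeFactor(int)`, default 10), installed with `IndexWriterConfig.setMergePolicy(...)`. The simulation below is an idealized model, not Lucene's algorithm: `MergeModel` is hypothetical and merges by segment count only, while the real policy groups segments by size. It still shows the trade-off: a lower merge factor keeps fewer segments, and therefore fewer files, at the cost of more merging.

```java
// Idealized log merging: every flush adds one level-0 segment, and whenever
// mergeFactor segments accumulate on a level they merge into one segment on
// the next level up (like carrying digits in base mergeFactor).
final class MergeModel {
    static int segmentsAfter(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        int[] perLevel = new int[64]; // enough levels for any int flush count
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            perLevel[0]++;
            while (perLevel[level] == mergeFactor) {
                perLevel[level] = 0;   // these segments are merged away...
                perLevel[level + 1]++; // ...into one bigger segment
                level++;
            }
        }
        int total = 0;
        for (int count : perLevel) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int factor : new int[] {2, 10, 50}) {
            System.out.println("mergeFactor " + factor + ": "
                    + segmentsAfter(1000, factor) + " segments after 1000 flushes");
        }
    }
}
```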

Answer 2

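I think we'd need to see your search code, but I suspect the problem lies with the IndexSearcher. More specifically, make sure your index reader is properly closed when you're done with it.

Good luck,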
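A plain-Java illustration of the leak suspected above. `TrackedReader` is a hypothetical stand-in for an `IndexReader` or `IndexSearcher` that holds file handles until `close()` is called. Opening one per search without closing it grows the open count with every request; try-with-resources (Java 7+), or a `finally` block like the question's `cleanup()`, releases it every time.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reader that counts how many instances are still open, standing
// in for an IndexReader that keeps segment files open until it is closed.
final class TrackedReader implements Closeable {
    static final AtomicInteger OPEN = new AtomicInteger();
    private boolean closed;

    TrackedReader() {
        OPEN.incrementAndGet();
    }

    String search(String query) {
        return "results for " + query;
    }

    @Override
    public void close() {
        if (!closed) { // closing twice must not double-count
            closed = true;
            OPEN.decrementAndGet();
        }
    }
}

final class CloseDemo {
    // Leaky pattern: a new reader per search that is never closed.
    static String leakySearch(String query) {
        return new TrackedReader().search(query);
    }

    // Fixed pattern: try-with-resources closes the reader even if search() throws.
    static String safeSearch(String query) {
        try (TrackedReader reader = new TrackedReader()) {
            return reader.search(query);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            safeSearch("q");
        }
        System.out.println("open after safe searches: " + TrackedReader.OPEN.get());
        for (int i = 0; i < 3; i++) {
            leakySearch("q");
        }
        System.out.println("open after leaky searches: " + TrackedReader.OPEN.get());
    }
}
```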

Answer 3

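The scientifically correct answer is: you can't tell from this code.

The more constructive answer is: you have to make sure that only one IndexWriter is writing to the index at any given time, so you need some mechanism to guarantee that. So my answer depends on what you want to accomplish:

Do you want a deeper understanding of Lucene? Or...
Do you just want to build and use an index?

If you answered the latter, you might want to look at a project like Solr, which hides all the index reading and writing.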
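Lucene itself refuses a second concurrent writer on the same directory: a second `IndexWriter` cannot obtain the `write.lock` and fails with `LockObtainFailedException`. Inside one web application, one possible mechanism (a sketch, not the only option) is to funnel all index writes through a single thread. `IndexWriteQueue` is a hypothetical helper, the `Thread.sleep` stands in for `addDocument`, and the demo checks that no two write tasks ever run at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Routes every index-writing task through a single worker thread, so only
// one piece of code uses the (single) writer at any moment.
final class IndexWriteQueue {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    Future<?> submit(Runnable writeTask) {
        return worker.submit(writeTask);
    }

    // Lets queued writes finish; call at application shutdown.
    void shutdown() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(1, TimeUnit.MINUTES);
    }
}

final class IndexWriteQueueDemo {
    // Queues `tasks` write requests and returns the highest number of tasks
    // observed running at the same time (1 if writes are serialized).
    static int maxConcurrent(int tasks) {
        IndexWriteQueue queue = new IndexWriteQueue();
        AtomicInteger active = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            futures.add(queue.submit(() -> {
                int now = active.incrementAndGet();
                max.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(2); // stands in for writer.addDocument(...)
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                active.decrementAndGet();
            }));
        }
        try {
            for (Future<?> f : futures) {
                f.get();
            }
            queue.shutdown();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent writes: " + maxConcurrent(20));
    }
}
```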

Answer 4

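This question may be a duplicate of Too many open files Error on Lucene

I'm repeating my answer from there.

Use the compound index format to reduce the number of files. With this flag set, Lucene writes each segment as a single .cfs file instead of multiple files, which reduces the file count significantly.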

IndexWriter.setUseCompoundFile(true)
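A rough estimate of why the compound format helps. `FileCountEstimate` is hypothetical, and `filesPerSegment` is an assumed input: the real number of files in a non-compound segment depends on the Lucene version and on which index features are used. In the Lucene 3.x line the same switch is also exposed on the merge policy (`LogMergePolicy.setUseCompoundFile(boolean)`).

```java
// Rough estimate of the segment files an open reader keeps open. Each compound
// segment is counted as a single .cfs file; index-wide files such as
// segments_N are ignored.
final class FileCountEstimate {
    static int estimateFiles(int segments, boolean compound, int filesPerSegment) {
        if (segments < 0 || filesPerSegment < 1) {
            throw new IllegalArgumentException("invalid input");
        }
        return segments * (compound ? 1 : filesPerSegment);
    }

    public static void main(String[] args) {
        int segments = 30;       // hypothetical segment count
        int filesPerSegment = 8; // hypothetical per-segment file count
        System.out.println("separate files: " + estimateFiles(segments, false, filesPerSegment));
        System.out.println("compound (.cfs): " + estimateFiles(segments, true, filesPerSegment));
    }
}
```

Multiplied by every reader left open (see the searcher discussion above), the difference adds up quickly against a per-process ulimit.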

Lucene Java opening too many files. Am I using IndexWriter correctly?
