Apache Lucene does not keep my old index

Time: 2016-05-03 14:17:48

Tags: java lucene

I found this example on the internet:

Indexer.java

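import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {

    private IndexWriter writer;

    @SuppressWarnings("deprecation")
    public Indexer(String indexDirectoryPath) throws IOException {
        Directory indexDirectory = FSDirectory.open(new File(indexDirectoryPath));
        writer = new IndexWriter(indexDirectory, new StandardAnalyzer(Version.LUCENE_36), true,
                IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public void close() throws CorruptIndexException, IOException {
        writer.close();
    }

    // Builds a Lucene document holding the file's contents, name, and canonical path.
    private Document getDocument(File file) throws IOException {
        Document document = new Document();
        Field contentField = new Field(LuceneConstants.CONTENTS, new FileReader(file));
        Field fileNameField = new Field(LuceneConstants.FILE_NAME, file.getName(), Field.Store.YES,
                Field.Index.NOT_ANALYZED);
        Field filePathField = new Field(LuceneConstants.FILE_PATH, file.getCanonicalPath(), Field.Store.YES,
                Field.Index.NOT_ANALYZED);
        document.add(contentField);
        document.add(fileNameField);
        document.add(filePathField);
        return document;
    }

    public void indexFile(File file) throws IOException {
        Document document = getDocument(file);
        writer.addDocument(document);
    }

    // Indexes one file and returns the number of documents now in the index.
    public int createIndex(String file) throws IOException {
        indexFile(new File(file));
        return writer.numDocs();
    }

}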

Searcher.java

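import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {

    IndexSearcher indexSearcher;
    QueryParser queryParser;
    Query query;

    @SuppressWarnings("deprecation")
    public Searcher(String indexDirectoryPath) throws IOException {
        Directory indexDirectory = FSDirectory.open(new File(indexDirectoryPath));
        indexSearcher = new IndexSearcher(indexDirectory);
        queryParser = new QueryParser(Version.LUCENE_36, LuceneConstants.CONTENTS,
                new StandardAnalyzer(Version.LUCENE_36));
    }

    // Escapes query-syntax characters, parses the text, and returns the top hits.
    public TopDocs search(String searchQuery) throws IOException, ParseException {
        query = queryParser.parse(QueryParser.escape(searchQuery));
        return indexSearcher.search(query, LuceneConstants.MAX_SEARCH);
    }

    public Document getDocument(ScoreDoc scoreDoc) throws CorruptIndexException, IOException {
        return indexSearcher.doc(scoreDoc.doc);
    }

    public void close() throws IOException {
        indexSearcher.close();
    }

}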

LuceneConstants.java

public class LuceneConstants {

    public static final String CONTENTS = "contents";
    public static final String FILE_NAME = "filename";
    public static final String FILE_PATH = "filepath";
    public static final int MAX_SEARCH = 10;

}

This is how I use them:

public static void main(String[] args) throws IOException, ParseException {
    {
        // First file
        Indexer indexer = new Indexer("index");
        indexer.createIndex("f1.txt");
        indexer.close();
        Searcher searcher = new Searcher(Constante.DIR_INDEX.getValor());
        TopDocs hits = searcher.search("Art. 1°");
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            org.apache.lucene.document.Document doc = searcher.getDocument(scoreDoc);
            String nomeArquivo = doc.get(LuceneConstants.FILE_PATH);
            System.out.println(nomeArquivo);
        }
    }
    System.out.println("-----");
    {
        // Second file
        Indexer indexer = new Indexer("index");
        indexer.createIndex("f2.txt");
        indexer.close();
        Searcher searcher = new Searcher(Constante.DIR_INDEX.getValor());
        TopDocs hits = searcher.search("Art. 1°");
        for (ScoreDoc scoreDoc : hits.scoreDocs) {
            org.apache.lucene.document.Document doc = searcher.getDocument(scoreDoc);
            String nomeArquivo = doc.get(LuceneConstants.FILE_PATH);
            System.out.println(nomeArquivo);
        }
    }
}

Everything works fine up to the "// Second file" line.

After I index the second file, I can no longer find anything from the first file.

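If I create a single Indexer instance, use that same instance to index both f1.txt and f2.txt, and then close it, it works the way I want. The problem is that if I close my application, reopen it, and decide to index another file, I lose f1.txt and f2.txt.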

Is there a way to make Lucene always keep the previous index when indexing new files?

1 Answer:

Answer 0: (score: 1)

It looks like you are using an old version of Lucene (3.6 or earlier), right?

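The third argument of the IndexWriter constructor specifies whether it should create a new index or open an existing one. If it is set to true, it overwrites the existing index (if one exists in the given directory). If you want to open an existing index without overwriting it, it should be false: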

writer = new IndexWriter(indexDirectory, new StandardAnalyzer(Version.LUCENE_36), false, IndexWriter.MaxFieldLength.UNLIMITED);
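
One caveat with a hard-coded false: in Lucene 3.x this constructor throws an exception when the directory does not yet contain an index, so the very first run would fail. Below is a minimal sketch, using only the JDK, of one way to choose the flag at runtime. The class and method names (IndexOpenModeSketch, looksLikeIndex, shouldCreate) are hypothetical and not part of Lucene, and the "segments_" prefix check is a heuristic based on how Lucene names its commit files. Lucene 3.x also offers IndexReader.indexExists(Directory), which is likely the more reliable check.

```java
import java.io.File;

/**
 * Hypothetical helper (not part of Lucene): decides the value of the
 * "create" flag for the Lucene 3.x IndexWriter constructor.
 */
final class IndexOpenModeSketch {

    private IndexOpenModeSketch() {
    }

    /**
     * Heuristic (assumption): a Lucene index directory contains a commit
     * file whose name starts with "segments_". A null listing means the
     * directory does not exist or could not be read.
     */
    static boolean looksLikeIndex(String[] fileNames) {
        if (fileNames == null) {
            return false;
        }
        for (String name : fileNames) {
            if (name.startsWith("segments_")) {
                return true;
            }
        }
        return false;
    }

    /** Returns true only when no existing index was found, so nothing is overwritten. */
    static boolean shouldCreate(File indexDirectory) {
        // File.list() returns null when the path is not a readable directory.
        return !looksLikeIndex(indexDirectory.list());
    }

    public static void main(String[] args) {
        File indexDirectory = new File(args.length > 0 ? args[0] : "index");
        System.out.println("create = " + shouldCreate(indexDirectory));
    }
}
```

The result would then be passed as the third constructor argument in Indexer, for example new IndexWriter(indexDirectory, new StandardAnalyzer(Version.LUCENE_36), IndexOpenModeSketch.shouldCreate(new File(indexDirectoryPath)), IndexWriter.MaxFieldLength.UNLIMITED). In Lucene 3.1 and later, IndexWriterConfig with OpenMode.CREATE_OR_APPEND covers the same case without a manual check.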