Strange Lucene behavior

Asked: 2011-01-26 11:06:06

Tags: java unit-testing lucene

I am trying to get started with Lucene. The code I use to index documents is:

public void index(String type, String words) {
        IndexWriter indexWriter = null;
        try {
            if (dir == null)
                dir = createAndPropagate();
            indexWriter = new IndexWriter(dir, new StandardAnalyzer(), true,
                    new KeepOnlyLastCommitDeletionPolicy(),
                    IndexWriter.MaxFieldLength.UNLIMITED);

            Field wordsField = new Field(FIELD_WORDS, words, Field.Store.YES,
                    Field.Index.ANALYZED);
            Field typeField = new Field(FIELD_TYPE, type, Field.Store.YES,
                    Field.Index.ANALYZED);

            Document doc = new Document();
            doc.add(wordsField);
            doc.add(typeField);

            indexWriter.addDocument(doc);
            indexWriter.commit();
        } catch (IOException e) {
            logger.error("Problems while adding entry to index.", e);
        } finally {
            try {
                if (indexWriter != null)
                    indexWriter.close();
            } catch (IOException e) {
                logger.error("Unable to close index writer.", e);
            }
        }

    }

Searching is done as follows:

public List<TagSearchEntity> searchFor(final String type, String words,
            int amount) {
        List<TagSearchEntity> result = new ArrayList<TagSearchEntity>();

        try {
            if (dir == null)
                dir = createAndPropagate();

            for (final Document doc : searchFor(dir, type, words, amount)) {
                @SuppressWarnings("serial")
                TagSearchEntity searchResult = new TagSearchEntity() {{
                    setType(type);
                    setWords(doc.getField(FIELD_WORDS).stringValue());
                }};
                result.add(searchResult);
            }
        } catch (IOException e) {
            logger.error("Problems while searching", e);
        }

        return result;
    }

private List<Document> searchFor(Directory indexDirectory, String type,
            String words, int amount) throws IOException {
        Searcher indexSearcher = new IndexSearcher(indexDirectory);

        final Query tagQuery = new TermQuery(new Term(FIELD_WORDS, words));
        final Query typeQuery = new TermQuery(new Term(FIELD_TYPE, type));

        @SuppressWarnings("serial")
        BooleanQuery query = new BooleanQuery() {{
            add(tagQuery, BooleanClause.Occur.SHOULD);
            add(typeQuery, BooleanClause.Occur.MUST);
        }};

        List<Document> result = new ArrayList<Document>();

        for (ScoreDoc scoreDoc : indexSearcher.search(query, amount).scoreDocs) {
            result.add(indexSearcher.doc(scoreDoc.doc));
        }

        indexSearcher.close();

        return result;
    }

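I have two use cases. The first adds a document of one type, searches for it, then adds a document of another type, searches for it, and so on. The second adds all the documents first and then searches for them. The first one works fine: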

@Test
    public void testSearch() {
        search.index("type1", "test type1 for test purposes test test");
        List<TagSearchEntity> result = search.searchFor("type1", "test", 10);
        assertNotNull("Retrieved list should not be null.", result);
        assertTrue("Retrieved list should not be empty.", !result.isEmpty());

        search.index("type2", "test type2 for test purposes test test");
        result.clear();
        result = search.searchFor("type2", "test", 10);
        assertTrue("Retrieved list should not be empty.", !result.isEmpty());

        search.index("type3", "test type3 for test purposes test test");
        result.clear();
        result = search.searchFor("type3", "test", 10);
        assertTrue("Retrieved list should not be empty.", !result.isEmpty());
    }

But the second one seems to index only the last document:

@Test
    public void testBuggy() {
       search.index("type1", "test type1 for test purposes test test");
       search.index("type2", "test type2 for test purposes test test");
       search.index("type3", "test type3 for test purposes test test");

        List<TagSearchEntity> result = search.searchFor("type3", "test", 10);
        assertNotNull("Retrieved list should not be null.", result);
        assertTrue("Retrieved list should not be empty.", !result.isEmpty());

        result.clear();
        result = search.searchFor("type2", "test", 10);
        assertTrue("Retrieved list should not be empty.", !result.isEmpty());

        result.clear();
        result = search.searchFor("type1", "test", 10);
        assertTrue("Retrieved list should not be empty.", !result.isEmpty());
    }

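It finds type3 successfully but fails to find any of the others. If I reorder these calls, it still finds only the last indexed document. The Lucene version I am using is: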

    <dependency>
        <groupId>org.apache.lucene</groupId>
        <artifactId>lucene-core</artifactId>
        <version>2.4.1</version>
    </dependency>

    <dependency>
        <groupId>lucene</groupId>
        <artifactId>lucene</artifactId>
        <version>1.4.3</version>
    </dependency>

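What am I doing wrong? How can I make it index all the documents?

1 answer:

Answer 0: (score: 2)

A new index is created on every indexing operation. The third argument of the IndexWriter constructor is the create flag, and it is set to true. According to the documentation, when this flag is set the writer creates a new index or overwrites an existing one. Set it to false to append to the existing index.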
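To see why the first test passes while the second fails, here is a minimal sketch of the flag's effect in plain Java. It does not use Lucene: `ToyIndex` and `CreateFlagDemo` are hypothetical names invented for this illustration, and the list inside `ToyIndex` only stands in for the index directory. The one behavior it models is the one described above, namely that opening with create set to true discards whatever the index held before.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for an index directory. Hypothetical; this is NOT Lucene's API. */
class ToyIndex {
    private final List<String> docs = new ArrayList<>();

    /**
     * Mimics "open a writer, add one document, close it".
     * With create == true the previous contents are discarded first,
     * which is the behavior the answer attributes to the create flag.
     */
    void addDocument(String doc, boolean create) {
        if (create) {
            docs.clear(); // a fresh index replaces the existing one
        }
        docs.add(doc);
    }

    boolean contains(String doc) {
        return docs.contains(doc);
    }

    int size() {
        return docs.size();
    }
}

class CreateFlagDemo {
    public static void main(String[] args) {
        String[] types = {"type1", "type2", "type3"};

        // Same pattern as testBuggy: index everything, then search.
        ToyIndex overwriting = new ToyIndex();
        for (String type : types) {
            overwriting.addDocument(type, true);
        }
        System.out.println("create=true:  " + overwriting.size()
                + " doc(s), type1 found: " + overwriting.contains("type1"));

        // Same calls, but appending instead of recreating.
        ToyIndex appending = new ToyIndex();
        for (String type : types) {
            appending.addDocument(type, false);
        }
        System.out.println("create=false: " + appending.size()
                + " doc(s), type1 found: " + appending.contains("type1"));
    }
}
```

With create set to true only type3 survives, which matches the failing test. The interleaved test passes only because each search runs before the next index call wipes the previous document. In the question's code the equivalent change is the third argument of `new IndexWriter(...)`. Whether passing false works on the very first call depends on whether `createAndPropagate()` already leaves an index in the directory, which the question does not show.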