Lucene: Prefix query doesn't work with WhitespaceAnalyzer

Date: 2014-07-21 17:54:47

Tags: java api lucene

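I'm experimenting with Lucene's various Query classes, and I'm trying to understand why a prefix query doesn't match any documents when the index is built with WhitespaceAnalyzer. Consider the following test code: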

protected String[] ids = { "1", "2" };
protected String[] unindexed = { "Netherlands", "Italy" };
protected String[] unstored = { "Amsterdam has lots of bridges",
        "Venice has lots of canals" };
protected String[] text = { "Amsterdam", "Venice" };

@Test
public void testWhitespaceAnalyzerPrefixQuery() throws IOException, ParseException {
    File indexes = new File(
            "C:/LuceneInActionTutorial/indexes");

    FSDirectory dir = FSDirectory.open(indexes);

    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_9,
            new LimitTokenCountAnalyzer(new WhitespaceAnalyzer(
                    Version.LUCENE_4_9), Integer.MAX_VALUE));
    IndexWriter writer = new IndexWriter(dir, config);

    for (int i = 0; i < ids.length; i++) {
        Document doc = new Document();
        doc.add(new StringField("id", ids[i], Store.NO));
        doc.add(new StoredField("country", unindexed[i]));
        doc.add(new TextField("contents", unstored[i], Store.NO));
        doc.add(new Field("city", text[i], TextField.TYPE_STORED));
        writer.addDocument(doc);
    }
    writer.close();

    DirectoryReader dr = DirectoryReader.open(dir);
    IndexSearcher is = new IndexSearcher(dr);
    QueryParser queryParser = new QueryParser(Version.LUCENE_4_9,
            "contents", new WhitespaceAnalyzer(Version.LUCENE_4_9));
    queryParser.setLowercaseExpandedTerms(true);
    Query q = queryParser.parse("Ven*");
    assertTrue(q.getClass().getSimpleName().contains("PrefixQuery"));
    TopDocs hits = is.search(q, 10);
    assertEquals(1, hits.totalHits);
} 

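If I replace WhitespaceAnalyzer with StandardAnalyzer, the test passes. I used Luke to inspect the index contents, but I couldn't find any difference in how Lucene stores the values at indexing time. Can someone clarify what's going wrong?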

1 Answer:

Answer 0 (score: 3)

StandardAnalyzer lowercases text at index time; WhitespaceAnalyzer does not. With WhitespaceAnalyzer, the term stored in the index is "Venice".
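To make the difference concrete, here is a plain-Java simulation of the terms each analyzer would produce for the "contents" field of the Venice document. It is not Lucene code: AnalyzerCaseDemo and its methods are hypothetical names, and the "lowercased" side only approximates StandardAnalyzer's lowercasing step (the real StandardAnalyzer also applies Unicode word-break rules and, in Lucene 4.x, removes English stop words such as "of").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java simulation, NOT Lucene's API. It only models the case handling
// that distinguishes the two analyzers in the question.
class AnalyzerCaseDemo {
    // Roughly what WhitespaceAnalyzer does: split on whitespace, keep case as-is.
    static List<String> whitespaceTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Rough stand-in for StandardAnalyzer on this input: split, then lowercase.
    // Stop-word removal and Unicode word-break rules are deliberately ignored.
    static List<String> lowercasedTerms(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : whitespaceTerms(text)) {
            terms.add(token.toLowerCase(Locale.ROOT));
        }
        return terms;
    }
}
```

For "Venice has lots of canals", whitespaceTerms returns [Venice, has, lots, of, canals] while lowercasedTerms returns [venice, has, lots, of, canals]; only the second list contains a term that starts with "ven".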

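The query parser lowercases your query because you have called setLowercaseExpandedTerms(true). (That is also the default; to turn it off you have to set it to false explicitly.) So your query becomes "ven*", which does not match "Venice".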
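The matching step can be modeled the same way. The sketch below is again plain Java rather than Lucene's API: the boolean flag stands in for QueryParser.setLowercaseExpandedTerms, the prefix is lowercased only when that flag is true, and the comparison against indexed terms is case-sensitive, as Lucene's term matching is. PrefixMatchDemo and prefixMatches are hypothetical names.

```java
import java.util.List;
import java.util.Locale;

// Plain-Java simulation, NOT Lucene's API, of matching a prefix query such as
// "Ven*" against the terms stored in an index.
class PrefixMatchDemo {
    // lowercaseExpandedTerms mirrors QueryParser.setLowercaseExpandedTerms:
    // when true, the prefix is lowercased before matching, regardless of the
    // analyzer that was used to build the index.
    static boolean prefixMatches(List<String> indexedTerms, String prefix,
                                 boolean lowercaseExpandedTerms) {
        String p = lowercaseExpandedTerms ? prefix.toLowerCase(Locale.ROOT) : prefix;
        for (String term : indexedTerms) {
            // Indexed terms are compared exactly, so case matters here.
            if (term.startsWith(p)) {
                return true;
            }
        }
        return false;
    }
}
```

With the whitespace-analyzed terms [Venice, has, lots, of, canals], prefixMatches(terms, "Ven", true) is false and prefixMatches(terms, "Ven", false) is true, which corresponds to calling queryParser.setLowercaseExpandedTerms(false) in the question's test. The other option is to keep the default and index with an analyzer that lowercases, as StandardAnalyzer does.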