Lucene WhitespaceAnalyzer ignoring phrases?

Date: 2013-07-08 06:32:12

Tags: lucene

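I am updating a document in Lucene, but when I search for the full value of one of its fields, no results are returned. If I search for just one of the words, I get a result.

This example is from Chapter 2 of Lucene in Action, 2nd edition, and I am using the Lucene 3 Java library.

Here is the main logic: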

"Document fields show new value when updated, and not old value" in {
        getHitCount("city", "Amsterdam") must equal(1)

        val update = new Document
        update add new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED)
        update add new Field("country", "Netherlands", Field.Store.YES, Field.Index.NO)
        update add new Field("contents", "Den Haag has a lot of museums", Field.Store.NO, Field.Index.ANALYZED)
        update add new Field("city", "Den Haag", Field.Store.YES, Field.Index.ANALYZED)

        wr updateDocument(new Term("id", "1"), update)
        wr close

        getHitCount("city", "Amsterdam") must equal(0)
        getHitCount("city", "Den Haag") must equal(1)
    }

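The last line above is the one that fails: the hit count is 0. If I change the query to "Den" or "Haag", I get 1 hit.

Below is all of the setup and the dependencies. Note that the author uses a WhitespaceAnalyzer. Is that the problem?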

    override def beforeEach {
        dir = new RAMDirectory

        val wri = writer
        for (i <- 0 to ids.length - 1) {
            val doc = new Document
            doc add new Field("id", ids(i), Field.Store.YES, Field.Index.NOT_ANALYZED)
            doc add new Field("country", unindexed(i), Field.Store.YES, Field.Index.NO)
            doc add new Field("contents", unstored(i), Field.Store.NO, Field.Index.ANALYZED)
            doc add new Field("city", text(i), Field.Store.YES, Field.Index.ANALYZED)
            wri addDocument doc
        }
        wri close

        wr = writer
    }

    var dir: RAMDirectory = _
    def writer = new IndexWriter(dir, new WhitespaceAnalyzer, IndexWriter.MaxFieldLength.UNLIMITED)
    var wr: IndexWriter = _

    def getHitCount(field: String, q: String): Int = {
        val searcher = new IndexSearcher(dir)
        val query = new TermQuery(new Term(field, q))
        val hitCount = searcher.search(query, 1).totalHits
        searcher.close()
        hitCount
    }

1 answer:

Answer 0 (score: 0)

You may want to look at PhraseQuery instead of TermQuery.
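The likely reason is that a TermQuery is not analyzed: it looks for one exact term in the index. The "city" field is indexed with Field.Index.ANALYZED, so WhitespaceAnalyzer splits "Den Haag" into the two terms "Den" and "Haag", and no single term "Den Haag" exists to match. A PhraseQuery instead matches terms that appear next to each other in order. The following is a minimal sketch against the Lucene 3 API used in the question; the object `PhraseHits` and the helper `getPhraseHitCount` are hypothetical names, not part of Lucene or the book, and splitting on `\s+` is only an approximation of what WhitespaceAnalyzer does.

```scala
import org.apache.lucene.index.Term
import org.apache.lucene.search.{IndexSearcher, PhraseQuery}
import org.apache.lucene.store.Directory

// Hypothetical helper (not from the question or the book).
object PhraseHits {
  // Counts documents whose `field` contains the words of `phrase`
  // as adjacent terms, in the same order.
  def getPhraseHitCount(dir: Directory, field: String, phrase: String): Int = {
    val searcher = new IndexSearcher(dir)
    try {
      val query = new PhraseQuery
      // Assumption: splitting on whitespace mirrors how WhitespaceAnalyzer
      // tokenized the field at index time, so each word is an indexed term.
      phrase.split("\\s+").filter(_.nonEmpty).foreach { word =>
        query.add(new Term(field, word))
      }
      searcher.search(query, 1).totalHits
    } finally {
      searcher.close()
    }
  }
}
```

With this helper, the failing assertion could be written as `PhraseHits.getPhraseHitCount(dir, "city", "Den Haag") must equal(1)`. If the whole city name should behave as a single searchable value, another option is to index that field with Field.Index.NOT_ANALYZED, in which case the original TermQuery for "Den Haag" should match.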