I want to migrate an example from "Lucene in Action, 2nd Edition", written for Lucene 3.0, to the current version of Lucene. This is the code to be migrated:
public void testUpdate() throws IOException {
assertEquals(1, getHitCount("city", "Amsterdam"));
IndexWriter writer = getWriter();
Document doc = new Document();
doc.add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
doc.add(new Field("country", "Netherlands", Field.Store.YES, Field.Index.NO));
doc.add(new Field("contents", "Den Haag has a lot of museums", Field.Store.NO, Field.Index.ANALYZED));
doc.add(new Field("city", "Den Haag", Field.Store.YES, Field.Index.ANALYZED));
writer.updateDocument(new Term("id", "1"), doc);
writer.close();
assertEquals(0, getHitCount("city", "Amsterdam"));
assertEquals(1, getHitCount("city", "Den Haag"));
}
Following the Lucene Migration Guide, I tried to migrate it by building the Document with the equivalents of the old Field constructors. The code is as follows:
@Test
public void testUpdate() throws IOException
{
assertEquals(1, getHitCount("city", "Amsterdam"));
IndexWriter writer = getWriter();
Document doc = new Document();
FieldType ft = new FieldType(StringField.TYPE_STORED);
ft.setOmitNorms(false);
doc.add(new Field("id", "1", ft));
doc.add(new StoredField("country", "Netherlands"));
doc.add(new TextField("contents", "Den Haag has a lot of museums", Store.NO));
doc.add(new Field("city", "Den Haag", TextField.TYPE_STORED));
writer.updateDocument(new Term("id", "1"), doc);
writer.close();
assertEquals(0, getHitCount("city", "Amsterdam"));
assertEquals(1, getHitCount("city", "Den Haag"));
}
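The assertion for "Den Haag" fails because the string "Den Haag" is not found (searching for "Den" or "Haag" alone does work). If I use a StringField instead, the test passes, because the "city" field is then not analyzed (i.e. not tokenized) and is indexed unchanged. But the intent of this example is clearly not to treat this field like an ID. I have read that the combination Field.Store.YES / Field.Index.ANALYZED is meant for short text content such as introductions, abstracts, or titles, so shouldn't it also match a multi-word string such as "Den Haag"? Or am I wrong? Could someone please clarify?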
The author builds the search from a Term object:
protected int getHitCount(String fieldName, String searchString) throws IOException {
try (DirectoryReader dr = DirectoryReader.open(directory)) { // close the reader when done
IndexSearcher searcher = new IndexSearcher(dr);
Term t = new Term(fieldName, searchString);
Query query = new TermQuery(t); // matches the exact, unanalyzed term only
return TestUtil.hitCount(searcher, query);
}
}
The TestUtil class contains just one line of code:
public static int hitCount(IndexSearcher searcher, Query query) throws IOException {
return searcher.search(query, 1).totalHits;
}
Answer (score: 1)
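Short explanation: you need to make sure that tokenization is either on or off consistently, both at index time and at search time.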
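Long explanation: if you want the content to be analyzed, you should use not only TextField but also QueryParser, so that your query goes through the same analysis. In your case, the query fails because with
new Field("city", "Den Haag", TextField.TYPE_STORED)
the text is tokenized into "Den" and "Haag". When you then create a TermQuery, you search for the single term "Den Haag", which of course returns no results.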
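To make the mismatch concrete, here is a toy model in plain Java of an inverted index for one analyzed field. It is not Lucene's API: the class and method names are hypothetical, and it assumes a whitespace analyzer that keeps case unchanged (an assumption consistent with the question's report that "Den" and "Haag" each match on their own). termHitCount mimics a TermQuery by looking the search string up verbatim; phraseHitCount roughly mimics a quoted query parsed with the same analyzer: the query text is analyzed first, then the tokens must occur at consecutive positions.

```java
import java.util.*;

/**
 * Toy model (NOT Lucene's API) of an inverted index for one analyzed field.
 * Assumption: the "analyzer" splits on whitespace and keeps case unchanged.
 */
public class AnalyzedFieldDemo {
    // term -> (docId -> positions of that term in the document)
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    // Hypothetical whitespace analyzer: splits text into tokens.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Index time: the value is analyzed, and each token is stored with its position.
    public void addDocument(int docId, String value) {
        List<String> tokens = analyze(value);
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), k -> new HashMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Like a TermQuery: the search string is NOT analyzed, it must equal one indexed term.
    public int termHitCount(String term) {
        return postings.getOrDefault(term, Collections.emptyMap()).size();
    }

    // Like a phrase query built from analyzed query text: same analyzer as at
    // index time, tokens must appear at consecutive positions in the same document.
    public int phraseHitCount(String text) {
        List<String> tokens = analyze(text);
        if (tokens.isEmpty()) return 0;
        Map<Integer, List<Integer>> first = postings.getOrDefault(tokens.get(0), Collections.emptyMap());
        int hits = 0;
        for (Map.Entry<Integer, List<Integer>> e : first.entrySet()) {
            int doc = e.getKey();
            for (int start : e.getValue()) {
                boolean match = true;
                for (int i = 1; i < tokens.size() && match; i++) {
                    List<Integer> p = postings.getOrDefault(tokens.get(i), Collections.emptyMap()).get(doc);
                    match = p != null && p.contains(start + i);
                }
                if (match) { hits++; break; } // count each document once
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        AnalyzedFieldDemo city = new AnalyzedFieldDemo();
        city.addDocument(1, "Den Haag");
        System.out.println(city.termHitCount("Den Haag"));   // 0: no such single term
        System.out.println(city.termHitCount("Den"));        // 1
        System.out.println(city.phraseHitCount("Den Haag")); // 1
    }
}
```

The phrase check is what keeps "Haag Den" from matching a document containing "Den Haag"; a plain OR over the analyzed tokens would not make that distinction.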
The following code shows how this works for the non-tokenized case:
doc.add(new StringField("city", "Den Haag", Field.Store.YES));
...
PhraseQuery query = new PhraseQuery();
query.add(new Term("city", "Den Haag"));
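For comparison, the same kind of toy model (again hypothetical, not Lucene code) for an untokenized field in the spirit of StringField: the whole value is indexed as a single term, so only an exact, case-sensitive lookup of "Den Haag" matches, and searching for "Den" alone no longer does.

```java
import java.util.*;

/**
 * Toy model (NOT Lucene's API) of an untokenized field, in the spirit of
 * StringField: the whole value is indexed as exactly one term.
 */
public class KeywordFieldDemo {
    // term -> ids of the documents containing it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    public void addDocument(int docId, String value) {
        // No analysis: "Den Haag" becomes one term, space included.
        postings.computeIfAbsent(value, k -> new HashSet<>()).add(docId);
    }

    // Like a TermQuery: exact, case-sensitive comparison with the indexed term.
    public int termHitCount(String term) {
        return postings.getOrDefault(term, Collections.emptySet()).size();
    }

    public static void main(String[] args) {
        KeywordFieldDemo city = new KeywordFieldDemo();
        city.addDocument(1, "Den Haag");
        System.out.println(city.termHitCount("Den Haag")); // 1
        System.out.println(city.termHitCount("Den"));      // 0
        System.out.println(city.termHitCount("den haag")); // 0
    }
}
```

This is the trade-off described in the question: the untokenized field makes the migrated test pass, but the field then behaves like an ID rather than searchable text.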