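I am trying to index documents for case-insensitive search using KeywordTokenizer.
I created a custom analyzer that should perform keyword tokenization and convert every keyword to lowercase: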
public class LowercasingKeywordAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        KeywordTokenizer keywordTokenizer = new KeywordTokenizer();
        return new TokenStreamComponents(keywordTokenizer, new LowerCaseFilter(keywordTokenizer));
    }
}
Why does the search return no results when I submit a TermQuery whose term is entirely lowercase? Here is a unit test that reproduces the problem:
@Test
public void experiment() throws IOException, ParseException {
    Analyzer analyzer = new LowercasingKeywordAnalyzer();
    Directory directory = new RAMDirectory();
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new StringField("fieldname", text, Store.NO));
    iwriter.addDocument(doc);
    iwriter.close();
    // Now search the index:
    DirectoryReader ireader = DirectoryReader.open(directory);
    IndexSearcher isearcher = new IndexSearcher(ireader);
    // THE TEST PASSES WITH THE CASE SENSITIVE QUERY TERM, BUT DOES NOT PASS WITH LOWERCASED
    //Query query = new TermQuery(new Term("fieldname", "This is the text to be indexed."));
    Query query = new TermQuery(new Term("fieldname", "This is the text to be indexed.".toLowerCase()));
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    assertEquals(1, hits.length);
    ireader.close();
    directory.close();
}
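Please help me figure out what is wrong here.
Note: I am aware of the Lucene QueryParsers and of the deprecation of some interfaces; please do not comment on those.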
Answer 0 (score: 1)
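StringField is not analyzed, so the analyzer you defined has no effect on it. The value is indexed verbatim as a single term with its original casing, which is why the case-sensitive query matches and the lowercased one does not. You can use TextField instead, or a Field with a FieldType you define yourself. Alternatively, lowercase the value before constructing the field and keep using StringField.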
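The last option can be illustrated without Lucene on the classpath. The following is only a minimal sketch: `ExactTermIndex` and `normalize` are hypothetical names invented for this example, and the map is a stand-in for a non-analyzed field, not Lucene's actual implementation. It shows that an exact-match lookup succeeds only when the same lowercasing is applied both when the value is stored and when the query term is built.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a non-analyzed field: every value is stored
// verbatim as a single term, and no analyzer is involved.
class ExactTermIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Lowercase with a fixed locale so index-time and query-time terms agree
    // regardless of the JVM's default locale.
    static String normalize(String value) {
        return value.toLowerCase(Locale.ROOT);
    }

    // Stores the value exactly as given; the caller decides whether to normalize first.
    void add(int docId, String value) {
        postings.computeIfAbsent(value, k -> new HashSet<>()).add(docId);
    }

    // Exact lookup: the term must equal the stored value character for character.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term, new HashSet<>());
    }
}
```

Applied to the test in the question, this would presumably amount to passing the lowercased text to the field, for example `doc.add(new StringField("fieldname", text.toLowerCase(), Store.NO))`, while leaving the lowercased TermQuery unchanged.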