I have the following configuration:
@AnalyzerDef(name = "autocompleteNGramAnalyzer",
    // Split input into tokens according to tokenizer
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = WordDelimiterFilterFactory.class),
        // Normalize token text to lowercase, as the user is unlikely to
        // care about casing when searching for matches
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
            @Parameter(name = "minGramSize", value = "2"),
            @Parameter(name = "maxGramSize", value = "5") }) })
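This works almost as expected, but there is a problem with words that contain digits.
For example, searching for ab returns abcdefg, but if I search for a1 to find a1b1c1d1, nothing is returned.
How should I change this configuration?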
Answer 0 (score: 0)
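Unless you have other requirements, you should try removing WordDelimiterFilterFactory, or at least configure it properly (in particular, set preserveOriginal to 1) if you really need some of its features.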
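If the filter has to stay, the preserveOriginal suggestion might look like the fragment below. This is an untested sketch that reuses the analyzer definition from the question; only the params block on WordDelimiterFilterFactory is new, and it assumes the same imports as the original configuration. Dropping the filter entirely would simply mean deleting that @TokenFilterDef entry.

```java
@AnalyzerDef(name = "autocompleteNGramAnalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        // Keep the unsplit token (e.g. "a1b1c1d1") alongside the split parts
        @TokenFilterDef(factory = WordDelimiterFilterFactory.class, params = {
            @Parameter(name = "preserveOriginal", value = "1") }),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
            @Parameter(name = "minGramSize", value = "2"),
            @Parameter(name = "maxGramSize", value = "5") }) })
```

After changing the analyzer definition, existing documents would most likely need to be reindexed before the new tokens become searchable.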
By default, I believe WordDelimiterFilter will turn "a1b1c1d1" into something like ["a", "1", "b", "1", "c", "1", "d", "1"], which I suspect is not what you want for an "autocomplete" field.
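To see why that split leaves nothing to match, here is a small plain-Java sketch that imitates the filter chain without using Lucene at all. The class AutocompleteSketch and its helpers splitOnLetterDigit, edgeNGrams and analyze are hypothetical names made up for this illustration, and the splitting rule is only a rough approximation of WordDelimiterFilter's default letter/digit behavior, not its actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AutocompleteSketch {

    // Rough stand-in (assumption) for the default letter/digit splitting:
    // start a new part whenever the input switches between digit and non-digit.
    static List<String> splitOnLetterDigit(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (current.length() > 0) {
                char prev = current.charAt(current.length() - 1);
                if (Character.isDigit(prev) != Character.isDigit(c)) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            }
            current.append(c);
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    // Prefixes of the token with length between min and max (inclusive).
    // A token shorter than min produces no grams at all.
    static List<String> edgeNGrams(String token, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int len = min; len <= Math.min(max, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // Optional split, then lowercase, then edge n-grams of size 2..5,
    // mirroring the order of the filters in the question.
    static List<String> analyze(String input, boolean splitFirst) {
        List<String> tokens = splitFirst
                ? splitOnLetterDigit(input)
                : Collections.singletonList(input);
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.addAll(edgeNGrams(t.toLowerCase(), 2, 5));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("abcdefg", true));   // [ab, abc, abcd, abcde]
        System.out.println(analyze("a1b1c1d1", true));  // []
        System.out.println(analyze("a1b1c1d1", false)); // [a1, a1b, a1b1, a1b1c]
    }
}
```

With the split in place, every part of "a1b1c1d1" is a single character, which is shorter than minGramSize, so no grams are produced and a1 has nothing to match. Without the split, the whole token survives and a1 is its first edge n-gram.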