WordDelimiterFilterFactory: how do I search by tokens that contain digits?

Time: 2017-09-18 15:15:00

Tags: java indexing lucene hibernate-search

I have the following configuration:

@AnalyzerDef(name = "autocompleteNGramAnalyzer",

// Split input into tokens according to tokenizer
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),

filters = {
    // Split tokens into sub-words (e.g. on case changes and
    // letter/digit transitions)
    @TokenFilterDef(factory = WordDelimiterFilterFactory.class),

    // Normalize token text to lowercase, as the user is unlikely to
    // care about casing when searching for matches
    @TokenFilterDef(factory = LowerCaseFilterFactory.class),
    @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
        @Parameter(name = "minGramSize", value = "2"),
        @Parameter(name = "maxGramSize", value = "5") }) })

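This works almost as expected, but there is a problem with words that contain digits.

For example:

Searching for the token ab, Lucene returns abcdefg, but if I search for a1 and expect to find a1b1c1d1, it returns nothing.

How should I change this configuration?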

1 answer:

Answer 0 (score: 0):

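Unless you have other requirements, you should try removing WordDelimiterFilterFactory, or at least configure it properly (in particular, set preserveOriginal to 1) if you really need some of its features.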
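A minimal sketch of the second option, assuming Hibernate Search 5.x annotations and the Lucene factory packages that ship with it (the package names may differ in other versions). The Item class is hypothetical, since the question does not show the annotated entity; everything else is taken from the question's configuration with only the preserveOriginal parameter added.

```java
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilterFactory;
import org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

@AnalyzerDef(name = "autocompleteNGramAnalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        // Keep the unsplit token alongside its parts. If none of the
        // splitting features are needed, remove this filter instead.
        @TokenFilterDef(factory = WordDelimiterFilterFactory.class, params = {
            @Parameter(name = "preserveOriginal", value = "1") }),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
            @Parameter(name = "minGramSize", value = "2"),
            @Parameter(name = "maxGramSize", value = "5") }) })
public class Item { // hypothetical entity, not part of the question
}
```

With the original token preserved, the edge n-gram filter should see "a1b1c1d1" as a whole and emit prefixes such as "a1", which is what the search for a1 needs.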

By default, I think WordDelimiterFilter will turn "a1b1c1d1" into something like ["a", "1", "b", "1", "c", "1", "d", "1"], which I doubt is appropriate for an "autocomplete" field.
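To see why that split would break the search, here is a rough plain-Java model of the filter chain. It is a simplified stand-in, not Lucene's implementation: splitOnNumerics and edgeNGrams are hypothetical helpers that only imitate splitting on letter/digit transitions and taking leading-edge n-grams with minGramSize 2 and maxGramSize 5, under the assumption that tokens shorter than the minimum gram size produce no output.

```java
import java.util.ArrayList;
import java.util.List;

class DelimiterSketch {

    // Simplified stand-in: split a token wherever a letter meets a digit.
    static List<String> splitOnNumerics(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            boolean boundary = current.length() > 0
                && Character.isDigit(c)
                   != Character.isDigit(current.charAt(current.length() - 1));
            if (boundary) {
                parts.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    // Leading-edge n-grams; a token shorter than minGram yields nothing.
    static List<String> edgeNGrams(List<String> tokens, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (String t : tokens) {
            for (int n = minGram; n <= Math.min(maxGram, t.length()); n++) {
                grams.add(t.substring(0, n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // No digits, so the token stays whole and prefixes are indexed.
        System.out.println(edgeNGrams(splitOnNumerics("abcdefg"), 2, 5));

        // Every part has length 1, so no n-gram of size >= 2 exists.
        System.out.println(edgeNGrams(splitOnNumerics("a1b1c1d1"), 2, 5));

        // Modelling preserveOriginal: the unsplit token is kept as well.
        List<String> withOriginal = new ArrayList<>();
        withOriginal.add("a1b1c1d1");
        withOriginal.addAll(splitOnNumerics("a1b1c1d1"));
        System.out.println(edgeNGrams(withOriginal, 2, 5));
    }
}
```

In this model the second call produces an empty list, which matches the symptom in the question: nothing is indexed for a1b1c1d1 that a search for a1 could match, while keeping the original token restores the "a1" prefix.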