Question

I have the following configuration:

@AnalyzerDef(name = "autocompleteNGramAnalyzer",

// Split input into tokens according to tokenizer
tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),

filters = {
    // Split tokens on intra-word delimiters (default settings)
    @TokenFilterDef(factory = WordDelimiterFilterFactory.class),

    // Normalize token text to lowercase, as the user is unlikely to
    // care about casing when searching for matches
    @TokenFilterDef(factory = LowerCaseFilterFactory.class),
    @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
        @Parameter(name = "minGramSize", value = "2"),
        @Parameter(name = "maxGramSize", value = "5") }) })

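This works almost as expected, but there is a problem with words that contain digits.

For example:

Searching with the token ab, Lucene returns abcdefg, but if I search for a1 and expect to find a1b1c1d1, nothing is returned.

How should I change this configuration?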

Answer 1

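Unless you have other requirements that you did not mention, you should try removing the WordDelimiterFilterFactory, or at least configure it correctly (in particular with preserveOriginal set to 1) if you really need some of its features.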
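As a rough sketch of the second option, the analyzer definition could look like the following. The import packages assume Hibernate Search 5.x with a Lucene 4/5-era analysis module, and the Item class is a hypothetical placeholder, so adjust both to your project.

```java
import org.apache.lucene.analysis.core.LowerCaseFilterFactory;
import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilterFactory;
import org.apache.lucene.analysis.ngram.EdgeNGramFilterFactory;
import org.apache.lucene.analysis.standard.StandardTokenizerFactory;
import org.hibernate.search.annotations.AnalyzerDef;
import org.hibernate.search.annotations.Parameter;
import org.hibernate.search.annotations.TokenFilterDef;
import org.hibernate.search.annotations.TokenizerDef;

@AnalyzerDef(name = "autocompleteNGramAnalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        // Option A: delete this filter entirely.
        // Option B (shown here): keep it, but also emit the unsplit token.
        @TokenFilterDef(factory = WordDelimiterFilterFactory.class, params = {
            @Parameter(name = "preserveOriginal", value = "1") }),
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = EdgeNGramFilterFactory.class, params = {
            @Parameter(name = "minGramSize", value = "2"),
            @Parameter(name = "maxGramSize", value = "5") }) })
public class Item { // hypothetical class; mapping annotations omitted
}
```

After changing the analyzer you would need to rebuild the index, since documents indexed earlier still contain the old tokens.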

By default, I think the WordDelimiterFilter will turn "a1b1c1d1" into something like ["a", "1", "b", "1", "c", "1", "d", "1"], which I doubt makes sense for an "autocomplete" field.
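To illustrate why this would leave nothing to match, here is a small plain-Java sketch. It does not use Lucene: splitOnLetterDigitChange is a simplified, hypothetical stand-in for the filter's letter/digit splitting, and edgeNGrams assumes that tokens shorter than minGramSize produce no grams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EdgeNGramSketch {

    // Simplified model (an assumption, not Lucene code): start a new part
    // whenever the text switches between a digit and a non-digit.
    public static List<String> splitOnLetterDigitChange(String token) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            if (current.length() > 0) {
                char previous = current.charAt(current.length() - 1);
                if (Character.isDigit(c) != Character.isDigit(previous)) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            }
            current.append(c);
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    // Prefixes of each token with length between min and max (inclusive).
    // Tokens shorter than min contribute nothing.
    public static List<String> edgeNGrams(List<String> tokens, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (String token : tokens) {
            int longest = Math.min(max, token.length());
            for (int len = min; len <= longest; len++) {
                grams.add(token.substring(0, len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> split = splitOnLetterDigitChange("a1b1c1d1");
        System.out.println(split);                   // [a, 1, b, 1, c, 1, d, 1]
        System.out.println(edgeNGrams(split, 2, 5)); // []
        // Without splitting, the whole token yields the prefix "a1".
        System.out.println(edgeNGrams(Arrays.asList("a1b1c1d1"), 2, 5)); // [a1, a1b, a1b1, a1b1c]
    }
}
```

Under this model every part of "a1b1c1d1" is a single character, so with minGramSize set to 2 no prefix is indexed at all, whereas "abcdefg" is never split and still produces "ab".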

WordDelimiterFilterFactory: how to search by tokens that contain digits?
