What are the valid types for the type_table of Elasticsearch's Word Delimiter filter?

Time: 2015-04-06 17:24:19

Tags: elasticsearch token

Elasticsearch's Word Delimiter filter has a very useful option, type_table; it lets you treat characters that would otherwise be special as legal characters within a token.

However, it is sparsely documented:

type_table
A custom type mapping table, for example (when configured using type_table_path):
    # Map the $, %, '.', and ',' characters to DIGIT
    # This might be useful for financial data.
    $ => DIGIT
    % => DIGIT
    . => DIGIT
    \\u002C => DIGIT

    # in some cases you might not want to split on ZWJ
    # this also tests the case where we need a bigger byte[]
    # see http://en.wikipedia.org/wiki/Zero-width_joiner
    \\u200D => ALPHANUM

From this example, we can see that DIGIT and ALPHANUM are valid types that characters can be mapped to. What other options are there, and what do they do?
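
For readers unfamiliar with the mapping-file format quoted in the question, here is a minimal Python sketch of how its `char => TYPE` rules could be read. It is not Lucene's or Elasticsearch's actual parser; `parse_type_table` is a hypothetical helper, and it assumes the file itself carries single-backslash `\uXXXX` escapes (the doubled backslashes in the quoted documentation are taken to be an extra escaping layer).

```python
import re


def parse_type_table(text):
    """Parse 'char => TYPE' rules into a dict (hypothetical helper).

    Blank lines and lines starting with '#' are skipped, mirroring the
    comment style of the quoted example.
    """
    table = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        lhs, sep, rhs = line.partition("=>")
        if not sep:
            raise ValueError(f"not a 'char => TYPE' rule: {line!r}")
        # Assumption: \uXXXX on the left-hand side denotes one code point.
        char = re.sub(
            r"\\u([0-9A-Fa-f]{4})",
            lambda m: chr(int(m.group(1), 16)),
            lhs.strip(),
        )
        table[char] = rhs.strip()
    return table


# Raw string, so the \u escapes reach the parser untouched.
SAMPLE = r"""
# Map the $, %, '.', and ',' characters to DIGIT
$ => DIGIT
% => DIGIT
. => DIGIT
\u002C => DIGIT

# in some cases you might not want to split on ZWJ
\u200D => ALPHANUM
"""

rules = parse_type_table(SAMPLE)
for char, type_name in rules.items():
    # Print the code point so invisible characters such as ZWJ stay readable.
    print(f"U+{ord(char):04X} -> {type_name}")
```

The sketch only shows the shape of the rules: one character (literal or escaped) on the left, one type name on the right.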

1 answer:

Answer 0 (score: 3)

I found the answer by digging into the Lucene documentation, which the Elasticsearch documentation essentially quotes.

The documentation for WordDelimiterFilterFactory links to this file in the Subversion repository. The Elasticsearch documentation quotes it heavily, but the file contains this additional snippet:

  

    # A customized type mapping for WordDelimiterFilterFactory
    # the allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM
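
As a rough illustration of putting that list to use, the Python sketch below checks a set of rules against the six type names from the snippet and builds a `word_delimiter` filter definition with an inline `type_table` array. The helper `build_filter` and the filter name `money_delimiter` are hypothetical, and the inline-array form is assumed to be accepted by your Elasticsearch version; check the filter reference before relying on it.

```python
import json

# The six type names listed in the Lucene snippet.
ALLOWED_TYPES = {"LOWER", "UPPER", "ALPHA", "DIGIT", "ALPHANUM", "SUBWORD_DELIM"}


def build_filter(rules):
    """Return a word_delimiter filter definition (hypothetical helper).

    `rules` maps a single character to one of ALLOWED_TYPES; any other
    type name is rejected before it reaches the cluster.
    """
    for char, type_name in rules.items():
        if type_name not in ALLOWED_TYPES:
            raise ValueError(f"unknown type {type_name!r} for {char!r}")
    return {
        "type": "word_delimiter",
        # Each entry uses the same 'char => TYPE' form as the mapping file.
        "type_table": [f"{c} => {t}" for c, t in rules.items()],
    }


# "money_delimiter" is an illustrative filter name, not one from the docs.
settings = {
    "analysis": {
        "filter": {
            "money_delimiter": build_filter({"$": "DIGIT", "%": "DIGIT"}),
        }
    }
}

print(json.dumps(settings, indent=2))
```

Validating on the client side simply surfaces a mistyped type name early, without a round trip to index creation.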