Can I specify a regexp for the stop words of the stop analyzer in Elasticsearch?

Date: 2014-11-04 08:09:00

Tags: regex elasticsearch analyzer stop-words

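I would like an analyzer that skips the words "g" and "l" as well as every decimal number it encounters. I am not sure whether a stop filter is the right tool for this, or how to tell it to skip decimal numbers. This is what I have so far: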

PUT /products
{
    "settings": {
        "analysis": {
            "filter": {
                "my_stopwords": {
                    "type":      "stop",
                    "stopwords": [ "l", "g" ]
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter":    [ "lowercase", "my_stopwords" ]
                }
            }
        }
    }
}

How can I fix it so that it also handles decimal numbers?

1 answer:

Answer 0 (score: 1)

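Me again. It seems I cannot add a regular expression to the stop words. However, I did solve the problem by adding another filter named filter_amount. This is what it looks like:

    "filter_amount": {
        "type":        "pattern_replace",
        "pattern":     "[\\d]+([\\.,][\\d]+)?",
        "replacement": ""
    }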
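To get a feel for what this pattern does to individual tokens, here is a rough Python sketch. It is only an approximation: Elasticsearch applies the pattern with Java's regex engine, the sample tokens are made up for illustration, and `strip_amounts` is a hypothetical helper name, not part of any Elasticsearch API.

```python
import re

# Same pattern as filter_amount, with the JSON-level backslash escaping removed.
# re.ASCII restricts \d to 0-9, which is Java's default behaviour for \d.
AMOUNT = re.compile(r"[\d]+([\.,][\d]+)?", re.ASCII)

def strip_amounts(token: str) -> str:
    """Replace every match in the token with an empty string."""
    return AMOUNT.sub("", token)

# Hypothetical tokens, assumed to be what a tokenizer might emit.
samples = ["500", "1.5", "2,75", "milk", "5l"]
results = {token: strip_amounts(token) for token in samples}

for token, out in results.items():
    print(f"{token!r} -> {out!r}")
```

Pure numbers come out as empty strings rather than disappearing, and a mixed token such as `5l` is reduced to `l`.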

And this is what the settings should look like:

PUT /products
{
    "settings": {
        "analysis": {
            "filter": {
                "my_stopwords": {
                    "type":      "stop",
                    "stopwords": [ "l", "g" ]
                },
                "filter_amount": {
                    "type":        "pattern_replace",
                    "pattern":     "[\\d]+([\\.,][\\d]+)?",
                    "replacement": ""
                }
            },
            "analyzer": {
                "my_analyzer": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter":    [ "lowercase", "my_stopwords", "filter_amount" ]
                }
            }
        }
    }
}
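The filter order in my_analyzer can be mimicked in plain Python to see what the chain leaves behind. This is a simulation under assumptions, not Elasticsearch itself: the input list stands in for whatever the standard tokenizer would emit, and `analyze` and `drop_empty` are hypothetical names introduced only for this sketch.

```python
import re

STOPWORDS = {"l", "g"}
AMOUNT = re.compile(r"[\d]+([\.,][\d]+)?", re.ASCII)

def analyze(tokens, drop_empty=False):
    """Apply lowercase, stop-word removal and amount stripping, in that order."""
    out = []
    for tok in tokens:
        tok = tok.lower()              # stands in for "lowercase"
        if tok in STOPWORDS:           # stands in for "my_stopwords"
            continue
        tok = AMOUNT.sub("", tok)      # stands in for "filter_amount"
        if drop_empty and tok == "":   # optional extra step, not in the settings
            continue
        out.append(tok)
    return out

# Hypothetical token stream, assumed for illustration.
tokens = ["Milk", "1.5", "L", "Sugar", "500", "g"]

as_configured = analyze(tokens)
cleaned = analyze(tokens, drop_empty=True)

print(as_configured)
print(cleaned)
```

In this simulation the numeric tokens survive as empty strings unless they are dropped explicitly. As far as I know, pattern_replace behaves the same way in Elasticsearch, so a `length` token filter with `"min": 1` placed after filter_amount is one candidate for removing them; it is worth confirming the actual token output with the `_analyze` API.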

The rest stays the same. Cheers!