Testing an Elasticsearch custom analyzer - pipe-delimited keywords

Date: 2017-12-21 21:57:22

Tags: elasticsearch elasticsearch-analyzers elasticsearch-6

I have created this index with a custom analyzer called pipe. When I try to test it, it returns every single character instead of the pipe-delimited words.

I'm trying to build a use case where an input line of keywords looks like this: crockpot refried beans|corningware replacement|crockpot lids|recipe refried beans, and Elasticsearch would return matches after splitting the line on the pipes.

{
  "keywords": {
    "aliases": {

    },
    "mappings": {
      "cloud": {
        "properties": {
          "keywords": {
            "type": "text",
            "analyzer": "pipe"
          }
        }
      }
    },
    "settings": {
      "index": {
        "number_of_shards": "5",
        "provided_name": "keywords",
        "creation_date": "1513890909384",
        "analysis": {
          "analyzer": {
            "pipe": {
              "type": "custom",
              "tokenizer": "pipe"
            }
          },
          "tokenizer": {
            "pipe": {
              "pattern": "|",
              "type": "pattern"
            }
          }
        },
        "number_of_replicas": "1",
        "uuid": "DOLV_FBbSC2CBU4p7oT3yw",
        "version": {
          "created": "6000099"
        }
      }
    }
  }
}
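
Note that the dump above is the full GET-index response, which includes read-only metadata such as provided_name, creation_date, and uuid. To create such an index yourself, you would PUT only the analysis settings and mappings; a sketch of the request body (field names taken from the dump above, with the pattern still unescaped as in the question):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pipe": {
          "type": "custom",
          "tokenizer": "pipe"
        }
      },
      "tokenizer": {
        "pipe": {
          "type": "pattern",
          "pattern": "|"
        }
      }
    }
  },
  "mappings": {
    "cloud": {
      "properties": {
        "keywords": {
          "type": "text",
          "analyzer": "pipe"
        }
      }
    }
  }
}
```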

When I try to test it following the guide:

curl -XPOST 'http://localhost:9200/keywords/_analyze' -H 'Content-Type: application/json' -d '{
 "analyzer": "pipe",
 "text": "pipe|pipe2"
}'

I get back a character-by-character result:

{
  "tokens": [
    {
      "token": "p",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 0
    },
    {
      "token": "i",
      "start_offset": 1,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "p",
      "start_offset": 2,
      "end_offset": 3,
      "type": "word",
      "position": 2
    },
    {
      "token": "e",
      "start_offset": 3,
      "end_offset": 4,
      "type": "word",
      "position": 3
    },
    ...
  ]
}

1 Answer:

Answer 0 (score: 1)

Good job, you're almost there. Since the pipe character | is a reserved character in regular expressions, you need to escape it like this:

      "tokenizer": {
        "pipe": {
          "pattern": "\\|",   <--- change this
          "type": "pattern"
        }
      }
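
The underlying reason can be reproduced with any regex engine: an unescaped | is the alternation operator between two empty patterns, so it matches the empty string at every position and the tokenizer splits between every character. A minimal Python sketch of the two behaviors (illustrative only; Lucene's pattern tokenizer uses Java regexes, but the escaping rule is the same):

```python
import re

# Unescaped "|" is alternation between two empty patterns,
# so it matches the empty string at position 0 (and everywhere else).
print(re.match(r"|", "pipe|pipe2").group(0))  # -> "" (zero-width match)

# Escaped "\|" matches the literal pipe character,
# so splitting yields the intended whole tokens.
print(re.split(r"\|", "pipe|pipe2"))  # -> ['pipe', 'pipe2']
```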

Then your analyzer will work and produce this:

{
  "tokens": [
    {
      "token": "pipe",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "pipe2",
      "start_offset": 5,
      "end_offset": 10,
      "type": "word",
      "position": 1
    }
  ]
}