Question

I have an analyzer that ignores whitespace. When I search for a string without spaces, it returns the correct results. Here is the analyzer:

{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "word_joiner": {
          "type": "word_delimiter",
          "catenate_all": true
        }
      },
      "analyzer": {
        "word_join_analyzer": {
          "type": "custom",
          "filter": [
            "word_joiner"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  }
}

Here is how it works:

curl -XGET "http://localhost:9200/cake/_analyze?analyzer=word_join_analyzer&pretty" -d 'ONE"\ "TWO'

Result:

{
  "tokens" : [ {
    "token" : "ONE",
    "start_offset" : 1,
    "end_offset" : 5,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "ONETWO",
    "start_offset" : 1,
    "end_offset" : 13,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "TWO",
    "start_offset" : 7,
    "end_offset" : 13,
    "type" : "word",
    "position" : 1
  } ]
}

What I want is to get "token" : "ONE TWO" out of this analyzer. How can I do that?
Thanks!

Answer 1

You need to enable the preserve_original setting on the word_delimiter filter; it is false by default:

{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "word_joiner": {
          "type": "word_delimiter",
          "catenate_all": true,
          "preserve_original": true
        }
      },
      "analyzer": {
        "word_join_analyzer": {
          "type": "custom",
          "filter": [
            "word_joiner"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  }
}

This will produce:

{
  "tokens": [
    {
      "token": "ONE TWO",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    },
    {
      "token": "ONE",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "ONETWO",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    },
    {
      "token": "TWO",
      "start_offset": 4,
      "end_offset": 7,
      "type": "word",
      "position": 1
    }
  ]
}
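The token list above can be reproduced with a rough Python model of the filter, applied to the single token that the keyword tokenizer emits for the whole input. This is a simplified sketch, not Lucene's implementation. The function name word_delimiter and its splitting rule, which splits only on non-alphanumeric characters, are assumptions for illustration. The real filter also splits on case changes and letter/digit transitions.

```python
import re
from typing import List, Tuple

# (token text, start_offset, end_offset, position)
Token = Tuple[str, int, int, int]


def word_delimiter(text: str, catenate_all: bool = False,
                   preserve_original: bool = False) -> List[Token]:
    # Hypothetical, simplified model: split only on runs of
    # non-alphanumeric characters (the real filter does more).
    parts = [(m.group(), m.start(), m.end())
             for m in re.finditer(r"[A-Za-z0-9]+", text)]
    tokens: List[Token] = []
    # preserve_original: keep the unsplit keyword token, but only
    # when splitting actually changed something.
    if preserve_original and [p[0] for p in parts] != [text]:
        tokens.append((text, 0, len(text), 0))
    for i, (word, start, end) in enumerate(parts):
        tokens.append((word, start, end, i))
        # catenate_all: emit all parts joined into one token,
        # sharing the position of the first part.
        if i == 0 and catenate_all and len(parts) > 1:
            joined = "".join(p[0] for p in parts)
            tokens.append((joined, parts[0][1], parts[-1][2], 0))
    return tokens


if __name__ == "__main__":
    for tok in word_delimiter("ONE TWO", catenate_all=True,
                              preserve_original=True):
        print(tok)
```

With only catenate_all enabled, the same call returns ONE, ONETWO and TWO, in the same order as the question's output. Setting preserve_original adds the untouched "ONE TWO" token in front of them.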
