ElasticSearch does not recognize numbers

Time: 2018-11-29 22:18:34

Tags: elasticsearch

I use this configuration for search and mapping:


PUT :9200/subscribers

{
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": [
            "lowercase"
          ]
        },
        "autocomplete_search": {
          "tokenizer": "lowercase"
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
         "id": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        },
         "contact_number": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "autocomplete_search"
        }
      }
    }
  }
}
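To see what this index-time analyzer stores, the `autocomplete` chain (edge_ngram tokenizer with `token_chars: letter, digit`, then a lowercase filter) can be approximated in a few lines of Python. This is only a sketch for ASCII input, not Elasticsearch's actual implementation:

```python
import re

def autocomplete_tokens(text, min_gram=2, max_gram=10):
    """Approximate the 'autocomplete' analyzer: split on characters
    that are neither letters nor digits, lowercase, then emit edge
    n-grams of length min_gram..max_gram for each word."""
    tokens = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
    return tokens

# Digits are kept at index time, so prefixes like "43" and "43fds"
# are stored in the inverted index.
print(autocomplete_tokens("+43fdsds*543254365"))
```

This confirms the index side is fine: prefixes of both the alphanumeric and the purely numeric word are indexed.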

But when I add a new object:


POST :9200/subscribers/doc/?pretty

{
  "id": "1421997",
  "name": "John 333 Martin",
  "contact_number":"+43fdsds*543254365"
}

and then search across multiple fields like this:


POST :9200/subscribers/doc/_search

{
    "query": {
        "multi_match": {
            "query": "Joh",
            "fields": [
                "name",
                "id",
                "contact_number"
            ],
            "type": "best_fields"
        }
    }
}

it successfully returns "John 333 Martin". However, when I run "query": "333", "query": "+43fds", or "query": "14219", nothing is returned. This is strange, because I also configured the tokenizer to keep digits:

 "token_chars": [
            "letter",
            "digit"
          ]

What can I do to search across all fields and get results containing numbers?


Update:

Even GET :9200/subscribers/_analyze

{
  "analyzer": "autocomplete",
  "text": "+43fdsds*543254365"
}

shows exactly the right combinations, such as "43", "43f", "43fd", "43fds". But the search finds nothing. Could my search query be incorrect?

1 Answer:

Answer 0 (score: 1)

Your search uses a different analyzer than the one that created the tokens in the inverted index. Since you use the lowercase tokenizer in your search_analyzer, digits are stripped from the query. See below:

POST _analyze
{
  "tokenizer": "lowercase",
  "text":     "+43fdsds*543254365"
}

produces

{
  "tokens" : [
    {
      "token" : "fdsds",
      "start_offset" : 3,
      "end_offset" : 8,
      "type" : "word",
      "position" : 0
    }
  ]
}
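The reason is that the `lowercase` tokenizer splits on any non-letter character, so digits act as separators and are dropped. A minimal Python sketch of that behavior (an approximation for ASCII input, not the real tokenizer):

```python
import re

def lowercase_tokenizer(text):
    """Approximate ES's 'lowercase' tokenizer: split on any
    non-letter character and lowercase the result. Digits are
    treated as separators, so they never appear in tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Only the letter run survives; "43" and "543254365" are lost.
print(lowercase_tokenizer("+43fdsds*543254365"))
```

So a query like "14219" is tokenized into nothing at all, and the search can never match the indexed digit prefixes.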

Instead, use the standard analyzer as the search_analyzer, i.e. modify the mapping as shown below, and it will work as expected:

"mappings": {
    "doc": {
      "properties": {
         "id": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        },
        "name": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        },
         "contact_number": {
          "type": "text",
          "analyzer": "autocomplete",
          "search_analyzer": "standard"
        }
      }
    }
  }

With the standard analyzer,

POST _analyze
{
  "analyzer": "standard",
  "text":     "+43fdsds*543254365"
}

produces

{
  "tokens" : [
    {
      "token" : "43fdsds",
      "start_offset" : 1,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "543254365",
      "start_offset" : 9,
      "end_offset" : 18,
      "type" : "<NUM>",
      "position" : 1
    }
  ]
}
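For ASCII input like this, the standard analyzer's output can be approximated as "split on non-alphanumeric characters, then lowercase" (the real analyzer implements full Unicode text segmentation, so this is only a sketch):

```python
import re

def standard_like(text):
    """Rough ASCII approximation of the 'standard' analyzer:
    split on non-alphanumeric characters and lowercase.
    Digits are kept, so numeric queries produce usable tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Both words survive, matching the _analyze output above.
print(standard_like("+43fdsds*543254365"))
```

Since query tokens such as "43fds" or "14219" now survive analysis, they can match the edge n-grams stored at index time.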