Question

I need to search for a specific query_string that contains special characters, but the search returns all results.

When I analyze it like this:

GET /exact/_analyze
{
  "field": "subject",
  "text": "abdominal-scan"
}

The output looks like this:

{
  "tokens": [
    {
      "token": "abdominal",
      "start_offset": 0,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "scan",
      "start_offset": 10,
      "end_offset": 14,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

This means it automatically drops the hyphen (-) and treats the input as two separate words.

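If I change the field's index setting to not_analyzed, I can no longer search for individual words in that field and have to pass the entire sentence in the query_string.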

Is there another option that lets me run exact searches (without ignoring special characters)?

Answer 1

You should take a look at the Definitive Guide's section about analysis, because it is essential for understanding how your index behaves.

By default, your field is analyzed with the standard analyzer, which splits words on hyphens.

The whitespace analyzer is a very simple, easy-to-understand analyzer: it splits the input into tokens on whitespace characters.
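To make the difference between the two analyzers concrete, here is a rough Python sketch that mimics their tokenization and reports the same offsets that _analyze returns. The function names approx_standard and approx_whitespace are hypothetical helpers for illustration only; the real standard tokenizer follows Unicode word-boundary rules (UAX #29), so the simple regex below is an approximation that will differ on edge cases such as numbers or apostrophes.

```python
import re


def approx_standard(text):
    """Rough stand-in for the standard analyzer (hypothetical helper).

    Emits runs of word characters, lowercased, with start/end offsets.
    The real standard tokenizer uses UAX #29 word boundaries, so this
    is only an approximation.
    """
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def approx_whitespace(text):
    """Rough stand-in for the whitespace analyzer (hypothetical helper).

    Splits on whitespace only and does not lowercase, so hyphens survive.
    """
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


print(approx_standard("abdominal-scan"))
print(approx_whitespace("This is about abdominal-scan"))
```

The first call yields ('abdominal', 0, 9) and ('scan', 10, 14), matching the offsets in the question's _analyze output; the second keeps ('abdominal-scan', 14, 28) as a single token.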

You can try the following example:

PUT /solution
{
  "mappings":{
    "doc": {
      "properties":{
        "subject": {
          "type": "string",
          "analyzer": "whitespace"
        }
      }
    }
  }
}

GET /solution/_analyze
{
  "field": "subject",
  "text": "This is about abdominal-scan"
}

Output:

{
  "tokens": [
    {
      "token": "This",
      "start_offset": 0,
      "end_offset": 4,
      "type": "word",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 5,
      "end_offset": 7,
      "type": "word",
      "position": 1
    },
    {
      "token": "about",
      "start_offset": 8,
      "end_offset": 13,
      "type": "word",
      "position": 2
    },
    {
      "token": "abdominal-scan",
      "start_offset": 14,
      "end_offset": 28,
      "type": "word",
      "position": 3
    }
  ]
}

In this case, you can see that the hyphen is preserved.
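Why this addresses the not_analyzed concern from the question can be illustrated with a toy inverted index. This is a minimal sketch, not how Elasticsearch stores data internally: build_index, whitespace and keyword are hypothetical helpers, with keyword standing in for a not_analyzed field (the whole value becomes one term) and whitespace standing in for the whitespace analyzer.

```python
from collections import defaultdict


def build_index(docs, analyze):
    """Map every term produced by `analyze` to the ids of the docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def whitespace(text):
    # Stand-in for the whitespace analyzer: split on whitespace, keep case and hyphens.
    return text.split()


def keyword(text):
    # Stand-in for a not_analyzed field: the entire value is a single term.
    return [text]


docs = {1: "This is about abdominal-scan", 2: "A routine scan"}

ws_index = build_index(docs, whitespace)
kw_index = build_index(docs, keyword)

print(sorted(ws_index.get("abdominal-scan", set())))  # exact hyphenated term
print(sorted(ws_index.get("about", set())))           # single words still searchable
print(sorted(kw_index.get("about", set())))           # not_analyzed: no match
```

With whitespace analysis, the exact term abdominal-scan matches only document 1 and the single word about still matches it, whereas the not_analyzed stand-in only matches the full sentence. The trade-off is that scan on its own no longer finds abdominal-scan, and because the whitespace analyzer does not lowercase, this will not match This.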

Elasticsearch itself ignores the special characters in the query string. How can I avoid that?
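One approach that is not part of the original answer is to escape the query_string reserved characters before sending the query, so that a character like - is not parsed as an operator. The sketch below assumes the reserved set listed in the Elasticsearch query_string documentation (+ - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /); escape_query_string is a hypothetical helper. The docs state that < and > cannot be escaped, so the sketch removes them. Note that escaping only affects query parsing: the escaped text is still run through the field's analyzer, so the hyphen survives only if the field uses an analyzer such as whitespace.

```python
# Reserved query_string characters, per the Elasticsearch docs (assumed list).
# && and || are covered by escaping each & and | individually.
_RESERVED = set('+-=&|!(){}[]^"~*?:\\/')


def escape_query_string(text):
    """Backslash-escape reserved characters (hypothetical helper).

    < and > cannot be escaped in query_string, so they are dropped.
    """
    out = []
    for ch in text:
        if ch in "<>":
            continue  # cannot be escaped; remove to avoid range syntax
        if ch in _RESERVED:
            out.append("\\")
        out.append(ch)
    return "".join(out)


print(escape_query_string("abdominal-scan"))
```

Calling it on abdominal-scan produces abdominal\-scan, which can then be used as the query value.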
