Question

I have some documents with titles:

document 1: the C# language
document 2: the C++ language
document 3: the C language

Default mapping:

{
  "mappings": {
    "langs": {
      "properties": {
        "title": {
          "type": "string"
        }
      }
    }
  }
}

The following query_string query returns all three documents, but I don't want document 3:

{
  "query": {
    "query_string": {
      "query": "C# OR C++",
      "fields": [
        "title"
      ]
    }
  }
}

Answer 1

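All three documents are returned because the title field uses the standard analyzer. With the standard analyzer, C#, C++ and C are all analyzed and indexed as the single token c. So when you search for "C# OR C++", you actually end up searching for "c OR c".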
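The effect can be reproduced outside Elasticsearch with a rough sketch. The Python function below is only an approximation of the standard analyzer (it keeps runs of ASCII letters and digits and lowercases them, whereas the real analyzer follows Unicode text segmentation rules), and the name standard_like is made up for illustration.

```python
import re

def standard_like(text):
    """Rough stand-in for the standard analyzer (an assumption, not the real
    implementation): keep runs of letters/digits and lowercase them."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

titles = ["the C# language", "the C++ language", "the C language"]
for title in titles:
    # The '#' and '+' characters are dropped, so every title yields the token 'c'.
    print(title, "->", standard_like(title))
```

Under this approximation all three titles produce the same token list, which is why the three documents cannot be told apart at search time.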

You need to create a custom analyzer that uses the whitespace tokenizer and apply it to the title field (more precisely, to a title.tokens sub-field).

curl -XPUT localhost:9200/test -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "langs": {
      "properties": {
        "title": {
          "type": "string",
          "fields": {
            "tokens": {
              "type": "string",
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }
  }
}'

Then you can index the documents again:

curl -XPUT localhost:9200/test/langs/1 -d '{"title":"The C++ language"}'
curl -XPUT localhost:9200/test/langs/2 -d '{"title":"The C# language"}'
curl -XPUT localhost:9200/test/langs/3 -d '{"title":"The C language"}'

Finally, you can search on the title.tokens field with the custom analyzer, and you will get only the first two documents:

curl -XPOST localhost:9200/test/_search -d '{
  "query": {
    "query_string": {
      "query": "C# OR C++",
      "analyzer": "my_analyzer",
      "fields": [ "title.tokens" ]
    }
  }
}'
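
To see why this search keeps only two documents, here is a minimal Python sketch of what a whitespace tokenizer followed by a lowercase filter produces. It simulates the idea only and does not talk to Elasticsearch; the helper name my_analyzer_like and the simplified OR matching are assumptions made for illustration.

```python
def my_analyzer_like(text):
    """Simulate whitespace tokenizer + lowercase filter (illustrative only)."""
    return [token.lower() for token in text.split()]

# Same ids and titles as the indexing commands above.
docs = {1: "The C++ language", 2: "The C# language", 3: "The C language"}

# Split the query on the OR operator, then analyze each side.
query_terms = {
    token
    for part in "C# OR C++".split(" OR ")
    for token in my_analyzer_like(part)
}

# A document matches if it shares at least one token with the query.
hits = [doc_id for doc_id, title in docs.items()
        if query_terms & set(my_analyzer_like(title))]
print(hits)  # [1, 2]
```

Because '#' and '+' survive tokenization, the tokens c#, c++ and c stay distinct, and the title "The C language" no longer matches.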

Elasticsearch: using a query_string query to find words with special characters
