Question

My mapping definition contains the following field:

...
"my_field": {
  "type": "string",
  "index":"not_analyzed"
}
...

When I index a document with my_field = 'test-some-another', the value is split into 3 terms: test, some, another.

What am I doing wrong?

I created the index as follows:

curl -XPUT localhost:9200/my_index -d '{
   "index": {
    "settings": {
      "number_of_shards": 5,
      "number_of_replicas": 2
    },
    "mappings": {
      "my_type": {
        "_all": {
          "enabled": false
        },
        "_source": {
          "compressed": true
        },
        "properties": {
          "my_field": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}'

Then I index the following document:

curl -XPOST localhost:9200/my_index/my_type -d '{
  "my_field": "test-some-another"
}'

Then I use the plugin https://github.com/jprante/elasticsearch-index-termlist with the following API call:

curl -XGET localhost:9200/my_index/_termlist

This gives me the following response:



{"ok":true,"_shards":{"total":5,"successful":5,"failed":0},"terms": ["test","some","another"]}

Answer 1

Confirm that the mapping has actually been set by running:

curl localhost:9200/my_index/_mapping?pretty=true

The command that creates the index appears to be incorrect. It should not contain "index" : { as the root element. Try this:

curl -XPUT localhost:9200/my_index -d '{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 2
  },
  "mappings": {
    "my_type": {
      "_all": {
        "enabled": false
      },
      "_source": {
        "compressed": true
      },
      "properties": {
        "my_field": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}'
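To make the difference between the two request bodies concrete, here is a minimal Python sketch that unwraps a stray "index" root before the body is sent. The helper name normalize_create_index_body is hypothetical and is not part of any Elasticsearch client; it only rearranges a plain dictionary.

```python
import json


def normalize_create_index_body(body):
    """Return a body with "settings"/"mappings" at the top level.

    Hypothetical helper: if everything is wrapped in a single "index" root
    (as in the question), unwrap it; otherwise return the body unchanged.
    """
    if set(body) == {"index"} and isinstance(body["index"], dict):
        inner = body["index"]
        if "mappings" in inner or "settings" in inner:
            return inner
    return body


# The body from the question, wrapped in an extra "index" root element.
wrapped = {
    "index": {
        "settings": {"number_of_shards": 5, "number_of_replicas": 2},
        "mappings": {
            "my_type": {
                "properties": {
                    "my_field": {"type": "string", "index": "not_analyzed"}
                }
            }
        },
    }
}

fixed = normalize_create_index_body(wrapped)
print(json.dumps(fixed, indent=2))
```

The printed JSON has "settings" and "mappings" as top-level keys, which is the shape used in the corrected curl command above.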

Answer 2

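In ElasticSearch, a field is indexed when it goes into the inverted index, the data structure that Lucene uses to provide fast full-text search. If you want to search on a field, you have to index it. When you index a field, you can decide whether to index it as it is or to analyze it. Analyzing means choosing a tokenizer to apply to it, which generates a list of tokens (words), and a list of token filters that can modify the generated tokens (or even add or remove some). The way you index a field affects how you can search on it. If you index a field but don't analyze it, and its text is made up of multiple words, you will only be able to find that document by searching for that exact text, whitespace included.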
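As a rough illustration of the paragraph above, here is a toy Python model of an inverted index. It is only a sketch: the analyze function is an assumed stand-in that splits on non-alphanumeric characters and lowercases, which is not the exact behavior of any real Elasticsearch analyzer, and all function names are made up for this example.

```python
import re
from collections import defaultdict


def analyze(text):
    # Assumed stand-in for an analyzer: split on anything that is not a
    # letter or digit, drop empty pieces, and lowercase each token.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def index_terms(value, analyzed):
    # An analyzed field contributes one term per token;
    # a not_analyzed field contributes the whole value as a single term.
    return analyze(value) if analyzed else [value]


def build_inverted_index(docs, analyzed):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in index_terms(value, analyzed):
            index[term].add(doc_id)
    return index


docs = {1: "test-some-another"}
analyzed_index = build_inverted_index(docs, analyzed=True)
exact_index = build_inverted_index(docs, analyzed=False)

print(sorted(analyzed_index))  # ['another', 'some', 'test']
print(sorted(exact_index))     # ['test-some-another']
```

In this model the analyzed field produces the three terms seen in the question's _termlist output, while the not_analyzed field keeps the value as one term, so only an exact match on 'test-some-another' would find the document.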

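You can have fields that you only want to search on and never show: indexed and not stored (the default in Lucene). You can have fields that you want to search on and also retrieve: indexed and stored. And you can have fields that you don't want to search on, but do need to retrieve in order to show them: stored but not indexed.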
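The three combinations can be pictured with a small Python toy. This is not an Elasticsearch or Lucene API; the field names (body, title, thumbnail_url) and the add_document helper are assumptions made up for the illustration.

```python
def add_document(doc, field_options):
    """Split one document into what can be searched and what can be retrieved.

    Hypothetical model: "indexed" fields go to the searchable side,
    "stored" fields go to the retrievable side.
    """
    searchable = {}
    retrievable = {}
    for name, value in doc.items():
        opts = field_options[name]
        if opts["indexed"]:
            searchable[name] = value
        if opts["stored"]:
            retrievable[name] = value
    return searchable, retrievable


# Assumed example fields, one per combination described above.
field_options = {
    "body": {"indexed": True, "stored": False},           # search only
    "title": {"indexed": True, "stored": True},           # search and retrieve
    "thumbnail_url": {"indexed": False, "stored": True},  # retrieve only
}

doc = {
    "body": "long text that is searched but never shown",
    "title": "A searchable and displayable title",
    "thumbnail_url": "http://example.com/thumb.png",
}

searchable, retrievable = add_document(doc, field_options)
print(sorted(searchable))   # ['body', 'title']
print(sorted(retrievable))  # ['thumbnail_url', 'title']
```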

Why is an Elasticsearch "not_analyzed" field split into terms?
