How to index only specific document fields in Elasticsearch

Date: 2015-05-23 07:35:20

Tags: elasticsearch

My requirement is to store only specific document fields in Elasticsearch for indexing. Example: my document is

{
  "name":"stev",
  "age":26,
  "salary":25000
}

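This is my document, but I don't want to index the whole document. I want to store only the name field. I created an index emp and wrote the mapping as follows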

"person" : {
    "_all" : {"enabled" : false},
    "properties" : {
        "name" : {
            "type" : "string", "store" : "yes"
        }
    }
}

When I view the indexed documents, I get

{

    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 1,
        "hits": [
            {
                "_index": "test",
                "_type": "test",
                "_id": "AU1_p0xAq8r9iH00jFB_",
                "_score": 1,
                "_source": { }
            },
            {
                "_index": "test",
                "_type": "test",
                "_id": "AU1_lMDCq8r9iH00jFB-",
                "_score": 1,
                "_source": { }
            }
        ]
    }
}

The name field is not returned. Why? Can anyone help me?

1 answer:

Answer 0 (score: 1)

It's hard to say what is wrong with what you posted, but I can give you a working example.

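By default, Elasticsearch indexes every field of any source document you give it. Whenever it sees a new document field, it creates a mapping for that field with reasonable defaults, and it indexes the field by default as well. To exclude fields, set "index": "no" and "store": "no" in the mapping for each field you want to exclude. If you want that behavior to be the default for every field, you can use the "_default_" property to specify that fields should not be stored (though I wasn't able to get it to stop indexing them).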

You will probably also want to disable "_source" and use the "fields" parameter in your search queries.

Here is an example. The index definition looks like this:

PUT /test_index
{
   "mappings": {
      "person": {
         "_all": {
            "enabled": false
         },
         "_source": {
            "enabled": false
         },
         "properties": {
            "name": {
               "type": "string",
               "index": "analyzed", 
               "store": "yes"
            },
            "age": {
                "type": "integer",
                "index": "no",
                "store": "no"
            },
            "salary": {
                "type": "integer",
                "index": "no",
                "store": "no"
            }
         }
      }
   }
}
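
When many fields need to be excluded, writing the per-field settings by hand gets repetitive. As a rough sketch, assuming the same mapping syntax as the definition above, a small Python helper could generate the request body. The name build_mapping and its parameters are hypothetical and not part of any Elasticsearch client; the resulting dictionary would still have to be sent to the cluster with whatever HTTP client you use.

```python
import json


def build_mapping(doc_type, keep_strings, exclude):
    """Build an index-creation body like the one shown above.

    keep_strings: names of string fields to index and store.
    exclude:      {field_name: type} for fields that are neither
                  indexed nor stored.
    (Hypothetical helper, for illustration only.)
    """
    properties = {}
    for name in keep_strings:
        properties[name] = {"type": "string", "index": "analyzed", "store": "yes"}
    for name, field_type in exclude.items():
        properties[name] = {"type": field_type, "index": "no", "store": "no"}
    return {
        "mappings": {
            doc_type: {
                "_all": {"enabled": False},
                "_source": {"enabled": False},
                "properties": properties,
            }
        }
    }


body = build_mapping("person", ["name"], {"age": "integer", "salary": "integer"})
print(json.dumps(body, indent=3))
```

This only avoids the typing. Every excluded field still has to be listed by name, so it does not replace a working "_default_" mapping.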

Then I can add some documents using the bulk API:

POST /test_index/person/_bulk
{"index":{"_id":1}}
{"name":"stev","age":26,"salary":25000}
{"index":{"_id":2}}
{"name":"bob","age":30,"salary":28000}
{"index":{"_id":3}}
{"name":"joe","age":27,"salary":35000}

Since I disabled "_source", a simple query returns only the IDs:

POST /test_index/_search
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "person",
            "_id": "1",
            "_score": 1
         },
         {
            "_index": "test_index",
            "_type": "person",
            "_id": "2",
            "_score": 1
         },
         {
            "_index": "test_index",
            "_type": "person",
            "_id": "3",
            "_score": 1
         }
      ]
   }
}

But if I specify that I want the "name" field, I get it:

POST /test_index/_search
{
   "fields": [
      "name"
   ]
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [
         {
            "_index": "test_index",
            "_type": "person",
            "_id": "1",
            "_score": 1,
            "fields": {
               "name": [
                  "stev"
               ]
            }
         },
         {
            "_index": "test_index",
            "_type": "person",
            "_id": "2",
            "_score": 1,
            "fields": {
               "name": [
                  "bob"
               ]
            }
         },
         {
            "_index": "test_index",
            "_type": "person",
            "_id": "3",
            "_score": 1,
            "fields": {
               "name": [
                  "joe"
               ]
            }
         }
      ]
   }
}
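
Note that each stored field comes back as a list under "fields" rather than as a plain value. A minimal sketch of how the values could be pulled out of such a response on the client side, assuming it has already been parsed into a Python dictionary (the helper name stored_field_values is made up for this example):

```python
def stored_field_values(response, field):
    """Return {doc_id: [values]} for every hit that carries the stored field.

    Hits without the field (for example because it is not stored)
    are skipped. Hypothetical helper, for illustration only.
    """
    result = {}
    for hit in response["hits"]["hits"]:
        values = hit.get("fields", {}).get(field)
        if values is not None:
            result[hit["_id"]] = values
    return result


# Trimmed copy of the response shown above.
response = {
    "hits": {
        "total": 3,
        "max_score": 1,
        "hits": [
            {"_index": "test_index", "_type": "person", "_id": "1",
             "_score": 1, "fields": {"name": ["stev"]}},
            {"_index": "test_index", "_type": "person", "_id": "2",
             "_score": 1, "fields": {"name": ["bob"]}},
            {"_index": "test_index", "_type": "person", "_id": "3",
             "_score": 1, "fields": {"name": ["joe"]}},
        ],
    }
}

names = stored_field_values(response, "name")
ages = stored_field_values(response, "age")
print(names)
print(ages)
```

With the response above, names holds one list per document ID, and ages is empty because no hit carries an "age" entry.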

You can prove that the other fields are not stored by running:

POST /test_index/_search
{
   "fields": [
      "name", "age", "salary"
   ]
}

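This returns the same result as before. I can also prove that the "age" field is not indexed by running the following query, which would return a document if "age" were indexed: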

POST /test_index/_search
{
   "fields": [
      "name", "age"
   ],
   "query": {
      "term": {
         "age": {
            "value": 27
         }
      }
   }
}
...
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 0,
      "max_score": null,
      "hits": []
   }
}

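Here is the code I used to play around with this. I wanted to use the "_default_" mapping and/or fields to handle this without having to specify the settings for every field. I was able to get it working as far as not storing the data, but every field was still indexed.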

http://sense.qbox.io/gist/d84967923d6c0757dba5f44240f47257ba2fbe50