Trying to extract a leaf field from Elasticsearch

Date: 2015-03-05 16:32:32

Tags: elasticsearch lucene

I have an object in Elasticsearch that looks like this:

{
"text": "something something something",
"entities": { "hashtags":["test","test123"]}
}

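The problem is that not every document has the entities property set. So I want to write a query that:

  • must contain a keyword in the text field
  • must have the entities field
  • extracts the entities.hashtags field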

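I tried to extract the leaf field with the query below, but I still get documents that have no entities field.

For the second part of the question: how can I extract only the entities.hashtags field? I tried something like "fields": ["entities.hashtags"], but it did not work.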

{
    "size": 2000,
    "query": {
        "filtered": {
            "query": {
                "match_all": {

                }
            },
            "filter": {
                "bool": {
                    "must": [{
                        "term": {
                            "text": "something"
                        }
                    },
                    {
                        "missing": {
                            "field": "entities",
                            "existence": true
                        }
                    }]
                }
            }
        }
    }
}

1 answer:

Answer 0 (score: 1)

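If I understand you correctly, this seems to be what you want. The "missing" filter in your query matches documents that lack the field, which is the opposite of what you need. A "term" filter on the "text" field and an "exists" filter on the "entities" field restrict the documents, and a "terms" aggregation on "entities.hashtags" extracts the values. Here is the complete example I used: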

DELETE /test_index

PUT /test_index
{
   "settings": {
      "number_of_shards": 1
   }
}

PUT /test_index/doc/1
{
   "text": "something something something",
   "entities": { "hashtags": ["test","test123"] }
}

PUT /test_index/doc/2
{
   "text": "another doc",
   "entities": { "hashtags": ["testagain","testagain123"] }
}

PUT /test_index/doc/3
{
   "text": "doc with no entities"
}

POST /test_index/_search
{
   "size": 0,
   "query": {
      "filtered": {
         "query": {
            "match_all": {}
         },
         "filter": {
            "bool": {
               "must": [
                  { "term": {  "text": "something" } },
                  { "exists": { "field": "entities" } }
               ]
            }
         }
      }
   },
   "aggs": {
      "hashtags": {
         "terms": {
            "field": "entities.hashtags"
         }
      }
   }
}
...
{
   "took": 35,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0,
      "hits": []
   },
   "aggregations": {
      "hashtags": {
         "buckets": [
            {
               "key": "test",
               "doc_count": 1
            },
            {
               "key": "test123",
               "doc_count": 1
            }
         ]
      }
   }
}
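
For anyone driving this from application code, the request and response above can be sketched in Python as plain dictionaries. This is only an illustration under a few assumptions: the helper names `build_hashtag_query` and `hashtag_counts` are hypothetical, no Elasticsearch client is used (the body would be sent with whatever HTTP or client library you already have), and the `filtered` query matches the syntax used in this thread; newer Elasticsearch versions express the same thing as a `bool` query with a `filter` clause.

```python
def build_hashtag_query(keyword, size=0):
    """Build the search body from the answer above (hypothetical helper).

    Keeps documents whose "text" field contains `keyword` and that have an
    "entities" field, then aggregates the values of "entities.hashtags".
    """
    return {
        "size": size,  # 0 = return only the aggregation, no hits
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"text": keyword}},
                            {"exists": {"field": "entities"}},
                        ]
                    }
                },
            }
        },
        "aggs": {
            "hashtags": {"terms": {"field": "entities.hashtags"}}
        },
    }


def hashtag_counts(response):
    """Map each hashtag to its doc_count from a parsed search response."""
    buckets = (
        response.get("aggregations", {})
        .get("hashtags", {})
        .get("buckets", [])
    )
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```

Passing the parsed JSON response shown above to `hashtag_counts` gives a dictionary with the keys "test" and "test123", each with a count of 1. Note that a "terms" aggregation returns only the top buckets by default, so an index with many distinct hashtags may need a larger "size" inside the aggregation.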