How to count the number of documents per token in Elasticsearch

Time: 2018-11-20 10:21:05

Tags: python-3.x elasticsearch elasticsearch-py

I have the following request to Elasticsearch:

{  
   "query":{  
      "bool":{  
         "must":[  
            {  
               "query_string":{  
                  "query":"something1 OR something2 OR something3",
                  "default_operator":"OR"
               }
            }
         ],
         "filter":{  
            "range":{  
               "time":{  
                  "gte":date
               }
            }
         }
      }
   }
}

I want Elasticsearch to count, in a single request, how many documents match each token across all documents, for example:

something1: 26 documents
something2: 12 documents
something3: 1 document

3 answers:

Answer 0 (score: 1)

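Assuming the tokens are not enum-like (that is, a constrained set of specific values such as state names, which with the right mapping would make the terms aggregation your best option), I think the closest thing to what you want is the filters aggregation: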

POST your-index/_search
{
  "query":{  
    "bool":{  
      "must":[  
      {  
        "query_string":{  
          "query":"something1 OR something2 OR something3",
          "default_operator":"OR"
         }
      }
      ],
      "filter":{  
        "range":{  
          "time":{  
            "gte":date
          }
        }
      }
    }
  },
  "aggs": {
    "token_doc_counts": {
      "filters" : {
        "filters" : {
          "something1" : { 
            "bool": { 
              "must": { "query_string" : { "query" : "something1" } }, 
              "filter": { "range": { "time": { "gte": date } } } 
            }
          },
          "something2" : { 
            "bool": { 
              "must": { "query_string" : { "query" : "something2" } }, 
              "filter": { "range": { "time": { "gte": date } } } 
            }
          },
          "something3" : { 
            "bool": { 
              "must": { "query_string" : { "query" : "something3" } }, 
              "filter": { "range": { "time": { "gte": date } } } 
            }
          }
        }
      }
    } 
  }
}

The response looks something like this:

{
  "took": 9,
  "timed_out": false,
  "_shards": ...,
  "hits": ...,
  "aggregations": {
    "token_doc_counts": {
      "buckets": {
        "something1": {
          "doc_count": 1
        },
        "something2": {
          "doc_count": 2
        },
        "something3": {
          "doc_count": 3
        } 
      } 
    } 
  }
}
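Since the question is tagged elasticsearch-py, here is a minimal Python sketch that builds the same kind of request body for an arbitrary list of tokens and flattens the keyed buckets of the response into a dict. The function names `build_token_count_body` and `extract_counts` and the `since` parameter (standing in for the `date` placeholder) are hypothetical. Two deliberate differences from the request above, both assumptions of this sketch: `"size": 0` is added to skip returning hits, and the per-bucket range filter is dropped, on the understanding that aggregations only see documents already matched by the top-level query.

```python
def build_token_count_body(tokens, since):
    """Build a search body with one filters-aggregation bucket per token.

    `since` replaces the `date` placeholder used in the request above.
    """
    return {
        "size": 0,  # assumption: only the counts are needed, not the hits
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            "query": " OR ".join(tokens),
                            "default_operator": "OR",
                        }
                    }
                ],
                "filter": {"range": {"time": {"gte": since}}},
            }
        },
        "aggs": {
            "token_doc_counts": {
                "filters": {
                    # One named bucket per token; the bucket name is the token itself.
                    "filters": {t: {"query_string": {"query": t}} for t in tokens}
                }
            }
        },
    }


def extract_counts(response):
    """Turn the keyed buckets of `token_doc_counts` into {token: doc_count}."""
    buckets = response["aggregations"]["token_doc_counts"]["buckets"]
    return {token: bucket["doc_count"] for token, bucket in buckets.items()}
```

With an elasticsearch-py client this would be used roughly as `extract_counts(es.search(index="your-index", body=build_token_count_body(tokens, since)))`; the client object `es` and the index name are placeholders.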

Answer 1 (score: 0)

You can split your query into a filters aggregation with three filters, one per token. For reference, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html

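Answer 2 (score: 0)

What you need to do is create a copy_to field, with a mapping like the one shown below.

Depending on which fields your query_string queries, you need to add copy_to to some or all of those fields.

By default, query_string searches all fields, so you may need to specify copy_to on every field, as in the mapping below. For simplicity I created only three fields: title, field_2, and a third field, content, which serves as the copy_to target.

Mapping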

PUT <your_index_name>
{
  "mappings": {
    "mydocs": {
      "properties": {
        "title": {
          "type": "text",
          "copy_to": "content" 
        },
        "field_2": {
          "type": "text",
          "copy_to": "content" 
        },
        "content": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}
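If many fields need `copy_to`, the mapping above could be generated rather than written by hand. The following is a minimal sketch; the helper name `build_copy_to_mapping` and its parameters are hypothetical. It mirrors the mapping above, including the `mydocs` type level, which newer Elasticsearch versions without mapping types would likely need removed.

```python
def build_copy_to_mapping(fields, target="content", doc_type="mydocs"):
    """Build an index-creation body where every source field copies into `target`."""
    # Each source field is a text field that also feeds the combined target field.
    properties = {f: {"type": "text", "copy_to": target} for f in fields}
    # The target needs fielddata enabled so a terms aggregation can run on it.
    properties[target] = {"type": "text", "fielddata": True}
    return {"mappings": {doc_type: {"properties": properties}}}
```

Calling `build_copy_to_mapping(["title", "field_2"])` yields the same structure as the mapping above.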

Sample documents

POST <your_index_name>/mydocs/1
{
  "title": "something1",
  "field_2": "something2"
}

POST <your_index_name>/mydocs/2
{
  "title": "something2",
  "field_2": "something3"
}

Query:

The aggregation query below returns the required document count for each token. It uses the terms aggregation:

POST <your_index_name>/_search
{
  "size": 0,
  "query": {
    "query_string": {
      "query": "something1 OR something2 OR something3"
    }
  },
  "aggs": {
    "myaggs": {
      "terms": {
        "field": "content",
        "include" : ["something1","something2","something3"]
      }
    }
  }
}

Query response:

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "myaggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "something2",
          "doc_count": 2
        },
        {
          "key": "something1",
          "doc_count": 1
        },
        {
          "key": "something3",
          "doc_count": 1
        }
      ]
    }
  }
}
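On the Python side, the bucket list in a response like the one above can be reduced to a simple mapping. This is a minimal sketch with a hypothetical helper name, `counts_from_terms_agg`. It assumes the terms aggregation's default behaviour of omitting terms that matched no documents, so tokens missing from the buckets are reported as 0.

```python
def counts_from_terms_agg(response, tokens, agg_name="myaggs"):
    """Return {token: doc_count} for each requested token, defaulting to 0."""
    buckets = response["aggregations"][agg_name]["buckets"]
    found = {bucket["key"]: bucket["doc_count"] for bucket in buckets}
    # Tokens with no bucket did not appear in any matching document.
    return {token: found.get(token, 0) for token in tokens}
```

For the response above and the three tokens from the question, this gives 1 for something1, 2 for something2, and 1 for something3.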

Let me know if this helps!