Aggregating by term frequency in Elasticsearch

Time: 2017-09-23 12:33:31

Tags: elasticsearch search lucene

Here is my ES setup and query:

===Create index===

PUT /sample

===Insert data===

PUT /sample/docs/1
{"data": "And the world said, 'Disarm, disclose, or face serious consequences'—and therefore, we worked with the world, we worked to make sure that Saddam Hussein heard the message of the world."}
PUT /sample/docs/2
{"data": "Never give in — never, never, never, never, in nothing great or small, large or petty, never give in except to convictions of honour and good sense. Never yield to force; never yield to the apparently overwhelming might of the enemy"}

===Query to get results===

POST sample/docs/_search
{
  "query": {
    "match": {
      "data": "never"
    }
  },
  "highlight": {
    "fields": {
      "data":{}
    }
  }
}

===Search result===

...
        "highlight": {
          "data": [
            "<em>Never</em> give in — <em>never</em>, <em>never</em>, <em>never</em>, <em>never</em>, in nothing great or small, large or petty, <em>never</em> give",
            " in except to convictions of honour and good sense. <em>Never</em> yield to force; <em>never</em> yield to the apparently overwhelming might of the enemy"
          ]
        }

===Desired result===

For each matching document, I need the frequency of the searched term, as in the example below:

Doc Id: 2
Term Frequency :{
    "never": 8
}
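
For reference, the expected count can be reproduced outside Elasticsearch. The following is only a sketch in Python: `count_terms` is a hypothetical helper, and its tokenizer (lowercase, then split on anything that is not a letter or digit) merely approximates what the standard analyzer does to this particular text.

```python
import re
from collections import Counter


def count_terms(text):
    """Count tokens after lowercasing and splitting on non-alphanumerics.

    Assumption: this simplified tokenizer only approximates the
    Elasticsearch standard analyzer; it is not the same algorithm.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(tokens)


# Text of document 2, copied from the insert request above.
doc_2 = (
    "Never give in — never, never, never, never, in nothing great or small, "
    "large or petty, never give in except to convictions of honour and good "
    "sense. Never yield to force; never yield to the apparently overwhelming "
    "might of the enemy"
)

print(count_terms(doc_2)["never"])  # prints 8
```

This matches the eight highlighted occurrences in the search result above.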

I have tried Bucket Aggregation, Terms Aggregation, and other aggregations, but none of them gave me this result.

Thanks in advance for your help!

1 answer:

Answer 0: (score: 0)

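You should use the term vectors API, which reports per-document term frequencies and can filter the returned terms by frequency.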

https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html

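In this case, request the term vectors of the stored document (ID 2) for the data field. An artificial document containing only "never" would have a term frequency of 1 and be dropped by the filter, so the request targets the indexed document instead:

GET /sample/docs/2/_termvectors
{
    "fields": ["data"],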
    "term_statistics" : true,
    "field_statistics" : true,
    "positions": false,
    "offsets": false,
    "filter" : {
      "min_term_freq" : 8
    }
}
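
Once a term vectors response comes back, the per-term counts sit under `term_vectors.<field>.terms.<term>.term_freq`. As a rough sketch in Python, the desired "Doc Id / Term Frequency" output could be assembled like this. `extract_term_freqs` is a hypothetical helper, and `sample_response` is hand-written to mirror the response layout with most fields omitted; it is not real server output.

```python
def extract_term_freqs(response, field):
    """Return {term: term_freq} for one field of a term vectors response.

    Missing keys yield an empty dict, e.g. when the document was not
    found or the filter removed every term.
    """
    terms = response.get("term_vectors", {}).get(field, {}).get("terms", {})
    return {term: info["term_freq"] for term, info in terms.items()}


# Hand-written sample shaped like a term vectors response (assumption:
# trimmed to the keys used here; statistics fields are left out).
sample_response = {
    "_id": "2",
    "found": True,
    "term_vectors": {
        "data": {
            "terms": {
                "never": {"term_freq": 8},
            }
        }
    },
}

print("Doc Id:", sample_response["_id"])
print("Term Frequency:", extract_term_freqs(sample_response, "data"))
```

In a real client the dictionary would come from the parsed JSON body of the `_termvectors` request rather than from a literal.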