Strange highlighting with exact-phrase queries joined by OR in an Elasticsearch query_string query

Time: 2017-02-03 15:22:37

Tags: elasticsearch

Hi, I have a highlighting problem in Elasticsearch v2.3 and I can't work out any logic that would cause it. Here are two examples.

This is my query:

GET reports_all/all/_search
{
    "query": {
      "query_string": {
        "fields": [
          "text"
        ],
        "query": "(\"base of the pyramid impact assessment\" OR \"corporate human rights benchmark\")"
//      "query": "(\"corporate human rights benchmark\" OR \"base of the pyramid impact assessment\")"
      }
    },
    "highlight": {
      "pre_tags": [
        "<mark>"
      ],
      "post_tags": [
        "</mark>"
      ],
      "fields": {
        "text": {
          "number_of_fragments": 10
        }
      }
    },
    "size": 10,
    "from": 0
}

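Note the two exact-match phrases in the query, separated by OR. The only difference between the two runs is that I swap the first and second phrase (the commented-out line is the swapped version). This is the result of the first query, which highlights the wrong text: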

"highlight": {
  "text": [
    " organisations to launch <mark>the</mark> \n<mark>Corporate</mark> <mark>Human</mark> <mark>Rights</mark> <mark>Benchmark</mark> (CHRB), <mark>the</mark> \nworld’s first wide-scale project to",
    " taking \naction to reduce <mark>the</mark> environmental \n<mark>impact</mark> <mark>of</mark> our business and finding \nnew ways to help",
    " focuses <mark>of</mark> this is reducing <mark>the</mark> <mark>impact</mark> <mark>of</mark> \nclimate change. Aviva Investors signed <mark>the</mark> Montreal Carbon",
    " \nprogrammes in 2015\n</p>\n<p>Our 2015 reporting\nThis is <mark>the</mark> summary <mark>of</mark> our sustainable\nbusiness and corporate",
    " aim to uphold <mark>the</mark> highest ethical \nstandards in <mark>the</mark> way that we do business. \nIn 2015, 98% <mark>of</mark> Aviva",
    " costs to \nour customers\n</p>\n<p>    Reducing our\nenvironmental <mark>impact</mark>\nIn 2015 Aviva became <mark>the</mark>",
    " first insurer \nto achieve <mark>the</mark> Carbon Trust Supply Chain \nStandard, in recognition <mark>of</mark> work to measure",
    " Stonewall’s  \nTop 100 Employers list\n</p>\n<p>A principal partner  \n<mark>of</mark> <mark>the</mark> Living Wage \nFoundation",
    " take control <mark>of</mark> their finances, as\nwell as benefiting society and <mark>the</mark> environment\n</p>\n<p>• <mark>The</mark> way",
    " we help our local communities, giving\nthousands <mark>of</mark> organisations <mark>the</mark> support they need\nto make a"
  ]
}

},

But the result of the second query is fine:

   "highlight": {
      "text": [
        " organisations to launch the \n<mark>Corporate</mark> <mark>Human</mark> <mark>Rights</mark> <mark>Benchmark</mark> (CHRB), the \nworld’s first wide-scale project to"
      ]
    }

Any idea what might be going wrong?

1 answer:

Answer 0: (score: 0)

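I'm not too sure what is happening, but it looks like your query is being broken down into individual words by the analyzer, and ES is adding an implicit AND to the query.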

That is why each word gets its own <mark> highlight.
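To make that guess concrete, here is a toy Python sketch. It is not Elasticsearch's highlighter and makes no claim about its internals; the two helper functions are hypothetical and only contrast highlighting each term of a phrase independently with highlighting the phrase as a whole. The sample text is adapted from one of the fragments in the question.

```python
import re

PRE, POST = "<mark>", "</mark>"


def highlight_terms(text, phrase):
    """Toy model: wrap every word of `text` that occurs anywhere in `phrase`."""
    terms = {t.lower() for t in re.findall(r"\w+", phrase)}

    def wrap(match):
        word = match.group(0)
        return f"{PRE}{word}{POST}" if word.lower() in terms else word

    return re.sub(r"\w+", wrap, text)


def highlight_phrase(text, phrase):
    """Toy model: wrap only whole, in-order occurrences of `phrase`."""
    pattern = r"\s+".join(re.escape(t) for t in phrase.split())
    return re.sub(pattern, lambda m: f"{PRE}{m.group(0)}{POST}", text,
                  flags=re.IGNORECASE)


sample = "reducing the impact of climate change"
phrase = "base of the pyramid impact assessment"

# Per-term highlighting marks "the", "impact" and "of" even though
# the phrase itself never occurs in the sample.
print(highlight_terms(sample, phrase))
# Whole-phrase highlighting leaves the sample untouched.
print(highlight_phrase(sample, phrase))
```

The first call reproduces the kind of scattered marks seen in the bad result, while the second leaves the text unmarked, which is the behaviour the question expects.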

If you want ES to treat "base of the pyramid impact assessment" as a single entity, you can use a match_phrase query.

Your query would look something like this:

"query": {
  "bool": {
      "should": [
        {
          "match_phrase": {
            "text": "base of the pyramid impact assessment"
          }
        },
        {
          "match_phrase": {
            "text": "corporate human rights benchmark"
          }
        }
      ],
      "minimum_number_should_match": 1
  }
}
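As a rough sketch of how this could be combined with the highlight section from the question, the Python below builds the full request body as a plain dict. The function name `build_phrase_search` and its parameters are made up for illustration, and nothing here has been run against a cluster. It uses `minimum_should_match`, the documented name for this bool option, where the snippet above uses `minimum_number_should_match`.

```python
import json


def build_phrase_search(phrases, field="text", fragments=10):
    """Build a search body with one match_phrase clause per phrase.

    Hypothetical helper: the highlight settings are copied from the
    question, the bool/should structure from the answer.
    """
    should = [{"match_phrase": {field: phrase}} for phrase in phrases]
    return {
        "query": {
            "bool": {
                "should": should,
                # At least one of the phrases must match.
                "minimum_should_match": 1,
            }
        },
        "highlight": {
            "pre_tags": ["<mark>"],
            "post_tags": ["</mark>"],
            "fields": {field: {"number_of_fragments": fragments}},
        },
        "size": 10,
        "from": 0,
    }


body = build_phrase_search([
    "base of the pyramid impact assessment",
    "corporate human rights benchmark",
])
# Paste the printed JSON after: GET reports_all/all/_search
print(json.dumps(body, indent=2))
```

Because each phrase is its own clause, swapping the order of the two phrases only changes the order of the `should` list, which makes it easy to rerun both variants from the question and compare the highlights.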

I'm not sure whether this will work. Let me know.