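Elasticsearch: multiple pre_tags / post_tags with the fast vector highlighter

Date: 2016-06-09 09:37:57

Tags: elasticsearch

The documentation contains the following ambiguous comment about the pre_tags / post_tags settings, which can hold more than one pre/post tag pair: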

  

Using the fast vector highlighter there can be more tags, and the "importance" is ordered.

Does anyone know exactly what this statement means?

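1 answer:

Answer 0: (score: 2)

It took a while, but by trying different queries against ES 1.7 with the _head plugin, I was able to work out how multiple pre and post tags affect highlighting.

With the fast vector highlighter you can specify tags in order of "importance", which seems to mean that the order of the tags and the order of the search terms should match. For multiple pre or post tags to have any effect, the query must use multiple fields.

Given the index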

{
 myindex: {
  mappings: {
   corpdocument: {
    properties: {
     createddate: {
      type: "date",
      format: "dateOptionalTime"
     },
     docbody: {
      type: "string",
      analyzer: "text_analyzer",
      fields: {
       exact: {
        type: "string",
        analyzer: "text_analyzer_exact"
       }
      }
     },
     modifieddate: {
      type: "date",
      format: "dateOptionalTime"
     },
     title: {
      type: "string"
     }
    }
   }
  }
 }
}

and the search

POST localhost:9200/myindex/corpdocument/_search
{
 "highlight": {
  "pre_tags": ["|primary-highlight|",
  "|secondary-highlight|"],
  "post_tags": ["|/primary-highlight|",
  "|/secondary-highlight|"],
  "fields": {
   "docbody.exact": {
    "fragment_size": 150,
    "number_of_fragments": 3
   }
  }
 },
 "_source": {
  "exclude": ["docbody"]
 },
 "query": {
  "bool": {
   "should": [{
    "match": {
     "docbody.exact": {
      "query": "foo"
     }
    }
   },
   {
    "match": {
     "docbody.exact": {
      "query": "bar"
     }
    }
   }]
  }
 }
}

you can get results like this

{
 "took": 14,
 "timed_out": false,
 "_shards": {
  "total": 5,
  "successful": 5,
  "failed": 0
 },
 "hits": {
  "total": 97,
  "max_score": 0.48895144,
  "hits": [{
   "_index": "myindex",
   "_type": "corpdocument",
   "_id": "XFxxZWR0ZXN0ZG9jc1xTYW5kYm94XFNhbmRib3hBbGxcRGV4dGVyX2xpdFw3NS5kb2M=",
   "_score": 0.48895144,
   "_source": {
    "createddate": "2010-11-02T00:00:00-05:00",
    "modifieddate": "2007-09-04T00:00:00-05:00",
    "_id": "XFxxZWR0ZXN0ZG9jc1xTYW5kYm94XFNhbmRib3hBbGxcRGV4dGVyX2xpdFw3NS5kb2M="
   },
   "highlight": {
    "docbody.exact": ["Lorem ipsum dolor sit amet, consectetur adipiscing elit |primary-highlight|foo|/primary-highlight|Lorem ipsum dolor sit amet, consectetur adipiscing elit",
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit |secondary-highlight|bar|/secondary-highlight|TOTHE|primary-highlight|foo|/primary-highlight|Lorem ipsum dolor sit amet, consectetur adipiscing elit",
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit |secondary-highlight|bar|/secondary-highlight| Lorem ipsum dolor sit amet, consectetur adipiscing elit |primary-highlight|Chief|/primary-highlight| Lorem ipsum dolor sit amet, consectetur adipiscing elit"]
   }
  },
  ...
  ]
 }
}
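
To check which tag ended up around which term, the highlight fragments in a response like the one above can be scanned with a regular expression. The following is only a sketch: the helper name tags_by_term and the shortened sample fragments are made up for illustration, and the pattern assumes the |name-highlight| ... |/name-highlight| marker style used in this answer.

```python
import re

# Matches |<name>-highlight|<text>|/<name>-highlight|, the marker style used above.
# The backreference \1 requires the closing marker to carry the same name.
TAG_PATTERN = re.compile(r"\|(\w+)-highlight\|(.*?)\|/\1-highlight\|")


def tags_by_term(fragments):
    """Map each highlighted term to the set of tag names that wrapped it."""
    seen = {}
    for fragment in fragments:
        for tag_name, term in TAG_PATTERN.findall(fragment):
            seen.setdefault(term, set()).add(tag_name)
    return seen


# Shortened, hypothetical fragments in the same shape as the "highlight" section.
fragments = [
    "Lorem ipsum |primary-highlight|foo|/primary-highlight| dolor",
    "sit amet |secondary-highlight|bar|/secondary-highlight| elit",
]
result = tags_by_term(fragments)
print(result)
```

Running this over the docbody.exact fragments of each hit gives a quick view of whether a term is consistently wrapped in the tag you expect.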

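Which tag wraps which match depends on the order of the tags and the order of the search terms. Swapping the order of "foo" and "bar" while leaving everything else unchanged results in bar being wrapped in the primary tag and foo in the secondary tag.

From preliminary experiments with 3 search terms and 2 tags, the third term appears to be wrapped in the first tag rather than the second. Adding a third tag fixes that, but it means repeating the secondary tag n times to cover all the search terms.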
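
The positional pairing can be illustrated with a small toy model. This is not how Elasticsearch implements highlighting; it is a sketch that only mimics the observed behaviour, and the function names assign_tags and wrap are hypothetical.

```python
import re


def assign_tags(terms, pre_tags, post_tags):
    """Pair each query term with the tag pair at the same position."""
    if len(pre_tags) != len(post_tags) or len(terms) > len(pre_tags):
        raise ValueError("this sketch needs one pre/post tag pair per term")
    return {term: (pre_tags[i], post_tags[i]) for i, term in enumerate(terms)}


def wrap(text, assignment):
    """Wrap every occurrence of each term in its assigned tag pair, in one pass."""
    pattern = re.compile("|".join(re.escape(term) for term in assignment))

    def replace(match):
        pre, post = assignment[match.group(0)]
        return pre + match.group(0) + post

    return pattern.sub(replace, text)


pre = ["|primary-highlight|", "|secondary-highlight|"]
post = ["|/primary-highlight|", "|/secondary-highlight|"]

original_order = wrap("foo and bar", assign_tags(["foo", "bar"], pre, post))
swapped_order = wrap("foo and bar", assign_tags(["bar", "foo"], pre, post))
print(original_order)
print(swapped_order)
```

With the terms in the order foo, bar the model wraps foo in the primary tag; with the order swapped, bar gets the primary tag instead, which matches the behaviour described above.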

"highlight": {
 "pre_tags": ["|primary-highlight|",
 "|secondary-highlight|",
 "|secondary-highlight|"],
 "post_tags": ["|/primary-highlight|",
 "|/secondary-highlight|",
 "|/secondary-highlight|"],
 "fields": {
  "docbody.exact": {
   "fragment_size": 150,
   "number_of_fragments": 3
  }
 }
},
..."query": {
 "bool": {
  "should": [{
   "match": {
    "docbody.exact": {
     "query": "foo"
    }
   }
  },
  {
   "match": {
    "docbody.exact": {
     "query": "bar"
    }
   }
  },
  {
   "match": {
    "docbody.exact": {
     "query": "baz"
    }
   }
  }]
 }
}
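
Writing the repeated secondary tags by hand is error-prone, so the request body can be generated instead. The following is a minimal sketch under the assumptions of this answer (one match clause per term in a bool/should query, one primary tag followed by repeated secondary tags); the function name build_search_body and its parameters are hypothetical, and it only builds the JSON body without sending it.

```python
import json


def build_search_body(terms, field, primary, secondary):
    """Build a bool/should search body with one pre/post tag pair per term."""
    if not terms:
        raise ValueError("at least one search term is required")
    extra = len(terms) - 1  # every term after the first gets the secondary tag
    return {
        "highlight": {
            "pre_tags": [f"|{primary}|"] + [f"|{secondary}|"] * extra,
            "post_tags": [f"|/{primary}|"] + [f"|/{secondary}|"] * extra,
            "fields": {field: {"fragment_size": 150, "number_of_fragments": 3}},
        },
        "query": {
            "bool": {
                "should": [{"match": {field: {"query": term}}} for term in terms]
            }
        },
    }


body = build_search_body(
    ["foo", "bar", "baz"], "docbody.exact", "primary-highlight", "secondary-highlight"
)
print(json.dumps(body, indent=1))
```

Serializing with json.dumps also guarantees balanced brackets in the pre_tags, post_tags, and should arrays, which are easy to get wrong when the request is typed by hand.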