Elasticsearch: ordering an aggregation by geo distance and score

Time: 2018-03-18 19:46:39

Tags: elasticsearch distance aggregation

My mapping is as follows:

PUT places
{
  "mappings": {
    "test": {
      "properties": {
        "id_product": { "type": "keyword" },
        "id_product_unique": { "type": "integer" },
        "location": { "type": "geo_point" },
        "suggest": {
          "type": "text"
        },
        "active": {"type": "boolean"}
      }
    }
  }
}

POST places/test
{
   "id_product" : "A",
   "id_product_unique": 1,
   "location": {
      "lat": 1.378446,
      "lon": 103.763427
   },
   "suggest": ["coke","zero"],
   "active": true
}

POST places/test
{
   "id_product" : "A",
   "id_product_unique": 2,
   "location": {
      "lat": 1.878446,
      "lon": 108.763427
   },
   "suggest": ["coke","zero"],
   "active": true
}

POST places/test
{
   "id_product" : "B",
   "id_product_unique": 3,
   "location": {
      "lat": 1.478446,
      "lon": 104.763427
   },
   "suggest": ["coke"],
   "active": true
}

POST places/test
{
   "id_product" : "C",
   "id_product_unique": 4,
   "location": {
      "lat": 1.218446,
      "lon": 102.763427
   },
   "suggest": ["coke","light"],
   "active": true
}

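In my example there are two cans of Coke Zero ("id_product_unique" = 1 and 2), one can of Coke ("id_product_unique" = 3), and one can of Coke Light ("id_product_unique" = 4).

All of these cans are in different locations.

"id_product" is not unique: the exact same Coke can may be sold at two different locations ("id_product_unique" = 1 and 2).

Only "id_product_unique" and "location" change from one Coke can to another (two identical cans have the same "suggest" and "id_product" fields, but different "id_product_unique" and "location").

My goal is to search for products from a given GPS location and show one result per id_product (the nearest one):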

POST /places/_search?size=0
{
  "aggs" : {
    "group-by-type" : {
      "terms" : { "field" : "id_product"},
      "aggs": {
        "min-distance": {
          "top_hits": {
            "sort": {
              "_script": { 
                "type": "number",
                "script": {
                  "source": "def x = doc['location'].lat; def y = doc['location'].lon; return Math.abs(x-1.178446) + Math.abs(y-101.763427)",
                  "lang": "painless"
                },
                "order": "asc"
              }
            },
            "size" : 1
          }
        }
      }
    }
  }
}
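
To make the intent of the aggregation concrete, here is a minimal Python sketch that redoes the same arithmetic on the four sample documents, outside Elasticsearch. The names `DOCS`, `ORIGIN`, `script_distance` and `nearest_per_product` are made up for this illustration. Note that the script adds the latitude and longitude differences in degrees, so it is only a rough stand-in for real geographic distance.

```python
from collections import defaultdict

# The four sample documents: (id_product, id_product_unique, lat, lon).
DOCS = [
    ("A", 1, 1.378446, 103.763427),
    ("A", 2, 1.878446, 108.763427),
    ("B", 3, 1.478446, 104.763427),
    ("C", 4, 1.218446, 102.763427),
]
ORIGIN = (1.178446, 101.763427)  # the search position hard-coded in the script


def script_distance(lat, lon, origin=ORIGIN):
    """Same formula as the Painless sort script: |lat diff| + |lon diff|, in degrees."""
    return abs(lat - origin[0]) + abs(lon - origin[1])


def nearest_per_product(docs):
    """Mimic terms(id_product) + top_hits(size=1, sorted ascending by the script)."""
    buckets = defaultdict(list)
    for id_product, unique_id, lat, lon in docs:
        buckets[id_product].append((script_distance(lat, lon), unique_id))
    # min() on (distance, id) tuples keeps the closest document of each bucket.
    return {key: min(hits) for key, hits in buckets.items()}


if __name__ == "__main__":
    for key, (dist, unique_id) in sorted(nearest_per_product(DOCS).items()):
        print(key, unique_id, round(dist, 2))
```

For product A this keeps the can with "id_product_unique" = 1 and drops the one with "id_product_unique" = 2, which is the de-duplication the query is after.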

I now want to apply a should query to this result list and re-order it by the computed score. I tried the following:

POST /places/_search?size=0
{
  "query" : {
    "bool": {
      "filter": {"term" : { "active" : "true" }},
      "should": [
        {"match" : { "suggest" : "coke" }},
        {"match" : { "suggest" : "light" }}
      ]
    }
  },
  "aggs" : {
    "group-by-type" : {
      "terms" : { "field" : "id_product"},
      "aggs": {
        "min-distance": {
          "top_hits": {
            "sort": {
              "_script": { 
                "type": "number",
                "script": {
                  "source": "def x = doc['location'].lat; def y = doc['location'].lon; return Math.abs(x-1.178446) + Math.abs(y-101.763427)",
                  "lang": "painless"
                },
                "order": "asc"
              }
            },
            "size" : 1
          }
        }
      }
    }
  }
}

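But I can't figure out how to replace the distance sort value with the document score.

Any help would be great.

2 answers:

Answer 0 (score: 0)

I managed to do it by adding a new "max_score" aggregation: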

"max_score": {
  "max": {
    "script": {
      "lang": "painless",
      "source": "_score"
    }
  }
}

and ordering the buckets by max_score.value, descending:

"order": {"max_score.value": "desc"}

My final query is as follows:

POST /places/_search?size=0
{
  "query" : {
    "bool": {
      "filter": {"term" : { "active" : "true" }},
      "should": [
        {"match" : { "suggest" : "coke" }},
        {"match" : { "suggest" : "light" }}
      ]
    }
  },
  "aggs" : {
    "group-by-type" : {
      "terms" : {
        "field" : "id_product",
        "order": {"max_score.value": "desc"}
      },
      "aggs": {
        "min-distance": {
          "top_hits": {
            "sort": {
              "_script": { 
                "type": "number",
                "script": {
                  "source": "def x = doc['location'].lat; def y = doc['location'].lon; return Math.abs(x-1.178446) + Math.abs(y-101.763427)",
                  "lang": "painless"
                },
                "order": "asc"
              }
            },
            "size" : 1
          }
        },
        "max_score": {
          "max": {
            "script": {
              "lang": "painless",
              "source": "_score"
            }
          }
        }
      }
    }
  }
}
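
The script sort above adds degree differences, which only approximates distance. As a hedged variant (not what this answer ran), the same request can be built with Elasticsearch's built-in `_geo_distance` sort inside `top_hits`. The sketch below assembles the request body as a Python dict; `build_request` and its parameters are hypothetical names for this illustration, and sending the body to the cluster is left out.

```python
import json


def build_request(lat, lon, words, field="suggest"):
    """Score on `words`, keep the nearest hit per id_product,
    and order the buckets by their best _score."""
    return {
        "size": 0,  # same effect as ?size=0 on the URL
        "query": {
            "bool": {
                "filter": {"term": {"active": True}},
                "should": [{"match": {field: word}} for word in words],
            }
        },
        "aggs": {
            "group-by-type": {
                "terms": {
                    "field": "id_product",
                    "order": {"max_score.value": "desc"},
                },
                "aggs": {
                    "min-distance": {
                        "top_hits": {
                            "size": 1,
                            # Assumption: built-in geo sort swapped in for the script sort.
                            "sort": [
                                {
                                    "_geo_distance": {
                                        "location": {"lat": lat, "lon": lon},
                                        "order": "asc",
                                        "unit": "km",
                                    }
                                }
                            ],
                        }
                    },
                    "max_score": {
                        "max": {"script": {"lang": "painless", "source": "_score"}}
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_request(1.178446, 101.763427, ["coke", "light"]), indent=2))
```

With this variant the `sort` value returned for each hit would be a distance in kilometres rather than a sum of degrees.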

Response:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "group-by-type": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "C",
          "doc_count": 1,
          "max_score": {
            "value": 1.0300811529159546
          },
          "min-distance": {
            "hits": {
              "total": 1,
              "max_score": null,
              "hits": [
                {
                  "_index": "places",
                  "_type": "test",
                  "_id": "VhJdOmIBKhzTB9xcDvfk",
                  "_score": null,
                  "_source": {
                    "id_product": "C",
                    "id_product_unique": 4,
                    "location": {
                      "lat": 1.218446,
                      "lon": 102.763427
                    },
                    "suggest": [
                      "coke",
                      "light"
                    ],
                    "active": true
                  },
                  "sort": [
                    1.0399999646503995
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "A",
          "doc_count": 2,
          "max_score": {
            "value": 0.28768208622932434
          },
          "min-distance": {
            "hits": {
              "total": 2,
              "max_score": null,
              "hits": [
                {
                  "_index": "places",
                  "_type": "test",
                  "_id": "UhJcOmIBKhzTB9xc6ve-",
                  "_score": null,
                  "_source": {
                    "id_product": "A",
                    "id_product_unique": 1,
                    "location": {
                      "lat": 1.378446,
                      "lon": 103.763427
                    },
                    "suggest": [
                      "coke",
                      "zero"
                    ],
                    "active": true
                  },
                  "sort": [
                    2.1999999592114756
                  ]
                }
              ]
            }
          }
        },
        {
          "key": "B",
          "doc_count": 1,
          "max_score": {
            "value": 0.1596570909023285
          },
          "min-distance": {
            "hits": {
              "total": 1,
              "max_score": null,
              "hits": [
                {
                  "_index": "places",
                  "_type": "test",
                  "_id": "VRJcOmIBKhzTB9xc_vc0",
                  "_score": null,
                  "_source": {
                    "id_product": "B",
                    "id_product_unique": 3,
                    "location": {
                      "lat": 1.478446,
                      "lon": 104.763427
                    },
                    "suggest": [
                      "coke"
                    ],
                    "active": true
                  },
                  "sort": [
                    3.2999999020282695
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}

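Answer 1 (score: 0)

From what I gather, your use case is one where you want the value of a particular field in the document to factor into the relevance score calculation. This is common when you want to boost a document's relevance based on a field value (such as price or, as here, a query for a specific product). If you are searching for product A, then the product itself matters more than its distance. So if B is 2 miles from the origin and A is 5 miles away, A is still the closest of the products you are searching for.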

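What you need is a function score query with a distance-based decay function. I'd guess you want the gauss type, so that the score falls off like a bell curve.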
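
As a rough illustration of that bell curve, the Python sketch below evaluates the gauss decay formula as described in the Elasticsearch function score documentation, where the score equals `decay` at a distance of `offset + scale` from the origin. The function name `gauss_decay` and the 5-mile scale are assumptions picked for this example, not values from the question.

```python
import math


def gauss_decay(distance, scale, offset=0.0, decay=0.5):
    """Gaussian decay: 1.0 within `offset` of the origin, `decay` at offset + scale."""
    # sigma^2 is chosen so that the curve passes through `decay` at `scale`.
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    d = max(0.0, distance - offset)
    return math.exp(-(d ** 2) / (2.0 * sigma_sq))


if __name__ == "__main__":
    # Distances in miles from the origin; scale of 5 miles is hypothetical.
    for miles in (0.0, 2.0, 5.0, 10.0):
        print(miles, round(gauss_decay(miles, scale=5.0), 3))
```

The multiplier shrinks slowly near the origin and faster further out, so a nearby product B is not penalised much more than a product at the origin, while a distant one is pushed down sharply.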

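Below is an example of a decay function using the exp (exponential) type. It does the same thing on a different field type (a date) than yours, but the idea should be the same.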

  

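Suppose that instead of wanting to boost incrementally by the value of a field, you have an ideal value you want to target and you want the boost factor to decay the further away you move from that value. This is typically useful in boosts based on lat/long, numeric fields like price, or dates. In our contrived example, we are searching for books on "search engines" ideally published around June 2014.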

POST /bookdb_index/book/_search
{
    "query": {
        "function_score": {
            "query": {
                "multi_match" : {
                    "query" : "search engine",
                    "fields": ["title", "summary"]
                }
            },
            "functions": [
                {
                    "exp": {
                        "publish_date" : {
                            "origin": "2014-06-15",
                            "offset": "7d",
                            "scale" : "30d"
                        }
                    }
                }
            ],
            "boost_mode" : "replace"
        }
    },
    "_source": ["title", "summary", "publish_date", "num_reviews"]
}
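
Adapted to the places index from the question, the same idea might look like the sketch below, which builds a function_score body with a gauss decay on the `location` geo_point field. This is an untested sketch under assumptions: `build_decay_query`, the `100km` scale and the `0km` offset are placeholders for this illustration, and `boost_mode` is set to `multiply` (instead of the `replace` used in the exp example above) so that the text score from the should clauses is kept and scaled by distance.

```python
import json


def build_decay_query(lat, lon, words, scale="100km", offset="0km"):
    """Text relevance on `suggest`, multiplied by a Gaussian decay on distance."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "filter": {"term": {"active": True}},
                        "should": [{"match": {"suggest": word}} for word in words],
                    }
                },
                "functions": [
                    {
                        "gauss": {
                            "location": {
                                "origin": {"lat": lat, "lon": lon},
                                "scale": scale,    # hypothetical value, tune to your data
                                "offset": offset,  # hypothetical value
                            }
                        }
                    }
                ],
                # multiply keeps the query score; replace would discard it.
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_decay_query(1.178446, 101.763427, ["coke", "light"]), indent=2))
```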

Here are some useful references:

Elasticsearch 6.2 Function Score document

Elasticsearch Example Queries

The Closer the Better
This is an Elasticsearch 2.x decay function example; although it is for a different version, I think it is very similar to your use case.