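ElasticSearch: returning de-duplicated results with a custom sort that bubbles the best ones to the top

Time: 2014-11-18 22:36:25

Tags: elasticsearch aggregation facet

I have an index of listings. Each document has a weight associated with it. I need to be able to search for "Engineer" and get the top result for each distinct "title" that matches, ranked by relevance and by an arbitrary weight stored on the document.

Example index: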

Doc 1 {"title": "Java Engineer", "content": "A long description", "weighted_importance": 10}
Doc 2 {"title": "Search Engineer", "content": "A long description", "weighted_importance": 10}
Doc 3 {"title": "Ruby Engineer", "content": "A long description", "weighted_importance": 10}
Doc 4 {"title": "PHP Engineer", "content": "A long description", "weighted_importance": 10}
Doc 5 {"title": "Java Engineer", "content": "A long description", "weighted_importance": 10}
Doc 6 {"title": "Search Engineer", "content": "A long description", "weighted_importance": 100}
Doc 7 {"title": "Java Engineer", "content": "A long description", "weighted_importance": 100}
Doc 8 {"title": "MySQL Engineer", "content": "A long description", "weighted_importance": 10}

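If I search for "Engineer", it should de-duplicate items that share the same title and return the top result for each title, boosted by the weighted_importance field, such as: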

Doc 6 {"title": "Search Engineer", "content": "A long description", "weighted_importance": 100}
Doc 7 {"title": "Java Engineer", "content": "A long description", "weighted_importance": 100}
Doc 3 {"title": "Ruby Engineer", "content": "A long description", "weighted_importance": 10}
Doc 4 {"title": "PHP Engineer", "content": "A long description", "weighted_importance": 10}
Doc 8 {"title": "MySQL Engineer", "content": "A long description", "weighted_importance": 10}

The last three results can be sorted however they fall, but the first two results need to bubble to the surface, each in its own bucket.
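To make the desired behavior concrete, here is a minimal plain-Python sketch of the de-duplication described above, with no ElasticSearch involved. The `id` field and the function name `dedupe_by_title` are illustrative assumptions, not part of the index; the `content` field is omitted for brevity.

```python
from typing import Dict, List

# Sample documents copied from the example index above.
# "id" is a hypothetical field added here only to tell the documents apart.
DOCS: List[Dict] = [
    {"id": 1, "title": "Java Engineer", "weighted_importance": 10},
    {"id": 2, "title": "Search Engineer", "weighted_importance": 10},
    {"id": 3, "title": "Ruby Engineer", "weighted_importance": 10},
    {"id": 4, "title": "PHP Engineer", "weighted_importance": 10},
    {"id": 5, "title": "Java Engineer", "weighted_importance": 10},
    {"id": 6, "title": "Search Engineer", "weighted_importance": 100},
    {"id": 7, "title": "Java Engineer", "weighted_importance": 100},
    {"id": 8, "title": "MySQL Engineer", "weighted_importance": 10},
]


def dedupe_by_title(docs: List[Dict]) -> List[Dict]:
    """Keep the heaviest document per title, then sort by weight, descending."""
    best: Dict[str, Dict] = {}
    for doc in docs:
        current = best.get(doc["title"])
        if current is None or doc["weighted_importance"] > current["weighted_importance"]:
            best[doc["title"]] = doc
    # Ties keep whatever order they fall in, as the question allows.
    return sorted(best.values(), key=lambda d: d["weighted_importance"], reverse=True)


if __name__ == "__main__":
    for doc in dedupe_by_title(DOCS):
        print(doc["id"], doc["title"], doc["weighted_importance"])
```

This only pins down the expected result set; relevance scoring for the "Engineer" match is left out, since every sample title matches.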

I'm new to ElasticSearch, as you can tell. Any help would be greatly appreciated.

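1 answer:

Answer 0: (score: 2)

Try this approach:

  • Your mapping should also store a not_analyzed version of title, so that buckets are built on the complete title rather than on the individual terms that make up the title: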
{
  "mappings": {
    "engineers": {
      "properties": {
        "title": {
          "type": "string",
          "fields":{
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "content": {
          "type": "string"
        },
        "weighted_importance": {
          "type": "integer"
        }
      }
    }
  }
}
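  • Group the results into buckets built on title.raw, as defined above
  • Define a top_hits sub-aggregation to bring back the "best" document for each bucket
  • Define another sub-aggregation at the same level as top_hits; it should be a max aggregation that computes the maximum weighted_importance
  • In the main aggregation, use the max from above to order the resulting buckets: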
GET /my_index/engineers/_search?search_type=count
{
  "query": {
    "match": {
      "title": "Engineer"
    }
  },
  "aggs": {
    "title": {
      "terms": {
        "field": "title.raw",
        "order": {"best_hit":"desc"}
      },
      "aggs": {
        "first_match": {
          "top_hits": {
            "sort": [{"weighted_importance": {"order": "desc"}}],
            "size": 1
          }
        },
        "best_hit": {
          "max": {
            "lang": "groovy", 
            "script": "doc['weighted_importance'].value"
          }
        }
      }
    }
  }
}
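
With search_type=count, the documents come back inside the aggregation buckets rather than in the top-level hits, so the client has to unpack them. The following is a rough Python sketch of that step, assuming the response has already been parsed into a dict. The function name `top_hit_per_title` and the hand-written `SAMPLE_RESPONSE` are illustrative assumptions; the aggregation names `title` and `first_match` are the ones used in the query above.

```python
from typing import Any, Dict, List


def top_hit_per_title(
    response: Dict[str, Any],
    agg_name: str = "title",
    hits_name: str = "first_match",
) -> List[Dict[str, Any]]:
    """Return the _source of the first top_hits document in each bucket, in bucket order."""
    buckets = response.get("aggregations", {}).get(agg_name, {}).get("buckets", [])
    results = []
    for bucket in buckets:
        hits = bucket.get(hits_name, {}).get("hits", {}).get("hits", [])
        if hits:
            results.append(hits[0]["_source"])
    return results


# Hand-written, trimmed sample shaped like an aggregation response (not real output).
SAMPLE_RESPONSE = {
    "aggregations": {
        "title": {
            "buckets": [
                {
                    "key": "Search Engineer",
                    "doc_count": 2,
                    "first_match": {"hits": {"total": 2, "hits": [
                        {"_id": "6", "_source": {"title": "Search Engineer", "weighted_importance": 100}}
                    ]}},
                    "best_hit": {"value": 100.0},
                },
                {
                    "key": "Ruby Engineer",
                    "doc_count": 1,
                    "first_match": {"hits": {"total": 1, "hits": [
                        {"_id": "3", "_source": {"title": "Ruby Engineer", "weighted_importance": 10}}
                    ]}},
                    "best_hit": {"value": 10.0},
                },
            ]
        }
    }
}

if __name__ == "__main__":
    for source in top_hit_per_title(SAMPLE_RESPONSE):
        print(source["title"], source["weighted_importance"])
```

Note that a terms aggregation returns only 10 buckets by default, so if more distinct titles are expected, a larger "size" on the terms aggregation is probably needed.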