Get distinct results with Elasticsearch

Date: 2015-05-22 13:53:08

Tags: php symfony elasticsearch elastica foselasticabundle

I am using FOSElasticaBundle in my Symfony2 project. My MySQL database has an entries table and a users table, and each entry belongs to one user.

Across all the entries in the database, I want to get only one entry per user.

Entries representation:

[
  {
    "id": 1,
    "name": "Hello world",
    "user": {
      "id": 17,
      "username": "foo"
    }
  },
  {
    "id": 2,
    "name": "Lorem ipsum",
    "user": {
      "id": 15,
      "username": "bar"
    }
  },
  {
    "id": 3,
    "name": "Dolar sit amet",
    "user": {
      "id": 17,
      "username": "foo"
    }
  }
]

The expected result is:

[
  {
    "id": 1,
    "name": "Hello world",
    "user": {
      "id": 17,
      "username": "foo"
    }
  },
  {
    "id": 2,
    "name": "Lorem ipsum",
    "user": {
      "id": 15,
      "username": "bar"
    }
  }
]

But the query returns all the entries in the table. I tried adding an aggregation to my Elasticsearch query, but nothing changed.

$distinctAgg = new \Elastica\Aggregation\Terms("distinctAgg");
$distinctAgg->setField("user.id");
$distinctAgg->setSize(1);

$query->addAggregation($distinctAgg);

Is there a way to do this with a terms filter, or in any other way? Any help would be great. Thanks.

1 Answer:

Answer 0 (score: 2)

Aggregations are not easy to understand when you are used to MySQL's GROUP BY.

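First thing: aggregation results are not returned in hits but in aggregations. So when you get your search results, you have to fetch the aggregations separately, like this: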

$results = $search->search();
$aggregationsResults = $results->getAggregations();

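Second thing: aggregations do not return the documents' _source. With an aggregation on your example data, you will only learn that user ID 15 has 1 entry and user ID 17 has 2 entries.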

For example, with this query:

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "byUser": {
      "terms": {
        "field": "user.id"
      }
    }
  }
}

Result:

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [ ... ]
   },
   "aggregations": {
      "byUser": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": 17,
               "doc_count": 2
            },
            {
               "key": 15,
               "doc_count": 1
            }
         ]
      }
   }
}
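Here is a minimal sketch of reading that response in PHP. It assumes the "aggregations" part has already been decoded into an associative array; Elastica's ResultSet::getAggregations() returns an array of this shape. The helper name bucketCounts is hypothetical. The sketch shows the limitation described above: each bucket only gives you a user ID and a count, with no entry data.

```php
<?php
// Hypothetical helper: map each terms bucket key (here: user.id) to its doc_count.
function bucketCounts(array $aggregations, string $name): array
{
    $counts = [];
    foreach ($aggregations[$name]['buckets'] ?? [] as $bucket) {
        $counts[$bucket['key']] = $bucket['doc_count'];
    }
    return $counts;
}

// Sample data copied from the "aggregations" part of the response above.
$aggregations = json_decode('{
    "byUser": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {"key": 17, "doc_count": 2},
            {"key": 15, "doc_count": 1}
        ]
    }
}', true);

$counts = bucketCounts($aggregations, 'byUser');
print_r($counts); // user id => number of entries; no _source is available here
```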

If you want results like those of a GROUP BY in MySQL, you have to add a top_hits sub-aggregation:

{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "byUser": {
      "terms": {
        "field": "user.id"
      },
      "aggs": {
        "results": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
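As a sketch, the same request body can be built as a plain PHP array. The aggregation names byUser and results are just the names used in this answer. In Elastica, such an array can be passed to new \Elastica\Query($body). If your Elastica version ships \Elastica\Aggregation\TopHits, you can instead attach a TopHits aggregation to the Terms aggregation with addAggregation(); treat that as an assumption to check against your installed version.

Note also that setSize(1) on a terms aggregation, as in the question, limits the number of buckets (users) to one. It does not limit the number of hits per user.

```php
<?php
// Request body equivalent to the JSON query above, built as a plain PHP array.
$body = [
    'query' => [
        // An empty object so that json_encode emits {} rather than [].
        'match_all' => new \stdClass(),
    ],
    'aggs' => [
        'byUser' => [
            // One bucket per distinct user.id. The terms "size" (default 10)
            // caps the number of buckets, not the hits inside each bucket.
            'terms' => ['field' => 'user.id'],
            'aggs' => [
                // Keep a single document (with its _source) per bucket.
                'results' => ['top_hits' => ['size' => 1]],
            ],
        ],
    ],
];

echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```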

Result:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [ ... ]
   },
   "aggregations": {
      "byUser": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": 17,
               "doc_count": 2,
               "results": {
                  "hits": {
                     "total": 2,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "test_stackoverflow",
                           "_type": "test1",
                           "_id": "1",
                           "_score": 1,
                           "_source": {
                              "id": 1,
                              "name": "Hello world",
                              "user": {
                                 "id": 17,
                                 "username": "foo"
                              }
                           }
                        }
                     ]
                  }
               }
            },
            {
               "key": 15,
               "doc_count": 1,
               "results": {
                  "hits": {
                     "total": 1,
                     "max_score": 1,
                     "hits": [
                        {
                           "_index": "test_stackoverflow",
                           "_type": "test1",
                           "_id": "2",
                           "_score": 1,
                           "_source": {
                              "id": 2,
                              "name": "Lorem ipsum",
                              "user": {
                                 "id": 15,
                                 "username": "bar"
                              }
                           }
                        }
                     ]
                  }
               }
            }
         ]
      }
   }
}
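To turn those buckets into the flat list the question asks for, you can take the _source of the single top hit in each bucket. Here is a minimal sketch, assuming the aggregations array comes from $results->getAggregations(). The helper name firstHitPerBucket is hypothetical.

Buckets are ordered by doc_count by default, not by entry id. With match_all every hit has the same score, so which entry top_hits picks per user is not guaranteed. If you need, say, the oldest entry, add a sort option to top_hits.

```php
<?php
// Hypothetical helper: return the _source of the first top_hits document in each
// bucket, i.e. one entry per user.
function firstHitPerBucket(array $aggregations, string $aggName, string $hitsName): array
{
    $entries = [];
    foreach ($aggregations[$aggName]['buckets'] ?? [] as $bucket) {
        $hits = $bucket[$hitsName]['hits']['hits'] ?? [];
        if ($hits !== []) {
            $entries[] = $hits[0]['_source'];
        }
    }
    return $entries;
}

// Trimmed copy of the "aggregations" part of the response above.
$aggregations = json_decode('{
  "byUser": {
    "buckets": [
      {"key": 17, "doc_count": 2, "results": {"hits": {"total": 2, "hits": [
        {"_id": "1", "_source": {"id": 1, "name": "Hello world", "user": {"id": 17, "username": "foo"}}}
      ]}}},
      {"key": 15, "doc_count": 1, "results": {"hits": {"total": 1, "hits": [
        {"_id": "2", "_source": {"id": 2, "name": "Lorem ipsum", "user": {"id": 15, "username": "bar"}}}
      ]}}}
    ]
  }
}', true);

$entries = firstHitPerBucket($aggregations, 'byUser', 'results');
echo json_encode($entries), PHP_EOL; // one entry per user, as in the expected result
```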

More information on this page: https://www.elastic.co/blog/top-hits-aggregation