Elasticsearch: how to query a specific number of products for each distinct user

Time: 2014-05-01 17:25:02

Tags: elasticsearch

I have a collection of products that belong to a small number of users (the system uses Elasticsearch (ES), MySQL, Scala, and the ES API for Play Framework):

[
  { id: 1, user_id: 'jason', product: [...] },
  { id: 2, user_id: 'mike', product: [...] },
  { id: 3, user_id: 'mike', product: [...] },
  { id: 4, user_id: 'dan', product: [...] },
  { id: 5, user_id: 'bill', product: [...] },
  { id: 6, user_id: 'mike', product: [...] },
  { id: 7, user_id: 'dan', product: [...] },
  { id: 8, user_id: 'bill', product: [...] },
  { id: 9, user_id: 'mike', product: [...] },
  { id: 10, user_id: 'dan', product: [...] },
  { id: 11, user_id: 'bill', product: [...] },
  ...
]

I want to retrieve a fixed number of the best-matching documents for each user ID (for example, the top 2 by match score per user):

[
  { id: 2, user_id: 'mike', product: [...], _score: 100},
  { id: 3, user_id: 'mike', product: [...], _score: 95},
  { id: 4, user_id: 'dan', product: [...], _score: 90},
  { id: 5, user_id: 'bill', product: [...], _score: 80},
  { id: 7, user_id: 'dan', product: [...], _score: 70},
  { id: 8, user_id: 'bill', product: [...], _score: 65},
  ...
]

I tried a terms facet on user_id, but I cannot get the same number of products for each user. What I currently get looks like this:

[ 
  { id: 2, user_id: 'mike', product: [...], _score: 100},
  { id: 3, user_id: 'mike', product: [...], _score: 95},
  { id: 4, user_id: 'dan', product: [...], _score: 90},
  { id: 5, user_id: 'bill', product: [...], _score: 80},
  { id: 6, user_id: 'mike', product: [...], _score: 75},
  ...
]

Terms facet pseudocode:

/** query type is com.github.cleverage.elasticsearch.ScalaHelpers.IndexResults[Product]
  * filtered is the filter for the matching requirement, e.g. including the keyword "fashion"
  * limit is the size of returned users with matching documents, e.g. 10
  * finalQuery should return 5 unique users based on the tmpQuery result with 10 users
  * each user should end up with 2 products
  */
tmpQuery = query.withBuilder(filtered).withSize(limit)
finalQuery = tmpQuery.addFacet(FacetBuilders.termsFacet("userId").field("user_id").size(5))

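How can I make sure every user gets 2 products, instead of mike getting 3, dan 1, and bill 1? (I mean, addFacet does not help right now, because finalQuery is built on tmpQuery, and tmpQuery returns 10 results that come mostly from mike because his products have higher match scores. How can I update tmpQuery to enforce the limit of 2 per user?)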

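A terms facet cannot guarantee unique users; it only returns the most frequent users. In fact, in this case the products have to be matched first and their user_id read from the matches, so it is not possible to fetch the users first and then fetch their products.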
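To make the desired result concrete, here is a minimal sketch of the grouping done on the client side, in plain Scala collections with no Elasticsearch calls. The `Hit` case class and the `topPerUser` function are hypothetical names introduced only for this illustration, and the sketch assumes the query has already returned enough scored hits for every user to have at least two candidates.

```scala
// Hypothetical model of a search hit; the fields mirror the examples above.
final case class Hit(id: Int, userId: String, score: Double)

object TopPerUser {
  /** Keeps at most `perUser` best-scoring hits for each user,
    * then orders the surviving hits by descending score.
    */
  def topPerUser(hits: Seq[Hit], perUser: Int): Seq[Hit] =
    hits
      .groupBy(_.userId)                          // user_id -> hits of that user
      .values
      .flatMap(_.sortBy(-_.score).take(perUser))  // best `perUser` hits per user
      .toSeq
      .sortBy(-_.score)                           // overall ranking by score

  def main(args: Array[String]): Unit = {
    // Sample data taken from the result lists in the question.
    val hits = Seq(
      Hit(2, "mike", 100), Hit(3, "mike", 95), Hit(4, "dan", 90),
      Hit(5, "bill", 80), Hit(6, "mike", 75), Hit(7, "dan", 70),
      Hit(8, "bill", 65)
    )
    // Keeps ids 2, 3, 4, 5, 7, 8: mike's third product (id 6) is dropped.
    topPerUser(hits, perUser = 2).foreach(println)
  }
}
```

This only post-processes hits that were already fetched, so the query size has to be larger than the final number of results, and it does not by itself guarantee that every user appears. Elasticsearch 1.3 and later also offer a top_hits aggregation that can be nested under a terms aggregation on user_id to do this grouping on the server; whether that is an option depends on the ES version in use.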

Thanks in advance.

0 answers:

No answers yet