I have a list of products belonging to a handful of users (the system uses Elasticsearch (ES), MySQL, Scala, and an ES Play Framework API (link)):
[
{ id: 1, user_id: 'jason', product: [...] },
{ id: 2, user_id: 'mike', product: [...] },
{ id: 3, user_id: 'mike', product: [...] },
{ id: 4, user_id: 'dan', product: [...] },
{ id: 5, user_id: 'bill', product: [...] },
{ id: 6, user_id: 'mike', product: [...] },
{ id: 7, user_id: 'dan', product: [...] },
{ id: 8, user_id: 'bill', product: [...] },
{ id: 9, user_id: 'mike', product: [...] },
{ id: 10, user_id: 'dan', product: [...] },
{ id: 11, user_id: 'bill', product: [...] },
...
]
I want to retrieve a fixed number of best-matching documents per user ID (e.g., the top 2 by match score for each user):
[
{ id: 2, user_id: 'mike', product: [...], _score: 100},
{ id: 3, user_id: 'mike', product: [...], _score: 95},
{ id: 4, user_id: 'dan', product: [...], _score: 90},
{ id: 5, user_id: 'bill', product: [...], _score: 80},
{ id: 7, user_id: 'dan', product: [...], _score: 70},
{ id: 8, user_id: 'bill', product: [...], _score: 65},
...
]
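To pin down the result I am after, here is a minimal client-side sketch, assuming the hits have already been scored and fetched. `Hit` and `topPerUser` are hypothetical names, not part of ES or the Play plugin. It keeps at most `perUser` hits per user, keeps at most `maxUsers` users (ranked by each user's best score), and returns the survivors in global score order. It only works if the fetched window actually contains enough hits for every user, which is exactly what a size-10 query dominated by one user fails to provide.

```scala
object TopPerUser {
  // Hypothetical flattened view of a search hit.
  final case class Hit(id: Int, userId: String, score: Double)

  /** Keeps at most `perUser` hits per user (highest score first) and at most
    * `maxUsers` users, ranked by each user's best score. The kept hits are
    * returned in descending score order, ties broken by id. */
  def topPerUser(hits: Seq[Hit], perUser: Int, maxUsers: Int): Seq[Hit] = {
    require(perUser > 0, "perUser must be positive")
    val byUser = hits.groupBy(_.userId).toSeq.map { case (user, hs) =>
      user -> hs.sortBy(h => (-h.score, h.id)).take(perUser)
    }
    // Rank users by their best remaining hit; name breaks ties deterministically.
    val keptUsers = byUser.sortBy { case (user, hs) => (-hs.head.score, user) }.take(maxUsers)
    keptUsers.flatMap(_._2).sortBy(h => (-h.score, h.id))
  }
}
```

Fed the hits with ids 2 to 8 from the examples here, `topPerUser(hits, perUser = 2, maxUsers = 3)` yields ids 2, 3, 4, 5, 7, 8: mike's third product (id 6) is dropped.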
I tried a terms facet on user_id, but I can't get the same number of products for each user. For example, I currently get:
[
{ id: 2, user_id: 'mike', product: [...], _score: 100},
{ id: 3, user_id: 'mike', product: [...], _score: 95},
{ id: 4, user_id: 'dan', product: [...], _score: 90},
{ id: 5, user_id: 'bill', product: [...], _score: 80},
{ id: 6, user_id: 'mike', product: [...], _score: 75},
...
]
Terms facet pseudocode:
import org.elasticsearch.search.facet.FacetBuilders

/** query type is com.github.cleverage.elasticsearch.ScalaHelpers.IndexResults[Product]
 * filtered is the matching-requirement filter, e.g. including the keyword "fashion"
 * limit is the number of matching documents returned, e.g. 10
 * finalQuery should return 5 unique users based on the tmpQuery result,
 * with each user ending up with 2 products
 */
val tmpQuery = query.withBuilder(filtered).withSize(limit)
// Terms facet on user_id, asking for the 5 most frequent users
val finalQuery = tmpQuery.addFacet(FacetBuilders.termsFacet("userId").field("user_id").size(5))
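How can I make sure that each user gets exactly 2 products, rather than mike getting 3, dan 1, and bill 1? In other words, addFacet doesn't help here: finalQuery is built on tmpQuery, and tmpQuery returns 10 results, most of which belong to mike because his products have higher match scores. How should I change tmpQuery so that each user is capped at 2 products?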
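A terms facet cannot guarantee unique users in the hits; it only returns the most frequent user_id values. In this case the products have to be matched first and their user_id values retrieved afterwards, so I can't simply get the users first and then their products.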
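One way to follow the "match products first, then user_id" order described above is a two-phase approach, sketched here under stated assumptions. Phase 1 pages through the matching products (best score first, using from/size) and collects distinct user_id values until `maxUsers` are found. Phase 2 runs one query per user, capped at 2 hits. `ProductSearch`, `search`, and `InMemorySearch` are hypothetical stand-ins, not ES or play2-elasticsearch APIs; a real implementation might back `search` with the `filtered` query plus a term filter on user_id.

```scala
import scala.collection.mutable

object TwoPhaseTopPerUser {
  final case class Hit(id: Int, userId: String, score: Double)

  /** Hypothetical search abstraction (NOT a real client API): returns hits that
    * pass the product filter, restricted to `user` when given, sorted by
    * descending score, skipping `from` hits and returning at most `size`. */
  trait ProductSearch {
    def search(user: Option[String], from: Int, size: Int): Seq[Hit]
  }

  /** Phase 1: page through matching products, best score first, collecting up to
    * `maxUsers` distinct user ids in the order of their best-scoring product. */
  def topUsers(s: ProductSearch, maxUsers: Int, pageSize: Int): List[String] = {
    require(pageSize > 0, "pageSize must be positive")
    val users = mutable.LinkedHashSet.empty[String] // keeps insertion order
    var from = 0
    var exhausted = false
    while (!exhausted && users.size < maxUsers) {
      val page = s.search(None, from, pageSize)
      page.foreach { h => if (users.size < maxUsers) users += h.userId }
      exhausted = page.size < pageSize // a short page means no more matches
      from += pageSize
    }
    users.toList
  }

  /** Phase 2: one query per selected user, each capped at `perUser` hits. */
  def topPerUser(s: ProductSearch, maxUsers: Int, perUser: Int, pageSize: Int): List[(String, Seq[Hit])] =
    topUsers(s, maxUsers, pageSize).map(u => u -> s.search(Some(u), 0, perUser))

  /** In-memory stand-in used only to exercise the logic above. */
  final class InMemorySearch(all: Seq[Hit]) extends ProductSearch {
    private val sorted = all.sortBy(h => (-h.score, h.id))
    def search(user: Option[String], from: Int, size: Int): Seq[Hit] =
      sorted.filter(h => user.forall(_ == h.userId)).slice(from, from + size)
  }
}
```

The price is 1 + N round trips (more if phase 1 needs several pages), which is usually acceptable for a small N such as 5 users. Newer Elasticsearch releases (1.3 and later) also offer a `top_hits` aggregation that can be nested under a `terms` aggregation to do this server-side; facets predate it.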
Any help is appreciated.