Question

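I am sending search queries to my ES index and getting multiple results back. Often the lower-scoring results are irrelevant, and I would like to drop them and return only the high-quality results (which mostly have higher scores).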

My index contains 1000 documents, each with a text field of 100-500 words. For example: {"text": "AVENGERS: ENDGAME is set after Thanos' catastrophic use of the Infinity Stones randomly wiped out half of Earth's population in Avengers: Infinity War. Those left behind are desperate to do something -- anything -- to bring back their lost loved ones. But after an initial attempt -- with extra help from Captain Marvel -- creates more problems than solutions, the grieving, purposeless Avengers think all hope is lost."}

If a user searches for "Marvel aka Brie Larson kills Thanos in the movie", the document above should be returned, because it contains similar terms.

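Currently I am using min_score to set a threshold, but I know this is not best practice: the scores vary with the number of documents in the index, and that number keeps growing. So this approach does not seem scalable.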
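For reference, the fixed-threshold approach looks roughly like the sketch below, with min_score set at the top level of the search body. The helper name build_search_body, the field name "text", and the cutoff of 5.0 are placeholders for illustration only, not values from my real setup.

```python
import json


def build_search_body(query_text, min_score):
    """Build a search body with a fixed score cutoff (hypothetical helper)."""
    return {
        # Hits scoring below this absolute value are excluded from the response.
        "min_score": min_score,
        # Assumed field name "text"; a plain match query is used for brevity.
        "query": {"match": {"text": query_text}},
    }


body = build_search_body("Captain Marvel Thanos", 5.0)
print(json.dumps(body, indent=2))
```

The weakness is that 5.0 is an absolute number: the same cutoff can be too strict or too lenient once the scores shift as the index grows.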

I have also tried several ways of tuning the query to get high-quality results, such as the "more like this" query:

# field_list and query_data are defined elsewhere in my code
query = {"query": {"bool": {"must": [
    {"more_like_this": {
        "fields": field_list,
        "like": query_data,
        "min_term_freq": 1,
        "max_query_terms": 50,
        "min_doc_freq": 1,
        "minimum_should_match": "50%"}}]}}}

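But I still get low-scoring results (e.g. 1.5), whereas high-quality results typically score around 20. Is there a good way to tune the query further, or to make min_score dynamic, so that only highly relevant documents are returned? Any help would be greatly appreciated!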
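To make "dynamic min_score" concrete, here is a minimal sketch of what I have in mind: filter the hits on the client side, keeping only those whose score is at least some fraction of the best hit's score. The function name filter_by_relative_score and the ratio of 0.5 are my own assumptions, and the hits are hand-written samples shaped like the entries of hits.hits in a search response.

```python
def filter_by_relative_score(hits, ratio=0.5):
    """Keep hits scoring at least `ratio` times the best score (hypothetical helper)."""
    if not hits:
        return []
    top = max(h["_score"] for h in hits)
    return [h for h in hits if h["_score"] >= ratio * top]


# Sample hits, mirroring the scores mentioned in the question.
hits = [
    {"_id": "1", "_score": 20.0},
    {"_id": "2", "_score": 12.5},
    {"_id": "3", "_score": 1.5},
]
kept = filter_by_relative_score(hits, ratio=0.5)
print([h["_id"] for h in kept])
```

A relative cutoff like this does not depend on the absolute score scale, but it always keeps the top hit, even when the best match is itself weak, so I am not sure it is sufficient on its own.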

Returning high-quality results in Elasticsearch

0 answers: