Question

I have this simple set of documents:

{
  id : 1,
  book_ids : [2,3],
  collection_ids : ['a','b']
},
{
  id : 2,
  book_ids : [1,2]
}

If I run this filter query, it matches both documents:

{
    bool: {
        filter: [
            {
                bool: {
                    should: [
                        {
                            bool: {
                                must_not: {
                                    exists: {
                                        field: 'book_ids'
                                    }
                                }
                            }
                        },
                        {
                            bool: {
                                filter: {
                                    term: {
                                        book_ids: 2
                                    }
                                }
                            }
                        }
                    ]
                }
            },
            {
                bool: {
                    should: [
                        {
                            bool: {
                                must_not: {
                                    exists: {
                                        field: 'collection_ids'
                                    }
                                }
                            }
                        },
                        {
                            bool: {
                                filter: {
                                    term: {
                                        collection_ids: 'a'
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        ]
    }
}

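The problem is that I want to sort these documents, and I want the first one (id: 1) to be returned first, because it matches both the book_ids value and the collection_ids value.

A simple sort clause like this does not work: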

[
  'book_ids',
  'collection_ids'
]

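because it returns document 2 first, on account of the first value in its book_ids array.

Edit: This is a simplified example of the problem I am facing; the real query has N clauses like these in its should blocks. There is also an order among the clauses, which I tried to reflect with the sort snippet: results matching the first clause (book_ids) should come before results matching the second clause (collection_ids). What I am really looking for is something like an SQL ORDER BY that only considers the matching values of the array fields. One viable option might be to assign decreasing constant_score values to each term clause according to the desired sort order, and have ES sum these sub-scores to compute the final score. But I cannot figure out how to do this, or whether it is even possible.

Bonus question: is there a way to have Elasticsearch return some kind of new document containing only the matching values? Here is the response I would like to get for the filter query above: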

{
  id : 1,
  book_ids : [2],
  collection_ids : ['a']
},
{
  id : 2,
  book_ids : [2]
}

Answer 1

I think your idea about constant scores is the right one. I think you can do something like this:

{
  query: {
    bool: {
      must: [
        {
          bool: {
            should: [
              {
                bool: {
                  must_not: {
                    exists: {
                      field: 'book_ids'
                    }
                  }
                }
              },
              {
                constant_score: {
                  filter: {
                    term: {
                      book_ids: 2
                    }
                  },
                  boost: 100
                }
              }
            ]
          }
        },
        {
          bool: {
            should: [
              {
                bool: {
                  must_not: {
                    exists: {
                      field: 'collection_ids'
                    }
                  }
                }
              },
              {
                constant_score: {
                  filter: {
                    term: {
                      collection_ids: 'a'
                    }
                  },
                  boost: 50
                }
              }
            ]
          }
        }
      ]
    }  
  }
}

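I think the only thing you were missing with constant scores is that the top-level query needs must rather than filter. (Filters are not scored; every score comes out as 0.)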
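
Since the question mentions N such clauses, here is a minimal Python sketch that generates the query above from an ordered list of clauses. The helper name build_ranked_query and the (field, value, boost) tuple layout are my own for illustration, not part of any Elasticsearch client; the output is just the request body as a plain dict.

```python
import json


def build_ranked_query(clauses):
    """Build the bool/must query shown above.

    clauses: list of (field, value, boost) tuples, highest priority first.
    (Hypothetical helper; the tuple layout is an assumption of this sketch.)
    """
    must = []
    for field, value, boost in clauses:
        must.append({
            "bool": {
                "should": [
                    # Documents that do not have the field at all still match.
                    {"bool": {"must_not": {"exists": {"field": field}}}},
                    # Documents containing the value match and get a fixed score.
                    {
                        "constant_score": {
                            "filter": {"term": {field: value}},
                            "boost": boost,
                        }
                    },
                ]
            }
        })
    # must (not filter) at the top level, so the sub-scores are kept.
    return {"query": {"bool": {"must": must}}}


query = build_ranked_query([
    ("book_ids", 2, 100),
    ("collection_ids", "a", 50),
])
print(json.dumps(query, indent=2))
```

If each boost is larger than the sum of all the boosts after it (100, 50, 25, ...), a match on an earlier clause should always outweigh any combination of later matches. This sketch does not account for whatever score the must_not branch may contribute, so it is worth checking the actual _score values on your version.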

Another approach is to put the filters inside a function_score query (keeping them as filters) and then compute the score however you need (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html).
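
A rough sketch of what the function_score variant might look like, again built as a plain Python dict. The original bool/filter query stays untouched as the inner query, and each term filter gets a weight. The helper name and the weights 100 and 50 are assumptions carried over for illustration; score_mode "sum" and boost_mode "replace" are my guess at the settings you would want, so treat this as a starting point rather than a tested query.

```python
import json


def build_function_score_query(clauses):
    """clauses: list of (field, value, weight) tuples, highest priority first.

    (Hypothetical helper; names and weights are assumptions of this sketch.)
    """
    filters = []
    functions = []
    for field, value, weight in clauses:
        # Same matching logic as the question: field missing OR value present.
        filters.append({
            "bool": {
                "should": [
                    {"bool": {"must_not": {"exists": {"field": field}}}},
                    {"bool": {"filter": {"term": {field: value}}}},
                ]
            }
        })
        # One scoring function per clause, applied only when the term matches.
        functions.append({"filter": {"term": {field: value}}, "weight": weight})
    return {
        "query": {
            "function_score": {
                "query": {"bool": {"filter": filters}},
                "functions": functions,
                "score_mode": "sum",      # add up the weights of matching functions
                "boost_mode": "replace",  # use the function score instead of the query score
            }
        }
    }


fs_query = build_function_score_query([
    ("book_ids", 2, 100),
    ("collection_ids", "a", 50),
])
print(json.dumps(fs_query, indent=2))
```

With the two sample documents, the intent is that document 1 ends up with 100 + 50 and document 2 with 100, but I have not run this against a cluster.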

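As for the bonus question: it may be possible if you use script fields to filter and add the new fields you want (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-script-fields.html), but there is no straightforward way to do it. Unless your values contain very long lists, filtering after you receive the results is probably easier and makes more sense.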
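
For the "filter after receiving the results" route, something along these lines should be enough on the client side. The function name keep_matching_values is made up for this sketch, and the docs list stands in for the _source of each hit in a real response.

```python
def keep_matching_values(doc, wanted):
    """Return a copy of doc whose array fields keep only the wanted values.

    wanted maps a field name to the set of values that were searched for.
    Fields the document does not have are simply left out.
    (Hypothetical helper, not part of any Elasticsearch client.)
    """
    trimmed = {"id": doc["id"]}
    for field, values in wanted.items():
        if field in doc:
            trimmed[field] = [v for v in doc[field] if v in values]
    return trimmed


# Stand-ins for the _source of each hit returned by the query.
docs = [
    {"id": 1, "book_ids": [2, 3], "collection_ids": ["a", "b"]},
    {"id": 2, "book_ids": [1, 2]},
]
wanted = {"book_ids": {2}, "collection_ids": {"a"}}

result = [keep_matching_values(d, wanted) for d in docs]
print(result)
```

For the two sample documents this produces the trimmed documents shown in the question.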
