Question

I recently converted some Bing Search API v2 code to v5, and it works, but I'm curious about the behavior of "totalEstimatedMatches". Here is an example to illustrate my question:

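A user on our site searches for a particular term. The API query returns 10 results (our page-size setting) and totalEstimatedMatches is set to 21. So we indicate 3 pages of results and let the user page through them.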
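The "3 pages" figure is a ceiling division of the estimate by the page size. A minimal sketch, assuming a page size of 10 as in the question and assuming the `offset` query parameter counts results to skip; the helper names `page_count` and `offset_for_page` are hypothetical:

```python
import math


def page_count(total_estimated_matches: int, page_size: int = 10) -> int:
    """Pages needed to show an estimated number of results."""
    if total_estimated_matches <= 0:
        return 0
    return math.ceil(total_estimated_matches / page_size)


def offset_for_page(page: int, page_size: int = 10) -> int:
    """Offset (results to skip) for a 1-based page number."""
    return (page - 1) * page_size


print(page_count(21))      # 3
print(offset_for_page(3))  # 20
```

Because the page count is derived from an estimate, any change in totalEstimatedMatches between requests can change the number of pages you show.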

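When they reach page 3, totalEstimatedMatches returns 22 instead of 21. It seems odd that with such a small result set it wouldn't already know it was 22, but fine, I can live with that. All the results display correctly.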

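Now, if the user pages back from page 3 to page 2, the value of totalEstimatedMatches is 21 again. This surprised me a bit, because once the result set has been paged through, the API should presumably know there are 22 results rather than 21.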

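I've been a professional software developer since the '80s, so I recognize this as one of those detail issues that come with API design. Apparently it doesn't cache the exact number of results, or something along those lines. I just don't remember that kind of behavior in the V2 search API (I realize it's third-party code). The result counts there were very reliable.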

Does anyone besides me find this a bit unexpected?

Answer 1

It turns out this is exactly why the response JSON field totalEstimatedMatches contains the word ...Estimated... and isn't simply called totalMatches:

"...the search engine index does not support an accurate estimation of total matches."

Taken from: News Search API V5 paging results with offset and count

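As one might expect, the fewer results you get, the larger the error you are likely to see in the totalEstimatedMatches value. Likewise, the more complex your query (for example, a compound query such as ../search?q=(foo OR bar OR foobar)&..., which is effectively 3 searches packed into 1), the more this value seems to vary.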
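For reference, a compound query like the one above can be URL-encoded together with the `count` and `offset` paging parameters using only the standard library. This is a sketch: the host and path are deliberately left out because the answer elides them (`../search`), and `build_query_string` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_query_string(q: str, count: int = 10, offset: int = 0) -> str:
    # count = page size, offset = number of results to skip.
    # urlencode escapes the parentheses and turns spaces into '+'.
    return urlencode({"q": q, "count": count, "offset": offset})


print(build_query_string("(foo OR bar OR foobar)", offset=20))
# q=%28foo+OR+bar+OR+foobar%29&count=10&offset=20
```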

That said, I managed (at least tentatively) to compensate for this by setting offset == totalEstimatedMatches and writing a simple equality-check loop.

Here is a simple example in Python:

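# set_new_offset_and_call_api() is a func that does what it says: it sets
# offset to the current totalEstimatedMatches, calls the API and returns
# the new totalEstimatedMatches from the response.
while original_totalEstimatedMatches < new_totalEstimatedMatches:
    # Ints are immutable, so plain assignment is enough (int has no .copy()).
    original_totalEstimatedMatches = new_totalEstimatedMatches
    new_totalEstimatedMatches = set_new_offset_and_call_api()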
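To see the loop above terminate, here is a self-contained sketch in which the API call is replaced by a hypothetical stub (`make_stub`) that records the offset it was given and returns canned totalEstimatedMatches values, one per call. No real API is contacted, and `settle_estimate` is an illustrative name, not part of any Bing SDK.

```python
from typing import Callable, Iterator


def settle_estimate(call_api: Callable[[int], int]) -> int:
    """Re-query with offset = latest estimate until the estimate stops growing."""
    original = 0
    new = call_api(original)
    while original < new:
        original = new
        new = call_api(original)  # offset set to the previous estimate
    return original


def make_stub(estimates: list[int], offsets_seen: list[int]) -> Callable[[int], int]:
    it: Iterator[int] = iter(estimates)

    def call_api(offset: int) -> int:
        # Hypothetical stand-in for a real request made with this offset:
        # it records the offset and returns the next canned estimate.
        offsets_seen.append(offset)
        return next(it)

    return call_api


offsets: list[int] = []
print(settle_estimate(make_stub([21, 22, 22], offsets)))  # 22
print(offsets)  # [0, 21, 22]
```

The loop stops as soon as a request at the previous estimate does not report a larger total, which is how the answer uses it to settle on 22 rather than 21.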

Answer 2

Revisiting the API, I came up with a way to paginate effectively without using the "totalEstimatedMatches" return value:

class ApiWorker:
    def __init__(self, q):
        self.q = q
        self.offset = 0
        self.result_hashes = set()
        self.finished = False

    def calc_next_offset(self, resp_urls):
        before_adding = len(self.result_hashes)
        self.result_hashes.update(hash(i) for i in resp_urls)  # <== abuse of set operations.
        after_adding = len(self.result_hashes)
        if after_adding == before_adding:
            # No new URLs: the page was all duplicates (or empty), so stop.
            self.finished = True
        else:
            self.offset += len(resp_urls)

    def page_through_results(self, *args, **kwargs):
        while not self.finished:
            new_resp_urls = ...  # <call_logic> (elided): fetch result URLs for self.q at self.offset
            self.calc_next_offset(new_resp_urls)
            ...  # <save logic> (elided)
        print(f'All unique results for q={self.q} have been obtained.')

This ^ stops paging once a response contains nothing but results that have already been seen (or no results at all).
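A self-contained sketch of the same stop-when-nothing-is-new idea, with the elided call logic replaced by a hypothetical `fake_fetch(offset)` over 22 made-up URLs. The assumption, mimicking the behavior described above, is that requests past the end keep returning the final page rather than an empty list.

```python
from typing import Callable, List

RESULTS = [f"https://example.com/{i}" for i in range(22)]


def fake_fetch(offset: int, count: int = 10) -> List[str]:
    # Hypothetical stand-in for the API: past the end it keeps
    # returning the last page instead of an empty list.
    start = min(offset, len(RESULTS) - count)
    return RESULTS[start:start + count]


def page_until_no_new(fetch: Callable[[int], List[str]]) -> List[str]:
    """Collect unique URLs in order, stopping when a page adds nothing new."""
    seen: set[str] = set()
    ordered: List[str] = []
    offset = 0
    while True:
        urls = fetch(offset)
        added = 0
        for url in urls:
            if url not in seen:
                seen.add(url)
                ordered.append(url)
                added += 1
        if added == 0:
            break  # all duplicates, or an empty page
        offset += len(urls)
    return ordered


print(len(page_until_no_new(fake_fetch)))  # 22
```

Keeping the URLs themselves instead of their `hash()` values costs a little memory but rules out false "duplicate" matches from hash collisions.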

totalEstimatedMatches behavior with the Microsoft (Bing) Cognitive Search API (v5)
