Question

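Is there a way to get the entire dataset out of an App Engine search index? The search below takes an integer limit in QueryOptions, and that limit always has to be present.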

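I can't figure out whether there is some special flag that bypasses this limit and returns the whole result set. If the query has no QueryOptions, the result set is somehow limited to 20.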

from google.appengine.api import search

_INDEX = search.Index(name=constants.SEARCH_INDEX)
_INDEX.search(query=search.Query(
  query,
  options=search.QueryOptions(
      limit=limit,
      sort_options=search.SortOptions(...))))

Any ideas?

Answer 1

If you really want every document in the index, rather than every result of a query, you can adapt the "delete all" example: https://cloud.google.com/appengine/docs/python/search/#Python_Deleting_documents_from_an_index

from google.appengine.api import search

def delete_all_in_index(index_name):
    """Delete all the docs in the given index."""
    doc_index = search.Index(name=index_name)

    # looping because get_range by default returns up to 100 documents at a time
    while True:
        # Get a list of documents populating only the doc_id field and extract the ids.
        document_ids = [document.doc_id
                        for document in doc_index.get_range(ids_only=True)]
        if not document_ids:
            break
        # Delete the documents for the given ids from the Index.
        doc_index.delete(document_ids)

So you would end up with something like this:

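start_id = None
while True:
    # get_range returns at most 100 documents per call, so pass start_id
    # to move past the documents already seen on earlier iterations.
    document_ids = [document.doc_id
                    for document in doc_index.get_range(
                        start_id=start_id, include_start_object=False,
                        ids_only=True)]
    if not document_ids:
        break
    # Then do something with each document
    for doc_id in document_ids:
        document = doc_index.get(doc_id)
    start_id = document_ids[-1]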

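Rather than fetching the IDs and then fetching each document by its ID, you may prefer to fetch the full documents directly in the list comprehension, but you get the idea.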
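A read-only loop over get_range has to move its starting point forward on each pass, whereas the delete loop does not: deleting shrinks the index, so the next call naturally returns the next batch. The minimal sketch below shows the paging pattern without App Engine. `FakeIndex` and `iter_all_ids` are hypothetical names; `FakeIndex.get_range` only mimics the behaviour assumed here (documents returned in doc_id order, at most `limit` per call, optionally starting just after `start_id`) and is not the real API.

```python
from bisect import bisect_left, bisect_right


class FakeIndex:
    """Hypothetical in-memory stand-in for a search index (not the App Engine API)."""

    def __init__(self, doc_ids):
        # Keep ids sorted so ranges come back in doc_id order.
        self._ids = sorted(doc_ids)

    def get_range(self, start_id=None, include_start_object=True, limit=100):
        # Locate the start of the batch: at start_id, or just after it.
        if start_id is None:
            begin = 0
        elif include_start_object:
            begin = bisect_left(self._ids, start_id)
        else:
            begin = bisect_right(self._ids, start_id)
        return self._ids[begin:begin + limit]


def iter_all_ids(index, batch_size=100):
    """Yield every doc_id by paging with start_id until an empty batch comes back."""
    start_id = None
    while True:
        batch = index.get_range(start_id=start_id,
                                include_start_object=False,
                                limit=batch_size)
        if not batch:
            return
        yield from batch
        # Resume after the last id seen; without this the loop never advances.
        start_id = batch[-1]
```

For example, `list(iter_all_ids(FakeIndex(["a", "b", "c"]), batch_size=2))` walks two non-empty batches and returns `["a", "b", "c"]`.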

Answer 2

First, if you look at the QueryOptions constructor, it answers your question of why you get 20 results: the default for limit is 20.

def __init__(self, limit=20, number_found_accuracy=None, cursor=None,
               offset=None, sort_options=None, returned_fields=None,
               ids_only=False, snippeted_fields=None,
               returned_expressions=None):

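I believe the API does this to avoid fetching results unnecessarily. If you need to fetch more results in response to a user action, rather than always fetching all of them, you should use an offset. See this.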
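The page arithmetic behind offset-based paging can be checked on its own. This is a small sketch with hypothetical helper names (`page_count`, `page_offset`); the page count rounds up so that a partial last page still counts as a page, and page numbers are assumed to be zero-based.

```python
def page_count(found_count, page_size):
    """Number of pages needed to show found_count results, rounding up."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Integer ceiling division: a partial last page still counts as a page.
    return (found_count + page_size - 1) // page_size


def page_offset(page_number, page_size):
    """Offset of the first result on a zero-based page_number."""
    return page_number * page_size
```

For 25 results with 10 per page, `page_count(25, 10)` is 3, and the third page (`page_number=2`) starts at offset `page_offset(2, 10)`, which is 20.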

from google.appengine.api import search
...
# get the first set of results
page_size = 10
results = index.search(search.Query(query_string='some stuff',
    options=search.QueryOptions(limit=page_size)))

# calculate the number of pages (round up so a partial last page counts)
pages = (results.found_count + page_size - 1) // page_size

# the user chooses a zero-based page number (ith), and hence an offset into the results
next_page = ith * page_size

# get the search results for that page
results = index.search(search.Query(query_string='some stuff',
    options=search.QueryOptions(limit=page_size, offset=next_page)))
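If the goal really is the whole result set, an alternative to computing offsets is to follow the cursor (the `cursor` parameter in the constructor above) until the service reports there is no next page. In the App Engine Python API this is, as far as I know, done by passing `cursor=search.Cursor()` in QueryOptions for the first query and then re-querying with `results.cursor` until it is `None`. The sketch below shows only that loop structure and runs without App Engine: `fetch_all`, `make_list_fetcher` and the `fetch_page` callable are hypothetical, and the in-memory backend is illustrative only.

```python
from typing import Callable, List, Optional, Tuple

# A page fetcher takes a cursor (None for the first page) and returns
# (results_on_this_page, next_cursor); next_cursor is None on the last page.
PageFetcher = Callable[[Optional[int]], Tuple[List[str], Optional[int]]]


def fetch_all(fetch_page: PageFetcher) -> List[str]:
    """Collect every result by following cursors until none is returned."""
    results: List[str] = []
    cursor: Optional[int] = None
    while True:
        page, cursor = fetch_page(cursor)
        results.extend(page)
        if cursor is None:
            return results


def make_list_fetcher(items: List[str], page_size: int) -> PageFetcher:
    """Hypothetical in-memory backend: the cursor is just the next start position."""
    def fetch_page(cursor: Optional[int]) -> Tuple[List[str], Optional[int]]:
        start = cursor or 0
        end = start + page_size
        # Only hand out a cursor if there is something left to fetch.
        next_cursor = end if end < len(items) else None
        return items[start:end], next_cursor
    return fetch_page
```

For example, `fetch_all(make_list_fetcher(["r%d" % i for i in range(25)], 10))` makes three page calls and returns all 25 items. Keep in mind that draining everything this way fetches every result, which is exactly the cost the default limit is meant to avoid.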
