Is there a way to retrieve the entire dataset from an App Engine search index? The search below takes a QueryOptions with an integer limit, and the limit always has to be present. I can't tell whether there is some special flag that bypasses this limit and returns the whole result set. If the query has no QueryOptions, the result set is somehow capped at 20.
_INDEX = search.Index(name=constants.SEARCH_INDEX)
_INDEX.search(query=search.Query(
    query,
    options=search.QueryOptions(
        limit=limit,
        sort_options=search.SortOptions(...))))
Any ideas?
Answer 0 (score: 1)
You can adapt the delete-all sample if you really want every document in the index, rather than every result of a query: https://cloud.google.com/appengine/docs/python/search/#Python_Deleting_documents_from_an_index
from google.appengine.api import search

def delete_all_in_index(index_name):
    """Delete all the docs in the given index."""
    doc_index = search.Index(name=index_name)

    # looping because get_range by default returns up to 100 documents at a time
    while True:
        # Get a list of documents populating only the doc_id field and extract the ids.
        document_ids = [document.doc_id
                        for document in doc_index.get_range(ids_only=True)]
        if not document_ids:
            break
        # Delete the documents for the given ids from the Index.
        doc_index.delete(document_ids)
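The shape of that helper (fetch a batch, stop when the batch comes back empty, act on the batch) can be exercised without App Engine. Below is a minimal sketch where the hypothetical in-memory `FakeIndex` class stands in for `search.Index`; its `get_range` and `delete` methods only mimic the real API's batching behaviour:

```python
BATCH = 100  # get_range returns up to 100 documents per call, like the real API

class FakeIndex:
    """Hypothetical in-memory stand-in for search.Index (illustration only)."""
    def __init__(self, doc_ids):
        self._ids = list(doc_ids)

    def get_range(self, ids_only=True, limit=BATCH):
        # Return up to `limit` ids from the front of the index.
        return self._ids[:limit]

    def delete(self, document_ids):
        doomed = set(document_ids)
        self._ids = [i for i in self._ids if i not in doomed]

def delete_all_in_index(doc_index):
    """Same loop as the App Engine sample: batch, delete, repeat until empty."""
    deleted = 0
    while True:
        document_ids = doc_index.get_range(ids_only=True)
        if not document_ids:
            break
        doc_index.delete(document_ids)
        deleted += len(document_ids)
    return deleted

index = FakeIndex('doc-%d' % n for n in range(250))
print(delete_all_in_index(index))  # three batches: 100 + 100 + 50 = 250
```

The loop terminates only because each `delete` shrinks the index; that detail matters for the read-only variant below.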
So you would end up with something like this:
start_id = None
while True:
    document_ids = [document.doc_id
                    for document in doc_index.get_range(
                        start_id=start_id,
                        include_start_object=(start_id is None),
                        ids_only=True)]
    if not document_ids:
        break
    # Do something with each document.
    for doc_id in document_ids:
        document = doc_index.get(doc_id)
    # Advance past the last id seen; without this, get_range would
    # return the same first batch forever (nothing is being deleted here).
    start_id = document_ids[-1]
You would probably want to fetch the documents in the list comprehension instead of fetching the ids and then looking each document up by id, but you get the idea.
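The paging logic in that read-only loop can be checked in isolation. The sketch below uses another hypothetical `FakeIndex` whose `get_range` honours `start_id` and `include_start_object` the way the loop assumes; only the walking pattern, not the App Engine API itself, is being demonstrated:

```python
class FakeIndex:
    """Hypothetical stand-in for search.Index; get_range pages by start_id."""
    def __init__(self, doc_ids):
        self._ids = sorted(doc_ids)

    def get_range(self, start_id=None, include_start_object=True,
                  ids_only=True, limit=100):
        ids = self._ids
        if start_id is not None:
            pos = ids.index(start_id)
            ids = ids[pos:] if include_start_object else ids[pos + 1:]
        return ids[:limit]

def all_doc_ids(doc_index):
    """Walk the whole index by resuming after the last id of each batch."""
    seen = []
    start_id = None
    while True:
        batch = doc_index.get_range(
            start_id=start_id,
            include_start_object=(start_id is None),
            ids_only=True)
        if not batch:
            break
        seen.extend(batch)
        start_id = batch[-1]  # next call resumes after this id
    return seen

index = FakeIndex('doc-%03d' % n for n in range(250))
print(len(all_doc_ids(index)))  # 250, gathered in batches of 100, 100, 50
```

Passing `include_start_object=False` on every call after the first is what prevents the boundary document from being returned twice.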
Answer 1 (score: 0)
First, if you look at the constructor of QueryOptions, it answers your question about why it returns 20 results:
def __init__(self, limit=20, number_found_accuracy=None, cursor=None,
             offset=None, sort_options=None, returned_fields=None,
             ids_only=False, snippeted_fields=None,
             returned_expressions=None):
I think the reason the API does this is to avoid fetching results unnecessarily. You should use an offset if you need to fetch more results in response to a user action, rather than always fetching everything. See this.
from google.appengine.api import search
...
# get the first set of results
page_size = 10
results = index.search(search.Query(query_string='some stuff',
    options=search.QueryOptions(limit=page_size)))

# calculate pages
pages = results.found_count / page_size

# user chooses page and hence an offset into results
next_page = ith * page_size

# get the search results for that page
results = index.search(search.Query(query_string='some stuff',
    options=search.QueryOptions(limit=page_size, offset=next_page)))