I am following the Python Client Libraries for the Google BigQuery API
- https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery/usage.html#jobs > Querying data (asynchronously)
When retrieving the results, executing the code:
rows, total_count, token = query.fetch_data()  # API request
always raises ValueError: too many values to unpack (expected 3).
(By the way, I think that is a typo; it should be results.fetch_data()!)
However, the following code works fine:
results = job.results()
rows = results.fetch_data()
tbl = [x for x in rows]
All rows of the table are returned at once in tbl (as a list of tuples), over 225K rows!
Can anyone explain why I get this error, or what is wrong in the documentation?
And how can I still retrieve the results in batches (iterating page by page)?
Many thanks in advance!
Answer 0: (score: 2)
Some time ago I opened this issue asking for the documentation to be updated, but as you can see from the answers there, it still needs an official release before the change is published.
See the code base itself for better docstrings (in this case, specifically the Iterator class):
"""Iterators for paging through API responses.
These iterators simplify the process of paging through API responses
where the response is a list of results with a ``nextPageToken``.
To make an iterator work, you'll need to provide a way to convert a JSON
item returned from the API into the object of your choice (via
``item_to_value``). You also may need to specify a custom ``items_key`` so
that a given response (containing a page of results) can be parsed into an
iterable page of the actual objects you want. You then can use this to get
**all** the results from a resource::
>>> def item_to_value(iterator, item):
...     my_item = MyItemClass(iterator.client, other_arg=True)
...     my_item._set_properties(item)
...     return my_item
...
>>> iterator = Iterator(..., items_key='blocks',
...                     item_to_value=item_to_value)
>>> list(iterator)  # Convert to a list (consumes all values).
Or you can walk your way through items and call off the search early if
you find what you're looking for (resulting in possibly fewer
requests)::
>>> for my_item in Iterator(...):
...     print(my_item.name)
...     if not my_item.is_valid:
...         break
At any point, you may check the number of items consumed by referencing the
``num_results`` property of the iterator::
>>> my_iterator = Iterator(...)
>>> for my_item in my_iterator:
...     if my_iterator.num_results >= 10:
...         break
When iterating, not every new item will send a request to the server.
To iterate based on each page of items (where a page corresponds to
a request)::
>>> iterator = Iterator(...)
>>> for page in iterator.pages:
...     print('=' * 20)
...     print('    Page number: %d' % (iterator.page_number,))
...     print('  Items in page: %d' % (page.num_items,))
...     print('     First item: %r' % (next(page),))
...     print('Items remaining: %d' % (page.remaining,))
...     print('Next page token: %s' % (iterator.next_page_token,))
====================
    Page number: 1
  Items in page: 1
     First item: <MyItemClass at 0x7f1d3cccf690>
Items remaining: 0
Next page token: eav1OzQB0OM8rLdGXOEsyQWSG
====================
    Page number: 2
  Items in page: 19
     First item: <MyItemClass at 0x7f1d3cccffd0>
Items remaining: 18
Next page token: None
To consume an entire page::
>>> list(page)
[
    <MyItemClass at 0x7fd64a098ad0>,
    <MyItemClass at 0x7fd64a098ed0>,
    <MyItemClass at 0x7fd64a098e90>,
]
Answer 1: (score: 0)
Yes, you are right about the documentation. There is a typo - it should be results.fetch_data(), not query.fetch_data(). With that fixed, the documented loop looks like this:
results = job.results()
rows, total_count, token = results.fetch_data()  # API request
while True:
    do_something_with(rows)
    if token is None:
        break
    rows, total_count, token = results.fetch_data(page_token=token)  # API request here
For large data sets, we run the query once per hour to fetch the data needed for our daily work.
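If each hourly run should pick up where the previous one stopped, the iterator's next_page_token can be persisted between runs, since fetch_data() accepts a page_token argument as shown above. A hypothetical sketch (save_token and load_token are placeholder helpers, not library functions):

token = load_token()  # None on the very first run
iterator = results.fetch_data(page_token=token)  # resume from the saved position
pages = iterator.pages
for _ in range(10):  # process at most 10 pages per run
    try:
        page = next(pages)
    except StopIteration:
        break
    do_something_with(list(page))
save_token(iterator.next_page_token)  # None once every row has been consumed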