I have a RESTful API with a collection called people, which can be called like this:
http://example.com/people?lastname=smith
It returns a JSON response like the following:
{
  "page": 0,
  "next": 1,
  "total": 5000000,
  "people": [
    {
      "firstname": "John",
      "lastname": "Smith",
      "age": 32
    },
    {
      "firstname": "Adam",
      "lastname": "Smith",
      "age": 84
    },
    ...
  ]
}
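I want to write a Python generator that yields each person from the response. When it reaches the last person on a page and there is a next page, it should request that page with http://example.com/people?lastname=smith&page=1 and continue iterating over the results seamlessly. The resulting call would be as simple as: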
client = PeopleClient("http://example.com/people")
smiths = client.get_people_by_last_name("smith")
I could then iterate over every "Smith" in smiths, through all 5 million if necessary.
Any ideas on how to achieve this, or whether it is even possible?
Using @ali-afshar's answer as a guide, this implementation should work for the hypothetical REST API:
import requests


class PeopleClient:
    def __init__(self, url):
        self._url = url

    def _get_people(self, **kwargs):
        # Decode the JSON body so callers get a dict, not a Response object.
        response = requests.get(self._url, params=kwargs)
        response.raise_for_status()
        return response.json()

    def get_people_by_last_name(self, lastname):
        current_page = 0
        while current_page >= 0:
            result = self._get_people(lastname=lastname, page=current_page)
            for person in result.get("people", []):
                yield person
            # A missing "next" key ends the loop.
            current_page = result.get("next", -1)
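One edge case worth noting: the loop above stops only when the "next" key is absent. If the API instead sends "next": null on the last page (an assumption, since the question does not say), comparing None with 0 raises a TypeError in Python 3. The sketch below treats a missing or null "next" the same way. The names iter_people, fake_fetch and FAKE_PAGES are hypothetical, and the in-memory pages stand in for the HTTP call so the paging logic can be tried without a server.

```python
def iter_people(fetch_page, lastname):
    """Yield people page by page; fetch_page(lastname, page) returns a dict."""
    page = 0
    while page is not None:
        result = fetch_page(lastname, page)
        yield from result.get("people", [])
        # A missing or null "next" both end the iteration.
        page = result.get("next")


# Hypothetical in-memory stand-in for the HTTP request: two pages, then stop.
FAKE_PAGES = {
    0: {"page": 0, "next": 1,
        "people": [{"firstname": "John", "lastname": "Smith"}]},
    1: {"page": 1, "next": None,
        "people": [{"firstname": "Adam", "lastname": "Smith"}]},
}


def fake_fetch(lastname, page):
    return FAKE_PAGES[page]


names = [p["firstname"] for p in iter_people(fake_fetch, "smith")]
```

Passing the fetch function in as an argument is just one way to keep the generator testable; in the client class it would be the method that performs the real request.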
Answer 0 (score: 7)
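Without writing the code for you: you want to take advantage of Python's generators rather than treating the whole collection as a list. That way you can start consuming results immediately, and a paging request is made only when you reach the end of a page.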
for person in PeopleClient("http://ex..").get_people_by_last_name("smith"):
    pass  # Do something with the person
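Second, the function that performs the actual request should take a page parameter that you can increment, so that a wrapping generator can call it.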
def get_people_page(name, page):
    # Perform the HTTP request, using page=page
    raise NotImplementedError
The generator itself would then look something like this:
def get_all_people(name):
    page = 0
    has_more = True
    while has_more:
        people = get_people_page(name, page)
        for person in people:
            yield person
        page += 1
        # Calculate has_more by whether you have a next link,
        # or whether the result set is empty,
        # or whether you get an error.
        # Here (as one option): stop once a page comes back empty.
        has_more = bool(people)
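To illustrate the point about only paging on demand, here is a small sketch that records which pages get requested. The data, PAGE_SIZE and the requested_pages list are made up for the demonstration, and get_people_page is a stand-in that slices a list instead of calling the API. It uses the "empty result set" stopping rule, which is only one of the options mentioned above.

```python
from itertools import islice

PAGE_SIZE = 2
NAMES = ["John", "Adam", "Mary", "Ann", "Bob"]  # made-up data
requested_pages = []  # records every page that is actually fetched


def get_people_page(name, page):
    """Hypothetical stand-in for the HTTP request; returns one page as a list."""
    requested_pages.append(page)
    start = page * PAGE_SIZE
    return NAMES[start:start + PAGE_SIZE]


def get_all_people(name):
    page = 0
    while True:
        people = get_people_page(name, page)
        if not people:  # an empty page marks the end
            return
        yield from people
        page += 1


# Take only the first three people; later pages are never requested.
first_three = list(islice(get_all_people("smith"), 3))
```

After this runs, requested_pages contains only pages 0 and 1, even though the data spans three pages.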
Answer 1 (score: 2)
Here is my generator solution. I think it is a touch cleaner, and when a per_page value is specified it saves you an extra, unnecessary request.
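def get_all(self, per_page=100):
    page = 0
    while True:
        items = self.api.get(per_page=per_page, page=page)
        for item in items:
            yield item
        # A short page means there is nothing left to fetch.
        if len(items) < per_page:
            break
        page += 1

# Called from another method of the same class:
all_items = list(self.get_all())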
self.api.get() must accept page and per_page parameters.
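As a rough check of the "saves a request" claim, the sketch below counts calls against a fake API. FakeApi is hypothetical and simply slices a list, and get_all takes the API object as an argument here so the example is self-contained. It also assumes the server honours per_page exactly; if the server caps the page size, the short-page test would stop too early.

```python
class FakeApi:
    """Hypothetical stand-in for self.api; serves slices of a fixed list."""

    def __init__(self, items):
        self.items = items
        self.calls = 0  # number of get() calls made so far

    def get(self, per_page, page):
        self.calls += 1
        start = page * per_page
        return self.items[start:start + per_page]


def get_all(api, per_page=100):
    page = 0
    while True:
        items = api.get(per_page=per_page, page=page)
        yield from items
        if len(items) < per_page:  # short page: nothing left to fetch
            break
        page += 1


api = FakeApi(list(range(5)))
all_items = list(get_all(api, per_page=2))
```

With five items and per_page=2 this makes three requests, where a loop that waits for an empty page would make four. When the total is an exact multiple of per_page, one final request returning an empty page is still needed.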