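I need to update the query part of a URL (page_index=). I tried a few approaches, one of which is shown below, but I've hit a wall. I'm new to Python and looking for guidance. The page index ranges from 0 to 511 (new pages are added daily), and I need to update the URL to loop through all of the indexes. The index always starts at 0.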
import urlparse
url = 'https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews?start_date=2016-1-01&end_date=2017-8-26&page_index=0&countries=US'
parts = urlparse.urlparse(url)
parts = parts._replace(query = page_index [2])
parts.geturl()
I get this error:
TypeError Traceback (most recent call last)
<ipython-input-29-066332f37bb3> in <module>()
3 url = 'https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews?start_date=2016-1-01&end_date=2017-8-26&page_index=0&countries=US'
4 parts = urlparse.urlparse(url)
----> 5 parts = parts._replace(query = page_index [2])
6 parts.geturl()
7
TypeError: 'function' object has no attribute '__getitem__'
Answer 0 (score: 1)
The simplest way is to modify the URL string directly:
base_url = "https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews?start_date=2016-1-01&end_date=2017-8-26&page_index={}&countries=US"
for pi in range(512):
    this_url = base_url.format(pi)
    # now fetch this_url
A slightly more involved but more customizable way is to pass the parameters as a dict:
import requests
url = "https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews"
params = {
    "start_date": "2016-1-01",
    "end_date": "2017-8-26",
    "countries": "US",
}
for pi in range(512):
    # requests builds and encodes the query string from the dict
    params["page_index"] = pi
    res = requests.get(url, params=params)
    if res.ok:
        html = res.text
Answer 1 (score: 1)
You have to extract the query component from the result of urlparse(), modify it, and then rebuild a new URL, like this:
pr = urlparse.urlparse(url)
parts = pr.query.split('&')
parts[2] = 'page_index=2'
new_url = urlparse.urlunparse([pr.scheme, pr.netloc, pr.path, pr.params, "&".join(parts), pr.fragment])
To step through all the page numbers, repeat the last two lines in a loop over whatever range of page numbers you need.
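Note that parts[2] relies on page_index being the third parameter in the query string. As a rough sketch of the same split-modify-rebuild idea that looks the parameter up by name instead, something like the following might work. The helper name with_page_index is made up for illustration, and the try/except import is only an assumption so the snippet runs on both Python 2 (urlparse module) and Python 3 (urllib.parse):

```python
try:
    # Python 3
    from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode
except ImportError:
    # Python 2, as used in the question
    from urlparse import urlparse, urlunparse, parse_qsl
    from urllib import urlencode


def with_page_index(url, page_index):
    """Hypothetical helper: return url with its page_index parameter replaced."""
    pr = urlparse(url)
    # parse_qsl keeps the parameters as an ordered list of (name, value) pairs
    pairs = parse_qsl(pr.query, keep_blank_values=True)
    updated = [(k, str(page_index)) if k == "page_index" else (k, v)
               for k, v in pairs]
    # If the URL had no page_index at all, append one
    if not any(k == "page_index" for k, _ in pairs):
        updated.append(("page_index", str(page_index)))
    # ParseResult is a namedtuple, so _replace returns a modified copy
    return urlunparse(pr._replace(query=urlencode(updated)))


url = ("https://api.appannie.com/v1.2/apps/ios/app/331177714/reviews"
       "?start_date=2016-1-01&end_date=2017-8-26&page_index=0&countries=US")

# One URL per page index, 0 through 511
urls = [with_page_index(url, i) for i in range(512)]
```

Because urlencode re-encodes every value, a query containing unusual characters may come back escaped differently from the input, though it should decode to the same values.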