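I'm building a web scraper and want to generate all the URLs I need to request.
The URL takes three parameters: date, facility_id, and sport_id.
I need to combine every date in a list with every row of a dictionary of facilities and sports, where a row is the facility_id and sport_id at the same index.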
dates = ['2020-10-21', '2020-10-22']
db = {'facility_id': [184, 4, 3, 3], 'sport_id': [1, 2, 1, 5]}
The resulting URLs should look like this (this is the first of eight results: 2 dates × 4 rows in the dictionary):
https://www.website.se/subsite?date=2020-10-21&facility_id=184&sport_id=1
I tried nested for loops but got stuck.
url = 'https://www.website.se/subsite?'
dates = ['2020-10-21', '2020-10-22']
db = {'facility_id': [184, 4, 3, 3], 'sport_id': [1, 2, 1, 5]}
for date in dates:
    url = url + date + ','
    for col in db:
        url = url + col + ','
        for values in db[col]:
            url = url + str(values) + ','
print(url)
Are nested for loops the way to go, or is there a better approach?
https://www.website.se/subsite?date=2020-10-21&facility_id=184&sport_id=1
https://www.website.se/subsite?date=2020-10-21&facility_id=4&sport_id=2
https://www.website.se/subsite?date=2020-10-21&facility_id=3&sport_id=1
https://www.website.se/subsite?date=2020-10-21&facility_id=3&sport_id=5
https://www.website.se/subsite?date=2020-10-22&facility_id=184&sport_id=1
https://www.website.se/subsite?date=2020-10-22&facility_id=4&sport_id=2
https://www.website.se/subsite?date=2020-10-22&facility_id=3&sport_id=1
https://www.website.se/subsite?date=2020-10-22&facility_id=3&sport_id=5
Answer 0 (score: 4)
You can use itertools.product:
from itertools import product
dates = ['2020-10-21', '2020-10-22']
db = {'facility_id': [184, 4, 3, 3], 'sport_id': [1, 2, 1, 5]}
for d, (f, s) in product(dates, zip(db['facility_id'], db['sport_id'])):
    print('https://www.website.se/subsite?date={}&facility_id={}&sport_id={}'.format(d, f, s))
This prints:
https://www.website.se/subsite?date=2020-10-21&facility_id=184&sport_id=1
https://www.website.se/subsite?date=2020-10-21&facility_id=4&sport_id=2
https://www.website.se/subsite?date=2020-10-21&facility_id=3&sport_id=1
https://www.website.se/subsite?date=2020-10-21&facility_id=3&sport_id=5
https://www.website.se/subsite?date=2020-10-22&facility_id=184&sport_id=1
https://www.website.se/subsite?date=2020-10-22&facility_id=4&sport_id=2
https://www.website.se/subsite?date=2020-10-22&facility_id=3&sport_id=1
https://www.website.se/subsite?date=2020-10-22&facility_id=3&sport_id=5
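If a parameter value could ever contain characters that need escaping, the query string can also be built with urllib.parse.urlencode from the standard library rather than by manual formatting. A minimal sketch, assuming the same dates and db as in the question; the names BASE_URL, rows, and urls are illustrative and not from the original post:

```python
from itertools import product
from urllib.parse import urlencode

# Illustrative constant name; the URL itself is the one from the question.
BASE_URL = 'https://www.website.se/subsite'

dates = ['2020-10-21', '2020-10-22']
db = {'facility_id': [184, 4, 3, 3], 'sport_id': [1, 2, 1, 5]}

# zip pairs each facility_id with the sport_id in the same row;
# product then combines every date with every (facility, sport) pair.
rows = zip(db['facility_id'], db['sport_id'])
urls = [
    BASE_URL + '?' + urlencode({'date': d, 'facility_id': f, 'sport_id': s})
    for d, (f, s) in product(dates, rows)
]

for url in urls:
    print(url)
```

For the values in the question this yields the same eight URLs, since none of them contain characters that urlencode would escape.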
Answer 1 (score: 1)
Try this:
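dates = ['2020-10-21', '2020-10-22']
db = {'facility_id': [184, 4, 3, 3], 'sport_id': [1, 2, 1, 5]}
for date in dates:
    # zip pairs the facility_id and sport_id that share an index
    for fac_id, sport_id in zip(db['facility_id'], db['sport_id']):
        res = f'https://www.website.se/subsite?date={date}&facility_id={fac_id}&sport_id={sport_id}'
        print(res)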
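Since the goal is a scraper, it may be more convenient to collect the URLs than to print them. Note also that zip silently stops at the shorter input, so a length mismatch between the two lists would drop rows without any error. A sketch of the same loop wrapped in a generator with an explicit length check; iter_urls is a hypothetical helper name, not something from the question:

```python
def iter_urls(dates, db):
    """Yield one URL per (date, row) combination (hypothetical helper)."""
    facilities = db['facility_id']
    sports = db['sport_id']
    # zip would silently truncate to the shorter list, so check first.
    if len(facilities) != len(sports):
        raise ValueError('facility_id and sport_id must have the same length')
    for date in dates:
        for fac_id, sport_id in zip(facilities, sports):
            yield (f'https://www.website.se/subsite?date={date}'
                   f'&facility_id={fac_id}&sport_id={sport_id}')


dates = ['2020-10-21', '2020-10-22']
db = {'facility_id': [184, 4, 3, 3], 'sport_id': [1, 2, 1, 5]}

urls = list(iter_urls(dates, db))
print(len(urls))  # 8
```

On Python 3.10 or later, zip(facilities, sports, strict=True) raises ValueError on a mismatch and could replace the manual check.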
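Answer 2 (score: 0)
Your current code appends each value followed by a comma (,) to one ever-growing url string instead of building a query string. Here is a solution: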
dates = ['2020-10-21', '2020-10-22']
db = {'facility_id': [184, 4, 3, 3], 'sport_id': [1, 2, 1, 5]}
for date in dates:
    # Pair each facility_id with the sport_id at the same index
    for facility_id, sport_id in zip(db['facility_id'], db['sport_id']):
        url = f"https://www.website.se/subsite?date={date}&facility_id={facility_id}&sport_id={sport_id}"
        print(url)
I'm not sure whether there is a way to avoid the nested loops.
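On the nested-loop question: one option is itertools.product, which flattens the loops into a single one, and the column names do not have to be hard-coded either. A sketch, assuming every list in db has the same length and each dictionary key is used directly as the query parameter name; rows, query, and urls are illustrative names:

```python
from itertools import product

dates = ['2020-10-21', '2020-10-22']
db = {'facility_id': [184, 4, 3, 3], 'sport_id': [1, 2, 1, 5]}

# zip(*db.values()) turns the column lists into rows: (184, 1), (4, 2), ...
rows = list(zip(*db.values()))

urls = []
# product replaces the two nested loops with a single one.
for date, row in product(dates, rows):
    # Pair each column name with its value in this row.
    query = '&'.join(f'{key}={value}' for key, value in zip(db, row))
    urls.append(f'https://www.website.se/subsite?date={date}&{query}')

print(len(urls))  # 8
```

Adding another key to db would then add another parameter to every URL without changing the loop.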