I honestly didn't know what to call this, sorry for the vague title.
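My program checks whether an element exists on several paths of a website. It has a base URL, and the different paths of the domain to check are appended to it. Those paths are stored in a JSON file (name.json).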
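Right now the program prints 1 if the element is found and 2 otherwise. Instead of 1 or 2, I want it to print the URL. My problem is that fullurl is assigned before the final for loop. When I try to print fullurl there, I get the URL for the last id in the JSON file printed over and over (because the individual values were not saved), instead of each unique URL.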
import json
import grequests
from bs4 import BeautifulSoup

idlist = json.loads(open('name.json').read())
baseurl = 'https://steamcommunity.com/id/'

complete_urls = []
for uid in idlist:
    fullurl = baseurl + uid
    complete_urls.append(fullurl)

rs = (grequests.get(fullurl) for fullurl in complete_urls)
resp = grequests.map(rs)

for r in resp:
    soup = BeautifulSoup(r.text, 'lxml')
    if soup.find('span', class_='actual_persona_name'):
        print('1')
    else:
        print('2')
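The symptom comes from how Python handles loop variables. After a `for` loop finishes, its target variable keeps the value from the final iteration, and the generator expression on the `rs` line uses its own separate `fullurl` that does not leak out. A minimal offline sketch with plain strings reproduces this; the function name `build_urls` and the ids are hypothetical.

```python
BASE_URL = 'https://steamcommunity.com/id/'


def build_urls(ids):
    """Build one full URL per id, like the question's first loop (assumes ids is non-empty)."""
    complete_urls = []
    for uid in ids:
        fullurl = BASE_URL + uid
        complete_urls.append(fullurl)
    # After the loop, fullurl still holds only the URL built from the LAST id.
    return complete_urls, fullurl


urls, leftover = build_urls(['alice', 'bob', 'carol'])  # hypothetical ids
print(urls)      # every URL is kept in the list
print(leftover)  # only the last URL survives in the loop variable
```

So inside the response loop, each response's URL has to come from `complete_urls` (or from the response itself), not from `fullurl`.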
Answer 0 (score: 0)
Since grequests.map returns the responses in the same order as the requests (see this), you can use enumerate to match each request's full URL with its response.
import json
import grequests
from bs4 import BeautifulSoup

idlist = json.loads(open('name.json').read())
baseurl = 'https://steamcommunity.com/id/'

complete_urls = []
for uid in idlist:
    fullurl = baseurl + uid
    complete_urls.append(fullurl)

rs = (grequests.get(fullurl) for fullurl in complete_urls)
resp = grequests.map(rs)

for index, r in enumerate(resp):  # use enumerate to get the index of each response
    print(complete_urls[index])  # the same index locates the matching URL in complete_urls
    if r is None:  # grequests.map() puts None in place of a request that failed
        print('request failed')
        continue
    soup = BeautifulSoup(r.text, 'lxml')
    if soup.find('span', class_='actual_persona_name'):
        print('1')
    else:
        print('2')
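Because the responses come back in request order, `zip` can pair each URL with its response directly instead of indexing. The sketch below assumes the page bodies are already available as strings (stand-ins for `r.text`) so it runs offline. It also swaps BeautifulSoup for the standard library's `html.parser` to look for `<span class="actual_persona_name">`. The names `PersonaNameFinder`, `has_persona_name` and `check_pages` are hypothetical.

```python
from html.parser import HTMLParser


class PersonaNameFinder(HTMLParser):
    """Sets self.found when a <span> with class 'actual_persona_name' appears."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != 'span':
            return
        for name, value in attrs:
            # class may hold several space-separated names; value is None if the attribute is bare.
            if name == 'class' and value and 'actual_persona_name' in value.split():
                self.found = True


def has_persona_name(html):
    finder = PersonaNameFinder()
    finder.feed(html)
    finder.close()
    return finder.found


def check_pages(urls, pages):
    """Pair each URL with its page body (same order) and record whether the span exists."""
    return [(url, has_persona_name(html)) for url, html in zip(urls, pages)]


urls = ['https://steamcommunity.com/id/alice', 'https://steamcommunity.com/id/bob']
pages = ['<span class="actual_persona_name">Alice</span>', '<div>No profile</div>']
for url, found in check_pages(urls, pages):
    print(url, 'found' if found else 'not found')
```

`zip` stops at the shorter input, but `grequests.map` returns one entry per request, so with real responses the two lists have the same length.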
Answer 1 (score: 0)
If I understand you correctly, you can just use print(r.url) instead of the numbers, because the URL is stored in each response object.
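# resp is the list returned by grequests.map(rs) in the question's code
for r in resp:
    soup = BeautifulSoup(r.text, 'lxml')
    if soup.find('span', class_='actual_persona_name'):
        print(r.url)  # element found
    else:
        print(r.url)  # element not found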
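There are two caveats. In requests, `r.url` is the final URL after any redirects, so it can differ from the URL that was requested. Also, `grequests.map` puts `None` in the list for a request that failed, and then there is no `r.url` to print. A sketch that keeps the requested URL alongside each response handles both cases. The `SimpleNamespace` objects are hypothetical stand-ins for response objects with only `url` and `text` attributes, and `describe` and `contains_marker` are illustrative names.

```python
from types import SimpleNamespace


def describe(requested_url, response, is_found):
    """Return one output line for a requested URL and its (possibly None) response."""
    if response is None:
        # grequests.map() yields None for a request that raised an exception.
        return f'{requested_url} request failed'
    status = 'found' if is_found(response.text) else 'not found'
    if response.url != requested_url:
        # The final URL differs from the requested one (e.g. after a redirect).
        return f'{requested_url} -> {response.url} {status}'
    return f'{requested_url} {status}'


def contains_marker(text):
    # Crude substring check, only for this offline demo.
    return 'actual_persona_name' in text


requested = ['https://steamcommunity.com/id/alice',
             'https://steamcommunity.com/id/bob',
             'https://steamcommunity.com/id/carol']
responses = [
    SimpleNamespace(url=requested[0], text='<span class="actual_persona_name">A</span>'),
    None,
    SimpleNamespace(url='https://steamcommunity.com/', text='<div></div>'),
]
for url, r in zip(requested, responses):
    print(describe(url, r, contains_marker))
```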