See which item a for loop has checked

Date: 2020-08-16 22:03:26

Tags: python for-loop python-requests

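I really don't know what to call this, so sorry for the unclear title. My program checks whether an element exists at multiple paths on a website. It has a base URL, to which it appends the different paths to check on that domain; the paths are stored in a JSON file (name.json). In its current state, the program prints 1 if the element is found and 2 otherwise. I want it to print the URL instead of 1 or 2. My problem is that the ids are used up before the final for loop: when I try to print fullurl there, I only get the last id in the JSON file printed many times (because the earlier values are not kept), not each unique URL.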

import json
import grequests
from bs4 import BeautifulSoup

idlist = json.loads(open('name.json').read())

baseurl = 'https://steamcommunity.com/id/'


complete_urls = []

for uid in idlist:
    fullurl = baseurl + uid
    complete_urls.append(fullurl)

rs = (grequests.get(fullurl) for fullurl in complete_urls)
resp = grequests.map(rs)

for r in resp:
    soup = BeautifulSoup(r.text, 'lxml')

    if soup.find('span', class_='actual_persona_name'):
        print('1')

    else:
        print('2')

2 answers:

Answer 0: (score: 0)

Because grequests.map returns the responses in the same order as the requests (see this), you can use enumerate to match each request's full URL to its response.

import json
import grequests
from bs4 import BeautifulSoup

idlist = json.loads(open('name.json').read())

baseurl = 'https://steamcommunity.com/id/'

complete_urls = []

for uid in idlist:
    fullurl = baseurl + uid
    complete_urls.append(fullurl)

rs = (grequests.get(fullurl) for fullurl in complete_urls)
resp = grequests.map(rs)

for index, r in enumerate(resp):  # enumerate gives the index of each response
    url = complete_urls[index]  # same index in the existing list of complete_urls
    if r is None:  # grequests.map returns None for a request that failed
        print(url, 'request failed')
        continue

    soup = BeautifulSoup(r.text, 'lxml')
    print(url)
    if soup.find('span', class_='actual_persona_name'):
        print('1')

    else:
        print('2')
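
The same order-based pairing can also be written with zip, which avoids the manual index lookup. What follows is only a minimal sketch under stated assumptions, not part of the original answer: FakeResponse and pair_urls_with_results are hypothetical names, the fake objects stand in for what grequests.map would return (including None for a failed request), and a plain substring test replaces the BeautifulSoup lookup so the example runs without network access.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a response object; only .text is used."""
    text: str


def pair_urls_with_results(
    urls: List[str], responses: List[Optional[FakeResponse]]
) -> List[Tuple[str, Optional[bool]]]:
    """Pair each requested URL with whether the marker was found.

    Relies on responses being in the same order as urls.
    A result of None means the request itself failed.
    """
    results = []
    for url, r in zip(urls, responses):
        if r is None:
            results.append((url, None))
            continue
        # Assumption: a substring check stands in for soup.find(...).
        results.append((url, 'actual_persona_name' in r.text))
    return results


urls = [
    'https://steamcommunity.com/id/a',
    'https://steamcommunity.com/id/b',
    'https://steamcommunity.com/id/c',
]
responses = [
    FakeResponse('<span class="actual_persona_name">A</span>'),
    None,  # simulates a failed request
    FakeResponse('<p>not found</p>'),
]

for url, found in pair_urls_with_results(urls, responses):
    print(url, found)
```

Returning the pairs instead of printing inside the loop keeps the URL and its result together, so the caller can decide what to print for each case.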

Answer 1: (score: 0)

If I understand correctly, you can simply use print(r.url) instead of the numbers, because the full URL is stored on each response object.

for r in resp:
    soup = BeautifulSoup(r.text, 'lxml')

    if soup.find('span', class_='actual_persona_name'):
        print(r.url)

    else:
        print(r.url)
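
If the goal is to list only the URLs where the element exists, the URL can be collected from the response inside the matching branch. The sketch below is one possible way to do that under assumptions: PersonaNameFinder and urls_with_persona_name are hypothetical names, the standard-library html.parser is swapped in for BeautifulSoup, and SimpleNamespace objects imitate responses so the code runs offline. Note that on a real requests response, r.url is the final URL after any redirects, so it may differ from the URL originally requested.

```python
from html.parser import HTMLParser
from types import SimpleNamespace


class PersonaNameFinder(HTMLParser):
    """Sets .found when a <span class="actual_persona_name"> start tag is seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'span' and 'actual_persona_name' in classes:
            self.found = True


def urls_with_persona_name(responses):
    """Return r.url for every response whose HTML contains the span."""
    found_urls = []
    for r in responses:
        if r is None:  # skip failed requests
            continue
        finder = PersonaNameFinder()
        finder.feed(r.text)
        if finder.found:
            found_urls.append(r.url)
    return found_urls


# Fake responses (assumption): only .url and .text are imitated.
fake_responses = [
    SimpleNamespace(url='https://steamcommunity.com/id/a',
                    text='<span class="actual_persona_name">A</span>'),
    SimpleNamespace(url='https://steamcommunity.com/id/b',
                    text='<div>no profile</div>'),
    None,
]

print(urls_with_persona_name(fake_responses))
```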