This answers my question; the problem is solved. You can delete this post.
Answer 0 (score: 0)
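You can apply the "Look Before You Leap" (LBYL) principle and check the result of find(): it returns None if the element is not found. You can then put the check in a loop, exit as soon as you have a value, and protect yourself with a limit on the loop counter: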
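import requests
from bs4 import BeautifulSoup

RETRIES = 10
id = None
session = requests.Session()
# url is assumed to be defined earlier
for attempt in range(1, RETRIES + 1):
    response = session.get(url)
    soup = BeautifulSoup(response.text, "lxml")
    # id=True matches only <a> elements that carry an id attribute
    element = soup.find('a', class_="class", id=True)
    if element is None:
        print("Attempt {attempt}. Element not found".format(attempt=attempt))
        continue
    else:
        id = element["id"]
        break
# Still None if every attempt failed
print(id)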
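To see the None check in isolation, here is a minimal sketch that runs against in-memory HTML instead of a live page. The sample markup, the function names extract_id and extract_id_css, and the choice of the built-in "html.parser" (so that lxml is not required) are all assumptions made for illustration. The second function performs the same lookup written as a CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical sample pages; the markup is invented for illustration.
PAGE_WITH_ID = '<a class="class" id="item-42" href="/x">link</a>'
PAGE_WITHOUT_ID = '<a class="class" href="/x">link</a>'


def extract_id(html):
    """Return the id of the first <a class="class"> that has one, else None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find("a", class_="class", id=True)
    if element is None:  # look before you leap: nothing matched
        return None
    return element["id"]


def extract_id_css(html):
    """Same lookup, expressed as a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one("a.class[id]")
    return None if element is None else element["id"]


if __name__ == "__main__":
    print(extract_id(PAGE_WITH_ID))
    print(extract_id(PAGE_WITHOUT_ID))
```

Both functions return None for the page whose link has no id, which is the condition the retry loop tests before trying again.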
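A couple of notes:
id=True is set so that only elements that have an id attribute are matched. You could also use a CSS selector: soup.select_one("a.class[id]")
Session() helps improve performance when you make multiple requests to the same host. For more, see "Session Objects" in the Requests documentation.

Answer 1 (score: -1)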
If you just want to make the same request a second time, you can do something like this:
import requests
from bs4 import BeautifulSoup


def find_data(url):
    found_data = False
    while not found_data:
        r = requests.get(url)
        soup = BeautifulSoup(r.text, "lxml")
        try:
            # find() returns None when nothing matches, so .get() raises AttributeError
            id = soup.find('a', class_="class").get('id')
            found_data = True
        except AttributeError:
            pass
    return id
If the data really isn't there, this puts you at risk of an infinite loop. You can avoid that by limiting the number of attempts:
import requests
from bs4 import BeautifulSoup


def find_data(url, attempts_before_fail=3):
    found_data = False
    while not found_data:
        r = requests.get(url)
        soup = BeautifulSoup(r.text, "lxml")
        try:
            # find() returns None when nothing matches, so .get() raises AttributeError
            id = soup.find('a', class_="class").get('id')
            found_data = True
        except AttributeError:
            attempts_before_fail -= 1
            if attempts_before_fail == 0:
                raise ValueError("couldn't find data after all.")
    return id
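The bounded retry above can also be separated from the scraping itself, which makes it possible to try out without a network. This is only a sketch under assumptions: retry_until_found, its fetch callable, and the optional delay parameter are hypothetical names that do not come from either answer. In real use, fetch would wrap the requests.get and soup.find calls and return None when the element is missing.

```python
import time


def retry_until_found(fetch, attempts=3, delay=0.0):
    """Call fetch() up to `attempts` times and return the first non-None result.

    Raises ValueError if every attempt returns None.
    """
    for attempt in range(1, attempts + 1):
        result = fetch()
        if result is not None:
            return result
        # Pause between attempts, but not after the last one.
        if delay and attempt < attempts:
            time.sleep(delay)
    raise ValueError("couldn't find data after {} attempts".format(attempts))


if __name__ == "__main__":
    # Hypothetical stand-in for a page that only has the element on the third load.
    outcomes = iter([None, None, "item-42"])
    print(retry_until_found(lambda: next(outcomes), attempts=3))
```

A small delay between attempts is usually kinder to the server than retrying immediately, which is why the parameter is included here.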