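I've recently been writing a web scraper and found myself nesting try/except blocks and relying on errors to drive parts of my code, as in the following two sections: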
try:
    reg_title = soup.find('p', {'class': "regnumber-e"}).text
except AttributeError:
    try:
        reg_title = soup.find('p', {'class': "regtitle-e"}).text
    except AttributeError:
        reg_title = soup.find('p', {'class': "Yregnumber-e"}).text
and
if soup.find_all('p', {'class': "Notice"}):
    try:
        # More code
    except IndexError:
        # More code
    continue
elif (soup.find_all('p', {'class': "ConsolidationPeriod-e"}) or
      soup.find_all('p', {'class': "ConsolidationPeriod"})):
    try:
        text = soup.find('p', {'class': "ConsolidationPeriod-e"}).text
    except AttributeError:
        text = soup.find('p', {'class': "ConsolidationPeriod"}).text
elif soup.find('p', {'class': "Notice-e"}):
    # More code
    continue
else:
    continue
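Obviously I've cut out portions of the code, but the specific code here doesn't matter. In general, my coding instincts tell me something is off, and I feel there must be a better way to navigate different HTML tags when web scraping. Any thoughts?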
Answer 0 (score: 0)
Couldn't you just wrap all of your code in a single try/except that catches multiple exceptions? Like:
try:
    # All your code
    # For example
    # if soup.find_all('p', {'class': "Notice"}):
    #     ...
    # else:
    #     ...
except (AttributeError, IndexError) as e:
    continue
For the parts where you fetch the text, I think a simple test is enough. Like:
if soup.find('p', {'class': "ConsolidationPeriod-e"}):
    text = soup.find('p', {'class': "ConsolidationPeriod-e"}).get_text()
else:
    text = soup.find('p', {'class': "ConsolidationPeriod"}).get_text()
Or:
if soup.find('p', {'class': "regnumber-e"}):
    reg_title = soup.find('p', {'class': "regnumber-e"}).get_text()
elif soup.find('p', {'class': "regtitle-e"}):
    reg_title = soup.find('p', {'class': "regtitle-e"}).get_text()
else:
    reg_title = soup.find('p', {'class': "Yregnumber-e"}).get_text()
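The if/elif chains above call `soup.find` twice for each class and still raise `AttributeError` if the last fallback is missing. One possible way to tidy this up is a small helper that tries the class names in priority order and returns `None` when nothing matches. This is only a sketch: `first_text`, `REG_TITLE_CLASSES`, and `reg_title_from` are hypothetical names, and it assumes `soup` is a BeautifulSoup object (it uses the `class_` keyword of `find` and the `get_text()` method).

```python
def first_text(soup, tag, class_names):
    """Return the text of the first matching tag, trying class names in order.

    Returns None when none of the candidate classes is present, so the
    caller can branch on the result instead of catching AttributeError.
    """
    for name in class_names:
        node = soup.find(tag, class_=name)
        if node is not None:
            return node.get_text()
    return None


# Candidate classes from the question, in the order they were tried there.
REG_TITLE_CLASSES = ("regnumber-e", "regtitle-e", "Yregnumber-e")


def reg_title_from(soup):
    """Look up the regulation title using the fallback order above."""
    return first_text(soup, "p", REG_TITLE_CLASSES)
```

With this, the nested try/except from the question becomes `reg_title = reg_title_from(soup)`, and the ConsolidationPeriod case becomes `text = first_text(soup, 'p', ["ConsolidationPeriod-e", "ConsolidationPeriod"])`. A `None` result tells you the page had none of the expected tags.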