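I'm trying to scrape the questions-and-answers section from Lazada, but I run into a problem when some pages have no questions or answers. When I run the code over several pages it returns nothing; it only works for pages that do have questions and answers.
How can I make the code keep reading the remaining pages even when the first page has no questions?
I tried adding an if/else statement to my code, as shown below.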
import bleach
import csv
import datetime
import requests
from bs4 import BeautifulSoup

urls = ['url1', 'url2', 'url3']
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    now = datetime.datetime.now()
    print("Date data being pulled:")
    print(str(now))
    print("")
    nameList = soup.findAll("div", {"class": "qna-content"})
    for name in nameList:
        if nameList == None:
            print('None')
        else:
            print(name.get_text())
        continue
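My expected output would look like this:
None --> output from url1
None --> output from url2
Can I choose hazelnut? Dear customer, the latest expiry date is 2019, and we will make sure the expiry date is still more than 6 months away. --> output from url3
Thanks in advance for your help!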
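Answer 0 (score: 1)
I'm still learning, so I made a few changes to the logic of my code and managed to get the records printed. If you have an alternative or better solution, please share it as well.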
import datetime
import requests
from bs4 import BeautifulSoup

urls = ['url1', 'url2', 'url3']
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    now = datetime.datetime.now()
    print("Date data being pulled:")
    print(str(now))
    print("")
    qna = soup.findAll("div", class_="qna-content")
    # An empty result list is falsy, so test the list itself before looping.
    if not qna:
        print('List is empty')
        continue
    for qnaqna in qna:
        print(qnaqna.get_text())
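One detail worth keeping in mind with this pattern, sketched here in plain Python with no network access: a for loop over an empty list never runs its body, so an emptiness check placed inside the loop cannot report an empty page. The function names and the sample string below are made up for illustration.

```python
def check_inside_loop(items):
    """Emptiness check placed inside the loop body."""
    lines = []
    for item in items:
        if not items:  # never reached for an empty list: the body is skipped
            lines.append("List is empty")
        else:
            lines.append(item)
    return lines


def check_before_loop(items):
    """Emptiness check placed before the loop."""
    if not items:  # an empty list is falsy
        return ["List is empty"]
    return list(items)


print(check_inside_loop([]))                          # prints []
print(check_before_loop([]))                          # prints ['List is empty']
print(check_before_loop(["Can I choose hazelnut?"]))  # prints ['Can I choose hazelnut?']
```

The first call prints an empty list because the loop body is skipped entirely, which is why nothing shows up for pages without questions.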
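Answer 1 (score: 1)
The check is in the wrong place: move the if nameList == None: test outside the loop, and fix the indentation as well. Note that findAll returns an empty list, not None, when nothing matches, so the test should be if not nameList:.
import datetime
import requests
from bs4 import BeautifulSoup

urls = ['url1', 'url2', 'url3']
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    now = datetime.datetime.now()
    print("Date data being pulled:")
    print(str(now))
    print("")
    nameList = soup.findAll("div", {"class": "qna-content"})
    if not nameList:  # findAll returns an empty list, never None
        print(url, 'None')
        continue  # skip this URL
    for name in nameList:
        print(name.get_text())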
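To try the per-URL check without hitting the network, the parsing step can be pulled into a small function and run against inline HTML. This is only a sketch: the helper name extract_qna and the sample page bodies are assumptions for illustration, and the real Lazada markup may differ from the single qna-content div used here.

```python
from bs4 import BeautifulSoup


def extract_qna(html):
    """Return the text of every div with class qna-content in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    blocks = soup.find_all("div", class_="qna-content")
    # find_all returns an empty list (not None) when nothing matches.
    return [block.get_text(strip=True) for block in blocks]


# Hypothetical page bodies; real code would pass requests.get(url).content.
sample_pages = {
    "url1": "<html><body><p>No questions yet</p></body></html>",
    "url3": '<html><body><div class="qna-content">Can I choose hazelnut?</div></body></html>',
}

for url, html in sample_pages.items():
    texts = extract_qna(html)
    if not texts:
        print(url, "None")
        continue  # move on to the next URL
    for text in texts:
        print(url, text)
```

Keeping the parsing separate from the HTTP request makes it easy to check the empty-page case against saved or hand-written HTML before running the loop on live URLs.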