How to handle empty lists when scraping multiple web pages

Time: 2019-01-04 01:01:28

Tags: python web-scraping beautifulsoup

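I'm trying to scrape the Questions & Answers section from Lazada, but I run into a problem when some pages have no questions/answers. When I run my code over multiple pages it returns nothing; it only works for pages that do have questions and answers.

How can I make the code keep reading the remaining pages even if the first page has no questions?

I tried adding an if/else statement to my code, as shown below.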

 import bleach
 import csv
 import datetime
 from bs4 import BeautifulSoup

urls = ['url1','url2','url3']

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

 now = datetime.datetime.now()
 print ("Date data being pulled:")
 print str(now)
 print ("")

 nameList = soup.findAll("div", {"class":"qna-content"})

for name in nameList:
    if nameList == None:
       print('None')
    else:
       print(name.get_text())
       continue

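My expected output would look like this:

None -> output from url1
None -> output from url2
Can I choose hazelnut?   Dear customer, the latest expiry date is in 2019, and we will make sure the expiry date is still more than 6 months away. -> output from url3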

Thank you for your help!

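2 answers:

Answer 0 (score: 1)

Since I'm still learning, I made some changes to the logic of my code and managed to print the records. If you have an alternative or better solution, please share it as well.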

import datetime
from bs4 import BeautifulSoup
import requests

urls = ['url1','url2','url3']

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    now = datetime.datetime.now()
    print("Date data being pulled:")
    print(str(now))
    print("")

    # findAll returns an empty list when the page has no Q&A blocks
    qna = soup.findAll("div", class_="qna-content")

    if not qna:
        print('List is empty')
        continue  # move on to the next URL

    for qnaqna in qna:
        print(qnaqna.get_text())
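One detail that may be worth spelling out: `findAll` returns an empty list (not `None`) when nothing matches, and a `for` loop over an empty list never runs its body, so an emptiness test only has an effect if it runs before the loop. Here is a minimal sketch with plain lists, no scraping involved. The function names `report` and `report_inside_loop` are just made up for illustration.

```python
def report(items):
    """Return the lines that would be printed for one page's results."""
    if not items:  # checked once, before the loop
        return ["List is empty"]
    return [str(item) for item in items]


def report_inside_loop(items):
    """Same check placed inside the loop: never reached for an empty list."""
    lines = []
    for item in items:
        if not item:
            lines.append("List is empty")
        else:
            lines.append(str(item))
    return lines


print(report([]))              # ['List is empty']
print(report_inside_loop([]))  # []
```

The second function returns nothing at all for an empty list, which matches the "my code returns nothing" symptom described in the question.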

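Answer 1 (score: 1)

Your syntax is wrong: the emptiness check (`if nameList == None:` in your code) belongs outside the `for name in nameList` loop, and you also need to fix the indentation.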

import datetime

import requests
from bs4 import BeautifulSoup

urls = ['url1','url2','url3']

for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    now = datetime.datetime.now()
    print("Date data being pulled:")
    print(str(now))
    print("")

    nameList = soup.findAll("div", {"class":"qna-content"})
    # findAll returns an empty list (never None) when nothing matches
    if not nameList:
        print(url, 'None')
        continue # skip this URL

    for name in nameList:
        print(name.get_text())
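If you would rather collect the results than print them, something along these lines might work. This is only a sketch: `extract_qna_texts`, `collect_qna`, and `fake_extract` are names I made up, the `timeout=10` value is arbitrary, and the `qna-content` class is taken from the question. The fetching step is passed in as a parameter so the per-URL logic can be tried without hitting the real site.

```python
from typing import Callable, Dict, Iterable, List


def extract_qna_texts(url: str) -> List[str]:
    """Fetch one page and return the text of every Q&A block (may be empty)."""
    # Imported here so the looping logic below can be tried without
    # network access or third-party packages installed.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.content, "html.parser")
    return [div.get_text() for div in soup.find_all("div", class_="qna-content")]


def collect_qna(
    urls: Iterable[str],
    extract: Callable[[str], List[str]] = extract_qna_texts,
) -> Dict[str, List[str]]:
    """Map each URL to its Q&A texts, or to ["None"] when the page has none."""
    results: Dict[str, List[str]] = {}
    for url in urls:
        texts = extract(url)
        # An empty list is falsy, so this covers pages without a Q&A section.
        results[url] = texts if texts else ["None"]
    return results


def fake_extract(url: str) -> List[str]:
    """Stand-in for real scraping: only 'url3' has a question."""
    pages = {"url3": ["Can I choose hazelnut?"]}
    return pages.get(url, [])


print(collect_qna(["url1", "url2", "url3"], extract=fake_extract))
# {'url1': ['None'], 'url2': ['None'], 'url3': ['Can I choose hazelnut?']}
```

With real pages you would call `collect_qna(urls)` and let it use `extract_qna_texts` by default. Every URL gets an entry either way, so an empty first page no longer hides the later ones.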