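I built a very basic web scraper in Python 3.6 that takes a list of URLs stored in a CSV file and returns information from each page. It was working yesterday.
Today it no longer works, even with the same CSV of URLs that I used before. Instead, I get an error message.
Here is the code I am using: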
import pandas as pd
from bs4 import BeautifulSoup as bs
from urllib.request import urlopen
import time
dataset = pd.read_csv('read_csv.csv')
dataset = dataset.iloc[:, 0].str.strip('[]')
data = []
for i in dataset:
    page = urlopen(i)
    soup = bs(page, 'html.parser', time.sleep(1))
    title = soup.find(attrs = {'class': 'title'})
    title = title.text.strip()
    content = soup.find(attrs = {'class': 'articleContent articleTruncate'}, itemprop = 'text')
    content = content.text.strip()
    date = soup.find(attrs = {'class': 'date'})
    date = date.text.strip()
    author = soup.find(attrs = {'class': 'authorInfo'})
    author = author.text.strip()
    data.append((title, date, author, content))
Here is the error message from the console:
Traceback (most recent call last):
  File "<ipython-input-26-3a1fc158da11>", line 6, in <module>
    title = title.text.strip()
AttributeError: 'NoneType' object has no attribute 'text'
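For what it is worth, the traceback suggests that soup.find(...) returned None for the title, which is what find does when no element matches. A minimal sketch of how the lookups could be guarded so that the loop reports which fields are missing instead of crashing is shown below. The names SELECTORS, text_or_none, extract_fields and missing_fields are hypothetical helpers introduced only for illustration, and the sketch does not explain why the page stopped matching; it only makes the failure visible.

```python
from typing import Dict, List, Optional

# Hypothetical lookup table: field name -> keyword arguments for soup.find().
# The class names are the ones used in the scraper above.
SELECTORS = {
    'title': {'attrs': {'class': 'title'}},
    'date': {'attrs': {'class': 'date'}},
    'author': {'attrs': {'class': 'authorInfo'}},
    'content': {'attrs': {'class': 'articleContent articleTruncate'},
                'itemprop': 'text'},
}


def text_or_none(tag) -> Optional[str]:
    """Return the stripped text of a tag, or None if find() matched nothing."""
    return tag.text.strip() if tag is not None else None


def extract_fields(soup) -> Dict[str, Optional[str]]:
    """Look up every field in SELECTORS without raising on a missing element."""
    return {name: text_or_none(soup.find(**kwargs))
            for name, kwargs in SELECTORS.items()}


def missing_fields(fields: Dict[str, Optional[str]]) -> List[str]:
    """List the field names whose element was not found in the page."""
    return [name for name, value in fields.items() if value is None]
```

Inside the loop, this would be called as fields = extract_fields(soup), followed by printing the URL i together with missing_fields(fields) whenever that list is not empty. Seeing which URLs and which fields fail should help tell apart a changed page layout from a different response being served for the same URL.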