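Using Python 2.7, I am trying to scrape and import articles from the NYT. This used to work without problems, whether I fetched one article or several at once, but now I get the error AttributeError: 'module' object has no attribute 'Scraper'.
I am using the newspaper package, which worked well until this error appeared. It seems to work for some HTML links and not others, even though the links are correct. Any ideas on a solution?
Here is my code: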
import pandas as pd
import newspaper
from newspaper import Article
url3='http://www.nytimes.com/2010/08/04/nyregion/04shooting.html'
url4='http://www.nytimes.com/2010/08/04/nyregion/04gunman.html'
url5='http://www.nytimes.com/2010/08/05/nyregion/05shooting.html'
url6='http://www.nytimes.com/2010/08/05/nyregion/05vics.html'
urls=[url3, url4,url5,url6]
Nyt_HBC =pd.DataFrame()
for i in urls:
    a=Article(i, language='en')
    a.download()
    a.parse()
    Nyt_HBC= Nyt_HBC.append([[a.title, a.text]], ignore_index=True)
Nyt_HBC.columns=['Title','Article']
Nyt_HBC
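Since the failure seems to affect only some links, one way to narrow it down is to catch the error per URL instead of letting the first failure stop the loop. The following is only a minimal sketch of that idea: `collect_articles` and `make_article` are hypothetical names introduced here, not part of the newspaper package, and the sketch assumes the object returned by `make_article` has the same `download()`, `parse()`, `title` and `text` members used in the code above.

```python
def collect_articles(urls, make_article):
    """Download and parse each URL, keeping failures separate from successes.

    make_article is a hypothetical factory (an assumption of this sketch),
    for example: lambda u: Article(u, language='en')
    """
    rows, failures = [], []
    for url in urls:
        try:
            article = make_article(url)
            article.download()
            article.parse()
            rows.append((article.title, article.text))
        except AttributeError as exc:
            # Record the failing URL and message instead of aborting the loop
            failures.append((url, str(exc)))
    return rows, failures
```

With the code above this might be called as `rows, failures = collect_articles(urls, lambda u: Article(u, language='en'))`, after which `failures` lists exactly which links trigger the error.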
Here is my full error message (quick note: the code cannot run without .parse()):
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-47-12545a6e9854> in <module>()
9 a=Article(i, language='en')
10 a.download()
---> 11 a.parse()
12 Nyt_HBC= Nyt_HBC.append([[a.title, a.text]], ignore_index=True)
13 Nyt_HBC.columns=['Title','Article']
/Users/ThomasPLapinger/anaconda/lib/python2.7/site-packages/newspaper/article.pyc in parse(self)
226
227 if self.config.fetch_images:
--> 228 self.fetch_images()
229
230 self.is_parsed = True
/Users/ThomasPLapinger/anaconda/lib/python2.7/site-packages/newspaper/article.pyc in fetch_images(self)
245 first_img = self.extractor.get_first_img_url(
246 self.url, self.clean_top_node)
--> 247 self.set_top_img(first_img)
248
249 if not self.has_top_image():
/Users/ThomasPLapinger/anaconda/lib/python2.7/site-packages/newspaper/article.pyc in set_top_img(self, src_url)
399 def set_top_img(self, src_url):
400 if src_url is not None:
--> 401 s = images.Scraper(self)
402 if s.satisfies_requirements(src_url):
403 self.set_top_img_no_check(src_url)
AttributeError: 'module' object has no attribute 'Scraper'
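The traceback says that the `images` module used inside newspaper/article.py has no `Scraper` attribute, so a reasonable first check is which file that module was actually loaded from and whether it defines the name. The helper below is a hedged diagnostic sketch, not a fix: `inspect_attribute` is a hypothetical name, and the module path `newspaper.images` is an assumption inferred from the traceback rather than something the error message states.

```python
import importlib


def inspect_attribute(module_name, attr_name):
    """Return (file the module was loaded from, whether it defines attr_name)."""
    module = importlib.import_module(module_name)
    # Built-in modules have no __file__, so fall back to None
    path = getattr(module, '__file__', None)
    return path, hasattr(module, attr_name)


# For the traceback above the call would be (requires newspaper installed;
# the module name is an assumption):
# print(inspect_attribute('newspaper.images', 'Scraper'))
```

If the reported path points somewhere unexpected, or the second value is False, that would suggest a shadowed or broken module rather than a problem with the URLs. The traceback also shows that the failing call is only reached when `self.config.fetch_images` is true, so turning that setting off would presumably bypass it, at the cost of not getting a top image.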