AttributeError: 'module' object has no attribute 'Scraper'

Asked: 2017-08-29 15:16:57

Tags: python python-2.7 web-scraping python-newspaper

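Using Python 2.7, I am trying to scrape and import articles from the NYT. This used to work without problems, whether I fetched one article or several at a time, but now I get the error AttributeError: 'module' object has no attribute 'Scraper'.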

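I am using the newspaper package, which had worked well until this error appeared. It seems to work for some HTML links but not for others, even though the links themselves are correct. Any ideas for a solution?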

Here is my code:

import pandas as pd
import newspaper
from newspaper import Article

url3='http://www.nytimes.com/2010/08/04/nyregion/04shooting.html'
url4='http://www.nytimes.com/2010/08/04/nyregion/04gunman.html'
url5='http://www.nytimes.com/2010/08/05/nyregion/05shooting.html'
url6='http://www.nytimes.com/2010/08/05/nyregion/05vics.html'
urls=[url3, url4,url5,url6]
Nyt_HBC =pd.DataFrame()
for i in urls: 
    a=Article(i, language='en')
    a.download()
    a.parse()
    Nyt_HBC= Nyt_HBC.append([[a.title, a.text]], ignore_index=True)
Nyt_HBC.columns=['Title','Article']
Nyt_HBC
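
Because the error shows up for some links and not others, it may help to run each URL in isolation and record which ones fail. The sketch below is one way to do that. `collect_articles` and `make_article` are hypothetical names introduced here, not part of newspaper, and `make_article` is assumed to return an object with `download()`, `parse()`, `title` and `text`, as `Article` does in the code above.

```python
def collect_articles(urls, make_article):
    """Download and parse each URL, keeping failures apart from successes.

    make_article is a hypothetical factory: any callable that takes a URL
    and returns an object with download(), parse(), title and text.
    """
    rows = []      # (title, text) for every URL that parsed cleanly
    failures = []  # (url, "ExceptionType: message") for every URL that raised
    for url in urls:
        try:
            article = make_article(url)
            article.download()
            article.parse()
            rows.append((article.title, article.text))
        except Exception as exc:
            # Record the failure and carry on with the remaining URLs.
            failures.append((url, "%s: %s" % (type(exc).__name__, exc)))
    return rows, failures
```

With the code above it could be called as `rows, failures = collect_articles(urls, lambda u: Article(u, language='en'))`. `pd.DataFrame(rows, columns=['Title', 'Article'])` then builds the table from the articles that parsed, and `failures` lists the URLs that raised along with their error messages.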

Here is my full error message (quick note: the code cannot run without .parse()):

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-47-12545a6e9854> in <module>()
      9     a=Article(i, language='en')
     10     a.download()
---> 11     a.parse()
     12     Nyt_HBC= Nyt_HBC.append([[a.title, a.text]], ignore_index=True)
     13 Nyt_HBC.columns=['Title','Article']

/Users/ThomasPLapinger/anaconda/lib/python2.7/site-packages/newspaper/article.pyc in parse(self)
    226 
    227         if self.config.fetch_images:
--> 228             self.fetch_images()
    229 
    230         self.is_parsed = True

/Users/ThomasPLapinger/anaconda/lib/python2.7/site-packages/newspaper/article.pyc in fetch_images(self)
    245             first_img = self.extractor.get_first_img_url(
    246                 self.url, self.clean_top_node)
--> 247             self.set_top_img(first_img)
    248 
    249         if not self.has_top_image():

/Users/ThomasPLapinger/anaconda/lib/python2.7/site-packages/newspaper/article.pyc in set_top_img(self, src_url)
    399     def set_top_img(self, src_url):
    400         if src_url is not None:
--> 401             s = images.Scraper(self)
    402             if s.satisfies_requirements(src_url):
    403                 self.set_top_img_no_check(src_url)

AttributeError: 'module' object has no attribute 'Scraper'
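
The last frame says that whatever module is bound to the name `images` has no `Scraper` attribute at the moment of the call. One way to investigate is to check which file that module was actually loaded from and whether it defines the attribute. The sketch below does this for any module. `inspect_attribute` is a hypothetical helper name, and the dotted name `newspaper.images` in the closing comment is an assumption based on the traceback, not something the traceback confirms.

```python
import importlib


def inspect_attribute(module_name, attr_name):
    """Return (file the module was loaded from, whether it has attr_name)."""
    module = importlib.import_module(module_name)
    # Built-in modules have no __file__, so fall back to None.
    path = getattr(module, "__file__", None)
    return path, hasattr(module, attr_name)


# Demonstration on a standard-library module; the path varies by install.
print(inspect_attribute("json", "loads"))

# For the error above one would presumably run (assumed module name):
# print(inspect_attribute("newspaper.images", "Scraper"))
```

If the reported path points somewhere other than the installed newspaper package, or the second value is `False`, the module being imported is not the one `article.py` expects. The traceback also shows that the failing call sits behind `if self.config.fetch_images:`, so turning that setting off would presumably sidestep the call, though it would not explain why `Scraper` is missing.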

0 answers:

No answers yet.