Why doesn't my Newspaper3k code work with Newsweek?

Time: 2019-06-15 23:42:27

Tags: python-newspaper

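I'm working in a Jupyter notebook and having a problem with Newspaper: it can't extract anything from Newsweek. I can get the same article with Goose, but I'd like a backup in case Goose fails.

I have tried other sites such as Fox, Yahoo, and CNN, and they all work fine, so the problem seems to be isolated to Newsweek.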

from newspaper import Article
url = 'https://www.newsweek.com/mike-huckabee-blasts-cnns-axelrod-calling-daughter-trump-press-secretary-sarah-sanders-1444184'
article = Article(url)
article.download()
article.html
article.parse()
article.text

Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/mike-huckabee-blasts-cnns-axelrod-calling-daughter-trump-press-secretary-sarah-sanders-1444184 on URL https://www.newsweek.com/mike-huckabee-blasts-cnns-axelrod-calling-daughter-trump-press-secretary-sarah-sanders-1444184

1 Answer:

Answer 0 (score: 1)

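You may have already solved this, but the 403 error is directly related to not passing a browser user agent when you request the article through Newspaper. Set one on a Config object and pass that config to Article: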

from newspaper import Article
from newspaper import Config

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'

config = Config()
config.browser_user_agent = user_agent

url = 'https://www.newsweek.com/mike-huckabee-blasts-cnns-axelrod-calling-daughter-trump-press-secretary-sarah-sanders-1444184'

article = Article(url, config=config)
article.download()
article.html
article.parse()
article.text
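
If you want to confirm that the user agent is what triggers the 403, independently of Newspaper, a rough check with only the standard library could look like the sketch below. The helper names `build_request` and `status_for` are made up for illustration, and the site's behaviour may have changed since this was written, so treat the result as a diagnostic hint rather than proof.

```python
import urllib.error
import urllib.request

# Same browser user-agent string as in the answer above.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/85.0.4183.121 Safari/537.36"
)


def build_request(url, user_agent=None):
    """Build a GET request, overriding the User-Agent header only if one is given."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def status_for(url, user_agent=None, timeout=10):
    """Return the HTTP status code the server sends back for this request.

    HTTP error responses (such as 403) are returned as codes instead of raised.
    Network-level failures (urllib.error.URLError) are not handled here.
    """
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Calling `status_for(url)` and then `status_for(url, BROWSER_UA)` with the Newsweek URL lets you compare the two status codes. If only the first call is rejected, the Config change above is the right fix for Newspaper as well.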