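I am trying to write a small web application that returns the sentiment of news articles that mention a keyword.
I am using the TextBlob and Newspaper3K Python 3 packages. I tried to pass Newspaper3K a URL string built from a Google News search query, but the newspaper package seems to just redirect to the Google News "home page".
Is there a way to get a list of newspaper articles that contain a certain keyword? Also, is newspaper able to iterate through pages?
Here is my code: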
from textblob import TextBlob
import newspaper

# keyword = input("Please enter the keyword: ")
keyword = "Apple"  # for testing only
keyword_lowercase = keyword.lower()

# Join the words of the keyword with '+' (only for Google News)
search_string = '+'.join(keyword.split())

def google_news_site(search_query):
    prefix = 'http://news.google.com/news?q='
    return prefix + search_query

# Currently for news.google.com only
url_string = google_news_site(search_string)
paper = newspaper.build(url_string, memoize_articles=False)

def sentiment(text):
    return TextBlob(text).sentiment.polarity

current_sum = 0.0
relevant_article_count = 0
for article in paper.articles:
    print(article.url)
    # The text is only available after the article is downloaded and parsed
    article.download()
    article.parse()
    article_text = article.text
    article_text_lowercase = article_text.lower()
    if keyword_lowercase in article_text_lowercase:
        current_sum += sentiment(article_text)
        relevant_article_count += 1  # count only articles that mention the keyword

print("Article count is", str(relevant_article_count) + ".")
rating = current_sum / max(relevant_article_count, 1)
print("The rating for", keyword, "is", str(rating) + ".")
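Answer 0: (score: 0)
The simplest way is to set up an instance of a piece of software called searx, or to use an existing instance such as framabee.org.
searx is a metasearch engine: it queries the actual search engines, merges their results, and can return them as JSON. Here is an example query: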
$ curl "https://framabee.org/?q=Apple&categories=news&time_range=week&language=en&format=json" | jq . | head -n 100
{
  "number_of_results": 0,
  "corrections": [],
  "query": "Apple",
  "infoboxes": [],
  "suggestions": [],
  "results": [
    {
      "engine": "bing news",
      "category": "news",
      "parsed_url": [
        "https",
        "www.apfelnews.de",
        "/2019/09/22/apple-iphone-11-falltests-mit-unterschiedlichen-ergebnissen/",
        "",
        "",
        ""
      ],
      "pubdate": "2019-09-22 08:28:00+0000",
      "engines": [
        "bing news"
      ],
      "publishedDate": "il y a 9 heure(s), 5 minute(s)",
      "url": "https://www.apfelnews.de/2019/09/22/apple-iphone-11-falltests-mit-unterschiedlichen-ergebnissen/",
      "positions": [
        1
      ],
      "title": "Apple iPhone 11 Falltests mit unterschiedlichen Ergebnissen",
      "content": "Auf der Keynote 2019 am 10. September 2019 wurde das Apple iPhone 11 mit dem härtesten Glas in einem Smartphone beworben.",
      "pretty_url": "https://www.apfelnews.de/2019/09/22/ap[...]sts-mit-unterschiedlichen-ergebnissen/",
      "score": 1,
      "img_src": "http://www.bing.com/th?id=ON.EA4492580B994DBA90318950CC35E5A6&pid=News"
    },
...
Since searx is written in Python, you can also call the corresponding Python functions directly.
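As a rough sketch of how the JSON output could be tied back to the code in the question, the snippet below goes through the same HTTP endpoint as the curl example rather than importing searx's internal functions. The helper names (`build_search_url`, `extract_urls`, `fetch_article_urls`, `article_polarity`) are made up for illustration, and the sketch assumes the instance is reachable and permits `format=json`.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(instance, keyword):
    """Build a searx news query URL with the parameters from the curl example."""
    params = {
        "q": keyword,
        "categories": "news",
        "time_range": "week",
        "language": "en",
        "format": "json",
    }
    return instance.rstrip("/") + "/?" + urllib.parse.urlencode(params)


def extract_urls(payload):
    """Return the unique article URLs from a decoded searx response, in order."""
    urls = []
    for result in payload.get("results", []):
        url = result.get("url")
        if url and url not in urls:
            urls.append(url)
    return urls


def fetch_article_urls(instance, keyword, timeout=10):
    """Query the instance over HTTP and return the article URLs it reports."""
    search_url = build_search_url(instance, keyword)
    with urllib.request.urlopen(search_url, timeout=timeout) as response:
        return extract_urls(json.load(response))


def article_polarity(url):
    """Download one article with newspaper and score its text with TextBlob."""
    # Imported here so the helpers above work without these packages installed.
    from newspaper import Article
    from textblob import TextBlob

    article = Article(url)
    article.download()
    article.parse()
    return TextBlob(article.text).sentiment.polarity
```

With these helpers, something like `[article_polarity(u) for u in fetch_article_urls("https://framabee.org", "Apple")]` would give one polarity value per result, which could then be averaged the same way as in the question.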