Feedparser does not search the description field

Date: 2019-06-28 12:57:17

Tags: python rss feedparser

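I'm trying to use RSS to get automatic notifications about specific security vulnerabilities I might care about. I already have it searching for keywords in the title and URL of each feed entry, but it seems to ignore the RSS description.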

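I've verified that the description field exists in the feed (I originally used summary instead of description, before I found that field), but I don't understand why it isn't working (I'm relatively new to Python). Could it be a sanitization issue, or am I missing something in how I search?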

#!/usr/bin/env python3.6

import feedparser

#Keywords to search for in the rss feed


key_words = ['Chrome','Tomcat','linux','windows']

# get the urls we have seen prior

f = open('viewed_urls.txt', 'r')
urls = f.readlines()
urls = [url.rstrip() for url in urls]
f.close()

#Returns true if keyword is in string

def contains_wanted(in_str):
    for wrd in key_words:
        if wrd.lower() in in_str:
            return True
    return False

#Returns true if url result has not been seen before

def url_is_new(urlstr):
    # returns true if the url string does not exist
    # in the list of strings extracted from the text file
    if urlstr in urls:
        return False
    else:
        return True

#actual parsing phase

feed = feedparser.parse('https://nvd.nist.gov/feeds/xml/cve/misc/nvd-rss.xml')
for key in feed["entries"]:
    title = key['title']
    url = key['links'][0]['href']
    description = key['description']

    # format and print the selected RSS fields
    if contains_wanted(title.lower()) and contains_wanted(description.lower()) and url_is_new(url):
        print('{} - {} - {}\n'.format(title, url, description))

        # record the URL in viewed_urls.txt so the entry is not reported again
        with open('viewed_urls.txt', 'a') as f:
            f.write('{}\n'.format(url))

1 answer:

Answer 0 (score: 0)

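That helped. I wasn't aware of how the conjunction logic worked, but it's solved now. Because the checks were joined with and, an entry was printed only when a keyword appeared in both the title and the description. I dropped contains_wanted(title.lower()) since it isn't needed in the condition: contains_wanted(description.lower()) covers what the title check was meant to catch as well as the description itself. I now get the proper output.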
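For anyone who wants to keep both checks, here is a minimal sketch of the difference between and and or in that condition. The helper name entry_matches and the sample title and description are hypothetical, made up for illustration; only key_words and contains_wanted come from the script in the question (contains_wanted is lightly reworked here so it lowercases its own input).

```python
# Keywords from the question's script
KEY_WORDS = ['Chrome', 'Tomcat', 'linux', 'windows']


def contains_wanted(text, key_words=KEY_WORDS):
    """Return True if any keyword occurs in text, ignoring case."""
    lowered = text.lower()
    return any(word.lower() in lowered for word in key_words)


def entry_matches(title, description):
    """Hypothetical helper: match if a keyword is in the title OR the description."""
    return contains_wanted(title) or contains_wanted(description)


# Made-up entry where the keyword appears only in the description
title = 'CVE-0000-0000'
description = 'A flaw in Apache Tomcat allows remote attackers to ...'

# With "and", both fields must contain a keyword, so this entry is skipped
print(contains_wanted(title) and contains_wanted(description))  # False

# With "or", a keyword in either field is enough
print(entry_matches(title, description))  # True
```

Using or keeps entries whose keyword shows up only in the title, which searching the description alone would miss.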

Thanks.