Question

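I'm using Python feedparser to parse some RSS feeds every 2 hours. Unfortunately, the feeds don't include an etag or modified value, so every time I parse a feed I download all of its data again. I'm considering hashing the entries returned by feedparser.parse and storing that hash in a database. The next time I parse the feed, I can compare against the stored hash to see whether it has changed, and only then start parsing each item in it. My questions: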

Is there another (or better) way to tell whether an RSS feed has been updated?

How should I create the hash? Is it enough to do the following:

[{"VALUE":"03","ATTRIBUTE":"Laayelbxw"},
 {"VALUE":"01","ATTRIBUTE":"Leruaret"},
 {"VALUE":"08","ATTRIBUTE":"Lscwbryeiyabwaa"},
 {"VALUE":"09","ATTRIBUTE":"Leruxyklrwbwaa"}]

and then store hex_dig in the database?
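A minimal sketch of the workflow described in the question: hash the entries, compare with the digest stored last time, and process the feed only if it differs. It assumes Python's standard-library sqlite3 as the database and uses a hypothetical table named feed_hash. It hashes a json.dumps(sort_keys=True) serialization instead of str(). That is an assumed choice that makes the digest independent of dictionary key order. In real use you would pass feedparser.parse(url).entries. The demo uses plain dicts so it runs without network access.

```python
import hashlib
import json
import sqlite3


def entries_fingerprint(entries):
    # json.dumps with sort_keys=True produces one canonical text per set of
    # entries, so the digest does not depend on dict key order; default=str
    # converts any value JSON cannot encode natively into its string form.
    canonical = json.dumps(entries, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def feed_changed(conn, feed_url, entries):
    # Hypothetical table: one stored digest (hex_dig) per feed URL.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feed_hash (url TEXT PRIMARY KEY, hex_dig TEXT NOT NULL)"
    )
    hex_dig = entries_fingerprint(entries)
    row = conn.execute(
        "SELECT hex_dig FROM feed_hash WHERE url = ?", (feed_url,)
    ).fetchone()
    if row is not None and row[0] == hex_dig:
        return False  # same digest as last run: skip per-item parsing
    # First run, or the content changed: remember the new digest.
    conn.execute(
        "INSERT OR REPLACE INTO feed_hash (url, hex_dig) VALUES (?, ?)",
        (feed_url, hex_dig),
    )
    conn.commit()
    return True


# Demo with plain dicts; with feedparser you would pass feedparser.parse(url).entries.
conn = sqlite3.connect(":memory:")
url = "http://example.com/feed.rss"  # hypothetical feed URL
entries = [{"title": "A", "link": "http://example.com/a"}]
print(feed_changed(conn, url, entries))  # True (first run)
print(feed_changed(conn, url, entries))  # False (unchanged)
```

Note that list order still matters, so a feed that only reorders its items will count as changed.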

Answer 1

Hashing the FEEDPARSER_RESPONSE seems reasonable, especially since your feed has no etag or modified value. You didn't provide a link to your RSS feed, so this answer uses one of CNN's.

import hashlib
import feedparser

cnn_top_news = feedparser.parse('http://rss.cnn.com/rss/cnn_topstories.rss')

# Hash only the entries; in testing, this produced the same hash across repeated parses.
news_updated = cnn_top_news.entries

###################################################################
# During testing, all of the following also worked for creating the hash,
# so there are several options to choose from:
#
# cnn_top_news['entries']
# titles = [entry.title for entry in cnn_top_news['entries']]
# summaries = [entry.summary for entry in cnn_top_news['entries']]
###################################################################

hash_object = hashlib.sha256(str(news_updated).encode('utf-8'))
hex_dig = hash_object.hexdigest()

print(hex_dig)
# Output:
# 371c5730c7f1407878a32a814bc72542b48a43e1f7670eae0627d2617289161b
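If the goal is to process only the new items, rather than just detect that something changed, a per-entry key avoids reprocessing the whole feed. This is a sketch under several assumptions. The helper names entry_key and new_entries are hypothetical. The seen keys are kept in an in-memory set, though in practice they would be persisted, for example in the same database as hex_dig. The sketch also relies on feedparser usually exposing an RSS <guid> as entry.id, falling back to link and then to a hash of the whole entry.

```python
import hashlib
import json


def entry_key(entry):
    # Prefer a non-empty 'id' (usually the RSS <guid>), then 'link';
    # otherwise fall back to a SHA-256 of the whole entry.
    for field in ("id", "link"):
        value = entry.get(field)
        if value:
            return value
    canonical = json.dumps(entry, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def new_entries(entries, seen_keys):
    # Return entries whose key has not been seen yet, recording each new key.
    fresh = []
    for entry in entries:
        key = entry_key(entry)
        if key not in seen_keys:
            seen_keys.add(key)
            fresh.append(entry)
    return fresh


seen = set()
first = [{"id": "1", "title": "A"}, {"id": "2", "title": "B"}]
second = [{"id": "2", "title": "B"}, {"id": "3", "title": "C"}]
print([e["title"] for e in new_entries(first, seen)])   # ['A', 'B']
print([e["title"] for e in new_entries(second, seen)])  # ['C']
```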
