Question

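I'm working on a basic Python script that parses RSS feed data from the SEC.gov website, but the script fails when I run it. Where am I going wrong?

I'm using Python 3.6.5 and have tried the Atoma and feedparser libraries, but I can't successfully extract any SEC RSS data. To be honest, the feed data may simply be invalid (I checked it at https://validator.w3.org/feed/, which reports it as invalid). However, when I try the same URL in a Google Chrome RSS feed extension, it works, so I must be doing something wrong. Does anyone know how to get around the format problem, or am I approaching this the wrong way in Python?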

import atoma, requests

feed_name = "SEC FEED"
url ='https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001616707&type=&dateb=&owner=exclude&start=0&count=100&output=atom'
response = requests.get(url)
feed = atoma.parse_rss_bytes(response.content)

for post in feed.items:
  date = post.pub_date.strftime('(%Y/%m/%d)')
  print("post date: " + date)
  print("post title: " + post.title)
  print("post link: " + post.link)

Answer 1

Here is another way to approach the problem in Python:

import datetime

import feedparser
import requests

feed_name = "SEC FEED"
url = 'https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0001616707&type=&dateb=&owner=exclude&start=0&count=100&output=atom'

# Download the feed and hand the raw bytes to feedparser.
response = requests.get(url)
feed = feedparser.parse(response.content)

for entry in feed['entries']:
    # The entries carry no pub_date, so use the filing date instead.
    dt = datetime.datetime.strptime(entry['filing-date'], '%Y-%m-%d')
    print('Date: ', dt.strftime('(%Y/%m/%d)'))
    print('Title: ', entry['title'])
    print(entry['link'])
    print('\n')

The feed at that URL has no pub_date field, but you can use the filing date or pick another date field. You should get output like the following:

Date: (2021/03/11)
Title: 8-K - Current report
https://www.sec.gov/Archives/edgar/data/1616707/000161670721000075/0001616707-21-000075-index.htm

Date: (2021/02/25)
Title: S-8 - Securities to be offered to employees in employee benefit plans
https://www.sec.gov/Archives/edgar/data/1616707/000161670721000066/0001616707-21-000066-index.htm

Date: (2021/02/25)
Title: 10-K - Annual report [Section 13 and 15(d), not S-K Item 405]
https://www.sec.gov/Archives/edgar/data/1616707/000161670721000064/0001616707-21-000064-index.htm
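
One likely reason the original script fails: the URL asks for output=atom, so the response is presumably an Atom document, while the question's code calls atoma.parse_rss_bytes, which expects RSS. Atoma also provides parse_atom_bytes, though whether it accepts this particular feed is untested here. If you would rather not depend on a feed library, the same idea can be sketched with only the standard library. Everything below is an assumption for illustration: the helper names detect_feed_format and parse_entries, the inline sample XML (a simplified stand-in, not real SEC data), and the guess that filing-date sits inside each entry's content element.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Namespace prefix used by Atom documents (Clark notation for ElementTree).
ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical, simplified sample. The real feed has more fields per entry,
# and its exact layout is an assumption here.
SAMPLE_FEED = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example filings</title>
  <entry>
    <content type="text/xml">
      <filing-date>2021-03-11</filing-date>
    </content>
    <link href="https://www.example.com/first-index.htm" rel="alternate" />
    <title>8-K - Current report</title>
    <updated>2021-03-11T16:31:03-05:00</updated>
  </entry>
  <entry>
    <link href="https://www.example.com/second-index.htm" rel="alternate" />
    <title>10-K - Annual report</title>
    <updated>2021-02-25T17:02:11-05:00</updated>
  </entry>
</feed>
"""


def detect_feed_format(data):
    """Return 'atom', 'rss', or 'unknown' based on the XML root element."""
    root = ET.fromstring(data)
    if root.tag == ATOM_NS + "feed":
        return "atom"
    if root.tag == "rss":
        return "rss"
    return "unknown"


def parse_entries(data):
    """Collect date, title, and link for each Atom entry.

    Uses the nested filing-date when present and otherwise falls back to
    the date part of the standard Atom 'updated' element.
    """
    root = ET.fromstring(data)
    results = []
    for entry in root.iter(ATOM_NS + "entry"):
        title = (entry.findtext(ATOM_NS + "title") or "").strip()
        link_el = entry.find(ATOM_NS + "link")
        link = link_el.get("href", "") if link_el is not None else ""

        raw_date = entry.findtext(ATOM_NS + "content/" + ATOM_NS + "filing-date")
        if raw_date is None:
            # Keep only YYYY-MM-DD so this also runs on Python 3.6.
            raw_date = (entry.findtext(ATOM_NS + "updated") or "")[:10]
        raw_date = raw_date.strip()
        date = datetime.strptime(raw_date, "%Y-%m-%d") if raw_date else None

        results.append({"date": date, "title": title, "link": link})
    return results


if __name__ == "__main__":
    print("format:", detect_feed_format(SAMPLE_FEED))
    for item in parse_entries(SAMPLE_FEED):
        shown = item["date"].strftime("(%Y/%m/%d)") if item["date"] else "(no date)"
        print("Date:", shown)
        print("Title:", item["title"])
        print(item["link"])
```

To try it against the live feed, you would pass response.content from requests.get(url) to parse_entries in place of SAMPLE_FEED. Checking the root element first makes it obvious which parser family (RSS or Atom) applies before any field lookups fail.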
