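I get the feed by parsing the URL given at the bottom of this post with feedparser.
The page at that URL lists links to a number of zip files. The tutor (slide 8) wants to extract all of those links (from the RSS feed, I assume) with the code below: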
    # Downloading the data - parsing the RSS feed to extract the ZIP file enclosure filename
    # Process RSS feed and walk through all items contained
    # If you are confused about something: print(type(obj), repr(obj))
    # feed = 'http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2012-02.xml'
    for item in feed.entries:
        print( item[ "summary" ], item[ "title" ], item[ "published" ] )
        try:
            # Identify ZIP file enclosure, if available
            enclosures = [ l for l in item[ "links" ] if l[ "rel" ] == "enclosure" ]
            if ( len( enclosures ) > 0 ):
                # ZIP file enclosure exists, so we can just download the ZIP file
                enclosure = enclosures[0]
                sourceurl = enclosure[ "href" ]
                cik = item[ "edgar_ciknumber" ]
                targetfname = target_dir + cik + ' - ' + sourceurl.split('/')[-1]
                retry_counter = 3
                while retry_counter > 0:
                    good_read = downloadfile( sourceurl, targetfname )
                    if good_read:
                        break
                    else:
                        print( "Retrying:", retry_counter )
                        retry_counter -= 1
        except:
            pass
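The problem is that no matter how much I search, I cannot find the link to each zip file in the feed, especially not with the code above.
I am sure I am missing something in the picture...
Here is the feed I get from feedparser: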
{'updated': 'Tue, 25 Jun 2013 22:48:50 EDT', 'published': 'Tue, 25 Jun 2013 22:48:50 EDT', 'subtitle_detail': {'value': 'This is a list all of the filings containing XBRL for 2012-07', 'base': 'http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2012-07.xml', 'language': None, 'type': 'text/html'}, 'published_parsed': time.struct_time(tm_year=2013, tm_mon=6, tm_mday=26, tm_hour=2, tm_min=48, tm_sec=50, tm_wday=2, tm_yday=177, tm_isdst=0), 'updated_parsed': time.struct_time(tm_year=2013, tm_mon=6, tm_mday=26, tm_hour=2, tm_min=48, tm_sec=50, tm_wday=2, tm_yday=177, tm_isdst=0), 'links': [{'href': 'http://www.sec.gov/spotlight/xbrl/filings-and-feeds.shtml', 'rel': 'alternate', 'type': 'text/html'}, {'href': 'http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2012-07.xml', 'rel': 'self', 'type': 'application/rss+xml'}], 'title_detail': {'value': 'All XBRL Data Submitted to the SEC for 2012-07', 'base': 'http://www.sec.gov/Archives/edgar/monthly/xbrlrss-2012-07.xml', 'language': None, 'type': 'text/plain'}, 'subtitle': 'This is a list all of the filings containing XBRL for 2012-07', 'title': 'All XBRL Data Submitted to the SEC for 2012-07', 'language': 'en-us', 'link': 'http://www.sec.gov/spotlight/xbrl/filings-and-feeds.shtml'}
Answer 0 (score: 0)
It seems to me that either the data you get in feed is incomplete, or you have left out other code from the next slide, for example downloadfile().
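Since downloadfile() is not in the posted code, here is a minimal sketch of what such a helper might look like. It is only a guess, not the code from the slides: I am assuming it takes a source URL and a target filename and returns True on success, because that is how the loop in the question uses it. The timeout parameter is my own addition. It uses only the standard library.

```python
import shutil
import urllib.request


def downloadfile(sourceurl, targetfname, timeout=30):
    """Hypothetical helper: save sourceurl to targetfname, return True on success."""
    try:
        # urlopen runs first, so no empty target file is created if the URL is bad
        with urllib.request.urlopen(sourceurl, timeout=timeout) as response, \
                open(targetfname, "wb") as out:
            shutil.copyfileobj(response, out)
        return True
    except (OSError, ValueError) as e:
        # URLError/HTTPError are subclasses of OSError; ValueError covers malformed URLs
        print("Download failed:", sourceurl, e)
        return False
```

Printing the exception here matters for the same reason as below: a helper that silently returns False would make the retry loop look like it is doing nothing.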
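This code hides all error messages:

    except:
        pass

so change it to

    except Exception as e:
        print( e )

to get some error information, or remove the try and the except/pass entirely to get the full error message (traceback), and then run your code again.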
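To check whether the entries are missing or the enclosure lookup is what fails, one option is to pull the link filtering out into a small function and look at the number of entries first. The sketch below assumes each entry is dict-like with an optional "links" list, as in the question's code. The name enclosure_urls and the sample data are made up for illustration.

```python
def enclosure_urls(entries):
    """Return the href of every link marked rel="enclosure" in the given entries."""
    urls = []
    for item in entries:
        # .get() avoids a KeyError when an entry has no "links" or a link has no "rel"
        for link in item.get("links", []):
            if link.get("rel") == "enclosure" and "href" in link:
                urls.append(link["href"])
    return urls


# Stand-in data shaped like feedparser entries (made up for illustration)
sample_entries = [
    {"title": "Filing A", "links": [
        {"rel": "alternate", "href": "http://example.com/a.htm"},
        {"rel": "enclosure", "href": "http://example.com/a-xbrl.zip"},
    ]},
    {"title": "Filing B", "links": []},
]

print(len(sample_entries), "entries")
print(enclosure_urls(sample_entries))
```

With the real feed you would call enclosure_urls(feed.entries). If len(feed.entries) is 0, the problem is in fetching or parsing, not in the loop. Also note that the dictionary shown in the question looks like the channel-level metadata (feed.feed in feedparser), which would not contain the per-filing links anyway; those should be under feed.entries.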