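I tried the following program, and it works correctly.
I want to remove stop words from a web page. With FEED_URL = 'http://feeds.feedburner.com/oreilly/radar/atom' it runs successfully, but when I change the URL I get an error.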
import os
import sys
import json
import feedparser
from BeautifulSoup import BeautifulStoneSoup
from nltk import clean_html
FEED_URL = 'http://feeds.feedburner.com/oreilly/radar/atom'
def cleanHtml(html):
    return BeautifulStoneSoup(clean_html(html),
                              convertEntities=BeautifulStoneSoup.HTML_ENTITIES).contents[0]
fp = feedparser.parse(FEED_URL)
print "Fetched %s entries from '%s'" % (len(fp.entries[0].title), fp.feed.title)
#print "Fetched %s entries from '%s'" % (len(fp.entries[0])
blog_posts = []
for e in fp.entries:
    blog_posts.append({'title': e.title,
                       'content': cleanHtml(e.content[0].value),
                       'link': e.links[0].href})
out_file = os.path.join('resources', 'ch05-webpages', 'feed.json')
f = open(out_file, 'w')
f.write(json.dumps(blog_posts, indent=1))
f.close()
print ('Wrote output file to %s' % (f.name, ))
But when I change the URL as follows, it raises an error:
FEED_URL = 'http://www.thehindu.com'
Error:
IndexError Traceback (most recent call last)
<ipython-input-1-b80b4061a360> in <module>()
14 fp = feedparser.parse(FEED_URL)
15
---> 16 print "Fetched %s entries from '%s'" % (len(fp.entries[0].title), fp.feed.title)
17 #print "Fetched %s entries from '%s'" % (len(fp.entries[0])
18
IndexError: list index out of range
Can anyone help me solve this problem?
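Answer 0 (score: 0)
The feed URL you are using does not look correct. http://www.thehindu.com appears to be the site's HTML home page rather than an RSS/Atom feed, so feedparser finds no entries, fp.entries is empty, and fp.entries[0] raises the IndexError.
Try: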
FEED_URL = 'http://www.thehindu.com/?service=rss'
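It may also help to check that the feed actually contains entries before indexing into it, so that a wrong URL produces a readable message instead of an IndexError. The following is only a minimal sketch: the helper names require_entries and fetch_entries are made up for illustration, and it assumes the object returned by feedparser.parse() exposes entries and, when parsing went wrong, bozo_exception.

```python
def require_entries(parsed):
    """Return parsed.entries, or raise ValueError if the feed has none.

    `parsed` is expected to be the object returned by feedparser.parse().
    (require_entries is a hypothetical helper, not part of feedparser.)
    """
    entries = getattr(parsed, 'entries', [])
    if not entries:
        # bozo_exception is only present when feedparser hit a parsing problem.
        reason = getattr(parsed, 'bozo_exception', None)
        raise ValueError('no entries found; is this really a feed URL? '
                         '(parser reported: %r)' % (reason,))
    return entries


def fetch_entries(url):
    """Parse `url` with feedparser and return its entries (needs network access)."""
    # Third-party import, done lazily so require_entries works without it.
    import feedparser
    return require_entries(feedparser.parse(url))
```

With this in place, the script could call fetch_entries(FEED_URL) and loop over the result. Also note that len(fp.entries[0].title) in your print statement is the length of the first title; if you want the number of entries, len(fp.entries) is probably what you meant.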