I tried this and it doesn't seem to work. I just need the article links from the list.
from urllib import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml")
bsObj = BeautifulSoup(html.read(), "html.parser")
for link in bsObj.find_all('a'):
    print(link.get('href'))
Answer 0 (score: 0)
Even though the feed renders as HTML when you open it in a browser, the server returns XML to Python. If you print(html.read()), you will see that XML.
In this XML, the <a> tags are replaced by <link> tags (with no attributes), so you need to change your code to reflect that:
from urllib import urlopen  # Python 2; on Python 3 use: from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml")
bsObj = BeautifulSoup(html.read(), "html.parser")
# The feed puts each article URL inside a <link> element, so look for those
for link in bsObj.find_all('link'):
    print(link.text)
# http://www.bbc.co.uk/news/
# http://www.bbc.co.uk/news/
# http://www.bbc.co.uk/news/entertainment-arts-41914725
# http://www.bbc.co.uk/news/entertainment-arts-41886207
# http://www.bbc.co.uk/news/entertainment-arts-41886475
# ...
# ...
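Depending on your BeautifulSoup version, "html.parser" may treat <link> as a self-closing HTML tag, in which case link.text comes back empty. If that happens, here is a minimal sketch that parses the feed as XML instead (this assumes lxml is installed and uses Python 3's urllib.request):
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml")
# "xml" selects lxml's XML parser, which keeps <link>...</link> as a normal
# element whose text content is the URL
soup = BeautifulSoup(html.read(), "xml")
# Each article lives in an <item>; its <link> child holds the article URL
for item in soup.find_all('item'):
    print(item.link.text)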
Answer 1 (score: 0)
import feedparser
url = 'http://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml'
data = feedparser.parse(url)
# Iterate over the entries themselves: len(data) counts the FeedParserDict's
# top-level keys, not the number of <item> entries in the feed
for entry in data['entries']:
    print(entry['link'])
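As a small variation on the same idea (still assuming feedparser), you can check feedparser's bozo flag before looping and print each entry's title alongside its link:
import feedparser
url = 'http://feeds.bbci.co.uk/news/entertainment_and_arts/rss.xml'
data = feedparser.parse(url)
# feedparser sets the "bozo" flag when the feed could not be fetched or parsed
if data.bozo:
    print("Could not parse feed:", data.bozo_exception)
else:
    for entry in data.entries:
        print(entry.title, '-', entry.link)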