Python and BeautifulSoup URL parsing

Time: 2016-03-22 17:20:43

Tags: python

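I have the following code, which tries to get the title and description from the Red Sox news feed. It works except for one small detail: the output includes the tags. How can I get rid of them?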

import urllib2
from BeautifulSoup import BeautifulSoup
# or if you're using BeautifulSoup4:
# from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen('http://partner.mlb.com/partnerxml/gen/news/rss/bos.xml').read())

title = soup.find('item').title
desc = soup.find('item').description

print "Title: %s " % (title)
print "Summary: %s " % (desc)

This is what it displays:

Title: <title>Shaw or Panda? Hot corner duel heats up</title> 
Summary: <description>With two weeks until Opening Day, the hottest topic in Red Sox camp is the competition at the hot corner between incumbent Pablo Sandoval and the emerging Travis Shaw.</description> 
>>> 

2 answers:

Answer 0: (score: 2)

Try:

print "Title: %s " % (title.text)
print "Summary: %s " % (desc.text)

BeautifulSoup offers cleaner ways to do this, but this is a quick way to get it working.
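For what it's worth, the tags show up because formatting a BeautifulSoup Tag with %s converts it with str(), which renders the whole element. The sketch below illustrates the difference. It assumes BeautifulSoup 4 (bs4) on Python 3 and uses a small inline stand-in for the feed (SAMPLE_FEED is made up for illustration) rather than fetching the real URL.

```python
from bs4 import BeautifulSoup

# Assumption: a tiny inline stand-in for the feed, so no network call is needed.
SAMPLE_FEED = (
    "<rss><channel><item>"
    "<title>Shaw or Panda? Hot corner duel heats up</title>"
    "<description>Sample summary text.</description>"
    "</item></channel></rss>"
)

soup = BeautifulSoup(SAMPLE_FEED, "html.parser")
title = soup.find("item").title

as_markup = "%s" % title       # str(Tag) renders the element, tags included
as_text = "%s" % title.text    # .text returns only the text inside the element

print("With tags:    " + as_markup)
print("Without tags: " + as_text)
```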

Answer 1: (score: -1)

print ("Title: %s " % (title.get_text()))
print ("Summary: %s " % (desc.get_text()))

This works.
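A slightly more defensive variant, as a sketch only: get_text(strip=True) also trims surrounding whitespace, and checking the result of find() avoids an AttributeError when an element is missing. This assumes BeautifulSoup 4 on Python 3. The child_text helper and the inline SAMPLE_FEED are hypothetical names made up for this example.

```python
from bs4 import BeautifulSoup

# Assumption: inline sample data; this item deliberately has no <description>.
SAMPLE_FEED = """
<rss><channel>
  <item>
    <title>  Shaw or Panda? Hot corner duel heats up  </title>
  </item>
</channel></rss>
"""


def child_text(item, name, default=""):
    """Return the stripped text of the first <name> tag under item, or default."""
    tag = item.find(name)
    return tag.get_text(strip=True) if tag is not None else default


soup = BeautifulSoup(SAMPLE_FEED, "html.parser")
item = soup.find("item")

title_text = child_text(item, "title")
desc_text = child_text(item, "description", default="(no summary)")

print("Title: %s" % title_text)
print("Summary: %s" % desc_text)
```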