Python BeautifulSoup: extracting the contents of a font tag

Date: 2015-02-22 20:44:41

Tags: python html beautifulsoup

Hey guys, I am trying to use BeautifulSoup to get the contents of a font tag. In the HTML page I am parsing, I want to get the text out of the following:

<font color="#000000">Text I want to extract</font>

Going off another Stack Overflow question (how to extract text within font tag using beautifulsoup), I am trying to use

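from urllib.request import urlopen
from bs4 import BeautifulSoup

# BASE_URL is the URL of the page being parsed (defined elsewhere)
html = urlopen(str(BASE_URL)).read()
soup = BeautifulSoup(html, "lxml")
info = soup('font', color="#000000")  # calling soup(...) is shorthand for soup.find_all(...)

print(str(info))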

But print only outputs []. Any idea what I am doing wrong?

1 answer:

Answer 0: (score: 1)

Here you go:

from bs4 import BeautifulSoup

html = """<font color="#000000">Text I want to extract</font>"""

soup = BeautifulSoup(html, 'html.parser')

result1 = soup.find('font').text  # not specifying the color attribute
result2 = soup.find('font', {'color': '#000000'}).text  # specifying the color attribute

print(result1)  # prints 'Text I want to extract'
print(result2)  # prints 'Text I want to extract'
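
If the real page still gives you [], it may help to look at what the parser actually sees before filtering. Below is a minimal sketch of that check, not a diagnosis of your page: the sample markup and the color "#123456" are made up for illustration. It lists the color attribute of every font tag, collects the text of all matching tags with find_all, and guards against find returning None (calling .text on None raises AttributeError).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the asker's page
html = """
<p><font color="#000000">First</font></p>
<p><font color="#ff0000">Other</font></p>
<p><font color="#000000"> Second </font></p>
"""

soup = BeautifulSoup(html, "html.parser")

# Diagnostic: which color values do the parsed font tags actually carry?
seen_colors = [tag.get("color") for tag in soup.find_all("font")]

# Text of every font tag whose color attribute is exactly "#000000"
texts = [tag.get_text(strip=True) for tag in soup.find_all("font", color="#000000")]

# find() returns None when nothing matches, so check before reading the text
missing = soup.find("font", color="#123456")
missing_text = missing.get_text(strip=True) if missing is not None else None

print(seen_colors)   # ['#000000', '#ff0000', '#000000']
print(texts)         # ['First', 'Second']
print(missing_text)  # None
```

If seen_colors comes back empty on the downloaded page, the font tags are not in the HTML that urlopen received, and the filter itself is probably not the problem.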