Hey guys, I'm trying to use BeautifulSoup to get the contents of a font tag. In the HTML page I'm parsing, I want to get the text from the following:
<font color="#000000">Text I want to extract</font>
Going off another Stack Overflow question (how to extract text within font tag using beautifulsoup), I'm trying to use
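from urllib.request import urlopen  # Python 3 import; assumed, not shown in the original post
from bs4 import BeautifulSoup

html = urlopen(str(BASE_URL)).read()  # BASE_URL is defined elsewhere
soup = BeautifulSoup(html, "lxml")
info = soup('font', color="#000000")
print(str(info))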
But the print statement only returns [].
Any idea what I'm doing wrong?
Answer 0 (score: 1)
Here you go:
from bs4 import BeautifulSoup
html = """<font color="#000000">Text I want to extract</font>"""
soup = BeautifulSoup(html, 'html.parser')
result1 = soup.find('font').text # not specifying the color attribute
result2 = soup.find('font', {'color':'#000000'}).text # specifying the color attribute
print(result1)  # prints: Text I want to extract
print(result2)  # prints: Text I want to extract
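One caveat with the snippet above: find returns None when nothing matches, so calling .text on the result raises an AttributeError on a page that has no such tag. Here is a minimal sketch of a more defensive variant. The sample markup and the "#123456" color are made up for illustration, and the real page may well differ (for example, the tag might be missing from the downloaded HTML entirely, which would also explain the empty list in the question).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page may look different.
html = """
<font color="#000000">Text I want to extract</font>
<font color="#FF0000">Other text</font>
<font color="#000000">Second match</font>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a (possibly empty) list, so it is safe to iterate over.
texts = [tag.get_text(strip=True) for tag in soup.find_all("font", color="#000000")]
print(texts)

# find returns None when nothing matches, so check before reading .text.
first = soup.find("font", color="#123456")
print(first.text if first is not None else "no matching <font> tag")
```

If texts comes back empty on the real page, printing soup.find_all("font") without the color filter is a quick way to see whether the tags are in the downloaded HTML at all.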