Question

Hey guys, I'm trying to use BeautifulSoup to get the contents of a font tag. In the HTML page I'm parsing, I want to extract the text from the following tag:

<font color="#000000">Text I want to extract</font>

Going off another Stack Overflow question (how to extract text within font tag using beautifulsoup), I'm trying to use:

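from urllib.request import urlopen  # Python 3; Python 2 used urllib2.urlopen
from bs4 import BeautifulSoup

html = urlopen(str(BASE_URL)).read()  # BASE_URL is defined elsewhere in the script
soup = BeautifulSoup(html, "lxml")
info = soup('font', color="#000000")  # calling soup(...) is shorthand for soup.find_all(...)

print(str(info))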

But this only prints []. Any idea what I'm doing wrong?

Answer 1

Here you go:

from bs4 import BeautifulSoup

html = """<font color="#000000">Text I want to extract</font>"""

soup = BeautifulSoup(html, 'html.parser')

result1 = soup.find('font').text  # not specifying the color attribute
result2 = soup.find('font', {'color':'#000000'}).text  # specifying the color attribute

print(result1)  # prints 'Text I want to extract'
print(result2)  # prints 'Text I want to extract'
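
One caveat with the snippet above: find() returns None when nothing matches, so calling .text on the result raises AttributeError on a page without a matching tag. A minimal sketch of a more defensive variant follows, using find_all(), which returns an empty list instead. The sample markup and the helper name extract_font_texts are made up for illustration; the actual page from the question is not shown, so this does not explain why the original call returned [].

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page from the question is unknown.
SAMPLE_HTML = """
<p><font color="#000000">Text I want to extract</font></p>
<p><font color="#FF0000">Other text</font></p>
<p><font color="#000000">  More text  </font></p>
"""


def extract_font_texts(html, color="#000000"):
    """Return the stripped text of every <font> tag whose color attribute equals `color`."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() yields an empty result when nothing matches, so no None check is needed.
    matches = soup.find_all("font", attrs={"color": color})
    return [tag.get_text(strip=True) for tag in matches]


print(extract_font_texts(SAMPLE_HTML))   # ['Text I want to extract', 'More text']
print(extract_font_texts("<p>none</p>"))  # []
```

If this returns an empty list on a real page, it may be worth printing soup.find_all("font") without the color filter to see which font tags, if any, the parser actually found.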

Python BeautifulSoup: extracting the contents of a font tag
