python scrape html font tag

Time: 2016-01-30 09:59:49

Tags: python screen-scraping

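I'm new to programming and Python. I can't extract the text inside the FONT tags from this HTML. Here is my code. I need to extract all of the text and count it. I don't know what I'm missing, because when I run the program I get an empty response (`None`).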

from bs4 import BeautifulSoup

html = """<P STYLE="margin-bottom: 0in">&quot;amy in marketing press one amanda in groups press two to repeat this menu press star&quot;</P>
<P STYLE="margin-bottom: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in">Labels:<FONT COLOR="#ff0000">Machine-Message,In-House-Alternative,Company-Alternative;</FONT></P>
<P STYLE="margin-bottom: 0in"><FONT COLOR="#00b050">Machine-Message,</FONT><FONT COLOR="#00b050">Greetings-Other;</FONT></P>
<P STYLE="margin-bottom: 0in"><FONT COLOR="#0070c0">Machine-Message,</FONT>
<FONT COLOR="#0070c0">Personal-Information;</FONT></P>
<P STYLE="margin-bottom: 0in"><BR>
</P>"""

soup = BeautifulSoup(html)
print(soup.find('FONT', COLOR="#ff0000"))

1 Answer:

Answer 0 (score: 2)

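You are missing a quote `"`, and you need to use lowercase tag and attribute names in `soup.find`, because the parser stores them in lowercase, so `'FONT'` and `COLOR=` never match. To get all occurrences instead of only the first one, use `find_all`.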
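To see why the uppercase names fail, here is a minimal check, assuming the built-in `html.parser` backend: the parsed tag's name and attribute keys come out lowercase, so searches written with `FONT` or `COLOR` find nothing.

```python
from bs4 import BeautifulSoup

# Same uppercase markup style as in the question
html = '<P><FONT COLOR="#ff0000">Machine-Message</FONT></P>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.find("font")
print(tag.name)      # font
print(tag["color"])  # #ff0000 (the attribute key was lowercased too)

# Both uppercase searches come back empty
print(soup.find("FONT"))                   # None
print(soup.find("font", COLOR="#ff0000"))  # None
```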

from bs4 import BeautifulSoup

html = """<P STYLE="margin-bottom: 0in">&quot;amy in marketing press one amanda in groups press two to repeat this menu press star&quot;</P>
<P STYLE="margin-bottom: 0in"><BR>
</P>
<P STYLE="margin-bottom: 0in">Labels:<FONT COLOR="#ff0000">Machine-Message,In-House-Alternative,Company-Alternative;</FONT></P>
<P STYLE="margin-bottom: 0in"><FONT COLOR="#00b050">Machine-Message,</FONT><FONT COLOR="#00b050">Greetings-Other;</FONT></P>
<P STYLE="margin-bottom: 0in"><FONT COLOR="#0070c0">Machine-Message,</FONT>
<FONT COLOR="#0070c0">Personal-Information;</FONT></P>
<P STYLE="margin-bottom: 0in"><BR>
</P>"""
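# Name the parser explicitly instead of letting BeautifulSoup guess
soup = BeautifulSoup(html, "html.parser")
# Lowercase tag and attribute names; .text returns the tag's text content
print(soup.find("font", color="#ff0000").text)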
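Since the question asks for all of the text and a count, here is a sketch using `find_all`. It assumes the `html.parser` backend and assumes that "count" means counting the individual labels, treating `,` and `;` as separators; the splitting rule is an illustrative assumption, not something stated in the question. It uses only the label lines from the question's HTML.

```python
from collections import Counter

from bs4 import BeautifulSoup

html = """<P STYLE="margin-bottom: 0in">Labels:<FONT COLOR="#ff0000">Machine-Message,In-House-Alternative,Company-Alternative;</FONT></P>
<P STYLE="margin-bottom: 0in"><FONT COLOR="#00b050">Machine-Message,</FONT><FONT COLOR="#00b050">Greetings-Other;</FONT></P>
<P STYLE="margin-bottom: 0in"><FONT COLOR="#0070c0">Machine-Message,</FONT>
<FONT COLOR="#0070c0">Personal-Information;</FONT></P>"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag; find returns only the first one
texts = [font.get_text() for font in soup.find_all("font")]
print(len(texts))  # 5

# Attribute filters work the same way, using the lowercase attribute name
green = [font.get_text() for font in soup.find_all("font", color="#00b050")]
print(green)  # ['Machine-Message,', 'Greetings-Other;']

# Assumption: labels are separated by "," or ";"
labels = []
for text in texts:
    for label in text.replace(";", ",").split(","):
        label = label.strip()
        if label:
            labels.append(label)

counts = Counter(labels)
print(len(labels))                # 7
print(counts["Machine-Message"])  # 3
```

If you only need the number of FONT tags, `len(soup.find_all("font"))` is enough; the `Counter` is useful when you want per-label totals.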