Removing the style attribute from specific tags with BeautifulSoup / Python

Time: 2014-03-19 06:01:02

Tags: python html html-parsing beautifulsoup

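Let's say I have a soup and I want to remove the style attribute from every paragraph. So throughout the soup I want to turn <p style='blah' id='bla' class=...> into <p id='bla' class=...>. But I don't want to touch <img style='...'> tags. How can I do this?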

1 answer:

Answer 0: (score: 3)

The idea is to iterate over all p tags with find_all('p') and delete the style attribute from each one:

from bs4 import BeautifulSoup


data = """
<body>
    <p style='blah' id='bla1'>paragraph1</p>
    <p style='blah' id='bla2'>paragraph2</p>
    <p style='blah' id='bla3'>paragraph3</p>
    <img style="awesome_image"/>
</body>"""


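soup = BeautifulSoup(data, 'html.parser')
# Only <p> tags are visited, so the style attribute on <img> is left untouched
for p in soup.find_all('p'):
    if 'style' in p.attrs:
        del p.attrs['style']

# print() as a function works on both Python 2.7 and Python 3
print(soup.prettify())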

Prints:

<body>
 <p id="bla1">
  paragraph1
 </p>
 <p id="bla2">
  paragraph2
 </p>
 <p id="bla3">
  paragraph3
 </p>
 <img style="awesome_image"/>
</body>
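
If the same cleanup is needed for other tags or attributes, the loop above can be wrapped in a small helper. This is only a sketch: the function name strip_attribute and its parameters are made up for illustration and are not part of BeautifulSoup. It relies on find_all() accepting attrs={name: True} to match only tags that actually have the attribute, so no membership check is needed before deleting.

```python
from bs4 import BeautifulSoup


def strip_attribute(soup, tag_name, attr_name):
    """Hypothetical helper: remove attr_name from every tag_name element.

    Returns the number of tags that were modified.
    """
    # attrs={attr_name: True} matches only tags that carry the attribute
    matches = soup.find_all(tag_name, attrs={attr_name: True})
    for tag in matches:
        del tag[attr_name]
    return len(matches)


html = "<body><p style='blah' id='bla1'>paragraph1</p><img style='awesome_image'/></body>"
soup = BeautifulSoup(html, "html.parser")

removed = strip_attribute(soup, "p", "style")
print(removed)  # number of <p> tags that lost their style attribute
print(soup)     # the <img> tag keeps its style attribute
```

Calling strip_attribute(soup, "p", "style") edits the soup in place, the same way the loop in the answer does, and leaves every other tag alone.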