BeautifulSoup: removing tags

Posted: 2014-02-09 03:01:16

Tags: python beautifulsoup

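I'm trying to remove the tags styled display:none, together with their contents, from the source, but it doesn't work. There is no error; the tags just don't get decomposed. Here is what I have: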

source = BeautifulSoup(open("page.html"))
getbody = source.find('body')
for child in getbody[0].children:
    try:
        if child.get('style') is not None and child.get('style') == "display:none":
            # it gets in here
            child.decompose()
    except:
        continue
print source
# the display:none divs are still there.

1 Answer:

Answer 0 (score: 0)

The following code does what you want and works fine. Don't use a blanket except clause; all it does is mask errors:

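from bs4 import BeautifulSoup

with open("page.html") as f:
    source = BeautifulSoup(f, "html.parser")
for hidden in source.body.find_all(style='display:none'):
    hidden.decompose()  # removes the tag and everything inside it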
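To see the effect without a page.html file on disk, here is a minimal sketch that parses an inline HTML string instead (the markup is made up for illustration) with the built-in "html.parser". Because find_all returns a plain list, decomposing elements while looping over it is safe.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for page.html.
html = """
<html><body>
  <div>visible</div>
  <div style="display:none">hidden <span>nested</span></div>
  <p>also visible</p>
</body></html>
"""

source = BeautifulSoup(html, "html.parser")

# find_all collects matches into a list first, so removing them in the loop
# does not disturb the iteration.
for hidden in source.body.find_all(style="display:none"):
    hidden.decompose()  # removes the tag and all of its contents

# Only the visible text remains.
print(source.body.get_text(" ", strip=True))
```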

Or, better still, use a regular expression to cast a wider net:

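import re

from bs4 import BeautifulSoup

with open("page.html") as f:
    source = BeautifulSoup(f, "html.parser")
# Tolerate optional whitespace after the colon, e.g. "display: none".
for hidden in source.body.find_all(style=re.compile(r'display:\s*none')):
    hidden.decompose()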
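A plain string filter such as style='display:none' only matches that exact attribute value, so spellings like "display: none" or "color:red; display:none" slip through. The sketch below uses made-up markup to compare the two filters. BeautifulSoup applies a compiled pattern with search, so the regex matches anywhere inside the attribute value.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup with several spellings of the hidden style.
html = """
<body>
  <div style="display:none">a</div>
  <div style="display: none">b</div>
  <div style="color:red; display:none">c</div>
  <div style="display:block">d</div>
</body>
"""

source = BeautifulSoup(html, "html.parser")

# The exact-string filter only matches the first div.
exact = source.body.find_all(style="display:none")

# The regex tolerates whitespace and surrounding declarations.
pattern = re.compile(r"display:\s*none")
for hidden in source.body.find_all(style=pattern):
    hidden.decompose()

# Only the display:block div is left.
remaining = [div.get_text() for div in source.body.find_all("div")]
print(len(exact), remaining)
```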

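Tag.children only yields the direct children of the body tag, not all nested descendants, so hidden elements further down the tree are never visited.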
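To illustrate the difference, here is a small sketch with made-up markup in which the hidden div sits one level below body. Iterating body.children never reaches it, while body.descendants and find_all (which searches all descendants by default) do. The helper is_hidden is a hypothetical name introduced only for this example.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup: the hidden div is nested inside a wrapper div.
html = '<body><div id="wrapper"><div style="display:none">x</div></div></body>'
body = BeautifulSoup(html, "html.parser").body


def is_hidden(node):
    # Only Tag objects carry attributes; text nodes in the tree do not.
    return isinstance(node, Tag) and node.get("style") == "display:none"


direct = [c for c in body.children if is_hidden(c)]     # direct children only
nested = [d for d in body.descendants if is_hidden(d)]  # whole subtree
found = body.find_all(style="display:none")             # also whole subtree

print(len(direct), len(nested), len(found))
```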