Question

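I want to remove long runs (50+) of &nbsp; or plain spaces (are trailing spaces clipped here?), but decompose() has no effect when iterating over the find_all results. In the regex passed to BeautifulSoup's .find_all, the &nbsp; is written in its Unicode form (u'\xa0').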

Printing inside the loop shows what was found ("..snipped.." means I shortened overly long lines):

compileMe = ('(' + u'\xa0' + ' *){50,}')
for i in (soup.find_all('p', string=re.compile(compileMe))):
    print(i)  # ==> " <p>        ..snipped..       </p>"
    print(i.text.strip())  # ==> "      ..snipped..       "
    print(i.text.strip)  # ==> "<built-in method strip of str object at 0x110184c30>"
    i.decompose()  # ==> no effect on output
    i.p.decompose()  # ==> 'NoneType' object has no attribute 'decompose'

The output HTML still contains:

&nbsp;

P.S.: There is no "decompose()" tag. Would creating one be warranted?

Answer 1

Solved. Answering for anyone who runs into the same pitfall:

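The decompose was being blocked by a tag inside the <p> (one already emptied by an earlier decompose). Removing that tag first (find_all('thisEmptyTag', text='')) fixed the problem.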
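A minimal sketch of that two-step cleanup, under assumptions: the sample HTML, the NBSP constant, and the "no contents" test for an empty tag are illustrative stand-ins (the answer's 'thisEmptyTag' is not named in the thread). It relies on the documented behavior that string= together with a tag name matches against the tag's .string, which is None when the tag has more than one child.

```python
import re
from bs4 import BeautifulSoup

NBSP = "\xa0"
# Illustrative input: the second <p> holds a leftover empty <span>
# followed by a long run of non-breaking spaces.
html = "<div><p>keep</p><p><span></span>" + NBSP * 60 + "</p></div>"
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile("(" + NBSP + " *){50,}")

# Before cleanup the second <p> has two children, so its .string is None
# and the string= filter cannot match it.
matched_before = len(soup.find_all("p", string=pattern))

# Step 1: drop childless tags nested inside each <p> (assumed stand-in
# for the answer's 'thisEmptyTag').
for p in soup.find_all("p"):
    for child in p.find_all(True):
        if not child.contents:
            child.decompose()

# Step 2: the <p> now has a single string child, so string= matches it.
matches = soup.find_all("p", string=pattern)
matched_after = len(matches)
for p in matches:
    p.decompose()

remaining = str(soup)
```

Note that the childless-tag test would also remove void elements such as <br> inside a <p>, so a real cleanup may want a narrower condition.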

Edit: It is important to use text rather than string, because text includes all child strings. Hope this helps someone...
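One way to read this, sketched here as an assumption rather than as the author's exact code: the .string attribute of a tag is None when the tag has more than one child, while .text (get_text()) concatenates every descendant string. In find_all itself, text= is only the older name of the string= argument, so the sketch filters on get_text() in plain Python instead. The sample HTML is illustrative.

```python
import re
from bs4 import BeautifulSoup

NBSP = "\xa0"
html = (
    "<div>"
    "<p>keep me</p>"
    "<p>" + NBSP * 60 + "</p>"
    "<p><span></span>" + (NBSP + " ") * 60 + "</p>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")
pattern = re.compile("(" + NBSP + " *){50,}")

# The third <p> has two children (an empty <span> and a string),
# so .string is None while get_text() still returns the padding.
third = soup.find_all("p")[2]
string_is_none = third.string is None
text_matches = pattern.search(third.get_text()) is not None

# Collect every <p> whose full text matches, then remove them.
targets = [p for p in soup.find_all("p") if pattern.search(p.get_text())]
for p in targets:
    p.decompose()

result = str(soup)
```

Filtering on get_text() avoids the separate pass that removes empty inner tags, at the cost of also matching a <p> whose padding is spread across several nested elements.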

Iterating over find_all with decompose() has no effect
