BeautifulSoup: how to replace content within span tags

Asked: 2015-09-14 14:42:55

Tags: python html beautifulsoup html-parsing

........<p style=" margin-top:12px; margin-bottom:0px; margin-left:0px; margin-right:0px; text-indent:0px;">textHere

<span style=" font-family:'Noto Sans';">ABC</span></p>

<p style=" margin-top:12px; margin-bottom:0px; margin-left:0px; margin-right:0px; text-indent:0px;"><span style=" font.......

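I have HTML like the above. I need to:

  1. Find all content set in the 'Noto Sans' font-family (it is always inside span tags)
  2. Replace it (A with X, B with Y, etc.) without changing the rest of the code

This is what I tried, but it doesn't work: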

    from bs4 import BeautifulSoup
    source_code = """.....<span style=" font-family:'Noto Sans';">ABC</span></p>......"""
    soup = BeautifulSoup(source_code, "lxml")
    
    for re in soup.findAll('font', 'face' = "Noto Sans"):
        print (re.replace("A", "X"))
    

Any ideas?

1 answer:

Answer 0 (score: 1)

You need to find all span elements whose style attribute contains font-family:'Noto Sans', then replace A with X in the text of each span found:

    import re

    from bs4 import BeautifulSoup


    source_code = """.....<span style=" font-family:'Noto Sans';">ABC</span></p>......"""
    soup = BeautifulSoup(source_code, "lxml")

    # Match span tags whose style attribute mentions the Noto Sans font family
    for elm in soup.find_all('span', style=re.compile(r"font-family:'Noto Sans'")):
        # Assigning to .string replaces the span's contents with the new text
        elm.string = elm.text.replace("A", "X")

    print(soup.prettify())

Prints:

    <span style=" font-family:'Noto Sans';">
     XBC
    </span>
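
The question asks for several substitutions (A to X, B to Y, and so on) while leaving the rest of the markup alone. The following is only a sketch of one way that might be done, not part of the original answer. The `TABLE` mapping, the `is_noto_sans` helper and the shortened sample HTML are made up for illustration. It uses the standard-library `html.parser` backend instead of `lxml` so that no `<html>`/`<body>` wrapper is added, and it prints `str(soup)` rather than `prettify()` so the whitespace is not re-indented.

```python
from bs4 import BeautifulSoup

# Hypothetical mapping for illustration; extend it with the pairs you need.
TABLE = str.maketrans({"A": "X", "B": "Y"})

# Shortened sample input (assumed), modelled on the HTML in the question.
source_code = """<p style="margin-top:12px;">textHere <span style=" font-family:'Noto Sans';">ABC</span></p>"""

# html.parser does not wrap the fragment in <html>/<body> tags.
soup = BeautifulSoup(source_code, "html.parser")


def is_noto_sans(style):
    # style is None for tags that have no style attribute
    return style is not None and "Noto Sans" in style


for span in soup.find_all("span", style=is_noto_sans):
    # Rewrite each text node in place so any tags nested in the span survive.
    for text_node in span.find_all(string=True):
        text_node.replace_with(text_node.translate(TABLE))

result = str(soup)
print(result)
```

With this input the span text becomes XYC, and the text outside the span is left untouched. Replacing the individual text nodes, rather than assigning to `.string`, keeps any child tags inside the span.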