Question

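Folks,

I am new to Python and BeautifulSoup - please bear with me. I am trying to do some HTML parsing.

I want to remove line breaks and collapse whitespace in the text of selected tags (selected by a string search in the HTML file).

For example, for the following HTML, I want to find all tags whose text contains the string "xy", then remove the line breaks and runs of multiple spaces from that text (replacing each run with a single space).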

<html>   
    <head></head>   
    <body>
    <h1>xy
        z</h1>
    <p>xy
        z</p>
    <div align="center" style="margin-left: 0%; ">
      <b>
       <font style="font-family: 'Times New Roman', Times">
        ab    c
       </font>
       <font style="font-family: 'Times New Roman', Times">
        xy    z
       </font>
      </b>
     </div>  
    </body> 
</html>

The resulting HTML should look like this:

<html>   
  <head></head>   
  <body>
    <h1>xy z</h1>
    <p>xy z</p>
    <div align="center" style="margin-left: 0%; ">
      <b>
       <font style="font-family: 'Times New Roman', Times">
        ab    c
       </font>
       <font style="font-family: 'Times New Roman', Times">
        xy z
       </font>
      </b>
     </div>   
  </body> 
</html>

Answer 1

OK - I found a way to do it... you use findAll and then the replaceWith() method, as shown below.

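.........
    soup = BeautifulSoup(content)
    # find every text node that contains "xy"
    s = soup.findAll(text=re.compile("xy"))
    for s1 in s:
        # collapse each run of whitespace (newlines included) into one space
        s1.replaceWith(re.sub(r'\s+', ' ', str(s1)))
............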
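
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It assumes BeautifulSoup 4 (the bs4 package) with the built-in "html.parser", and it uses the snake_case spellings find_all and replace_with in place of findAll and replaceWith. The names collapse_whitespace and SAMPLE are made up for illustration, and the sample markup is a shortened version of the HTML in the question.

```python
import re

from bs4 import BeautifulSoup

# Shortened sample based on the question's HTML (illustrative only).
SAMPLE = """<html>
<body>
<h1>xy
    z</h1>
<p>ab    c</p>
<font>
  xy    z
</font>
</body>
</html>"""


def collapse_whitespace(html, needle):
    """Collapse whitespace runs in every text node that contains `needle`.

    Hypothetical helper: parses `html`, rewrites matching text nodes in
    place, and returns the modified soup.
    """
    soup = BeautifulSoup(html, "html.parser")
    # re.escape makes the needle match literally, not as a regex pattern.
    pattern = re.compile(re.escape(needle))
    # find_all returns a list, so replacing nodes inside the loop is safe.
    for node in soup.find_all(string=pattern):
        # Each run of spaces, tabs and newlines becomes a single space.
        node.replace_with(re.sub(r"\s+", " ", str(node)))
    return soup


if __name__ == "__main__":
    print(collapse_whitespace(SAMPLE, "xy"))
```

Only text nodes that contain "xy" are rewritten, so the "ab    c" text keeps its spacing. Leading and trailing whitespace in a matching node also shrinks to one space, so the font text ends up as " xy z " on a single line instead of keeping its own line breaks.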

Manipulating string content values in an HTML file using BeautifulSoup
