Question

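I'm looking for a faster way to find & copy everything between two tags (including the tags themselves) in the many HTML files I work with. At the moment I use Sublime and copy from each file by hand. The HTML tags are constant (<center> </center>). I tried to do this with a regular expression but had no success: "<center>(.*)</center>" ... What would I type in Sublime to achieve this? Or, if there is a better way that a beginner could learn easily, I'm open to suggestions!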

</head>

<body style="background-color:#9b9b9b;">
**<center>
<table width="580" border="0" cellspacing="0" cellpadding="0" align="center"  class ="responsive-table" style="background-color:#3e5b3e;border:solid thin #3e5b3e;" >
  <tbody>
    <tr>
      <td background="http://app.randomsite.com/js/ckfinder/userfiles//images/banner.jpg" style="padding-top:20px;padding-right:20px;padding-left:20px;" class="hideForMobile"><h1 style="font-family:Arial, Helvetica, sans-serif;font-size:20px;font-weight:bold;text-align:right;color:#eee;vertical-align:bottom;text-decoration:none;margin-top:0;margin-bottom:0;margin-right:0;margin-left:0;" >some message</h1></td>
    </tr>
    <tr>
</center>**
    <!---Start of Banner Image--->
      <td><a href="{{Custom1}}" style="color:inherit;text-decoration:none;" ><img src="http://app.clientcommand.com/js/ckfinder/userfiles//images/top-dollar-ford-banner.jpg" alt="" class="table.responsiveImage" style="display:block;width:100%;border-style:none;" /></a></td>
    <!---End of Banner Image--->
    </tr>
    <tr>

Please be gentle - I'm new to coding.

Answer 1

I think your regex is missing something. With .* you match every character except newlines (line breaks). Try something like this:

<center>(.|\n)*<\/center>

Breakdown of the changed part:
. = any character, | = or, \n = newline (line break)
(.|\n)* = zero or more repetitions of the above (greedy, so as many times as possible)
see demo
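
The same behaviour can be reproduced outside the editor. Below is a minimal sketch using Python's built-in re module, where . also skips newlines by default. The sample string is made up for illustration and is not taken from the question's file.

```python
import re

# Made-up sample: a single <center> block that spans several lines.
html = "<body>\n<center>\n<table>\n</table>\n</center>\n<p>after</p>\n</body>"

# '.' does not match a newline by default, so this pattern finds nothing.
dot_only = re.search(r"<center>(.*)</center>", html)

# '(.|\n)*' accepts newlines as well, so the whole block matches.
dot_or_newline = re.search(r"<center>(.|\n)*</center>", html)

print(dot_only)                 # no match object
print(dot_or_newline.group(0))  # the block, tags included
```

group(0) is the full match, so the opening and closing tags are part of the result.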

If there is more than one such section in a file, you can use <center>(.|\n)*?<\/center>

Breakdown of the changed part: ? makes the match non-greedy, so it stops at the first occurrence of <\/center>
see demo
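
To see what greedy versus non-greedy means in practice, here is a small sketch with Python's re module on a made-up string that contains two <center> blocks. The group is written as (?:...) only so that re.findall returns the whole match rather than the group's content; that detail is specific to this sketch.

```python
import re

# Made-up sample with two separate <center> blocks.
html = "<center>\nfirst\n</center>\n<p>between</p>\n<center>\nsecond\n</center>"

# Greedy: runs from the first <center> to the LAST </center>.
greedy = re.findall(r"<center>(?:.|\n)*</center>", html)

# Non-greedy: stops at the first </center> after each <center>.
lazy = re.findall(r"<center>(?:.|\n)*?</center>", html)

print(len(greedy))  # one long match that swallows the text in between
print(lazy)         # one entry per <center> block
```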

Answer 2

Avoid using regular expressions to parse markup files. Consider using BeautifulSoup to parse the HTML files and extract the content of the tags.

In your case it should look something like this:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
for centered_content in soup.find_all('center'):
    ...(do what you want)...
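
A slightly fuller sketch of the same idea, for the case where the tags themselves should be kept and many files need processing. The function names center_blocks and center_blocks_in_folder, the sample string, and the UTF-8 encoding are assumptions made for illustration; BeautifulSoup is a third-party package that has to be installed first.

```python
from pathlib import Path

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def center_blocks(html_doc):
    """Return every <center> element as a string, tags included."""
    soup = BeautifulSoup(html_doc, "html.parser")
    # str(tag) serializes the element with its opening and closing tags.
    return [str(tag) for tag in soup.find_all("center")]


def center_blocks_in_folder(folder):
    """Map each .html file name in `folder` to its <center> blocks.

    Hypothetical helper; assumes the files are UTF-8 encoded.
    """
    return {
        path.name: center_blocks(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).glob("*.html"))
    }


# Made-up sample input.
sample = "<body><center><p>one</p></center><p>skip</p><center>two</center></body>"
blocks = center_blocks(sample)
print(blocks)
```

Because the parser rebuilds the markup, the returned strings may not be character-for-character identical to the source file, especially when the HTML is not well formed.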

How to find & copy everything between two tags (including the tags)

2 answers: