Using sed to remove tags from HTML input

Time: 2016-09-28 13:40:04

Tags: sed

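I have an HTML table from which I want to remove the rows of a certain class. However, when I try sed 's/<tr class="expandable">.*<\/tr>//g', it does nothing (that is, it does not remove the tags).

Sample input could be: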

<tr><td>Some col</td></tr>
<tr class="expandable">
    <td colspan="6">
        <div class="expandable-content">
<p>Holds ACCA Practising Certificate: This indicates a member holding a practising certificate issued by ACCA. This means that the member is authorised to provide a range of general accountancy services to individuals and businesses, including business and tax advice and planning, preparation of personal and business tax returns, set up of book-keeping and business systems, providing book-keeping services, payroll work, assistance with management accounting help with raising finance, budgeting and cash-flow advice, business start-up advice and expert witness.</p>
        </div>
    </td>
</tr>

I'm not a sed pro, so thanks for any help you can give me!

1 answer:

Answer 0: (score: 2)

Assuming your HTML is valid XML, you can use a tool like xmlstarlet:

xmlstarlet ed -d '//tr[@class="expandable"]' <<ENDHTML
<html><body><table>
  <tr><td>Some col</td></tr>
  <tr class="expandable">
      <td colspan="6">
          <div class="expandable-content">
  <p>Holds ACCA Practising Certificate: This indicates a member holding a practising certificate issued by ACCA. This means that the member is authorised to provide a range of general accountancy services to individuals and businesses, including business and tax advice and planning, preparation of personal and business tax returns, set up of book-keeping and business systems, providing book-keeping services, payroll work, assistance with management accounting help with raising finance, budgeting and cash-flow advice, business start-up advice and expert witness.</p>
          </div>
      </td>
  </tr>
</table></body></html>
ENDHTML

Output:

<?xml version="1.0"?>
<html>
  <body>
    <table>
      <tr>
        <td>Some col</td>
      </tr>
    </table>
  </body>
</html>
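
As for why the original command does nothing: sed reads its input one line at a time, so `.*` never sees the opening `<tr class="expandable">` and the closing `</tr>` together when they sit on different lines. If you would rather stay with sed, a range address that deletes whole lines may be enough. This is only a minimal sketch, and it assumes the opening tag and the closing `</tr>` are on separate lines and that no `<tr>` is nested inside the row. The function name `strip_expandable_rows` and the shortened sample text are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): delete every line from
# one matching <tr class="expandable"> through the next line containing </tr>.
# Assumes the opening and closing tags are on different lines.
strip_expandable_rows() {
    sed '/<tr class="expandable">/,/<\/tr>/d'
}

# Sample input modelled on the question, with the paragraph text shortened.
sample='<tr><td>Some col</td></tr>
<tr class="expandable">
    <td colspan="6">
        <div class="expandable-content">
<p>Holds ACCA Practising Certificate</p>
        </div>
    </td>
</tr>'

# Run the filter and print what is left of the table rows.
result=$(printf '%s\n' "$sample" | strip_expandable_rows)
printf '%s\n' "$result"
```

On the sample above, only the `<tr><td>Some col</td></tr>` row should remain. For anything less regular than this, for example several rows on one line or nested tables, an XML-aware tool such as xmlstarlet is the safer choice.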