Question

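I need to delete certain lines from an HTML file: everything between <BR>INSTANCE NAME is : T0<BR> and the last occurrence of </table>, including the lines that match those two patterns.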

Sample input:

</table>
<BR>INSTANCE NAME is : T0<BR>
<table BORDER=1 CELLPADDING=2>
<TD BGCOLOR=#5D6D7E><font color=white><center>ID</center></TD> <TD BGCOLOR=#5D6D7E><font color=white><center>Find</center></TD> <TD BGCOLOR=#5D6D7E><font color=white><center>count</center></TD>
</table>
<BR>INSTANCE NAME is : T0<BR>
<table BORDER=1 CELLPADDING=2>
<TD BGCOLOR=#5D6D7E><font color=white><center>ID</center></TD> <TD BGCOLOR=#5D6D7E><font color=white><center>Find</center></TD> <TD BGCOLOR=#5D6D7E><font color=white><center>count</center></TD>
</table>
<BR>INSTANCE NAME is : T0<BR>
<table BORDER=1 CELLPADDING=2>
<TD BGCOLOR=#5D6D7E><font color=white><center>ID</center></TD> <TD BGCOLOR=#5D6D7E><font color=white><center>Find</center></TD> <TD BGCOLOR=#5D6D7E><font color=white><center>count</center></TD>
 </table>
 </BODY>
 </HTML>

Expected output:

</table>
 </BODY>
 </HTML>

I tried: sed -n '/<BR>INSTANCE NAME is : T0<BR>,</table>d/ file_name, but it does not work.

Any help is very welcome!

Answer 1

sed -e '/^<BR\>/,/<\/table>/d' file_name

This deletes every line in the range, including the matching lines, and gives the required output. A few points to note:

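If you only want to delete lines that contain a specific keyword, you do not need to give sed the whole line; the keyword alone is enough.
If your pattern contains characters that have a special meaning, you must escape them by putting \ in front of them. Here, you need to escape the / in the </table> tag, because / is the delimiter sed uses around the pattern.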
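The points above can be tried on a shortened copy of the sample input. This is a minimal sketch, not the answerer's exact command: it spells out the full marker text instead of `^<BR\>` (in GNU sed, `\>` is an end-of-word anchor rather than a literal `>`), and the temporary file and the `sample` and `result` variable names are assumptions made for illustration.

```shell
# Build a shortened copy of the sample input in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
</table>
<BR>INSTANCE NAME is : T0<BR>
<table BORDER=1 CELLPADDING=2>
</table>
<BR>INSTANCE NAME is : T0<BR>
<table BORDER=1 CELLPADDING=2>
 </table>
 </BODY>
 </HTML>
EOF

# Delete each block that starts at a marker line and ends at the next
# line containing </table>. Only the "/" inside </table> is escaped,
# because "/" delimits the two address patterns.
result=$(sed '/<BR>INSTANCE NAME is : T0<BR>/,/<\/table>/d' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Note that a `/start/,/end/` range restarts at every later marker line, so this removes each marker-to-`</table>` block separately; any line sitting between one `</table>` and the next marker would be kept.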

For reference material on sed, see the sed documentation.

Answer 2

This might work for you (GNU sed):

sed -r '/<BR>INSTANCE NAME is : T0<BR>/,${H;$!d;x;s/.*<\/table>[^\n]*\n//}' file

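Store all lines from the first instance of <BR>INSTANCE NAME is : T0<BR> to the end of the file in the hold space, and delete them so that they are not printed immediately. At the end of the file, swap to the hold space and use a greedy match to remove everything up to and including the last line that contains </table>, then print what remains.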
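The same "first marker through last </table>" range can also be computed explicitly by line number. The following is a different technique from the hold-space command above, sketched with standard grep, head, tail, and cut; the temporary file, the extra <P> line in the sample, and the variable names `first`, `last`, and `result` are assumptions made for illustration.

```shell
# Sample with an extra line between two tables, to show that the whole
# span from the first marker to the last </table> is removed.
sample=$(mktemp)
cat > "$sample" <<'EOF'
</table>
<BR>INSTANCE NAME is : T0<BR>
<table BORDER=1 CELLPADDING=2>
</table>
<P>text between tables</P>
<BR>INSTANCE NAME is : T0<BR>
<table BORDER=1 CELLPADDING=2>
 </table>
 </BODY>
 </HTML>
EOF

# Line number of the first marker line.
first=$(grep -n '<BR>INSTANCE NAME is : T0<BR>' "$sample" | head -n 1 | cut -d: -f1)
# Line number of the last line that contains </table>.
last=$(grep -n '</table>' "$sample" | tail -n 1 | cut -d: -f1)

# Delete the inclusive range only when both patterns were found in order;
# otherwise leave the content unchanged.
if [ -n "$first" ] && [ -n "$last" ] && [ "$first" -le "$last" ]; then
  result=$(sed "${first},${last}d" "$sample")
else
  result=$(cat "$sample")
fi
printf '%s\n' "$result"

rm -f "$sample"
```

This reads the file more than once, so it suits regular files rather than piped input, but it avoids relying on GNU-specific behavior such as `.` matching a newline inside the pattern space.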

Delete lines between two patterns (inclusive) in Linux