Question

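I have the following HTML source, which contains two style tags. Using a regular expression we can remove all the HTML tags from the file, but we cannot remove the content of the second style tag.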

<style id="owaParaStyle" type="text/css">P {margin-top:0;margin-bottom:0;}</style>

C# code

Regex test = new Regex(@"<[^>]*>");
strText = test.Replace(strText, String.Empty);

Output:

Expected an empty string, but we get: P {margin-top:0;margin-bottom:0;}

Answer 1

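"But I also want to remove the attributes/values of the style tag"

You can try a back reference, which matches the same text that was previously matched by a capturing group.

To remove everything from <...> through </...>, use the following regular expression, which looks for an opening HTML tag and the closing tag with the same name.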

                   <(\w+)[^>]*>.*<\/\1>
Captured Group 1-----^^^            ^^----- Back reference to the first captured group

Here is a demo.
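
For reference, here is a minimal sketch of how a back-reference pattern like this might be applied from C#. The names `TagStripper` and `RemovePairedTags` are hypothetical. Two details are assumptions rather than part of the answer: the quantifier is made lazy (`.*?`) so a match stops at the first closing tag instead of the last one, and `RegexOptions.Singleline` is set so that `.` also matches line breaks inside a multi-line block.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TagStripper
{
    // Matches an opening tag, its content, and the closing tag with the same name.
    // \1 is a back reference to the tag name captured by (\w+).
    // Assumption: lazy .*? and Singleline are additions, not part of the answer's pattern.
    private static readonly Regex PairedTag = new Regex(
        @"<(\w+)[^>]*>.*?</\1>",
        RegexOptions.Singleline);

    // Hypothetical helper: removes every paired element together with its content.
    public static string RemovePairedTags(string html)
    {
        return PairedTag.Replace(html, string.Empty);
    }
}
```

Note that this removes the content of every paired element, not only style, and it does not handle an element nested inside another element of the same name.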

Answer 2

Do you want to remove the style tags?

<style.*?</style>
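
A rough sketch of how this pattern could be combined with a general tag-stripping pass in C#: remove whole style blocks first, then strip whatever tags remain. The names `HtmlTextExtractor` and `StripTags` are hypothetical, and the `Singleline` and `IgnoreCase` options are assumptions added so that multi-line blocks and upper-case tags are also covered.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlTextExtractor
{
    // Step 1: whole <style>...</style> blocks, including the CSS between the tags.
    // Assumption: Singleline and IgnoreCase are additions for multi-line and upper-case input.
    private static readonly Regex StyleBlock = new Regex(
        @"<style.*?</style>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Step 2: any remaining tag, modelled on the pattern used in the question.
    private static readonly Regex AnyTag = new Regex(@"<[^>]*>");

    // Hypothetical helper: returns the text with style blocks and all tags removed.
    public static string StripTags(string html)
    {
        string withoutStyles = StyleBlock.Replace(html, string.Empty);
        return AnyTag.Replace(withoutStyles, string.Empty);
    }
}
```

The order matters here: if the tags are stripped first, the CSS text is left behind with nothing around it to identify it.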

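I generally don't recommend using regular expressions to match HTML/XML unless you are sure the input always has a certain structure. There are better tools for handling XML.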
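
As one illustration of the parser route, here is a sketch using LINQ to XML from the .NET base library. It assumes the input is well-formed XML (for example XHTML) with no XML namespace on its elements; `XElement.Parse` throws an `XmlException` on typical loosely written HTML. The names `XmlStyleRemover` and `TextWithoutStyles` are hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlStyleRemover
{
    // Hypothetical helper: parses well-formed markup, drops every <style> element,
    // and returns the concatenated text of what is left.
    public static string TextWithoutStyles(string xhtml)
    {
        XElement root = XElement.Parse(xhtml);

        // Copy the matches into a list first so the tree is not modified
        // while it is still being enumerated.
        foreach (XElement style in root.Descendants("style").ToList())
        {
            style.Remove();
        }

        // Value is the text content of the element and all its descendants.
        return root.Value;
    }
}
```

If the document declares a namespace, the element name would need to be qualified with an `XNamespace`, and for HTML that is not well-formed a dedicated HTML parser is the better fit.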