Question

I have a string:

string hmtl = "<DIV><B> xpto </B></DIV>";

and I need to remove the <DIV> and </DIV> tags, so that the result is: <B> xpto </B>

I only want to remove <DIV> and </DIV>, not all of the HTML markup: <B> xpto </B> has to stay.

Answer 1

Use HtmlAgilityPack:

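using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<html>yourHtml</html>");

// "//div" is an XPath expression that selects div nodes anywhere in the HTML.
// SelectNodes can return null when nothing matches, so check before looping.
var divs = doc.DocumentNode.SelectNodes("//div");
if (divs != null)
{
    foreach (var item in divs)
        Console.WriteLine(item.InnerHtml); // your div content
}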

If you only want the B tags:

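// HtmlAgilityPack lowercases element names by default, so the XPath uses "b", not "B".
var bold = doc.DocumentNode.SelectNodes("//b");
if (bold != null)
{
    foreach (var item in bold)
        Console.WriteLine(item.OuterHtml); // your B tag and its content
}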
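HtmlAgilityPack is a third-party package. If the markup is guaranteed to be well-formed XML, which is an assumption (the question's string is, but most real-world HTML is not), the same "drop the DIV, keep its children" idea can be sketched with the built-in System.Xml.Linq API instead. The `UnwrapDivs` name and the synthetic `<root>` wrapper are hypothetical, not part of any library:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: removes every <div> element (any case) but keeps its children.
// Assumes the input is well-formed XML; XElement.Parse throws an XmlException otherwise.
static string UnwrapDivs(string html)
{
    // Wrap in a synthetic root so the fragment may have several top-level nodes.
    var root = XElement.Parse("<root>" + html + "</root>");

    // Process innermost divs first (reverse document order), so an outer div is
    // never replaced before the divs nested inside it have been unwrapped.
    var divs = root.Descendants()
        .Where(e => string.Equals(e.Name.LocalName, "div", StringComparison.OrdinalIgnoreCase))
        .Reverse()
        .ToList();

    foreach (var div in divs)
        div.ReplaceWith(div.Nodes()); // put the div's children where the div was

    // Serialize the remaining nodes without adding any indentation.
    return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
}

Console.WriteLine(UnwrapDivs("<DIV><B> xpto </B></DIV>"));
```

Unlike the regex answers below, this works on the element tree, so nested divs are handled; the trade-off is that malformed HTML is rejected outright.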

Answer 2

If you only want to remove the div tags, this pattern removes them together with any attributes they might have.

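using System.Text.RegularExpressions;

var html =
  "<DIV><B> xpto <div text='abc'/></B></DIV><b>Other text <div>test</div>";

// Matches <div ...>, </div> and self-closing <div .../>, with any attributes
var pattern = @"(</?DIV(.*?)/?>)";

// Replace any match with nothing/empty string
var result = Regex.Replace(html, pattern, string.Empty, RegexOptions.IgnoreCase);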

Result:

<B> xpto </B><b>Other text test
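The `(.*?)` part also lets the pattern match tags whose names merely start with DIV, such as `<divx>`. A slightly stricter sketch, assuming attribute values never contain `>`, wraps the idea in a hypothetical `RemoveTag` helper that puts a word boundary (`\b`) after the tag name and escapes the name with `Regex.Escape`:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: strips opening, closing and self-closing tags named `tag`
// (any case, with or without attributes) but keeps whatever sits between them.
// Assumption: attribute values never contain '>', which [^>]* cannot handle.
static string RemoveTag(string html, string tag)
{
    // \b stops "div" from also matching longer names such as <divx>.
    var pattern = $@"</?{Regex.Escape(tag)}\b[^>]*>";
    return Regex.Replace(html, pattern, string.Empty, RegexOptions.IgnoreCase);
}

var html =
    "<DIV><B> xpto <div text='abc'/></B></DIV><b>Other text <div>test</div>";
Console.WriteLine(RemoveTag(html, "div")); // <B> xpto </B><b>Other text test
```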

Answer 3

Use Regex:

var result = Regex.Replace(html, @"</?DIV>", "");

Update

As mentioned, with this code the regular expression removes every tag except B:

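using System;
using System.Text.RegularExpressions;

var hmtl = "<DIV><B> xpto </B></DIV>";
var remainTag = "B";
// Matches any tag that does not start with B (after an optional '/') and does not end with B before '>'
var pattern = String.Format("(</?(?!{0})[^<>]*(?<!{0})>)", remainTag);
var result = Regex.Replace(hmtl, pattern, "");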
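The same negative-lookahead idea can be generalized to a list of tags to keep. This is a sketch with a hypothetical `KeepOnlyTags` helper: it checks the name right after the optional `/` and requires a word boundary, so keeping `B` does not also keep `<BR>` (which the pattern above would keep). It assumes attribute values contain no `<` or `>`:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: removes every tag except those whose name is in `keep`.
static string KeepOnlyTags(string html, params string[] keep)
{
    // e.g. keep = { "B", "I" }  ->  "B|I"
    var names = string.Join("|", keep.Select(k => Regex.Escape(k)));

    // "<" NOT followed by an optional "/" plus a kept name ending at a word
    // boundary, then the rest of the tag up to ">".
    var pattern = $@"<(?!/?(?:{names})\b)[^<>]*>";
    return Regex.Replace(html, pattern, string.Empty, RegexOptions.IgnoreCase);
}

Console.WriteLine(KeepOnlyTags("<DIV><B> xpto </B></DIV>", "B"));    // <B> xpto </B>
Console.WriteLine(KeepOnlyTags("<p>a<br/><i>b</i></p>", "B", "I")); // a<i>b</i>
```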

Answer 4

You can use a regular expression that strips the html and body tags:

<\s*/?\s*(?:body|html)\s*>

In C#:

var result = Regex.Replace(html, @"<\s*/?\s*(?:body|html)\s*>", "");

It matches tags such as:

<html>
<body>
< / html>
< / body>
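A pattern written with a character class, such as `<[(/body|html)\s]*>`, does not work for this: `[...]` matches one character from the set, so it accepts any run of the letters b, o, d, y, h, t, m, l (plus `(`, `/`, `|`, `)` and whitespace) and also strips `<b>`. A minimal comparison with an alternation group `(?:body|html)`, on a made-up input string:

```csharp
using System;
using System.Text.RegularExpressions;

var html = "<html><body><b>x</b></body></html>";

// [...] is a character class: it matches ONE character from the set, so
// [(/body|html)\s]* accepts any run of those letters, including just "b".
var charClass = Regex.Replace(html, @"<[(/body|html)\s]*>", "");

// (?:body|html) is an alternation group: it matches the whole word body or html.
// \s* allows spaces around the optional '/' (e.g. "< / html>").
var alternation = Regex.Replace(html, @"<\s*/?\s*(?:body|html)\s*>", "");

Console.WriteLine(charClass);   // x
Console.WriteLine(alternation); // <b>x</b>
```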

Answer 5

html = Regex.Replace(html, @"</?DIV>", String.Empty);
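In a regex, `*` repeats the token before it, so a pattern written as `<*DIV>` means "zero or more `<` characters followed by `DIV>`". On `</DIV>` it can only match `DIV>`, leaving `</` behind. A quick comparison with `</?DIV>` on the question's string:

```csharp
using System;
using System.Text.RegularExpressions;

var html = "<DIV><B> xpto </B></DIV>";

// "<*" means "zero or more '<'", so on </DIV> this can only match "DIV>".
var starPattern = Regex.Replace(html, @"<*DIV>", string.Empty);

// "</?" means "'<' followed by an optional '/'", covering both <DIV> and </DIV>.
var optionalSlash = Regex.Replace(html, @"</?DIV>", string.Empty);

Console.WriteLine(starPattern);   // <B> xpto </B></
Console.WriteLine(optionalSlash); // <B> xpto </B>
```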

Remove only some HTML tags in C#
