Question

The following HTML is stored in a string. I need to remove the text between the HTML tags <style> and </style>.

<html> <head><style type="text/css">
        @font-face { 
            font-family: "tunga";
            src: url(tunga.TTF); 
        }

        body {              
            font-family:"tunga";
            padding:0;
            margin: 0;
        }


        table {
            font-family:"tunga";
            padding:0;
        }

        a {
            text-decoration:none
        }

    </style></head>  <body marginwidth="0" marginheight="0" leftmargin="10" topmargin="0" >
    </body>
    </html>

How can I do this in C#?

Answer 1

Use HtmlAgilityPack to load the HTML.

Load the HTML string:

using HtmlAgilityPack;
using System.Linq; // for ToList() below
HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(myHtmlString);

Then remove every <style> node (the tags together with their contents):

// ToList() takes a snapshot, so nodes can be removed while iterating
foreach (var descendant in htmlDocument.DocumentNode.Descendants("style").ToList())
    descendant.Remove();

Then get the string that represents the modified HTML:

string htmlWithoutStyle = htmlDocument.DocumentNode.OuterHtml;

Answer 2

using System;
string str = "<html> <head><style type='text/css'> jhiun  </style></head> </html>";
Console.WriteLine(str);
int start = str.IndexOf("<style");
// End of the closing tag: its index plus the length of "</style>" (8 characters)
int end = str.IndexOf("</style>") + "</style>".Length;
string strToRemove = str.Substring(start, end - start);
Console.WriteLine(str.Replace(strToRemove, ""));
Console.ReadLine();
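Answer 2 removes only the first <style> element, and it throws an ArgumentOutOfRangeException when a tag is missing, because IndexOf then returns -1. A minimal sketch of the same string-search idea that removes every <style>...</style> element, ignores case, and leaves the HTML unchanged when a closing tag is missing might look like this. The class and method names are hypothetical, and the sketch assumes simple markup with no "<style" text inside comments, scripts or attribute values.

```csharp
using System;
using System.Text;

public static class StyleStripper
{
    // Removes every <style ...>...</style> element, tags included.
    // Plain case-insensitive string search; assumes "<style" only ever
    // appears as a real opening tag (not inside comments or attributes).
    public static string RemoveStyleElements(string html)
    {
        const string open = "<style";
        const string close = "</style>";
        var sb = new StringBuilder(html.Length);
        int pos = 0; // start of the text not yet copied to the output

        while (true)
        {
            int start = html.IndexOf(open, pos, StringComparison.OrdinalIgnoreCase);
            if (start < 0) break; // no more <style> elements

            int end = html.IndexOf(close, start, StringComparison.OrdinalIgnoreCase);
            if (end < 0) break; // unterminated element: keep the rest as-is

            sb.Append(html, pos, start - pos); // copy the text before <style
            pos = end + close.Length;           // skip the whole element
        }

        sb.Append(html, pos, html.Length - pos); // copy whatever remains
        return sb.ToString();
    }
}
```

For the question's HTML, `StyleStripper.RemoveStyleElements(html)` returns the document with the whole <style> block, tags included, cut out.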

Answer 3

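You can use HtmlAgilityPack to solve this. It is designed for parsing HTML and similar tasks. Writing a regular expression or parsing the HTML yourself will only cause you trouble and may introduce security risks into your program.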
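To illustrate the kind of trouble meant here: the string-search approach used in Answers 2 and 5, sketched below as a hypothetical helper, cannot tell a real tag from the same text inside an HTML comment. If the document contains `<!-- <style> was here -->` before the real stylesheet, the cut starts inside the comment and removes the comment's closing `-->`, leaving an unterminated comment that swallows the rest of the page. An HTML parser such as HtmlAgilityPack does not confuse comments with elements in this way.

```csharp
using System;

public static class NaiveStyleRemoval
{
    // Cuts from the first "<style" to the end of the first "</style>".
    // It has no notion of comments, so "<style" inside <!-- ... --> is
    // treated as if it were a real opening tag.
    public static string RemoveFirstStyle(string html)
    {
        int start = html.IndexOf("<style", StringComparison.Ordinal);
        int end = html.IndexOf("</style>", StringComparison.Ordinal);
        if (start < 0 || end < start) return html;
        return html.Remove(start, end + "</style>".Length - start);
    }
}
```

With the input `<head><!-- <style> was here --><style>a{}</style></head>` the result is `<head><!-- </head>`, an unterminated comment.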

Answer 4

Use HtmlAgilityPack. Don't try to roll your own parser.

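using System.IO;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(html);
// SelectNodes returns null (not an empty collection) when no <style> element exists
var styles = doc.DocumentNode.SelectNodes("//style");
if (styles != null)
    foreach (var style in styles)
        style.RemoveAllChildren(); // keeps the <style> tags, drops the CSS text inside
string moddedHtml;
using (var sw = new StringWriter())
{
    doc.Save(sw);
    moddedHtml = sw.ToString();
}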

Answer 5

No extra library is needed here. Try something like this:

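// Find the start tag
var start = html.IndexOf("<style");

// Find the closing tag
var close = html.IndexOf("</style>");

// Remove the element using Substring: keep what precedes "<style" and what follows "</style>",
// or leave the HTML unchanged if either tag is missing
var newHtml = (start >= 0 && close > start)
    ? html.Substring(0, start) + html.Substring(close + "</style>".Length)
    : html;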

Answer 6

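using System.Text.RegularExpressions;
_htmlContent = Regex.Replace(_htmlContent, "<style.*?</style>", "", RegexOptions.Singleline | RegexOptions.IgnoreCase);

Try this. RegexOptions.Singleline lets . match line breaks, so a multi-line stylesheet is removed in a single match, and the lazy .*? stops at the first </style>, so separate <style> blocks are removed one at a time instead of together with everything between them.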
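A self-contained version of the regex approach, as a sketch: the class name is hypothetical, and the pattern is a slightly tightened variant of Answer 6's, requiring a word boundary after `style` (so `<styles>` is not matched) and tolerating whitespace before the closing `>`. Like any regex over HTML, it assumes `<style` does not appear inside comments or attribute values.

```csharp
using System.Text.RegularExpressions;

public static class RegexStyleStripper
{
    // <style\b     opening tag name; \b rejects names like "styles"
    // [^>]*>       any attributes, up to the end of the opening tag
    // .*?          the stylesheet text, lazy so each match ends at its own closing tag
    // </style\s*>  the closing tag, allowing whitespace before '>'
    // Singleline lets '.' cross newlines; IgnoreCase also accepts <STYLE>.
    private static readonly Regex StyleElement = new Regex(
        @"<style\b[^>]*>.*?</style\s*>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Removes every <style> element, tags included.
    public static string RemoveStyleElements(string html) =>
        StyleElement.Replace(html, string.Empty);
}
```

Keeping the Regex in a static readonly field lets it be constructed once and reused across calls.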

Answer 7

You can use an extra library, or just plain string removal:

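string cleaned = RemoveHTMLTagsText("your html statement", "<style>");

// Removes the text between the first <tag ...> and its </tag>, keeping both tags.
// tag is passed as "<style>"; the "<style" prefix also matches <style type="text/css">.
public static string RemoveHTMLTagsText(string html, string tag)
{
    int tagIndex = html.IndexOf(tag.Remove(tag.Length - 1)); // "<style"
    if (tagIndex < 0) return html;                           // tag not present
    int tagEnd = html.IndexOf(">", tagIndex);                // end of the opening tag
    if (tagEnd < 0) return html;
    int startIndex = tagEnd + 1;                             // first character of the content
    int endIndex = html.IndexOf(tag.Insert(1, "/"), startIndex); // "</style>"
    if (endIndex < 0) return html;
    return html.Remove(startIndex, endIndex - startIndex);
}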
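Answer 7 only handles the first matching element. A hypothetical variant that empties every element with the given name while keeping its opening and closing tags, which is what the question literally asks for, could be sketched as follows. It takes a bare name such as `style` rather than `<style>`, and shares the caveats of the other string-based answers (for example, a `>` inside an attribute value of the opening tag would confuse it).

```csharp
using System;
using System.Text;

public static class TagContentCleaner
{
    // Empties every <tagName ...>...</tagName> block but keeps the tags, e.g.
    // "<style type='a'>x{}</style>" becomes "<style type='a'></style>".
    public static string ClearTagContents(string html, string tagName)
    {
        string open = "<" + tagName;
        string close = "</" + tagName + ">";
        var sb = new StringBuilder(html.Length);
        int pos = 0; // start of the text not yet copied to the output

        while (true)
        {
            int start = html.IndexOf(open, pos, StringComparison.OrdinalIgnoreCase);
            if (start < 0) break;

            int tagEnd = html.IndexOf('>', start); // end of the opening tag
            if (tagEnd < 0) break;
            int contentStart = tagEnd + 1;

            int end = html.IndexOf(close, contentStart, StringComparison.OrdinalIgnoreCase);
            if (end < 0) break; // unterminated element: keep the rest as-is

            // Copy up to and including the opening tag, then skip the content.
            sb.Append(html, pos, contentStart - pos);
            pos = end; // the closing tag is copied on the next pass or at the end
        }

        sb.Append(html, pos, html.Length - pos);
        return sb.ToString();
    }
}
```

Because `pos` moves to the closing tag, which starts with `</`, the next search for `<style` cannot match it again, so the loop always advances.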

How to delete the text between tags in C#?
