Question

The HTTP GET response to my request looks like this:

    <html>
      <head>        <script type="text/javascript">----</script>        <script type="text/javascript">---</script>             <title>Detailed Notes</title>
      </head>
      <body style="background-color: #FFFFFF; border-width: 0px; font-family: sans-serif; font-size: 13; color: #000000">           <p>this is one note&nbsp;</p>  </body>      </html>

I have it as a string, and I need to read its body part.

I tried HtmlAgilityPack, but parsing fails because of some special content in the HTML (I think the script content is causing this).

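So, to read the tag content, I am thinking of using a Substring operation.

Something like a Substring that starts at the `<body` tag.

How can I take a Substring that starts at a given word in the text?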
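Taking the rest of a string from the first occurrence of a word can be sketched with `IndexOf` and `Substring`. This is a minimal sketch: the helper name `SubstringFrom` is hypothetical, and it returns `null` when the word is absent instead of letting `Substring(-1)` throw `ArgumentOutOfRangeException`.

```csharp
using System;

// Hypothetical helper: returns the part of `text` that starts at the first
// occurrence of `word`, or null if `word` does not occur at all.
static string? SubstringFrom(string text, string word)
{
    // Ordinal comparison avoids culture-sensitive matching of markup.
    int start = text.IndexOf(word, StringComparison.Ordinal);
    return start < 0 ? null : text.Substring(start);
}

string html = "<html><head><title>Detailed Notes</title></head>"
            + "<body style=\"color: #000000\"><p>this is one note&nbsp;</p></body></html>";

Console.WriteLine(SubstringFrom(html, "<body"));
// <body style="color: #000000"><p>this is one note&nbsp;</p></body></html>
```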

Answer 1

Use a simple Substring() combined with IndexOf() and LastIndexOf():

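    // Cut off "</body>" and everything after it, then everything before "<body"
    string BodyContent = input.Substring(0, input.LastIndexOf("</body>")).Substring(input.IndexOf("<body"));
    // Drop the rest of the opening <body ...> tag, then the surrounding whitespace
    BodyContent = BodyContent.Substring(BodyContent.IndexOf(">") + 1).Trim();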

This returns:

    <p>this is one note&nbsp;</p>

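To keep the `<body>` tags as well:

    // "+ 7" keeps "</body>" (7 characters); start at "<body" to get the whole element
    string FullBody = input.Substring(0, input.LastIndexOf("</body>") + 7).Substring(input.IndexOf("<body")).Trim();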

This returns:

    <body style="background-color: #FFFFFF; border-width: 0px; font-family: sans-serif; font-size: 13; color: #000000">           <p>this is one note&nbsp;</p>  </body>
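The one-liners above can throw `ArgumentOutOfRangeException` when a tag is missing (`IndexOf` then returns -1), and they match `<body` case-sensitively. A slightly more defensive sketch, assuming each tag appears once and that the first `>` after `<body` closes the opening tag (it would not if an attribute value contained `>`); the helper name `ExtractBody` is hypothetical:

```csharp
using System;

// Hypothetical helper: returns the inner HTML of <body> and the whole
// <body>...</body> element, or (null, null) if the tags are not found.
static (string? Inner, string? Full) ExtractBody(string html)
{
    const string Close = "</body>";
    int open = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
    int close = html.LastIndexOf(Close, StringComparison.OrdinalIgnoreCase);
    if (open < 0 || close < open) return (null, null);

    // Assumes the first '>' after "<body" ends the opening tag.
    int openEnd = html.IndexOf('>', open);
    if (openEnd < 0 || openEnd > close) return (null, null);

    string inner = html.Substring(openEnd + 1, close - openEnd - 1).Trim();
    string full = html.Substring(open, close + Close.Length - open);
    return (inner, full);
}

string input = "<html><head><title>Detailed Notes</title></head>"
             + "<BODY style=\"color: #000000\">   <p>this is one note&nbsp;</p>  </BODY></html>";

var (inner, full) = ExtractBody(input);
Console.WriteLine(inner); // <p>this is one note&nbsp;</p>
Console.WriteLine(full);  // <BODY style="color: #000000">   <p>this is one note&nbsp;</p>  </BODY>
```

Returning `(null, null)` instead of throwing lets the caller decide what a page without a body means.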

Answer 2

The `"` character (`&#34;`) causes problems, so you need to replace every `"` after you get the page source:

    using System;
    using System.Net;
    using System.Text.RegularExpressions;

    WebClient client = new WebClient(); // make an instance of WebClient (obsolete in .NET 6+, where HttpClient is recommended)
    string source = client.DownloadString("url").Replace("\"", ",,"); // get the HTML source and replace each " with a placeholder (",,")
    // To test without a network call, use the sample body from the question instead:
    // string source = "<body style=\"background-color: #FFFFFF; border-width: 0px; font-family: sans-serif; font-size: 13; color: #000000\">           <p>this is one note&nbsp;</p>  </body>".Replace("\"", ",,");
    MatchCollection m0 = Regex.Matches(source, "<body[^>]*>(?<body>.*?)</body>", RegexOptions.Singleline); // use a regex to get the text between the tags
    foreach (Match m in m0) // loop through the results
    {
        string result = m.Groups["body"].Value.Replace(",,", "\""); // get the result and put the " back
        Console.WriteLine(result.Trim());
    }
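To try the regex approach without a network call, the same steps can be run on the sample response from the question. This sketch uses the pattern `<body[^>]*>(?<body>.*?)</body>`, which also skips the attributes of the opening `<body>` tag, and adds `RegexOptions.IgnoreCase` as an assumption so that `<BODY>` matches too:

```csharp
using System;
using System.Text.RegularExpressions;

// Sample response body from the question (no network call).
string source = "<body style=\"background-color: #FFFFFF; border-width: 0px; font-family: sans-serif; font-size: 13; color: #000000\">"
              + "           <p>this is one note&nbsp;</p>  </body>";

// Same steps as the answer: swap " for a placeholder, match, then swap it back.
string escaped = source.Replace("\"", ",,");
Match m = Regex.Match(escaped, "<body[^>]*>(?<body>.*?)</body>",
                      RegexOptions.Singleline | RegexOptions.IgnoreCase);
string result = m.Success ? m.Groups["body"].Value.Replace(",,", "\"").Trim() : "";

Console.WriteLine(result); // <p>this is one note&nbsp;</p>
```

Note that any `,,` already present in the page would also be turned into `"` by the final `Replace`.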
