I have this HTML:
    <div class="postrow firs">
        <h2 class="title icon">
            This is the title
        </h2>
        <div class="content">
            <div id="post_message_1668079">
                <blockquote class="postcontent restore ">
                    <div>Category</div>
                    <div>Authour: Kim</div>
                    line 1<br /> line2
                </blockquote>
            </div>
        </div>
    </div>
    <div class="postrow">
        <h2 class="title icon">
            This is the title
        </h2>
        <div class="content">
            <div id="post_message_1668079">
                <blockquote class="postcontent restore ">
                    <div>Category</div>
                    line 1<br /> line2
                </blockquote>
            </div>
        </div>
    </div>
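I want to extract the content of every div whose class includes "postrow". Such a div may also carry another class, as in <div class="postrow first">. The "first" class doesn't matter to me; I only need "postrow" to appear at the beginning.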
Code I've tried:
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.LoadHtml("http://localhost/vanilla/");
    List<string> facts = new List<string>();
    foreach (HtmlNode li in doc.DocumentNode.SelectNodes("//div[@class='postrow']"))
    {
        facts.Add(li.InnerHtml);
    }
    // Print each collected fragment once, each on its own line
    foreach (string s in facts)
    {
        textBox1.Text += s + Environment.NewLine;
    }
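A common XPath 1.0 idiom for "the class attribute contains the token postrow" pads the space-normalized class list with spaces and searches for " postrow ". The sketch below compares it with the exact match used above and with a plain `starts-with` prefix test. It is a minimal sketch under stated assumptions: it swaps in .NET's built-in `System.Xml` for HtmlAgilityPack only so it runs without a NuGet package, the markup is a hypothetical cut-down sample (the extra `postrows` div is added purely for illustration), and `CountMatches` is a hypothetical helper. HtmlAgilityPack's `SelectNodes` also evaluates XPath 1.0, so the same expression string should be usable in the loop above.

```csharp
using System;
using System.Xml;

// Hypothetical sample: class values like those in the question, plus an extra
// "postrows" div added only for illustration. Wrapped in <root> because
// System.Xml requires a single root element.
const string sample = @"<root>
<div class=""postrow first""><h2>First</h2></div>
<div class=""postrow""><h2>Second</h2></div>
<div class=""postrows""><h2>Not a postrow</h2></div>
</root>";

// Exact comparison: only class="postrow" matches, so "postrow first" is missed.
const string exactXPath = "//div[@class='postrow']";
// Prefix test: catches "postrow first", but also "postrows".
const string startsWithXPath = "//div[starts-with(normalize-space(@class), 'postrow')]";
// Token test: pad the normalized class list with spaces and look for " postrow ".
const string tokenXPath =
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' postrow ')]";

// Hypothetical helper: parse the markup and count the nodes an XPath selects.
int CountMatches(string xml, string xpath)
{
    var doc = new XmlDocument();
    doc.LoadXml(xml);
    return doc.SelectNodes(xpath)!.Count;
}

Console.WriteLine($"exact:       {CountMatches(sample, exactXPath)}");
Console.WriteLine($"starts-with: {CountMatches(sample, startsWithXPath)}");
Console.WriteLine($"token:       {CountMatches(sample, tokenXPath)}");
```

With this sample the exact match selects 1 div, `starts-with` selects 3 (it also picks up `postrows`), and the token test selects 2. A prefix test is only appropriate if a name like `postrows` can never occur.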
Answer (score: 1)
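The problem with your code is that `LoadHtml` expects the HTML markup itself as a string, not a path or URL. Instead of

    doc.LoadHtml("http://localhost/vanilla/");

download the page first and pass its markup to `LoadHtml`:

    // needs: using System.IO; using System.Net;
    var request = (HttpWebRequest)WebRequest.Create("http://localhost/vanilla/");
    using (var response = request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
        doc.LoadHtml(reader.ReadToEnd()); // pass the markup, not the URL

Now iterate over the parsed HTML.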
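Once the markup is parsed, the iteration step could look roughly like the sketch below: select each div whose class list contains `postrow`, then read the `h2` title and the text inside the `blockquote`. This is a minimal sketch under assumptions, not the answerer's code: it again uses .NET's `System.Xml` in place of HtmlAgilityPack so it runs without a NuGet package (which is why the question's two posts are wrapped in a `<root>` element), `ExtractPosts` is a hypothetical helper, and joining the blockquote's text nodes with single spaces is an illustrative choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;

// The two posts from the question, wrapped in <root> so System.Xml sees one
// well-formed document (an assumption of this sketch, not needed by HtmlAgilityPack).
const string html = @"<root>
<div class=""postrow firs"">
  <h2 class=""title icon"">
    This is the title
  </h2>
  <div class=""content""><div id=""post_message_1668079"">
    <blockquote class=""postcontent restore "">
      <div>Category</div>
      <div>Authour: Kim</div>
      line 1<br /> line2
    </blockquote>
  </div></div>
</div>
<div class=""postrow"">
  <h2 class=""title icon"">
    This is the title
  </h2>
  <div class=""content""><div id=""post_message_1668079"">
    <blockquote class=""postcontent restore "">
      <div>Category</div>
      line 1<br /> line2
    </blockquote>
  </div></div>
</div>
</root>";

// Hypothetical helper: one (Title, Body) pair per div whose class list contains "postrow".
List<(string Title, string Body)> ExtractPosts(string xml)
{
    var doc = new XmlDocument(); // PreserveWhitespace is false: indentation-only text is dropped
    doc.LoadXml(xml);
    var posts = new List<(string Title, string Body)>();
    XmlNodeList rows = doc.SelectNodes(
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' postrow ')]")!;
    foreach (XmlNode row in rows)
    {
        // Relative paths: evaluated against the current postrow div.
        string title = row.SelectSingleNode("h2")?.InnerText.Trim() ?? "";
        var parts = new List<string>();
        var quote = row.SelectSingleNode(".//blockquote");
        if (quote != null)
        {
            foreach (XmlNode textNode in quote.SelectNodes(".//text()")!)
            {
                string piece = textNode.Value?.Trim() ?? "";
                if (piece.Length > 0) parts.Add(piece);
            }
        }
        posts.Add((title, string.Join(" ", parts)));
    }
    return posts;
}

// Build the output once, using Environment.NewLine for line breaks.
var output = new StringBuilder();
foreach (var post in ExtractPosts(html))
{
    output.Append(post.Title).Append(Environment.NewLine)
          .Append(post.Body).Append(Environment.NewLine);
}
Console.Write(output.ToString());
```

Building the text in a `StringBuilder` and assigning it once (for example to `textBox1.Text`) avoids re-appending earlier entries on every pass through the loop.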