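I want to fetch the HTML of a web page and then read two XPaths from that HTML. I know very little about this topic.
When I search, I keep finding examples that load the URL and put the HTML into a string. But since I have two XPaths, I believe it would be better to download the page as an HTML document rather than as a string. Or am I wrong?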
using (WebClient client = new WebClient())
{
    string s = client.DownloadString(url);
}
So how do I download a web page's HTML into an HTML document that I can search?
Answer 0 (score: 1)
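This is how I do it.
First, download the page into a string using the HttpWebRequest class.
I use HtmlAgilityPack for parsing, so you should include it in your project (for example via NuGet).
Then create an HtmlDocument object and load the data into it. Now you can navigate the HtmlDocument.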
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

// Placeholder address; WebRequest.Create needs an absolute URI including the scheme.
string urlAddress = "http://url.com";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
string data = "";
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    if (response.StatusCode == HttpStatusCode.OK)
    {
        // Use the charset declared by the server; fall back to UTF-8 when none is given.
        Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
            ? Encoding.UTF8
            : Encoding.GetEncoding(response.CharacterSet);
        using (Stream receiveStream = response.GetResponseStream())
        using (StreamReader readStream = new StreamReader(receiveStream, encoding))
        {
            data = readStream.ReadToEnd();
        }
    }
}
// Parse the downloaded string into a searchable document.
HtmlDocument document2 = new HtmlAgilityPack.HtmlDocument();
document2.LoadHtml(data);
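Once the HTML is in an HtmlDocument, both XPaths can be run against the same parsed document, so downloading to a string first is not a problem. Below is a minimal sketch of that step, assuming HtmlAgilityPack is installed. The sample markup, the two XPath expressions, and the names XPathDemo, ReadFirst and CountMatches are made up for illustration; substitute your own page and expressions.

```csharp
using System;
using HtmlAgilityPack; // NuGet package

public static class XPathDemo
{
    // Hypothetical sample markup standing in for the downloaded page.
    public const string SampleHtml =
        "<html><body><h1 id='title'>Hello</h1>" +
        "<ul><li class='item'>A</li><li class='item'>B</li></ul></body></html>";

    // Returns the text of the first node matching the XPath, or null if nothing matches.
    public static string ReadFirst(HtmlDocument doc, string xpath)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node?.InnerText;
    }

    // Returns how many nodes match the XPath.
    // SelectNodes returns null (not an empty collection) when nothing matches.
    public static int CountMatches(HtmlDocument doc, string xpath)
    {
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes?.Count ?? 0;
    }

    public static void Main()
    {
        // Parse once, then query as many times as needed.
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        Console.WriteLine(ReadFirst(doc, "//h1[@id='title']"));      // prints "Hello"
        Console.WriteLine(CountMatches(doc, "//li[@class='item']")); // prints 2
    }
}
```

Note the null checks: an XPath that matches nothing yields null from both SelectSingleNode and SelectNodes, which is an easy source of NullReferenceException.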
Answer 1 (score: -2)
You can use a StreamWriter to write the downloaded data to a file:
string s = string.Empty;
using (WebClient client = new WebClient())
{
    // Assign to the outer variable; redeclaring "string s" here would not compile.
    s = client.DownloadString(url);
}
using (FileStream fs = new FileStream("test.html", FileMode.Create))
{
    using (StreamWriter w = new StreamWriter(fs, Encoding.UTF8))
    {
        w.WriteLine(s);
    }
}
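If you do save the page to disk, the file still has to be parsed before any XPath can be evaluated. A minimal sketch of that follow-up step, assuming HtmlAgilityPack is available. The file content, the XPath, and the names FileLoadDemo and LoadFromFile are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack; // NuGet package

public static class FileLoadDemo
{
    // Loads a saved HTML file into a searchable HtmlDocument.
    public static HtmlDocument LoadFromFile(string path)
    {
        var doc = new HtmlDocument();
        doc.Load(path, Encoding.UTF8);
        return doc;
    }

    public static void Main()
    {
        // Hypothetical content standing in for the downloaded page.
        File.WriteAllText("test.html",
            "<html><body><p id='msg'>saved</p></body></html>", Encoding.UTF8);

        HtmlDocument doc = LoadFromFile("test.html");
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//p[@id='msg']");
        Console.WriteLine(node?.InnerText);
    }
}
```

Writing to a file is only worth it if you want to keep a copy of the page; for querying alone, loading the string directly avoids the round trip through disk.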