Download a web page's HTML into an HTML document

Date: 2017-04-06 12:17:30

Tags: c# html .net

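I want to get the HTML of a web page and then read two XPaths from that HTML. I know very little about this topic.

When I search, I keep finding examples that load the URL and put the HTML into a string. Since I have two XPaths, I believe it would be better to download the page's HTML as an HTML document rather than a string. Or am I wrong?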

using (WebClient client = new WebClient()) {
    string s = client.DownloadString(url);
}

So how do I download a web page's HTML into an HTML document that I can search?

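2 answers:

Answer 0 (score: 1)

This is how I do it.

  1. First, define your URL in a string variable.
  2. Then download the page as a string using the HttpWebRequest class.
  3. I use HtmlAgilityPack, so you should add it to your project (for example with NuGet).
  4. Create an HtmlDocument object and load the data into it.
  5. Now you can navigate the HtmlDocument.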

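     // Requires: using System.IO; using System.Net; using System.Text;
     // Placeholder URL; WebRequest.Create needs an absolute URI including the scheme.
     string urlAddress = "http://url.com";
     string data = "";

     HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
     using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
     {
         if (response.StatusCode == HttpStatusCode.OK)
         {
             // Use the charset declared by the server when there is one;
             // otherwise fall back to UTF-8, the StreamReader default.
             Encoding encoding = string.IsNullOrEmpty(response.CharacterSet)
                 ? Encoding.UTF8
                 : Encoding.GetEncoding(response.CharacterSet);

             using (Stream receiveStream = response.GetResponseStream())
             using (StreamReader readStream = new StreamReader(receiveStream, encoding))
             {
                 data = readStream.ReadToEnd();
             }
         }
     }

     // Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
     HtmlAgilityPack.HtmlDocument document2 = new HtmlAgilityPack.HtmlDocument();
     document2.LoadHtml(data);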
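
Once `document2` is loaded, the two XPath expressions from the question can be evaluated against it. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced. The `XPathDemo` class, the `ReadFirst` helper, the inline HTML, and both XPath strings are hypothetical placeholders, not part of the original answer; substitute the downloaded `data` and your own expressions.

```csharp
using System;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class XPathDemo
{
    // Returns the trimmed inner text of the first node matching xpath,
    // or null when nothing matches (SelectSingleNode returns null then).
    public static string ReadFirst(HtmlDocument doc, string xpath)
    {
        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        // Inline HTML stands in for the downloaded page (assumption).
        string html =
            "<html><body><h1 id='title'>Hello</h1>" +
            "<p class='price'>42</p></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Both XPath expressions are illustrative placeholders.
        Console.WriteLine(ReadFirst(doc, "//h1[@id='title']"));
        Console.WriteLine(ReadFirst(doc, "//p[@class='price']"));
    }
}
```

Checking for null before reading `InnerText` matters here, because an XPath that matches nothing would otherwise cause a NullReferenceException.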
    

Answer 1 (score: -2)

You can use a StreamWriter to write the downloaded data to a file:

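// Requires: using System.IO; using System.Net; using System.Text;
string s;
using (WebClient client = new WebClient())
{
    // Assign to the outer variable; declaring a second "string s" here does not compile.
    s = client.DownloadString(url);
}

// Write the downloaded HTML to test.html as UTF-8.
using (FileStream fs = new FileStream("test.html", FileMode.Create))
using (StreamWriter w = new StreamWriter(fs, Encoding.UTF8))
{
    w.WriteLine(s);
}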
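
Writing the file does not by itself give you something to query with XPath; the saved file still has to be parsed. As a rough sketch, assuming the HtmlAgilityPack NuGet package is available, a saved file can be read back with `HtmlDocument.Load`. The `SavedPageDemo` class, the `LoadSaved` helper, the temp-file path, and the sample markup are hypothetical and only there to keep the example self-contained.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class SavedPageDemo
{
    // Parses an HTML file from disk into an HtmlDocument.
    public static HtmlDocument LoadSaved(string path)
    {
        var doc = new HtmlDocument();
        doc.Load(path, Encoding.UTF8);
        return doc;
    }

    public static void Main()
    {
        // Stand-in for the test.html written by the download step (assumption).
        string path = Path.Combine(Path.GetTempPath(), "test.html");
        File.WriteAllText(
            path,
            "<html><head><title>Saved page</title></head><body></body></html>",
            Encoding.UTF8);

        HtmlDocument doc = LoadSaved(path);

        // Query the parsed file with an XPath expression.
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        Console.WriteLine(title != null ? title.InnerText : "(no title)");
    }
}
```

If the file is not needed for anything else, passing the downloaded string straight to `LoadHtml` avoids the round trip through disk.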