Unable to download a web page in .NET

Time: 2019-03-26 17:02:10

Tags: c# html-agility-pack webrequest

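I wrote a batch job that parses HTML pages from gearbest.com to extract item data (example link). It worked until 2-3 weeks ago, when the site was updated. Since then I have been unable to download the pages I need to parse, and I cannot figure out why. Before the update, I made the request with HtmlAgilityPack, using the following code.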

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = null;
doc = web.Load(url); // this is now the point where the exception is thrown

I also tried without the library, adding some data to the request:

HttpWebRequest request = (HttpWebRequest) WebRequest.Create("https://it.gearbest.com/tv-box/pp_009940949913.html");
request.Credentials = CredentialCache.DefaultCredentials;
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.ContentType = "text/html; charset=UTF-8";
request.CookieContainer = new CookieContainer();
request.Headers.Add("accept-language", "it-IT,it;q=0.9,en-US;q=0.8,en;q=0.7");
request.Headers.Add("accept-encoding", "gzip, deflate, br");
request.Headers.Add("upgrade-insecure-requests", "1");
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8";

WebResponse response = request.GetResponse();  // exception is thrown here

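The exceptions are:

  • IOException: Unable to read data from the transport connection
  • SocketException: The connection could not be established.

If I request the home page (https://it.gearbest.com), it works.

What do you think is going wrong?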

2 answers:

Answer 0: (score: 0)

It might be worth a try...

request.KeepAlive = false;
request.ProtocolVersion = HttpVersion.Version10;

https://stackoverflow.com/a/16140621/1302730

Answer 1: (score: 0)

For some reason, the server does not like the provided user agent. If you omit setting UserAgent, everything works fine:

HttpWebRequest request = (HttpWebRequest) WebRequest.Create("https://it.gearbest.com/tv-box/pp_009940949913.html");
request.Credentials = CredentialCache.DefaultCredentials;
//request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36";
request.ContentType = "text/html; charset=UTF-8";

Another solution is to set request.Connection to a random string (anything other than keep-alive or close).

That works too, but I cannot explain why.
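As a rough sketch of that second workaround (not code from the original answer): the helper name BuildRequest and the value "random-value" are made up for illustration, and whether the site still responds this way is an assumption. Note that HttpWebRequest.Connection throws an ArgumentException if it is set to Keep-alive or Close, which is why an arbitrary other value is used.

```csharp
using System;
using System.Net;

class ConnectionHeaderSketch
{
    // Hypothetical helper: builds the request and sets a custom Connection header.
    static HttpWebRequest BuildRequest(string url, string connectionValue)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        // Any value except "Keep-alive" or "Close" is accepted by this property;
        // those two are controlled through request.KeepAlive instead.
        request.Connection = connectionValue;
        return request;
    }

    static void Main()
    {
        var request = BuildRequest(
            "https://it.gearbest.com/tv-box/pp_009940949913.html",
            "random-value"); // arbitrary placeholder value, an assumption

        Console.WriteLine("Connection header: " + request.Connection);

        try
        {
            // Send the request as in the question; the outcome depends on the server.
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine("Status: " + response.StatusCode);
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine("Request failed: " + ex.Message);
        }
    }
}
```

If the request still fails, the KeepAlive = false suggestion from the other answer is worth trying alongside this.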