Unable to read data from the transport connection: C# HtmlAgilityPack

Time: 2013-08-25 22:43:47

Tags: c# web-scraping html-agility-pack inner-exception

I'm writing a program in C# (for my own use) with HtmlAgilityPack that loads a web page at one point. After loading a large number of pages, I get this error:

Unhandled Exception: System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
   at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
   at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.ConnectStream.Read(Byte[] buffer, Int32 offset, Int32 size)
   at System.IO.StreamReader.ReadBuffer()
   at System.IO.StreamReader.ReadToEnd()
   at HtmlAgilityPack.HtmlDocument.Load(TextReader reader) in d:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlDocument.cs:line 612
   at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in d:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1422
   at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in d:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1479
   at HtmlAgilityPack.HtmlWeb.Load(String url, String method) in d:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1103
   at HtmlAgilityPack.HtmlWeb.Load(String url) in d:\Source\htmlagilitypack.new\Trunk\HtmlAgilityPack\HtmlWeb.cs:line 1061
   at ConsoleApplication1.Program.Main(String[] args) in c:\Users\...ConsoleApplication1\Program.cs:line 37

On line 37, I load a page inside a for loop:

for (var i = 0; i < 5000; i++)
{
    // Braces are required: a bare declaration cannot be the body of a for statement.
    var page = web.Load(url + Convert.ToString(i + 1) + "/");
}

I tried researching the error, but there isn't much information out there.

1 answer:

Answer 0 (score: 0)

I hit the same error after downloading more than 1,000 web pages. I solved it by adding an extra catch for IOException inside the loop. Here is my code:

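// Fragment from a larger method: doc, yUrl and reTryCounter are declared elsewhere.
HtmlWeb web = new HtmlWeb();
// Cap each request at 15 seconds so a stalled connection cannot hang the loop.
web.PreRequest = delegate(HttpWebRequest webRequest)
{
    webRequest.Timeout = 15000;
    return true;
};

try { doc = web.Load(yUrl); }
catch (WebException ex)
{
    // Count failed requests; report only once the counter reaches 19.
    reTryCounter++;
    if (reTryCounter == 19) { MessageBox.Show("Error Program 1121 , Download webpage \n" + ex.ToString()); }
}
catch (IOException ex2)
{
    // The remote host dropped the connection mid-read: report it and give up on this page.
    MessageBox.Show("Error Program 1125 , IOException Download webpage \n" + ex2.ToString());
    return null;
}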
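
If you would rather retry the page than show a message box, something like the following might work. This is only a rough sketch, not part of HtmlAgilityPack: `Retry`, `WithBackoff`, `maxAttempts` and `baseDelayMs` are names I made up for illustration, and the delay values are guesses. It assumes C# 6 or later (exception filters) and that the failure is transient, for example the server closing connections when requests arrive too quickly.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;

public static class Retry
{
    // Hypothetical helper: runs `action` and retries it when it throws an
    // IOException or WebException, up to `maxAttempts` attempts in total.
    public static T WithBackoff<T>(Func<T> action, int maxAttempts, int baseDelayMs)
    {
        if (maxAttempts < 1)
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception ex) when ((ex is IOException || ex is WebException)
                                       && attempt < maxAttempts)
            {
                // Likely a transient network failure: wait a bit longer after
                // each failed attempt, then try again. On the last attempt the
                // filter is false, so the exception propagates to the caller.
                Thread.Sleep(baseDelayMs * attempt);
            }
        }
    }
}
```

In the loop from the question it could be called as `var page = Retry.WithBackoff(() => web.Load(url + (i + 1) + "/"), 5, 1000);`. Pausing between requests may also make the remote host less likely to drop the connection in the first place, though that depends on the server.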