I'm trying to scrape a website through a proxy server. Here is my code:
try
{
    HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(proxy + link);
    req.Timeout = 60000; // timeout: 60 seconds (1 minute)
    req.UserAgent = AGENTLIST[agentCount];
    req.ServicePoint.ConnectionLimit = 1;
    // response (response, stream and reader are disposed when done)
    using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
    using (Stream dataStream = res.GetResponseStream())
    using (StreamReader reader = new StreamReader(dataStream))
    {
        string html = reader.ReadToEnd();
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);
        body = doc.DocumentNode.SelectSingleNode("//body");
    }
}
catch (Exception e)
{
    Console.WriteLine(e.Message);
    Console.WriteLine(e.StackTrace);
    Console.ReadLine();
}
I get this error:
System.Net.WebException: The underlying connection was closed: An unexpected error occurred on a receive. ---> System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
--- End of inner exception stack trace ---
at System.Net.Sockets.NetworkStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.PooledStream.Read(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.Connection.SyncRead(HttpWebRequest request, Boolean userRetrievedStream, Boolean probeRead)
--- End of inner exception stack trace ---
at System.Net.HttpWebRequest.GetResponse()
at ASP.osproxy_getpage_aspx.Page_Load(Object sender, EventArgs e) in c:\SharedCrawl\Dropbox\sharedcrawl\osproxy\GetPage.aspx:line 30
If I leave out the proxy instead:
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(link); //no proxy
I don't get the error.
Can someone explain what the error means?
Answer (score: 0)
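That's not how you use a proxy with a web request. Prepending the proxy address to the link does not route the request through the proxy. It just builds a different URL (if proxy holds something like http://proxyhost:port/, the result is http://proxyhost:port/http://example.com/page), and the server at that address apparently dropped the connection instead of answering. That is what "An existing connection was forcibly closed by the remote host" means. Request the real URL instead, and set the HttpWebRequest.Proxy property: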
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(link);
req.Proxy = new WebProxy(proxyAddress, proxyPort);
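Below is a fuller sketch that combines the question's code with the Proxy fix. It assumes the proxy is a plain HTTP proxy reachable at a host name and port. The class and method names (ProxyFetcher, CreateProxiedRequest, Fetch) are hypothetical. The request is built in a separate method so that it can be inspected without any network traffic.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper: routes an HttpWebRequest through an HTTP proxy by
// setting the Proxy property, rather than concatenating the proxy address into the URL.
public static class ProxyFetcher
{
    // Builds the request but does not send it (no network I/O happens here).
    public static HttpWebRequest CreateProxiedRequest(
        string url, string proxyHost, int proxyPort, string userAgent)
    {
        var req = (HttpWebRequest)WebRequest.Create(url); // the real target URL
        req.Proxy = new WebProxy(proxyHost, proxyPort);   // the proxy is configured here
        req.Timeout = 60000;                               // 60 seconds
        req.UserAgent = userAgent;
        return req;
    }

    // Sends the request through the proxy and returns the response body.
    // The response, stream and reader are disposed even if reading throws.
    public static string Fetch(string url, string proxyHost, int proxyPort, string userAgent)
    {
        HttpWebRequest req = CreateProxiedRequest(url, proxyHost, proxyPort, userAgent);
        using (var res = (HttpWebResponse)req.GetResponse())
        using (Stream dataStream = res.GetResponseStream())
        using (var reader = new StreamReader(dataStream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

In the question's code, string html = ProxyFetcher.Fetch(link, "proxy.example.net", 3128, AGENTLIST[agentCount]); would replace the request and reader lines. The host and port there are placeholders. If the proxy requires authentication, you can set credentials on the WebProxy through its Credentials property before sending.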