Question

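Hello, I am building a scraper for a website. After about 3 hours of crawling, my application stops with a WebException. Below is my C# code. client is a predefined WebClient object that is disposed each time gameDoc has been processed. gameDoc is an HtmlDocument object (from HtmlAgilityPack).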

while (retrygamedoc)
{
    try
    {
        gameDoc.LoadHtml(client.DownloadString(url)); // this line caused the exception
        retrygamedoc = false;
    }
    catch
    {
        client.Dispose();
        client = new WebClient();

        retrygamedoc = true;
        Thread.Sleep(500);
    }
}

I tried the following code (to keep the WebClient fresh), taken from this answer:

while (retrygamedoc)
{
    try
    {
        using (WebClient client2 = new WebClient())
        {
            gameDoc.LoadHtml(client2.DownloadString(url)); // this line causes the exception
            retrygamedoc = false;
        }
    }
    catch
    {
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}

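But the result was still the same. Then I switched to HttpWebRequest with a StreamReader, and the result did not change! Below is my code using StreamReader.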

while (retrygamedoc)
{
    try
    {
        // use HttpWebRequest directly to check the result
        HttpWebRequest webreq = (HttpWebRequest)WebRequest.Create(url);
        string responsestring = string.Empty;
        HttpWebResponse response = (HttpWebResponse)webreq.GetResponse(); // this line causes the exception
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            responsestring = reader.ReadToEnd();
        }
        gameDoc.LoadHtml(client.DownloadString(url));

        retrygamedoc = false;
    }
    catch
    {
        retrygamedoc = true;
        Thread.Sleep(500);
    }
}

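What should I do, and what should I check? I am confused because I can crawl some pages on the same site without trouble, and then after about 1000 results the exception is thrown. The exception message is only The request was aborted: The connection was closed unexpectedly. and the status is ConnectionClosed.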

P.S. The application is a desktop forms application.

Update:

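For now I am skipping these values and setting them to null so that the crawl can continue. But if the data is really needed, I still have to update the crawl results by hand, which is tiring because the results contain thousands of records. Please help me.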

Example:

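It is as if you had downloaded about 1300 records from the website, and then, while your internet connection is still running at a good speed, the application stops with The request was aborted: The connection was closed unexpectedly.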

Answer 1

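ConnectionClosed may indicate (and probably does) that the server you are downloading from is closing the connection. Perhaps it has noticed a large number of requests from your client and is refusing to serve you any further.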

Since you cannot control server-side shenanigans, I recommend adding some sort of logic that retries the download a bit later.
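One way to retry later is sketched below, as an illustration only and not a definitive implementation. It assumes the server is throttling, so it waits longer after each failed attempt instead of the fixed 500 ms used in the question. RetryingDownloader, Run, maxAttempts and initialDelay are hypothetical names introduced for this example; they are not part of WebClient or HtmlAgilityPack, and the delay values would need tuning against the real site.

```csharp
using System;
using System.Net;
using System.Threading;

// Hypothetical helper for illustration; not part of any library.
static class RetryingDownloader
{
    // Runs the download delegate, retrying on WebException.
    // The wait doubles after every failed attempt (exponential backoff).
    // Once maxAttempts is reached, the last WebException propagates to the caller.
    public static string Run(Func<string> download, int maxAttempts, TimeSpan initialDelay)
    {
        TimeSpan delay = initialDelay;
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return download();
            }
            catch (WebException) when (attempt < maxAttempts)
            {
                // Back off so the server is not hit again immediately.
                Thread.Sleep(delay);
                delay = TimeSpan.FromTicks(delay.Ticks * 2);
            }
        }
    }
}
```

With the names from the question, a call could look like gameDoc.LoadHtml(RetryingDownloader.Run(() => client.DownloadString(url), 5, TimeSpan.FromSeconds(2))); the caller still has to decide what to do when the final attempt fails, for example recording the URL for a later pass instead of storing null.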

Answer 2

I was getting this error because the server was returning a 404.
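To tell a dropped connection apart from an HTTP error such as a 404, the caught WebException can be inspected instead of swallowed by a bare catch. The sketch below is a minimal illustration; WebExceptionInfo and Describe are hypothetical names, while WebException.Status, WebException.Response and HttpWebResponse.StatusCode are the standard System.Net members.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration; not part of any library.
static class WebExceptionInfo
{
    // Summarizes the status and, when available, the HTTP status code.
    public static string Describe(WebException ex)
    {
        // Response is null when no HTTP response was received,
        // for example when the connection was closed before any reply.
        var response = ex.Response as HttpWebResponse;
        return response == null
            ? "Status=" + ex.Status + ", no HTTP response"
            : "Status=" + ex.Status + ", HTTP " + (int)response.StatusCode;
    }
}
```

Logging this string inside catch (WebException ex) in the retry loop would show whether the failing URLs share a particular status code.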

Connection closed unexpectedly after running for a long time in C#
