WebRequest fails to download a large file (~1 GB) correctly

Asked: 2012-12-07 18:39:00

Tags: c# file web download webrequest

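I am trying to download a large file from a public URL. It seemed to work fine at first, but roughly 1 in 10 machines time out. My initial attempt used WebClient.DownloadFileAsync, but because it would never complete I fell back to WebRequest.Create and reading the response stream directly.

My first version using WebRequest.Create hit the same problem as WebClient.DownloadFileAsync: the operation times out and the file is left incomplete.

My next version added retries when the download times out. This is where it gets weird. The download does eventually finish, needing one retry to fetch the last 7092 bytes. So the downloaded file has exactly the same size as the source, but it is corrupt and differs from the source file. I would expect the corruption to be in the last 7092 bytes, but that is not the case.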

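Using BeyondCompare I found that two chunks of bytes are missing from the corrupt file, adding up to exactly the missing 7092 bytes! The missing chunks are at offsets 1CA49FF0 and 1E31F380, well before the point where the download timed out and was restarted.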

What could be going on here? Any hints on how to track this down further?

Here is the code in question.

public void DownloadFile(string sourceUri, string destinationPath)
{
    //roughly based on: http://stackoverflow.com/questions/2269607/how-to-programmatically-download-a-large-file-in-c-sharp
    //not using WebClient.DownloadFileAsync as it seems to stall out on large files rarely for unknown reasons.

    using (var fileStream = File.Open(destinationPath, FileMode.Create, FileAccess.Write, FileShare.Read))
    {
        long totalBytesToReceive = 0;
        long totalBytesReceived = 0;
        int attemptCount = 0;
        bool isFinished = false;

        while (!isFinished)
        {
            attemptCount += 1;

            if (attemptCount > 10)
            {
                throw new InvalidOperationException("Too many attempts to download. Aborting.");
            }

            try
            {
                var request = (HttpWebRequest)WebRequest.Create(sourceUri);

                request.Proxy = null;//http://stackoverflow.com/questions/754333/why-is-this-webrequest-code-slow/935728#935728
                _log.AddInformation("Request #{0}.", attemptCount);

                //continue downloading from last attempt.
                if (totalBytesReceived != 0)
                {
                    _log.AddInformation("Request resuming with range: {0} , {1}", totalBytesReceived, totalBytesToReceive);
                    request.AddRange(totalBytesReceived, totalBytesToReceive);
                }

                using (var response = request.GetResponse())
                {
                    _log.AddInformation("Received response. ContentLength={0} , ContentType={1}", response.ContentLength, response.ContentType);

                    if (totalBytesToReceive == 0)
                    {
                        totalBytesToReceive = response.ContentLength;
                    }

                    using (var responseStream = response.GetResponseStream())
                    {
                        _log.AddInformation("Beginning read of response stream.");
                        var buffer = new byte[4096];
                        int bytesRead = responseStream.Read(buffer, 0, buffer.Length);
                        while (bytesRead > 0)
                        {
                            fileStream.Write(buffer, 0, bytesRead);
                            totalBytesReceived += bytesRead;
                            bytesRead = responseStream.Read(buffer, 0, buffer.Length);
                        }

                        _log.AddInformation("Finished read of response stream.");
                    }
                }

                _log.AddInformation("Finished downloading file.");
                isFinished = true;
            }
            catch (Exception ex)
            {
                _log.AddInformation("Response raised exception ({0}). {1}", ex.GetType(), ex.Message);
            }
        }
    }
}

Here is the log output from a corrupted download:

Request #1.
Received response. ContentLength=939302925 , ContentType=application/zip
Beginning read of response stream.
Response raised exception (System.Net.WebException). The operation has timed out.
Request #2.
Request resuming with range: 939295833 , 939302925
Received response. ContentLength=7092 , ContentType=application/zip
Beginning read of response stream.
Finished read of response stream.
Finished downloading file.
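
A note on the range arithmetic in this log, as a minimal sketch: HTTP byte ranges are zero-based and inclusive on both ends, so resuming after 939295833 received bytes of a 939302925-byte file means asking for bytes 939295833 through 939302924. The code above passes 939302925 as the end, one past the last byte; a server will normally treat an end beyond the file as "up to the last byte", which is consistent with ContentLength=7092 in the second response. The class and method names below are hypothetical and not part of the original code.

```csharp
using System;

public static class RangeExample
{
    // Hypothetical helper: the inclusive byte range needed to resume a download
    // after `received` bytes of a file that is `total` bytes long.
    public static (long From, long To) ResumeRange(long received, long total)
    {
        if (received < 0 || received >= total)
        {
            throw new ArgumentOutOfRangeException(nameof(received));
        }

        // The last valid byte offset is total - 1, not total.
        return (received, total - 1);
    }

    // Number of bytes covered by an inclusive range.
    public static long RangeLength(long from, long to)
    {
        return to - from + 1;
    }
}
```

With the numbers from the log, ResumeRange(939295833, 939302925) gives the range 939295833 to 939302924, and RangeLength of that range is 7092. The result could be passed to request.AddRange(from, to).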

4 answers:

Answer 0: (score: 1)

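This is the method I usually use, and so far it has not failed me for the same kind of download you need. Try replacing your code with mine and see if it helps.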

if (!Directory.Exists(localFolder))
{
    Directory.CreateDirectory(localFolder);
}

try
{
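    // Path.Combine is meant for file-system paths (it inserts '\' on Windows), so build the URL by hand.
    HttpWebRequest httpRequest = (HttpWebRequest)WebRequest.Create(uri.TrimEnd('/') + "/" + filename);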
    httpRequest.Method = "GET";

    // if the URI doesn't exist, exception gets thrown here...
    using (HttpWebResponse httpResponse = (HttpWebResponse)httpRequest.GetResponse())
    {
        using (Stream responseStream = httpResponse.GetResponseStream())
        {
            using (FileStream localFileStream = 
                new FileStream(Path.Combine(localFolder, filename), FileMode.Create))
            {
                var buffer = new byte[4096];
                long totalBytesRead = 0;
                int bytesRead;

                while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    totalBytesRead += bytesRead;
                    localFileStream.Write(buffer, 0, bytesRead);
                }
            }
        }
    }
}
catch (Exception)
{
    // Rethrow unchanged; this is the place to add logging if needed.
    throw;
}

Answer 1: (score: 0)

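You should change the timeout settings. There seem to be two possible timeout issues:

  • Client-side timeout - try changing the timeouts in WebClient. I find that for large file downloads I sometimes need to do that.
  • Server-side timeout - try changing the timeout on the server. You can verify that this is the problem by using another client, e.g. Postman.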
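
For the client-side case, a minimal sketch of raising the timeouts on the HttpWebRequest used in the question. HttpWebRequest.Timeout applies to GetResponse (default 100 seconds) and HttpWebRequest.ReadWriteTimeout applies to each read on the response stream (default 300 seconds). The helper name and the idea of using one value for both are assumptions for illustration.

```csharp
using System;
using System.Net;

public static class TimeoutExample
{
    // Hypothetical helper: creates a request whose client-side timeouts are raised
    // to the given value. No network traffic happens until GetResponse is called.
    public static HttpWebRequest CreateRequest(string sourceUri, TimeSpan timeout)
    {
        var request = (HttpWebRequest)WebRequest.Create(sourceUri);

        // Time allowed for GetResponse() to return.
        request.Timeout = (int)timeout.TotalMilliseconds;

        // Time allowed for each Read/Write call on the request or response stream.
        request.ReadWriteTimeout = (int)timeout.TotalMilliseconds;

        return request;
    }
}
```

For example, TimeoutExample.CreateRequest(sourceUri, TimeSpan.FromMinutes(10)) could replace the WebRequest.Create call in the question's retry loop. Whether longer timeouts help depends on where the stall actually occurs.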

Answer 2: (score: 0)

The way you read the file through the buffer looks odd to me. Maybe the problem is that you do

while(bytesRead > 0)

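If for some reason the stream returns no bytes at some point while the download is not yet finished, the code exits the loop and never comes back. You should get the Content-Length and increment a variable totalBytesReceived by bytesRead. Finally, change the loop to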

while(totalBytesReceived < ContentLength)
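
A minimal sketch of a loop driven by the expected length, assuming the length comes from the response's ContentLength. Note that in .NET a Stream.Read call returning 0 signals end of stream, so this sketch treats a 0 before the expected count is reached as an error instead of reading again, which would otherwise spin forever. The helper name is hypothetical.

```csharp
using System;
using System.IO;

public static class StreamCopyExample
{
    // Hypothetical helper: copies exactly expectedLength bytes from source to destination.
    // Throws if the source ends early, so a short download is never mistaken for success.
    public static long CopyExactly(Stream source, Stream destination, long expectedLength)
    {
        var buffer = new byte[4096];
        long totalBytesReceived = 0;

        while (totalBytesReceived < expectedLength)
        {
            // Never ask for more than what is still expected.
            int toRead = (int)Math.Min(buffer.Length, expectedLength - totalBytesReceived);
            int bytesRead = source.Read(buffer, 0, toRead);

            if (bytesRead == 0)
            {
                // End of stream before the expected length was reached.
                throw new EndOfStreamException(
                    "Stream ended after " + totalBytesReceived + " of " + expectedLength + " bytes.");
            }

            destination.Write(buffer, 0, bytesRead);
            totalBytesReceived += bytesRead;
        }

        return totalBytesReceived;
    }
}
```

In the question's code the exception could be caught by the existing retry logic, which would then resume from totalBytesReceived.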

Answer 3: (score: 0)

Allocate a buffer size bigger than the expected file size.

byte [] byteBuffer = new byte [65536];

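That way, if the file is 1 GiB in size, you allocate a 1 GiB buffer and then try to fill the whole buffer in one call. That fill may return fewer bytes, but you have still allocated the whole buffer. Note that the maximum length of a single array in .NET is a 32-bit number, which means a single buffer cannot exceed that limit even if you recompile your program for 64-bit and actually have enough memory available.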
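
A minimal sketch, assuming the intent is a larger chunk buffer instead of one buffer holding the whole file: the 65536-byte size from the snippet above can be passed to Stream.CopyTo, which keeps reading in chunks until the source ends. The class and method names are hypothetical.

```csharp
using System.IO;

public static class ChunkedCopyExample
{
    // 64 KiB, the same size as byteBuffer in the snippet above.
    public const int BufferSize = 65536;

    // Hypothetical helper: copies source to destination in 64 KiB chunks and
    // returns the number of bytes written. Requires a seekable destination
    // (FileStream and MemoryStream both are) because it reads Position.
    public static long CopyInChunks(Stream source, Stream destination)
    {
        long before = destination.Position;
        source.CopyTo(destination, BufferSize);
        return destination.Position - before;
    }
}
```

Memory use stays at the chunk size regardless of how large the file is, so the array length limit never comes into play.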