Download speed drops sharply after downloading a few files with Java

Asked: 2019-03-07 00:34:15

Tags: java networking download wget

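I am trying to download data from the SEC. I download each file locally, upload it to Amazon S3, and then delete the local copy. The files range from about 10 to 100 MB. After a few files have been downloaded, the download speed drops sharply.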

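Here is a sample of my output. Please ignore the "190 of 5761" part; I started at index 189, so earlier entries are not downloaded. What happens is that the process starts and downloads 3 files at high speed, about 112 MB in total. After that it downloads at only 2-3 MB/s and never returns to the original fast speed (even after 20+ more files, which is the longest I have left it running). I have also included the upload speed, which does not seem to slow down; I am not sure whether that is useful.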

        Downloading via wget: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961220.nc.tar.gz
        Elapsed: 0.934
        Size was 33 MB. Avg speed was 26 MB/s. 
        Finished uploading. Speed was 32 MB/s.
Downloading 190 of 5761: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961223.nc.tar.gz (current total: 33 MB; average speed: 14 MB/s)
        Downloading via wget: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961223.nc.tar.gz
        Elapsed: 1.523
        Size was 41 MB. Avg speed was 26 MB/s.
        Finished uploading. Speed was 59 MB/s.
Downloading 191 of 5761: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961224.nc.tar.gz (current total: 74 MB; average speed: 16 MB/s)
        Downloading via wget: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961224.nc.tar.gz
        Elapsed: 0.479
        Size was 38 MB. Avg speed was 79 MB/s.
        Finished uploading. Speed was 63 MB/s.
Downloading 192 of 5761: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961226.nc.tar.gz (current total: 112 MB; average speed: 20 MB/s)
        Downloading via wget: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961226.nc.tar.gz
        Elapsed: 3.859
        Size was 10 MB. Avg speed was 2 MB/s.
        Finished uploading. Speed was 37 MB/s.
Downloading 193 of 5761: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961227.nc.tar.gz (current total: 122 MB; average speed: 12 MB/s)
        Downloading via wget: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961227.nc.tar.gz
        Elapsed: 8.327
        Size was 30 MB. Avg speed was 3 MB/s.
        Finished uploading. Speed was 55 MB/s.

Additional information:

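  • I am fairly sure the server is not rate limiting me, because when I run a separate wget from the command line against the URL that is currently being downloaded, that download is fast.
  • I am not keeping these files in memory.
  • I have run this several times, and the behavior occurs consistently after 2-10 fast downloads.
  • I delete each file immediately after uploading it to a separate server.
  • CPU and memory usage do not increase while the process is running.
  • The actual file/URL does not seem to matter: I can start downloading from any index (not just 189) and the behavior is the same.
  • I am running this on Amazon EC2 with a fresh install + OpenJDK 1.8.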

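I have tried two different implementations of the file download, and both versions have the same problem. Here are my download implementations. You can see from the wget implementation in particular that the bottleneck occurs within a single wget process, because the Elapsed: __ time increases once the download speed drops.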

Download using Apache FileUtils
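import java.io.File;
import java.io.IOException;
import java.net.URL;
import org.apache.commons.io.FileUtils;
// Downloads the URL into a temp file that is removed when the JVM exits.
public static File downloadFileFromURL(String tmpPrefix, String url) throws IOException {
    File tmp = File.createTempFile(tmpPrefix, null);
    tmp.deleteOnExit();
    FileUtils.copyURLToFile(new URL(url), tmp);
    return tmp;
}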
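
As a diagnostic sketch (not part of the implementations in this post), a plain streaming copy that prints the throughput of each window of bytes would show whether a slow download is slow from its first byte or degrades partway through. The class name ThroughputCopy, the method names, and the reportEveryBytes parameter are hypothetical, and 1 MB is assumed to mean 1,000,000 bytes, which may differ from how the log output above was computed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ThroughputCopy {

    /** Converts bytes and elapsed nanoseconds to MB/s (assumes 1 MB = 1,000,000 bytes). */
    public static double mbPerSecond(long bytes, long nanos) {
        if (nanos <= 0) {
            return 0.0;
        }
        return (bytes / 1_000_000.0) / (nanos / 1_000_000_000.0);
    }

    /**
     * Copies the stream to target and prints the throughput of each window of
     * at least reportEveryBytes bytes. Returns the total number of bytes copied.
     */
    public static long copy(InputStream in, Path target, long reportEveryBytes) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        long windowBytes = 0;
        long windowStart = System.nanoTime();
        try (OutputStream out = Files.newOutputStream(target)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
                windowBytes += n;
                if (windowBytes >= reportEveryBytes) {
                    long now = System.nanoTime();
                    System.out.printf("\t%d MB so far, window speed %.1f MB/s%n",
                            total / 1_000_000, mbPerSecond(windowBytes, now - windowStart));
                    windowBytes = 0;
                    windowStart = now;
                }
            }
        }
        return total;
    }
}
```

It could be called as ThroughputCopy.copy(new URL(url).openStream(), tmp.toPath(), 5_000_000L) in place of the FileUtils.copyURLToFile call. The window timing includes both the network read and the local write, so it does not by itself tell the two apart.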

Download by calling wget
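import java.io.File;
import java.io.IOException;
// Runs wget as a child process and times it; wget saves into the working directory.
public static File downloadFileFromURLViaWget(String url) throws IOException, InterruptedException {
    System.out.println("\tDownloading via wget: " + url);
    long start = System.currentTimeMillis();
    Process pr = Runtime.getRuntime().exec(new String[]{"wget", "-nv", url});
    // Block until wget exits; the exit code is not checked here.
    pr.waitFor();
    System.out.println("\tElapsed: " + (System.currentTimeMillis() - start) / 1000.0);
    // The local file name is taken to be the last path segment of the URL.
    String fileName = url.substring(url.lastIndexOf('/') + 1);
    File f = new File(fileName);
    f.deleteOnExit();
    return f;
}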
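
The wget version above discards the exit code and leaves the child's output pipes unread. A possible variant, shown only as a sketch, uses ProcessBuilder with inheritIO() so that wget's messages go straight to the console, and throws when wget reports a failure. The class name WgetRunner and its method names are hypothetical; the file name logic mirrors the original method.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class WgetRunner {

    /** Returns the part of the URL after the last '/', as the original method does. */
    public static String fileNameFromUrl(String url) {
        return url.substring(url.lastIndexOf('/') + 1);
    }

    /** Runs the command with its output attached to this JVM's console and returns its exit code. */
    public static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // child writes directly to our stdout/stderr, so no pipe is left unread
                .start();
        return process.waitFor();
    }

    /** Hypothetical wrapper: downloads url via wget and fails on a non-zero exit code. */
    public static String download(String url) throws IOException, InterruptedException {
        long start = System.nanoTime();
        int exit = run(Arrays.asList("wget", "-nv", url));
        double seconds = (System.nanoTime() - start) / 1e9;
        if (exit != 0) {
            throw new IOException("wget exited with code " + exit + " for " + url);
        }
        System.out.println("\tElapsed: " + seconds);
        return fileNameFromUrl(url);
    }
}
```

This is not expected to change the download speed; it only ensures that a failed or partial download is not silently treated as a success when timings are compared.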

I would greatly appreciate any insight into this. I have searched everywhere but have not found any other posts about this problem.

0 answers:

No answers yet.