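I am trying to download data from the SEC. I download each file locally, upload it to Amazon S3, and then delete the local copy. The files range from about 10 to 100 MB. After a few files have been downloaded, the download speed drops drastically.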
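Here is a sample of my output. Please ignore the "190 of 5761" counter; I started at 189, so earlier entries are not downloaded. What happens here is that the process starts and downloads three files at high speed, about 112 MB in total. After that it downloads at only 2-3 MB/s and never returns to the original fast speed (even after 20+ files, which is the longest I have left it running). I have also included the upload speed; it does not appear to slow down, though I am not sure whether that is useful.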
Downloading via wget: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961220.nc.tar.gz
Elapsed: 0.934
Size was 33 MB. Avg speed was 26 MB/s.
Finished uploading. Speed was 32 MB/s.
Downloading 190 of 5761: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961223.nc.tar.gz (current total: 33 MB; average speed: 14 MB/s)
Downloading via wget: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961223.nc.tar.gz
Elapsed: 1.523
Size was 41 MB. Avg speed was 26 MB/s.
Finished uploading. Speed was 59 MB/s.
Downloading 191 of 5761: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961224.nc.tar.gz (current total: 74 MB; average speed: 16 MB/s)
Downloading via wget: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961224.nc.tar.gz
Elapsed: 0.479
Size was 38 MB. Avg speed was 79 MB/s.
Finished uploading. Speed was 63 MB/s.
Downloading 192 of 5761: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961226.nc.tar.gz (current total: 112 MB; average speed: 20 MB/s)
Downloading via wget: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961226.nc.tar.gz
Elapsed: 3.859
Size was 10 MB. Avg speed was 2 MB/s.
Finished uploading. Speed was 37 MB/s.
Downloading 193 of 5761: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961227.nc.tar.gz (current total: 122 MB; average speed: 12 MB/s)
Downloading via wget: https://www.sec.gov/Archives/edgar/Feed/1996/QTR4/19961227.nc.tar.gz
Elapsed: 8.327
Size was 30 MB. Avg speed was 3 MB/s.
Finished uploading. Speed was 55 MB/s.
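Additional information:
... wget, the download speed is fast.
I tried two different implementations of the file download, but both versions have the same problem. Here are my download implementations. You can see from the wget implementation in particular that the bottleneck occurs inside a single wget process, because the "Elapsed: __" time increases once the download speed has dropped.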
Using Apache FileUtils
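import java.io.File;
import java.io.IOException;
import java.net.URL;
import org.apache.commons.io.FileUtils;

public static File downloadFileFromURL(String tmpPrefix, String url) throws IOException {
    // Download into a temp file that is removed when the JVM exits.
    File tmp = File.createTempFile(tmpPrefix, null);
    tmp.deleteOnExit();
    FileUtils.copyURLToFile(new URL(url), tmp);
    return tmp;
}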
By invoking wget
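public static File downloadFileFromURLViaWget(String url) throws IOException, InterruptedException {
    System.out.println("\tDownloading via wget: " + url);
    long start = System.currentTimeMillis();
    // Run wget as a child process and block until it finishes.
    Process pr = Runtime.getRuntime().exec(new String[]{"wget", "-nv", url});
    int exitCode = pr.waitFor();
    System.out.println("\tElapsed: " + (System.currentTimeMillis() - start) / 1000.0);
    if (exitCode != 0) {
        // Without this check a failed download would go unnoticed.
        throw new IOException("wget exited with code " + exitCode + " for " + url);
    }
    // wget saves into the working directory under the last path segment of the URL.
    String fileName = url.substring(url.lastIndexOf('/') + 1);
    File f = new File(fileName);
    f.deleteOnExit();
    return f;
}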
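The "Elapsed" and "Avg speed" figures in the log come from timing each download as a whole. As a cross-check that does not depend on wget or FileUtils, the same measurement could be taken in-process by timing a plain stream copy. The following is only a minimal sketch under that assumption: the class `TimedCopy` and its methods `copy` and `megabytesPerSecond` are hypothetical names that do not appear in the code above, and 1 MB is taken to mean 1,000,000 bytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper for illustration; not part of the original implementations.
final class TimedCopy {
    private TimedCopy() {}

    /** Copies everything from in to out and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        int n;
        // read() returns -1 at end of stream.
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    /** Converts a byte count and an elapsed time in nanoseconds to MB/s (assuming 1 MB = 1,000,000 bytes). */
    static double megabytesPerSecond(long bytes, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0; // avoid dividing by zero for empty or instantaneous copies
        }
        return (bytes / 1_000_000.0) / (elapsedNanos / 1_000_000_000.0);
    }
}
```

To use it for a download, one would record System.nanoTime() before and after calling TimedCopy.copy(new URL(url).openStream(), Files.newOutputStream(tmp.toPath())) and pass the byte count and the time difference to megabytesPerSecond. If this shows the same drop after the first few files, the slowdown is unlikely to be specific to either download implementation.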
I would greatly appreciate any insight into this. I have searched everywhere but have not found any other posts about this problem.