Question

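I am trying to create a Java program that downloads certain asset files from an FTP server to a local file. Because my (free) FTP server doesn't support file sizes over a few megabytes, I decided to split the files when they are uploaded and recombine them when the program downloads them. This works, but it is rather slow, because for each file it has to get the InputStream, which takes some time.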

The FTP server I use has a way to download files without actually logging in to the server, so I am using this code to get the InputStream:

private static final InputStream getInputStream(String file) throws IOException {
    return new URL("http://site.website.com/path/" + file).openStream();
}

To get the InputStream of an asset file, I am using this code:

public static InputStream getAssetInputStream(String asset, int num) throws IOException, FTPException {
    try {
        return getInputStream("assets/" + asset + "_" + num + ".raf");
    } catch (Exception e) {
        // error handling
    }
}

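Because the getAssetInputStream(String, int) method takes some time to run (especially if the file is larger than a megabyte), I decided to make the code that actually downloads the files multi-threaded. This is where my problem lies.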

final Map<Integer, Boolean> done = new HashMap<Integer, Boolean>();
final Map<Integer, byte[]> parts = new HashMap<Integer, byte[]>();

for (int i = 0; i < numParts; i++) {
    final int part = i;
    done.put(part, false);

    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                InputStream is = FTP.getAssetInputStream(asset, part);
                ByteArrayOutputStream baos = new ByteArrayOutputStream();

                byte[] buf = new byte[DOWNLOAD_BUFFER_SIZE];
                int len = 0;

                while ((len = is.read(buf)) > 0) {
                    baos.write(buf, 0, len);
                    curDownload.addAndGet(len);
                    totAssets.addAndGet(len);
                }

                parts.put(part, baos.toByteArray());
                done.put(part, true);
            } catch (IOException e) {
                // error handling
            } catch (FTPException e) {
                // error handling
            }
        }
    }, "Download-" + asset + "-" + i).start();
}

while (done.values().contains(false)) {
    try {
        Thread.sleep(100);
    } catch(InterruptedException e) {
        e.printStackTrace();
    }
}

File assetFile = new File(dir, "assets/" + asset + ".raf");
assetFile.createNewFile();
FileOutputStream fos = new FileOutputStream(assetFile);

for (int i = 0; i < numParts; i++) {
    fos.write(parts.get(i));
}

fos.close();

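This code works, but not always. When I run it on my desktop computer, it works almost always. Not 100% of the time, but usually it works just fine. On my laptop, which has a far worse internet connection, it almost never works. The result is an incomplete file. Sometimes it downloads 50% of the file, sometimes 90%; it differs every time.

Now, if I replace .start() with .run(), the code works 100% of the time, even on my laptop. It is, however, very slow, so I'd rather not use .run().

Is there a way I can change my code so that it works multi-threaded? Any help will be appreciated.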

Answer 1

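First off, replace your FTP server; there are plenty of free FTP servers that support arbitrary file sizes along with additional features, but I digress...

Your code appears to have a number of unrelated problems, any of which could cause the behavior you are seeing, as described below: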

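1. You have race conditions, because the done and parts maps are accessed from multiple threads without any protection or synchronization. This can corrupt the maps and leave the threads with inconsistent views of them, so done.values().contains(false) may return a result that does not reflect the actual state (for example, true when no false entry remains).
2. You are calling done.values().contains() repeatedly at a high frequency. Although the javadoc does not state it explicitly, a hash map most likely has to traverse every value, in O(n) fashion, to check whether the map contains a given value. Combined with the fact that other threads are modifying the map at the same time, you get undefined behavior. According to the values() javadoc:

   "If the map is modified while an iteration over the collection is in progress (except through the iterator's own remove operation), the results of the iteration are undefined."
3. You are calling new URL("http://site.website.com/path/" + file).openStream(); yet state that you are using FTP. The protocol in that link is http://, not ftp://. I am not sure whether this is a typo, or whether you really mean HTTP (or have an HTTP server serving the same files).
4. Any thread that throws any kind of Exception will cause the code to fail, because not all parts will ever be marked "completed" (given your busy-wait loop design). Granted, you may have redacted some other logic that guards against this, but otherwise it is a potential problem with the code.
5. You are not closing any of the streams you open. This may mean that the underlying socket is left open as well. Not only is that a resource leak; if the server enforces some maximum number of simultaneous connections, new connections will fail simply because the old, completed transfers were never closed.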
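As a rough illustration of points 1, 2, 4 and 5, here is a minimal sketch that keeps the one-thread-per-part design but swaps in a ConcurrentHashMap, a CountDownLatch in place of the polling loop, and try-with-resources for the stream. LatchDownloadSketch and PartSource are hypothetical names: PartSource only stands in for FTP.getAssetInputStream(asset, part) so that the sketch is self-contained, and the 8192-byte buffer is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class LatchDownloadSketch {

    /** Hypothetical stand-in for FTP.getAssetInputStream(asset, part). */
    interface PartSource {
        InputStream open(int part) throws IOException;
    }

    static byte[] downloadAll(PartSource source, int numParts)
            throws IOException, InterruptedException {
        // Thread-safe maps instead of plain HashMap (point 1)
        final Map<Integer, byte[]> parts = new ConcurrentHashMap<>();
        final Map<Integer, Exception> failures = new ConcurrentHashMap<>();
        // The latch replaces the done map and the sleep/poll loop (point 2)
        final CountDownLatch latch = new CountDownLatch(numParts);

        for (int i = 0; i < numParts; i++) {
            final int part = i;
            new Thread(() -> {
                // try-with-resources closes the stream in every case (point 5)
                try (InputStream is = source.open(part)) {
                    ByteArrayOutputStream baos = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int len;
                    while ((len = is.read(buf)) != -1) {
                        baos.write(buf, 0, len);
                    }
                    parts.put(part, baos.toByteArray());
                } catch (IOException | RuntimeException e) {
                    failures.put(part, e);
                } finally {
                    // Count down even on failure so the waiter never hangs (point 4)
                    latch.countDown();
                }
            }, "Download-" + part).start();
        }

        latch.await(); // blocks until every worker has finished or failed
        if (!failures.isEmpty()) {
            throw new IOException(failures.size() + " part(s) failed to download",
                    failures.values().iterator().next());
        }

        // Reassemble the parts in order
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < numParts; i++) {
            out.write(parts.get(i));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // In-memory parts stand in for the real downloads
        byte[] all = downloadAll(p -> new ByteArrayInputStream(new byte[] {(byte) p}), 4);
        System.out.println("Reassembled " + all.length + " bytes");
    }
}
```

This still starts one thread per part, so it does nothing to limit the number of simultaneous connections.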

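Based on the issues above, I propose moving the download logic into Callable tasks and running them through an ExecutorService, as follows:

LinkedList<Callable<byte[]>> tasksToExecute = new LinkedList<>();

// Populate tasks to run
for (int i = 0; i < numParts; i++) {
    final int part = i;

    // Lambda that downloads one part and returns its bytes
    tasksToExecute.add(() -> {
        InputStream is = null;
        try {
            is = FTP.getAssetInputStream(asset, part);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();

            byte[] buf = new byte[DOWNLOAD_BUFFER_SIZE];
            int len = 0;

            while ((len = is.read(buf)) > 0) {
                baos.write(buf, 0, len);
                curDownload.addAndGet(len);
                totAssets.addAndGet(len);
            }

            return baos.toByteArray();
        } catch (IOException e) {
            // handle exception
        } catch (FTPException e) {
            // handle exception
        } finally {
            // Always close the stream, even if the download failed
            if (is != null) {
                try {
                    is.close();
                } catch (IOException ignored) {}
            }
        }
        return null;
    });
}

// Retrieve an ExecutorService instance; note that the work-stealing pool is Java 8 only.
// newFixedThreadPool(nThreads) can be substituted for Java < 8, or for tight control
// over the number of simultaneous links
ExecutorService executor = Executors.newWorkStealingPool(4);

// Tells the executor to execute all the tasks and give us the results
List<Future<byte[]>> resultFutures = executor.invokeAll(tasksToExecute);

// Every task has finished once invokeAll returns, so the pool can be shut down
executor.shutdown();

// Populates the file
File assetFile = new File(dir, "assets/" + asset + ".raf");
assetFile.createNewFile();

try (FileOutputStream fos = new FileOutputStream(assetFile)) {
    // Iterate through the futures, writing them to file in order
    for (Future<byte[]> result : resultFutures) {
        byte[] partData = result.get();
        if (partData == null) {
            // exception occurred while downloading this part, handle appropriately
        } else {
            fos.write(partData);
        }
    }
} catch (IOException ex) {
    // handle exception
} catch (InterruptedException | ExecutionException ex) {
    // handle exception
}

With an executor service, the worker threads are reused, which saves thread creation costs, and the list of futures lets you write the parts to the output file in order. Note that invokeAll itself returns only once every task has finished; if you want the file to start being written as soon as the first parts (in order) are available, submit the tasks individually with submit() and call get() on each Future in order.

As mentioned, there may be cases where too many simultaneous connections cause the server to reject connections (or, even more dangerously, to write an EOF that makes you believe the part was downloaded). In that case, you can adjust the number of worker threads with newFixedThreadPool(nThreads) to ensure that, at any given time, at most nThreads downloads run concurrently.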
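To make the connection limit concrete, here is a minimal sketch under stated assumptions, not a drop-in replacement: it caps concurrency with newFixedThreadPool(nThreads), submits each part separately, and writes each part as soon as its Future (taken in order) completes. OrderedPoolDownloadSketch and PartSource are hypothetical names; PartSource again stands in for FTP.getAssetInputStream(asset, part), and the buffer size is arbitrary. Unlike the code above, a failed part surfaces as an IOException instead of a null result.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedPoolDownloadSketch {

    /** Hypothetical stand-in for FTP.getAssetInputStream(asset, part). */
    interface PartSource {
        InputStream open(int part) throws IOException;
    }

    static void download(PartSource source, int numParts, int nThreads, OutputStream out)
            throws IOException, InterruptedException {
        // At most nThreads downloads (and therefore connections) run at once
        ExecutorService executor = Executors.newFixedThreadPool(nThreads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (int i = 0; i < numParts; i++) {
                final int part = i;
                Callable<byte[]> task = () -> readFully(source, part);
                futures.add(executor.submit(task));
            }
            // get() blocks only until this particular part is ready, so part 0
            // can be written while later parts are still downloading
            for (Future<byte[]> future : futures) {
                try {
                    out.write(future.get());
                } catch (ExecutionException e) {
                    throw new IOException("A part failed to download", e.getCause());
                }
            }
        } finally {
            // Stops any remaining work if we bail out early; a no-op otherwise
            executor.shutdownNow();
        }
    }

    private static byte[] readFully(PartSource source, int part) throws IOException {
        try (InputStream is = source.open(part)) {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int len;
            while ((len = is.read(buf)) != -1) {
                baos.write(buf, 0, len);
            }
            return baos.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // In-memory parts stand in for the real downloads
        download(p -> new ByteArrayInputStream(new byte[] {(byte) p}), 6, 2, sink);
        System.out.println("Wrote " + sink.size() + " bytes");
    }
}
```

In the real program the OutputStream would be the FileOutputStream for the asset file, and nThreads would be tuned to whatever the server tolerates.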

Inconsistent output from multi-threaded FTP InputStream
