I have N longs which are IDs. For each ID I need to execute a Runnable (i.e. I don't care about its return value) and wait until all of them have finished. Each Runnable can take from a few seconds to a few minutes, and it is safe to run about 100 threads in parallel.
In our current solution, we use Executors.newFixedThreadPool(), call submit() for each ID, and then call get() on each returned Future.
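For reference, that approach boils down to something like the sketch below, where processId(long) is a placeholder for the real per-ID work:

// Roughly the approach described above: one queued Runnable and one Future per ID.
static void runWithFutures(long[] ids) throws Exception {
    ExecutorService executor = Executors.newFixedThreadPool(100);
    List<Future<?>> futures = new ArrayList<>();
    for (long id : ids) {
        futures.add(executor.submit(() -> processId(id))); // processId is a stand-in
    }
    for (Future<?> future : futures) {
        future.get();   // wait for every task; the Future itself is otherwise unused
    }
    executor.shutdown();
}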
The code works well and is very simple, since I don't have to deal with threads, complicated waiting logic, etc. It has one downside: the memory footprint.
Each still-enqueued Runnable consumes memory (much more than the 8 bytes a long would need: these are my Java classes with some internal state), and all N Future instances consume memory too (Java classes with state that I use only for waiting, not for an actual result). From a heap dump I estimate that a bit over 1 GiB would be needed for N = 10 million, whereas 10 million longs in an array would consume only 76 MiB.
Is there a way to solve this while keeping only the IDs in memory, ideally without resorting to low-level concurrent programming?
Answer 0 (score: 1)
Yes: you could have a shared queue of longs. You submit n Runnables to the executor, where n is the number of threads the executor has, and at the end of the run method you take the next long from the queue and re-submit a new Runnable.
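A rough sketch of that idea follows. It swaps the explicit queue for an atomic index into the ID array (which avoids boxing the longs), and processId(long) is a stand-in for the real work:

// Each worker processes one ID and then re-submits itself, so only about
// `threads` Runnables ever exist at the same time.
static void runWithResubmission(long[] ids) throws InterruptedException {
    int threads = 100;
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    AtomicInteger next = new AtomicInteger();          // index of the next ID to claim
    CountDownLatch done = new CountDownLatch(threads); // one count per worker chain
    Runnable worker = new Runnable() {
        @Override public void run() {
            int i = next.getAndIncrement();
            if (i >= ids.length) {                     // nothing left: end this chain
                done.countDown();
                return;
            }
            try {
                processId(ids[i]);                     // stand-in for the real work
            } finally {
                executor.submit(this);                 // hand the thread its next unit of work
            }
        }
    };
    for (int k = 0; k < threads; k++) {
        executor.submit(worker);
    }
    done.await();
    executor.shutdown();
}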
Answer 1 (score: 1)
Instead of creating millions of Runnables, create a dedicated thread pool that takes the longs themselves as the tasks to be done. And instead of waiting for the tasks to complete with Future.get(), use a CountDownLatch.
That thread pool could be implemented like this:
int N = 1000000;  // number of tasks
int T = 100;      // number of threads
CountDownLatch latch = new CountDownLatch(N);
ArrayBlockingQueue<Long> queue = new ArrayBlockingQueue<>(N);
for (int k = 0; k < N; k++) {
    queue.put(createNumber(k));
}
for (int k = 0; k < T; k++) {
    new WorkingThread().start();
}
latch.await();

// queue and latch are assumed to be visible to the threads (e.g. fields of an enclosing class)
class WorkingThread extends Thread {
    public void run() {
        // poll() returns null once the pre-filled queue is drained, so the
        // thread exits instead of blocking forever on an empty queue
        Long id;
        while ((id = queue.poll()) != null) {
            processNumber(id);
            latch.countDown();
        }
    }
}
Answer 2 (score: 1)
How about using an ExecutorCompletionService? Something like the following (it may contain errors, I haven't tested it):
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorCompletionService;
import java.util.function.LongFunction;

public class Foo {

    private final ExecutorCompletionService<Void> completionService;
    private final LongFunction<Runnable> taskCreator;
    private final long maxRunning; // max tasks running or queued

    public Foo(Executor executor, LongFunction<Runnable> taskCreator, long maxRunning) {
        this.completionService = new ExecutorCompletionService<>(executor);
        this.taskCreator = taskCreator;
        this.maxRunning = maxRunning;
    }

    public synchronized void processIds(long[] ids) throws InterruptedException {
        int completed = 0;
        int running = 0;
        for (long id : ids) {
            if (running >= maxRunning) {
                // wait for one task to finish before submitting the next,
                // so at most maxRunning tasks are running or queued
                completionService.take();
                running--;
                completed++;
            }
            completionService.submit(taskCreator.apply(id), null);
            running++;
        }
        while (completed < ids.length) {
            completionService.take();
            completed++;
        }
    }
}
Another version of the above could use a Semaphore and a CountDownLatch instead of a CompletionService.
public static void processIds(long[] ids, Executor executor,
        int max, LongFunction<Runnable> taskSup) throws InterruptedException {
    CountDownLatch latch = new CountDownLatch(ids.length);
    Semaphore semaphore = new Semaphore(max);
    for (long id : ids) {
        semaphore.acquire();
        Runnable task = taskSup.apply(id);
        executor.execute(() -> {
            try {
                task.run();
            } finally {
                semaphore.release();
                latch.countDown();
            }
        });
    }
    latch.await();
}
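A hypothetical call of the static variant above (the pool size, the 1,000-task window and the printing task are invented for illustration):

public static void main(String[] args) throws InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(100);
    long[] ids = LongStream.range(0, 10_000_000L).toArray();
    // Allow at most 1,000 tasks queued or running; the printing task is only a placeholder
    processIds(ids, executor, 1_000, id -> () -> System.out.println("processed " + id));
    executor.shutdown();
}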
Answer 3 (score: 1)
This is the kind of thing I would usually handle with a Producer/Consumer pattern and a BlockingQueue coordinating the two, or, if the project has them available, with Akka actors.
But here I'd suggest relying on the behavior of Java Streams.
The intuition is that the lazy execution of the stream will throttle the creation of the work units, the futures and their results.
public static void main(String[] args) {
    // So we have a list of ids, I stream it
    // (note: if we only have an iterator, we could group it into batches of, say, 100,
    // and then flat map each batch)
    LongStream ids = LongStream.range(0, 10_000_000L);
    // This is where the actual tasks will be dispatched
    ExecutorService executor = Executors.newFixedThreadPool(4);
    // For each id to compute, create a runnable, which I call "WorkUnit"
    Optional<Exception> error = ids.mapToObj(WorkUnit::new)
            // create a parallel stream
            // this allows the stream engine to launch the next instructions concurrently
            .parallel()
            // We dispatch the work units (in parallel) to a thread and have them execute
            .map(workUnit -> CompletableFuture.runAsync(workUnit, executor))
            // And then we wait for the unit of work to complete
            .map(future -> {
                try {
                    future.get();
                } catch (Exception e) {
                    // we do care about exceptions
                    return e;
                } finally {
                    System.out.println("Done with a work unit");
                }
                // we do not care for the result
                return null;
            })
            // Keep exceptions on the stream
            .filter(Objects::nonNull)
            // Stop as soon as one is found
            .findFirst();
    executor.shutdown();
    System.out.println(error.isPresent());
}
To be honest, I can't say the spec guarantees this behavior, but in my experience it works: each of the parallel "chunks" grabs a few ids and feeds them through the pipeline (map to a work unit, dispatch to the thread pool, wait for the result, filter exceptions), which means an equilibrium is reached fairly quickly, balancing the number of active work units against the capacity of the underlying executor.
If you want to fine-tune the number of parallel "chunks", follow up here: Custom thread pool in Java 8 parallel stream
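For illustration, that fine-tuning usually relies on starting the terminal operation from inside a custom ForkJoinPool, so the parallel stream runs on that pool's workers rather than the common pool; this is observed JDK behavior rather than a documented guarantee. A sketch reusing the pipeline above (WorkUnit and executor are the same assumed names from that snippet):

ForkJoinPool streamPool = new ForkJoinPool(8);   // 8 parallel "chunks" feeding the executor
Optional<Exception> error = streamPool.submit(() ->
        LongStream.range(0, 10_000_000L)
                .mapToObj(WorkUnit::new)
                .parallel()
                .map(workUnit -> CompletableFuture.runAsync(workUnit, executor))
                .map(future -> {
                    try {
                        future.get();
                        return null;             // we only care about failures
                    } catch (Exception e) {
                        return e;
                    }
                })
                .filter(Objects::nonNull)
                .findFirst()
).join();
streamPool.shutdown();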