Question

I read a large text file (5 GB) line by line on the main thread. Several other threads are created to format those lines concurrently.

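I wrote a solution using Runnable and a Semaphore, which controls the number of running threads. Unfortunately, Runnable can neither return a value nor throw a checked exception. If an exception is thrown in any thread, I want my whole application to stop.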

I am now trying to use Callable and Future, but I run into out-of-memory errors.

public class ProcessLine implements Callable<Boolean> {
  private final String inputLine;

  public ProcessLine(String inputLine) {
    this.inputLine = inputLine;
  }

  @Override
  public Boolean call() throws Exception {
    formatLine(inputLine); // huge method which can throw exceptions

    return true;
  }
}

Before opening the text file:

ExecutorService executor = Executors.newFixedThreadPool(threads, new DaemonThreadFactory());
List<Future<Boolean>> futures = new ArrayList<>();

Then, inside the loop over all lines:

ProcessLine processLine = new ProcessLine(inputLine);

Future<Boolean> f = executor.submit(processLine);
futures.add(f);

The first problem here is that the futures list collects every Future object. With one item per line, it is no surprise that I run out of memory.

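The second problem: I only check the Future items with get() at the very end, after the whole text file has been processed. I would not even notice if the first line threw an exception.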

Please help me find a solution.

Answer 1

You can create a custom ThreadPoolExecutor, using the constructor that accepts a work queue, to limit the number of pending tasks, like this:

ExecutorService executor = new ThreadPoolExecutor(
        threads,
        threads,
        0L,
        TimeUnit.MILLISECONDS,
        new LinkedBlockingQueue<Runnable>(WORK_QUEUE_SIZE));

where WORK_QUEUE_SIZE determines the maximum number of pending lines.
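
One caveat with a bounded queue: with the default rejection handler (AbortPolicy), submitting a task while all threads are busy and the queue is full throws RejectedExecutionException. The following is a minimal sketch of one way around that, assuming it is acceptable for the reading thread to run a task itself when the queue is full (CallerRunsPolicy), which also throttles the reader. The class name BoundedPoolDemo, the helper processAll, and the trim() stand-in for formatLine are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPoolDemo {

    // Hypothetical helper: processes every line on a bounded pool and
    // returns how many lines were processed.
    static int processAll(List<String> lines, int threads, int queueSize) {
        ExecutorService executor = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),
                // When the queue is full, the submitting thread runs the task
                // itself instead of getting a RejectedExecutionException.
                new ThreadPoolExecutor.CallerRunsPolicy());
        AtomicInteger processed = new AtomicInteger();
        for (String line : lines) {
            executor.execute(() -> {
                line.trim();                 // stand-in for the real formatLine(line)
                processed.incrementAndGet();
            });
        }
        executor.shutdown();                 // no new tasks; queued ones still run
        try {
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            executor.shutdownNow();
        }
        return processed.get();
    }

    public static void main(String[] args) {
        List<String> lines = Collections.nCopies(10_000, " some line ");
        System.out.println(processAll(lines, 4, 100));
    }
}
```

At most queueSize tasks are ever pending, so memory stays bounded regardless of the file size. The trade-off is that the reader occasionally does formatting work itself.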

Here is another approach I came up with. I am not sure how to incorporate an ExecutorService in an elegant way.

import java.io.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;

public class Scratch {

    static Object lock = new Object();
    static AtomicBoolean keepRunning = new AtomicBoolean(true);
    static BlockingQueue<String> buf = new LinkedBlockingDeque<>(100);
    static List<Consumer> consumers  = Arrays.asList(new Consumer(),
                                                     new Consumer(),
                                                     new Consumer(),
                                                     new Consumer());

    public static void main(String [] args) {    

        // Start a producer
        new Producer().start();

        // Start consumers
        for (Consumer c : consumers)
            c.start();
    }

    static void stopConsumers() {
        System.out.println("Stopping consumers");
        keepRunning.set(false);
        for (Consumer c : consumers)
            c.interrupt();
    }

    static class Producer extends Thread {
        public void run() {
            try (BufferedReader br =
                    new BufferedReader(new FileReader("lines.txt"))) {
                String line;
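                while (keepRunning.get() && null != (line = br.readLine())) {
                    System.out.println(line);
                    // Retry with a timeout so the producer cannot block forever
                    // on a full queue after the consumers have stopped.
                    while (keepRunning.get()
                            && !buf.offer(line, 100, TimeUnit.MILLISECONDS)) {
                        // Queue still full and consumers still running: try again
                    }
                }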
            } catch (Exception e) {
                e.printStackTrace();
                // Producer exception
            }

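            // Wait for the consumers to finish off the last lines in the queue.
            // Also stop waiting once the consumers have been told to stop;
            // otherwise a failed run would leave the producer waiting forever.
            synchronized (lock) {
                while (keepRunning.get() && !buf.isEmpty()) {
                    try {
                        lock.wait();
                    } catch (InterruptedException e) {
                        // TODO: Deal with interruption
                    }
                }
            }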

            // The consumers are now hanging on buf.take. Interrupt them!
            stopConsumers();
        }
    }


    static class Consumer extends Thread {

        // Dummy process
        private boolean process(String str) {
            try {
                Thread.sleep(20);
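            } catch (InterruptedException e) {
                // Restore the interrupt flag so the request to stop is not lost
                Thread.currentThread().interrupt();
            }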
            return true;
        }

        public void run() {
            System.out.println("Starting");

            while (keepRunning.get()) {
                try {
                    process(buf.take());
                } catch (InterruptedException e) {
                    // TODO: Handle interrupt
                    e.printStackTrace();
                } catch (Exception e) {
                    stopConsumers();  // Processing exception: Graceful shutdown
                }

                // Notify the producer that the queue might be empty.
                synchronized (lock) {
                    lock.notify();
                }
            }

            System.out.println("Stopping");
        }
    }

}
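
On the ExecutorService question: one possible way to keep Callable and Future while bounding memory is java.util.concurrent.ExecutorCompletionService. This is a sketch under my own assumptions, not part of the code above: the names BoundedCompletionDemo, processAll and maxInFlight are made up, and formatLine is a stub that fails on lines containing "bad". The reader never has more than maxInFlight tasks outstanding, and it calls get() on finished tasks as it goes, so a failure surfaces shortly after it happens instead of at the end of the file.

```java
import java.util.Arrays;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BoundedCompletionDemo {

    // Hypothetical stand-in for the asker's formatLine(); fails on "bad" lines.
    static Boolean formatLine(String line) throws Exception {
        if (line.contains("bad")) {
            throw new IllegalArgumentException("cannot format: " + line);
        }
        return true;
    }

    // Returns the number of lines processed, or throws the first failure it sees.
    static int processAll(Iterable<String> lines, int threads, int maxInFlight)
            throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        CompletionService<Boolean> done = new ExecutorCompletionService<>(executor);
        int inFlight = 0;
        int processed = 0;
        try {
            for (String line : lines) {
                if (inFlight == maxInFlight) {
                    // Block until some task finishes; get() rethrows its failure.
                    done.take().get();
                    inFlight--;
                    processed++;
                }
                done.submit(() -> formatLine(line));
                inFlight++;
            }
            while (inFlight > 0) {          // drain the tasks still outstanding
                done.take().get();
                inFlight--;
                processed++;
            }
            return processed;
        } catch (ExecutionException e) {
            // Unwrap so the caller sees the exception thrown by formatLine.
            Throwable cause = e.getCause();
            throw (cause instanceof Exception) ? (Exception) cause : e;
        } finally {
            executor.shutdownNow();         // stop remaining work on success or failure
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(processAll(Arrays.asList("a", "b", "c"), 2, 2));
    }
}
```

No list of futures is kept, so memory use depends on maxInFlight rather than on the number of lines.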

Answer 2

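So storing all the results of the task processing (one Future each) takes too much memory, but you can process those results individually, without needing the complete set (right?).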

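You could consider having each task pass its result to a second work queue, to be processed by another thread pool. If the second work queue has a fixed size, memory use is guaranteed to be bounded. This is a variant of the pipes-and-filters design pattern. It has a nice property: if the second stage of processing is too slow, the second work queue eventually fills up, which blocks the threads of the first thread pool. More CPU time then becomes available to the threads of the second pool. In other words, it automatically shares CPU time between the thread pools in a way that maximizes throughput.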

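It also guarantees that the result for the first line of the file is examined within a bounded time after processing starts (by the time the number of lines processed equals the size of the second queue), which can be used to meet your requirement of handling problems promptly.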

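I have used this design in a program that downloads data and writes it to files, to prevent the program from holding on to too much data awaiting processing.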
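
The two-stage idea could be sketched roughly as follows. This is only an illustration under assumptions of mine: the names PipelineDemo and run, the END sentinel, and the use of trim() as the "formatting" step are all hypothetical, and error propagation is left out to keep the example short. Both queues are bounded, so a slow second stage blocks the first stage, which in turn blocks the reader.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PipelineDemo {

    // Sentinel marking the end of input; compared by identity, so it can
    // never collide with a real line.
    private static final String END = new String("<end>");

    // Formats lines in stage 1, consumes results in stage 2, and returns
    // how many results stage 2 saw.
    static int run(List<String> lines, int workers, int queueSize) throws Exception {
        BlockingQueue<String> input = new ArrayBlockingQueue<>(queueSize);
        BlockingQueue<String> results = new ArrayBlockingQueue<>(queueSize);
        ExecutorService stage1 = Executors.newFixedThreadPool(workers);
        ExecutorService stage2 = Executors.newSingleThreadExecutor();
        try {
            for (int i = 0; i < workers; i++) {
                stage1.submit(() -> {
                    String line;
                    while ((line = input.take()) != END) {
                        results.put(line.trim());  // blocks while stage 2 is behind
                    }
                    results.put(END);              // tell stage 2 this worker is done
                    return null;
                });
            }
            Future<Integer> count = stage2.submit(() -> {
                int seen = 0;
                int finishedWorkers = 0;
                while (finishedWorkers < workers) {
                    if (results.take() == END) {
                        finishedWorkers++;
                    } else {
                        seen++;                    // a real stage would check the result here
                    }
                }
                return seen;
            });
            for (String line : lines) {
                input.put(line);                   // blocks while stage 1 is behind
            }
            for (int i = 0; i < workers; i++) {
                input.put(END);                    // one sentinel per worker
            }
            return count.get();
        } finally {
            stage1.shutdownNow();
            stage2.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(Arrays.asList(" a ", " b ", " c "), 2, 2));
    }
}
```

At any moment at most 2 * queueSize lines are buffered, plus the ones the threads are currently working on.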

Answer 3

I tried some of the other solutions, but I think I found the best one myself.

public static final ThreadStatus threadStatus = new ThreadStatus();

public static class ThreadStatus {
 // volatile: written by worker threads, read by the main thread
 private volatile Exception exception = null;

 public void setException(Exception exception) {
   if(exception == null) {
     return;
   }

   this.exception = exception;
 }

 public Exception getException() {
   return exception;
 }

 public boolean exceptionThrown() {
   return exception != null;
 }

}

Then, in the thread's run() method:

catch(Exception e) {
  Main.threadStatus.setException(e);
}

In the loop that iterates over all lines:

if(Main.threadStatus.exceptionThrown()) {
  throw Main.threadStatus.getException();
}
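
A possible variant of the holder above, sketched with AtomicReference so that the first recorded exception wins and is reliably visible to the reading thread. The names FirstFailure, record, throwIfFailed and FirstFailureDemo are hypothetical and not part of my original code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class FirstFailure {
    private final AtomicReference<Exception> first = new AtomicReference<>();

    // Records the exception only if none was recorded before.
    void record(Exception e) {
        if (e != null) {
            first.compareAndSet(null, e);
        }
    }

    // Called by the reading loop: rethrows the first recorded failure, if any.
    void throwIfFailed() throws Exception {
        Exception e = first.get();
        if (e != null) {
            throw e;
        }
    }
}

class FirstFailureDemo {
    public static void main(String[] args) throws Exception {
        FirstFailure status = new FirstFailure();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> {
            try {
                // Stand-in for formatLine(...) failing on some line
                throw new IllegalStateException("line 1 failed");
            } catch (Exception e) {
                status.record(e);
            }
        });
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        try {
            status.throwIfFailed();   // in the real loop, call this once per line read
        } catch (IllegalStateException e) {
            System.out.println("stopped: " + e.getMessage());
        }
    }
}
```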

Thanks to everyone who helped me.

Multithreaded processing of large amounts of data with a fixed number of threads, allowing for exceptions
