Question

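I am new to multithreading in Java. I have implemented a multithreaded program in Java that processes an array, and I would like your help and suggestions on how to optimize it and refactor it as much as possible.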

Scenario
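We receive a huge CSV file with more than 1000 rows that we need to process. So I basically convert the rows into an array, split it, and pass the pieces on for execution, each input being a subset of the array. Right now I split the array into 20 equal subsets and hand them to 20 threads. This takes about 2 minutes, which is fine. Without multithreading it takes 30 minutes.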

Help needed

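Below is a snapshot of the code. Although it works fine, I would like to know whether there is a way to standardize and refactor it, because right now it looks clumsy. More specifically, it would be great if I could parameterize it instead of writing a separate thread runner for every chunk.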

Code

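// Declares the checked exceptions that the callers below catch
private static void ProcessRecords(List<String[]> inputCSVData) throws ClassNotFoundException, SQLException, IOException
{
    // Do some operation
}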


// In the main program

public static void main(String[] args) throws ClassNotFoundException, SQLException, IOException, InterruptedException
{
    int size = csvData.size();
    // Split the array
    int firstArraySize = (size / 20);
    // subList's end index is exclusive, so the second chunk ends at firstArraySize * 2, not one before it
    int secondArrayEndIndex = firstArraySize * 2;

    csvData1 = csvData.subList(0, firstArraySize); // start at 1 only if row 0 is a CSV header
    csvData2 = csvData.subList(firstArraySize, secondArrayEndIndex);
    // .... and so on; the last chunk must end at size so the remainder rows are not dropped

    Thread thread1 = new Thread(new Runnable() {
        public void run() {
            try {
                ProcessRecords(csvData1);
            } catch (ClassNotFoundException | SQLException | IOException e) {
                e.printStackTrace();
            }
        }
    });

    Thread thread2 = new Thread(new Runnable() {
        public void run() {
            try {
                ProcessRecords(csvData2);
            } catch (ClassNotFoundException | SQLException | IOException e) {
                e.printStackTrace();
            }
        }
    });

    // ... and so on, 20 times in total

    thread1.start();
    thread2.start();
    // ... for all remaining threads
    // thread20.start();

    thread1.join();
    thread2.join();
    // ... for all remaining threads
    // thread20.join();
}
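For comparison, the 20 hand-written runners can be collapsed into a loop that takes the thread count as a parameter. This is a minimal sketch that still uses plain Thread objects; the class name ChunkedThreads, the method processInChunks, and the row-counting body of ProcessRecords are hypothetical stand-ins, not the asker's code. The placeholder ProcessRecords throws no checked exceptions; with the real one, the lambda needs the same try/catch as above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ChunkedThreads {

    // Hypothetical stand-in for the real ProcessRecords: it only counts rows.
    static final AtomicInteger processed = new AtomicInteger();

    private static void ProcessRecords(List<String[]> inputCSVData) {
        processed.addAndGet(inputCSVData.size());
    }

    // Splits csvData into at most threadCount contiguous chunks, one thread per chunk.
    static void processInChunks(List<String[]> csvData, int threadCount) throws InterruptedException {
        int size = csvData.size();
        // Round up so the last chunk picks up any remainder rows; never use a chunk size of 0.
        int chunkSize = Math.max(1, (size + threadCount - 1) / threadCount);
        List<Thread> threads = new ArrayList<>();
        for (int start = 0; start < size; start += chunkSize) {
            // subList's end index is exclusive, so [start, end) covers each row exactly once.
            List<String[]> chunk = csvData.subList(start, Math.min(start + chunkSize, size));
            Thread t = new Thread(() -> ProcessRecords(chunk));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // wait for every chunk to finish
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String[]> csvData = new ArrayList<>();
        for (int i = 0; i < 1005; i++) {
            csvData.add(new String[] {"row" + i, String.valueOf(i)});
        }
        processInChunks(csvData, 20);
        System.out.println("Processed rows: " + processed.get()); // prints "Processed rows: 1005"
    }
}
```

The thread count is now a single argument, and rounding the chunk size up means no rows are lost when the row count is not divisible by it.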

Answer 1

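Since Java 7 you can get this kind of mechanism out of the box thanks to the Fork/Join Framework. Since Java 8 you can use the Stream API, more precisely parallel streams, which use a ForkJoinPool behind the scenes to benefit from its work-stealing algorithm and get the best performance.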

In your case, you could process the data row by row:

csvData.parallelStream().forEach(MyClass::ProcessRecord);

where ProcessRecord is a method of your class MyClass, for example:

private static void ProcessRecord(String[] inputCSVData){
    // Do some operation
}
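A self-contained version of the row-by-row approach might look like the sketch below. MyClass and ProcessRecord are the names from this answer; the doWork helper, the row counter, and processAll are assumptions for illustration. Because forEach takes a Consumer, whose accept method declares no checked exceptions, ProcessRecord cannot throw SQLException or IOException directly; this sketch wraps an IOException in UncheckedIOException instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class MyClass {

    // Placeholder state: counts how many rows were handled.
    static final AtomicInteger processedRows = new AtomicInteger();

    // Hypothetical per-row work that may fail with a checked exception.
    private static void doWork(String[] row) throws IOException {
        if (row.length == 0) {
            throw new IOException("empty row");
        }
        processedRows.incrementAndGet();
    }

    // Matches Consumer<String[]>, so checked exceptions are wrapped rather than thrown.
    private static void ProcessRecord(String[] inputCSVData) {
        try {
            doWork(inputCSVData);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Processes every row on the common ForkJoinPool and returns the number handled.
    static int processAll(List<String[]> csvData) {
        processedRows.set(0);
        csvData.parallelStream().forEach(MyClass::ProcessRecord);
        return processedRows.get();
    }

    public static void main(String[] args) {
        List<String[]> csvData = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            csvData.add(new String[] {"id" + i, "value" + i});
        }
        System.out.println(processAll(csvData)); // prints 1000
    }
}
```

No manual splitting is needed: the stream framework partitions the list and balances the work across the pool's threads.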

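By default, a parallel stream runs on the common ForkJoinPool, whose parallelism defaults to Runtime.getRuntime().availableProcessors() - 1 (the calling thread also does part of the work). That is good enough for tasks with little IO. If your tasks do IO, you may want a larger pool: simply submit the initial task to a custom ForkJoinPool, and the parallel stream will use your pool instead of the common one.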

ForkJoinPool forkJoinPool = new ForkJoinPool(20);
forkJoinPool.submit(() -> csvData.parallelStream().forEach(MyClass::ProcessRecord)).get();
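A self-contained sketch of the custom-pool approach, which also shuts the pool down afterwards, is shown below. The pool size of 20 comes from the question; the class name, processWithPool, and the row-counting ProcessRecord are assumptions. Note that a parallel stream running inside a task of a custom ForkJoinPool uses that pool because of how the JDK implements parallel streams, not because of a documented guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.atomic.AtomicInteger;

public class CustomPoolDemo {

    static final AtomicInteger processedRows = new AtomicInteger();

    // Placeholder per-row work (assumption): just counts rows.
    private static void ProcessRecord(String[] inputCSVData) {
        processedRows.incrementAndGet();
    }

    // Runs the parallel stream inside a task submitted to a dedicated pool of poolSize threads.
    static int processWithPool(List<String[]> csvData, int poolSize)
            throws InterruptedException, ExecutionException {
        processedRows.set(0);
        ForkJoinPool forkJoinPool = new ForkJoinPool(poolSize);
        try {
            // get() blocks until the whole stream has finished; failures surface as ExecutionException.
            forkJoinPool.submit(() -> csvData.parallelStream().forEach(CustomPoolDemo::ProcessRecord)).get();
        } finally {
            forkJoinPool.shutdown(); // release the pool's worker threads
        }
        return processedRows.get();
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        List<String[]> csvData = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            csvData.add(new String[] {"id" + i});
        }
        System.out.println(processWithPool(csvData, 20)); // prints 1000
    }
}
```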

Answer 2

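You are doing a lot of redundant work here. Instead of hard-coding 20 threads, you can use an ExecutorService with a fixed thread pool (Executors.newFixedThreadPool) and submit the tasks to the pool.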

Also, how did you decide on 20 threads? Use

Runtime.getRuntime().availableProcessors();

to determine the number of cores at runtime.

public static void main(String[] args) throws ClassNotFoundException, SQLException, IOException, InterruptedException {
    int size = csvData.size();
    int threadCount = Runtime.getRuntime().availableProcessors();
    ExecutorService executorService = Executors.newFixedThreadPool(threadCount);

    // Round up so remainder rows are not lost, and never let the chunk size be 0 (which would loop forever)
    int chunkSize = Math.max(1, (size + threadCount - 1) / threadCount);
    for (int start = 0; start < size; start += chunkSize) {
        // subList's end index is exclusive; clamp it so the last chunk stays in bounds
        final List<String[]> chunk = csvData.subList(start, Math.min(start + chunkSize, size));
        executorService.submit(() -> {
            try {
                ProcessRecords(chunk);
            } catch (ClassNotFoundException | SQLException | IOException e) {
                e.printStackTrace();
            }
        });
    }
    executorService.shutdown();

    // Block until every submitted chunk has finished instead of polling with Thread.sleep
    executorService.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
}
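If each chunk should report a result or a failure back to the caller instead of calling printStackTrace(), one option is to wrap each chunk in a Callable and use ExecutorService.invokeAll, which blocks until every task has completed. This is a sketch under assumptions: ProcessRecords here is a hypothetical placeholder that returns the number of rows it handled, and the list is split into contiguous chunks with the end index clamped to the list size.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class InvokeAllDemo {

    // Hypothetical stand-in for the real ProcessRecords: returns how many rows it handled.
    private static int ProcessRecords(List<String[]> inputCSVData) throws IOException {
        return inputCSVData.size();
    }

    static int processAll(List<String[]> csvData, int threadCount)
            throws InterruptedException, ExecutionException {
        int size = csvData.size();
        int chunkSize = Math.max(1, (size + threadCount - 1) / threadCount);

        // One Callable per contiguous chunk [start, end); Callable.call() may throw checked exceptions.
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int start = 0; start < size; start += chunkSize) {
            List<String[]> chunk = csvData.subList(start, Math.min(start + chunkSize, size));
            tasks.add(() -> ProcessRecords(chunk));
        }

        ExecutorService executorService = Executors.newFixedThreadPool(threadCount);
        try {
            int total = 0;
            // invokeAll waits for all tasks; get() rethrows a task's exception wrapped in ExecutionException.
            for (Future<Integer> future : executorService.invokeAll(tasks)) {
                total += future.get();
            }
            return total;
        } finally {
            executorService.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        List<String[]> csvData = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            csvData.add(new String[] {"id" + i});
        }
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println("Rows processed: " + processAll(csvData, threads)); // prints "Rows processed: 1000"
    }
}
```

Compared with shutdown() plus a polling loop, invokeAll gives the caller each task's outcome, so a failed chunk is not silently reduced to a stack trace.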

Alternative approaches to processing a large array with multithreading
