Question

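In my project, data is fetched from the database by a query. There is an Iterator over the result set, and data is continuously added to that result set.

While iterating over the Iterator, the results are added to an ArrayList. Once all entries (more than 200000) have been collected, they are written to a file.

But because this uses a lot of JVM heap space, I need a worker thread that runs in the background and writes the data to the file.

Since I am new to multithreading, I was thinking of using the Executor service with a fixed thread pool of 1 thread: whenever the entry count reaches 50000, those entries are submitted to the executor, which appends them to the file.

Please tell me whether this approach is good or whether I should follow a different one.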

Answer 1

I don't think you need a ThreadPool to handle a single thread. You can do it by creating a single thread yourself (pseudocode):

    import java.util.ArrayList;
    import java.util.List;

    // Class member that holds the entries read from the result set.
    // `Entry` stands in for whatever type one row maps to.
    private List<Entry> list = new ArrayList<Entry>();
    ....
    void addEntry(Entry entry) {
        list.add(entry);
        if (list.size() >= 20000) {
            // Hand the full list to a temporary reference so that `list`
            // can be reinitialized for the next set of entries.
            final List<Entry> tempList = list; // tempList has 20000 entries
            list = new ArrayList<Entry>();     // list is reinitialized

            // Create a thread that writes tempList to the file.
            Thread t = new Thread(new Runnable() {
                public void run() {
                    // code that writes `tempList` to the file
                }
            });

            // The writer runs in the background while the calling thread
            // (the one that called `addEntry()`) keeps adding new entries
            // to the reinitialized list.
            t.start();
        } // end of if
        // Entries still in `list` after the last call must be written too.
    }

Note: you mentioned heap space. Even if we use a thread, the entries still live on the heap.
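If you would rather keep the single-thread executor from the question, a minimal sketch might look like the following. It assumes each entry is a plain `String` line; `BatchedFileWriter`, `export`, and `append` are names made up for illustration. Because the executor has only one worker, batches are appended in the order they were submitted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of any library.
class BatchedFileWriter {

    // Reads all entries and hands them to one background thread in batches.
    static void export(Iterator<String> entries, Path file, int batchSize)
            throws InterruptedException, ExecutionException {
        ExecutorService writer = Executors.newSingleThreadExecutor();
        List<Future<?>> pending = new ArrayList<>();
        try {
            List<String> batch = new ArrayList<>(batchSize);
            while (entries.hasNext()) {
                batch.add(entries.next());
                // Submit when the batch is full, or when this was the last entry.
                if (batch.size() >= batchSize || !entries.hasNext()) {
                    final List<String> full = batch;
                    batch = new ArrayList<>(batchSize);
                    pending.add(writer.submit(() -> append(file, full)));
                }
            }
        } finally {
            writer.shutdown(); // already submitted batches are still written
        }
        // Wait for every batch; a write failure surfaces as ExecutionException.
        for (Future<?> f : pending) {
            f.get();
        }
    }

    // Appends one batch to the file, one entry per line.
    static void append(Path file, List<String> lines) {
        try {
            Files.write(file, lines, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keep in mind that the queue behind `Executors.newSingleThreadExecutor()` is unbounded, so if rows arrive faster than they can be written, the waiting batches still pile up on the heap.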

Answer 2

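Running the process in a thread frees the main thread to do other work. It will not solve your heap space problem.

The heap space problem is caused by the number of entries the query returns. You can change the query to return only a set number of rows, process them, and then run the query again starting from the last row you processed.

If you are using MS SQL, there is already an answer here on how to split the query: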

Row offset in SQL Server
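As a rough illustration of "run the query again starting from the last row you processed", the loop below pages through rows by remembering the last id it saw. Everything named here is an assumption made for the sketch: `PagedExport`, `Row`, and `PageSource` are invented, and the sketch presumes the rows have a unique, increasing `id` column to order by.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of any library.
class PagedExport {

    // One row of the result; the fields are placeholders.
    static final class Row {
        final long id;
        final String text;

        Row(long id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // Returns at most pageSize rows with id > lastId, ordered by id.
    // With JDBC on SQL Server this could run something like
    //   SELECT TOP (?) id, text FROM entries WHERE id > ? ORDER BY id
    // where the table and column names are made up for this example.
    interface PageSource {
        List<Row> fetchAfter(long lastId, int pageSize);
    }

    // Fetches page after page and passes each one to the sink,
    // so only one page is held in memory at a time.
    static int exportAll(PageSource source, int pageSize, Consumer<List<Row>> sink) {
        long lastId = Long.MIN_VALUE;
        int total = 0;
        while (true) {
            List<Row> page = source.fetchAfter(lastId, pageSize);
            if (page.isEmpty()) {
                break;
            }
            sink.accept(page);
            total += page.size();
            lastId = page.get(page.size() - 1).id; // resume after this row
            if (page.size() < pageSize) {
                break; // a short page means there is nothing left
            }
        }
        return total;
    }
}
```

The sink is where each page would be appended to the file; once it returns, the page can be garbage collected before the next one is fetched.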

Answer 3

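You don't need to fetch all 20000 entries before writing them to the file, unless they depend on one another in some way.

In the simplest case, you can write each entry straight to the file as you fetch it, so you don't need a large heap at all.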
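A minimal sketch of that simplest case, assuming each entry can be written as one `String` line (`StreamingExport` and `writeAll` are made-up names):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

// Hypothetical helper, not part of any library.
class StreamingExport {

    // Writes every entry as soon as it is read; nothing is collected in a list.
    static long writeAll(Iterator<String> entries, Path file) throws IOException {
        long count = 0;
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            while (entries.hasNext()) {
                out.write(entries.next());
                out.newLine();
                count++;
            }
        } // try-with-resources flushes and closes the writer
        return count;
    }
}
```

Here the only buffering is inside the `BufferedWriter`, so memory use does not grow with the number of rows.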

A more advanced version of this is the producer-consumer pattern, which you can then tune for different speed/memory-usage characteristics.
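One way the producer-consumer variant could look is sketched below, using a bounded `BlockingQueue` between the reading thread and a single writer thread. It again assumes entries are `String` lines; `QueueExport`, `export`, `offer`, and the end-of-data marker are invented for this example. The queue capacity is the knob that trades memory for throughput.

```java
import java.io.BufferedWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of any library.
class QueueExport {

    // End-of-data marker; a distinct object that is compared by identity.
    private static final String POISON = new String("EOF");

    static void export(Iterator<String> entries, Path file, int capacity)
            throws InterruptedException, ExecutionException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Consumer: takes entries off the queue and writes them until the marker arrives.
        Future<Void> consumer = pool.submit(() -> {
            try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                for (String e = queue.take(); e != POISON; e = queue.take()) {
                    out.write(e);
                    out.newLine();
                }
            }
            return null;
        });
        pool.shutdown(); // no further tasks; the consumer keeps running

        // Producer: the calling thread; it waits whenever the queue is full.
        boolean alive = true;
        while (alive && entries.hasNext()) {
            alive = offer(queue, entries.next(), consumer);
        }
        if (alive) {
            offer(queue, POISON, consumer);
        }
        consumer.get(); // waits for the writer; rethrows its failure, if any
    }

    // Waits for space in the queue; returns false if the consumer has already stopped.
    private static boolean offer(BlockingQueue<String> queue, String entry, Future<?> consumer)
            throws InterruptedException {
        while (!queue.offer(entry, 100, TimeUnit.MILLISECONDS)) {
            if (consumer.isDone()) {
                return false;
            }
        }
        return true;
    }
}
```

With a small capacity the reader is throttled to the writer's speed and memory stays flat; a larger capacity lets the reader run ahead at the cost of more entries held in memory.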

Creating a worker thread that performs a specific task in the background
