Question

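I am reading a number of files, each containing roughly 10,000,000 lines. The first few files are read quickly, but performance degrades at around the 7th file. In fact, I have to use -XX:-UseGCOverheadLimit

This is very inefficient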

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.HashMap;
// JSONObject is presumably org.json.JSONObject (an assumption; the post does not say)

HashMap<String, String> hm = new HashMap<>();
File dir2 = new File(direc);
File[] directoryListing2 = dir2.listFiles();

if (directoryListing2 != null) {
    for (File child2 : directoryListing2) {
        // try-with-resources closes the reader even if parsing throws
        try (BufferedReader br2 = new BufferedReader(new FileReader(child2))) {
            String line2;
            while ((line2 = br2.readLine()) != null) {
                if (!line2.isEmpty()) {
                    // each non-empty line is one JSON object
                    JSONObject thedata = new JSONObject(line2);
                    String name = (String) thedata.get("name");
                    String surname = (String) thedata.get("surname");
                    hm.put(name, surname);
                }
            }
        }
    }
}

Why does performance degrade so much, and how can I make this more efficient?

Answer 1

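You are inserting 10 million entries into the map. Each entry uses at least 28 bytes (assuming the surname is a single character), and more if the surname is longer.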

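28 is a rough estimate: 4 bytes for each String pointer = 8 bytes, 16 bytes for a 1-character string, and 4 bytes for the reference to the entry in the map. It probably takes more, but this gives an order of magnitude.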

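So each file read uses at least 280MB of heap. You're doing that 7 times => 2GB. And that assumes all the values are one character long, which I suppose they are not.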
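The arithmetic above can be written out as a small sketch. The byte counts are the rough figures from this answer, not measured JVM values, and the class and method names are made up for illustration. "MB" here means 10^6 bytes.

```java
// Back-of-the-envelope heap estimate using the rough per-entry figures above.
// These byte counts are approximations from the answer, not measured values.
class HeapEstimate {
    static final long POINTER_BYTES = 4;          // one String reference
    static final long ONE_CHAR_STRING_BYTES = 16; // rough size of a 1-char String
    static final long ENTRY_REF_BYTES = 4;        // reference to the map entry

    /** Rough lower bound per map entry: 2 pointers + 1 string + 1 entry reference. */
    static long bytesPerEntry() {
        return 2 * POINTER_BYTES + ONE_CHAR_STRING_BYTES + ENTRY_REF_BYTES;
    }

    /** Rough lower bound on heap used after reading the given number of files. */
    static long estimateBytes(long entriesPerFile, int files) {
        return bytesPerEntry() * entriesPerFile * files;
    }

    public static void main(String[] args) {
        long perFile = estimateBytes(10_000_000L, 1);
        long total = estimateBytes(10_000_000L, 7);
        System.out.println("per file: " + perFile / 1_000_000 + " MB"); // 280 MB
        System.out.println("7 files:  " + total / 1_000_000 + " MB");   // 1960 MB
    }
}
```

Note that this assumes every name is distinct. A HashMap keeps one entry per key, so names repeated across lines or files overwrite the earlier value instead of adding a new entry.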

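You need a maximum heap size that is large enough, otherwise the code will put a lot of pressure on the garbage collector and may run out of memory.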
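The maximum heap is set with the JVM's -Xmx option, for example `java -Xmx4g HeapCheck` (the 4g value is only illustrative). As a minimal sketch, the following hypothetical helper class prints what the running JVM actually has, which may help confirm whether the limit is the problem:

```java
// Hypothetical helper: reports the JVM's heap limit and current usage.
class HeapCheck {
    private static final long MB = 1024L * 1024L;

    /** Maximum heap the JVM will attempt to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Heap currently in use (allocated minus free), in megabytes. */
    static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    public static void main(String[] args) {
        System.out.println("max heap:  " + maxHeapMb() + " MB");
        System.out.println("used heap: " + usedHeapMb() + " MB");
    }
}
```

Calling `usedHeapMb()` after each file is read would show how quickly the map is approaching the limit.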

As mentioned in the comments, you can also presize the map to avoid too much rehashing.
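Reading that advice as presizing the map, a minimal sketch could look like the following. The class and method names are hypothetical, and the expected entry count is something you would have to estimate for your own data. It relies on HashMap's default load factor of 0.75.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds a HashMap big enough to hold the expected
// number of entries without resizing along the way.
class PresizedMap {
    /** Initial capacity at which expectedEntries fit under the default 0.75 load factor. */
    static int capacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / 0.75);
    }

    static Map<String, String> create(int expectedEntries) {
        return new HashMap<>(capacityFor(expectedEntries));
    }

    public static void main(String[] args) {
        // Small count for the demo; the question's data would need tens of millions.
        Map<String, String> hm = create(1_000);
        hm.put("name", "surname");
        System.out.println(capacityFor(10_000_000)); // 13333334
    }
}
```

Presizing avoids the repeated table resizes while the map grows, but it does not reduce the memory the entries themselves need, so a sufficiently large heap is still required.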
