Question

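I have a large file (about 3 GB) that I read into an ArrayList. When I run the code below, it becomes very slow after a few minutes and CPU usage is high. A few minutes later, the Eclipse console shows the error java.lang.OutOfMemoryError: GC overhead limit exceeded.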

OS: Windows 2008 R2,
4 CPUs,
32 GB memory,
java version "1.7.0_60"

eclipse.ini:

-startup
plugins/org.eclipse.equinox.launcher_1.3.0.v20130327-1440.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.win32.win32.x86_64_1.1.200.v20140116-2212
-product
org.eclipse.epp.package.standard.product
--launcher.defaultAction
openFile
#--launcher.XXMaxPermSize
#256M
-showsplash
org.eclipse.platform
#--launcher.XXMaxPermSize
#256m
--launcher.defaultAction
openFile
--launcher.appendVmargs
-vmargs
-Dosgi.requiredJavaVersion=1.6
-Xms10G
-Xmx10G
-XX:+UseParallelGC
-XX:ParallelGCThreads=24
-XX:MaxGCPauseMillis=1000
-XX:+UseAdaptiveSizePolicy

Java code:

BufferedInputStream bis = new BufferedInputStream(new FileInputStream(new File("/words/wordlist.dat")));
InputStreamReader isr = new InputStreamReader(bis,"utf-8");
BufferedReader in = new BufferedReader(isr,1024*1024*512);

String strTemp = null;
long ind = 0;

while (((strTemp = in.readLine()) != null))
{
    matcher.reset(strTemp);

    if(strTemp.contains("$"))
    {
        al.add(strTemp);
        strTemp = null;
    }
    ind = ind + 1;
    if(ind%100000==0)
    {
        System.out.println(ind+"    100,000 +");
    }

}
in.close();

My use case:

neural network
java
oracle
solaris
quick sort
apple
green fluorescent protein
acm
trs

Answer 1

"Write a program in Java to get statistics on how many times each keyword is found in the search-term log list."

I suggest you do just that: create a map that counts the occurrences of the keywords, or of all words for that matter.
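
As a minimal sketch of that idea on the Java 7 runtime mentioned in the question, the loop below reads one line at a time and keeps only the counts, so no list of lines builds up in memory. The class name KeywordCounter, the method countWords, and the choice to split on spaces are assumptions made for illustration; they do not come from the question.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;

public class KeywordCounter {

    // Counts how often each space-separated word occurs in the file.
    // Only the map of distinct words is retained; each line becomes
    // garbage as soon as it has been processed.
    public static Map<String, Long> countWords(String path) throws IOException {
        Map<String, Long> counts = new HashMap<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                for (String word : line.trim().split(" +")) {
                    if (word.isEmpty()) {
                        continue; // a blank line yields a single empty token
                    }
                    Long old = counts.get(word);
                    counts.put(word, old == null ? 1L : old + 1L);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a placeholder for the path to the log file.
        Map<String, Long> counts = countWords(args[0]);
        System.out.println("java: " + counts.get("java"));
    }
}
```

Memory use then grows with the number of distinct words rather than with the size of the file.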

Using Java 8 streams, you can do this in one or two lines, without loading the whole file into memory at once.

try (Stream<String> s = Files.lines(Paths.get("filename"))) {
    Map<String, Long> count = s.flatMap(line -> Stream.of(line.trim().split(" +")))
            .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
}
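
Several of the keywords in the question, such as "neural network" and "green fluorescent protein", contain spaces, so splitting on spaces would break them apart. The variant below is a sketch under the assumption that every line of the log holds exactly one search term; it counts whole lines instead of words. The names SearchTermStats and countTerms are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SearchTermStats {

    // Assumption: one search term per line. Identical lines (after trimming)
    // are grouped and counted; blank lines are ignored.
    public static Map<String, Long> countTerms(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) { // lazily read, UTF-8 by default
            return lines.map(String::trim)
                    .filter(term -> !term.isEmpty())
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a placeholder for the path to the log file.
        Map<String, Long> counts = countTerms(Paths.get(args[0]));
        for (String keyword : new String[] {"neural network", "java", "quick sort"}) {
            System.out.println(keyword + ": " + counts.getOrDefault(keyword, 0L));
        }
    }
}
```

The map still holds one entry per distinct term, so if almost every line in the log is unique it can grow large; in that case counting only the keywords of interest would keep it small.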

