Question

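I have a text file of roughly 2 GB. Every line in the file has the following format:

some text which may contain commas, unique integer

I need to split each line into two parts, the text and the unique integer, and put them into a HashMap as a key-value pair.

Right now I get an OutOfMemoryError even with the heap size set to 10 GB.

There could be two reasons for this: 1. The way I am reading the file is wrong. 2. I am creating too many unnecessary String objects.

This is what I am doing: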

InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream("filename.txt");

InputStreamReader stream = new InputStreamReader(is, StandardCharsets.UTF_8);

BufferedReader reader = new BufferedReader(stream);

while(true)
{
 String line = reader.readLine();
 if (line == null) {
  break;
 }
 String text= line.substring(0, line.lastIndexOf(",")).trim();

 String id = line.substring(line.lastIndexOf(",") + 1).trim();

 //put this in a hashmap and other processing
}

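Since I need to split each line into two parts, and the first part (the text) may itself contain commas, I am using the substring() method.

I use trim() because the text and the id should go into the HashMap without leading or trailing whitespace.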
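The split described above can be sketched as a small standalone helper. The class name `LastCommaSplit` and the method `splitLine` are hypothetical names used only for illustration, and parsing the id with `Integer.parseInt` assumes every id fits in an `int`, which the question does not state explicitly.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

class LastCommaSplit {

    // Hypothetical helper: splits "text, possibly, with commas,42" at the LAST comma,
    // so commas inside the text part are left untouched.
    static Map.Entry<String, Integer> splitLine(String line) {
        int idx = line.lastIndexOf(',');   // computed once, reused for both halves
        if (idx < 0) {
            throw new IllegalArgumentException("no comma in line: " + line);
        }
        String text = line.substring(0, idx).trim();
        // Assumption: the id fits in an int.
        int id = Integer.parseInt(line.substring(idx + 1).trim());
        return new SimpleImmutableEntry<>(text, id);
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> e = splitLine("  some text, with commas , 42 ");
        System.out.println(e.getKey() + " -> " + e.getValue());
        // prints: some text, with commas -> 42
    }
}
```

Keeping the id as a number rather than a trimmed String means one String fewer is created per line, which is relevant to the second suspected cause listed in the question.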

Error message:

 Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at java.lang.String.substring(String.java:1969)

Answer 1

You should put the loop condition into the while statement itself. Please try again with the code below. It seems to work!

    try {
        String line;

        while ((line = reader.readLine()) != null) {
            String text = line.substring(0, line.lastIndexOf(",")).trim();

            String id = line.substring(line.lastIndexOf(",") + 1).trim();

            //put this in a hashmap and other processing
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
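For comparison, here is a minimal sketch of the same loop written with try-with-resources, which closes the reader without an explicit finally block. The class name `LoadPairs` and the method `load` are hypothetical, a `StringReader` stands in for the real file so the example runs on its own, and the id is assumed to fit in an `int`. This only tidies the reading loop and stores the id as a number; the map still has to hold every key, so whether it avoids the OutOfMemoryError depends on the data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

class LoadPairs {

    // Hypothetical helper: reads "text,id" lines from the reader into a map.
    static Map<String, Integer> load(BufferedReader reader) throws IOException {
        Map<String, Integer> map = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            int idx = line.lastIndexOf(',');
            if (idx < 0) {
                continue; // assumption: lines without a comma are skipped
            }
            String text = line.substring(0, idx).trim();
            // Assumption: the id fits in an int.
            int id = Integer.parseInt(line.substring(idx + 1).trim());
            map.put(text, id);
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        // Sample data replaces the real 2 GB file for illustration only.
        String sample = "hello, world, 1\nfoo,2\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            Map<String, Integer> map = load(reader);
            System.out.println(map.get("hello, world")); // 1
            System.out.println(map.get("foo"));          // 2
        }
    }
}
```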
