我正在开发一个读取文本文件并创建报告的程序。报告的内容如下:文件中的每个字符串的数量,其“状态”以及每个字符串开头的一些符号。它适用于高达100 Mb的文件。
但是当我使用大于1,5Gb且包含超过100000行的输入文件运行程序时,我收到以下错误:
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOfRange(Unknown Source) at
> java.lang.String.<init>(Unknown Source) at
> java.lang.StringBuffer.toString(Unknown Source) at
> java.io.BufferedReader.readLine(Unknown Source) at
> java.io.BufferedReader.readLine(Unknown Source) at
> org.apache.commons.io.IOUtils.readLines(IOUtils.java:771) at
> org.apache.commons.io.IOUtils.readLines(IOUtils.java:723) at
> org.apache.commons.io.IOUtils.readLines(IOUtils.java:745) at
> org.apache.commons.io.FileUtils.readLines(FileUtils.java:1512) at
> org.apache.commons.io.FileUtils.readLines(FileUtils.java:1528) at
> org.apache.commons.io.ReadFileToListSample.main(ReadFileToListSample.java:43)
我将VM参数增加到-Xms128m -Xmx1600m(在eclipse运行配置中),但这没有帮助。来自OTN论坛的专家建议我阅读一些书籍并提高我的计划表现。有人可以帮我改进吗?谢谢。
代码:
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.PrintStream;
import java.util.List;
public class ReadFileToList {
public static void main(String[] args) throws FileNotFoundException
{
File file_out = new File ("D:\\Docs\\test_out.txt");
FileOutputStream fos = new FileOutputStream(file_out);
PrintStream ps = new PrintStream (fos);
System.setOut (ps);
// Create a file object
File file = new File("D:\\Docs\\test_in.txt");
FileReader fr = null;
LineNumberReader lnr = null;
try {
// Here we read a file, sample.txt, using FileUtils
// class of commons-io. Using FileUtils.readLines()
// we can read file content line by line and return
// the result as a List of string.
List<String> contents = FileUtils.readLines(file);
//
// Iterate the result to print each line of the file.
fr = new FileReader(file);
lnr = new LineNumberReader(fr);
for (String line : contents)
{
String begin_line = line.substring(0, 38); // return 38 chars from the string
String begin_line_without_null = begin_line.replace("\u0000", " ");
String begin_line_without_null_spaces = begin_line_without_null.replaceAll(" +", " ");
int stringlenght = line.length();
line = lnr.readLine();
int line_num = lnr.getLineNumber();
String status;
// some correct length for if
int c_u_length_f = 12;
int c_ea_length_f = 13;
int c_a_length_f = 2130;
int c_u_length_e = 3430;
int c_ea_length_e = 1331;
int c_a_length_e = 442;
int h_ext = 6;
int t_ext = 6;
if ( stringlenght == c_u_length_f ||
stringlenght == c_ea_length_f ||
stringlenght == c_a_length_f ||
stringlenght == c_u_length_e ||
stringlenght == c_ea_length_e ||
stringlenght == c_a_length_e ||
stringlenght == h_ext ||
stringlenght == t_ext)
status = "ok";
else status = "fail";
System.out.println(+ line_num + stringlenght + status + begin_line_without_null_spaces);
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
来自OTN的专家表示,该程序打开输入并读取两次。可能是“for statement”中的一些错误?但我找不到它。 谢谢。
答案 0 :(得分:1)
你在循环中声明变量并做了许多不必要的工作,包括两次读取文件 - 对于性能也不好。您可以使用行号阅读器获取行号和文本,并重用行变量(在循环外声明)。这是一个缩短版本,可以满足您的需求。您需要完成validLength方法来检查所有值,因为我只包含前几个测试。
import java.io.*;
public class TestFile {
//a method to determine if the length is valid implemented outside the method that does the reading
private static String validLength(int length) {
if (length == 12 || length == 13 || length == 2130) //you can finish it
return "ok";
return "fail";
}
public static void main(String[] args) {
try {
LineNumberReader lnr = new LineNumberReader(new FileReader(args[0]));
BufferedWriter out = new BufferedWriter(new FileWriter(args[1]));
String line;
int length;
while (null != (line = lnr.readLine())) {
length = line.length();
line = line.substring(0,38);
line = line.replace("\u0000", " ");
line = line.replace("+", " ");
out.write( lnr.getLineNumber() + length + validLength(length) + line);
out.newLine();
}
out.close();
}
catch (Exception e) {
e.printStackTrace();
}
}
}
将其称为java TestFile D:\ Docs \ test_in.txt D:\ Docs \ test_in.txt或者如果要对其进行硬编码,请将args [0]和args [1]替换为文件名。