Problem counting strings in a txt file

Time: 2012-03-26 13:55:13

Tags: for-loop heap filereader fileutils

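I am developing a program that reads a text file and creates a report. The report contains the following: the line number of each string in the file, its "status", and some symbols from the beginning of each string. It works fine for files up to 100 MB.

But when I run the program with an input file that is larger than 1.5 GB and contains more than 100000 lines, I get the following error: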

> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOfRange(Unknown Source)
>     at java.lang.String.<init>(Unknown Source)
>     at java.lang.StringBuffer.toString(Unknown Source)
>     at java.io.BufferedReader.readLine(Unknown Source)
>     at java.io.BufferedReader.readLine(Unknown Source)
>     at org.apache.commons.io.IOUtils.readLines(IOUtils.java:771)
>     at org.apache.commons.io.IOUtils.readLines(IOUtils.java:723)
>     at org.apache.commons.io.IOUtils.readLines(IOUtils.java:745)
>     at org.apache.commons.io.FileUtils.readLines(FileUtils.java:1512)
>     at org.apache.commons.io.FileUtils.readLines(FileUtils.java:1528)
>     at org.apache.commons.io.ReadFileToListSample.main(ReadFileToListSample.java:43)

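I increased the VM arguments to -Xms128m -Xmx1600m (in the Eclipse run configuration), but that did not help. Experts on the OTN forum advised me to read some books and improve my program's performance. Can someone help me improve it? Thanks.

Code: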

import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.PrintStream;
import java.util.List;

public class ReadFileToList {

    public static void main(String[] args) throws FileNotFoundException
    {
        File file_out = new File ("D:\\Docs\\test_out.txt");
        FileOutputStream fos = new FileOutputStream(file_out);
        PrintStream ps = new PrintStream (fos);
        System.setOut (ps);

        // Create a file object
        File file = new File("D:\\Docs\\test_in.txt");

        FileReader fr = null;
        LineNumberReader lnr = null;

        try {
            // Here we read a file, sample.txt, using FileUtils
            // class of commons-io. Using FileUtils.readLines()
            // we can read file content line by line and return
            // the result as a List of string.

            List<String> contents = FileUtils.readLines(file);
            //
            // Iterate the result to print each line of the file.

            fr = new FileReader(file);
            lnr = new LineNumberReader(fr);

            for (String line : contents)
            {
                String begin_line = line.substring(0, 38); // return 38 chars from the string
                String begin_line_without_null = begin_line.replace("\u0000", " ");
                String begin_line_without_null_spaces = begin_line_without_null.replaceAll(" +", " ");

                int stringlenght = line.length();
                line = lnr.readLine();
                int line_num = lnr.getLineNumber();

                String status;

                // some correct length for if
                int c_u_length_f = 12;
                int c_ea_length_f = 13;
                int c_a_length_f = 2130;
                int c_u_length_e = 3430;
                int c_ea_length_e = 1331;
                int c_a_length_e = 442;
                int h_ext = 6;
                int t_ext = 6;

                if ( stringlenght == c_u_length_f ||
                        stringlenght == c_ea_length_f ||
                        stringlenght == c_a_length_f ||
                        stringlenght == c_u_length_e ||
                        stringlenght == c_ea_length_e ||
                        stringlenght == c_a_length_e ||
                        stringlenght == h_ext ||
                        stringlenght == t_ext)
                    status = "ok";
                else status = "fail";

                System.out.println(+ line_num + stringlenght + status + begin_line_without_null_spaces);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

An expert from OTN said that the program opens the input and reads it twice. Maybe there is some mistake in the "for statement"? But I can't find it. Thanks.

1 Answer:

Answer 0: (score: 1)

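You are declaring variables inside the loop and doing a lot of unnecessary work, including reading the file twice, which is also bad for performance. You can use the line number reader to get both the line number and the text, and reuse the line variable (declared outside the loop). Here is a shortened version that does what you need. You will have to complete the validLength method so that it checks all of the values, since I only included the first few tests.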

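import java.io.*;

public class TestFile {

    // Determines whether the length is valid; kept outside the method that does the reading.
    private static String validLength(int length) {
        if (length == 12 || length == 13 || length == 2130) // you can finish it
            return "ok";
        return "fail";
    }

    public static void main(String[] args) {
        // try-with-resources closes both files even if an exception is thrown.
        try (LineNumberReader lnr = new LineNumberReader(new FileReader(args[0]));
             BufferedWriter out = new BufferedWriter(new FileWriter(args[1]))) {
            String line;
            int length;
            // Read one line at a time so the whole file is never held in memory.
            while (null != (line = lnr.readLine())) {
                length = line.length();
                // Keep at most the first 38 characters; shorter lines are kept whole.
                line = line.substring(0, Math.min(38, length));
                line = line.replace("\u0000", " ");
                // Collapse runs of spaces into one, as in the question's code.
                line = line.replaceAll(" +", " ");
                // The " " separators (an assumption about the desired format) stop
                // the two ints from being added together before concatenation.
                out.write(lnr.getLineNumber() + " " + length + " " + validLength(length) + " " + line);
                out.newLine();
            }
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }
}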

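Call it as java TestFile D:\Docs\test_in.txt D:\Docs\test_out.txt, or, if you want to hard-code the paths, replace args[0] and args[1] with the file names. The output path must differ from the input path, because FileWriter truncates the file it opens.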
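To finish validLength as suggested above, one option is to keep every valid length in a set. The following is a minimal sketch, not the answer author's code. The class name ReportLine and the format method are hypothetical. The length values are copied from the constants in the question. The single-space separator between fields is an assumption, since the question does not specify an output format. The sketch also guards substring against lines shorter than 38 characters, which the valid lengths 6, 12 and 13 suggest will occur.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: builds one report line per input line.
class ReportLine {

    // Valid line lengths, copied from the constants in the question's code.
    private static final Set<Integer> VALID_LENGTHS = new HashSet<>(
            Arrays.asList(12, 13, 2130, 3430, 1331, 442, 6));

    // Returns "ok" when the length is one of the expected values, else "fail".
    static String validLength(int length) {
        return VALID_LENGTHS.contains(length) ? "ok" : "fail";
    }

    // Builds one report line: line number, length, status and a cleaned-up prefix.
    static String format(int lineNumber, String line) {
        // Take at most 38 characters so that short lines do not throw.
        String prefix = line.substring(0, Math.min(38, line.length()));
        // Replace NUL characters with spaces, then collapse runs of spaces into one.
        prefix = prefix.replace("\u0000", " ").replaceAll(" +", " ");
        // Explicit separators (an assumed format) keep the two ints from being summed.
        return lineNumber + " " + line.length() + " "
                + validLength(line.length()) + " " + prefix;
    }

    public static void main(String[] args) {
        System.out.println(format(1, "HEADER")); // prints "1 6 ok HEADER"
    }
}
```

With a helper like this, the body of the reading loop could shrink to a single out.write(ReportLine.format(lnr.getLineNumber(), line)) followed by out.newLine().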