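I'm trying to write a program that filters data. The data file has 27,000 lines and is over 150 MB. No matter how I implement it, the program stops printing prematurely at around line 4,300. I tested the loop without printing the data (printing only the line numbers), and it reached all 27,000 lines. I think it might be a memory problem, but since I'm new to Java I'm not sure where the problem is. My two main suspects right now are line.substring and the PrintStream class. Please help!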
public static void main(String[] args) {
    // tries to print output to output.csv in same directory
    try {
        PrintStream out = new PrintStream(new FileOutputStream("output.csv"));
        System.setOut(out);
    }
    catch(IOException e1) {
        System.out.println("Error during reading/writing");
    }
    // read input file
    File inputFile = new File("my-large-file.txt");
    if(!inputFile.canRead()) {
        System.out.println("Required input file not found; exiting.");
        System.exit(1);
    }
    // doesn't allow me to use scanner without try for some reason
    try {
        Scanner input = new Scanner(inputFile);
        while (input.hasNextLine()) {
            String line = input.nextLine();
            // scan through each line
            Scanner lineScan = new Scanner(line);
            // if we find the line that we want to look through
            if(lineScan.next().startsWith("1")) {
                // prints the specific data to output
                String a= line.substring(007, 666);
                if (!(a== "the-number-that-I-don't-want")) {
                    String current = line.substring(1, 10);
                    String another = line.substring(10, 20).replaceAll("\\s+","");
                    String third = line.substring(20, 30).replaceAll("\\s +","");
                    String fourth = line.substring(40, 50);
                    ...
                    String nth = line.substring(999, 1000);
                    System.out.print(current + ", ");
                    System.out.print(another + ", ");
                    System.out.print(third + ", ");
                    System.out.print(fourth + ", ");
                    ...
                    System.out.print(nth);
                    System.out.println();
                }
            }
        }
    }
    catch(IOException e) {
        e.printStackTrace();
    }
}
Answer 0 (score: 0)
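String.substring requires valid indices: if a line is shorter than the end index (666 here), it throws StringIndexOutOfBoundsException. Compare strings with equals, not ==, which compares object references rather than contents.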
if (line.length() >= 666) { // Or even 1000, the largest end index used below
    String a = line.substring(007, 666);
    if (!a.equals("the-number-that-I-don't-want")) {
        ...
    }
}
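To see why both fixes matter, here is a small sketch. The class name `SubstringCheck`, the helper `safeSubstring`, and the sample string are hypothetical. The helper returns null when a line is too short for the requested indices instead of letting substring throw, and the last two lines show that == compares references while equals compares contents.

```java
public class SubstringCheck {

    // Hypothetical helper: returns line.substring(begin, end), or null when the
    // indices don't fit the line (substring would otherwise throw
    // StringIndexOutOfBoundsException).
    static String safeSubstring(String line, int begin, int end) {
        if (begin < 0 || begin > end || end > line.length()) {
            return null;
        }
        return line.substring(begin, end);
    }

    public static void main(String[] args) {
        String line = "1ABCDEFGH"; // hypothetical 9-character record
        System.out.println(safeSubstring(line, 1, 4));  // prints ABC
        System.out.println(safeSubstring(line, 1, 20)); // prints null: line too short

        // substring(1, 4) builds a new String object, so == is false even though
        // the characters match the literal; equals compares the contents.
        String field = line.substring(1, 4);
        System.out.println(field == "ABC");      // prints false
        System.out.println(field.equals("ABC")); // prints true
    }
}
```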
Then you should close everything you opened: lineScan and, above all, input.
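One way to guarantee that both scanners are closed, even if an exception is thrown mid-loop, is try-with-resources (Java 7+). This is a minimal sketch; the class and method names and the "first token starts with 1" test are illustrative assumptions that mirror the question's loop. It also checks hasNext() first, because calling next() on a blank line throws NoSuchElementException.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Scanner;

public class CloseScanners {

    // Hypothetical example: counts lines whose first token starts with "1".
    // Each Scanner is closed automatically when its try block exits.
    static int countRecordLines(Reader source) {
        int count = 0;
        try (Scanner input = new Scanner(source)) {
            while (input.hasNextLine()) {
                String line = input.nextLine();
                try (Scanner lineScan = new Scanner(line)) {
                    // hasNext() guards against blank lines, where next()
                    // would throw NoSuchElementException.
                    if (lineScan.hasNext() && lineScan.next().startsWith("1")) {
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String data = "1abc def\n\n2xyz\n10 more\n";
        System.out.println(countRecordLines(new StringReader(data))); // prints 2
    }
}
```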
In this case, a BufferedReader may be more intuitive than a Scanner, which splits its input into tokens. BufferedReader is simpler and probably faster.
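A minimal BufferedReader sketch of the same filtering loop, assuming the input arrives as any Reader (for the real file, something like `Files.newBufferedReader(Paths.get("my-large-file.txt"))` could be passed in). The class name, the "starts with 1" filter, and the column indices are placeholders, just like in the question; lines too short for the indices are skipped rather than crashing the program.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class FilterWithBufferedReader {

    // Placeholder layout: field 1 is chars [1, 4), field 2 is chars [4, 7).
    // Adjust to the real fixed-width layout of the file.
    static void filter(Reader in, Writer out) throws IOException {
        try (BufferedReader reader = new BufferedReader(in);
             PrintWriter writer = new PrintWriter(out)) {
            String line;
            // readLine() returns null at the end of the input
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith("1") || line.length() < 7) {
                    continue; // not a record line, or too short for the indices
                }
                String first = line.substring(1, 4).trim();
                String second = line.substring(4, 7).replaceAll("\\s+", "");
                writer.println(first + ", " + second);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter result = new StringWriter();
        filter(new StringReader("1ab c d\n2zzzzzz\n1xy\n"), result);
        System.out.print(result); // prints "ab, cd" followed by a line separator
    }
}
```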
Answer 1 (score: 0)
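I was able to figure it out! Thanks, everyone, for pointing me in the right direction.
The problem with my program was that I was storing too much in memory: I stored every line of the file, then created another Scanner to scan the whole line, stored strings, concatenated strings, and so on.
Use StringBuffer instead of String, since it improves performance when concatenating.
Here is my revised solution, which now runs through the file and filters it as expected: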
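The gain from a StringBuffer comes from appending to one growable buffer instead of creating a new String for every concatenation. A minimal sketch of that pattern, using StringBuilder (the unsynchronized equivalent of StringBuffer, with the same append API, typically preferred in single-threaded code); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class CsvRow {

    // Hypothetical helper: joins fields into one CSV row using a single
    // StringBuilder, rather than building a new String for each "+".
    static String toCsvRow(List<String> fields) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                row.append(", ");
            }
            row.append(fields.get(i));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvRow(Arrays.asList("123", "Jane", "Doe"))); // prints 123, Jane, Doe
    }
}
```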
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.util.Scanner;

public static void main(String[] args) throws IOException {
    FileInputStream inputStream = null;
    Scanner sc = null;
    // redirect System.out to output.csv
    try {
        PrintStream out = new PrintStream(new FileOutputStream("output.csv"));
        System.setOut(out);
    }
    catch(IOException e1) {
        System.out.println("Error during reading/writing");
    }
    try {
        inputStream = new FileInputStream("my-large-file.txt");
        sc = new Scanner(inputStream, "UTF-8");
        while (sc.hasNextLine()) {
            String line = sc.nextLine();
            // note: the substring indices are placeholder numbers and don't affect the program; they could be anything.
            if (!line.startsWith("the-number-that-I-don't-want")) {
                String filter2 = line.substring(55, 66);
                if (!(filter2.equals("another-string-to-filter-out"))) {
                    StringBuffer currentS = new StringBuffer(line.substring(1, 10));
                    StringBuffer firstName = new StringBuffer(line.substring(10, 20).replaceAll("\\s+", ""));
                    StringBuffer lastName = new StringBuffer(line.substring(22, 37).replaceAll("\\s+", ""));
                    StringBuffer birthday = new StringBuffer(line.substring(37, 56));
                    ...
                    StringBuffer grantType = new StringBuffer(line.substring(999, 1000));
                    System.out.println(currentS + ", " + firstName + ", " + lastName + ", " + birthday + ", " + distributedAmt + ", " + awardYear + ", " + transactionNum + ", " + disbursementDate + ", " + efc + ", " + percentEligUsed + ", " + grantType);
                }
            }
        }
        // Scanner suppresses IOExceptions from the underlying stream; ioException() returns the last one
        if (sc.ioException() != null) {
            throw sc.ioException();
        }
    } finally {
        if (inputStream != null) {
            inputStream.close();
        }
        if (sc != null) {
            sc.close();
        }
    }
}
This link helped me a lot: http://www.baeldung.com/java-read-lines-large-file
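For comparison, Java 8's `Files.lines` streams a file lazily, one line at a time, so only the current line has to be held in memory. A minimal sketch, assuming the same kind of placeholder filter (lines starting with "1"); the class name, method name, and the temporary-file demo are illustrative. The stream must be closed to release the file handle, hence try-with-resources.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class StreamLines {

    // Counts lines starting with "1" without reading the whole file into memory.
    static long countRecordLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.startsWith("1")).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, Arrays.asList("1first", "2second", "1third"), StandardCharsets.UTF_8);
        System.out.println(countRecordLines(tmp)); // prints 2
        Files.delete(tmp);
    }
}
```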