How to tokenize an input file in Java

Time: 2011-07-24 03:11:32

Tags: java tokenize

I'm tokenizing a text file in Java. I want to read an input file, tokenize it, and then write certain tokens to an output file. This is what I have done so far:

 
package org.apache.lucene.analysis;

import java.io.*;

class StringProcessing {
    // Create BufferedReader class instance
    public static void main(String[] args) throws IOException {
        InputStreamReader input = new InputStreamReader(System.in);
        BufferedReader keyboardInput = new BufferedReader(input);
        System.out.print("Please enter a java file name: ");
        String filename = keyboardInput.readLine();
        if (!filename.endsWith(".DAT")) {
            System.out.println("This is not a DAT file.");
            System.exit(0);
        }
        File File = new File(filename);
        if (File.exists()) {
            FileReader file = new FileReader(filename);
            StreamTokenizer streamTokenizer = new StreamTokenizer(file);
            int i = 0;
            int numberOfTokensGenerated = 0;
            while (i != StreamTokenizer.TT_EOF) {
                i = streamTokenizer.nextToken();
                numberOfTokensGenerated++;
            }
            // Output number of characters in the line
            System.out.println("Number of tokens = " + numberOfTokensGenerated);
            // Output tokens
            for (int counter = 0; counter < numberOfTokensGenerated; counter++) {
                char character = file.toString().charAt(counter);
                if (character == ' ')
                    System.out.println();
                else
                    System.out.print(character);
            }
        } else {
            System.out.println("File does not exist!");
            System.exit(0);
        }

        System.out.println("\n");
    } //end main
} //end class

When I run this code, this is what I get:

Please enter a java file name: D://eclipse-java-helios-SR1-win32/LexractData.DAT
Number of tokens = 129
java.io.FileReader@19821fException in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 25
    at java.lang.String.charAt(Unknown Source)
    at org.apache.lucene.analysis.StringProcessing.main(StringProcessing.java:40)

The input file looks like this:

-K1 Account
- Op1 withdraw
--- Param1 an
----Type Int
--- Param2 amount
----Type Int
- Op2 deposit
--- Param1 an
----Type Int
--- Param2 amount
----Type Int
- CA1 acNo
---Type Int
-K2 CheckAccount
- SC Account
- CA1 credit_limit
---Type Int
-K3 Customer
- CA1 name
---Type String
-K4 Transaction
- CA1 date
---Type Date
- CA2 time
---Type Time
-K5 Checkbook
-K6 Check
-K7 BalanceAccount
- SC Account

I only want to read the strings that start with -K1, -K2, -K3, and so on. Can anyone help me?

2 answers:

Answer 0 (score: 1)

The problem is this line:

char character = file.toString().charAt(counter);

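file is a reference to a FileReader, which does not override toString(), so the call falls through to Object.toString(). That returns a reference string about 25 characters long (java.io.FileReader@19821f in your output), not the file contents. Your loop runs up to the token count (129), so charAt() fails as soon as it reaches index 25, the 26th character. That is why you get the StringIndexOutOfBoundsException.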
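If you want to keep StreamTokenizer, one way out is to record each token at the moment the tokenizer produces it (via ttype, sval and nval) instead of counting first and re-reading later. The sketch below assumes the default StreamTokenizer settings; the class name TokenCollector is hypothetical. With those defaults, a '-' that is not followed by a digit is returned as an ordinary character, so "-K1" comes back as "-" followed by the word "K1". Note also that the counting loop in the question increments once more after nextToken() returns TT_EOF, so its count includes the end-of-file marker.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original question or answers.
class TokenCollector {
    // Collect every token while tokenizing, instead of counting tokens first
    // and then trying to read them back out of file.toString().
    static List<String> tokens(Reader reader) throws IOException {
        StreamTokenizer st = new StreamTokenizer(reader);
        List<String> result = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    result.add(st.sval);                 // word token, e.g. "K1"
                    break;
                case StreamTokenizer.TT_NUMBER:
                    result.add(String.valueOf(st.nval)); // numbers are parsed as double
                    break;
                case '"':
                case '\'':
                    result.add(st.sval);                 // body of a quoted string
                    break;
                default:
                    result.add(String.valueOf((char) st.ttype)); // ordinary char, e.g. '-'
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // For the real file, pass new FileReader(filename) instead.
        System.out.println(tokens(new StringReader("-K1 Account")));
    }
}
```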

To read the file correctly, wrap the FileReader in a BufferedReader and append each line returned by readLine() to a StringBuilder:


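FileReader fr = new FileReader(filename);
BufferedReader br = new BufferedReader(fr);
StringBuilder sb = new StringBuilder();
String s;
// readLine() strips the line terminator, so add it back to keep lines separate
while ((s = br.readLine()) != null) {
    sb.append(s).append('\n');
}
br.close();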
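Applied to what the question actually asks for (only the lines that begin with -K), the same BufferedReader loop can filter lines as it reads them, so there is no need to build the whole text first. This is a minimal sketch, assuming every -K entry starts at the very beginning of its line as in the sample file. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class ClassLineFilter {
    // Return only the lines that start with "-K" (the -K1, -K2, ... entries).
    // Note: try-with-resources closes the Reader that was passed in.
    static List<String> classLines(Reader source) throws IOException {
        List<String> result = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.startsWith("-K")) {
                    result.add(line);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // For the real file, pass new FileReader(filename) instead.
        String sample = "-K1 Account\n- Op1 withdraw\n-K2 CheckAccount\n- SC Account\n";
        System.out.println(classLines(new StringReader(sample)));
    }
}
```

Lines such as "- Op1 withdraw" or "---Type Int" are skipped because they do not start with the two characters "-K".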

Answer 1 (score: 1)

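If you want to tokenize an input file, the obvious choice is a Scanner. The Scanner class reads the given input stream and can return tokens or other scanned types (scanner.nextInt(), scanner.nextLine(), etc.).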

import java.util.Scanner;
import java.io.File;
import java.io.IOException;

public class TokenizeFile { // class name is arbitrary
    public static void main(String[] args) throws IOException {
        Scanner in = new Scanner(new File("filename.dat"));
        while (in.hasNext()) {
            String s = in.next(); // get the next token in the file
            // Now s contains a token from the file
        }
        in.close();
    }
}

See Oracle's documentation of the Scanner class for details.
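For the question's goal, here is a Scanner-based sketch that combines both styles mentioned above. nextLine() walks the input line by line, and a second Scanner over each -K line splits it into tokens: the first token is the key (e.g. -K1) and the rest of the line is the name (e.g. Account). The class name ClassIndex and the use of a LinkedHashMap to keep file order are assumptions for illustration, not part of the answer.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper class, not part of the original answer.
class ClassIndex {
    // Map each "-K" key to the rest of its line, e.g. "-K1" -> "Account".
    static Map<String, String> index(Readable source) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps file order
        try (Scanner in = new Scanner(source)) {
            while (in.hasNextLine()) {
                String line = in.nextLine();
                if (!line.startsWith("-K")) {
                    continue; // skip operations, parameters, types, ...
                }
                try (Scanner tokens = new Scanner(line)) {
                    String key = tokens.next(); // first whitespace-delimited token, e.g. "-K1"
                    String name = tokens.hasNextLine() ? tokens.nextLine().trim() : "";
                    result.put(key, name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // For the real file, pass new FileReader("filename.dat") (a Reader is Readable).
        String sample = "-K1 Account\n- Op1 withdraw\n-K2 CheckAccount\n-K3 Customer\n";
        System.out.println(index(new StringReader(sample)));
    }
}
```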