Counting all characters in a file, including \n etc.

Time: 2013-07-18 04:09:19

Tags: java java.util.scanner stringtokenizer

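I am trying to read through a txt file and count every character. This includes \n newline characters and anything else. I can only read through the file once. I am also recording letter frequencies, the number of lines, the number of words, and so on. I can't figure out where to count the total number of characters (see the code below). I know I need to do it before I use the StringTokenizer (which, by the way, I am required to use). I have tried several approaches but just can't work it out. Any help would be appreciated. Thanks in advance.

Note: my variable numChars only counts alphabetic characters (a, b, c, etc.).

Edit: posted the class variables so the code makes more sense.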

private final int NUMCHARS = 26;
private int[] characters = new int[NUMCHARS];
private final int WORDLENGTH = 23;
private int[] wordLengthCount = new int[WORDLENGTH];
private int numChars = 0;
private int numWords = 0;
private int numLines = 0;
private int numTotalChars = 0;
DecimalFormat df = new DecimalFormat("#.##");

public void countLetters(Scanner scan) {
    char current;
    //int word;
    String token1;

    while (scan.hasNext()) {

        String line = scan.nextLine().toLowerCase();
        numLines++;

        StringTokenizer token = new StringTokenizer(line,
            " , .;:'\"&!?-_\n\t12345678910[]{}()@#$%^*/+-");
        for (int w = 0; w < token.countTokens(); w++) {
            numWords++;
        }

        while (token.hasMoreTokens()) {
            token1 = token.nextToken();
            if (token1.length() >= wordLengthCount.length) {
                wordLengthCount[wordLengthCount.length - 1]++;
            } else {
                wordLengthCount[token1.length() - 1]++;

            }

        }
        for (int ch = 0; ch < line.length(); ch++) {
            current = line.charAt(ch);
            if (current >= 'a' && current <= 'z') {
                characters[current - 'a']++;
                numChars++;

            }
        }
    }
}

2 Answers:

Answer 0 (score: 0)

Use string.toCharArray(), for example:

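// hasNextLine() pairs with nextLine(); hasNext() would skip trailing blank lines.
while (scan.hasNextLine()) {
    String line = scan.nextLine();
    // nextLine() returns the line without its terminator, so "\n" is not counted here.
    numberchars += line.toCharArray().length;
    // ...
}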

An alternative is to use string.length() directly:

while (scan.hasNextLine()) {
    String line = scan.nextLine();
    // length() is a method on String; the line terminator is not part of the returned line.
    numberchars += line.length();
    // ...
}
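Because Scanner.nextLine() returns each line without its line separator, the two loops above undercount when newline characters must be included. The following is a small sketch of that gap, not part of the original answer; the class name NextLineLengthDemo and the method sumOfLineLengths are hypothetical names chosen for illustration.

```java
import java.util.Scanner;

class NextLineLengthDemo {
    // Sums line.length() over every line, the way the loops above do.
    static int sumOfLineLengths(String text) {
        int total = 0;
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNextLine()) {
                // nextLine() strips the terminator, so "\n" and "\r\n" are never counted.
                total += scan.nextLine().length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String text = "ab\ncd\n";
        System.out.println("sum of line lengths: " + sumOfLineLengths(text));
        System.out.println("actual char count:   " + text.length());
    }
}
```

The difference between the two numbers is exactly the number of line-terminator characters, which cannot be recovered reliably from the Scanner alone (a terminator may be one or two characters, and the last line may have none).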

With a BufferedReader, you can do it like this:

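// file and charsetName are assumed to be defined by the caller.
int charCount = 0;
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(
            new FileInputStream(file), charsetName))) {
    // read() returns one char at a time, line terminators included, or -1 at end of stream.
    while (reader.read() != -1) {
        charCount++;
    }
}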
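To tie the character-by-character loop above back to the question's constraints (a single pass over the file, with StringTokenizer used for the words), here is a minimal sketch. The class name SinglePassCounter, the helper finishLine, and the delimiter string are assumptions made for illustration and are not taken from the original code. It counts every char as it is read, buffers the current line, and tokenizes the line when a '\n' arrives.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.StringTokenizer;

class SinglePassCounter {
    // Field names loosely mirror the question's variables (an assumption).
    int numTotalChars = 0; // every char read, including '\n' and '\r'
    int numLines = 0;
    int numWords = 0;

    // Hypothetical delimiter set; adjust to whatever should separate words.
    private static final String DELIMS = " ,.;:'\"&!?-_\t0123456789[]{}()@#$%^*/+";

    void count(Reader in) throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            numTotalChars++;               // counted before any filtering
            if (c == '\n') {
                finishLine(line);          // end of line: tokenize what was buffered
            } else if (c != '\r') {
                line.append((char) c);     // keep '\r' out of the word buffer
            }
        }
        // A last line without a trailing newline still counts as a line.
        if (line.length() > 0) {
            finishLine(line);
        }
    }

    private void finishLine(StringBuilder line) {
        numLines++;
        numWords += new StringTokenizer(line.toString(), DELIMS).countTokens();
        line.setLength(0);                 // reuse the buffer for the next line
    }

    public static void main(String[] args) throws IOException {
        SinglePassCounter counter = new SinglePassCounter();
        counter.count(new StringReader("Hello, world!\nSecond line\n"));
        System.out.println(counter.numTotalChars + " chars, "
                + counter.numLines + " lines, " + counter.numWords + " words");
    }
}
```

For a real file, the Reader would typically be a BufferedReader so that the per-char read() calls stay cheap. The letter-frequency and word-length bookkeeping from the question could be added inside finishLine in the same way.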

Answer 1 (score: 0)

I would use a BufferedReader to read chars from the file and a Guava Multiset to count the characters:

BufferedReader rdr = Files.newBufferedReader(path, charSet);
HashMultiset<Character> ms = HashMultiset.create();
for (int c; (c = rdr.read()) != -1;) {
    ms.add((char) c);
}
for (Multiset.Entry<Character> e : ms.entrySet()) {
    char c = e.getElement();
    int n = e.getCount();
}
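If adding Guava is not an option, the same per-character tally can be approximated with a plain java.util.Map. This is a sketch of a swapped-in technique, not the answer's own method; the class name CharFrequencySketch and the methods countChars and total are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

class CharFrequencySketch {
    // Tallies how often each char occurs, line terminators included.
    static Map<Character, Integer> countChars(Reader in) throws IOException {
        Map<Character, Integer> counts = new TreeMap<>();
        int c;
        while ((c = in.read()) != -1) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the existing count.
            counts.merge((char) c, 1, Integer::sum);
        }
        return counts;
    }

    // The total number of chars is the sum of all per-char counts.
    static int total(Map<Character, Integer> counts) {
        int sum = 0;
        for (int n : counts.values()) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Map<Character, Integer> counts = countChars(new StringReader("aab\n"));
        System.out.println(counts.get('a') + " x 'a', total " + total(counts));
    }
}
```

A TreeMap is used here only so that iteration order is predictable; a HashMap would work equally well for counting.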