WordCount project bug

Time: 2015-11-26 00:32:22

Tags: java

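I'm working on a class project that counts the total number of words, lines, characters, and paragraphs in a text file. So far it has been working, but my character count seems to be off by 3, and the paragraph count seems to include two extra blank lines: I get 5 instead of 4.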

Here is what I have so far:

import java.util.*;
import java.io.*;

public class WordStats {

    /* getWordCount() method will receive a String parameter
     * and return the total number of words by splitting 
     * the received string into words and increment word count */
    public static int getWordCount (String line){

        int wordCount = 0; 

        String str [] = line.split((" "));
        for (int i = 0; i <str.length; i ++){
            if(str[i].length() > 0 ){
                wordCount++;
            }
        }

        return wordCount;
    }

    /* getParsCount method receives a string parameter 
     * and returns the total number of paragraphs in 
     * the text file. */
    /*public static int getParsCount(String line){

        int parCount=0;
        boolean isText = false;

        if(!line.isEmpty()){
            isText=false;
            }

        else {
                isText=true;
                parCount++;

        }

        return parCount;
    }
    */

    public static int getParsCount(String line) {
        boolean isText = false;
        if (!line.isEmpty()) {
            if (!isText) {
                isText = true;
                return 1;
            }
        }
        else {
            isText = false;
        }

        return 0;
    }

    public static void main(String[] args) {

        try{

            int chars =0, words = 1, lines =0, pars=0;

            // creates new Scanner inFile
            Scanner inFile = new Scanner(new File("data.txt")); 

            //creates file to write updated data file.
            PrintWriter outFile = new PrintWriter(new FileOutputStream("dataCopy.txt"));

            //Loop that sends string variables to methods so long as there is another
            //line break in the file. 
            while(inFile.hasNextLine()){ 

                String line = inFile.nextLine(); // read a line from the input file

                lines++;                        //increment line count
                chars += (line.length());       //increment char count
                words += getWordCount(line);    //Increment word count
                pars += getParsCount(line);     // increment paragraph count.
                outFile.println(line + "\n");
            }

            System.out.println("The number of Characters in the file are: " + chars);
            System.out.println("The number of Words in the file are: " + words);
            System.out.println("The number of Lines in the file are: " + lines);
            System.out.println("The number of Paragraphs in the file are: " + pars);
            inFile.close(); // closes file input. 
            outFile.close();// closes output file.
            System.out.print("File Written");
        }

        catch(FileNotFoundException e){
            System.out.print("ERROR: CANNOT PROCESS FILE");
        }

    }

}

Here is the input file:

Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in
Liberty, and dedicated to the proposition that all men are created equal. 

Now   we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so
dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a
portion of that field, as a final resting place for those who here gave their lives that that nation might
live. It is altogether fitting and proper that we should do this. 

But,    in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground.
The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add
or detract. The world will little note, nor long remember what we say here, but it can never forget
what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which
they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great
task remaining before us -- that from these honored dead we take increased devotion to that cause for which
they gave the last full measure of devotion -- that we here highly resolve that these dead shall not have
died in vain -- that this nation, under God, shall have a new birth of freedom -- and that government of
the people, by the people, for the people, shall not perish from the earth.



Abraham Lincoln
November 19, 1863

The output looks like this:

The number of Characters in the file are: 1495
The number of Words in the file are: 283
The number of Lines in the file are: 22
The number of Paragraphs in the file are: 5

1 answer:

Answer 0 (score: 0)

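Below are the changes to the code that let it correctly count the number of paragraphs, i.e. contiguous blocks of text, in the input file. I created a boolean flag that is set to true when the current line has content and to false when it is a blank line. If two paragraphs are separated by several blank lines, that run of blank lines is treated as a single break, so it does not add extra paragraphs. Extra blank lines at the end of the input file are also ignored.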

public class WordStats2 {

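    // Must be static so the static method can use it, and a field rather than
    // a local variable so its value persists between calls for successive lines.
    private static boolean isText = false;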

    public static int getParsCount(String line) {
        if (!line.trim().isEmpty()) {
            if (!isText) {
                isText = true;
                return 1;
            }
        }
        else {
            isText = false;
        }

        return 0;
    }
}
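The flag approach described above can also be written as a single method that looks at all lines at once, which avoids keeping mutable state in a static field between calls. This is a minimal sketch, assuming the lines have already been read into a List<String> (for example by the question's Scanner loop); ParagraphCounter and countParagraphs are hypothetical names, and unlike the answer the flag here is a local variable.

```java
import java.util.List;

public class ParagraphCounter {

    // Hypothetical helper: counts blocks of consecutive non-blank lines.
    // A line that contains only whitespace is treated as blank.
    public static int countParagraphs(List<String> lines) {
        int paragraphs = 0;
        boolean inText = false; // true while the previous line was text
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                if (!inText) {      // first text line of a new block
                    paragraphs++;
                    inText = true;
                }
            } else {
                inText = false;     // one blank line or several just ends the block
            }
        }
        return paragraphs;          // trailing blank lines add nothing
    }
}
```

Fed the 22 lines of the question's data.txt, this should return 4: the three paragraphs of the speech plus the signature block, since the three blank lines before "Abraham Lincoln" end one block instead of starting new ones.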

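Since you haven't shown us your input, I can only guess why the character count is also off. One possibility is that the extra blank lines at the end of the file are also to blame. These "empty" lines are not actually empty; they contain one or more line-ending characters (\r\n on Windows, \n on Linux), so your program may be counting those characters. Post your input and I can revise my answer.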
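One way to test that guess is to compute the character count several ways and compare the results. The sketch below works on an in-memory string rather than data.txt, and CharCountCheck and its methods are hypothetical names. It relies on the documented behavior of Scanner.nextLine(), which returns the line without its terminator, so the question's per-line sum excludes line endings but still includes trailing spaces (the input as posted appears to have a trailing space after "equal." and "this.").

```java
import java.util.Scanner;

public class CharCountCheck {

    // Mirrors the question's loop: Scanner.nextLine() returns each line
    // without its line terminator, so "\r\n" or "\n" is never counted.
    public static int sumOfLineLengths(String text) {
        int chars = 0;
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                chars += sc.nextLine().length();
            }
        }
        return chars;
    }

    // Same sum, but trailing whitespace is stripped from each line first,
    // so invisible spaces at line ends are not counted either.
    public static int sumOfTrimmedLineLengths(String text) {
        int chars = 0;
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNextLine()) {
                chars += sc.nextLine().replaceAll("\\s+$", "").length();
            }
        }
        return chars;
    }
}
```

For the string "ab \r\ncd\r\n\r\n", String.length() is 11, sumOfLineLengths returns 5, and sumOfTrimmedLineLengths returns 4. Comparing the same three numbers for data.txt against the count you expect shows whether line terminators or trailing spaces explain the difference.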