Question

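I'm trying to build a program that takes in a file and outputs the number of words in the file. It works perfectly when everything is in one whole paragraph. However, when there are multiple paragraphs, it doesn't take the first word of each new paragraph into account. For example, if a file reads "My name is John", the program outputs "4 words". However, if the file reads "My name is John" with each word in a new paragraph, the program outputs "1 word". I know it must be something about my if statement, but I assumed there would be a space before the new paragraph, which would take the first word of the new paragraph into account. Here is my code: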

import java.io.*;
public class HelloWorld
{
    public static void main(String[]args)
    {
        try{
            // Open the file that is the first
            // command line parameter
            FileInputStream fstream = new FileInputStream("health.txt");
            // Use DataInputStream to read binary NOT text.
            BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
            String strLine;

            int word2 =0;
            int word3 =0;
            //Read File Line By Line
            while ((strLine = br.readLine()) != null)   {
                // Print the content on the console
                ;
                int wordLength = strLine.length();
                System.out.println(strLine);
                for(int i = 0 ; i < wordLength -1 ; i++)
                    {
                        Character a = strLine.charAt(i);
                        Character b= strLine.charAt(i + 1);
                        if(a == ' ' && b != '.' && b != '?' && b != '!' && b != ' ' )
                            {
                                word2++;
                                //doesnt take into account 1st character of new paragraph
                            }
                    }
                word3 = word2 + 1;
            }



            System.out.println("There are " + word3 + " "
                               + "words in your file.");
            //Close the input stream
            br.close();
        }catch (Exception e){//Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }


    }
}

I've tried adjusting the if statement multiple times, but it doesn't seem to make a difference. Does anyone know where I'm messing up?

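I'm a very new user and asked a similar question a few days ago. People accused me of asking too much of users, so hopefully this narrows my question down a bit. I'm just really confused about why it isn't taking the first word of a new paragraph into account. Please let me know if you need any more information. Thanks!!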

Answer 1

First, your counting logic is incorrect. Consider:

word3 = word2 + 1;

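Think about what this does. Each time through the loop, when you read a line, you essentially count the words in that line and then reset the total count to word2 + 1. Hint: if you want to count the total for the file, you need to increment word3 each time rather than replace it with the current count plus one.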

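Second, your word-parsing logic is slightly off. Consider the case of an empty line. You will find no words in it, yet you treat the word count as word2 + 1, which means you incorrectly count an empty line as 1 word. Hint: if the first character of the line is a letter, then the line starts with a word.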
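The two hints above can be combined into a small sketch. It is an illustration under assumptions, not the asker's exact program: the names WordStartCounter, countWordsInLine and countWords are made up here, the lines are passed in as a list standing in for what readLine() would return, and any non-whitespace character counts as the start of a word, which is simpler than the punctuation checks in the question.

```java
import java.util.Arrays;
import java.util.List;

class WordStartCounter {
    // Counts the words in one line. A word starts at a non-whitespace
    // character that is either the first character of the line or is
    // preceded by whitespace, so an empty line yields 0.
    static int countWordsInLine(String line) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            boolean startsWord = !Character.isWhitespace(line.charAt(i))
                    && (i == 0 || Character.isWhitespace(line.charAt(i - 1)));
            if (startsWord) {
                count++;
            }
        }
        return count;
    }

    // Adds each line's count to a running total instead of overwriting it.
    static int countWords(List<String> lines) {
        int totalWords = 0;
        for (String line : lines) {
            totalWords += countWordsInLine(line);
        }
        return totalWords;
    }

    public static void main(String[] args) {
        // Each word is its own "paragraph", with one blank line in between.
        List<String> lines = Arrays.asList("My", "", "name", "is", "John");
        System.out.println("There are " + countWords(lines) + " words in your file.");
    }
}
```

Because the first character of every line is checked, no special "+ 1" is needed for the first word, and blank lines contribute nothing to the total.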

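Although your implementation has minor flaws, your approach is sound. As an alternative, you may want to consider calling String.split() on each line. The number of elements in the resulting array is the number of words on that line.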
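A minimal sketch of the String.split() alternative follows. The class and method names (SplitWordCounter, countWordsInLine) are hypothetical, and the sample lines stand in for lines read from a file. It splits on the regular expression \s+ rather than a single space, and trims first, because split() would otherwise report one empty element for a blank line or for leading whitespace.

```java
import java.util.Arrays;
import java.util.List;

class SplitWordCounter {
    // Number of whitespace-separated words on one line.
    static int countWordsInLine(String line) {
        String trimmed = line.trim();
        // "".split("\\s+") returns an array holding one empty string,
        // so blank lines are handled before splitting.
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("My name", "", "  is   John  ");
        int totalWords = 0;
        for (String line : lines) {
            totalWords += countWordsInLine(line); // accumulate across lines
        }
        System.out.println("There are " + totalWords + " words in your file.");
    }
}
```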

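By the way, you can improve your code's readability and make debugging easier if you use meaningful names for your variables (e.g. totalWords instead of word3).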

Answer 2

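If your paragraph doesn't start with a space, your if condition won't count its first word. For "My name is John" the program outputs "4 words", but that is not really correct, because you miss the first word and then add one afterward. Try this: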

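// strLine is the current line returned by br.readLine()
strLine = strLine.trim(); // remove leading and trailing whitespace
// split on runs of whitespace; an empty line contains no words
String[] words = strLine.isEmpty() ? new String[0] : strLine.split("\\s+");
int numOfWords = words.length;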

Answer 3

Personally, I prefer a regular Scanner with token-based scanning for this sort of thing. How about something like this:

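// Needs java.io.File and java.util.Scanner; new Scanner(File) can throw FileNotFoundException
int words = 0;
Scanner lineScan = new Scanner(new File("fileName.txt"));
while (lineScan.hasNextLine()) {
    // Scan the whitespace-separated tokens of the current line
    Scanner tokenScan = new Scanner(lineScan.nextLine());
    while (tokenScan.hasNext()) {
        tokenScan.next();
        words++;
    }
}
lineScan.close();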

This goes through every line in the file. For each line, it goes through every token (in this case, every word) and increments the word count.
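As a possible simplification, a single Scanner may be enough, since its default delimiter is whitespace and that includes line breaks. The sketch below scans a String so that it runs on its own. The class name ScannerWordCounter and the sample text are assumptions for illustration; a Scanner built from a File should tokenize the same way.

```java
import java.util.Scanner;

class ScannerWordCounter {
    // Counts whitespace-separated tokens across the whole text,
    // regardless of how many line breaks separate them.
    static int countWords(String text) {
        int words = 0;
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                scanner.next(); // consume one token
                words++;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "My\n\nname\nis\nJohn";
        System.out.println("There are " + countWords(text) + " words in your file.");
    }
}
```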

Answer 4

I'm not sure what you mean by "paragraph", but I tried using capital letters as you suggested and it worked perfectly fine. I used the Apache Commons IO library.

package Project1;

import java.io.File;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.FileUtils;

public class HelloWorld
{
    public static void main(String[] args)
    {
        try {
            File f = new File("c:\\TestFile\\test.txt");
            // Read the whole file into one string (Apache Commons IO); UTF-8 is assumed here
            String fileStr = FileUtils.readFileToString(f, StandardCharsets.UTF_8).trim();
            // Split on any run of whitespace so that line breaks separate words too
            String[] tokens = fileStr.isEmpty() ? new String[0] : fileStr.split("\\s+");
            System.out.println("Words in file : " + tokens.length);
        } catch (Exception e) { // Catch exception if any
            System.err.println("Error: " + e.getMessage());
        }
    }
}

How do I take the first word of a new paragraph into account?
