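I am practicing Java on my own from a book. I read the chapter on text processing and wrapper classes and attempted the exercise below.
Word Counter
Write a program that asks the user for the name of a file. The program should display the number of words that the file contains.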
import java.io.File;
import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;
public class FileWordCounter {
    public static void main(String[] args) throws IOException {
        // Create a Scanner object
        Scanner keyboard = new Scanner(System.in);
        // Ask user for filename
        System.out.print("Enter the name of a file: ");
        String filename = keyboard.nextLine();
        // Open file for reading
        File file = new File(filename);
        Scanner inputFile = new Scanner(file);
        int words = 0;
        String word = "";
        while (inputFile.hasNextLine()) {
            String line = inputFile.nextLine();
            System.out.println(line); // for debugging
            // Create a StringTokenizer from the current line's contents and the delimiters
            StringTokenizer stringTokenizer = new StringTokenizer(line, " \n.!?;,()");
            while (stringTokenizer.hasMoreTokens()) { // for each token in the line
                word = stringTokenizer.nextToken();
                System.out.println(word); // for debugging
                words++;
            }
            System.out.println("Line contains " + words + " words");
        }
        // Close file
        inputFile.close();
        System.out.println("The file has " + words + " words.");
    }
}
I picked this random poem from the web to test the program. I put the poem in a file named TheSniper.txt:
Two hundred yards away he saw his head;
He raised his rifle, took quick aim and shot him.
Two hundred yards away the man dropped dead;
With bright exulting eye he turned and said,
'By Jove, I got him!'
And he was jubilant; had he not won
The meed of praise his comrades haste to pay?
He smiled; he could not see what he had done;
The dead man lay two hundred yards away.
He could not see the dead, reproachful eyes,
The youthful face which Death had not defiled
But had transfigured when he claimed his prize.
Had he seen this perhaps he had not smiled.
He could not see the woman as she wept
To the news two hundred miles away,
Or through his very dream she would have crept.
And into all his thoughts by night and day.
Two hundred yards away, and, bending o'er
A body in a trench, rough men proclaim
Sadly, that Fritz, the merry is no more.
(Or shall we call him Jack? It's all the same.)
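Here is some of my output. For debugging purposes, I print each line, each word, and the running total of words in the file so far, including the words in the current line.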
Enter the name of a file: TheSniper.txt
Two hundred yards away he saw his head;
Two
hundred
yards
away
he
saw
his
head
Line contains 8 words
He raised his rifle, took quick aim and shot him.
He
raised
his
rifle
took
quick
aim
and
shot
him
Line contains 18 words
...
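In the end, my program reports that the poem has 176 words. Microsoft Word, however, counts 174. From printing each word, I can see that I am miscounting apostrophes and single quotation marks. Here is the last part of the poem in my output, where the problem shows up: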
(Or shall we call him Jack? It's all the same.)
Or
shall
we
call
him
Jack
It
s
all
the
same
Line contains 176 words
The file has 176 words
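When I leave the single quote (which looks like an apostrophe) out of my StringTokenizer delimiter list, the word "It's" is counted as one word. When I include it, "It's" is counted as two words (It and s), because the apostrophe (which looks like a single quote) is treated as a delimiter. On the other hand, the phrase 'By Jove, I got him!' is miscounted when I do not delimit on the single quote/apostrophe. Are apostrophes and single quotes the same when it comes to delimiting? I am not sure how to delimit on the single quotes that enclose a phrase without also splitting on the apostrophe inside a word like "It's". I hope my question is reasonably clear. Please ask if anything needs clarification. Any guidance is appreciated. Thanks!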
Answer 0 (score: 0)
Why not use another Scanner on each line to count the words?
int words = 0;
while (inputFile.hasNextLine()) {
    int lineLength = 0;
    Scanner lineScanner = new Scanner(inputFile.nextLine());
    while (lineScanner.hasNext()) {
        System.out.println(lineScanner.next());
        lineLength++;
    }
    System.out.println("Line contains " + lineLength + " words");
    words += lineLength;
}
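I don't believe you can delimit on the single quotes around a phrase such as "'By Jove, I got him!'" while ignoring the one in "It's", unless you use a regular expression that ignores single quotes in the middle of a word.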
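For what it's worth, here is a rough sketch of that regular-expression idea. The class name ApostropheWordCounter, the helper countWords, and the pattern itself are my own choices for illustration, not something from the book. It assumes a "word" is a run of letters or digits that may contain apostrophes only between letters, so a quote at the start or end of a phrase is never counted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class ApostropheWordCounter {
    // Assumption: a word is a run of letters or digits, optionally followed by
    // groups of "apostrophe + letters/digits" (as in "It's" or "o'er").
    // Both the ASCII apostrophe and the typographic one (U+2019) are accepted.
    private static final Pattern WORD =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['\\u2019][\\p{L}\\p{N}]+)*");

    // Counts the non-overlapping matches of WORD in the given line.
    static int countWords(String line) {
        Matcher matcher = WORD.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The enclosing quotes are skipped because a match must start with a letter or digit.
        System.out.println(countWords("'By Jove, I got him!'")); // prints 5
        // "It's" stays one word because its apostrophe sits between letters.
        System.out.println(countWords("(Or shall we call him Jack? It's all the same.)")); // prints 10
    }
}
```

One caveat with this sketch: a hyphenated word such as "well-known" would be counted as two words, so the pattern would need adjusting if that matters for your text.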
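Alternatively, you can treat the characters ".!?;,()" as part of a word (e.g. "Jack?" is one word), which will give you the correct word count. That is what Scanner does by default. Just change the delimiters in the StringTokenizer to " " (the \n is not needed, since you are already reading the file line by line):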
StringTokenizer stringTokenizer = new StringTokenizer(line, " ");
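A small variation you might consider: a delimiter string of " " will not split on tabs. If you construct the StringTokenizer without a delimiter argument, it uses its default whitespace delimiters (" \t\n\r\f"), which is closer to what Scanner does. Below is a minimal sketch; the class name WhitespaceWordCounter and the helper countWords are made up for the example.

```java
import java.util.StringTokenizer;

// Hypothetical helper class, named here only for illustration.
class WhitespaceWordCounter {
    // With no delimiter argument, StringTokenizer splits on " \t\n\r\f",
    // so punctuation and apostrophes stay attached to their words.
    static int countWords(String line) {
        return new StringTokenizer(line).countTokens();
    }

    public static void main(String[] args) {
        System.out.println(countWords("'By Jove, I got him!'")); // prints 5
        System.out.println(countWords("He raised his rifle, took quick aim and shot him.")); // prints 10
    }
}
```

Keep in mind that any punctuation standing on its own between spaces (a lone dash, for instance) would be counted as a word with this approach.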