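While finishing this program, I got stuck on one method. The method I'm writing reads a .txt file and builds a HashMap in which each word is a key and the number of times it occurs is its value. I already got another, similar method working, but this time the .txt file the method reads is in an odd format. Specifically: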
more 2
morning's 1
most 3
mostly 1
mythology. 1
native 1
nearly 2
northern 1
occupying 1
of 29
off 1
and so on. Right now, the method only returns one line from the file.
Here is the code for the method:
public static HashMap<String,Integer> readVocabulary(String fileName) {
    // Declare the HashMap to be returned
    HashMap<String, Integer> wordCount = new HashMap();
    String toRead = fileName;
    try {
        FileReader reader = new FileReader(toRead);
        BufferedReader br = new BufferedReader(reader);
        // The BufferedReader reads the lines
        String line = br.readLine();
        // Split the line into a String array to loop through
        String[] words = line.split(" ");
        // for loop goes through every word
        for (int i = 0; i < words.length; i++) {
            // Case if the HashMap already contains the key.
            // If so, just increments the value.
            if (wordCount.containsKey(words[i])) {
                int n = wordCount.get(words[i]);
                wordCount.put(words[i], ++n);
            }
            // Otherwise, puts the word into the HashMap
            else {
                wordCount.put(words[i], 1);
            }
        }
        br.close();
    }
    // Catching the file not found error
    // and any other errors
    catch (FileNotFoundException fnfe) {
        System.err.println("File not found.");
    }
    catch (Exception e) {
        System.err.print(e);
    }
    return wordCount;
}
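The problem is that I don't know how to make the method ignore the 2, the 1, and the 29 in the .txt file. I tried writing an 'else if' statement to catch all of these cases, but there are too many. Is there a way to catch every number from 1 to 100 and keep them from becoming keys in the HashMap? I searched online but didn't find anything that helped.
Thanks for any help!
Answer 0 (score: 1)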
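How about just doing wordCount.put(words[0], 1) for each line, after you've done the split? If the pattern is always "word number", you only need the first item of the split array.
Update after some back and forth: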
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;

public static HashMap<String,Integer> readVocabulary(String toRead)
{
    // Declare the HashMap to be returned
    HashMap<String, Integer> wordCount = new HashMap<String, Integer>();
    // try-with-resources closes the readers even if an exception is thrown
    try (BufferedReader br = new BufferedReader(new FileReader(toRead))) {
        String line;
        // Read the file one line at a time until the end is reached
        while ((line = br.readLine()) != null) {
            // Split the line into a String array: { word, count }
            String[] words = line.split(" ");
            if (words.length != 2) {
                continue; // skip lines that are not in "word number" form
            }
            try {
                // Parse the count once; a non-integer second token is rejected below
                int count = Integer.parseInt(words[1]);
                // If the word was already seen, add this count to its total.
                // Otherwise, puts the word into the HashMap
                wordCount.merge(words[0], count, Integer::sum);
            }
            catch (NumberFormatException nfe) {
                // The second token is not an integer, so the line is ignored
            }
        }
    }
    // Catching the file not found error
    // and any other I/O errors
    catch (FileNotFoundException fnfe) {
        System.err.println("File not found.");
    }
    catch (IOException e) {
        System.err.print(e);
    }
    return wordCount;
}
Answer 1 (score: 1)
To check whether a String contains only digits, use String's matches() method, for example:
if (!words[i].matches("^\\d+$")){
// NOT a String containing only digits
}
This does not require catching an exception, and it doesn't matter if the number is too large to fit in an int.
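As a minimal sketch of how that check could slot into the counting loop: the class name DigitFilterSketch and the helper countWords are hypothetical names introduced for illustration, and splitting on "\\s+" rather than a single space is an assumption about the input. Note that matches() already tests the whole string, so the ^ and $ anchors are optional.

```java
import java.util.HashMap;
import java.util.Map;

class DigitFilterSketch {
    // Hypothetical helper: counts every token on a line that is not made up solely of digits.
    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> wordCount = new HashMap<>();
        for (String token : line.trim().split("\\s+")) {
            // matches() must match the entire token, so "29" is skipped but "mp3" is kept
            if (token.isEmpty() || token.matches("\\d+")) {
                continue;
            }
            // Insert 1 for a new word, otherwise add 1 to the existing count
            wordCount.merge(token, 1, Integer::sum);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        // "29" and "1" are dropped; "of" is counted twice and "off" once
        System.out.println(countWords("of 29 off 1 of"));
    }
}
```

Because the test is a regular expression rather than a parse, a token such as "123456789012345678901234567890" is still recognised as digits-only even though it would overflow an int.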
Answer 2 (score: 0)
Use Integer.parseInt() or Double.parseDouble() and catch the exception.
// for loop goes through every word
for (int i = 0; i < words.length; i++) {
    try {
        // If this parses, the token is a number, so it is skipped
        Integer.parseInt(words[i]);
    } catch(NumberFormatException e) {
        // Not a number, so it is counted as a word.
        // Case if the HashMap already contains the key.
        // If so, just increments the value.
        if (wordCount.containsKey(words[i])) {
            int n = wordCount.get(words[i]);
            wordCount.put(words[i], ++n);
        }
        // Otherwise, puts the word into the HashMap
        else {
            wordCount.put(words[i], 1);
        }
    }
}
There is also a Double.parseDouble(String) method, which you can use in place of Integer.parseInt(String) above if you want to exclude all numbers rather than just integers.
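One way to keep the loop readable is to move the parse-and-catch into a small predicate. The sketch below assumes a hypothetical class NumericCheckSketch with a helper isNumber; neither name comes from the question.

```java
class NumericCheckSketch {
    // Hypothetical helper: true if Double.parseDouble accepts the token.
    static boolean isNumber(String token) {
        try {
            Double.parseDouble(token);
            return true;
        } catch (NumberFormatException e) {
            // The token could not be parsed as a floating-point value
            return false;
        }
    }

    public static void main(String[] args) {
        String[] tokens = {"29", "3.14", "-2", "mostly", "mythology."};
        for (String token : tokens) {
            System.out.println(token + " -> " + isNumber(token));
        }
    }
}
```

One caveat with this approach: Double.parseDouble also accepts strings such as "NaN", "Infinity", and "1e5", so a vocabulary that contains those as words would lose them.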
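Another option is to parse the input one character at a time and ignore any character that is not a letter. When you reach whitespace, add the word built from the characters just scanned to the HashMap. Unlike the approaches above, scanning by character lets you ignore digits even when they appear right next to other characters.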
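A rough sketch of the character-by-character idea follows. The class LetterScannerSketch and the methods countWords and flush are hypothetical names, and the sketch takes the text as a String rather than reading it from a file, which is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Map;

class LetterScannerSketch {
    // Hypothetical helper: builds words from letters only and counts them.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> wordCount = new HashMap<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);            // letters extend the current word
            } else if (Character.isWhitespace(c)) {
                flush(current, wordCount);    // whitespace ends the current word
            }
            // Any other character (digit, punctuation) is simply dropped
        }
        flush(current, wordCount);            // last word when there is no trailing whitespace
        return wordCount;
    }

    // Adds the pending word to the map, if there is one, and resets the buffer.
    private static void flush(StringBuilder current, Map<String, Integer> wordCount) {
        if (current.length() > 0) {
            wordCount.merge(current.toString(), 1, Integer::sum);
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(countWords("of 29\noff 1\nabc123 of"));
    }
}
```

A side effect of dropping every non-letter is that punctuation disappears too, so "morning's" is stored as "mornings" and "mythology." as "mythology". Whether that is desirable depends on how the keys are used later.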