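My program reads a text file and lists the frequency of each word in it. Next, I need to ignore certain words, such as 'the' and 'an', while reading the file. I have created a list of these words, but I don't know how to use it in the while loop. Thanks.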
public static String [] ConnectingWords = {"and", "it", "you"};
public static void readWordFile(LinkedHashMap<String, Integer> wordcount) {
// FileReader fileReader = null;
Scanner wordFile;
String word; // A word read from the file
Integer count; // The number of occurrences of the word
// LinkedHashMap <String, Integer> wordcount = new LinkedHashMap<String, Integer> ();
try {
wordFile = new Scanner(new FileReader("/Applications/text.txt"));
wordFile.useDelimiter(" ");
} catch (FileNotFoundException e) {
System.err.println(e);
return;
}
while (wordFile.hasNext()) {
word = wordFile.next();
word = word.toLowerCase();
if (word.contains("the")) {
count = getCount(word, wordcount) + 0;
wordcount.put(word, count);
}
// Get the current count of this word, add one, and then store the
// new count:
count = getCount(word, wordcount) + 1;
wordcount.put(word, count);
}
}
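Answer 0 (score: 2)
Create a list containing the words that should be ignored:
List<String> ignoreAll = Arrays.asList("and", "it", "you");
Then, inside the while loop, add a condition that skips any word contained in that list: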
if(ignoreAll.contains(word)){
continue;
}
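For context, here is a minimal sketch of how that check might sit inside the counting loop from the question. The class name IgnoreWordsDemo and the helper countWords are hypothetical, the input comes from a String rather than a file so the sketch runs on its own, and Map.merge stands in for the question's getCount helper, which is not shown.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

class IgnoreWordsDemo {
    // Words to skip, taken from the answer above.
    static final List<String> IGNORE_ALL = Arrays.asList("and", "it", "you");

    // Hypothetical helper: counts whitespace-separated words in `text`,
    // skipping ignored words. It reads from a String instead of a file
    // so that the sketch is runnable without any input file.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> wordcount = new LinkedHashMap<>();
        try (Scanner wordFile = new Scanner(text)) {
            while (wordFile.hasNext()) {
                String word = wordFile.next().toLowerCase();
                if (IGNORE_ALL.contains(word)) {
                    continue; // skip this word and move on to the next one
                }
                // Insert 1 for a new word, otherwise add 1 to the stored count.
                wordcount.merge(word, 1, Integer::sum);
            }
        }
        return wordcount;
    }

    public static void main(String[] args) {
        // prints {i=1, like=2, cats=1}
        System.out.println(countWords("You and I like it and you like cats"));
    }
}
```

The check has to come after toLowerCase(), otherwise "You" would not match the lowercase entry "you" in the list.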
Answer 1 (score: 2)
You can try the following code.
public static HashSet<String> connectingWords;
public static Map<String,Integer> frequencyMap;
static {
connectingWords = new HashSet<>();
connectingWords.add("and");
connectingWords.add("it");
connectingWords.add("you");
frequencyMap = new HashMap<>();
}
public static void main(String[] args) {
BufferedReader reader = null;
String line;
try {
reader = new BufferedReader(new FileReader("src/files/temp2.txt"));
while ((line = reader.readLine()) != null) {
String[] words = line.split("-");
for (String word : words) {
if(connectingWords.contains(word)) {
continue;
}
Integer value = frequencyMap.get(word);
if(value != null) {
frequencyMap.put(word,value+1);
} else {
frequencyMap.put(word, 1); // the first occurrence counts as 1, not 0
}
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
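// reader is still null if the file could not be opened, and close() can itself throw
if (reader != null) {
try {
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}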
}
System.out.println(frequencyMap.values());
}
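It is better to store the connecting words in a HashSet, because it gives fast lookups each time contains is called for a word in the file. The words and their frequencies can likewise be kept in a Map. I have also assumed that the words are separated by -; if the delimiter is something else, adjust the code accordingly. Likewise, if you have any special requirements regarding case, you can change the code. I tried it with a file containing What-the-hell-is-going-on-and-it-is-good, and it works fine.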
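As a rough illustration of the two adjustments mentioned above (a different delimiter and case handling), the sketch below takes the delimiter as a parameter and lowercases each word before the lookup. The class name FrequencyDemo and the method count are hypothetical, lowercasing is only one possible case policy, and Map.merge replaces the explicit null check used in the answer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

class FrequencyDemo {
    // Same connecting words as in the answer, held in a HashSet for fast contains().
    static final Set<String> CONNECTING_WORDS =
            new HashSet<>(Arrays.asList("and", "it", "you"));

    // Hypothetical helper: splits one line on a literal delimiter and counts
    // every word that is not a connecting word.
    static Map<String, Integer> count(String line, String delimiter) {
        Map<String, Integer> frequencyMap = new HashMap<>();
        // Pattern.quote makes split() treat the delimiter literally, not as a regex.
        for (String word : line.split(Pattern.quote(delimiter))) {
            String key = word.toLowerCase(); // assumption: counting is case-insensitive
            if (key.isEmpty() || CONNECTING_WORDS.contains(key)) {
                continue; // skip empty tokens and connecting words
            }
            frequencyMap.merge(key, 1, Integer::sum);
        }
        return frequencyMap;
    }

    public static void main(String[] args) {
        // Sample input from the answer; HashMap iteration order is unspecified.
        System.out.println(count("What-the-hell-is-going-on-and-it-is-good", "-"));
    }
}
```

With the sample line from the answer, "is" is counted twice, and "and" and "it" do not appear in the result.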
Answer 2 (score: 0)
Keep a list of the words to exclude, and check that list before updating the count.
public static void readWordFile (LinkedHashMap<String, Integer> wordcount) {
List<String> excludeList = new ArrayList<>();
excludeList.add("the"); // and so on
// FileReader fileReader = null;
Scanner wordFile;
String word; // A word read from the file
Integer count; // The number of occurrences of the word
// LinkedHashMap <String, Integer> wordcount = new LinkedHashMap <String, Integer> ();
try
{
wordFile = new Scanner(new FileReader("/Applications/text.txt"));
wordFile.useDelimiter(" ");
}
catch (FileNotFoundException e)
{
System.err.println(e);
return;
}
while (wordFile.hasNext())
{
word = wordFile.next( );
word = word.toLowerCase();
if(!excludeList.contains(word)) {
count = wordcount.getOrDefault(word, 0) + 1; // 0 if the word has not been seen yet; avoids a NullPointerException
wordcount.put(word, count);
}
}
}