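I need to create a word counter in Java that counts how many times each word appears in a file. For example, if the text in the file is "A bird and a bunny were eating lunch. The bird likes apples, the bunny likes strawberries. They shared lunch and both like apples and strawberries." then the counter should find and output how many times each word is used. The output would be: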
A:2
AND:4
APPLES:2
BIRD:2
BOTH:1
EATING:1
etc...
I know very little about Java, but I need to do this. Does anyone have an idea how to go about it?
Answer 0 (score: 1)
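First, think about the file size. If the file is large, loading all of it into memory at once and keeping every word in a map can cause memory problems, so you may need a different approach. Second, think about which Java version you are targeting, because newer versions simplify some of the steps. Since you are a learner, you can follow a simple algorithm based on a Map, as most of the comments suggest.
Algorithm and code snippets
Read the file: you can do this in several ways. You can use standard Java classes such as Scanner or BufferedReader, or a third-party library such as Apache Commons IO. Simpler still, on JDK 7 or later you can use the Files class. Note that readAllLines loads the whole file into memory at once: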
List<String> list = Files.readAllLines(Paths.get("test.txt"), StandardCharsets.UTF_8); // the one-argument overload needs Java 8
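If the file might be too large for readAllLines, the usual alternative is to read it one line at a time so that only the current line and the map are held in memory. Below is a minimal sketch, assuming Java 7 or later and a UTF-8 file. The class StreamingWordCounter and the method countWords are hypothetical names. Splitting on non-word characters and upper-casing each word are also assumptions made here to match the sample output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class StreamingWordCounter {

    // Hypothetical helper: counts words without loading the whole file at once.
    public static Map<String, Integer> countWords(Path file) throws IOException {
        Map<String, Integer> counter = new HashMap<String, Integer>();
        // try-with-resources (Java 7+) closes the reader even if an exception is thrown
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) { // null signals end of file
                // assumption: words are separated by runs of non-word characters
                for (String word : line.split("\\W+")) {
                    if (word.isEmpty()) {
                        continue; // a line starting with punctuation yields an empty first token
                    }
                    String key = word.toUpperCase(); // assumption: case-insensitive counting
                    Integer val = counter.get(key);
                    counter.put(key, val == null ? 1 : val + 1);
                }
            }
        }
        return counter;
    }

    public static void main(String[] args) throws IOException {
        // "test.txt" is the example file name used in this answer
        Map<String, Integer> counts = countWords(Paths.get("test.txt"));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " : " + e.getValue());
        }
    }
}
```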
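Iterate over the list of lines and extract the words: just use a for loop. On JDK 5 or later, the for-each loop is the best approach. Use the String class's split method to get the words one at a time, and iterate over them:
for (String line : list) {
    // "\\W+" splits on runs of non-word characters, so "apples," becomes "apples"
    for (String word : line.split("\\W+")) {
        if (word.isEmpty()) continue; // a line starting with punctuation yields an empty first token
        // More code
    }
}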
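Splitting on a single space leaves punctuation attached, so "apples," and "apples" would be counted as different words. The sketch below compares split(" ") with a split on the regex \\W+, which matches any run of characters other than ASCII letters, digits and underscore. The helper name words is hypothetical. Note that \\W also treats apostrophes and accented letters as separators.

```java
import java.util.Arrays;

public class SplitDemo {
    // Hypothetical helper: splits a line into words, ignoring punctuation and extra spaces.
    public static String[] words(String line) {
        String trimmed = line.replaceAll("^\\W+", ""); // drop leading non-word chars so there is no empty first token
        if (trimmed.isEmpty()) {
            return new String[0]; // the line contained no words at all
        }
        return trimmed.split("\\W+"); // trailing empty strings are discarded by split
    }

    public static void main(String[] args) {
        String line = "The bird likes apples, the bunny likes strawberries.";
        System.out.println(Arrays.toString(line.split(" ")));
        // [The, bird, likes, apples,, the, bunny, likes, strawberries.]
        System.out.println(Arrays.toString(words(line)));
        // [The, bird, likes, apples, the, bunny, likes, strawberries]
    }
}
```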
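Add to the map: maintain a map with the word as the key and its count as the value. For each word, check whether it is already in the map. If it is, get the count and increment it; otherwise, put the word in with a count of 1. Repeat until the two for loops from step 2 finish.
// Initialize outside the loops
Map<String, Integer> counter = new HashMap<String, Integer>();
// Inside the inner loop
Integer val = counter.get(word);
if (val == null) {
    counter.put(word, 1);       // first occurrence
} else {
    counter.put(word, val + 1); // seen before: increment
}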
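On Java 8 or later, the null check can be folded into a single call with Map.merge. This method stores 1 for a new key and otherwise combines the old value with 1 using Integer::sum. Below is a minimal sketch; MergeCounter and count are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class MergeCounter {
    // merge(key, 1, Integer::sum): insert 1 if absent, otherwise old value + 1 (Java 8+)
    public static Map<String, Integer> count(String... words) {
        Map<String, Integer> counter = new HashMap<>();
        for (String word : words) {
            counter.merge(word, 1, Integer::sum);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(count("bird", "apples", "bird").get("bird")); // prints 2
    }
}
```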
Finally, print the counts with another for-each loop over keySet:
for (String key : counter.keySet()) {
    System.out.println(key + " : " + counter.get(key));
}
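In step 3, however, you need to think about the case sensitivity of the keys. If you want case-insensitive counting (the sample output in the question is all upper case), call word.toUpperCase() on each word before inserting and looking up keys.
Putting it all together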
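The sample output in the question is upper case and sorted alphabetically, whereas a HashMap iterates in no particular order. The sketch below assumes that ordering is wanted. It upper-cases each word with Locale.ROOT, which avoids locale-specific rules such as the Turkish dotted and dotless i. It stores the counts in a TreeMap, which keeps keys sorted. SortedUpperCaseCounter is a hypothetical name.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class SortedUpperCaseCounter {
    // Counts words case-insensitively; TreeMap iterates keys in alphabetical order.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counter = new TreeMap<String, Integer>();
        for (String word : text.split("\\W+")) {
            if (word.isEmpty()) {
                continue; // leading punctuation produces an empty first token
            }
            String key = word.toUpperCase(Locale.ROOT); // "a" and "A" share one key
            Integer val = counter.get(key);
            counter.put(key, val == null ? 1 : val + 1);
        }
        return counter;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("A bird and a bunny. The bird likes apples.");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + ":" + e.getValue()); // same "WORD:n" format as the question
        }
    }
}
```

For the sample text in main, this prints A:2, AND:1, APPLES:1, BIRD:2, BUNNY:1, LIKES:1, THE:1, one per line.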
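import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCounter {
    public static void main(String[] args) throws Exception {
        // word (upper-cased) -> number of occurrences
        Map<String, Integer> counter = new HashMap<String, Integer>();
        // reads the whole file into memory, one String per line (Java 7+)
        List<String> list = Files.readAllLines(Paths.get("test.txt"), StandardCharsets.UTF_8);
        for (String line : list) {
            // split on runs of non-word characters so punctuation is not part of a word
            for (String word : line.split("\\W+")) {
                if (word.isEmpty()) {
                    continue; // a line starting with punctuation yields an empty first token
                }
                String normalized = word.toUpperCase(); // case-insensitive: "A" and "a" are the same word
                Integer val = counter.get(normalized);
                if (val == null) {
                    counter.put(normalized, 1);
                } else {
                    counter.put(normalized, val + 1);
                }
            }
        }
        for (String key : counter.keySet()) {
            System.out.println(key + " : " + counter.get(key));
        }
    }
}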
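Newer Java versions do simplify this, as mentioned at the start. Below is a sketch of the same counter using Java 8 streams, assuming a UTF-8 file named test.txt. Files.lines reads the file lazily. Collectors.groupingBy with a TreeMap factory and Collectors.counting() produces sorted word counts as Long values. StreamWordCounter is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamWordCounter {
    // Turns a stream of lines into a sorted word -> count map (Java 8+).
    public static Map<String, Long> count(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.split("\\W+"))) // one element per token
                .filter(word -> !word.isEmpty())                     // drop empty leading tokens
                .map(word -> word.toUpperCase(Locale.ROOT))          // case-insensitive keys
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) throws IOException {
        // Files.lines holds the file open until the stream is closed, hence try-with-resources
        try (Stream<String> lines = Files.lines(Paths.get("test.txt"))) {
            count(lines).forEach((word, n) -> System.out.println(word + " : " + n));
        }
    }
}
```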
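To learn more, you can experiment with this to find unique words, repeated words, words repeated more than three times, and so on. Happy coding :)
Answer 1 (score: 0)
Here is a complete working solution you can use as a reference. I added comments to make the program easier to understand: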
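As one example of those exercises, here is a sketch that filters an existing counter map. The class RepeatedWords, its methods, and the threshold parameter are hypothetical. The sample counts are taken from the output shown in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RepeatedWords {
    // Words whose count is strictly greater than threshold.
    public static List<String> repeatedMoreThan(Map<String, Integer> counter, int threshold) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : counter.entrySet()) {
            if (e.getValue() > threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Words that occur exactly once.
    public static List<String> uniqueWords(Map<String, Integer> counter) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : counter.entrySet()) {
            if (e.getValue() == 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counter = new HashMap<String, Integer>();
        counter.put("AND", 4);
        counter.put("BIRD", 2);
        counter.put("EATING", 1);
        System.out.println(repeatedMoreThan(counter, 3)); // [AND]
        System.out.println(uniqueWords(counter));         // [EATING]
    }
}
```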
import java.io.BufferedReader; //imports
import java.io.FileReader;
import java.util.*;

public class Test {
    public static void main(String args[]) {
        String name[] = null; //array to store each individual word in the file
        try { //code to read input from file
            BufferedReader br = new BufferedReader(
                    new FileReader("D:\\file.txt")); //Enter your complete file path here
            StringBuilder sb = new StringBuilder();
            String line = br.readLine();
            while (line != null) {
                sb.append(line).append(' '); //read through the entire file; the space keeps words on adjacent lines apart
                line = br.readLine();
            }
            String everything = sb.toString(); //the whole file as a single string
            //replace runs of punctuation/whitespace with one space, then split, so "apples," matches "apples"
            name = everything.replaceAll("\\W+", " ").trim().split(" ");
            br.close(); //close the BufferedReader
        } catch (Exception e) {
            e.printStackTrace();
        }
        Map<String, Integer> map = new HashMap<String, Integer>(); //Map to store (word, number of occurrences of that word)
        //word is key and occurrences is value
        Set<String> set = new HashSet<String>(); //set to eliminate duplicate words and store unique words
        int counter = 0; //main counter that counts occurrences
        int i = 0;
        try {
            for (i = 0; i < name.length; i++) {
                set.add(name[i].toLowerCase()); //populate set from name array
            }
            System.out.println(set); //output set
            Iterator<String> it = set.iterator(); //match each unique word against every word stored in the name array
            while (it.hasNext()) {
                String temp = it.next();
                // for each word iterate the name array and look for matches, resetting counter to zero for each word
                for (i = 0, counter = 0; i < name.length; i++) {
                    if (temp.equalsIgnoreCase(name[i])) { //check if the word in set matches the word in name array
                        counter = counter + 1; // increase occurrence counter if it does
                        map.put(temp, counter); // put() inserts the word or overwrites its previous count
                    }
                }
            }
            System.out.println(map); // print the map
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
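The answer above compares every unique word with every word in the file, which costs O(unique words × total words). A more compact sketch of the same idea uses Collections.frequency from the standard library. FrequencyCounter is a hypothetical name. For large files, a single pass with a HashMap, as in the first answer, scales better.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class FrequencyCounter {
    // Unique words from a sorted set, each counted against the full word list.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> map = new TreeMap<String, Integer>();
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("\\W+", " ").trim();
        if (cleaned.isEmpty()) {
            return map; // no words at all
        }
        List<String> words = Arrays.asList(cleaned.split(" "));
        for (String unique : new TreeSet<String>(words)) {
            map.put(unique, Collections.frequency(words, unique)); // counts elements equal to unique
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(count("The bird, the BUNNY and the bird.")); // {and=1, bird=2, bunny=1, the=3}
    }
}
```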