I am trying to compute document frequency (that is, the number of documents in which each word appears), for example:
Doc1: this phone is the greatest phone ever.
Doc2: what's your phone number.
Result:
this 1
phone 2
is 1
the 1
greatest 1
ever 1
what's 1
your 1
number 1
I have the following code in Java:
HashMap<String, String> wordDoc = new HashMap<String, String>();
HashMap<String, Integer> countDfIndex = new HashMap<String, Integer>();
if (!wordDoc.containsKey(word)) {
    wordDoc.put(word, docno);
    countDfIndex.put(word, 1);
}
if (wordDoc.get(word) != null) {
    if (!wordDoc.containsValue(docno)) {
        wordDoc.put(word, docno);
        countDfIndex.put(word, countDfIndex.get(word) + 1);
    }
}
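I am not getting the correct results. Please help!
Answer 0 (score: 3)
I assume you are trying to count the number of documents that contain each word, rather than the total number of occurrences.
If so: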
Map<String, Integer> countDfIndex = new HashMap<String, Integer>();
for (... document : documents) {
    Set<String> alreadyAdded = new HashSet<String>(); // new empty set for each document
    ...
    if (!alreadyAdded.contains(word)) {
        if (!countDfIndex.containsKey(word)) {
            countDfIndex.put(word, 1);
        } else {
            countDfIndex.put(word, countDfIndex.get(word) + 1);
        }
        alreadyAdded.add(word); // skip this word if it appears again in the same document
    }
}
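For reference, here is one way the fragment above could be filled in as a complete program. It is only a sketch: the class name `DocumentFrequency`, the `tokenize` helper, and the choice to lower-case the text and strip periods are my assumptions, not part of the question. It also shows why the original `wordDoc.containsValue(docno)` check goes wrong: that call is true as soon as any word maps to the document, whereas a per-document set tracks each word separately.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DocumentFrequency {

    // Hypothetical tokenizer (an assumption): lower-case, drop periods, split on whitespace.
    static String[] tokenize(String document) {
        return document.toLowerCase().replace(".", "").trim().split("\\s+");
    }

    // Returns, for each word, the number of documents that contain it at least once.
    static Map<String, Integer> count(List<String> documents) {
        Map<String, Integer> countDfIndex = new LinkedHashMap<>();
        for (String document : documents) {
            Set<String> alreadyAdded = new HashSet<>(); // fresh set for each document
            for (String word : tokenize(document)) {
                if (word.isEmpty()) {
                    continue; // an empty document yields a single empty token
                }
                // Set.add returns false if the word was already seen in this document.
                if (alreadyAdded.add(word)) {
                    countDfIndex.merge(word, 1, Integer::sum);
                }
            }
        }
        return countDfIndex;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList(
                "this phone is the greatest phone ever.",
                "what's your phone number.");
        count(documents).forEach((word, df) -> System.out.println(word + " " + df));
    }
}
```

With the two example documents, `phone` should come out as 2 and every other word as 1. `Map.merge` needs Java 8 or later; on older versions the `containsKey`/`put` pair from the answer does the same job.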
Answer 1 (score: 2)
public static void add(Map<String, Integer> map, String word) {
    map.put(word, map.containsKey(word) ? map.get(word) + 1 : 1);
}
for (String i : s.replace(".", "").split(" ")) add(map, i);
where
map = new HashMap<String, Integer>();
s = "this phone is the greatest phone ever. what's your phone number.";
In the end, the map contains
{the=1, ever=1, number=1, phone=3, this=1, what's=1, is=1, your=1, greatest=1}
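Note that the output above has phone=3, because every occurrence in the combined string is counted. If the goal is the number of documents per word, the same `add` helper could be reused by removing duplicates within each document first. A rough sketch, assuming the documents are kept as separate strings; the class and method names here are made up for illustration:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;

class DistinctWordCount {

    // Same helper as in the answer above.
    static void add(Map<String, Integer> map, String word) {
        map.put(word, map.containsKey(word) ? map.get(word) + 1 : 1);
    }

    // Hypothetical wrapper: counts each word at most once per document.
    static Map<String, Integer> documentFrequency(String... docs) {
        Map<String, Integer> map = new HashMap<>();
        for (String doc : docs) {
            // The HashSet drops repeated words within this document.
            for (String word : new HashSet<>(Arrays.asList(doc.replace(".", "").split(" ")))) {
                add(map, word);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(documentFrequency(
                "this phone is the greatest phone ever.",
                "what's your phone number."));
    }
}
```

Here `phone` is counted once per document, so it ends up as 2 instead of 3. The print order depends on `HashMap` iteration order and is not guaranteed.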
Answer 2 (score: 2)
HashMap<String, Integer> countDfIndex = new HashMap<String, Integer>();
// Increment the count for the word, starting at 1 the first time it is seen
if (!countDfIndex.containsKey(word)) {
    countDfIndex.put(word, 1);
} else {
    int i = countDfIndex.get(word);
    countDfIndex.put(word, i + 1);
}
// Print every word with its count
for (Map.Entry<String, Integer> pair : countDfIndex.entrySet()) {
    int count = pair.getValue();
    String word = pair.getKey();
    System.out.println("word is " + word + " count is " + count);
}