for (String temp : uniqueSet) {
    if (Collections.frequency(list, temp) >= 2) {
        System.out.println(temp + "=" + (Collections.frequency(list, temp) - 1));
    }
}
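I just want the total number of repeated words, but I can't work out how to get it.
In my code I want to find the words that occur repeatedly in a text file.
The problem is that I can get the count for each repeated word in the text file, for example ram = 4, sam = 4, man = 2. Now
I want to add 4 + 4 + 2 and report the total number of repeated words as 10.
Any suggestions are welcome.
I am a beginner in Java.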
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import org.apache.commons.io.FileUtils;
public class testsrepeatedwords {
    public static void main(String[] args) throws FileNotFoundException, IOException {
        FilenameFilter filter = new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return name.endsWith(".txt");
            }
        };
        File folder = new File("E:\\testfolder\\");
        File[] listOfFiles = folder.listFiles(filter);
        for (int i = 0; i < listOfFiles.length; i++) {
            File file1 = listOfFiles[i];
            try {
                String content = FileUtils.readFileToString(file1);
            } catch (IOException e) {
                e.printStackTrace();
            }
            BufferedReader ins = null;
            try {
                ins = new BufferedReader(new InputStreamReader(new FileInputStream(file1)));
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
            String message = org.apache.commons.io.IOUtils.toString(ins);
            String[] stringarray = message.split(" ");
            List<String> list = new ArrayList<String>(Arrays.asList(stringarray));
            list.removeAll(Arrays.asList("", null));
            Set<String> uniqueSet = new HashSet<String>(list);
            for (String temp : uniqueSet) {
                if (Collections.frequency(list, temp) >= 2) {
                    System.out.println(temp + "=" + (Collections.frequency(list, temp) - 1)); // after subtraction
                    int oc = Collections.frequency(list, temp) - 1;
                    // System.out.println(oc);
                    // System.out.print(oc+" ");
                }
            }
        }
    }
}
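This is my complete code. :)
Answer 0 (score: 1)
Is 'uniqueSet' really a Set? In a Set, each element occurs only once. You should check your uniqueSet implementation first. If it really is a Set, then Collections.frequency(list, temp) >= 2 will always be false.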
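One way to check this is to run the condition on a small list. The sketch below uses made-up sample words and hypothetical names (FrequencyCheck, occurrences). It suggests that while a HashSet does hold each word once, the loop in the question passes list, not uniqueSet, to Collections.frequency, so the count for a repeated word can still be 2 or more.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FrequencyCheck {
    // Counts how often `word` occurs in `words` (the list, duplicates included).
    static int occurrences(List<String> words, String word) {
        return Collections.frequency(words, word);
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the question's files.
        List<String> list = Arrays.asList("ram", "sam", "ram", "man");
        Set<String> uniqueSet = new HashSet<>(list);

        System.out.println(uniqueSet.size());                        // 3 distinct words
        System.out.println(occurrences(list, "ram"));                // 2, counted in the list
        System.out.println(Collections.frequency(uniqueSet, "ram")); // 1, counted in the set
    }
}
```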
Answer 1 (score: 0)
Why not use a map to store the running counts? Something like this:
// requires: import java.util.HashMap;
public static void getRepeatCount(String[] c) {
    // word -> number of times it has been seen so far
    HashMap<String, Integer> wordCount = new HashMap<>();
    for (String currStr : c) {
        if (wordCount.containsKey(currStr)) {
            wordCount.put(currStr, wordCount.get(currStr) + 1);
        } else {
            wordCount.put(currStr, 1);
        }
    }
    // add up the counts of all words
    int repeatedWords = 0;
    for (String currKey : wordCount.keySet()) {
        int currRepeatCount = wordCount.get(currKey);
        repeatedWords += currRepeatCount;
        System.out.println(currKey + " => " + currRepeatCount);
    }
    System.out.println("Total repeated words: " + repeatedWords);
}
Test:
public static void main(String[] args) {
    String[] ar = {"abc", "abc", "aa", "aa", "b"};
    getRepeatCount(ar);
}
Output:
aa => 2
b => 1
abc => 2
Total repeated words: 5
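Note that the method above adds up every count, including words that occur only once, which is why the total is 5. To get the figure the question asks for (each word's occurrences beyond the first, as in 4 + 4 + 2 = 10), one possible variant skips words seen once and subtracts one per word. This is only a sketch: RepeatTotal and totalRepeats are hypothetical names, and the "count minus one" rule is taken from the question's own println.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RepeatTotal {
    // Sums (count - 1) over every word that occurs at least twice.
    static int totalRepeats(List<String> words) {
        Map<String, Integer> wordCount = new HashMap<>();
        for (String word : words) {
            // Store 1 for a new word, otherwise add 1 to the existing count.
            wordCount.merge(word, 1, Integer::sum);
        }
        int total = 0;
        for (int count : wordCount.values()) {
            if (count >= 2) {
                total += count - 1;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Same test data as the answer: abc and aa each repeat once, b does not.
        List<String> words = Arrays.asList("abc", "abc", "aa", "aa", "b");
        System.out.println(totalRepeats(words)); // prints 2
    }
}
```

Map.merge replaces the containsKey/put branches; the explicit if/else in the answer behaves the same way.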
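Answer 2 (score: 0)
Java 8's stream API offers a very elegant way to do this. You can stream the list of words, collect it into a frequency map, then stream that map's values, keep the counts that reach the threshold, and reduce them to a sum:
// requires java.util.function.Function and java.util.stream.Collectors;
// words is a List<String>
int countThreshold = 2;
long sum =
    words.stream()
         .collect(Collectors.groupingBy(Function.identity(),
                                        Collectors.counting()))
         .values()
         .stream()
         .filter(x -> x >= countThreshold)
         .reduce(0L, Long::sum);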
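For anyone who wants to run the fragment above, here is a minimal self-contained sketch. The class name StreamRepeatSum, the method sumOfRepeated, and the sample words are assumptions made for illustration; the pipeline itself follows the answer, with mapToLong(...).sum() used in place of reduce(0L, Long::sum), which should give the same result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamRepeatSum {
    // Sums the counts of all words that occur at least countThreshold times.
    static long sumOfRepeated(List<String> words, long countThreshold) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .values()
                .stream()
                .filter(count -> count >= countThreshold)
                .mapToLong(Long::longValue)
                .sum();
    }

    public static void main(String[] args) {
        // Made-up sample: ram x2, sam x3, man x1.
        List<String> words = Arrays.asList("ram", "ram", "sam", "sam", "sam", "man");
        System.out.println(sumOfRepeated(words, 2)); // prints 5 (2 + 3)
    }
}
```

This adds the full counts. To match the question's "count minus one" figures, Long::longValue could be swapped for count -> count - 1.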