How to add up all the values from Collections.frequency to get the total count of repeated words in Java

Time: 2016-01-25 09:13:26

Tags: java arrays string list text

for(String temp : uniqueSet) {
    if((Collections.frequency(list, temp)) >= 2) {
        System.out.println(temp + "=" + (Collections.frequency(list, temp) -1));
    }
}

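I just want the total count of my repeated words, but I can't figure out how to get it.

In my code, I want to find the frequently occurring words in a text file.

I can already get the count for each repeated word in the text file, for example ram = 4, sam = 4, man = 2. Now I want to add 4 + 4 + 2 and get 10 as the total count of repeated words.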
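One way to get a 4 + 4 + 2 = 10 style total with the `Collections.frequency` approach already used in the question is to keep a running sum inside the loop. A minimal sketch, assuming (as the question's code does) that the value to be added is the printed one, `frequency - 1`; the class and method names are hypothetical:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RepeatedWordTotal {
    // Hypothetical helper: prints each repeated word with (frequency - 1),
    // like the question's loop, and returns the sum of those printed values.
    static int totalRepeats(List<String> list) {
        Set<String> uniqueSet = new HashSet<>(list); // visit each distinct word once
        int total = 0;                               // running sum, e.g. 4 + 4 + 2
        for (String temp : uniqueSet) {
            int frequency = Collections.frequency(list, temp); // occurrences in the list
            if (frequency >= 2) {
                int oc = frequency - 1; // same "after subtraction" value the question prints
                System.out.println(temp + "=" + oc);
                total += oc;
            }
        }
        return total;
    }
}
```

If the total should count every occurrence of a repeated word instead, add `frequency` rather than `frequency - 1`.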

Any suggestions are welcome.

I am a beginner in Java.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.commons.io.IOUtils;

public class testsrepeatedwords {
    public static void main(String[] args) {

        // Accept only .txt files
        FilenameFilter filter = new FilenameFilter() {
            public boolean accept(File dir, String name) {
                return name.endsWith(".txt");
            }
        };

        File folder = new File("E:\\testfolder\\");
        File[] listOfFiles = folder.listFiles(filter);
        if (listOfFiles == null) {
            // listFiles returns null if the folder does not exist or cannot be read
            System.err.println("Cannot read folder: " + folder);
            return;
        }

        for (File file1 : listOfFiles) {
            String message;
            // Read each file once; try-with-resources closes the reader
            try (BufferedReader ins = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file1)))) {
                message = IOUtils.toString(ins);
            } catch (IOException e) {
                e.printStackTrace();
                continue; // skip unreadable files instead of failing later on a null reader
            }

            String[] stringarray = message.split(" ");
            List<String> list = new ArrayList<String>(Arrays.asList(stringarray));
            list.removeAll(Arrays.asList("", null));
            Set<String> uniqueSet = new HashSet<String>(list);
            for (String temp : uniqueSet) {
                if (Collections.frequency(list, temp) >= 2) {
                    int oc = Collections.frequency(list, temp) - 1; // after subtraction
                    System.out.println(temp + "=" + oc);
                    // oc holds one word's value (e.g. 4); these are the values to add up
                }
            }
        }
    }
}

This is my complete code. :)
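The code above splits the file contents on a single space, so words separated only by a newline or tab stay glued together (for example the last word of one line and the first word of the next), and consecutive spaces produce empty strings that then have to be removed. A sketch of an alternative that splits on any run of whitespace, using a hypothetical `tokenize` helper:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordSplitter {
    // Hypothetical helper: splits text on any run of whitespace
    // (spaces, tabs, newlines), so no empty strings need to be removed afterwards.
    static List<String> tokenize(String text) {
        String trimmed = text.trim(); // avoid an empty first token from leading whitespace
        if (trimmed.isEmpty()) {
            return new ArrayList<>(); // "".split(...) would return [""], so handle it here
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }
}
```

The returned list can be used in place of `list` in the question's code.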

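3 answers:

Answer 0 (score: 1)

Is 'uniqueSet' really a Set? In a set, each element occurs only once. You should check your uniqueSet implementation first. If it really is a Set, then Collections.frequency(list, temp) >= 2 is always false.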
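In the question's code, `Collections.frequency` is called on `list`, not on `uniqueSet`; the set is only used to visit each distinct word once. A small sketch to check the difference between counting in the set and counting in the list (the class and method names are hypothetical):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SetVsList {
    // Returns {count in the set, count in the list} for the given word.
    static int[] frequencies(List<String> list, String word) {
        Set<String> uniqueSet = new HashSet<>(list);
        return new int[] {
            Collections.frequency(uniqueSet, word), // a Set holds each element at most once
            Collections.frequency(list, word)       // the list keeps every duplicate
        };
    }
}
```

For example, `frequencies(Arrays.asList("ram", "ram", "sam"), "ram")` returns `{1, 2}`.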

Answer 1 (score: 0)

Why not use a map to store the current counts? Like this:

import java.util.HashMap;

public static void getRepeatCount(String[] c) {
    HashMap<String, Integer> wordCount = new HashMap<>();
    // Count how many times each string occurs
    for(String currStr : c) {
        if(wordCount.containsKey(currStr)) {
            wordCount.put(currStr, wordCount.get(currStr) + 1);
        } else {
            wordCount.put(currStr,1);
        }
    }
    // Print each count and add it to the total (words that occur once are included too)
    int repeatedWords = 0;
    for (String currKey : wordCount.keySet()) {
        int currRepeatCount = wordCount.get(currKey);
        repeatedWords += currRepeatCount;
        System.out.println(currKey+" => "+currRepeatCount);
    }
    System.out.println("Total repeated words: "+repeatedWords);
}

Test:

public static void main(String[] args) {
    String[] ar = {"abc","abc","aa","aa","b"};
    getRepeatCount(ar);
}

Output:

aa => 2
b => 1
abc => 2
Total repeated words: 5
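The loop above adds every count to the total, including words that occur only once (`b`), which is why it reports 5. If only words that occur at least twice should be summed, as in the question's 4 + 4 + 2, a variant of the same map approach can filter on the count. A sketch; `Map.merge` (Java 8+) replaces the `containsKey`/`put` pair, and the class, method, and `threshold` parameter are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

class RepeatedWordSum {
    // Hypothetical helper: sums the counts of words that occur at least `threshold` times.
    static int sumRepeated(String[] words, int threshold) {
        Map<String, Integer> wordCount = new HashMap<>();
        for (String word : words) {
            wordCount.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        int total = 0;
        for (Map.Entry<String, Integer> entry : wordCount.entrySet()) {
            if (entry.getValue() >= threshold) { // skip words below the threshold
                total += entry.getValue();
            }
        }
        return total;
    }
}
```

With the test array above, `sumRepeated(ar, 2)` returns 4 (`abc` and `aa` twice each), while `sumRepeated(ar, 1)` gives the same 5 as the original answer.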

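Answer 2 (score: 0)

Java 8's stream API offers a very elegant way to do this. You can stream the list of words, collect it into a frequency map, then stream that map's values and reduce them to a sum: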

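import java.util.function.Function;
import java.util.stream.Collectors;

// words is the List<String> of words read from the file
int countThreshold = 2;
long sum =
    words.stream()
          .collect(Collectors.groupingBy(Function.identity(),
                                         Collectors.counting()))
          .values()
          .stream()
          .filter(x -> x >= countThreshold)
          .reduce(0L, Long::sum);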
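The snippet assumes a `List<String> words` is already in scope. Wrapped in a self-contained method (the class and method names are hypothetical), it can be called directly; `mapToLong(...).sum()` is used here in place of `reduce(0L, Long::sum)` and gives the same result. Note that it sums the full occurrence count of each word at or above the threshold, not `count - 1` as printed in the question:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamRepeatSum {
    // Builds a word -> count map, keeps counts >= countThreshold, and adds them up.
    static long sumRepeated(List<String> words, int countThreshold) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .values()
                .stream()
                .filter(count -> count >= countThreshold) // drop words below the threshold
                .mapToLong(Long::longValue)
                .sum();
    }
}
```

For the list `abc, abc, aa, aa, b` and a threshold of 2 this returns 4.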