Question

I am trying to implement a word count program in Java 8, but I cannot get it to work. The method must take a String as a parameter and return a Map<String, Integer>.

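When I do it the old Java way, everything works fine. But when I try to do it the Java 8 way, it returns a map whose keys are empty or blank, although the occurrence counts are correct.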

Here is my code in Java 8 style:

public Map<String, Integer> countJava8(String input) {
    return Pattern.compile("(\\w+)")
            .splitAsStream(input)
            .collect(Collectors.groupingBy(e -> e.toLowerCase(),
                    Collectors.reducing(0, e -> 1, Integer::sum)));
}

Here is the code I would normally use:

public Map<String, Integer> count(String input){
        Map<String, Integer> wordcount = new HashMap<>();
        Pattern compile = Pattern.compile("(\\w+)");
        Matcher matcher = compile.matcher(input);

        while(matcher.find()){
            String word = matcher.group().toLowerCase();
            if(wordcount.containsKey(word)){
                Integer count = wordcount.get(word);
                wordcount.put(word, ++count);
            } else {
                wordcount.put(word.toLowerCase(), 1);
            }
        }
        return wordcount;
}

The main program:

public static void main(String[] args) {
    WordCount wordCount = new WordCount();
    Map<String, Integer> phrase = wordCount.countJava8("one fish two fish red fish blue fish");
    Map<String, Integer> count = wordCount.count("one fish two fish red fish blue fish");

    System.out.println(phrase);
    System.out.println();
    System.out.println(count);
}

When I run this program, I get the following output:

{ =7, =1}
{red=1, blue=1, one=1, fish=4, two=1}

I thought that the method splitAsStream would stream the elements matched by the regex as a Stream. How can I correct this?

Answer 1

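The problem seems to be that you are in fact splitting by words, i.e. you are streaming over everything that is not a word, or that lies in between words. Unfortunately, there seems to be no equivalent method in Java 8 for streaming the actual match results (hard to believe, but I did not find any; feel free to comment if you know of one).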
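To make this concrete, here is a small sketch that prints what splitAsStream actually yields when splitting on \w+ and, for comparison, on its complement \W+. The class name SplitDemo and the helper tokens are made up for illustration; only Pattern.splitAsStream and Collectors.toList come from the JDK.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitDemo {
    // Hypothetical helper: collects the pieces that remain after splitting on the regex.
    static List<String> tokens(String regex, String input) {
        return Pattern.compile(regex).splitAsStream(input).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "one fish two";
        // Splitting on \w+ throws the words away and keeps only what lies between them.
        System.out.println(tokens("\\w+", input));
        // Splitting on \W+ throws the separators away and keeps the words.
        System.out.println(tokens("\\W+", input));
    }
}
```

With \w+ as the delimiter, none of the resulting pieces contains a word, which is why the keys in the question's map are empty or blank.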

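Instead, you could just split by non-words, using \W instead of \w. Also, as noted in the comments, you can make it a bit more readable by using String::toLowerCase instead of a lambda, and Collectors.summingInt.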

public static Map<String, Integer> countJava8(String input) {
    return Pattern.compile("\\W+")
                  .splitAsStream(input)
                  .collect(Collectors.groupingBy(String::toLowerCase,
                                                 Collectors.summingInt(s -> 1)));
}
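One caveat with splitting on \W+: if the input starts with a non-word character, the stream begins with an empty string, which would then be counted as a word. A minimal sketch that filters those out follows; the class name WordCountFiltered and the constant NON_WORD are assumptions for illustration, not part of the original code.

```java
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordCountFiltered {
    // Compiled once and reused; matches runs of non-word characters.
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    static Map<String, Integer> count(String input) {
        return NON_WORD.splitAsStream(input)
                // Drop the empty piece produced when the input starts with a separator.
                .filter(s -> !s.isEmpty())
                .collect(Collectors.groupingBy(String::toLowerCase,
                                               Collectors.summingInt(s -> 1)));
    }

    public static void main(String[] args) {
        System.out.println(count("  One fish, two FISH!"));
    }
}
```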

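But IMHO this is still very hard to comprehend, not only because of the "inverse" lookup, and it is also difficult to generalize to other, more complex patterns. Personally, I would just go with the "old school" solution, maybe making it a bit more compact using the new getOrDefault.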

public static Map<String, Integer> countOldschool(String input) {
    Map<String, Integer> wordcount = new HashMap<>();
    Matcher matcher = Pattern.compile("\\w+").matcher(input);
    while (matcher.find()) {
        String word = matcher.group().toLowerCase();
        wordcount.put(word, wordcount.getOrDefault(word, 0) + 1);
    }
    return wordcount;
}
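Another Java 8 addition that fits here is Map.merge, which folds the lookup and the update into a single call. A sketch of the same loop under that assumption (the class name WordCountMerge and the method name count are illustrative):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCountMerge {
    static Map<String, Integer> count(String input) {
        Map<String, Integer> wordcount = new HashMap<>();
        Matcher matcher = Pattern.compile("\\w+").matcher(input);
        while (matcher.find()) {
            // Inserts 1 for a new word, otherwise adds 1 to the existing count.
            wordcount.merge(matcher.group().toLowerCase(), 1, Integer::sum);
        }
        return wordcount;
    }

    public static void main(String[] args) {
        System.out.println(count("one fish two fish red fish blue fish"));
    }
}
```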

The result seems to be the same in both cases.
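A note for readers who are not restricted to Java 8: since Java 9, Matcher has a results() method that returns a Stream<MatchResult>, so the actual matches can be streamed directly. A minimal sketch, assuming Java 9 or later (the class name WordCountResults and the method name count are illustrative):

```java
import java.util.Map;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordCountResults {
    static Map<String, Integer> count(String input) {
        return Pattern.compile("\\w+")
                .matcher(input)
                .results()                    // Stream<MatchResult>, available since Java 9
                .map(MatchResult::group)      // the matched word itself
                .collect(Collectors.groupingBy(String::toLowerCase,
                                               Collectors.summingInt(s -> 1)));
    }

    public static void main(String[] args) {
        System.out.println(count("one fish two fish red fish blue fish"));
    }
}
```

This keeps the original \w+ pattern, so it generalizes to more complex patterns without having to invert them.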

Answer 2

Try this.

    String in = "go go go go og sd";
    Map<String, Integer> map = new HashMap<String, Integer>();
    // Replace all punctuation with spaces, then split on whitespace
    String[] s = in.replaceAll("\\p{Punct}", " ").split("\\s+");
    // Seed the map with every distinct word (the values are overwritten below)
    for (int i = 0; i < s.length; i++) {
        map.put(s[i], i);
    }
    Set<String> st = new HashSet<String>(map.keySet());
    for (int k = 0; k < s.length; k++) {
        int i = 0;
        // Quote the word and anchor it at word boundaries so that
        // only whole-word occurrences are counted, not substrings
        Pattern p = Pattern.compile("\\b" + Pattern.quote(s[k]) + "\\b");
        Matcher m = p.matcher(in);
        while (m.find()) {
            i++;
        }
        map.put(s[k], i);
    }
    for (String strin : st) {
        System.out.println("String: " + strin + " - Occurrency: " + map.get(strin));
    }
    System.out.println("Word: " + s.length);

This is the output:

String: sd - Occurrency: 1
String: go - Occurrency: 4
String: og - Occurrency: 1
Word: 6

Word count with Java 8
