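Counting words in a string using a pattern matcher

Time: 2018-10-28 10:09:51

Tags: java algorithm pattern-matching

I want to write a Java method that takes an input string and lists the words that occur more than once (case-insensitively), together with the number of times each one occurs.

For example: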

input>> "I have two cars in my garage and my dad has one car in his garage"

It should produce the following output:

output>> my -- repeated 2 times
         in -- repeated 2 times
         ...

Here is my code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class classduplicate {
    private static final String REGEX = "\\b([A-Z]+)\\s+\\1\\b*";
    private static final String INPUT = "Cat cat cat cattie cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(INPUT);   // get a matcher object
        int count = 0;

        while (m.find()) {
            count++;
            System.out.println(m.find());
        }
        System.out.println("Match number " + count);
    }
}

2 Answers:

Answer 0: (score: 1)

I don't think you can solve this with a regular expression.
Here is a solution that uses a Set:

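    // Needs java.util.Arrays, java.util.HashSet, java.util.Set and java.util.regex.Pattern.
    String str = " I have two cars   in my garage and my dad has one   car in his garage ";
    System.out.println(str);

    String low = str.trim().toLowerCase();

    String[] words = low.split("\\s+");

    // The distinct words of the input.
    Set<String> setOfWords = new HashSet<>(Arrays.asList(words));

    // Pad both ends and double every whitespace character so that each word
    // has its own leading and trailing space, even next to a repeat of itself.
    low = " " + str.toLowerCase() + " ";
    low = low.replaceAll("\\s", "  ");

    for (String s : setOfWords) {
        // Quote the word so that regex metacharacters in it are matched literally.
        String without = low.replaceAll(" " + Pattern.quote(s) + " ", "");
        // Each removed occurrence shortens the string by the word plus two spaces.
        int counter = (low.length() - without.length()) / (s.length() + 2);
        if (counter > 1)
            System.out.println(s + " repeated " + counter + " times.");
    }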

It prints:

 I have two cars   in my garage and my dad has one   car in his garage 
in repeated 2 times.
garage repeated 2 times.
my repeated 2 times.
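
As an alternative to the length-difference arithmetic above, the counts can also be kept in a map. The following is only a minimal sketch, not part of the original answer: the class name RepeatedWordCounter and the method countRepeated are hypothetical, and it assumes that words are separated by whitespace only.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class RepeatedWordCounter {

    // Hypothetical helper: returns every word that occurs more than once,
    // mapped to its count, in order of first appearance.
    static Map<String, Integer> countRepeated(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return counts;
        }
        // Assumption: words are separated by runs of whitespace only.
        for (String word : normalized.split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        // Drop the words that occur only once.
        counts.values().removeIf(n -> n < 2);
        return counts;
    }

    public static void main(String[] args) {
        String input = "I have two cars in my garage and my dad has one car in his garage";
        countRepeated(input).forEach((word, n) ->
                System.out.println(word + " -- repeated " + n + " times"));
    }
}
```

A LinkedHashMap is used here so that the words are reported in the order in which they first appear, whereas the HashSet above gives no ordering guarantee.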

Answer 1: (score: 0)

You can find the duplicate words with the code below:

package duplicatewords;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;


/**
 *
 * @author sami
 */
public class DuplicateWords {

    private static final String INPUT = "Cat cat cat cattie cat";

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        List<String> wordsWithCase = DuplicateWords(INPUT);
        List<String> wordsWithoutCase = DuplicateWordsDespiteOfCase(INPUT);
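        CountDuplicateWords(INPUT, wordsWithCase);
        // Count the lower-cased words against a lower-cased copy of the input,
        // otherwise "cat" would never be matched against "Cat".
        CountDuplicateWords(INPUT.toLowerCase(), wordsWithoutCase);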
    }

    /**
     * Collects the distinct words of the string, treating upper and lower case as different.
     * @param inputValue Input String
     * @return duplicateWords List of the distinct words in the string.
     */
    private static List<String> DuplicateWords(String inputValue) {
        String[] breakWords = inputValue.split("\\s+");
        List<String> duplicateWords = new ArrayList<>();
        for (String word : breakWords) {
            if (!duplicateWords.contains(word)) {
                duplicateWords.add(word);
            }
        }
        return duplicateWords;
    }

    /**
     * Collects the distinct words of the string, ignoring upper and lower case.
     * @param inputValue Input String
     * @return duplicateWords List of the distinct lower-cased words in the string.
     */
    private static List<String> DuplicateWordsDespiteOfCase(String inputValue) {
        inputValue = inputValue.toLowerCase();
        String[] breakWords = inputValue.split("\\s+");
        List<String> duplicateWords = new ArrayList<>();
        for (String word : breakWords) {
            if (!duplicateWords.contains(word)) {
                duplicateWords.add(word);
            }
        }
        return duplicateWords;
    }

    /**
     * Counts how many times each listed word occurs in the string as a whole word.
     * @param inputValue Input String
     * @param duplicatedWords List of the words to count
     */
    private static void CountDuplicateWords(String inputValue, List<String> duplicatedWords) {
        System.out.println("Words found: " + duplicatedWords);
        for (String value : duplicatedWords) {
            int i = 0;
            // Quote the word and require whitespace (or the string boundary) on
            // both sides, so that "cat" is not also counted inside "cattie".
            Pattern pattern = Pattern.compile("(?<!\\S)" + Pattern.quote(value) + "(?!\\S)");
            Matcher matcher = pattern.matcher(inputValue);
            while (matcher.find()) {
                i++;
            }
            System.out.println(value + ": " + i);
        }
    }
}

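The DuplicateWords method collects the distinct words of the string with upper and lower case taken into account, and the DuplicateWordsDespiteOfCase method collects them ignoring case (which is what you asked for in the question). Once the words have been collected, CountDuplicateWords counts how many times each one occurs in the string.

If you don't want it, you can remove the DuplicateWords method. It is included only for reference.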
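
Since the question asks about a pattern matcher, the collecting and counting steps can also be combined into a single pass with Matcher.find(). This is only a rough sketch under stated assumptions, not the code from the answer above: the class name MatcherWordCounter and the method countWords are hypothetical, and a "word" is assumed to be a run of Unicode letters (\p{L}+).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherWordCounter {

    // Assumption: a word is one or more Unicode letters.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    // Hypothetical helper: counts every word, ignoring case.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher matcher = WORD.matcher(text);
        // Each call to find() advances to the next whole word.
        while (matcher.find()) {
            counts.merge(matcher.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countWords("Cat cat cat cattie cat").forEach((word, n) -> {
            if (n > 1) {
                System.out.println(word + " -- repeated " + n + " times");
            }
        });
    }
}
```

Because the pattern consumes a whole run of letters at a time, "cattie" is counted as its own word rather than as another occurrence of "cat".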