Question

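I have implemented code to count the occurrences of words in a text. However, my regular expression is not accepted for some reason, and I get the following error: Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 12

My code is: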

import java.util.*;

public class CountOccurrenceOfWords {

/**
 * @param args the command line arguments
 */
public static void main(String[] args) {
    // TODO code application logic here
    char lf = '\n';

String text = "It was the best of times, it was the worst of times," + 
lf +
"it was the age of wisdom, it was the age of foolishness," + 
lf +
"it was the epoch of belief, it was the epoch of incredulity," + 
lf +
"it was the season of Light, it was the season of Darkness," + 
lf +
"it was the spring of hope, it was the winter of despair," + 
lf +
"we had everything before us, we had nothing before us," + 
lf +
"we were all going direct to Heaven, we were all going direct" + 
lf +
"the other way--in short, the period was so far like the present" + 
lf +
"period, that some of its noisiest authorities insisted on its" + 
lf +
"being received, for good or for evil, in the superlative degree" + 
lf +
"of comparison only." + 
lf +
"There were a king with a large jaw and a queen with a plain face," + 
lf +
"on the throne of England; there were a king with a large jaw and" + 
lf +
"a queen with a fair face, on the throne of France.  In both" + 
lf +
"countries it was clearer than crystal to the lords of the State" + 
lf +
"preserves of loaves and fishes, that things in general were" + 
lf +
"settled for ever";

    TreeMap<String, Integer> map = new TreeMap<String, Integer>();
    String[] words = text.split("[\n\t\r.,;:!?(){");
    for(int i = 0; i < words.length; i++){
        String key = words[i].toLowerCase();

        if(key.length() > 0) {
            if(map.get(key) == null){
                map.put(key, 1);
            }
            else{
                int value = map.get(key);
                value++;
                map.put(key, value);
            }
        }
    }

    Set<Map.Entry<String, Integer>> entrySet = map.entrySet();

    //Get key and value from each entry
    for(Map.Entry<String, Integer> entry: entrySet){
        System.out.println(entry.getValue() + "\t" + entry.getKey());
    }
    }
}

Also, could you give me a hint on how to sort the words alphabetically? Thanks in advance.

Answer 1

You missed the "]" at the end of your regular expression.

"[\n\t\r.,;:!?(){" is not correct.

You need to replace your regular expression with "[\n\t\r.,;:!?(){]".
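
As a quick check, here is a minimal sketch of the closed character class in use. The class name ClosedCharacterClassDemo, the helper splitOnDelimiters, and the sample string are made up for illustration and are not part of the original program.

```java
import java.util.Arrays;

class ClosedCharacterClassDemo {
    // Split on any single character from the closed class suggested above.
    // (Hypothetical helper, not part of the original program.)
    static String[] splitOnDelimiters(String text) {
        return text.split("[\n\t\r.,;:!?(){]");
    }

    public static void main(String[] args) {
        String[] parts = splitOnDelimiters("best,worst;age\nwisdom");
        // prints [best, worst, age, wisdom]
        System.out.println(Arrays.toString(parts));
    }
}
```

Note that this class does not contain the space character, so on the text from the question it would split at line breaks and punctuation but leave the words of each phrase joined together. Adding a space to the class is one way to get single words.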

Answer 2

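You need to escape the special characters of the regular expression. In your case, you have not escaped (, ), [, ?, . and {. Escape them with \, for example \[. You might also consider the predefined character class \s for whitespace; it matches \r, \t, and so on.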
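
The \s suggestion could look roughly like the sketch below. The class name WhitespaceClassDemo and the method countWords are hypothetical, and the trailing + (so that a run of delimiters counts as one separator) is an assumption added here, not something the answer states.

```java
import java.util.Map;
import java.util.TreeMap;

class WhitespaceClassDemo {
    // Count words, splitting on runs of whitespace (\s) or the punctuation
    // characters from the question. Hypothetical helper for illustration.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.toLowerCase().split("[\\s.,;:!?(){]+")) {
            if (!word.isEmpty()) {
                // merge() inserts 1 for a new key, otherwise adds 1 to the old count.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {best=1, it=2, of=2, the=2, times=2, was=2, worst=1}
        System.out.println(countWords("It was the best of times,\nit was the worst of times,"));
    }
}
```

Because a TreeMap iterates its keys in their natural order, the lower-cased words come out alphabetically without any extra sorting step.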

Answer 3

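Your problem is an unclosed character class in the regular expression. Regex has some "predefined" characters that you need to escape when you want to match them literally.

A character class is:

With a "character class", also called a "character set", you can tell the regex engine to match only one out of several characters. Simply place the characters you want to match between square brackets. Source

This means you either have to escape these characters: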

\[\n\t\r\.,;:!\?\(\){

or close the character class:

[\n\t\r\.,;:!\?\(\){]

Either way, you need to escape the dot, the question mark, and the parentheses.
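
One detail when putting the closed, escaped class into Java source: regex escapes such as \. have to be written with a doubled backslash inside a string literal. A minimal sketch, with the class name EscapedClassDemo, the constant DELIMITERS, and the sample input all invented for illustration:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapedClassDemo {
    // The closed, escaped class from above as a Java string literal.
    // Regex escapes (\. \? \( \)) need "\\" in Java source, while
    // \n, \t and \r are ordinary Java string escapes.
    static final Pattern DELIMITERS = Pattern.compile("[\n\t\r\\.,;:!\\?\\(\\){]");

    // Hypothetical helper: split the text at any single delimiter character.
    static String[] split(String text) {
        return DELIMITERS.split(text);
    }

    public static void main(String[] args) {
        // prints [hope, despair, light, dark]
        System.out.println(Arrays.toString(split("hope(despair)light?dark")));
    }
}
```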

Regular expression not accepted
