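I have implemented code to count the occurrences of words in a text. However, my regular expression is not accepted for some reason, and I get the following error: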
Exception in thread "main" java.util.regex.PatternSyntaxException: Unclosed character class near index 12
My code is:
import java.util.*;
public class CountOccurrenceOfWords {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // TODO code application logic here
        char lf = '\n';
        String text = "It was the best of times, it was the worst of times," +
                lf +
                "it was the age of wisdom, it was the age of foolishness," +
                lf +
                "it was the epoch of belief, it was the epoch of incredulity," +
                lf +
                "it was the season of Light, it was the season of Darkness," +
                lf +
                "it was the spring of hope, it was the winter of despair," +
                lf +
                "we had everything before us, we had nothing before us," +
                lf +
                "we were all going direct to Heaven, we were all going direct" +
                lf +
                "the other way--in short, the period was so far like the present" +
                lf +
                "period, that some of its noisiest authorities insisted on its" +
                lf +
                "being received, for good or for evil, in the superlative degree" +
                lf +
                "of comparison only." +
                lf +
                "There were a king with a large jaw and a queen with a plain face," +
                lf +
                "on the throne of England; there were a king with a large jaw and" +
                lf +
                "a queen with a fair face, on the throne of France. In both" +
                lf +
                "countries it was clearer than crystal to the lords of the State" +
                lf +
                "preserves of loaves and fishes, that things in general were" +
                lf +
                "settled for ever";
        TreeMap<String, Integer> map = new TreeMap<String, Integer>();
        String[] words = text.split("[\n\t\r.,;:!?(){");
        for (int i = 0; i < words.length; i++) {
            String key = words[i].toLowerCase();
            if (key.length() > 0) {
                if (map.get(key) == null) {
                    map.put(key, 1);
                }
                else {
                    int value = map.get(key);
                    value++;
                    map.put(key, value);
                }
            }
        }
        Set<Map.Entry<String, Integer>> entrySet = map.entrySet();
        // Get key and value from each entry
        for (Map.Entry<String, Integer> entry : entrySet) {
            System.out.println(entry.getValue() + "\t" + entry.getKey());
        }
    }
}
Also, could you give me a hint on how to sort the words alphabetically? Thanks in advance.
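Answer 0 (score: 1)
You missed the closing "]" at the end of your regular expression.
"[\n\t\r.,;:!?(){" is incorrect.
You need to replace the regular expression with "[\n\t\r.,;:!?(){]".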
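As a rough illustration of the fix, here is a minimal sketch with the character class closed. The class name WordCountSketch, the helper countWords, and the shortened sample text are made up for this example. The space added to the delimiter set is also an assumption, not part of the suggested fix: without it the text is split into phrases rather than single words. Because TreeMap keeps its keys in their natural order, iterating over it already prints the words alphabetically.

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Character class closed with ']'. The leading space is an added assumption
    // so that the text is split into individual words rather than phrases.
    static final String DELIMITERS = "[ \n\t\r.,;:!?(){]";

    // Hypothetical helper: counts lower-cased words, skipping empty tokens.
    static TreeMap<String, Integer> countWords(String text) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String token : text.split(DELIMITERS)) {
            String key = token.toLowerCase();
            if (!key.isEmpty()) {
                // merge() inserts 1 for a new key, otherwise adds 1 to the old value
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String sample = "It was the best of times, it was the worst of times,";
        // TreeMap iterates in ascending key order, so the output is alphabetical
        for (Map.Entry<String, Integer> entry : countWords(sample).entrySet()) {
            System.out.println(entry.getValue() + "\t" + entry.getKey());
        }
    }
}
```

Using merge is only a shorter way of writing the get/put logic from the question; the behaviour is meant to be the same.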
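Answer 1 (score: 0)
You need to escape the regex special characters. In your case you have not escaped (, ), [, ?, . and {. Escape them with \, for example \[. Note that this applies where those characters appear outside a character class; inside [...] most of them, including ., ?, ( and ), are already treated as literals. In a Java string literal the backslash itself has to be doubled, e.g. "\\[". You could also consider the predefined character class for whitespace, \s, which matches \r, \t, and so on.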
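The \s suggestion could look roughly like the sketch below. The class name WhitespaceClassSketch and the helper tokenize are invented for the example, and the trailing + quantifier is an extra assumption that collapses runs of delimiters (such as ", ") into a single split point.

```java
import java.util.Arrays;

public class WhitespaceClassSketch {
    // \s covers space, tab, newline, carriage return, form feed and vertical tab.
    // In Java source the backslash is doubled. The '+' (an addition for this
    // sketch) treats consecutive delimiters as one separator.
    static final String DELIMITERS = "[\\s.,;:!?(){]+";

    // Hypothetical helper that splits a text on the delimiter class above.
    static String[] tokenize(String text) {
        return text.split(DELIMITERS);
    }

    public static void main(String[] args) {
        String sample = "we had everything before us,\r\nwe had\tnothing before us";
        System.out.println(Arrays.toString(tokenize(sample)));
    }
}
```

If the text starts with a delimiter, split still returns an empty first element, so a length check like the one in the question remains useful.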
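Answer 2 (score: 0)
Your problem is an unclosed character class in the regular expression. Regex has some reserved characters (metacharacters) that you need to escape when you want to match them literally.
A character class is:
With a "character class", also called "character set", you can tell the regex engine to match only one out of several characters. Simply place the characters you want to match between square brackets. Source
This means you either have to escape these characters:
\[\n\t\r\.,;:!\?\(\)\{
or close the character class:
[\n\t\r\.,;:!\?\(\){]
Outside a character class you need to escape the dot, the question mark, the parentheses and the brace; inside a closed character class those escapes are optional. Keep in mind that the first form matches the whole sequence of characters literally, while the second matches any single one of them, which is what split needs here.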
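To check the behaviour of the different forms, a small sketch along these lines can help. The class name CharClassEscapingSketch and the helper compiles are made up for the example. It only relies on Pattern.compile throwing PatternSyntaxException for an invalid pattern, as in the error from the question.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class CharClassEscapingSketch {
    // Pattern from the question: the character class is never closed.
    static final String UNCLOSED = "[\n\t\r.,;:!?(){";
    // Same class, closed, with no escapes inside the brackets.
    static final String CLOSED = "[\n\t\r.,;:!?(){]";
    // Same class, closed, with the metacharacters escaped anyway.
    static final String CLOSED_ESCAPED = "[\n\t\r\\.,;:!\\?\\(\\)\\{]";

    // Hypothetical helper: reports whether a regex compiles at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("unclosed compiles: " + compiles(UNCLOSED));
        System.out.println("closed compiles: " + compiles(CLOSED));
        System.out.println("escaped compiles: " + compiles(CLOSED_ESCAPED));

        // Both closed forms should split the sample at the same characters.
        String sample = "a.b?c(d)e{f";
        System.out.println(Arrays.toString(sample.split(CLOSED)));
        System.out.println(Arrays.toString(sample.split(CLOSED_ESCAPED)));
    }
}
```

The escaped and unescaped classes are expected to behave identically here, so the choice between them is mostly a matter of readability.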