Question

I am using \P{M}\p{M}* to match all letters, including German and French ones.

I chose this regex to avoid spelling out every Unicode character range by hand, as in: ^[a-zA-Z[\\u00c0-\\u01ff]]+[\\']?(([-]?[a-zA-Z[\\u00c0-\\u01ff]]*[\\s]?)|([\\s]?[a-zA-Z[\\u00c0-\\u01ff]]*[-]?)){1,2}[a-zA-Z[\\u00c0-\\u01ff]]+$

However, even though I use the Unicode syntax described in an earlier question, characters such as ß or è are not matched by the regex.

I am using JDK 6.

What am I missing? Thanks!

Answer 1

Use \p{L}, which matches "any letter" (it is a Unicode general category, not a POSIX class):

System.out.println("abcßè".matches("\\p{L}+")); // true
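One caveat worth checking: \p{L} alone only covers precomposed letters. If the input is in decomposed form ("e" followed by a combining grave accent), the combining mark belongs to category M, so \p{L}+ would presumably reject it. The sketch below illustrates this, assuming the input may arrive in NFD; the class name LetterMatchDemo and the helper isLettersOnly are made up for illustration, and the string literals use \u escapes so the result does not depend on the source file encoding.

```java
import java.text.Normalizer;

class LetterMatchDemo {
    // Hypothetical helper: a letter, optionally followed by combining marks, repeated.
    static boolean isLettersOnly(String s) {
        return s.matches("(?:\\p{L}\\p{M}*)+");
    }

    public static void main(String[] args) {
        // "abcßè" written with escapes to avoid source-encoding issues
        String composed = "abc\u00DF\u00E8";
        // NFD splits U+00E8 into "e" + U+0300 (combining grave accent)
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        System.out.println(composed.matches("\\p{L}+"));   // true
        System.out.println(decomposed.matches("\\p{L}+")); // false: U+0300 is a mark, not a letter
        System.out.println(isLettersOnly(decomposed));     // true
        System.out.println(isLettersOnly("abc 123"));      // false: space and digits are not letters
    }
}
```

Alternatively, normalizing the input to NFC with java.text.Normalizer (available since Java 6) before matching should let the plain \p{L}+ pattern handle most accented letters.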

Answer 2

With Java 6, this code

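import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String str = "hello ß you";
    // (?:...) is a non-capturing group: one non-mark code point plus any combining marks
    Pattern p = Pattern.compile("(?:\\P{M}\\p{M}*)+");
    Matcher matcher = p.matcher(str);
    System.out.println("replaced: '" + matcher.replaceAll("") + "'");
}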

prints: replaced: ''

so 'ß' is matched.
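Note that the empty result also shows the spaces were matched: \P{M} accepts any code point that is not a combining mark, so this pattern is not limited to letters. A rough sketch of the difference, where the class name ReplaceDemo and the helper strip are invented for illustration:

```java
import java.util.regex.Pattern;

class ReplaceDemo {
    // Hypothetical helper: remove everything the regex matches from the input.
    static String strip(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String str = "hello \u00DF you"; // "hello ß you", escaped to avoid encoding issues

        // Grapheme-style pattern: letters, spaces, digits and punctuation all match
        System.out.println("'" + strip("(?:\\P{M}\\p{M}*)+", str) + "'");      // ''
        System.out.println("'" + strip("(?:\\P{M}\\p{M}*)+", "123 !?") + "'"); // ''

        // Letter-only pattern: the two spaces survive
        System.out.println("'" + strip("\\p{L}+", str) + "'");                 // '  '
    }
}
```

If the goal is to validate that a string contains only letters, a pattern built on \p{L} is probably the safer choice of the two.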

Java Unicode regex does not match German characters
