I am trying to find the tokens in a string, which has words, numbers, and special chars. I tried the following code:
String Pattern = "(\\s)+";
String Example = "This `99 is my small \"yy\" xx`example ";
String[] splitString = Example.split(Pattern);
System.out.println(splitString.length);
for (String string : splitString) {
    System.out.println(string);
}
And got the following output:
This:`99:is:my:small:"yy":xx`example:
But what I actually want is this, i.e., I want the special chars as separate tokens too:
This:`:99:is:my:small:":yy:":xx:`:example:
I tried putting the special chars into the pattern, but now the special characters vanish completely:
String Pattern = "(\"|`|\\.|\\s+)";
This::99:is:my:small::yy::xx:example:
With what pattern will I get my desired output? Or should I try a different approach than using regex?
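The characters vanish because split() treats everything its pattern matches as a separator and discards it, so adding ` and " to the pattern removes them. One common workaround, sketched here as an alternative to the matching approach in the answer below (not part of that answer), is to split on zero-width lookarounds: they match the empty position just before or just after a special character, so nothing is consumed. It assumes a "special character" is anything that is neither a word character nor whitespace; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SplitKeepingSpecials {
    // Split on runs of whitespace, and also on the empty positions directly
    // before and after any character that is neither a word char nor whitespace.
    // Lookarounds match zero characters, so split() has nothing to discard and
    // the special characters survive as their own pieces.
    private static final String DELIMS = "\\s+|(?<=[^\\w\\s])|(?=[^\\w\\s])";

    static List<String> tokenize(String input) {
        return Arrays.stream(input.split(DELIMS))
                // A space immediately followed by a special char yields an empty
                // piece (e.g. between "This " and "`"), so filter those out.
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String example = "This `99 is my small \"yy\" xx`example ";
        System.out.println(tokenize(example));
        // prints [This, `, 99, is, my, small, ", yy, ", xx, `, example]
    }
}
```

The empty-piece filter is the price of the split approach; a find() loop over the tokens themselves avoids that bookkeeping entirely.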
Answer (score: 2)
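You may use a matching approach instead of splitting: match runs of letters (with or without combining marks), runs of digits, or single characters other than word and whitespace characters. I assume _
should be treated as a letter (part of a word) in this approach.
Use
"(?U)(?>[^\\W\\d]\\p{M}*+)+|\\d+|[^\\w\\s]"
See the regex demo.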
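Details:
(?U) - the inline version of the Pattern.UNICODE_CHARACTER_CLASS modifier
(?>[^\\W\\d]\\p{M}*+)+ - one or more letters or _, each with or without combining marks
| - or
\\d+ - one or more digits
| - or
[^\\w\\s] - a single character that is neither a word character nor whitespace.
See the Java demo: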
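import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String str = "This `99 is my small \"yy\" xx`example_and_more ";
        Pattern ptrn = Pattern.compile("(?U)(?>[^\\W\\d]\\p{M}*+)+|\\d+|[^\\w\\s]");
        List<String> res = new ArrayList<>();
        Matcher matcher = ptrn.matcher(str);
        while (matcher.find()) {
            res.add(matcher.group()); // each match is one token; whitespace is never matched
        }
        System.out.println(res);
        // => [This, `, 99, is, my, small, ", yy, ", xx, `, example_and_more]
    }
}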
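To see what the (?U) modifier contributes, the sketch below runs the answer's pattern with and without it on a word containing non-ASCII letters. The input "Grüße 42" and the class, method, and constant names are illustrative assumptions. Without the flag, \\w and \\W are ASCII-only, so ü and ß are not treated as letters and fall through to the [^\\w\\s] branch as separate "special" tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeFlagDemo {
    // The answer's pattern, and the same pattern with the inline (?U) flag removed.
    static final String WITH_U = "(?U)(?>[^\\W\\d]\\p{M}*+)+|\\d+|[^\\w\\s]";
    static final String WITHOUT_U = "(?>[^\\W\\d]\\p{M}*+)+|\\d+|[^\\w\\s]";

    // Collects every non-overlapping match, left to right, as a token.
    static List<String> tokens(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // "Grüße 42", written with escapes so the source file encoding does not matter
        String input = "Gr\u00FC\u00DFe 42";
        // With (?U): the whole word stays one token, followed by 42
        System.out.println(tokens(WITH_U, input));
        // Without (?U): Gr, then u-umlaut and sharp s as single "special" tokens, then e, 42
        System.out.println(tokens(WITHOUT_U, input));
    }
}
```

So the flag matters as soon as the input can contain letters outside ASCII; for pure-ASCII input like the question's example, both variants return the same tokens.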