Question

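Using only regular expression methods, the method String.replaceAll, and ArrayList, how can I split a String into tokens but ignore delimiters that appear inside quotes? A delimiter is any character that is neither alphanumeric nor part of quoted text.

For example, the string:

hello^world'this*has two tokens'

should output:

hello

worldthis*has two tokens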

Answer 1

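I know there is already a damn good accepted answer, but I would like to add another regex-based (and, may I say, simpler) way to split the given text on any non-alphanumeric delimiter that is not inside single quotes.

Regex: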

/(?=(([^']+'){2})*[^']*$)[^a-zA-Z\\d]+/

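This basically means: match non-alphanumeric text if it is followed by an even number of single quotes. In other words, match non-alphanumeric text only when it lies outside single quotes.

Code: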

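import java.util.Arrays;

// Split on runs of non-alphanumeric characters that lie outside single quotes.
String string = "hello^world'this*has two tokens'#2ndToken";
System.out.println(Arrays.toString(
     string.split("(?=(([^']+'){2})*[^']*$)[^a-zA-Z\\d]+"))
);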

Output:

[hello, world'this*has two tokens', 2ndToken]
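The output above still contains the quote characters, while the question asks for worldthis*has two tokens. Below is a minimal sketch of how the same split could be combined with String.replaceAll and an ArrayList to get that form. The class and method names (QuoteAwareSplit, tokenize) are made up for illustration, and the sketch assumes the single quotes in the input are balanced.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {
    // Same delimiter as above: a run of non-alphanumeric characters
    // that is followed by an even number of single quotes.
    private static final String DELIMITER =
            "(?=(([^']+'){2})*[^']*$)[^a-zA-Z\\d]+";

    // Hypothetical helper: split, then strip the quote characters from each token.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.split(DELIMITER)) {
            String token = part.replaceAll("'", "");
            if (!token.isEmpty()) {   // skip the empty piece produced by a leading delimiter
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hello^world'this*has two tokens'"));
        // prints [hello, worldthis*has two tokens]
    }
}
```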

Demo:

Here is a live working Demo of the above code.

Answer 2

You can't, not in any reasonable way. You are posing a problem that regular expressions are not good at.

Answer 3

Use a Matcher to identify the parts you want to keep, rather than the parts you want to split on:

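import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "hello^world'this*has two tokens'";
// A token is one or more alphanumeric runs and/or quoted sections in a row.
Pattern pattern = Pattern.compile("([a-zA-Z0-9]+|'[^']*')+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}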

See it working online: ideone
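The Matcher loop prints each token with its quotes intact. As a rough sketch of how it could be adapted to the question's constraints, the version below collects the matches into an ArrayList and removes the quote characters with String.replaceAll. The names MatcherTokenizer and tokenize are illustrative only and do not come from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherTokenizer {
    // Same pattern as above: alphanumeric runs and quoted sections, in any order.
    private static final Pattern TOKEN =
            Pattern.compile("([a-zA-Z0-9]+|'[^']*')+");

    // Hypothetical helper: find every token, then drop its quote characters.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            String token = matcher.group().replaceAll("'", "");
            if (!token.isEmpty()) {   // an empty quoted pair leaves nothing behind
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hello^world'this*has two tokens'"));
        // prints [hello, worldthis*has two tokens]
    }
}
```

Matching what to keep avoids the lookahead entirely, so each character is examined only once as the matcher moves forward.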

Answer 4

Don't use a regular expression. It won't work. Use or write a parser instead.

You should use the right tool for the right job.
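For comparison, here is a minimal sketch of what such a hand-written parser might look like. It steps outside the question's regex-only constraint, and it makes some assumptions of its own: only ASCII letters and digits count as alphanumeric, quote characters are dropped from the tokens, and an unterminated quote swallows the rest of the input. The names QuoteScanner and tokenize are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteScanner {
    // Hypothetical single-pass scanner: tracks whether we are inside single quotes.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\'') {
                inQuotes = !inQuotes;             // toggle state, drop the quote itself
            } else if (inQuotes || isAsciiAlphanumeric(c)) {
                current.append(c);                // part of the current token
            } else if (current.length() > 0) {
                tokens.add(current.toString());   // delimiter outside quotes ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());       // flush the last token
        }
        return tokens;
    }

    private static boolean isAsciiAlphanumeric(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hello^world'this*has two tokens'"));
        // prints [hello, worldthis*has two tokens]
    }
}
```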

Java regex - split but ignore text inside quotes?

4 answers: