Question

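I want to split a sentence on any one of a set of characters (listed below). My regex splits on most of them, but not on '[' and ']' (the opening and closing square brackets). If I change the SPECIAL_CHARACTERS_REGEX string to [ :;'=\\()!-\\[\\]], it starts splitting on the integers in the string instead of on the square brackets. How can I make the regex split on square brackets rather than on integers? (It looks as if '[]' is being taken to mean all integers.)

A related question: is there a way to split the numbers off from a string? For example, 9pm should be split into 9 and pm.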

This:

private static final String SPECIAL_CHARACTERS_REGEX = "[ :;'=\\()!-]";
String rawMessage = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]";
String[] tokens = rawMessage.split(SPECIAL_CHARACTERS_REGEX);

Gives:

Input: let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]
output: [let, s, meet, tomorrow, at, 9, 30p?, 7, 8pm?, i, you, go, , no, Go, , , [to, do, , ]]

and

This:

private static final String SPECIAL_CHARACTERS_REGEX = "[ :;'=\\()!-\\[\\]]";
String rawMessage = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]";
String[] tokens = rawMessage.split(SPECIAL_CHARACTERS_REGEX);

Gives:

Input: let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]
output: [let, s, meet, tomorrow, at, , , , , p, , , , , pm, , i, you, go, , no, , o, , , , to, do]

Expected output:

{"let", "s", "meet", "tomorrow", "at", "9", "30", "p", "7", "8", "pm", "i", "you", "go", "no", "Go", "to", "do"}

Answer 1

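Put the hyphen at the end of the character class (or at the start, or escape it); otherwise it is treated as defining a character range: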

[ :;'=\\()!\\[\\]-]

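Your original regex (the one containing !-\\[) matches every character from ! to [, which includes the digits, the uppercase letters, and a bunch of other symbols such as (, ), ? and so on.

To get the result you expect, you can use something like this: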
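To make the range effect concrete, here is a minimal sketch that tests a few single characters against both forms of the character class. The class name HyphenRangeDemo and the constant names are made up for illustration; only the two regex strings come from the thread (with the unnecessary escapes on the parentheses dropped).

```java
public class HyphenRangeDemo {
    // Hyphen between '!' and '\\[': defines the range U+0021 ('!') to U+005B ('[').
    static final String RANGE_CLASS = "[ :;'=()!-\\[\\]]";
    // Hyphen as the last character in the class: a literal '-'.
    static final String LITERAL_CLASS = "[ :;'=()!\\[\\]-]";

    public static void main(String[] args) {
        for (String ch : new String[] {"7", "G", "?", "-", "["}) {
            System.out.println(ch
                    + "  range form: " + ch.matches(RANGE_CLASS)
                    + "  literal form: " + ch.matches(LITERAL_CLASS));
        }
        // 7  range form: true  literal form: false
        // G  range form: true  literal form: false
        // ?  range form: true  literal form: false
        // -  range form: true  literal form: true
        // [  range form: true  literal form: true
    }
}
```

The digit, the uppercase letter and the question mark all fall inside the accidental range, which is why the second regex in the question swallowed 9, 30, 7, 8 and the G of Go.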

[ ?:;'=\\()!\\[\\]-]+|(?<=\\d)(?=\\D)

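(?<=\d)(?=\D) splits at the boundary between a digit and a following non-digit. You might also want to use [0-9] and [^0-9] instead, which can be a bit more efficient in engines where \d also covers non-ASCII digits; in Java, \d already means [0-9] by default, so there the two forms are equivalent.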
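As a small illustration of the zero-width split on its own, the sketch below applies only the lookbehind/lookahead pair, written with [0-9] and [^0-9]. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class DigitBoundarySplit {
    // Zero-width position: a digit just before, a non-digit just after.
    static final String BOUNDARY = "(?<=[0-9])(?=[^0-9])";

    static String[] splitAtDigitBoundary(String s) {
        return s.split(BOUNDARY);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAtDigitBoundary("9pm")));   // [9, pm]
        System.out.println(Arrays.toString(splitAtDigitBoundary("7-8pm"))); // [7, -8, pm]
    }
}
```

Nothing is consumed at the split point, so no characters are lost. The second line also shows why the lookaround is combined with the character class in the full regex: on its own it leaves the hyphen attached to the 8.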


Answer 2

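If you leave the hyphen in the middle of the character class, you need to escape it as well.

It is simpler, though, to put it at the start or the end of the character class, where no escaping is needed. Also, you don't need to escape () here, and you probably want a quantifier after the character class, * or + (with split, + is the safer choice, because * can also match the empty string).

Update: to get the expected result, you can do this.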
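The effect of the quantifier can be seen in a short sketch that splits the same fragment with and without +. The class name and constants are invented for the example; the character class is the one from this answer, with the parentheses left unescaped.

```java
import java.util.Arrays;

public class QuantifierSplitDemo {
    // One delimiter character per match: adjacent delimiters yield empty tokens.
    static final String SINGLE = "[ :;'?=()!\\[\\]-]";
    // One or more delimiter characters per match: a whole run counts as one separator.
    static final String RUN = SINGLE + "+";

    public static void main(String[] args) {
        String s = "go (no Go!) [to do !]";
        System.out.println(Arrays.toString(s.split(SINGLE))); // [go, , no, Go, , , , to, do]
        System.out.println(Arrays.toString(s.split(RUN)));    // [go, no, Go, to, do]
    }
}
```

In both cases String.split drops trailing empty strings, which is why nothing appears after do even though the input ends in delimiters.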

private static final String SPECIAL_CHARACTERS_REGEX = "[ :;'?=()!\\[\\]-]+|(?<=\\d)(?=\\D)";
String rawMessage = "let's meet tomorrow at 9:30p? 7-8pm? i=you go (no Go!) [to do !]";
String[] tokens = rawMessage.split(SPECIAL_CHARACTERS_REGEX);
System.out.println(Arrays.toString(tokens));

Regular expression:

[ :;'?=()!\[\]-]+    any character of: ' ', ':', ';', ''', '?',
                       '=', '(', ')', '!', '\[', '\]', '-' (1 or more times)
 |                   OR
  (?<=               look behind to see if there is:
   \d                digits (0-9)
  )                  end of look-behind
   (?=               look ahead to see if there is:
    \D               non-digits (all but 0-9)
   )                 end of look-ahead


Output

[let, s, meet, tomorrow, at, 9, 30, p, 7, 8, pm, i, you, go, no, Go, to, do]

Answer 3

Using this in your regex will split at any position where a digit is followed by a letter:

(?<=\\d)(?=[A-Za-z])

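I tested the pattern above on its own. To add it to what you already have, use | in your regex so that it splits on either the pattern above or your existing one: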

String[] parts = s.split("[ :;'=()!\\[\\]-]+|(?<=\\d)(?=[A-Za-z])");

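(using hwnd's answer). ?<= is a lookbehind, which matches at a position if the pattern just before that position matches; ?= is a lookahead, which matches at a position if the pattern just after that position matches.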
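The choice of lookahead matters when a digit is followed by punctuation rather than a letter. The following sketch compares the letter-only lookahead from this answer with the \D lookahead used in the other answers, applied without any character class; the class and constant names are hypothetical.

```java
import java.util.Arrays;

public class LookaroundChoice {
    // Split after a digit when any non-digit follows.
    static final String DIGIT_THEN_NON_DIGIT = "(?<=\\d)(?=\\D)";
    // Split after a digit only when an ASCII letter follows.
    static final String DIGIT_THEN_LETTER = "(?<=\\d)(?=[A-Za-z])";

    public static void main(String[] args) {
        String s = "9:30pm";
        System.out.println(Arrays.toString(s.split(DIGIT_THEN_NON_DIGIT))); // [9, :30, pm]
        System.out.println(Arrays.toString(s.split(DIGIT_THEN_LETTER)));    // [9:30, pm]
    }
}
```

Once either pattern is combined with the delimiter character class via |, the colon is consumed as a delimiter anyway, so for the input in the question both variants separate 30 from p in the same way.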

Answer 4

First insert a space between digit-letter combinations such as 8pm, then split on the special characters, using escape sequences for '[' and ']':

String rawMessage  = "let's meet tomorrow at 9:30pm 7-8pm? i=you go (no Go!) [to do !]";
String rawMessage2 = rawMessage.replaceAll("(?<=[0-9])(?=[a-zA-Z])", " ");
String[] tokens  = rawMessage2.split("[ :;'=()!\\[\\]]+");
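The two-step approach can be wrapped in a small helper, as in this sketch. It is an adaptation, not the answer's exact code: the character class here additionally contains ? and - (an assumption made so that 7-8pm? comes apart as 7, 8, pm), and the names TwoStepTokenizer and tokenize are invented.

```java
import java.util.Arrays;

public class TwoStepTokenizer {
    static String[] tokenize(String raw) {
        // Step 1: put a space wherever a digit is directly followed by a letter.
        String spaced = raw.replaceAll("(?<=[0-9])(?=[a-zA-Z])", " ");
        // Step 2: split on runs of delimiters; '?' and '-' are added here as an
        // assumption, with '-' placed last so that it stays a literal hyphen.
        return spaced.split("[ :;'=()!?\\[\\]-]+");
    }

    public static void main(String[] args) {
        String raw = "let's meet tomorrow at 9:30pm 7-8pm? i=you go (no Go!) [to do !]";
        System.out.println(Arrays.toString(tokenize(raw)));
        // [let, s, meet, tomorrow, at, 9, 30, pm, 7, 8, pm, i, you, go, no, Go, to, do]
    }
}
```

Compared with a single regex that uses | and lookarounds, this costs an extra pass over the string but keeps each pattern simple to read.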
