Split a string into words and punctuation, but don't split internal punctuation

Posted: 2019-01-09 20:48:21

Tags: java string

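I have the string "There is a boy's puppy. Really?". I need to find external punctuation, separate it from the word it is attached to, and treat it as a separate word. The desired output is:

  • boy's is one word (internal punctuation)
  • puppy. is two words: puppy and .
  • Really? is two words: Really and ?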

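My code already splits the words on external punctuation, but the split throws the punctuation away; I want to keep each punctuation mark as a separate word.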

String[] res = word.split("[\\p{Punct}\\s]+");

How can I do this?
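For reference, here is a minimal sketch (not part of the original question) that runs the split above on the example string. In Java, \p{Punct} is the POSIX punctuation class, and it includes the apostrophe, so besides discarding the punctuation this pattern also breaks boy's into boy and s. The class name CurrentSplit and the method splitAsInQuestion are hypothetical.

```java
import java.util.Arrays;

public class CurrentSplit {
    // Hypothetical helper: the split from the question, unchanged.
    static String[] splitAsInQuestion(String word) {
        // [\p{Punct}\s]+ consumes every run of punctuation/whitespace,
        // including the apostrophe in "boy's", so none of it survives.
        return word.split("[\\p{Punct}\\s]+");
    }

    public static void main(String[] args) {
        String word = "There is a boy's puppy. Really?";
        // Prints [There, is, a, boy, s, puppy, Really]
        System.out.println(Arrays.toString(splitAsInQuestion(word)));
    }
}
```

The trailing "?" is matched as a delimiter too; String.split drops the resulting trailing empty string, so it leaves no trace in the result.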

1 answer:

Answer 0 (score: 1)

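What you want in your regex is a zero-width lookahead, so that the punctuation is not consumed by the split and becomes part of the output. The regex below has two alternatives separated by an OR (|). The first, (\s+), matches runs of whitespace, which are removed as usual. The second, (?=[\.\?]), is a lookahead (a non-capturing construct) that matches the empty position just before a period or question mark, so the string is split there without discarding the punctuation mark: it starts the next token, and because it is followed by a space (or the end of the string) in this example, it ends up as a token of its own. The apostrophe is not in that character class, so boy's stays whole. I am not sure I've included all the external punctuation you wanted in the lookahead, (?=X); add any other characters you need to the character class.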

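public class SplitDemo {
    public static void main(String[] args) {
        String word = "There is a boy's puppy. Really?";
        // Split on runs of whitespace, or at the zero-width position before '.' or '?'
        String[] res = word.split("(\\s+)|(?=[\\.\\?])");

        for (String s : res) {
            System.out.print("[" + s + "]");
        }
    }
}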

Output:

[There][is][a][boy's][puppy][.][Really][?]
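A different approach, not from the original answer: instead of describing where to split, describe what a token looks like and collect the matches with Pattern and Matcher.find(). This is a minimal sketch that assumes ASCII text and assumes "internal punctuation" means an apostrophe or hyphen with word characters on both sides; any other single punctuation character becomes its own token. The class name Tokenizer and the method tokenize are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenizer {
    // Assumption: a word is a run of \w characters, optionally joined by single
    // internal apostrophes or hyphens (boy's, well-known). Otherwise, any one
    // punctuation character is a token. Whitespace never matches, so it is skipped.
    private static final Pattern TOKEN =
            Pattern.compile("\\w+(?:['-]\\w+)*|\\p{Punct}");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String s : tokenize("There is a boy's puppy. Really?")) {
            System.out.print("[" + s + "]");
        }
        System.out.println();
    }
}
```

On the example string this prints [There][is][a][boy's][puppy][.][Really][?], the same as the split above. Unlike the split, it also separates punctuation that is not followed by a space: "puppy.Really" gives [puppy][.][Really], whereas the lookahead split would give [puppy][.Really].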