How exactly does the String.split() method in Java work when a regex is provided?

Time: 2014-03-07 20:08:24

Tags: java regex split ocpjp

I am preparing for the OCPJP exam and came across the following example:

class Test {
   public static void main(String args[]) {
      String test = "I am preparing for OCPJP";
      String[] tokens = test.split("\\S");
      System.out.println(tokens.length);
   }
}

This code prints 16. I was expecting something like no_of_characters + 1. Can someone explain what the split() method actually does in this case? I just don't get it...

1 Answer:

Answer 0 (score: 13)

It splits on every match of "\\S", which the regex engine sees as \S: any single non-whitespace character.

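So let's try splitting "x x" on non-whitespace (\S). Since this regex matches exactly one character at a time, we can walk over the characters one by one and mark the places where a split happens (we will use a pipe | for that).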

  • Is 'x' non-whitespace? Yes, so we mark it: | x
  • Is ' ' non-whitespace? No, so we leave it as is
  • Is the last 'x' non-whitespace? Yes, so we mark it: | |
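The marking step above can be reproduced with a small sketch. The class and method names here (MarkSplitPositions, mark) are made up for illustration; the only library call is String.replaceAll, which swaps every \S match for a pipe so the delimiter positions become visible.

```java
class MarkSplitPositions {
    // Hypothetical helper: replaces every non-whitespace character with '|'
    // so that each position split("\\S") would cut at becomes visible.
    static String mark(String input) {
        return input.replaceAll("\\S", "|");
    }

    public static void main(String[] args) {
        System.out.println(mark("x x"));                      // | |
        System.out.println(mark("I am preparing for OCPJP")); // | || ||||||||| ||| |||||
    }
}
```

Whatever remains between two pipes (a space, or nothing at all) is what ends up as an array element.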

So we need to split our string at the start and at the end, which initially gives the result array

["", " ", ""]
   ^    ^ - here we split

But since trailing empty strings are removed, the result will be

[""," "]     <- result
        ,""] <- removed trailing empty string

So split returns the array ["", " "], which contains only two elements.

By the way, to turn off the removal of trailing empty strings, use split(regex, limit) with a negative limit, for example split("\\S", -1).
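To see the difference between the two calls on the "x x" example, here is a minimal sketch. The show helper and the class name SplitLimitDemo are assumptions added only to print each element in quotes; split(String) and split(String, int) are the standard String methods.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class SplitLimitDemo {
    // Hypothetical helper: renders an array with every element quoted,
    // so empty strings and single spaces are distinguishable in the output.
    static String show(String[] parts) {
        return Arrays.stream(parts)
                .map(s -> "\"" + s + "\"")
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        // Default limit (0): trailing empty strings are dropped.
        System.out.println(show("x x".split("\\S")));     // ["", " "]
        // Negative limit: every piece is kept, including trailing "".
        System.out.println(show("x x".split("\\S", -1))); // ["", " ", ""]
    }
}
```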


Now let's get back to your example. With your data, you are splitting at each of these positions

I am preparing for OCPJP
| || ||||||||| ||| |||||

which means

 ""|" "|""|" "|""|""|""|""|""|""|""|""|" "|""|""|" "|""|""|""|""|""

So this represents the array

[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]  

But since trailing empty strings "" are removed (if their existence was caused by the split; for more information see: Confusing output from String.split)

[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]  
                                                     ^^ ^^ ^^ ^^ ^^

the resulting array contains only this part:

[""," ",""," ","","","","","","","",""," ","",""," "]  

which is exactly 16 elements.
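The whole walk-through can be checked with a hand-rolled sketch that cuts the input at every match and then strips trailing empty strings. This is not the JDK implementation: ManualSplit and splitLikeDefault are hypothetical names, and the sketch assumes the regex never matches a zero-width position and that at least one match exists (other edge cases may differ from String.split).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManualSplit {
    // Hypothetical helper, not the JDK code: mimics split(regex) with the default limit
    // for regexes that only produce non-empty matches.
    static List<String> splitLikeDefault(String input, String regex) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int start = 0;
        while (m.find()) {
            parts.add(input.substring(start, m.start())); // text before this delimiter
            start = m.end();
        }
        parts.add(input.substring(start));                // text after the last delimiter
        // Drop trailing empty strings, as the default limit does.
        while (!parts.isEmpty() && parts.get(parts.size() - 1).isEmpty()) {
            parts.remove(parts.size() - 1);
        }
        return parts;
    }

    public static void main(String[] args) {
        String test = "I am preparing for OCPJP";
        System.out.println(test.split("\\S", -1).length);         // 21: nothing removed
        System.out.println(test.split("\\S").length);             // 16: trailing "" removed
        System.out.println(splitLikeDefault(test, "\\S").size()); // 16: same as the sketch
    }
}
```

The 21 pieces are the 20 non-whitespace delimiters plus one; the last five are empty strings produced by "OCPJP" and are the ones that get removed.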