Regex to split a string on character count but keep whole words

Time: 2016-08-17 12:22:39

Tags: regex

I am currently using a regex to split a string into 15-character substrings:

(?<=\G.{15})

Sample text: First second third fourth fifth sixthsixthsixthsixthsixthsixthsixth seventh

It is split into:

[0] => First second th
[1] => ird fourth fifth
[2] =>  sixthsixthsixth
[3] => sixthsixthsixths
[4] => ixth seventh

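I would like to modify this slightly:

1. Split into chunks of 15 characters or fewer, but split only on spaces so that words stay whole.
2. If the split from #1 contains a single word longer than 15 characters, split that word.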

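But this could get messy. If I have a word longer than 15 characters, I want that word to be split, and the substring that follows should also be 15 characters long, not just the leftover part of the word.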

For the example above, I would like:

[0] => First second 
[1] => third fourth 
[2] => fifth
[3] => sixthsixthsixth
[4] => sixthsixthsixth 
[5] => sixth seventh

I would also be happy with:

[0] => First second 
[1] => third fourth 
[2] => fifth sixthsixt
[3] => hsixthsixthsixt
[4] => hsixthsixth 
[5] => seventh

If neither of the first two can be done in a single regex, I could live with:

[0] => First second 
[1] => third fourth 
[2] => fifth
[3] => sixthsixthsixth
[4] => sixthsixthsixth 
[5] => sixth 
[6] => seventh

The difference between them is where the long word gets split.

Is it possible to do this with a single regex?

1 answer:

Answer 0 (score: 1)

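Knowing which language you are using would make it clearer which tokens and constructs are available. If you are using Ruby v2.0 or later, you can use this: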

(.{1,15}\b|.{15})\K(?: +|\B|\Z)

Replacing each match with a newline (\n) gives the split you want:

First second
third fourth
fifth
sixthsixthsixth
sixthsixthsixth
sixth seventh
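
Python's standard `re` module does not support `\K`, so the same replacement can be approximated there by writing group 1 back in the replacement string. The following is a minimal sketch, assuming Python 3 and the sample text from the question; the names `WRAP`, `text`, `wrapped`, and `lines` are illustrative and not part of the original answer:

```python
import re

# Same pattern as above but without \K (not supported by Python's re module);
# group 1 is written back explicitly in the replacement instead.
WRAP = re.compile(r"(.{1,15}\b|.{15})(?: +|\B|\Z)")

text = ("First second third fourth fifth "
        "sixthsixthsixthsixthsixthsixthsixth seventh")

# Keep the captured chunk and replace the consumed separator with a newline.
wrapped = WRAP.sub(r"\1\n", text)
lines = wrapped.splitlines()
print(lines)
```

Because the separator spaces are consumed rather than kept, none of the resulting lines carries a trailing space.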

If you only need the pieces as an array of captured groups, there is a shorter way:

(.{1,15}\b|.{15})
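
As a sketch of the array form, assuming Python 3 (the answer does not name a language for this variant), `re.findall` returns the captured chunks directly. The names `text` and `chunks` are illustrative:

```python
import re

text = ("First second third fourth fifth "
        "sixthsixthsixthsixthsixthsixthsixth seventh")

# With a single capturing group, findall returns a list of group-1 strings.
chunks = re.findall(r"(.{1,15}\b|.{15})", text)
print(chunks)

# The chunks cover the input exactly, so joining them restores the text.
assert "".join(chunks) == text
```

Chunks that end at a space keep that trailing space here, as in the question's first desired output; call `rstrip()` on each chunk if that is unwanted.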

Explanation:

  (           # Begin capturing group (1)
    .{1,15}   #   Match 15 characters max (greedy)
    \b        #   Till reaching a word boundary
    |         #   Or
    .{15}     #   Match those parts of a long word
  )           # End of (1)

  \K          # Discard what has been matched so far from the reported match

  (?:         # Begin non-capturing group
     +        #   Match white-spaces
    |         #   Or
    \B        #   A non-word boundary
    |         #   Or
    \Z        #   End of string
  )           # End of non-capturing group
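
The commented layout above can be run almost as written in a verbose-mode regex. The sketch below assumes Python 3 and its `re.VERBOSE` flag, so `\K` is dropped in favor of reading group 1, and the literal space is written as `[ ]` because verbose mode ignores bare whitespace. The names `WRAP_VERBOSE`, `text`, and `chunks` are illustrative:

```python
import re

WRAP_VERBOSE = re.compile(r"""
    (             # group 1: the chunk to keep
      .{1,15}\b   #   up to 15 characters, ending at a word boundary
      |           #   or
      .{15}       #   exactly 15 characters cut out of a longer word
    )
    (?:           # separator that is consumed but not kept
      [ ]+        #   one or more spaces (a bare space is ignored in verbose mode)
      |           #   or
      \B          #   a position that is not a word boundary
      |           #   or
      \Z          #   the end of the string
    )
""", re.VERBOSE)

text = ("First second third fourth fifth "
        "sixthsixthsixthsixthsixthsixthsixth seventh")

# Collect group 1 of every match; the separator part is left out.
chunks = [m.group(1) for m in WRAP_VERBOSE.finditer(text)]
print(chunks)
```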