Splitting a string drops the word used to split it

Time: 2019-11-08 06:13:53

Tags: ruby

I have a string:

a="Tamilnadu is far away from Kashmir"

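If I split this string on "Tamilnadu", Tamilnadu is not part of the resulting array; I find an empty string in its place. Similarly, if I split on "away", away is absent from the resulting array. What should I do to get the word itself in the array instead of an empty string?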

Example:

a="Tamilnadu is far away from Kashmir"

p a.split("Tamilnadu")

The output is

["", " is far away from Kashmir"]

But I want

["Tamilnadu", " is far away from Kashmir"]

2 answers:

Answer 0: (score: 3)

From the documentation:

  

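If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters. If pattern contains groups, the respective matches will be returned in the array as well.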

So, to split on "Tamilnadu" and keep it in the list, make it a capture group:

"Tamilnadu is far away from Kashmir".split(/(Tamilnadu)/)
# => ["", "Tamilnadu", " is far away from Kashmir"]
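The capture-group result still starts with an empty string, because "Tamilnadu" sits at the very beginning and nothing precedes it. As a minimal sketch (the `reject` step is an addition here, not part of the answer above), the empty entries can be filtered out afterwards:

```ruby
a = "Tamilnadu is far away from Kashmir"

# The capture group keeps the delimiter in the result;
# reject(&:empty?) then drops the leading "" left by the match at index 0.
parts = a.split(/(Tamilnadu)/).reject(&:empty?)
p parts   # => ["Tamilnadu", " is far away from Kashmir"]

# A delimiter in the middle of the string produces no empty entry at all.
middle = a.split(/(away)/)
p middle  # => ["Tamilnadu is far ", "away", " from Kashmir"]
```

Note that the surrounding spaces stay attached to the neighbouring pieces; call `strip` on each element if they are unwanted.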

Or, if you want to split after "Tamilnadu", use a lookbehind to make a zero-width match right after it:

"Tamilnadu is far away from Kashmir".split(/(?<=Tamilnadu)/)
# => ["Tamilnadu", " is far away from Kashmir"]
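The same zero-width idea works for a word in the middle of the string, and a lookahead gives the mirror-image split just before the word. The following is only an illustrative sketch using the asker's other example word, "away":

```ruby
a = "Tamilnadu is far away from Kashmir"

# Lookbehind: the split point is the empty position right after "away",
# so the word stays at the end of the first piece.
after_word = a.split(/(?<=away)/)
p after_word   # => ["Tamilnadu is far away", " from Kashmir"]

# Lookahead: the split point is the empty position right before "away",
# so the word stays at the start of the second piece.
before_word = a.split(/(?=away)/)
p before_word  # => ["Tamilnadu is far ", "away from Kashmir"]
```

Because the match is zero-width, no characters are consumed and nothing is lost from the string.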

Answer 1: (score: 1)

If you don't know where "Tamilnadu" is in the string, but want to split before and after it and have no empty strings in the resulting array, you can use String#scan:

def split_it(str, substring)
  str.scan(/\A.+(?= #{substring}\b)|\b#{substring}\b|(?<=\b#{substring} ).+/)
end

substring = "Tamilnadu"

split_it("Tamilnadu is far away from Kashmir", substring)
  #=> ["Tamilnadu", "is far away from Kashmir"] 
split_it("Far away is Tamilnadu from Kashmir", substring)
  #=> ["Far away is", "Tamilnadu", "from Kashmir"] 
split_it("Far away from Kashmir is Tamilnadu", substring)
  #=> ["Far away from Kashmir is", "Tamilnadu"] 
split_it("Far away is Daluth from Kashmir", substring)
  #=> []
split_it("Far away is Tamilnaduland from Kashmir", substring)
  #=> []

I have assumed that substring appears at most once in the string.
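If that assumption does not hold, or if substring may contain regex metacharacters, one possible alternative is to combine String#split with a capture group. This is a sketch rather than the method above: `split_keep` is a hypothetical helper name, and unlike `split_it` it returns the whole string in a one-element array when there is no match. It also assumes substring begins and ends with word characters, so that `\b` behaves as intended.

```ruby
# Hypothetical helper: split on whole-word occurrences of substring,
# keep the occurrences, trim surrounding spaces, drop empty pieces.
def split_keep(str, substring)
  # Regexp.escape guards against metacharacters such as "." or "+".
  pattern = /\b(#{Regexp.escape(substring)})\b/
  str.split(pattern).map(&:strip).reject(&:empty?)
end

p split_keep("Tamilnadu is far away from Kashmir", "Tamilnadu")
  #=> ["Tamilnadu", "is far away from Kashmir"]
p split_keep("Far away is Tamilnadu from Kashmir", "Tamilnadu")
  #=> ["Far away is", "Tamilnadu", "from Kashmir"]
p split_keep("Tamilnadu or Tamilnadu", "Tamilnadu")
  #=> ["Tamilnadu", "or", "Tamilnadu"]
p split_keep("Far away is Tamilnaduland from Kashmir", "Tamilnadu")
  #=> ["Far away is Tamilnaduland from Kashmir"]
```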

The regular expression can be written in free-spacing mode to make it self-documenting:

substring = "Tamilnadu"

/
\A.+                  # match the beginning of the string followed by > 0 characters     
(?=\ #{substring}\b)  # match the value of substring preceded by a space and
                      # followed by a word break, in a positive lookahead
|                     # or
\b#{substring}\b      # match the value of substring with a word break before and after
|                     # or
(?<=\b#{substring}\ ) # match the value of substring preceded by a word break 
                      # and followed by a space, in a positive lookbehind
.+                    # match > 0 characters
/x                    # free-spacing regex definition mode
  #=>
  /
  \A.+                  # ...
  (?=\ Tamilnadu\b)     # ...
  |                     # ...
  \bTamilnadu\b         # ...
  |                     # ...
  (?<=\bTamilnadu\ )    # ...
  .+                    # ...
  /x

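Free-spacing mode removes all whitespace before the regular expression is parsed, including spaces that may be intended to be part of the expression. That is why I escaped the two spaces. I could instead have put each of them in a character class ([ ]), or used \s, [[:space:]] or \p{Space}, though those match any whitespace character, which is not quite the same thing.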
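A small sketch of that whitespace behaviour, using throwaway variable names that are not part of the answer:

```ruby
str = "is Tamilnadu"

# Unescaped space under /x is discarded, so this is the same as /Tamilnadu/.
unescaped = str =~ / Tamilnadu/x
# An escaped space, or a space inside a character class, is kept.
escaped   = str =~ /\ Tamilnadu/x
in_class  = str =~ /[ ]Tamilnadu/x

p unescaped  # => 3 (match starts at "T")
p escaped    # => 2 (match starts at the space)
p in_class   # => 2

# \s is broader than a literal space: it also matches a tab.
tab_with_s     = "a\tTamilnadu" =~ /\sTamilnadu/x
tab_with_space = "a\tTamilnadu" =~ /\ Tamilnadu/x
p tab_with_s      # => 1
p tab_with_space  # => nil
```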