I have a string
a="Tamilnadu is far away from Kashmir"
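If I split this string on "Tamilnadu", Tamilnadu is not part of the resulting array; I get an empty string in its place. Likewise, if I split the string on "away", away is not present in the resulting array. What should I do to get the text itself rather than an empty string?
Example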
a="Tamilnadu is far away from Kashmir"
p a.split("Tamilnadu")
the output is
["", " is far away from Kashmir"]
But I want
["Tamilnadu", " is far away from Kashmir"]
Answer 0 (score: 3)
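From the docs:
If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters. If pattern contains groups, the respective matches will be returned in the array as well.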
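The three behaviours described in that passage can be seen side by side in a minimal sketch. The strings below are illustrative only and do not come from the question.

```ruby
# A pattern that matches the empty string splits str into single characters.
chars = "abc".split(//)
p chars   # => ["a", "b", "c"]

# Without a group, the matched delimiter is discarded.
plain = "a,b".split(/,/)
p plain   # => ["a", "b"]

# With a capture group, the matched delimiter is returned in the array as well.
grouped = "a,b".split(/(,)/)
p grouped # => ["a", ",", "b"]
```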
So, to split on "Tamilnadu" and keep it in the list, make it a capture group:
"Tamilnadu is far away from Kashmir".split(/(Tamilnadu)/)
# => ["", "Tamilnadu", " is far away from Kashmir"]
Or, if you want to split after "Tamilnadu", use a lookbehind to make a zero-width match right after it:
"Tamilnadu is far away from Kashmir".split(/(?<=Tamilnadu)/)
# => ["Tamilnadu", " is far away from Kashmir"]
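If the capture-group form is preferred but the leading empty string is unwanted, one option is to filter the empties out afterwards. This is a minimal sketch rather than part of the answer above: `split_keeping` is a hypothetical helper name, and `Regexp.escape` is used on the assumption that the delimiter should be matched literally.

```ruby
# Hypothetical helper: split on a literal delimiter, keep the delimiter
# in the result, and drop any empty strings that split produces.
def split_keeping(str, delimiter)
  pattern = /(#{Regexp.escape(delimiter)})/
  str.split(pattern).reject(&:empty?)
end

a = "Tamilnadu is far away from Kashmir"
p split_keeping(a, "Tamilnadu") # => ["Tamilnadu", " is far away from Kashmir"]
p split_keeping(a, "away")      # => ["Tamilnadu is far ", "away", " from Kashmir"]
```

Unlike the lookbehind version, this also keeps the delimiter as its own element when it occurs in the middle of the string.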
Answer 1 (score: 1)
If you don't know where "Tamilnadu" is in the string, but want to split before and after it with no empty strings in the resulting array, you could use String#scan:
def split_it(str, substring)
  str.scan(/\A.+(?= #{substring}\b)|\b#{substring}\b|(?<=\b#{substring} ).+/)
end
substring = "Tamilnadu"
split_it("Tamilnadu is far away from Kashmir", substring)
#=> ["Tamilnadu", "is far away from Kashmir"]
split_it("Far away is Tamilnadu from Kashmir", substring)
#=> ["Far away is", "Tamilnadu", "from Kashmir"]
split_it("Far away from Kashmir is Tamilnadu", substring)
#=> ["Far away from Kashmir is", "Tamilnadu"]
split_it("Far away is Daluth from Kashmir", substring)
#=> []
split_it("Far away is Tamilnaduland from Kashmir", substring)
#=> []
I have assumed that substring appears at most once in the string.
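Under the same at-most-once assumption, `String#partition` may be a simpler option when word boundaries do not matter. The sketch below is an alternative technique, not the scan-based method above, and `partition_keeping` is a hypothetical name. It does not check word breaks, and it keeps the spaces next to the substring.

```ruby
# String#partition splits around the first occurrence of the separator and
# always returns three parts; empty parts are then filtered out.
def partition_keeping(str, substring)
  str.partition(substring).reject(&:empty?)
end

p partition_keeping("Far away is Tamilnadu from Kashmir", "Tamilnadu")
# => ["Far away is ", "Tamilnadu", " from Kashmir"]
p partition_keeping("Tamilnadu is far away from Kashmir", "Tamilnadu")
# => ["Tamilnadu", " is far away from Kashmir"]
p partition_keeping("Far away is Daluth from Kashmir", "Tamilnadu")
# => ["Far away is Daluth from Kashmir"]
```

When the substring is absent, this returns the whole string as a single element, whereas the scan-based version returns an empty array.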
The regular expression can be written in free-spacing mode to make it self-documenting:
substring = "Tamilnadu"
/
  \A.+                   # match the beginning of the string followed by > 0 characters
  (?=\ #{substring}\b)   # match the value of substring preceded by a space and
                         # followed by a word break, in a positive lookahead
  |                      # or
  \b#{substring}\b       # match the value of substring with a word break before and after
  |                      # or
  (?<=\b#{substring}\ )  # match the value of substring preceded by a word break
                         # and followed by a space, in a positive lookbehind
  .+                     # match > 0 characters
/x                       # free-spacing regex definition mode
#=>
/
  \A.+                   # ...
  (?=\ Tamilnadu\b)      # ...
  |                      # ...
  \bTamilnadu\b          # ...
  |                      # ...
  (?<=\bTamilnadu\ )     # ...
  .+                     # ...
/x
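Free-spacing mode strips all whitespace before the regular expression is parsed, including spaces that are meant to be part of the expression. That is why I escaped the two spaces. I could instead have put each one in a character class ([ ]), or used \s, [[:space:]] or \p{Space}, though those match any whitespace character, so they are not quite equivalent.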