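I'm trying to come up with a regular expression that splits a string into words, where a word is defined as either a sequence of characters surrounded by whitespace or a sequence enclosed in double quotes. I'm using String#scan.
For example, the string: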
' hello "my name" is "Tom"'
should match the words:
hello
my name
is
Tom
I managed to match the parts enclosed in double quotes with:
/"([^\"]*)"/
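But I can't figure out how to also match the whitespace-delimited words so that I get 'hello', 'is', and 'Tom' without messing up 'my name'.
Any help with this would be greatly appreciated!
Answer 0: (score: 22)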
result = ' hello "my name" is "Tom"'.split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/)
will do the job for you. It prints
=> ["", "hello", "\"my name\"", "is", "\"Tom\""]
Just ignore the empty string.
Explanation
\s # Match a single whitespace character (space, tab, or line break)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
(?: # Match the regular expression below
[^"] # Match any character that is NOT a "
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
" # Match the character " literally
[^"] # Match any character that is NOT a "
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
" # Match the character " literally
)* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
[^"] # Match any character that is NOT a "
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ # Assert position at the end of a line (at the end of the string or before a line break character)
)
In other words, the string is split on whitespace only where an even number of double quotes remains ahead, that is, where the whitespace lies outside any quoted section.
You can use reject like this to avoid the empty string:
result = ' hello "my name" is "Tom"'
.split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/).reject {|s| s.empty?}
which prints
=> ["hello", "\"my name\"", "is", "\"Tom\""]
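Note that the quoted words still carry their double quotes, whereas the question asks for my name and Tom without them. A minimal sketch of one way to strip them after splitting follows; the method name split_words is hypothetical and not part of the answer above, and it assumes quotes are balanced and never escaped inside a word.

```ruby
# Hypothetical helper (the name is an assumption for illustration only).
# Assumes balanced, unescaped double quotes in the input.
def split_words(text)
  text
    .split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/) # split on whitespace outside quotes
    .reject(&:empty?)                         # drop the leading empty string
    .map { |s| s.gsub(/\A"|"\z/, '') }        # strip one enclosing quote per side
end

p split_words(' hello "my name" is "Tom"')
# => ["hello", "my name", "is", "Tom"]
```

The gsub only touches a quote at the very start or very end of each piece, so quotes appearing elsewhere in a piece are left alone.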
Answer 1: (score: 4)
text = ' hello "my name" is "Tom"'
text.scan(/\s*("([^"]+)"|\w+)\s*/).each {|match| puts match[1] || match[0]}
Produces:
hello
my name
is
Tom
Explanation:
0 or more spaces, followed by
either
some words in double quotes, OR
a single word,
followed by 0 or more spaces
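Because \w+ only matches letters, digits, and underscores, a word such as don't would be broken into don and t. The sketch below is a variation on the same scan idea, not the answer's own code: it collects the words into an array and treats any run of characters that are neither whitespace nor a double quote as a bare word. The method name scan_words is hypothetical, and unbalanced quotes are assumed not to occur.

```ruby
# Hypothetical variation on the scan approach (name chosen for illustration).
# Group 1 captures the inside of a quoted section, group 2 a bare word.
def scan_words(text)
  text
    .scan(/"([^"]*)"|([^\s"]+)/)
    .map { |quoted, bare| quoted || bare } # exactly one group is non-nil per match
end

p scan_words(' hello "my name" is "Tom"')
# => ["hello", "my name", "is", "Tom"]
p scan_words(%q{don't "stop now"})
# => ["don't", "stop now"]
```

Since scan skips over anything the pattern cannot match, the surrounding \s* is not needed here.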
Answer 2: (score: 1)