irb(main):161:0> "Ready for your my next session?".scan(/[A-Za-z]+|\d+|. /)
=> ["Ready", "for", "your", "my", "next", "session"]
=> ["Ready", "for", "your", "my", "next", "session", "?"] #==> EXPECTED
irb(main):162:0> "yo mr. menon how are you? call at 9 a.m. \"okay\"".scan(/[A-Za-z]+|\d+|. /)
=> ["yo", "mr", ". ", "menon", "how", "are", "you", "? ", "call", "at", "9", "a", "m", ". ", "okay"]
=> ["yo", "mr", ". ", "menon", "how", "are", "you", "? ", "call", "at", "9", "a", ".", "m", ".", "``", "okay", "''"] #==> EXPECTED
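I am trying to use scan(/[A-Za-z]+|\d+|. /) to tokenize a string, punctuation included, even when the string contains escaped quotes (\").
But it behaves differently depending on the structure of the string. How can I fix this?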
Answer 0 (score: 1)
r = /
(?: # begin a non-capture group
\"? # optionally (?) match a double-quote
\p{alpha}+ # match one or more letters
\"? # optionally (?) match a double-quote
) # end non-capture group
| # or
\d+ # match one or more digits
| # or
[.,?!:;] # match a punctuation mark
/x # free-spacing regex definition mode
"yo mr. menon how are you? call at 9 a.m. \"okay\"".scan(r)
#=> ["yo", "mr", ".", "menon", "how", "are", "you", "?", "call", "at", "9",
# "a", ".", "m", ".", "\"okay\""]
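puts "\"okay\""
# "okay"
# (the backslashes only escape the quotes inside the string literal;
# the string itself contains just the two double-quote characters)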
The regular expression is conventionally written on one line:
/(?:\"?\p{alpha}+\"?)|\d+|[.,?!:;]/
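
As a quick check, here is a minimal sketch that runs the one-line pattern against both strings from the question and compares it with the original pattern. The names TOKEN, ORIGINAL and tokenize are introduced here purely for illustration and are not part of the question or the answer. It also shows why the original dropped the final "?": the alternative `. ` only matches a character that is followed by a space, and the last "?" is at the end of the string.

```ruby
# One-line form of the answer's pattern (TOKEN is a hypothetical name).
TOKEN = /(?:"?\p{alpha}+"?)|\d+|[.,?!:;]/

# The pattern from the question, kept for comparison (hypothetical name).
ORIGINAL = /[A-Za-z]+|\d+|. /

# Hypothetical helper: split a string into word, number and punctuation tokens.
def tokenize(str)
  str.scan(TOKEN)
end

p "Ready for your my next session?".scan(ORIGINAL)
# => ["Ready", "for", "your", "my", "next", "session"]
# The trailing "?" is lost because `. ` requires a space after the character.

p tokenize("Ready for your my next session?")
# => ["Ready", "for", "your", "my", "next", "session", "?"]

p tokenize("yo mr. menon how are you? call at 9 a.m. \"okay\"")
# => ["yo", "mr", ".", "menon", "how", "are", "you", "?", "call", "at", "9",
#     "a", ".", "m", ".", "\"okay\""]
```

Note that this pattern keeps the quotes attached to the word ("\"okay\""), whereas the question's expected output lists them as separate tokens; characters outside the listed punctuation set are skipped silently.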