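I'm splitting a search string so that I can highlight its terms with the Rails highlight helper. In some cases the same search term contains both exact-match phrases (in double quotes) and single words, and I'm trying to write a regex that handles both in a single pass.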
search_term = 'pizza cheese "ham and pineapple" pepperoni'
search_term.split(/\W+/)
=> ["pizza", "cheese", "ham", "and", "pineapple", "pepperoni"]
search_term.split(/(?=\")\W+/)
=> ["pizza cheese ", "ham and pineapple", "pepperoni"]
I can get ham and pineapple on its own (without the extra quotes), and I can easily split out all the words, but is there a regex that would return an array like this:
search_term.split(/???/)
=> ["pizza", "cheese", "ham and pineapple", "pepperoni"]
Answer 0 (score: 4)
Yes:
/"[^"]*?"|\w+/
https://regex101.com/r/fzHI4g/2
Don't split. Just collect either a quoted phrase or a single word... each one is a match.
£ cat pizza
pizza "a and b" pie
£ ruby -ne 'print $_.scan(/"[^"]*?"|\w+/)' pizza
["pizza", "\"a and b\"", "pie"]
£
So... search_term.scan(/regex/) seems to return the array you want.
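Applied to the question's own search_term, a minimal plain-Ruby sketch (no Rails needed) shows the quoted phrase coming back as a single element, still wrapped in its quotes:

```ruby
search_term = 'pizza cheese "ham and pineapple" pepperoni'

# Each match is either a whole quoted phrase (quotes included) or one word.
# [^"]*? cannot cross a double quote, so a phrase ends at its closing quote.
terms = search_term.scan(/"[^"]*?"|\w+/)

p terms
# => ["pizza", "cheese", "\"ham and pineapple\"", "pepperoni"]
```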
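To exclude the quotes, you need:
/(?<=")\w[^"]*?(?=")|\w+/
This puts the quotes in lookarounds: it asserts that the matched text is preceded by a quote (lookbehind) and followed by one (lookahead) instead of including the quotes in the match.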
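A quick check of that pattern on the question's search_term, as a plain-Ruby sketch with each lookaround piece annotated:

```ruby
search_term = 'pizza cheese "ham and pineapple" pepperoni'

# (?<=") : the match must start right after a double quote (lookbehind)
# \w     : ...and begin with a word character, which rules out starting
#          after a closing quote that is followed by a space
# [^"]*? : lazily take non-quote characters
# (?=")  : ...up to, but not including, the next double quote (lookahead)
# \w+    : otherwise, fall back to a single word
terms = search_term.scan(/(?<=")\w[^"]*?(?=")|\w+/)

p terms
# => ["pizza", "cheese", "ham and pineapple", "pepperoni"]
```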
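Note that because this last regex doesn't consume the quotes, it relies on the whitespace around them to tell an opening quote from a closing one (the character after an opening quote must be a word character), so " a bear" won't work. That can be solved with capture groups, but if it is a problem, as I said in the comments, I'd suggest using the regex at the top of this answer and just trimming the quotes off each array element.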
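To make that concrete, here is a small sketch on a hypothetical input with a padded phrase, 'pizza " a bear" pie'. It shows the lookaround regex splitting the phrase apart, then the two fixes mentioned above: trimming the quotes after using the top regex, and one possible capture-group variant (my own pattern, not necessarily the one the answerer had in mind). The variable names are illustrative only:

```ruby
padded = 'pizza " a bear" pie'

# Lookaround version: the opening quote is followed by a space, so \w fails
# there and the phrase falls apart into single words.
lookaround = padded.scan(/(?<=")\w[^"]*?(?=")|\w+/)

# Option 1: keep the quotes in the match, then trim one leading and one
# trailing double quote from each element (delete_prefix/delete_suffix need Ruby 2.5+).
trimmed = padded.scan(/"[^"]*?"|\w+/).map { |t| t.delete_prefix('"').delete_suffix('"') }

# Option 2: capture groups. With groups in the regex, String#scan yields one
# array per match, [quoted_text, nil] or [nil, word]; keep whichever is set.
grouped = padded.scan(/"([^"]*)"|(\w+)/).map { |quoted, word| quoted || word }

p lookaround # => ["pizza", "a", "bear", "pie"]
p trimmed    # => ["pizza", " a bear", "pie"]
p grouped    # => ["pizza", " a bear", "pie"]
```

Both fixes keep the padding inside the quotes (" a bear"); appending .strip inside the map would drop it if you don't want it highlighted.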
Answer 1 (score: 1)
r = /
(?<=\") # match a double quote in a positive lookbehind
(?!\s) # next char cannot be a whitespace, negative lookahead
[^"]+ # match one or more characters other than double-quote
(?<!\s) # previous char cannot be a whitespace, negative lookbehind
(?=\") # match a double quote in a positive lookahead
| # or
\w+ # match one or more word characters
/x # free-spacing regex definition mode
str = 'pizza "ham and pineapple" mushroom pepperoni "sausage and anchovies"'
str.scan r
#=> ["pizza", "ham and pineapple", "mushroom", "pepperoni", "sausage and anchovies"]
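The comments in r explain what each piece does but not why the whitespace guards are there. A small sketch comparing a one-line copy of r with a variant that drops (?!\s) and (?<!\s) suggests the reason (the names guarded and unguarded are mine, for illustration): without the guards, the text between a closing quote and the next opening quote also sits between two quotes, so it gets matched as a phrase.

```ruby
str = 'pizza "ham and pineapple" mushroom pepperoni "sausage and anchovies"'

# Same pattern as r, minus the (?!\s) and (?<!\s) guards.
unguarded = /(?<=")[^"]+(?=")|\w+/

# Same pattern as r, written on one line without the /x modifier.
guarded = /(?<=")(?!\s)[^"]+(?<!\s)(?=")|\w+/

p str.scan(unguarded)
# => ["pizza", "ham and pineapple", " mushroom pepperoni ", "sausage and anchovies"]
p str.scan(guarded)
# => ["pizza", "ham and pineapple", "mushroom", "pepperoni", "sausage and anchovies"]
```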