Matching exact phrases and words in a regular expression

Time: 2018-08-17 01:49:47

Tags: ruby regex

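I'm splitting a search string so that I can highlight the terms with Rails Highlight. In some cases the search term contains a quoted phrase that has to match exactly, and I'm trying to write a regex that handles this in a single pass.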

search_term = 'pizza cheese "ham and pineapple" pepperoni'

search_term.split(/\W+/)
=> ["pizza", "cheese", "ham", "and", "pineapple", "pepperoni"]

search_term.split(/(?=\")\W+/)
=> ["pizza cheese ", "ham and pineapple", "pepperoni"]

I can get ham and pineapple on its own (without the unwanted quotes), and I can easily split out all the words, but is there a regex that returns an array like this:

search_term.split(...)
=> ["pizza", "cheese", "ham and pineapple", "pepperoni"]

2 Answers:

Answer 0 (score: 4)

Yes:

/"[^"]*?"|\w+/

https://regex101.com/r/fzHI4g/2

Not a split. Just scoop up either a quoted phrase or a single word; each one is a match.

$ cat pizza
pizza "a and b" pie
$ ruby -ne 'print $_.scan(/"[^"]*?"|\w+/)' pizza
["pizza", "\"a and b\"", "pie"]

So search_term.scan(/regex/) seems to return the array you want.
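As a quick check against the string from the question, here is a minimal sketch of that scan call. The constant and method names are made up for illustration; only String#scan and the regex come from this answer.

```ruby
# Hypothetical names; the regex is the one from the top of this answer.
TOKEN_RE = /"[^"]*?"|\w+/

def tokenize_keeping_quotes(search_term)
  # scan returns every match: either a whole quoted phrase (quotes included)
  # or a single run of word characters.
  search_term.scan(TOKEN_RE)
end

p tokenize_keeping_quotes('pizza cheese "ham and pineapple" pepperoni')
# => ["pizza", "cheese", "\"ham and pineapple\"", "pepperoni"]
```

The phrase comes back as one element, but still wrapped in its quotes.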

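To exclude the quotes, put them in lookarounds. The lookbehind asserts that the match is preceded by a quote and the lookahead asserts that it is followed by one, so the quotes themselves are not part of the match: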

/(?<=")\w[^"]*?(?=")|\w+/

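Note that because this last regex does not consume the quotes, it cannot tell an opening quote from a closing one by itself. It relies on whitespace instead: the phrase must start with a word character immediately after the opening quote, so " a bear" will not work. This could be solved with capture groups, but if it is a problem, as I said in the comments, I suggest simply trimming the quotes from each array element and using the regex at the top of this answer.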
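The trade-off can be sketched as follows. All names here are hypothetical. The first helper follows the "scan, then trim the quotes" suggestion and assumes Ruby 2.5 or later for delete_prefix and delete_suffix. The second is one possible capture-group variant, not necessarily the one the answer had in mind.

```ruby
QUOTED_OR_WORD = /"[^"]*?"|\w+/
LOOKAROUND     = /(?<=")\w[^"]*?(?=")|\w+/

# Scan with the first regex, then trim the surrounding quotes from each token.
def tokens_trimmed(search_term)
  search_term.scan(QUOTED_OR_WORD).map { |t| t.delete_prefix('"').delete_suffix('"') }
end

# Capture-group variant: with groups, scan yields [phrase, word] pairs in which
# only one entry is non-nil, so take whichever one matched.
def tokens_captured(search_term)
  search_term.scan(/"([^"]*)"|(\w+)/).map { |phrase, word| phrase || word }
end

padded = 'pizza " a bear" pie'
p padded.scan(LOOKAROUND)  # => ["pizza", "a", "bear", "pie"]  (the phrase is broken up)
p tokens_trimmed(padded)   # => ["pizza", " a bear", "pie"]
p tokens_captured(padded)  # => ["pizza", " a bear", "pie"]
```

Both helpers keep the padded phrase in one piece because the quotes are consumed as part of the match, so a closing quote can never be mistaken for an opening one.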

Answer 1 (score: 1)

r = /
    (?<=\") # match a double quote in a positive lookbehind
    (?!\s)  # next char cannot be a whitespace, negative lookahead
    [^"]+   # match one or more characters other than double-quote  
    (?<!\s) # previous char cannot be a whitespace, negative lookbehind
    (?=\")  # match a double quote in a positive lookahead
    |       # or
    \w+     # match one or more word characters
    /x      # free-spacing regex definition mode

str = 'pizza "ham and pineapple" mushroom pepperoni "sausage and anchovies"'

str.scan r
  #=> ["pizza", "ham and pineapple", "mushroom", "pepperoni", "sausage and anchovies"]