How to find a set of unique words within range of each other using a regex in Ruby?

Date: 2013-12-20 12:47:46

Tags: ruby regex jruby

I'm looking for a regular expression that meets the following requirements:

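1) It must act as an "AND" statement.

2) The two words should be within range of each other.

3) It must not count two occurrences of the same word.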
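To make the three requirements concrete, here is a plain-Ruby restatement without a regex, a sketch that can serve as a yardstick when testing a pattern. The method name `unique_pair_within?` is hypothetical, and splitting the text into lowercase `\w+` runs is an assumption; "within range" is read as "at most 3 other words in between", matching the `{0,3}` in the regex below.

```ruby
# Hypothetical reference check (not from the question): true when two
# *different* words from `words` (given in lowercase) occur with at most
# max_gap other words between them.
def unique_pair_within?(text, words, max_gap = 3)
  tokens = text.downcase.scan(/\w+/)
  tokens.each_with_index.any? do |tok, i|
    next false unless words.include?(tok)
    # Look at the next max_gap + 1 tokens: up to max_gap words in between,
    # plus the candidate second word itself.
    window = tokens[(i + 1)..(i + 1 + max_gap)].to_a
    window.any? { |t| words.include?(t) && t != tok }
  end
end

puts unique_pair_within?("The cat likes the dog.", %w[cat dog])
```

Unlike the regex, this compares whole words, so it would not count the "cat" inside a word like "concatenate".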

So far I have this working regex, which satisfies 1 and 2:

/(word1|word2)(?:\W+\w+){0,3}?\W+(word1|word2)/i

Example regex:
/(cat|dog)(?:\W+\w+){0,3}?\W+(cat|dog)/i
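The `word1|word2` template can also be filled in from Ruby instead of by hand. This is a sketch with a hypothetical helper name, `proximity_regex`. It uses `Regexp.union(...).source` so the words are escaped and the outer `/i` flag still applies to them. Interpolating the `Regexp` object itself would embed `(?-mix:...)` and switch case-insensitivity off for that part.

```ruby
# Hypothetical helper: builds the question's proximity pattern for two words,
# allowing at most max_gap words between them.
def proximity_regex(word1, word2, max_gap = 3)
  # Regexp.union escapes each word; .source yields the bare "word1|word2" text.
  alt = Regexp.union(word1, word2).source
  /(#{alt})(?:\W+\w+){0,#{max_gap}}?\W+(#{alt})/i
end

pattern = proximity_regex("cat", "dog")
p pattern # the same pattern as the example regex above
```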

Strings that currently match:

  • The cat fears another cat.

  • The cat likes the dog.

  • The dog likes this cat.

  • The dog hates the dog.

Strings I don't want to match:

  • The cat fears another cat.

  • The dog hates the dog.

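A phrase such as "The cat fears another cat." matches this regex because the second group accepts either word, including cat. However, I don't want the second group to match the same word that the first group matched. I only want it to match the other word.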
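A quick way to see the problem is to run the current pattern over the four example sentences and print both captures. This is a sketch. The sentences are the English examples above, and the output format is arbitrary.

```ruby
# The question's current pattern: "cat" or "dog", then 0 to 3 intervening
# words, then "cat" or "dog" again. Nothing stops both groups being equal.
current = /(cat|dog)(?:\W+\w+){0,3}?\W+(cat|dog)/i

sentences = [
  "The cat fears another cat.",
  "The cat likes the dog.",
  "The dog likes this cat.",
  "The dog hates the dog."
]

sentences.each do |s|
  m = current.match(s)
  puts "#{s} -> #{m ? [m[1], m[2]].inspect : 'no match'}"
end
```

All four sentences match. In the first and last, both captures hold the same word, which is the case requirement 3 rules out.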

1 answer:

Answer 0 (score: 2)

How about:

/(cat|dog)(?:\W+\w+){0,3}?\W+(?!\1)(cat|dog)/
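Running the answer's pattern over the same sentences shows the negative lookahead at work. This sketch adds the question's `/i` flag, which is an assumption because the answer's pattern is case-sensitive.

```ruby
# The answer's pattern plus the question's /i flag (an assumption).
# (?!\1) refuses a second word equal to whatever group 1 captured.
unique_pair = /(cat|dog)(?:\W+\w+){0,3}?\W+(?!\1)(cat|dog)/i

examples = [
  "The cat fears another cat.", # same word twice: should be rejected
  "The cat likes the dog.",
  "The dog likes this cat.",
  "The dog hates the dog."      # same word twice: should be rejected
]

examples.each do |s|
  m = unique_pair.match(s)
  puts "#{s} -> #{m ? m.captures.join(' + ') : 'no match'}"
end
```

Only the two mixed sentences should report a match. For the other two, the lazy `{0,3}?` tries every allowed gap, but each time the lookahead blocks the repeated word.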

Explanation:

The regular expression:

(?-imsx:(cat|dog)(?:\W+\w+){0,3}?\W+(?!\1)(cat|dog))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    cat                      'cat'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    dog                      'dog'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture (between 0 and 3
                           times (matching the least amount
                           possible)):
----------------------------------------------------------------------
    \W+                      non-word characters (all but a-z, A-Z,
                             0-9, _) (1 or more times (matching the
                             most amount possible))
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  ){0,3}?                  end of grouping
----------------------------------------------------------------------
  \W+                      non-word characters (all but a-z, A-Z,
                           0-9, _) (1 or more times (matching the
                           most amount possible))
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    \1                       what was matched by capture \1
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    cat                      'cat'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    dog                      'dog'
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
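The node-by-node table above maps directly onto Ruby's extended (`x`) mode, where whitespace in the pattern is ignored and `#` starts a comment, so the regex can carry its own explanation. This is a sketch. The constant name `UNIQUE_PAIR` is made up, and the `i` flag is added as in the question.

```ruby
# Hypothetical constant name. Same pattern as the answer, written with the
# x flag so each piece can be commented, plus the question's i flag.
UNIQUE_PAIR = /
  (cat|dog)          # group 1: the first word
  (?:\W+\w+){0,3}?   # 0 to 3 intervening words, as few as possible
  \W+                # separator before the second word
  (?!\1)             # the second word must differ from group 1
  (cat|dog)          # group 2: the second word
/xi

puts UNIQUE_PAIR.match?("The cat likes the dog.")
puts UNIQUE_PAIR.match?("The dog hates the dog.")
```

This prints `true` and then `false`. Comments inside an `x` regex literal must not contain `/` or `#{`, because Ruby would read them as the end of the literal or as interpolation.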