Question

I need to capture certain words that a text may contain. For example, I need to capture Microsoft and office if they are present in this text:

Microsoft have lunched her product office in 2003

I am using this regular expression:

(?mix:(microsoft).{1,100}(office)?.{0,100}(2003)?)

But it does not capture office. It treats office as part of the up-to-100 characters in between.

Answer 1

This may work for you:

regexp = /\A(Microsoft).*?(office)?(?:\s+\w+\s+)?(\d+)?\Z/i

http://rubular.com/r/K8SsjJcfjT

Answer 2

Safouen, here are two options.

First, if you want to keep a limit of 100 characters as before, use:

(?mi)(microsoft).{1,100}?(?:(office).{1,100}?(?:(2003)|$)|$)

It captures microsoft, plus office and 2003 if they are present. There are various ways to write this; it is just the one that came to mind.
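As a rough sketch of the difference, the snippet below runs the pattern from the question and the bounded lazy pattern above against the sample sentence. The variable names (`text`, `greedy`, `lazy`, `greedy_groups`, `lazy_groups`) are only illustrative. The greedy `.{1,100}` consumes the rest of the sentence, so the optional groups that follow have nothing left to match; the lazy `.{1,100}?` grows one character at a time and stops as soon as office can match.

```ruby
text = "Microsoft have lunched her product office in 2003"

# Pattern from the question: greedy .{1,100} runs to the end of the text,
# so (office)? and (2003)? are skipped and stay nil.
greedy = /(?mix:(microsoft).{1,100}(office)?.{0,100}(2003)?)/

# Bounded lazy pattern from this answer.
lazy = /(?mi)(microsoft).{1,100}?(?:(office).{1,100}?(?:(2003)|$)|$)/

greedy_groups = text.match(greedy).captures
lazy_groups   = text.match(lazy).captures

p greedy_groups  # => ["Microsoft", nil, nil]
p lazy_groups    # => ["Microsoft", "office", "2003"]
```

Note that in the lazy pattern the 2003 group sits inside the office branch, so 2003 is only captured when office was found first.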

Second, if you don't care how many characters lie in between, just replace {1,100} with *:

(?mi)(microsoft).*?(?:(office).*?(?:(2003)|$)|$)

To check for matches in Ruby, it might look like this:

subject.scan(/(?mi)(microsoft).*?(?:(office).*?(?:(2003)|$)|$)/) {|result|
    # If the regex has capturing groups, result is an array with the text matched by each group (but without the overall match)
    # If the regex has no capturing groups, result is a string with the overall regex match
}
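For illustration, here is a minimal runnable variant of the loop above, assuming a hypothetical two-line `subject`. The names `pattern`, `rows`, `company`, `product`, and `year` are made up for this sketch. Groups that did not take part in a match come back as `nil`.

```ruby
# Hypothetical input: one line without "office", one line with it.
subject = "Microsoft Word 2003\nMicrosoft have lunched her product office in 2003"
pattern = /(?mi)(microsoft).*?(?:(office).*?(?:(2003)|$)|$)/

# scan yields one array of group captures per match.
rows = subject.scan(pattern).map do |company, product, year|
  { company: company, product: product, year: year }
end

rows.each { |row| p row }
```

On the first line the lazy `.*?` reaches the end of the line before office is found, so the match ends at `$` and both the office and 2003 groups are `nil`, even though 2003 appears in that line.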

Let me know if you have questions.

Regex does not capture optional groups
