Question

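I'd like to know whether I can use a regular expression to match a string that contains a variable number of matches.

The strings I want to parse look like this: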

'Every 15th of the month'
'Every 21st and 28th of the month'
'Every 21st, 22nd and 28th of the month'

ad infinitum...

I'd like to be able to capture the ordinals (15th, 21st, etc.).

The language I'm using is Ruby, for what it's worth.

Thanks, Alex

Answer 1

You can use scan to capture them into an array; it collects every match of the regex in the string:

irb(main):001:0> s = 'every 15th of the month'
=> "every 15th of the month"
irb(main):003:0> s2 = 'every 21st and 28th of the month'
=> "every 21st and 28th of the month"
irb(main):004:0> s3 = 'every 21st, 22nd, and 28th of the month'
=> "every 21st, 22nd, and 28th of the month"
irb(main):006:0> myarray = s3.scan(/(\d{1,2}(?:st|nd|rd|th))/)
=> [["21st"], ["22nd"], ["28th"]]
irb(main):007:0> myarray = s2.scan(/(\d{1,2}(?:st|nd|rd|th))/)
=> [["21st"], ["28th"]]
irb(main):008:0> myarray = s.scan(/(\d{1,2}(?:st|nd|rd|th))/)
=> [["15th"]]
irb(main):009:0>

Of course, you can then access each match with the usual myarray[index] notation (or loop over all of the matches, etc.).
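One thing to note about the session above: because the pattern contains a capture group, scan returns each match wrapped in its own one-element array. A minimal sketch of two ways to get a flat list instead, either by dropping the capture group or by calling flatten (the variable names here are just for illustration):

```ruby
s3 = 'every 21st, 22nd, and 28th of the month'

# With no capture group in the pattern, scan returns plain strings.
ordinal_re = /\d{1,2}(?:st|nd|rd|th)/
ordinals = s3.scan(ordinal_re)
# => ["21st", "22nd", "28th"]

# Keeping the capture group and flattening gives the same list.
flattened = s3.scan(/(\d{1,2}(?:st|nd|rd|th))/).flatten

# String#to_i reads the leading digits, so "21st".to_i is 21.
days = ordinals.map(&:to_i)
# => [21, 22, 28]
```

From there, days[0] or days.each work as with any other array.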

Edit: based on your comment, I would do something like this:

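require 'active_support/inflector'  # provides ordinalize
# "1st" through "31st": only valid days of the month will match
ORDINALS = (1..31).map { |n| ActiveSupport::Inflector.ordinalize(n) }
DAY_OF_MONTH_REGEX = /(#{ORDINALS.join('|')})/i
myarray = string.scan(DAY_OF_MONTH_REGEX)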
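If pulling in ActiveSupport just for ordinalize isn't appealing, the same list can be built in plain Ruby. This is only a rough sketch: ordinal_suffix is a hypothetical helper of my own, not part of any library, and the \b word boundaries are an extra I've added so that something like "121st" doesn't yield a stray "21st".

```ruby
# Hypothetical helper: English ordinal suffix for a positive integer.
def ordinal_suffix(n)
  # 11, 12 and 13 take "th" even though they end in 1, 2 and 3.
  return 'th' if (11..13).cover?(n % 100)
  { 1 => 'st', 2 => 'nd', 3 => 'rd' }.fetch(n % 10, 'th')
end

# "1st" .. "31st", the valid days of a month.
ORDINALS = (1..31).map { |n| "#{n}#{ordinal_suffix(n)}" }

# Non-capturing group, so scan returns a flat array of strings.
DAY_OF_MONTH_REGEX = /\b(?:#{ORDINALS.join('|')})\b/i

matches = 'Every 21st, 22nd and 28th of the month'.scan(DAY_OF_MONTH_REGEX)
# => ["21st", "22nd", "28th"]

# Out-of-range ordinals are simply not matched.
none = 'Every 32nd of the month'.scan(DAY_OF_MONTH_REGEX)
# => []
```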

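The only thing that will really trip this up is an ordinal that appears in some other phrase. Trying to be much stricter than this would probably get pretty ugly, since you'd have to cover a bunch of different cases. It might be possible to come up with something... but it probably wouldn't be worth it. If you want to parse the strings with really fine-grained control and a variable amount of text, then honestly this probably just isn't a job for regular expressions. It's hard to say for sure without knowing what format the lines are in, whether they come from a file with other similar lines, whether you have any control over the format/content of the strings, etc.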
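If the lines really do always follow the "Every ... of the month" shape from the question, one possible middle ground is to check the overall shape of the line with an anchored regex first and only then scan for the ordinals. This is a sketch under that assumption; LINE_SHAPE and days_of_month are names I made up, and the exact wording it accepts (the commas, the "and") is a guess based on the three sample strings.

```ruby
ORDINAL = /\d{1,2}(?:st|nd|rd|th)/

# One ordinal, then any number of ", <ordinal>" or "[,] and <ordinal>" parts.
LINE_SHAPE = /\Aevery #{ORDINAL}(?:(?:,\s*|,?\s+and\s+)#{ORDINAL})*\s+of the month\z/i

# Returns the day numbers, or an empty array if the line has another shape.
def days_of_month(line)
  return [] unless line.match?(LINE_SHAPE)
  line.scan(ORDINAL).map(&:to_i)
end

days_of_month('Every 15th of the month')
# => [15]
days_of_month('Every 21st, 22nd and 28th of the month')
# => [21, 22, 28]
days_of_month('the 3rd time every month')
# => []
```

This still accepts nonsense like "99th" or "22th", so it trades one kind of looseness for another; whether that matters depends on where the strings come from.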

Capturing a variable number of matches with a regular expression?
