How to match bar, b-a-r, b--a--r, etc. with a regex

Posted: 2014-05-03 06:42:36

Tags: ruby regex

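Given a string, I want to find the word bar, b-a-r, b--a--r, etc., where each - can be any letter. The spacing between the letters must be the same, though.

All letters are lowercase and there are no spaces between them.

For example, matches should be found in barbeayrqbowarprrwbxxxxxayyyyyrzzz.

I tried /b[a-z]*a[a-z]*r/, but that also matches bxar, which is wrong.

Can I achieve this with a regexp?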
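A quick Ruby check illustrates the problem. Each `*` may consume a different number of letters, so the two gaps are not tied together. A fixed count `{n}` does force equal gaps, but only for one spacing at a time. This is a sketch, and `equal_gap_regex` is a hypothetical helper introduced here for illustration.

```ruby
# The attempted pattern: each [a-z]* may consume a different number of letters.
loose = /b[a-z]*a[a-z]*r/
p "bxar" =~ loose   # prints 0: a match, although the gaps have 1 and 0 letters

# Hypothetical helper: exactly n letters between b and a, and between a and r.
def equal_gap_regex(n)
  /b[a-z]{#{n}}a[a-z]{#{n}}r/
end

p "bxar"  =~ equal_gap_regex(1)   # prints nil: the gaps are unequal
p "beayr" =~ equal_gap_regex(1)   # prints 0
p "bxar"  =~ equal_gap_regex(0)   # prints nil: with n = 0 this is just /bar/
```

Because each such regex fixes a single spacing, covering every spacing means trying each n in turn.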

2 answers:

Answer 0 (score: 1)

Here is one way to get all the matches.

Code

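def all_matches_with_spacers(word, str)
  word_size = word.size
  word_arr  = word.chars
  str_arr   = str.chars
  # n is the number of spacers between consecutive letters of word; the
  # upper bound is the largest n for which a match still fits inside str.
  (0..(str.size - word_size)/(word_size-1)).each_with_object([]) do |n, arr|
    # e.g. word = 'bar', n = 2  =>  /b.{2}a.{2}r/
    regex = Regexp.new(word_arr.join(".{#{n}}"))
    # Test every substring of str that has the length of a match with n spacers.
    str_arr.each_cons(word_size + n * (word_size - 1))
           .map(&:join)
           .each { |substring| arr << substring if substring =~ regex }
  end
end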

This requires word.size > 1.
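The restriction comes from the upper bound of the spacer range, which divides by `word_size - 1`. Below is a minimal sketch of the failure for a one-letter word. `max_spacers` is a hypothetical name for that one expression from the method above.

```ruby
# Hypothetical helper: the same bound used in all_matches_with_spacers.
def max_spacers(word_size, str_size)
  (str_size - word_size) / (word_size - 1)
end

p max_spacers(3, 9)   # prints 3 for a 3-letter word in a 9-letter string

begin
  max_spacers(1, 5)   # a one-letter word: word_size - 1 is 0
rescue ZeroDivisionError => e
  puts "word.size == 1 fails: #{e.message}"   # prints "word.size == 1 fails: divided by 0"
end
```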

Examples

all_matches_with_spacers('bar',  'bar')               #=> ["bar"]
all_matches_with_spacers('bar',  'beayr')             #=> ["beayr"]
all_matches_with_spacers('bar',  'qbowarprr')         #=> ["bowarpr"]
all_matches_with_spacers('bar',  'wbxxxxxayyyyyrzzz') #=> ["bxxxxxayyyyyr"]

all_matches_with_spacers('bobo', 'bobobocbcbocbcobcodbddoddbddobddoddbddob')
  #=> ["bobo", "bobo", "bddoddbddo", "bddoddbddo"]

Explanation

Suppose

word = 'bobo'
str =  'bobobocbcbocbcobcodbddoddbddobddoddbddob'

Then

word_size = word.size  #=> 4
word_arr  = word.chars #=> ["b", "o", "b", "o"]
str_arr = str.chars
  #=> ["b", "o", "b", "o", "b", "o", "c", "b", "c", "b", "o", "c", "b", "c",
  #    "o", "b", "c", "o", "d", "b", "d", "d", "o", "d", "d", "b", "d", "d",
  #    "o", "b", "d", "d", "o", "d", "d", "b", "d", "d", "o", "b"]

If n is the number of spacers between each pair of adjacent letters of word, we need

word.size + n * (word.size - 1) <= str.size

Hence (since str.size #=> 40),

n <= (str.size - word_size)/(word_size-1) #=> (40-4)/(4-1) #=> 12
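The same bound can be checked numerically. This sketch uses the values from the running example, plus a small lambda `span` (a name introduced here) that gives the length of a match with n spacers.

```ruby
word_size = 4    # 'bobo'
str_size  = 40   # 'bobobocbcbocbcobcodbddoddbddobddoddbddob'

# Length of a match of word with n spacers between consecutive letters.
span = ->(n) { word_size + n * (word_size - 1) }

# Integer division rounds down, giving the largest n whose span still fits.
max_n = (str_size - word_size) / (word_size - 1)
p max_n             # prints 12
p span.(max_n)      # prints 40: fits exactly
p span.(max_n + 1)  # prints 43: longer than str
```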

We therefore iterate over 0 to 12 spacers:

(0..12).each_with_object([]) do |n, arr| .. end

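Enumerable#each_with_object is passed an initially empty array, which the block variable arr refers to. Each_with_object returns that same array when the iteration finishes, so it becomes the method's return value. The first value passed to the block is zero (spacers), which is assigned to the block variable n.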
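Here is a minimal standalone example of `each_with_object`. The memo object passed in is handed to every iteration and is also what the call returns, so the answer's method needs no explicit accumulator variable or `return`.

```ruby
# The [] passed to each_with_object is the memo; the block appends to it,
# and each_with_object returns that same array when the range is exhausted.
squares = (0..3).each_with_object([]) do |n, arr|
  arr << n * n
end
p squares   # prints [0, 1, 4, 9]
```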

We then compute

regex = Regexp.new(word_arr.join(".{#{0}}")) #=> /b.{0}o.{0}b.{0}o/

which is the same as /bobo/. A substring of str that matches word with n spacers has length

word_size + n * (word_size - 1) #=> 4

To extract all subarrays of str_arr having this length, we invoke:

str_arr.each_cons(word_size + n * (word_size - 1))

Here, with n = 0, this is:

enum = str_arr.each_cons(4)
  #=> #<Enumerator: ["b", "o", "b", "o", "b", "o",...,"b"]:each_cons(4)>

This enumerator passes the following elements into its block:

enum.to_a
  #=> [["b", "o", "b", "o"], ["o", "b", "o", "b"], ["b", "o", "b", "o"],
  #    ["o", "b", "o", "c"], ["b", "o", "c", "b"], ["o", "c", "b", "c"],
  #    ["c", "b", "c", "b"], ["b", "c", "b", "o"], ["c", "b", "o", "c"],
  #    ["b", "o", "c", "b"], ["o", "c", "b", "c"], ["c", "b", "c", "o"],
  #    ["b", "c", "o", "b"], ["c", "o", "b", "c"], ["o", "b", "c", "o"],
  #    ...]

We next convert these to strings:

ar = enum.map(&:join)
  #=> ["bobo", "obob", "bobo", "oboc", "bocb", "ocbc", "cbcb", "bcbo",
  #    "cboc", "bocb", "ocbc", "cbco", "bcob", "cobc", "obco", ...]

and add to the array arr each one (assigned to the block variable substring) for which substring =~ regex is truthy:

ar.each { |substring| arr << substring if substring =~ regex }
arr #=> ["bobo", "bobo"]
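The `each_cons` / `join` / filter pipeline for one value of n can be tried on a short string. This sketch uses the hypothetical input "bobob". It also shows `Enumerable#grep`, which tests each element with `Regexp#===`, as an equivalent way to write the filter.

```ruby
regex = /b.{0}o.{0}b.{0}o/   # the n = 0 regex, equivalent to /bobo/

# each_cons(4) yields every run of 4 consecutive characters.
windows = "bobob".chars.each_cons(4).map(&:join)
p windows   # prints ["bobo", "obob"]

# Keep the windows the regex matches, as the answer's each block does...
p windows.select { |s| s =~ regex }   # prints ["bobo"]
# ...or, equivalently, with grep.
p windows.grep(regex)                 # prints ["bobo"]
```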

Next we increase the number of spacers to n = 1. This has the following effect:

regex = Regexp.new(word_arr.join(".{#{1}}")) #=> /b.{1}o.{1}b.{1}o/
str_arr.each_cons(4 + 1 * (4 - 1))           #=> str_arr.each_cons(7)

so we now examine the strings

ar = str_arr.each_cons(7).map(&:join)
  #=> ["boboboc", "obobocb", "bobocbc", "obocbcb", "bocbcbo", "ocbcboc",
  #    "cbcbocb", "bcbocbc", "cbocbco", "bocbcob", "ocbcobc", "cbcobco",
  #    "bcobcod", "cobcodb", "obcodbd", "bcodbdd", "codbddo", "odbddod",
  #    "dbddodd", "bddoddb", "ddoddbd", "doddbdd", "oddbddo", "ddbddob",
  #    "dbddobd", "bddobdd", "ddobddo", "dobddod", "obddodd", "bddoddb",
  #    "ddoddbd", "doddbdd", "oddbddo", "ddbddob"]

ar.each { |substring| arr << substring if substring =~ regex }

There are no matches with one spacer, so arr is unchanged:

arr #=> ["bobo", "bobo"]
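`Regexp.new` compiles the joined string, so the `.` and `{n}` pieces keep their regex meaning while the letters of word are inserted as-is. That is fine for lowercase letters, as in the question. A word containing a metacharacter would need `Regexp.escape`. The word "a.b" below is a hypothetical example.

```ruby
word_arr = "bobo".chars
p Regexp.new(word_arr.join(".{1}")).source   # prints "b.{1}o.{1}b.{1}o"

# Hypothetical word containing ".", which Regexp.new would treat as a wildcard.
letters = "a.b".chars
raw     = Regexp.new(letters.join(".{0}"))
escaped = Regexp.new(letters.map { |c| Regexp.escape(c) }.join(".{0}"))

p "axb" =~ raw       # prints 0: the unescaped "." matches the "x"
p "axb" =~ escaped   # prints nil: now only a literal "." matches
p "a.b" =~ escaped   # prints 0
```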

For n = 2 spacers:

regex = Regexp.new(word_arr.join(".{#{2}}")) #=> /b.{2}o.{2}b.{2}o/
str_arr.each_cons(4 + 2 * (4 - 1))           #=> str_arr.each_cons(10)
ar = str_arr.each_cons(10).map(&:join)
  #=> ["bobobocbcb", "obobocbcbo", "bobocbcboc", "obocbcbocb", "bocbcbocbc",
  #    "ocbcbocbco", "cbcbocbcob", "bcbocbcobc", "cbocbcobco", "bocbcobcod",
  #    ...
  #    "ddoddbddob"]
ar.each { |substring| arr << substring if substring =~ regex }
arr #=> ["bobo", "bobo", "bddoddbddo", "bddoddbddo"]
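To see where the two n = 2 matches come from, the windows can be paired with their starting positions in str. This is a sketch that calls `each_with_index` on the `each_cons` enumerator. It is not part of the answer's method.

```ruby
str   = 'bobobocbcbocbcobcodbddoddbddobddoddbddob'
regex = /b.{2}o.{2}b.{2}o/

# Pair each 10-character window with the index in str where it starts,
# then keep only the windows that match the n = 2 regex.
hits = str.chars.each_cons(10).each_with_index
          .select { |window, _i| window.join =~ regex }
          .map { |window, i| [i, window.join] }
p hits   # prints [[19, "bddoddbddo"], [29, "bddoddbddo"]]
```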

No matches are found for more than two spacers, so the method returns

["bobo", "bobo", "bddoddbddo", "bddoddbddo"]
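A more compact variant is possible, assuming overlapping matches are still wanted. Instead of slicing str with `each_cons`, `String#scan` can look for the n-spacer pattern inside a zero-width lookahead, so the engine tries every starting position. This is a swapped-in technique, not the answer's own code. `all_matches_via_scan` is a hypothetical name. Results are collected per n, from left to right, in the same order as the method above.

```ruby
def all_matches_via_scan(word, str)
  raise ArgumentError, "word must have at least 2 letters" if word.size < 2

  # Same bound as before: the largest spacer count that still fits in str.
  max_n = (str.size - word.size) / (word.size - 1)
  (0..max_n).flat_map do |n|
    # Letters joined by ".{n}"; Regexp.escape guards against metacharacters.
    body = word.chars.map { |c| Regexp.escape(c) }.join(".{#{n}}")
    # The lookahead consumes nothing, so scan tries every position and
    # overlapping matches are found; each result holds the single capture.
    str.scan(/(?=(#{body}))/).map(&:first)
  end
end

p all_matches_via_scan('bar', 'barbeayrqbowarprrwbxxxxxayyyyyrzzz')
# prints ["bar", "beayr", "bowarpr", "bxxxxxayyyyyr"]
```

This avoids building every window as an array of characters. The trade-off is that a capture inside a lookahead is a little less obvious to read than an explicit sliding window.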

Answer 1 (score: 0)

For reference, regex flavors that allow a capture group to refer to itself offer a neat solution to the overall problem:

^[^b]*bar|b(?:[^a](?=[^a]*a(\1?+.)))+a\1r

Sadly, Ruby does not allow this.

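The interesting part is the right-hand side of the alternation. After matching the initial b, we define a non-capturing group for the characters between b and a, and repeat that group with +. Between a and r, we inject capture group 1 with \1. This group is captured one character at a time. On each pass, \1?+ re-matches the previous capture and . adds one more character, so the group overwrites itself and grows by one character for every character between b and a.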
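Ruby cannot run that pattern, but the way group 1 evolves can be traced in plain Ruby. This sketch uses no regex and follows the right-hand alternative for the sample substring "bxxxxxayyyyyr". It models only this one successful match, not the engine's backtracking, and the variable names are introduced here for illustration.

```ruby
s      = "bxxxxxayyyyyr"
a_idx  = s.index("a")           # the first "a" after the leading "b" (index 6)
group1 = ""

(1...a_idx).each do |i|         # one pass per spacer between "b" and "a"
  # (?=[^a]*a(\1?+.)) : re-match the previous capture right after "a",
  # then extend it by one character; the new capture replaces the old one.
  group1 = s[a_idx + 1, group1.size + 1]
  puts "after spacer #{i}: group 1 = #{group1.inspect}"
end

# a\1r : after the spacers comes "a", then group 1 again, then "r".
matched = s[a_idx + 1, group1.size] == group1 && s[a_idx + 1 + group1.size] == "r"
p matched   # prints true
```

The loop prints group 1 as "y", "yy", and so on up to "yyyyy". The final capture therefore has as many characters as there are spacers between b and a, which is what forces r to sit at the same distance after a.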

See Quantifier Capture, where @CasimiretHippolyte demonstrates the solution and mentions the idea behind the technique, the "qtax trick".