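Given a string, I want to find the word bar, b-a-r, b--a--r, etc., where each - can be any letter, but the spacing between the letters must be the same.
All letters are lowercase and there are no gaps between them.
For example, bar, beayr, qbowarprr and wbxxxxxayyyyyrzzz should all match.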
I tried /b[a-z]*a[a-z]*r/, but it also matches bxar, which is wrong.
Is it possible to achieve this with a regexp?
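The attempt fails because each [a-z]* can consume a different number of letters. The sketch below contrasts it with a pattern whose two gaps are pinned to the same length n. fixed_gap is a hypothetical helper written only for this illustration; it handles one fixed n at a time, so it does not by itself answer the question for an unknown spacing.

```ruby
# The original attempt: the two [a-z]* gaps are independent of each other.
naive = /b[a-z]*a[a-z]*r/
puts naive.match?('bxar')            # true: gaps of 1 and 0 letters still match

# Hypothetical helper: both gaps must be exactly n letters long.
def fixed_gap(n)
  /b[a-z]{#{n}}a[a-z]{#{n}}r/
end

puts fixed_gap(1).match?('bxar')     # false: there is no letter between a and r
puts fixed_gap(1).match?('beayr')    # true: one letter in each gap
```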
Answer 0 (score: 1)
Here is one way to get all the matches.
Code
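def all_matches_with_spacers(word, str)
  word_size = word.size
  word_arr = word.chars
  str_arr = str.chars
  # n is the number of spacers between each pair of adjacent letters of word
  (0..(str.size - word_size)/(word_size-1)).each_with_object([]) do |n, arr|
    # e.g. for word = 'bar' and n = 2 this is /b.{2}a.{2}r/
    regex = Regexp.new(word_arr.join(".{#{n}}"))
    # every substring of str whose length equals the length of such a match
    str_arr.each_cons(word_size + n * (word_size - 1))
           .map(&:join)
           .each { |substring| arr << substring if substring =~ regex }
  end
end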
This requires word.size > 1.
Examples
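The restriction comes from the upper bound of the range, which divides by word_size - 1. A minimal sketch of what happens for a one-letter word (plain integer arithmetic with made-up sizes, not a call to the method itself):

```ruby
word_size = 1
str_size  = 5
begin
  (str_size - word_size) / (word_size - 1)
rescue ZeroDivisionError => e
  # Integer division by zero raises instead of returning a value.
  puts "ZeroDivisionError: #{e.message}"
end
```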
all_matches_with_spacers('bar', 'bar') #=> ["bar"]
all_matches_with_spacers('bar', 'beayr') #=> ["beayr"]
all_matches_with_spacers('bar', 'qbowarprr') #=> ["bowarpr"]
all_matches_with_spacers('bar', 'wbxxxxxayyyyyrzzz') #=> ["bxxxxxayyyyyr"]
all_matches_with_spacers('bobo', 'bobobocbcbocbcobcodbddoddbddobddoddbddob')
#=> ["bobo", "bobo", "bddoddbddo", "bddoddbddo"]
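If only a yes/no answer is needed for the question's test strings, the same per-spacing regex idea can stop at the first spacing that matches. This is a sketch, not part of the answer above: matches_with_equal_spacers? is a hypothetical name, and the Regexp.escape call is an added assumption for words that might contain regex metacharacters (the question's words are plain lowercase letters, where it changes nothing).

```ruby
# Hypothetical predicate: true if str contains the letters of word separated
# by the same number n (n >= 0) of arbitrary characters.
def matches_with_equal_spacers?(word, str)
  raise ArgumentError, 'word must have at least 2 characters' if word.size < 2
  max_n = (str.size - word.size) / (word.size - 1)
  # An empty range (str shorter than word) makes any? return false.
  (0..max_n).any? do |n|
    # Escape each character so that e.g. "." in word is matched literally.
    regex = Regexp.new(word.chars.map { |c| Regexp.escape(c) }.join(".{#{n}}"))
    regex.match?(str)
  end
end
```

For example, matches_with_equal_spacers?('bar', 'qbowarprr') is true (n = 2), while matches_with_equal_spacers?('bar', 'bxar') is false.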
Explanation
Suppose
word = 'bobo'
str = 'bobobocbcbocbcobcodbddoddbddobddoddbddob'
Then
word_size = word.size #=> 4
word_arr = word.chars #=> ["b", "o", "b", "o"]
str_arr = str.chars
#=> ["b", "o", "b", "o", "b", "o", "c", "b", "c", "b", "o", "c", "b", "c",
# "o", "b", "c", "o", "d", "b", "d", "d", "o", "d", "d", "b", "d", "d",
# "o", "b", "d", "d", "o", "d", "d", "b", "d", "d", "o", "b"]
If n is the number of spacers between each pair of adjacent letters of word, we require
word.size + n * (word.size - 1) <= str.size
Hence (since str.size #=> 40),
n <= (str.size - word_size)/(word_size-1) #=> (40-4)/(4-1) => 12
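The bound can be checked numerically for this example: Ruby's integer division floors, 12 spacers use exactly all 40 characters, and 13 would need 43.

```ruby
word_size = 4
str_size  = 40
max_n = (str_size - word_size) / (word_size - 1)   # integer division: 36 / 3
puts max_n                                          # 12
puts word_size + max_n * (word_size - 1)            # 40, fits exactly
puts word_size + (max_n + 1) * (word_size - 1)      # 43, longer than str
```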
We will therefore iterate over 0 to 12 spacers:
(0..12).each_with_object([]) do |n, arr| .. end
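Enumerable#each_with_object creates an initially empty array, represented by the block variable arr. The first value passed to the block is zero (spacers), which is assigned to the block variable n.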
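A small standalone illustration of each_with_object on a made-up range: the same memo object (here an array) is passed to the block for every element, and that object is what the call returns.

```ruby
squares = (0..3).each_with_object([]) do |n, arr|
  arr << n * n        # mutate the shared memo array
end
p squares             # [0, 1, 4, 9]
```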
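We then compute
regex = Regexp.new(word_arr.join(".{#{0}}")) #=> /b.{0}o.{0}b.{0}o/
which is the same as /bobo/. A match for word with n spacers between adjacent letters has length
word_size + n * (word_size - 1) #=> 4 (for n = 0)
To extract all subarrays of str_arr with this length, we call:
str_arr.each_cons(word_size + n * (word_size - 1))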
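The two ingredients can be seen on smaller, made-up inputs: joining the letters with ".{n}" builds the pattern for n spacers, and each_cons(k) yields size - k + 1 overlapping windows of length k.

```ruby
word_arr = %w[b o b o]

# Joining with ".{2}" puts exactly two arbitrary characters between letters.
regex = Regexp.new(word_arr.join(".{2}"))
puts regex.source                     # b.{2}o.{2}b.{2}o

# each_cons(3) slides a window of length 3 over the array, one step at a time.
windows = %w[a b c d e].each_cons(3).map(&:join)
p windows                             # ["abc", "bcd", "cde"]
puts windows.size == 5 - 3 + 1        # true
```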
Here, with n = 0, this is:
enum = str_arr.each_cons(4)
#=> #<Enumerator: ["b", "o", "b", "o", "b", "o",...,"b"]:each_cons(4)>
This enumerator passes the following elements into its block (only the first 15 of the 40 - 4 + 1 = 37 windows are shown):
enum.to_a
#=> [["b", "o", "b", "o"], ["o", "b", "o", "b"], ["b", "o", "b", "o"],
# ["o", "b", "o", "c"], ["b", "o", "c", "b"], ["o", "c", "b", "c"],
# ["c", "b", "c", "b"], ["b", "c", "b", "o"], ["c", "b", "o", "c"],
# ["b", "o", "c", "b"], ["o", "c", "b", "c"], ["c", "b", "c", "o"],
# ["b", "c", "o", "b"], ["c", "o", "b", "c"], ["o", "b", "c", "o"], ...]
We next convert these to strings:
ar = enum.map(&:join)
#=> ["bobo", "obob", "bobo", "oboc", "bocb", "ocbc", "cbcb", "bcbo",
# "cboc", "bocb", "ocbc", "cbco", "bcob", "cobc", "obco", ...]
and add each string (assigned to the block variable substring) for which substring =~ regex to the array arr:
ar.each { |substring| arr << substring if substring =~ regex }
arr #=> ["bobo", "bobo"]
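The whole n = 0 pass can be reproduced in isolation. This sketch swaps in Enumerable#grep (which tests each element with Regexp#===) for the explicit each/<< loop; the result is the same two "bobo" windows.

```ruby
str = 'bobobocbcbocbcobcodbddoddbddobddoddbddob'
windows = str.chars.each_cons(4).map(&:join)
puts windows.size                     # 37 windows of length 4
p windows.grep(/b.{0}o.{0}b.{0}o/)    # ["bobo", "bobo"]
```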
Next, we increase the number of spacers to n = 1. This has the following effect:
regex = Regexp.new(word_arr.join(".{#{1}}")) #=> /b.{1}o.{1}b.{1}o/
str_arr.each_cons(4 + 1 * (4 - 1)) #=> str_arr.each_cons(7)
so we now examine the strings
ar = str_arr.each_cons(7).map(&:join)
#=> ["boboboc", "obobocb", "bobocbc", "obocbcb", "bocbcbo", "ocbcboc",
# "cbcbocb", "bcbocbc", "cbocbco", "bocbcob", "ocbcobc", "cbcobco",
# "bcobcod", "cobcodb", "obcodbd", "bcodbdd", "codbddo", "odbddod",
# "dbddodd", "bddoddb", "ddoddbd", "doddbdd", "oddbddo", "ddbddob",
# "dbddobd", "bddobdd", "ddobddo", "dobddod", "obddodd", "bddoddb",
# "ddoddbd", "doddbdd", "oddbddo", "ddbddob"]
ar.each { |substring| arr << substring if substring =~ regex }
There are no matches with one spacer, so arr is unchanged:
arr #=> ["bobo", "bobo"]
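The n = 1 pass can be checked the same way (again using grep in place of the each/<< loop): the 34 windows of length 7 contain no match.

```ruby
str = 'bobobocbcbocbcobcodbddoddbddobddoddbddob'
windows = str.chars.each_cons(7).map(&:join)
puts windows.size                     # 34
p windows.grep(/b.{1}o.{1}b.{1}o/)    # []
```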
For n = 2 spacers:
regex = Regexp.new(word_arr.join(".{#{2}}")) #=> /b.{2}o.{2}b.{2}o/
str_arr.each_cons(4 + 2 * (4 - 1)) #=> str_arr.each_cons(10)
ar = str_arr.each_cons(10).map(&:join)
#=> ["bobobocbcb", "obobocbcbo", "bobocbcboc", "obocbcbocb", "bocbcbocbc",
# "ocbcbocbco", "cbcbocbcob", "bcbocbcobc", "cbocbcobco", "bocbcobcod",
# ...
# "ddoddbddob"]
ar.each { |substring| arr << substring if substring =~ regex }
arr #=> ["bobo", "bobo", "bddoddbddo", "bddoddbddo"]
No matches are found with more than two spacers, so the method returns
["bobo", "bobo", "bddoddbddo", "bddoddbddo"]
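As a cross-check of that conclusion, the following sketch tallies the matches for every spacer count from 0 to 12 on the same string, recording only the spacer counts that produce at least one match.

```ruby
word = 'bobo'
str  = 'bobobocbcbocbcobcodbddoddbddobddoddbddob'
max_n = (str.size - word.size) / (word.size - 1)   # 12

counts = (0..max_n).each_with_object({}) do |n, h|
  regex  = Regexp.new(word.chars.join(".{#{n}}"))
  length = word.size + n * (word.size - 1)
  hits   = str.chars.each_cons(length).map(&:join).grep(regex)
  h[n] = hits.size unless hits.empty?              # keep only counts with matches
end
p counts   # only keys 0 and 2, each with 2 matches
```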
Answer 1 (score: 0)
For reference, regex flavors that allow a capture group to refer to itself offer a neat single-regex solution to the overall problem:
^[^b]*bar|b(?:[^a](?=[^a]*a(\1?+.)))+a\1r
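The interesting part is the right-hand side of the alternation. After matching the initial b, we define a non-capturing group for the characters between b and a; this group is repeated with +. Between a and r, we inject capture group 1 with \1. Inside the repeated group, the lookahead (?=[^a]*a(\1?+.)) skips ahead to the a and recaptures group 1 as its previous contents plus one more character. So the group is built one character at a time, overwriting itself on each pass, and ends up exactly as long as the run of characters between b and a.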
See Quantifier Capture, where @CasimiretHippolyte demonstrated the solution; he refers to the idea behind the technique as the "qtax trick".
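For engines without self-referencing groups, the same length check can be written without a regex. This is a plain-loop sketch of the idea, not a translation of the pattern above: equal_gap_bar? is a hypothetical name, and unlike the regex it tries every a after each b (the regex's [^a] classes stop at the first a).

```ruby
# Hypothetical helper: true if some "b" is followed, after k >= 0 characters,
# by "a" and then, after another k characters, by "r".
def equal_gap_bar?(str)
  str.each_char.with_index.any? do |ch, i|
    next false unless ch == 'b'
    (i + 1...str.size).any? do |j|
      k = j - i - 1                           # characters between b and this a
      str[j] == 'a' && str[j + k + 1] == 'r'  # nil past the end, so no match
    end
  end
end
```

For example, equal_gap_bar?('wbxxxxxayyyyyrzzz') is true, while equal_gap_bar?('bxar') is false.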