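Suppose I have the string "IICCIICCIICBIICCIICDII".
The string has the format II[CBD][CBD]II[CBD][CBD]II..., which is a repeating pattern. I am trying to find all overlapping substrings that satisfy the following conditions:
(?<=[CBD]), (?=[CBD])
[C]{m, n}
What would the matches be? For example, for a pattern with at least 2 Cs: CIIC (three of them); with 2 Cs and 1 B: CIICBIIC, BIICCIIC.
I think my question is similar to the one referenced in one of the answers. I have looked at that question (titled "shortest repeating substring"). Mine differs in that the repeating pattern must contain a specific number of certain characters, whereas the referenced question only looks for the shortest repeating pattern. It was still useful, though.
Please let me know if the question is clear and not a duplicate. Thanks.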
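Answer 0: (score: 1)
Final analysis
Addressing your latest comment
As suspected, this cannot be done with a regex unless the engine can count. Specifically, the counters must be reset whenever the engine backtracks.
As far as I know, Perl is the only engine that can do this, so unfortunately this task cannot be done with a Python regex.
I am adding the Perl regex that does this below. It is included only to show the method, in case you want to accomplish the same task without a regex, which can certainly be done.
Sorry I couldn't be of more help. - sln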
# (?{ $vb=0; $vc=0; $vd=0; })(?=(?![BCD]{2})(?![I])((?:(?:[B][I]*?)(?{ local $vb = $vb+1 })|(?:[C][I]*?)(?{ local $vc = $vc+1 })|(?:[D][I]*?)(?{ local $vd = $vd+1 }))+?)(?(?{$vb >= 2 && $vc >= 5 && $vd >= 2})(?{ $VB=$vb; $VC=$vc; $VD=$vd; })|(?!))(?<![I])(?<![BCD]{2}))
#
(?{ $vb=0; $vc=0; $vd=0; })  # Initialize local counters to zero
(?=
    (?! [BCD]{2} )  # App Condition 5a, not start with 2 occurrences of BCD
    (?! [I] )  # App Condition 1a, not start with I
    (  # (1 start)
        (?:  # Cluster group start (App Conditions 2-4)
            (?: [B] [I]*? )  # 'B'
            (?{ local $vb = $vb+1 })  # Increment local 'B' counter
          |
            (?: [C] [I]*? )  # 'C'
            (?{ local $vc = $vc+1 })  # Increment local 'C' counter
          |
            (?: [D] [I]*? )  # 'D'
            (?{ local $vd = $vd+1 })  # Increment local 'D' counter
        )+?  # Cluster group end, do the minimum
             # to satisfy conditions
    )  # (1 end)
    (?(?{
        # Code conditional - the local counters
        # must be greater than or equal to these values
        $vb >= 2 && $vc >= 5 && $vd >= 2
    })
        # Yes condition, copy local counters to global vars.
        (?{ $VB=$vb; $VC=$vc; $VD=$vd; })
      |
        # No condition, fail the expression here
        # force engine to backtrack (and reset local counters)
        (?!)
    )
    (?<! [I] )  # App Condition 1b, not end with I
    (?<! [BCD]{2} )  # App Condition 5b, not end with 2 occurrences of BCD
)
Perl test case
$str = "IICCIICBIICCIIDCIICCIICDIICCIIBCIICCIICBIICCIIDCIICCIICCIICCII";
print "\n";
print "01234567890123456789012345678901234567890123456789012345678901\n";
print "          1         2         3         4         5         6\n";
print $str,"\n-------------------------------------------------------\n";
FindOverlaps(2,5,2);
FindOverlaps(1,2,0);
FindOverlaps(1,1,0);
FindOverlaps(1,1,1);
FindOverlaps(0,1,1);
FindOverlaps(1,0,1);
sub FindOverlaps
{
    ($MinB, $MinC, $MinD) = @_;
    print "\nB=$MinB, C=$MinC, D=$MinD\n";
    while ( $str =~ /
        (?{ $vb=0; $vc=0; $vd=0; })  # Initialize local counters to zero
        (?=
            (?! [BCD]{2} )  # App Condition 5a, not start with 2 occurrences of BCD
            (?! [I] )  # App Condition 1a, not start with I
            (  # (1 start)
                (?:  # Cluster group start (App Conditions 2-4)
                    (?: [B] [I]*? )  # 'B'
                    (?{ local $vb = $vb+1 })  # Increment local 'B' counter
                  |
                    (?: [C] [I]*? )  # 'C'
                    (?{ local $vc = $vc+1 })  # Increment local 'C' counter
                  |
                    (?: [D] [I]*? )  # 'D'
                    (?{ local $vd = $vd+1 })  # Increment local 'D' counter
                )+?  # Cluster group end, do the minimum
                     # to satisfy conditions
            )  # (1 end)
            (?(?{
                # Code conditional - the local counters
                # must be greater than or equal to these values
                $vb >= $MinB && $vc >= $MinC && $vd >= $MinD
            })
                # Yes condition, copy local counters to global vars.
                (?{ $VB=$vb; $VC=$vc; $VD=$vd; })
              |
                # No condition, fail the expression here
                # force engine to backtrack (and reset local counters)
                (?!)
            )
            (?<! [I] )  # App Condition 1b, not end with I
            (?<! [BCD]{2} )  # App Condition 5b, not end with 2 occurrences of BCD
        )
    /xg )
    {
        # $-[0] is the start offset of the last successful match
        printf("found: %-10s %-30s offset = %s\n", "($VB,$VC,$VD)", $1, $-[0]);
    }
}
Output >>
01234567890123456789012345678901234567890123456789012345678901
          1         2         3         4         5         6
IICCIICBIICCIIDCIICCIICDIICCIIBCIICCIICBIICCIIDCIICCIICCIICCII
-------------------------------------------------------
B=2, C=5, D=2
found: (2,10,2) CIICBIICCIIDCIICCIICDIICCIIB offset = 3
found: (2,8,2) BIICCIIDCIICCIICDIICCIIB offset = 7
found: (2,12,2) CIIDCIICCIICDIICCIIBCIICCIICBIIC offset = 11
found: (2,12,2) CIICCIICDIICCIIBCIICCIICBIICCIID offset = 15
found: (2,10,2) CIICDIICCIIBCIICCIICBIICCIID offset = 19
found: (2,8,2) DIICCIIBCIICCIICBIICCIID offset = 23
B=1, C=2, D=0
found: (1,3,0) CIICBIIC offset = 3
found: (1,2,1) BIICCIID offset = 7
found: (1,7,2) CIIDCIICCIICDIICCIIB offset = 11
found: (1,6,1) CIICCIICDIICCIIB offset = 15
found: (1,4,1) CIICDIICCIIB offset = 19
found: (1,2,1) DIICCIIB offset = 23
found: (1,3,0) CIIBCIIC offset = 27
found: (1,5,0) CIICCIICBIIC offset = 31
found: (1,3,0) CIICBIIC offset = 35
found: (1,2,1) BIICCIID offset = 39
B=1, C=1, D=0
found: (1,3,0) CIICBIIC offset = 3
found: (1,1,0) BIIC offset = 7
found: (1,7,2) CIIDCIICCIICDIICCIIB offset = 11
found: (1,6,1) CIICCIICDIICCIIB offset = 15
found: (1,4,1) CIICDIICCIIB offset = 19
found: (1,2,1) DIICCIIB offset = 23
found: (1,1,0) CIIB offset = 27
found: (1,5,0) CIICCIICBIIC offset = 31
found: (1,3,0) CIICBIIC offset = 35
found: (1,1,0) BIIC offset = 39
B=1, C=1, D=1
found: (1,4,1) CIICBIICCIID offset = 3
found: (1,2,1) BIICCIID offset = 7
found: (1,7,2) CIIDCIICCIICDIICCIIB offset = 11
found: (1,6,1) CIICCIICDIICCIIB offset = 15
found: (1,4,1) CIICDIICCIIB offset = 19
found: (1,2,1) DIICCIIB offset = 23
found: (2,7,1) CIIBCIICCIICBIICCIID offset = 27
found: (1,6,1) CIICCIICBIICCIID offset = 31
found: (1,4,1) CIICBIICCIID offset = 35
found: (1,2,1) BIICCIID offset = 39
B=0, C=1, D=1
found: (1,4,1) CIICBIICCIID offset = 3
found: (1,2,1) BIICCIID offset = 7
found: (0,1,1) CIID offset = 11
found: (0,5,1) CIICCIICDIIC offset = 15
found: (0,3,1) CIICDIIC offset = 19
found: (0,1,1) DIIC offset = 23
found: (2,7,1) CIIBCIICCIICBIICCIID offset = 27
found: (1,6,1) CIICCIICBIICCIID offset = 31
found: (1,4,1) CIICBIICCIID offset = 35
found: (1,2,1) BIICCIID offset = 39
found: (0,1,1) CIID offset = 43
B=1, C=0, D=1
found: (1,4,1) CIICBIICCIID offset = 3
found: (1,2,1) BIICCIID offset = 7
found: (1,7,2) CIIDCIICCIICDIICCIIB offset = 11
found: (1,6,1) CIICCIICDIICCIIB offset = 15
found: (1,4,1) CIICDIICCIIB offset = 19
found: (1,2,1) DIICCIIB offset = 23
found: (2,7,1) CIIBCIICCIICBIICCIID offset = 27
found: (1,6,1) CIICCIICBIICCIID offset = 31
found: (1,4,1) CIICBIICCIID offset = 35
found: (1,2,1) BIICCIID offset = 39
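For a non-regex version in Python, the counting logic of the Perl regex above can be sketched as a plain scan. This is only a sketch under stated assumptions: the function name `find_overlaps` and its return shape are made up for illustration, the input is assumed to contain only the letters B, C, D and I, and conditions 1 and 5 are read as "start on a B/C/D that is not followed by another B/C/D, and end on a B/C/D that is not preceded by one".

```python
def find_overlaps(s, min_b, min_c, min_d):
    """Return (offset, substring, (b, c, d)) for the shortest qualifying
    substring at each valid start position (hypothetical helper)."""
    need = {"B": min_b, "C": min_c, "D": min_d}
    results = []
    for start, first in enumerate(s):
        # Conditions 1a and 5a: start on a B/C/D that is not followed by a B/C/D.
        if first not in need:
            continue
        if start + 1 < len(s) and s[start + 1] in need:
            continue
        counts = {"B": 0, "C": 0, "D": 0}
        for end in range(start, len(s)):
            ch = s[end]
            if ch in counts:
                counts[ch] += 1
            elif ch != "I":
                break  # assumption: any other character ends the scan
            enough = all(counts[k] >= need[k] for k in need)
            # Conditions 1b and 5b: end on a B/C/D whose predecessor is not a B/C/D.
            ends_ok = ch in need and (end == 0 or s[end - 1] not in need)
            if enough and ends_ok:
                found = (counts["B"], counts["C"], counts["D"])
                results.append((start, s[start:end + 1], found))
                break  # keep only the shortest match, like the lazy +? quantifier
    return results


if __name__ == "__main__":
    text = "IICCIICBIICCIIDCIICCIICDIICCIIBCIICCIICBIICCIIDCIICCIICCIICCII"
    for offset, sub, found in find_overlaps(text, 1, 2, 0):
        print("found:", found, sub, "offset =", offset)
```

The counters are rebuilt from zero for every start position, which plays the role of the `local` counters being reset on backtracking in the Perl version.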
(Old)
I think this is the best you can do with a regex.
Edit - modified for the new condition 5.
# String:
# (?=(?![BCD]{2})(?![I])((?:[B][IDC]*?){1}(?:[C][IDB]*?){2}(?:[D][IBC]*?){0}|(?:[C][IDB]*?){2}(?:[D][IBC]*?){0}(?:[B][IDC]*?){1}|(?:[D][IBC]*?){0}(?:[B][IDC]*?){1}(?:[C][IDB]*?){2}|(?:[C][IDB]*?){2}(?:[B][IDC]*?){1}(?:[D][IBC]*?){0})(?<![I])(?<![BCD]{2}))
# Example: Finds 1-B, 2-C's
(?=
    (?! [BCD]{2} )  # Condition 5a, not start with 2 occurrences of BCD
    (?! [I] )  # Condition 1a, not start with I (not really necessary here)
    (  # (1 start), Conditions 2-4
        (?: [B] [IDC]*? ){1}
        (?: [C] [IDB]*? ){2}
        (?: [D] [IBC]*? ){0}
      |
        (?: [C] [IDB]*? ){2}
        (?: [D] [IBC]*? ){0}
        (?: [B] [IDC]*? ){1}
      |
        (?: [D] [IBC]*? ){0}
        (?: [B] [IDC]*? ){1}
        (?: [C] [IDB]*? ){2}
      |
        (?: [C] [IDB]*? ){2}
        (?: [B] [IDC]*? ){1}
        (?: [D] [IBC]*? ){0}
    )  # (1 end)
    (?<! [I] )  # Condition 1b, not end with I
    (?<! [BCD]{2} )  # Condition 5b, not end with 2 occurrences of BCD
)
Perl test case
$str = "IICCIICCIICBIICCIICDIIDIICCIIB";
print "\n";
print "012345678911234567892123456789\n";
print " + + \n";
print $str,"\n------------------------------\n";
($B,$C,$D) = (1,2,0);
FindOverlaps();
($B,$C,$D) = (1,1,0);
FindOverlaps();
($B,$C,$D) = (1,1,1);
FindOverlaps();
($B,$C,$D) = (0,1,1);
FindOverlaps();
($B,$C,$D) = (1,0,1);
FindOverlaps();
sub FindOverlaps
{
    print "\nB=$B, C=$C, D=$D\n";
    while ( $str =~ /(?=(?![BCD]{2})(?![I])((?:[B][IDC]*?){$B}(?:[C][IDB]*?){$C}(?:[D][IBC]*?){$D}|(?:[C][IDB]*?){$C}(?:[D][IBC]*?){$D}(?:[B][IDC]*?){$B}|(?:[D][IBC]*?){$D}(?:[B][IDC]*?){$B}(?:[C][IDB]*?){$C}|(?:[C][IDB]*?){$C}(?:[B][IDC]*?){$B}(?:[D][IBC]*?){$D})(?<![I])(?<![BCD]{2}))/g )
    {
        # $-[0] is the start offset of the last successful match
        print "found: '$1' \t offset = ", $-[0], "\n";
    }
}
Output >>
012345678911234567892123456789
+ +
IICCIICCIICBIICCIICDIIDIICCIIB
------------------------------
B=1, C=2, D=0
found: 'CIICBIIC' offset = 7
found: 'BIICCIIC' offset = 11
B=1, C=1, D=0
found: 'BIIC' offset = 11
found: 'CIIB' offset = 26
B=1, C=1, D=1
found: 'BIICCIICDIID' offset = 11
B=0, C=1, D=1
found: 'DIIC' offset = 22
B=1, C=0, D=1
found: 'BIICCIICDIID' offset = 11
found: 'DIICCIIB' offset = 22
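The older regex uses no embedded code, only lookarounds and lazy quantifiers, so it should also work with Python's `re` module. Below is a rough port, offered as a sketch rather than a tested equivalent of the Perl script. The helper names `build_pattern` and `find_overlaps_regex` are made up for illustration. It keeps the same four orderings as the Perl pattern, so it carries the same limits: counts are exact, not minimums, and only those orderings are tried.

```python
import re


def build_pattern(b, c, d):
    """Build the lookahead pattern with the same four orderings as the
    Perl regex (hypothetical helper)."""
    part_b = "(?:B[IDC]*?){%d}" % b
    part_c = "(?:C[IDB]*?){%d}" % c
    part_d = "(?:D[IBC]*?){%d}" % d
    orderings = [
        part_b + part_c + part_d,
        part_c + part_d + part_b,
        part_d + part_b + part_c,
        part_c + part_b + part_d,
    ]
    return re.compile(
        "(?=(?![BCD]{2})(?!I)("      # conditions 5a and 1a, then group 1
        + "|".join(orderings)
        + ")(?<!I)(?<![BCD]{2}))"    # conditions 1b and 5b
    )


def find_overlaps_regex(s, b, c, d):
    """Return (offset, substring) pairs; the outer lookahead consumes no
    characters, so overlapping matches are reported."""
    return [(m.start(), m.group(1)) for m in build_pattern(b, c, d).finditer(s)]


if __name__ == "__main__":
    text = "IICCIICCIICBIICCIICDIIDIICCIIB"
    for offset, sub in find_overlaps_regex(text, 1, 2, 0):
        print("found: '%s' offset = %d" % (sub, offset))
```

Wrapping the whole pattern in `(?=...)` is what lets `finditer` test every position, since each match is zero-width and the captured group carries the actual substring.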