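I'm writing a puzzle-solving application in Python. I'm searching some text for combinations of characters. Given a set of characters [abcd], I need to find substrings of the text that contain only the characters abcd and contain each of those characters at least once, so that the set abcd matches dcba or abbcdd, but not acd, bbcd, or abced. If I use the regex [abcd]+, I also get substrings that do not contain every character.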
Answer 0 (score: 4)
Why use a regex here?
def hasChars(search_string, chars):
    # True if every character in chars occurs somewhere in search_string
    return all(x in search_string for x in chars)
>>> hasChars('aaabcd', 'abc')
True
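Note that the check above only confirms that every character is present. It does not reject strings containing other characters, and it tests a whole string rather than locating substrings in a text. A minimal sketch of one way to cover the full requirement from the question without lookaheads, assuming that maximal runs of the allowed characters are the substrings of interest (the name `find_full_runs` is made up for this illustration):

```python
import re

def find_full_runs(text, chars):
    """Return maximal runs of `chars` in `text` that use every character of `chars`.

    Assumes `chars` is a non-empty string.
    """
    required = set(chars)
    # A character class matching any one of the allowed characters.
    run_pattern = "[" + "".join(re.escape(c) for c in required) + "]+"
    # Keep only the runs in which every required character appears.
    return [run for run in re.findall(run_pattern, text) if required <= set(run)]

print(find_full_runs("dcba abbcdd acd bbcd abced", "abcd"))
```

The regex here only splits the text into candidate runs; the set comparison does the "contains each character" test.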
Answer 1 (score: 2)
If the string must contain at least a, b, c, and d but may also contain other characters, this will work:
(?=.*a)(?=.*b)(?=.*c)(?=.*d)
If it may contain only a, b, c, and d, this may be better:
^(?=.*a)(?=.*b)(?=.*c)(?=.*d)[abcd]+$
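As a rough illustration in Python (the question's language), the two patterns could be exercised on the question's own examples like this. The names `CONTAINS_ALL` and `ONLY_AND_ALL` are invented for the sketch.

```python
import re

# Each lookahead asserts that one required character occurs somewhere ahead.
CONTAINS_ALL = re.compile(r"(?=.*a)(?=.*b)(?=.*c)(?=.*d)")
# Same checks, but the whole string must also consist solely of a, b, c, d.
ONLY_AND_ALL = re.compile(r"^(?=.*a)(?=.*b)(?=.*c)(?=.*d)[abcd]+$")

for word in ["dcba", "abbcdd", "acd", "bbcd", "abced"]:
    print(word, bool(CONTAINS_ALL.search(word)), bool(ONLY_AND_ALL.search(word)))
```

Both patterns test a whole candidate string. Neither one locates matching substrings inside a longer text.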
Update
To answer your question: if you are looking for a floating (unanchored) version, this may do what you want:
(?=([abcd]{4,}))(?=[bcd]*a)(?=[acd]*b)(?=[abd]*c)(?=[abc]*d)\1
Expanded:
# At POSition
(?=                  # Lookahead
    (                # Capture grp 1
        [abcd]{4,}   # Get 4 or more (greedy) 'a' or 'b' or 'c' or 'd' characters
    )
)
(?=                  # Lookahead, check for 'a' (still at POS)
    [bcd]*a          # 0 or more [bcd]'s then 'a'
)
(?=                  # Lookahead, check for 'b' (still at POS)
    [acd]*b          # 0 or more [acd]'s then 'b'
)
(?=                  # Lookahead, check for 'c' (still at POS)
    [abd]*c          # 0 or more [abd]'s then 'c'
)
(?=                  # Lookahead, check for 'd' (still at POS)
    [abc]*d          # 0 or more [abc]'s then 'd'
)
\1                   # Backref to capt grp 1, consume it
# Passed test, now at POSition + length of capture group 1
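Since the question is about Python, here is a small sketch of how the floating pattern might be used with the `re` module. It relies on Python keeping a group captured inside a lookahead available to the later `\1`; the name `FLOATING` and the sample text are chosen only for illustration.

```python
import re

# Floating pattern: capture a run of a/b/c/d, verify each letter occurs in it,
# then consume the run through the backreference.
FLOATING = re.compile(
    r"(?=([abcd]{4,}))(?=[bcd]*a)(?=[acd]*b)(?=[abd]*c)(?=[abc]*d)\1"
)

text = "bddaaabcabbad characters abcd matches dcba or abbcdd, but not acd, bbcd or abced"
matches = [m.group(1) for m in FLOATING.finditer(text)]
print(matches)
```

Each lookahead only steps over the other allowed characters, so the letter it finds always lies inside the captured run.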
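More
You can construct the regex systematically from the search string. I don't know Python well, so here is an example of how to do it in Perl. Note that the longer the search string, the longer it takes to find a match, but this should still be fairly fast.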
use strict;
use warnings;
my $samp = 'bddaaabcabbad characters abcd matches dcba or abbcdd, but not acd, bbcd or abced';
my $regex = '(?=([abcd]{4,}))(?=[bcd]*a)(?=[acd]*b)(?=[abd]*c)(?=[abc]*d)\1';
while ($samp =~ /$regex/xg)
{
    print "Found '$1'\n";
}

# Regex construction
# ------------------------------
my @AryofSearchStrs = (
    'abcd',
    '%^&*',
    'hi( )there',
    '==-yes',
);

for my $search_string (@AryofSearchStrs)
{
    my $str = $search_string;
    # Remove repeated characters, keeping the first occurrence of each
    while( $str =~ s/(.)(.*)\1/$1$2/g) {}
    my @astr = split '', $str;
    my $rxformed = '(?=([' . quotemeta($str) . ']{' . length($str) . ',}))';
    for (my $i = 0; $i < @astr; $i++)
    {
        # One lookahead per character: skip the other characters, then require this one
        $rxformed .=
            '(?=['
            . join( '', map { quotemeta($_) } @astr[0..($i-1), ($i+1)..$#astr] )
            . ']*'
            . quotemeta($astr[$i])
            . ')';
    }
    $rxformed .= '\1';
    print "\n\n============\n";
    print "Search string = '$search_string'\n";
    print "Normalized    = '$str'\n";
    print "Formed regex  = \n$rxformed\n";
}
Output
Found 'bddaaabcabbad'
Found 'abcd'
Found 'dcba'
Found 'abbcdd'
============
Search string = 'abcd'
Normalized = 'abcd'
Formed regex =
(?=([abcd]{4,}))(?=[bcd]*a)(?=[acd]*b)(?=[abd]*c)(?=[abc]*d)\1
============
Search string = '%^&*'
Normalized = '%^&*'
Formed regex =
(?=([\%\^\&\*]{4,}))(?=[\^\&\*]*\%)(?=[\%\&\*]*\^)(?=[\%\^\*]*\&)(?=[\%\^\&]*\*)\1
============
Search string = 'hi( )there'
Normalized = 'hi( )ter'
Formed regex =
(?=([hi\(\ \)ter]{8,}))(?=[i\(\ \)ter]*h)(?=[h\(\ \)ter]*i)(?=[hi\ \)ter]*\()(?=[hi\(\)ter]*\ )(?=[hi\(\ ter]*\))(?=[hi\(\ \)er]*t)(?=[hi\(\ \)tr]*e)(?=[hi\(\ \)te]*r)\1
============
Search string = '==-yes'
Normalized = '=-yes'
Formed regex =
(?=([\=\-yes]{5,}))(?=[\-yes]*\=)(?=[\=yes]*\-)(?=[\=\-es]*y)(?=[\=\-ys]*e)(?=[\=\-ye]*s)\1
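For readers working in Python, the Perl construction above could be ported roughly as follows. This is only a sketch: `build_pattern` is a hypothetical helper name, `re.escape` stands in for Perl's `quotemeta` (so the exact escaping of punctuation may differ from the output above), and the single-character case is handled by leaving out the empty character class.

```python
import re

def build_pattern(search_string):
    """Build the floating 'all of these characters, and only these' regex."""
    # Drop repeated characters while keeping first-seen order.
    chars = list(dict.fromkeys(search_string))
    escaped = [re.escape(c) for c in chars]
    # Capture a run of allowed characters at least as long as the set.
    parts = ["(?=([%s]{%d,}))" % ("".join(escaped), len(chars))]
    for i, target in enumerate(escaped):
        others = "".join(escaped[:i] + escaped[i + 1:])
        # Skip over the other allowed characters (if any), then require this one.
        skip = "[%s]*" % others if others else ""
        parts.append("(?=%s%s)" % (skip, target))
    # Consume the captured run.
    parts.append(r"\1")
    return "".join(parts)

print(build_pattern("abcd"))
found = [m.group(1) for m in re.finditer(build_pattern("==-yes"), "say yes-= or -=syee now")]
print(found)
```

For `'abcd'` this produces the same pattern as the hand-written one, since none of those characters need escaping.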