Question

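I'm writing a puzzle-solving application in Python. I'm searching some text for combinations of characters: given a set of characters [abcd], I need to find substrings of the text that contain only the characters abcd and that contain each of those characters at least once. So for the characters abcd, dcba and abbcdd match, but acd, bbcd, and abced do not. If I use the regex [abcd]+, I also get substrings that don't contain every character.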

Answer 1

Why use a regular expression here?

def hasChars(search_string, chars):
    return all(x in search_string for x in chars)

>>> hasChars('aaabcd', 'abc')
True
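
Note that hasChars checks a whole string, while the question asks for substrings of a larger text. As a minimal sketch of one way to bridge that gap without a regex (find_runs_with_all is a hypothetical helper name, not part of the original answer), the text can be split into maximal runs of allowed characters and each run checked with a set comparison:

```python
from itertools import groupby

def find_runs_with_all(text, chars):
    """Return maximal runs made only of `chars` that contain every one of them.

    Hypothetical helper for illustration; not from the original answer.
    """
    required = set(chars)
    # groupby splits the text into alternating runs of allowed / disallowed characters.
    runs = (
        "".join(group)
        for is_allowed, group in groupby(text, key=lambda ch: ch in required)
        if is_allowed
    )
    # Keep only the runs in which every required character appears at least once.
    return [run for run in runs if required <= set(run)]

sample = "characters abcd matches dcba or abbcdd, but not acd, bbcd or abced"
print(find_runs_with_all(sample, "abcd"))
```

Each run is scanned once, so the work grows linearly with the length of the text.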

Answer 2

If the string must contain at least a, b, c, and d but may also contain other characters, this will work:

(?=.*a)(?=.*b)(?=.*c)(?=.*d)

If the string may contain only a, b, c, and d, this may be better:

^(?=.*a)(?=.*b)(?=.*c)(?=.*d)[abcd]+$
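
Since the question is about Python, here is a small sketch of how these two patterns might be used with the standard re module. The function names contains_all and only_abcd are made up for illustration, and the inputs are assumed to be single-line strings (by default `.` does not match a newline):

```python
import re

# Pattern 1: every one of a, b, c, d appears somewhere; other characters are allowed.
CONTAINS_ALL = re.compile(r"(?=.*a)(?=.*b)(?=.*c)(?=.*d)")

# Pattern 2: the whole string consists only of a, b, c, d and uses each at least once.
ONLY_ABCD = re.compile(r"^(?=.*a)(?=.*b)(?=.*c)(?=.*d)[abcd]+$")

def contains_all(s):
    # Hypothetical helper: True if s contains a, b, c and d.
    return CONTAINS_ALL.search(s) is not None

def only_abcd(s):
    # Hypothetical helper: True if s is made up solely of a, b, c, d with each present.
    return ONLY_ABCD.match(s) is not None

for word in ["dcba", "abbcdd", "acd", "bbcd", "abced"]:
    print(word, contains_all(word), only_abcd(word))
```

Both patterns test a whole string at a time, so they do not locate substrings inside a longer text.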

Update

To answer your question: if you're looking for a floating (unanchored) version, this does what you want:

(?=([abcd]{4,}))(?=[bcd]*a)(?=[acd]*b)(?=[abd]*c)(?=[abc]*d)\1

Expanded:

      # At POSition
(?=                # Lookahead
   (                     # Capture grp 1
      [abcd]{4,}            # Get 4 or more (greedy) 'a' or 'b' or 'c' or 'd' characters
   )
)
(?=                # Lookahead, check for 'a' (still at POS) 
   [bcd]*a               # 0 or more [bcd]'s then 'a'
)
(?=                # Lookahead, check for 'b' (still at POS) 
   [acd]*b               # 0 or more [acd]'s then 'b'
)
(?=                # Lookahead, check for 'c' (still at POS) 
   [abd]*c               # 0 or more [abd]'s then 'c'
)
(?=                # Lookahead, check for 'd' (still at POS)
   [abc]*d               # 0 or more [abc]'s then 'd'
)
\1                 # Backref to capt grp 1, consume it

    # Passed test, now at POSition + length of capture group 1
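
Python's re module supports lookaheads, capture groups inside lookaheads, and backreferences, so the same pattern should work there unchanged. The following is a sketch under that assumption; find_matches is a hypothetical name, not part of the original answer:

```python
import re

# The floating pattern from above, unchanged.
FLOATING = re.compile(
    r"(?=([abcd]{4,}))(?=[bcd]*a)(?=[acd]*b)(?=[abd]*c)(?=[abc]*d)\1"
)

def find_matches(text):
    # Hypothetical helper: collect every non-overlapping match in the text.
    return [m.group(0) for m in FLOATING.finditer(text)]

sample = ("bddaaabcabbad characters abcd matches dcba or abbcdd, "
          "but not acd, bbcd or abced")
for found in find_matches(sample):
    print(f"Found '{found}'")
```

Because the trailing \1 consumes the captured run, finditer resumes scanning after each match rather than inside it.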

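More

You can construct the regex systematically from the search string. I don't know Python well, so here is an example of how to do it in Perl. Note that the longer the search string, the longer it takes to find a match, but this should still be fairly fast.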

use strict;
use warnings;

my $samp  = 'bddaaabcabbad characters abcd matches dcba or abbcdd, but not acd, bbcd or abced';

my $regex = '(?=([abcd]{4,}))(?=[bcd]*a)(?=[acd]*b)(?=[abd]*c)(?=[abc]*d)\1';

while ($samp =~/$regex/xg)
{
    print "Found '$1'\n";
}

# Regex construction
# ------------------------------
my @AryofSearchStrs = (
 'abcd',
 '%^&*',
 'hi( )there',
 '==-yes',
);

for my $search_string (@AryofSearchStrs)
{
   my $str = $search_string;
   while( $str =~ s/(.)(.*)\1/$1$2/g) {}

   my @astr = split '', $str;

   my $rxformed = '(?=([' . quotemeta($str) . ']{' . length($str) . ',}))';
   for (my $i = 0; $i < @astr; $i++)
   {
      $rxformed .=
       '(?=['
       . join( '', map { quotemeta($_) } @astr[0..($i-1), ($i+1)..$#astr] )
       . ']*'
       . quotemeta($astr[$i])
       . ')';
   }
   $rxformed .= '\1';

   print "\n\n============\n";
   print "Search string = '$search_string'\n";
   print "Normalized    = '$str'\n";
   print "Formed regex  = \n$rxformed\n";
}

Output

Found 'bddaaabcabbad'
Found 'abcd'
Found 'dcba'
Found 'abbcdd'


============
Search string = 'abcd'
Normalized    = 'abcd'
Formed regex  =
(?=([abcd]{4,}))(?=[bcd]*a)(?=[acd]*b)(?=[abd]*c)(?=[abc]*d)\1


============
Search string = '%^&*'
Normalized    = '%^&*'
Formed regex  =
(?=([\%\^\&\*]{4,}))(?=[\^\&\*]*\%)(?=[\%\&\*]*\^)(?=[\%\^\*]*\&)(?=[\%\^\&]*\*)\1


============
Search string = 'hi( )there'
Normalized    = 'hi( )ter'
Formed regex  =
(?=([hi\(\ \)ter]{8,}))(?=[i\(\ \)ter]*h)(?=[h\(\ \)ter]*i)(?=[hi\ \)ter]*\()(?=[hi\(\)ter]*\ )(?=[hi\(\ ter]*\))(?=[hi\(\ \)er]*t)(?=[hi\(\ \)tr]*e)(?=[hi\(\ \)te]*r)\1


============
Search string = '==-yes'
Normalized    = '=-yes'
Formed regex  =
(?=([\=\-yes]{5,}))(?=[\-yes]*\=)(?=[\=yes]*\-)(?=[\=\-es]*y)(?=[\=\-ys]*e)(?=[\=\-ye]*s)\1

Regex to match each character at least once
