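I'm looking for an efficient way to match two lists, one containing complete information and the other containing wildcards. I've been able to do this with fixed-length wildcards, but I'm now trying to do it with variable-length wildcards.
So:
match( ['A', 'B', '*', 'D'], ['A', 'B', 'C', 'C', 'C', 'D'] )
would return True as long as all the elements are in the same order in both lists.
I'm working with lists of objects, but used strings above for simplicity.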
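Answer 0 (score: 4)
[edited to justify not using a regex, after the OP's comment about comparing objects]
It looks like you're not using strings but comparing objects, so I'm giving an explicit algorithm. Regular expressions are a good solution tailored to strings, don't get me wrong, but from what you say in the comments on your question, an explicit, simple algorithm may make things easier for you.
It turns out that this can be solved with a much simpler algorithm than this previous answer: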
def matcher (l1, l2):
    if (l1 == []):
        return (l2 == [] or l2 == ['*'])
    if (l2 == [] or l2[0] == '*'):
        return matcher(l2, l1)
    if (l1[0] == '*'):
        return (matcher(l1, l2[1:]) or matcher(l1[1:], l2))
    if (l1[0] == l2[0]):
        return matcher(l1[1:], l2[1:])
    else:
        return False
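The key idea is that when you encounter a wildcard, you can explore two options: either the wildcard also absorbs the head of the other list (keep the wildcard and advance the other list), or the wildcard has matched everything it is going to match (drop it and advance the list that contains it).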
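The same two-way choice at a wildcard can also be written with indices and memoization, which avoids copying the lists on every call and avoids re-exploring the same sub-problem. This is only a sketch under a few assumptions that are not in the answer above: the names `wildcard_match` and `WILDCARD` are made up for illustration, the wildcard is expected only in the first (pattern) list, and elements are compared with `==`, so it should work for objects as well as strings.

```python
from functools import lru_cache

WILDCARD = '*'  # assumption: this sentinel marks the variable-length wildcard


def wildcard_match(pattern, items):
    """Return True if items matches pattern, where WILDCARD in pattern
    stands for any run of zero or more elements."""
    pattern, items = list(pattern), list(items)

    @lru_cache(maxsize=None)
    def go(i, j):
        # i indexes into pattern, j indexes into items.
        if i == len(pattern):
            return j == len(items)
        if pattern[i] == WILDCARD:
            # Option 1: the wildcard is finished; drop it.
            # Option 2: the wildcard absorbs items[j]; keep it and move on.
            return go(i + 1, j) or (j < len(items) and go(i, j + 1))
        # Ordinary element: heads must be equal, then compare the rest.
        return j < len(items) and pattern[i] == items[j] and go(i + 1, j + 1)

    return go(0, 0)


print(wildcard_match(['A', 'B', '*', 'D'], ['A', 'B', 'C', 'C', 'C', 'D']))
```

Because it is still recursive, very long lists could hit Python's recursion limit; an explicit table filled in a loop would avoid that.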
Answer 1 (score: 1)
How about the following:
import re

def match(pat, lst):
    # '*' becomes '.*'; every other term is used as-is (not escaped).
    regex = ''.join(term if term != '*' else '.*' for term in pat) + '$'
    s = ''.join(lst)
    return re.match(regex, s) is not None

print(match( ['A', 'B', '*', 'D'], ['A', 'B', 'C', 'C', 'C', 'D'] ))
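It uses regular expressions. The wildcard (*) is changed to .*, and all other search terms are kept as-is.
One caveat is that if your search terms could contain things that have special meaning in the regex language, those would need to be properly escaped. It's pretty easy to handle this in the match function, I just wasn't sure whether this was something you required.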
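One possible way to handle the escaping is to pass every non-wildcard term through `re.escape`. The sketch below is an illustration rather than part of the answer: the name `match_escaped` is hypothetical, it uses `re.fullmatch` instead of appending `$`, and the `re.DOTALL` flag is an added assumption so that the wildcard can also span newlines.

```python
import re


def match_escaped(pat, lst):
    # Escape every literal term so that characters such as '.' or '+'
    # are matched literally; only '*' is turned into the regex wildcard.
    regex = ''.join('.*' if term == '*' else re.escape(term) for term in pat)
    return re.fullmatch(regex, ''.join(lst), flags=re.DOTALL) is not None


print(match_escaped(['a.b', '*', 'c'], ['a.b', 'x', 'c']))
print(match_escaped(['a.b', 'c'], ['axb', 'c']))
```

Without the escaping, the second call would match, because the unescaped `.` in `a.b` would accept any character.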
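Answer 2 (score: 1)
I'd recommend converting ['A', 'B', '*', 'D'] to '^AB.*D$' and ['A', 'B', 'C', 'C', 'C', 'D'] to 'ABCCCD', and then using the re module (regular expressions) to do the match.
This is valid if the elements of your lists are only one character each, and if they're strings.
Something like: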
import re

def myMatch( patternList, stringList ):
    # convert pattern to flat string with wildcards
    # convert AB*D to valid regex ^AB.*D$
    pattern = ''.join(patternList)
    regexPattern = '^' + pattern.replace('*','.*') + '$'
    # perform matching
    against = ''.join(stringList) # convert ['A','B','C','C','D'] to ABCCD
    # return whether there is a match
    return (re.match(regexPattern,against) is not None)
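If the lists contain numbers or words, choose a character that you wouldn't expect to appear in either, for example #. Then ['Aa','Bs','Ce','Cc','CC','Dd'] can be converted to Aa#Bs#Ce#Cc#CC#Dd, the wildcard pattern ['Aa','Bs','*','Dd'] can be converted to ^Aa#Bs#.*#Dd$, and the match performed.
Practically speaking, this just means that every ''.join(...) becomes '#'.join(...) in myMatch.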
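Here is a rough sketch of the separator idea for multi-character elements. It is a variant, not the exact transformation described above: instead of joining with the separator, it appends the separator after every element, so that the wildcard can stand for zero whole elements and can never match part of an element. The names `my_match_sep` and `SEP` are hypothetical, and the sketch assumes the separator never occurs inside an element.

```python
import re

SEP = '#'  # assumption: this character never appears inside an element


def my_match_sep(pattern_list, string_list):
    sep = re.escape(SEP)
    parts = []
    for term in pattern_list:
        if term == '*':
            # Zero or more complete elements, each ending with the separator.
            parts.append('(?:[^{0}]*{0})*'.format(sep))
        else:
            # A literal element followed by its separator.
            parts.append(re.escape(term) + sep)
    regex = ''.join(parts)
    # ['Aa', 'Bs'] becomes 'Aa#Bs#'
    against = ''.join(s + SEP for s in string_list)
    return re.fullmatch(regex, against) is not None


print(my_match_sep(['Aa', 'Bs', '*', 'Dd'],
                   ['Aa', 'Bs', 'Ce', 'Cc', 'CC', 'Dd']))
```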
Answer 3 (score: 0)
I agree with the comments that this can be done with regular expressions. For example:
import re

lst = ['A', 'B', 'C', 'C', 'C', 'D']
pattern = ['A', 'B', 'C+', 'D']

print(re.match(''.join(pattern), ''.join(lst))) # Will successfully match
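Edit: as pointed out in the comments, it might only be known in advance that some character has to match, but not which one. In that case, regular expressions are still useful: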
import re

lst = ['A', 'B', 'C', 'C', 'C', 'D']
# (\w) captures one character; \1* then allows further repeats of that same character.
pattern = r'AB(\w)\1*D'

print(re.match(pattern, ''.join(lst)).groups())
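To make the backreference a little more concrete, here is a small sketch that wraps the same pattern in a helper and guards against the no-match case, where `re.match` returns `None` and calling `.groups()` on it would raise an `AttributeError`. The helper name `repeated_run` is made up for illustration, and it uses `re.fullmatch` so that the whole joined string has to match.

```python
import re


def repeated_run(lst):
    # (\w) captures the unknown character; \1* accepts more copies of it.
    m = re.fullmatch(r'AB(\w)\1*D', ''.join(lst))
    # Return the captured character, or None when the list does not match.
    return m.group(1) if m else None


print(repeated_run(['A', 'B', 'C', 'C', 'C', 'D']))
print(repeated_run(['A', 'B', 'C', 'E', 'D']))
```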
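Answer 4 (score: 0)
I agree, regular expressions are usually the way to go with this sort of thing. This algorithm works, but it looks convoluted. It was fun to write, though.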
def match(listx, listy):
    listx, listy = map(iter, (listx, listy))
    while 1:
        try:
            x = next(listx)
        except StopIteration:
            # listx is exhausted; check whether listy is exhausted too.
            try:
                y = next(listy)
            except StopIteration:
                # This means there are no more values to be compared in either
                # listx or listy; since no mismatch was found elsewhere, the
                # lists match.
                return True
            else:
                # This means that there are values in listy that are not in
                # listx.
                return False
        else:
            try:
                y = next(listy)
            except StopIteration:
                # Similarly, there are values in listx that aren't in listy.
                return False
        if x == y:
            pass
        elif x == '*':
            try:
                # Get the value in listx after '*'.
                x = next(listx)
            except StopIteration:
                # This means that listx terminates with '*'. If there are any
                # remaining values of listy, they will, by definition, match.
                return True
            while 1:
                if x == y:
                    # I didn't shift to the next value in listy because I
                    # assume that a '*' matches the empty string as well as
                    # any other.
                    break
                else:
                    try:
                        y = next(listy)
                    except StopIteration:
                        # This means there is at least one remaining value in
                        # listx that is not in listy, because listy has no
                        # more values.
                        return False
                    else:
                        pass
        # Same algorithm as above, given there is a '*' in listy.
        elif y == '*':
            try:
                y = next(listy)
            except StopIteration:
                return True
            while 1:
                if x == y:
                    break
                else:
                    try:
                        x = next(listx)
                    except StopIteration:
                        return False
                    else:
                        pass
        else:
            # x and y are two different non-wildcard values: no match.
            return False
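Answer 5 (score: 0)
I have this piece of C++ code which seems to do what you are trying to do (the inputs are strings instead of arrays of characters, but you'll have to adapt things anyway).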
#include <algorithm>
#include <functional>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>

// PRINT is the author's own logging macro (its definition is not shown).
bool Utils::stringMatchWithWildcards (const std::string str, const std::string strWithWildcards)
{
    PRINT("Starting in stringMatchWithWildcards('" << str << "','" << strWithWildcards << "')");
    const std::string wildcard="*";
    const bool startWithWildcard=(strWithWildcards.find(wildcard)==0);
    // size_type rather than int, so comparisons with std::string::npos are well-defined.
    std::string::size_type pos=strWithWildcards.rfind(wildcard);
    const bool endWithWildcard = (pos!=std::string::npos) && (pos+wildcard.size()==strWithWildcards.size());

    // Basically, the point is to split the string with wildcards in strings with no wildcard.
    // Then search in the first string for the different chunks of the second in the correct order
    std::vector<std::string> vectStr;
    boost::split(vectStr, strWithWildcards, boost::is_any_of(wildcard));
    // I expected all the chunks in vectStr to be non-empty. It doesn't seem to be the case so let's remove them.
    // (std::mem_fun_ref was removed in C++17, so this line needs an earlier standard.)
    vectStr.erase(std::remove_if(vectStr.begin(), vectStr.end(), std::mem_fun_ref(&std::string::empty)), vectStr.end());

    // Check if at least one element (to have first and last element)
    if (vectStr.empty())
    {
        const bool matchEmptyCase = (startWithWildcard || endWithWildcard || str.empty());
        PRINT("Match " << (matchEmptyCase?"":"un") << "successful (empty case) : '" << str << "' and '" << strWithWildcards << "'");
        return matchEmptyCase;
    }

    // First Element
    std::vector<std::string>::const_iterator vectStrIt = vectStr.begin();
    std::string aStr=*vectStrIt;
    if (!startWithWildcard && str.find(aStr, 0)!=0) {
        PRINT("Match unsuccessful (beginning) : '" << str << "' and '" << strWithWildcards << "'");
        return false;
    }

    // "Normal" Elements
    bool found(true);
    pos=0;
    std::vector<std::string>::const_iterator vectStrEnd = vectStr.end();
    for ( ; vectStrIt!=vectStrEnd ; vectStrIt++)
    {
        aStr=*vectStrIt;
        PRINT( "Searching '" << aStr << "' in '" << str << "' from " << pos);
        pos=str.find(aStr, pos);
        if (pos==std::string::npos)
        {
            PRINT("Match unsuccessful ('" << aStr << "' not found) : '" << str << "' and '" << strWithWildcards << "'");
            return false;
        } else
        {
            PRINT( "Found at position " << pos);
            pos+=aStr.size();
        }
    }

    // Last Element
    const bool matchEnd = (endWithWildcard || str.rfind(aStr)+aStr.size()==str.size());
    PRINT("Match " << (matchEnd?"":"un") << "successful (usual case) : '" << str << "' and '" << strWithWildcards << "'");
    return matchEnd;
}
/* Tested on these values :
assert( stringMatchWithWildcards("ABC","ABC"));
assert( stringMatchWithWildcards("ABC","*"));
assert( stringMatchWithWildcards("ABC","*****"));
assert( stringMatchWithWildcards("ABC","*BC"));
assert( stringMatchWithWildcards("ABC","AB*"));
assert( stringMatchWithWildcards("ABC","A*C"));
assert( stringMatchWithWildcards("ABC","*C"));
assert( stringMatchWithWildcards("ABC","A*"));
assert(!stringMatchWithWildcards("ABC","BC"));
assert(!stringMatchWithWildcards("ABC","AB"));
assert(!stringMatchWithWildcards("ABC","AB*D"));
assert(!stringMatchWithWildcards("ABC",""));
assert( stringMatchWithWildcards("",""));
assert( stringMatchWithWildcards("","*"));
assert(!stringMatchWithWildcards("","ABC"));
*/
It's not something I'm really proud of, but it seems to be working so far. I hope you find it useful.
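For the list-of-objects case in the question, the same split-into-chunks idea could be sketched in Python roughly as follows. This is an adaptation rather than a port of the C++ above: the names `chunk_match`, `find_sublist` and `WILDCARD` are hypothetical, elements are compared with `==`, and the first and last chunks are pinned to the two ends of the list before the middle chunks are searched for, left to right, in the space between them.

```python
WILDCARD = '*'  # assumption: sentinel for the variable-length wildcard


def find_sublist(haystack, needle, start, stop):
    """Index of the first occurrence of needle inside haystack[start:stop],
    or -1 if there is none."""
    n = len(needle)
    for i in range(start, stop - n + 1):
        if haystack[i:i + n] == needle:
            return i
    return -1


def chunk_match(pattern, items):
    items = list(items)
    # Split the pattern into wildcard-free chunks.
    chunks = [[]]
    for term in pattern:
        if term == WILDCARD:
            chunks.append([])
        else:
            chunks[-1].append(term)
    if len(chunks) == 1:
        # No wildcard at all: plain equality.
        return chunks[0] == items
    first, last = chunks[0], chunks[-1]
    # The first and last chunks must fit at the two ends without overlapping.
    if len(first) + len(last) > len(items):
        return False
    if items[:len(first)] != first:
        return False
    if items[len(items) - len(last):] != last:
        return False
    # Middle chunks: take the leftmost occurrence of each, in order.
    pos, stop = len(first), len(items) - len(last)
    for chunk in chunks[1:-1]:
        if not chunk:
            continue  # consecutive wildcards produce empty chunks
        i = find_sublist(items, chunk, pos, stop)
        if i < 0:
            return False
        pos = i + len(chunk)
    return True


print(chunk_match(['A', 'B', '*', 'D'], ['A', 'B', 'C', 'C', 'C', 'D']))
```

Taking the leftmost occurrence of each middle chunk is enough here because every chunk is separated from its neighbours by a wildcard, so an earlier placement never rules out a later one.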