Question

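What is the fastest, most elegant way to check whether some element matching a regular expression is in a given list?

For example, given a list: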

import re

newlist = ['this', 'thiis', 'thas', 'sada']
regex = re.compile('th.s')

The approach from this question, Regular Expressions: Search in list:

list(filter(regex.match,newlist))

gives me the list

['this','thas']

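However, I only want True or False back. The approach above is therefore inefficient, because it goes through every element of newlist. Is there a way, similar to

'this' in newlist

to check efficiently and elegantly whether some element matching a regular expression is in a given list?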

Answer 1

Following Loocid's suggestion, you can use any. I would do it with a generator expression like this:

import re

newlist = ['this', 'thiis', 'thas', 'sada']
regex = re.compile('th.s')

# any() stops consuming the generator at the first truthy match object
result = any(regex.match(word) for word in newlist)
print(result) # True

Here is another version using map, which is slightly faster:

result = any(map(regex.match, newlist))
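To make the early exit visible, here is a minimal sketch that counts how many words actually get tested. The wrapper `counting_match` and the list `calls` are hypothetical names introduced only for this illustration; in Python 3, `map` is lazy, so `any` should stop pulling items once it sees the first match.

```python
import re

regex = re.compile('th.s')
newlist = ['this', 'thiis', 'thas', 'sada']

calls = []  # records every word that actually gets tested

def counting_match(word):
    """Hypothetical wrapper around regex.match that logs each call."""
    calls.append(word)
    return regex.match(word)

found = any(map(counting_match, newlist))
print(found)  # True
print(calls)  # ['this'] -- the remaining three words were never tested
```

Only the first word is examined here because it already matches; if the match sat at the end of the list, every word would still be tested.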

Answer 2

This evaluates the list only until the first match is found.

import re

def search_for_match(words):
    # Stop at the first word that matches; later words are never tested.
    for word in words:
        if re.match(r"th.s", word):
            return True
    return False

Or, to make it more general:

import re

def search_for_match(words, pattern):
    # Return True as soon as any word matches the pattern.
    for word in words:
        if re.match(pattern, word):
            return True
    return False

newlist = ['this','thiis','thas','sada']
found = search_for_match(newlist, r"th.s")
print(found) # True
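One thing worth keeping in mind: `re.match` only anchors at the start of the string, so a longer word such as 'thistle' also counts as a match for `th.s`. If the goal is closer to `'this' in newlist`, where the whole element has to fit the pattern, a variant using `fullmatch` may be more appropriate. The function name `contains_full_match` is a hypothetical one chosen for this sketch.

```python
import re

def contains_full_match(words, pattern):
    """Return True if any word matches the pattern in its entirety."""
    regex = re.compile(pattern)
    return any(regex.fullmatch(word) for word in words)

words = ['thistle', 'sada']
print(bool(re.match(r"th.s", 'thistle')))      # True: prefix 'this' matches
print(contains_full_match(words, r"th.s"))     # False: no word is exactly th.s
print(contains_full_match(['this'], r"th.s"))  # True
```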

Just for kicks, I ran these through a timer. I SOOO lost:

import re
import time

t = time.process_time()
newlist = ['this','thiis','thas','sada']
search_for_match(newlist, r"th.s")
elapsed_time1 = time.process_time() - t
print(elapsed_time1) # 0.00015399999999998748

t2 = time.process_time()
newlist = ['this','thiis','thas','sada']
regex = re.compile('th.s')
result = any(regex.match(word) for word in newlist)
elapsed_time2 = time.process_time() - t2
print(elapsed_time2) # 1.1999999999900979e-05

t3 = time.process_time()
newlist = ['this','thiis','thas','sada']
regex = re.compile('th.s')
result = any(map(regex.match, newlist))
elapsed_time3 = time.process_time() - t3
print(elapsed_time3) # 5.999999999950489e-06
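A single `process_time` reading at the microsecond scale is quite noisy, and the first snippet above also pays for compiling the pattern inside `re.match`, so the numbers are only a rough indication. A sketch of a more repeatable comparison using the standard `timeit` module could look like this; the function names `with_loop`, `with_genexp`, and `with_map` are made up for the example, and no particular ranking is guaranteed on a given machine.

```python
import re
import timeit

newlist = ['this', 'thiis', 'thas', 'sada']
regex = re.compile('th.s')

def with_loop():
    # Explicit loop using the module-level re.match (pattern cache lookup each call)
    for word in newlist:
        if re.match(r"th.s", word):
            return True
    return False

def with_genexp():
    # Generator expression with a precompiled pattern
    return any(regex.match(word) for word in newlist)

def with_map():
    # map with a precompiled pattern
    return any(map(regex.match, newlist))

for func in (with_loop, with_genexp, with_map):
    seconds = timeit.timeit(func, number=100_000)
    print(f"{func.__name__}: {seconds:.4f} s for 100,000 calls")
```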

Answer 3

The best I can think of (besides using any):

next((x for x in newlist if regex.match(x)), False)

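This does not return True (it returns the first matching element, or False if there is none), but it is probably fine in a conditional test as long as an empty string cannot be a match :)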
