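I'm getting false positives in Python with the example below. I'm trying to find out whether a keyword exists in a string. The problem is that the words in the string are usually joined by underscores or hyphens, so I only want a positive result when the keyword appears on its own rather than inside a longer word. A hyphen, an underscore, or anything else that is not a letter should count as a separator. Usually the keyword will be surrounded by underscores or hyphens. The match is also case-insensitive.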
test_list = ['server_test', 'server_dev', 'server_uat', 'server_dr', 'server-dr-NA', 'server-DR', 'dress_prod', 'testosterone','uatae','devacurl', 'dev_server']
The result should be this True/False list:
[True, True, True, True, True, True, False, False, False, False, True]
Implementation:
key_words = ['uat','dr','test','qa','dev']
for name in test_list:
    if any(x in name.lower() for x in key_words):
        print('True')
    else:
        print('False')
Result (every name is reported as True):
True
True
True
True
True
True
True
True
True
True
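Every name comes back True because `x in name.lower()` is a plain substring test: `'dr'` is found inside `'dress_prod'`, `'test'` inside `'testosterone'`, and so on. A small sketch (the helper name `substring_hits` is hypothetical) that reports which keyword triggered each match makes the false positives visible:

```python
test_list = ['server_test', 'server_dev', 'server_uat', 'server_dr', 'server-dr-NA',
             'server-DR', 'dress_prod', 'testosterone', 'uatae', 'devacurl', 'dev_server']
key_words = ['uat', 'dr', 'test', 'qa', 'dev']


def substring_hits(name, keywords):
    """Return the keywords that occur anywhere in name (case-insensitive substring test)."""
    lowered = name.lower()
    return [kw for kw in keywords if kw in lowered]


for name in test_list:
    # e.g. 'dress_prod' -> ['dr'], 'testosterone' -> ['test']
    print(name, substring_hits(name, key_words))
```

Nothing in the substring test looks at the characters on either side of the keyword, which is exactly the condition the question needs to check.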
Is there a better way to do this in Python?
If not, how would I do it with a regular expression in Python?
Keep in mind that this loops over a large dataset, so performance matters.
Answer 0 (score: 2)
Given:
>>> import re
>>> test_list = ['server_test', 'server_dev', 'server_uat', 'server_dr', 'server-dr-NA', 'server-DR', 'dress_prod', 'testosterone','uatae','devacurl', 'dev_server']
>>> key_words = ['uat','dr','test','qa','dev']
You can use re.split and any:
>>> [any(word.lower() in key_words for word in re.split(r'[^a-zA-Z]', s))
... for s in test_list]
[True, True, True, True, True, True, False, False, False, False, True]
This matches the target:
>>> tgt=[True, True, True, True, True, True, False, False, False, False, True]
>>> [any(word.lower() in key_words for word in re.split(r'[^a-zA-Z]', s))
... for s in test_list]==tgt
True
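Since the question mentions a large dataset, one tweak worth trying with this approach (an assumption about where the cost lies, not a measured result) is to store the keywords in a `frozenset`, so each membership test is a hash lookup rather than a scan of the list, and to compile the split pattern once. Splitting the lowercased string on `[^a-z]+` also collapses runs of separators, so fewer empty tokens are produced. The names `KEY_WORDS`, `SPLITTER` and `has_keyword` are illustrative.

```python
import re

test_list = ['server_test', 'server_dev', 'server_uat', 'server_dr', 'server-dr-NA',
             'server-DR', 'dress_prod', 'testosterone', 'uatae', 'devacurl', 'dev_server']

# Hash-based membership instead of scanning a list for every token.
KEY_WORDS = frozenset(['uat', 'dr', 'test', 'qa', 'dev'])
# Split on runs of anything that is not a lowercase letter (input is lowercased first).
SPLITTER = re.compile(r'[^a-z]+')


def has_keyword(name, keywords=KEY_WORDS):
    """True if any letter-only token of name equals one of the keywords."""
    return any(token in keywords for token in SPLITTER.split(name.lower()))


print([has_keyword(s) for s in test_list])
# [True, True, True, True, True, True, False, False, False, False, True]
```

`any()` stops at the first matching token, so names that hit a keyword early are cheap.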
Answer 1 (score: 1)
Use a regex based on a negative lookbehind (paired with a negative lookahead):
>>> import re
>>> test_list = ['server_test', 'server_dev', 'server_uat', 'server_dr', 'server-dr-NA', 'server-DR', 'dress_prod', 'testosterone','uatae','devacurl', 'dev_server']
>>> key_words = ['uat','dr','test','qa','dev']
>>> [True if re.search(r'(?i)(?<![a-z])(?:' + '|'.join(key_words) + ')(?![a-z])', i) else False for i in test_list]
[True, True, True, True, True, True, False, False, False, False, True]
>>>
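When the same pattern is applied to many strings, it can be built and compiled once up front instead of being rebuilt inside the list comprehension. This sketch assumes the keywords are literal text (hence `re.escape`), uses the `re.IGNORECASE` flag in place of the inline `(?i)`, and replaces `True if ... else False` with `bool()`. `KEYWORD_RE` is an illustrative name.

```python
import re

key_words = ['uat', 'dr', 'test', 'qa', 'dev']
test_list = ['server_test', 'server_dev', 'server_uat', 'server_dr', 'server-dr-NA',
             'server-DR', 'dress_prod', 'testosterone', 'uatae', 'devacurl', 'dev_server']

# A keyword must not be preceded or followed by a letter. With IGNORECASE the
# class [a-z] also covers A-Z; re.escape guards against regex metacharacters.
KEYWORD_RE = re.compile(
    r'(?<![a-z])(?:' + '|'.join(map(re.escape, key_words)) + r')(?![a-z])',
    re.IGNORECASE,
)

print([bool(KEYWORD_RE.search(name)) for name in test_list])
```

Because the lookarounds only reject letters, digits also act as separators here, so a name like `server2dev` counts as a match, which is what the question asks for.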
Answer 2 (score: 0)
Another approach is to use \b to detect word boundaries. Unfortunately, _ is treated as a word character, so we need to detect either \b or _.
It is not as concise or efficient as Avinash's solution, but it may be more readable.
import re

test_list = ['server_test', 'server_dev', 'server_uat', 'server_dr',
             'server-dr-NA', 'server-DR', 'dress_prod', 'testosterone',
             'uatae', 'devacurl', 'dev_server']
key_words = ['uat','dr','test','qa','dev']

for name in test_list:
    for kw in key_words:
        regex = r'(\b|_)'+kw+r'(\b|_)'
        if re.search(regex, name, re.IGNORECASE):
            print('True')
            break  # exit "for kw" loop
    else:  # only executed if "for kw" loop exits via exhaustion, not via break
        print('False')
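The loop above builds one regex per (name, keyword) pair. A sketch of the same `\b`-or-`_` idea that joins all keywords into a single alternation and compiles it once (the names `BOUNDED_RE` and `matches` are hypothetical). One caveat carries over unchanged: `\b` does not fall between a digit and a letter, so a name such as `server2dev` does not match here, even though the question says any non-letter should act as a separator.

```python
import re

key_words = ['uat', 'dr', 'test', 'qa', 'dev']
test_list = ['server_test', 'server_dev', 'server_uat', 'server_dr', 'server-dr-NA',
             'server-DR', 'dress_prod', 'testosterone', 'uatae', 'devacurl', 'dev_server']

# Same rule as the loop: a keyword bounded on each side by \b or '_',
# expressed as one alternation and compiled a single time.
BOUNDED_RE = re.compile(
    r'(?:\b|_)(?:' + '|'.join(map(re.escape, key_words)) + r')(?:\b|_)',
    re.IGNORECASE,
)


def matches(names):
    """Return one bool per name, mirroring the printed True/False output."""
    return [BOUNDED_RE.search(name) is not None for name in names]


print(matches(test_list))
```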
Answer 3 (score: 0)
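I think this pattern is easy to understand and modify (it assumes key_words and test_list are defined as in the question):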
import re

pattern = r'.*(^|[^a-z])({names})([^a-z]|$).*'.format(names='|'.join(key_words))
# .*(^|[^a-z])(uat|dr|test|qa|dev)([^a-z]|$).*
for name in test_list:
    print(bool(re.search(pattern, name, re.IGNORECASE)))
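With `re.search`, the leading and trailing `.*` do not change which strings match: `search` already tries every starting position, and `.*` can always match the empty string. They most likely just add backtracking work. A sketch comparing the pattern from this answer with a trimmed, precompiled version (`ORIGINAL` and `TRIMMED` are illustrative names; the groups are made non-capturing since only a yes/no result is needed):

```python
import re

key_words = ['uat', 'dr', 'test', 'qa', 'dev']
test_list = ['server_test', 'server_dev', 'server_uat', 'server_dr', 'server-dr-NA',
             'server-DR', 'dress_prod', 'testosterone', 'uatae', 'devacurl', 'dev_server']
names = '|'.join(map(re.escape, key_words))

# Pattern from the answer, including the surrounding .* (compiled for reuse).
ORIGINAL = re.compile(r'.*(^|[^a-z])({names})([^a-z]|$).*'.format(names=names), re.IGNORECASE)
# Same rule without the .*: re.search already scans the whole string.
TRIMMED = re.compile(r'(?:^|[^a-z])(?:{names})(?:[^a-z]|$)'.format(names=names), re.IGNORECASE)

for name in test_list:
    # Both columns should agree for every name.
    print(name, bool(ORIGINAL.search(name)), bool(TRIMMED.search(name)))
```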
Answer 4 (score: 0)
import re

key_words = ['uat','dr','test','qa','dev']
test_list = ['server_test', 'server_dev', 'server_uat', 'server_dr', 'server-dr-NA',
             'server-DR', 'dress_prod', 'testosterone','uatae','devacurl', 'dev_server']

def check(word):
    parts = re.split('[^a-z]', word.lower())
    return any(part in key_words for part in parts)

print([check(item) for item in test_list])
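Because the question stresses performance on a large dataset, the token-splitting approach (answers 0 and 4) and the lookaround regex (answer 1) are best compared by timing them on a representative sample of your own data. A rough harness using the standard-library `timeit` module; `by_split`, `by_regex` and the replicated `sample` are hypothetical stand-ins, and no timings are shown because they depend on the data and the machine.

```python
import re
import timeit

key_words = ['uat', 'dr', 'test', 'qa', 'dev']
test_list = ['server_test', 'server_dev', 'server_uat', 'server_dr', 'server-dr-NA',
             'server-DR', 'dress_prod', 'testosterone', 'uatae', 'devacurl', 'dev_server']

KEYWORD_SET = frozenset(key_words)
SPLITTER = re.compile(r'[^a-z]+')
LOOKAROUND = re.compile(
    r'(?<![a-z])(?:' + '|'.join(map(re.escape, key_words)) + r')(?![a-z])',
    re.IGNORECASE,
)


def by_split(names):
    """Token approach: split on non-letters, look each token up in a set."""
    return [any(t in KEYWORD_SET for t in SPLITTER.split(n.lower())) for n in names]


def by_regex(names):
    """Lookaround approach: one precompiled regex search per name."""
    return [LOOKAROUND.search(n) is not None for n in names]


# Replicate the sample to get a bigger workload; substitute real data here.
sample = test_list * 1_000

# Sanity check that both approaches agree before comparing speed.
assert by_split(sample) == by_regex(sample)

for func in (by_split, by_regex):
    seconds = timeit.timeit(lambda: func(sample), number=3)
    print(f'{func.__name__}: {seconds:.3f} s for 3 runs')
```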