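I want to write a regular expression that matches all words made up of alphanumeric characters and underscores, but not words that contain two adjacent underscores. In other words, I want to select the words that match the regex below but do not contain "__".
Regex: [A-Za-z](\w){3,}[A-Za-z0-9]
Matching examples: 123dfgkjdflg4_aaa, ad, 12354
Non-matching example: 1246asd__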
Answer 0 (score: 1)
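You can use
\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)
and take the first capture group; see a demo on regex101.com.
The first alternative matches a run of letters and digits followed by "__" without capturing anything, so only words matched by the second alternative (starting and ending with a letter or digit) end up in group 1.
In Python, this could be: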
import re
# Group 1 is set only when the second alternative matched; the first
# alternative matches letters/digits followed by "__" and leaves it unset.
rx = re.compile(r'\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)')
words = ['a__a', '123dfgkjdflg4_', 'ad', '12354', '1246asd__', 'test__test', 'test']
nwords = [match.group(1)
          for word in words
          for match in [rx.search(word)]
          if match and match.group(1) is not None]
print(nwords)
# ['ad', '12354', 'test']
Or on a string:
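import re
rx = re.compile(r'\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)')
string = "a__a 123dfgkjdflg4_ ad 12354 1246asd__ test__test test"
# findall() yields '' wherever group 1 did not take part in the match;
# filter(None, ...) drops those, and list() is needed on Python 3.
nwords = list(filter(None, rx.findall(string)))
print(nwords)
# ['ad', '12354', 'test']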
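One case the alternation does not cover: a word such as a_b__c has a single underscore before the double one, so the first alternative (which only allows letters and digits before "__") fails and the second alternative captures the whole word. A possible variant, offered only as a sketch, replaces the alternation with a negative lookahead that rejects any word containing "__". The name NO_DOUBLE_UNDERSCORE and the extra test word are illustrative assumptions, not part of the original answer.

```python
import re

# Hypothetical pattern name. (?!\w*__) fails at the start of any word that
# contains "__" anywhere; the rest is the second alternative from above.
NO_DOUBLE_UNDERSCORE = re.compile(r'\b(?!\w*__)[A-Za-z0-9]\w*[A-Za-z0-9]\b')

# Same sample string as above, plus the edge case 'a_b__c'.
string = "a__a 123dfgkjdflg4_ ad 12354 1246asd__ test__test test a_b__c"

# The pattern has no capture group, so findall() returns whole matches
# and no filtering step is needed.
nwords = NO_DOUBLE_UNDERSCORE.findall(string)
print(nwords)
# ['ad', '12354', 'test']
```

Like the original pattern, this sketch still requires at least two characters per word and a letter or digit at both ends.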
Note that you can do all of this without a regular expression (likely faster and less hassle):
words = ['a__a', '123dfgkjdflg4_', 'ad', '12354', '1246asd__', 'test__test', 'test']
nwords = [word
          for word in words
          if "__" not in word and not (word.startswith('_') or word.endswith('_'))]
print(nwords)
# ['ad', '12354', 'test']
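The same check can be applied to a single string rather than a list. The sketch below assumes words are separated by whitespace, so any punctuation stays attached to its word, unlike with the \b-based regex. The helper name keep_words is hypothetical.

```python
def keep_words(text):
    """Hypothetical helper: split on whitespace and keep words that have
    no double underscore and no leading or trailing underscore."""
    return [word
            for word in text.split()
            if "__" not in word
            and not (word.startswith('_') or word.endswith('_'))]

string = "a__a 123dfgkjdflg4_ ad 12354 1246asd__ test__test test"
print(keep_words(string))
# ['ad', '12354', 'test']
```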