Question

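I want to write a regular expression that matches all words made up of alphanumeric characters plus underscores, but not words that contain two adjacent underscores. In other words, I want to select the words that match the regex below but do not contain "__".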

Regex: [A-Za-z](\w){3,}[A-Za-z0-9]

Should match: 123dfgkjdflg4_aaa, ad, 12354

Should not match: 1246asd__

Answer 1

You can use

\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)

and use the first group; see a demo on regex101.com.

In Python, this could be:

import re

rx = re.compile(r'\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)')

words = ['a__a', '123dfgkjdflg4_', 'ad', '12354', '1246asd__', 'test__test', 'test']

nwords = [match.group(1) 
            for word in words 
            for match in [rx.search(word)]
            if match and match.group(1) is not None]

print(nwords)
# ['ad', '12354', 'test']

Or, on a string:

import re

rx = re.compile(r'\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)')

string = "a__a 123dfgkjdflg4_ ad 12354 1246asd__ test__test test"

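# findall() yields '' when group 1 did not participate; filter(None, ...) drops those.
# list() is needed on Python 3, where filter() returns a lazy iterator.
nwords = list(filter(None, rx.findall(string)))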
print(nwords)
# ['ad', '12354', 'test']
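
One caveat worth checking: the first alternative of the pattern above only consumes letters and digits before the `__`, so a word with a single underscore before the double one appears to slip through into group 1. The sketch below compares it with a negative-lookahead variant. The variant pattern, the name `rx_lookahead`, and the input `a_b__c` are illustrative assumptions, not part of the original answer; the variant also accepts one-character words, which may or may not be wanted.

```python
import re

# Pattern from the answer above, kept for comparison.
rx_answer = re.compile(r'\b[a-z0-9A-Z]*__\w*\b|(\b[A-Za-z0-9]\w*[A-Za-z0-9]\b)')

# Variant (an assumption, not the answer's pattern): the negative lookahead
# rejects "__" anywhere in the word; the optional group allows 1-char words.
rx_lookahead = re.compile(r'\b(?!\w*__)[A-Za-z0-9](?:\w*[A-Za-z0-9])?\b')

# 'a_b__c' is a made-up input: a single underscore precedes the double one.
print(rx_answer.search('a_b__c').group(1))
# a_b__c
print(rx_lookahead.search('a_b__c'))
# None

text = "a__a 123dfgkjdflg4_ ad 12354 1246asd__ test__test test a_b__c"
# No capture groups here, so findall() returns the full matches directly.
print(rx_lookahead.findall(text))
# ['ad', '12354', 'test']
```

With this variant there is no capture group to inspect, so the result of `findall()` can be used as-is.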

Note that you can do all of this without a regular expression (probably faster and with less hassle):

words = ['a__a', '123dfgkjdflg4_', 'ad', '12354', '1246asd__', 'test__test', 'test']

nwords = [word 
            for word in words
            if "__" not in word and not (word.startswith('_') or word.endswith('_'))]
print(nwords)
# ['ad', '12354', 'test']
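
The list comprehension above assumes the input is already split into words and does not check which characters a word contains. A minimal sketch for a whitespace-separated string might look like the following; the helper name `is_clean_word` and the explicit ASCII character check are assumptions added for illustration.

```python
import string

# Assumption: a "word" may contain only ASCII letters, digits and underscores.
ALLOWED = set(string.ascii_letters + string.digits + "_")

def is_clean_word(word):
    """Hypothetical helper: no '__' inside, no leading or trailing '_'."""
    return (
        bool(word)
        and set(word) <= ALLOWED
        and "__" not in word
        and not (word.startswith("_") or word.endswith("_"))
    )

text = "a__a 123dfgkjdflg4_ ad 12354 1246asd__ test__test test"
clean_words = [w for w in text.split() if is_clean_word(w)]
print(clean_words)
# ['ad', '12354', 'test']
```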
