Question

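I am searching for strings of the format XXXXX_XXXXX or XXXXXX_XXXXX or XXXXXX, where X is an alphanumeric character.

So the part before the "_" is 5 or 6 characters long and the part after the "_" is always five characters, or the string is just 6 characters long with no underscore at all. I am coding in Python.

Any help is greatly appreciated.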

Answer 1

How about this?

([a-zA-Z0-9]{5,6}_[a-zA-Z0-9]{5})|[a-zA-Z0-9]{6}

Full code example:

import re
pat = re.compile(r'^(([a-zA-Z0-9]{5,6}_[a-zA-Z0-9]{5})|[a-zA-Z0-9]{6})$')
print(pat.match('xxxxx_xxxxx') is not None)    # True, 5 chars, underscore, 5 chars
print(pat.match('xxxxxx_xxxxx') is not None)    # True, 6 chars, underscore, 5 chars
print(pat.match('xxxxxx') is not None)    # True, 6 chars

Note: I originally wrote the following, not realizing that Python doesn't support POSIX character classes:

([[:alnum:]]{5,6}_[[:alnum:]]{5})|[[:alnum:]]{6}
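
Since Python's re module has no [[:alnum:]], one option is to keep the explicit class in a single constant and build the pattern from it. A minimal sketch, not part of the original answer: the names ALNUM, PATTERN and is_valid are illustrative, and re.fullmatch assumes Python 3.4 or later.

```python
import re

# ASCII stand-in for POSIX [[:alnum:]], which Python's re does not support.
ALNUM = "[a-zA-Z0-9]"

# 5-6 alphanumerics, underscore, 5 alphanumerics; or exactly 6 alphanumerics.
PATTERN = re.compile(f"{ALNUM}{{5,6}}_{ALNUM}{{5}}|{ALNUM}{{6}}")

def is_valid(s):
    # fullmatch requires the whole string to match, so no ^...$ is needed.
    return PATTERN.fullmatch(s) is not None
```

With this, is_valid('xxxxx_xxxxx') and is_valid('xxxxxx') hold, while a bare five-character string is rejected.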

Answer 2

Import re, then:

re.match("[a-zA-Z0-9]{5,6}(_[a-zA-Z0-9]{5})?", c).group()

Note that the predefined \w class counts "_" as alphanumeric, so you can't use it here.
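
A quick check of the \w point, plus one possible way around it: the class [^\W_] reads as "a word character that is not an underscore". This is only a sketch; the variable names are illustrative, and the re.ASCII flag is an addition here to restrict the match to ASCII letters and digits.

```python
import re

# \w accepts the underscore, so it cannot stand in for "alphanumeric" here.
w_matches_underscore = re.fullmatch(r"\w", "_") is not None

# [^\W_] is "word character, but not underscore"; re.ASCII limits it to [a-zA-Z0-9].
ALNUM_ONLY = re.compile(r"[^\W_]", re.ASCII)
alnum_matches_underscore = ALNUM_ONLY.fullmatch("_") is not None
alnum_matches_letter = ALNUM_ONLY.fullmatch("a") is not None

print(w_matches_underscore, alnum_matches_underscore, alnum_matches_letter)
```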

Answer 3

import re

# Python's re has no [[:alnum:]], so the class is spelled out explicitly.
regex = re.compile(r"([a-zA-Z0-9]{5,6}_[a-zA-Z0-9]{5})|[a-zA-Z0-9]{6}")
here = regex.search("your string")
if here:
    pass  # pattern has been found

Answer 4

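If Python does not treat the start and end anchors as implicit, or if you are searching for the string inside a larger string, you may need to account for boundary conditions. Otherwise, XXXXXXXXXXXXXXXXXXXXXX_XXXXXXXXXXXXXXXXXXXXXXX would also match.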

/ (?: ^ | [\W_] )              # beginning of line or non-alphameric
  (?:
       [^\W_]{5,6}_[^\W_]{5}   # 5-6 alphameric's, underscore, 5 alphameric's
    |  [^\W_]{6}               # or, 6 alphameric's
  )
  (?: [\W_] | $)               # non-alphameric or end of line
/
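
In Python, the free-spacing pattern above could be written with re.VERBOSE roughly as follows. This is a sketch under a few assumptions that are not in the original answer: a capturing group is added so the token can be extracted without its boundary characters, re.ASCII is used to keep the classes to ASCII, and the names BOUNDED and find_token are illustrative.

```python
import re

BOUNDED = re.compile(r"""
    (?: ^ | [\W_] )                 # start of string or a non-alphanumeric
    (                               # group 1: the token itself
        [^\W_]{5,6} _ [^\W_]{5}     # 5-6 alphanumerics, underscore, 5 alphanumerics
      | [^\W_]{6}                   # or exactly 6 alphanumerics
    )
    (?: [\W_] | $ )                 # a non-alphanumeric or end of string
""", re.VERBOSE | re.ASCII)

def find_token(text):
    # Return the first bounded token in text, or None if there is none.
    m = BOUNDED.search(text)
    return m.group(1) if m else None
```

For example, find_token("id: abcde_12345 ok") picks out abcde_12345, while the long run of X's from the paragraph above yields None.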

Answer 5

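I really like Michał Šrajer's answer, but as has already been pointed out, his version also matches just 5 alnum characters (which we don't want).

Here is an edit of his version that makes up for that: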

re.match("[a-zA-Z0-9]{5}([a-zA-Z0-9]?_[a-zA-Z0-9]{5}|[a-zA-Z0-9])", c)

Though some of the other answers may be more readable...
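
One way to check a pattern of this shape against the requirement is a small table of cases. The sketch below is not the author's code: it uses a variant with no optional group inside the alternation (an optional branch can match the empty string and let five characters through), anchors it with re.fullmatch, and the names CANDIDATE and CASES are illustrative.

```python
import re

# Five alphanumerics, then either (optional sixth + underscore + five more)
# or exactly one more alphanumeric. Neither branch can match empty.
CANDIDATE = re.compile(
    r"[a-zA-Z0-9]{5}(?:[a-zA-Z0-9]?_[a-zA-Z0-9]{5}|[a-zA-Z0-9])"
)

# Input string -> whether it should be accepted.
CASES = {
    "abcde_fghij": True,    # 5 + underscore + 5
    "abcdef_ghijk": True,   # 6 + underscore + 5
    "abcdef": True,         # exactly 6
    "abcde": False,         # only 5, no underscore part
    "abcdefg": False,       # 7 characters
    "abcde_fghi": False,    # only 4 after the underscore
}

results = {s: CANDIDATE.fullmatch(s) is not None for s in CASES}
for s, expected in CASES.items():
    status = "ok" if results[s] == expected else "MISMATCH"
    print(f"{s!r}: {status}")
```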

Searching for a string in a line using a regular expression
