How to write a regular expression for the following conditions

Date: 2019-12-01 12:33:40

Tags: regex matching

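  • Must contain at least 4 characters
  • Must start with a character from [a-zA-Z]
  • Must not end with _
  • May contain at most one _ in the whole word
  • May otherwise contain characters from [a-zA-Z0-9]

I tried the following regular expression: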

^[a-zA-Z][a-zA-Z0-9]*_?[a-zA-Z0-9]*[^_]$

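But I would like to know whether it can be shortened, and I am not sure how to enforce the "at least 4 characters" constraint.

3 answers:

Answer 0: (score: 1)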

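You can shorten the pattern by dropping the final negated character class [^_], which matches any character other than _ (including characters outside the allowed set), and ending with [a-zA-Z0-9]+ instead. To require at least 4 characters, add a positive lookahead (?=.{4}) at the start of the pattern: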

^(?=.{4})[a-zA-Z][a-zA-Z0-9]*_?[a-zA-Z0-9]+$
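  • ^ Start of the string
  • (?=.{4}) Positive lookahead: assert that at least 4 characters follow
  • [a-zA-Z] Match a single character from a-zA-Z
  • [a-zA-Z0-9]*_?[a-zA-Z0-9]+ Match zero or more alphanumerics, an optional _, then one or more alphanumerics
  • $ End of the string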
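
To see the difference concretely, here is a small Python sketch that runs the question's attempt and the shortened pattern over a few inputs. It assumes Python's built-in re module; the names ATTEMPT and SHORTENED and the sample strings are my own, not from the question.

```python
import re

# The pattern from the question and the shortened pattern from this answer.
ATTEMPT = re.compile(r"^[a-zA-Z][a-zA-Z0-9]*_?[a-zA-Z0-9]*[^_]$")
SHORTENED = re.compile(r"^(?=.{4})[a-zA-Z][a-zA-Z0-9]*_?[a-zA-Z0-9]+$")

# Hypothetical sample inputs chosen to exercise each rule.
samples = ["ab", "abc&", "ab_cd", "abcd_", "a_b_c", "abc1"]

# Map each sample to (matched by attempt, matched by shortened pattern).
results = {s: (bool(ATTEMPT.match(s)), bool(SHORTENED.match(s))) for s in samples}

for s, (attempt_ok, shortened_ok) in results.items():
    print(f"{s!r:10} attempt={attempt_ok!s:5} shortened={shortened_ok}")
```

In this sketch the attempt accepts ab and abc&, because [^_] permits any non-underscore character and nothing enforces the minimum length; the shortened pattern rejects both.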


Answer 1: (score: 0)

This does the job:

^(?i)(?=\w{4,})[a-z]+_?[^\W_]+$

Explanation:

^                   # beginning of string
  (?i)              # case insensitive
  (?=\w{4,})        # positive lookahead: make sure there are 4 or more word characters
  [a-z]+            # 1 or more letters
  _?                # optional underscore
  [^\W_]+           # 1 or more alphanumeric characters
$                   # end of string
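
One way to check this pattern against the stated requirements is to port it to Python. The sketch below makes a few assumptions of my own: it moves (?i) into the re.IGNORECASE flag (recent Python versions reject a global inline flag that is not at the very start of the pattern) and adds re.ASCII so that \w is limited to [a-zA-Z0-9_]. As written, the pattern only allows letters before the underscore, so a1_bc is rejected even though the requirements permit digits there. The RELAXED variant is a hypothetical adjustment, not part of the answer.

```python
import re

# The answer's pattern as written, with (?i) expressed as re.IGNORECASE.
# re.ASCII restricts \w to [a-zA-Z0-9_].
AS_WRITTEN = re.compile(r"^(?=\w{4,})[a-z]+_?[^\W_]+$", re.IGNORECASE | re.ASCII)

# Hypothetical variant (not from the answer): also allow digits before the underscore.
RELAXED = re.compile(r"^(?=\w{4,})[a-z][^\W_]*_?[^\W_]+$", re.IGNORECASE | re.ASCII)

# Sample inputs chosen for illustration.
samples = ["ab_cd", "AB_c1", "a1_bc", "abc_", "a_b_c", "1abc"]

as_written = [s for s in samples if AS_WRITTEN.match(s)]
relaxed = [s for s in samples if RELAXED.match(s)]

print("as written:", as_written)
print("relaxed:   ", relaxed)
```

Both patterns reject abc_, a_b_c and 1abc; they differ only on a1_bc.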


Answer 2: (score: 0)

You can validate the input with a combination of positive and negative lookaheads:

import re

tests = [
    'abc',     # too short
    '_bcde',   # starts with wrong character
    'abcd_',   # last character is '_'
    'a_b_cd',  # too many '_'
    'abc&cd',  # illegal character '&'
    'ab_cd',   # OK
]


regex = re.compile(r"""
    ^               # matches start of the string
    (?=.{4})        # positive lookahead: string must be at least 4 characters long
    (?=[a-zA-Z])    # positive lookahead: first character must be [a-zA-Z]
    (?!.*_$)        # negative lookahead: last character cannot be `_`
    (?!.*_.*_)      # negative lookahead: cannot contain more than one `_`
    [a-zA-Z0-9_]+   # one or more letters, digits, or `_`
    $               # matches end of the string
""", re.X)

for test in tests:
    m = regex.match(test)
    print(test, 'Match' if m else 'No match')

Prints:

abc No match
_bcde No match
abcd_ No match
a_b_cd No match
abc&cd No match
ab_cd Match
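
For a broader check than a handful of hand-picked strings, the rules from the question can be written as plain Python and compared with a regex over every short string from a small alphabet. This is only a sketch: the helper is_valid, the alphabet aZ9_& and the length limit of 6 are my own choices, and the regex used is the single-pattern form from answer 0.

```python
import re
import string
from itertools import product

ALNUM = set(string.ascii_letters + string.digits)

def is_valid(word: str) -> bool:
    """Apply the five rules from the question directly, without a regex."""
    return (
        len(word) >= 4
        and word[0] in string.ascii_letters
        and not word.endswith("_")
        and word.count("_") <= 1
        and all(ch in ALNUM or ch == "_" for ch in word)
    )

# Single-pattern form from answer 0.
PATTERN = re.compile(r"^(?=.{4})[a-zA-Z][a-zA-Z0-9]*_?[a-zA-Z0-9]+$")

# Every string of length 1..6 over a small, arbitrary alphabet.
alphabet = "aZ9_&"
candidates = (
    "".join(chars) for n in range(1, 7) for chars in product(alphabet, repeat=n)
)

# Collect any string on which the rule checker and the regex disagree.
mismatches = [w for w in candidates if is_valid(w) != bool(PATTERN.match(w))]

print("mismatches:", mismatches)
```

An empty mismatches list means the regex and the rule-by-rule check agree on every candidate in that range; it does not prove agreement for inputs outside it.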
