Regular expression to find letters after underscores

Asked: 2010-09-08 13:45:32

Tags: regex unix grep

I want to write a regular expression, for use with a Unix command, that identifies all strings that do NOT conform to the following format:

First letter is uppercase
Followed by any number of letters
Underscore
Followed by UpperCase Letter
Followed by any number of letters
Underscore
and so on .............

The number of underscores is variable.

So valid ones are                                     Invalid ones are
Alpha_Beta_Gamma                                      alph_Beta_Gamma
Alpha_Beta_Gamma_Delta                                Alpha_beta_Gamma
Alppha_Beta                                           Alpha_beta
Aliph_Theta_Pi_Chi_Ming                               Alpha_theta_Pi_Chi_Ming

1 answer:

Answer 0 (score: 4)

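grep has a -v option that inverts the match (i.e., it returns the lines that do not match). The -E option puts grep in extended-regexp mode, which allows + and parentheses to be used unescaped in the pattern.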

The pattern you can use is (broken down for clarity):

^              # beginning of line
  [A-Z]        # a single uppercase letter
  [a-z]*       # zero or more lowercase letters
  (            # start a group
    _          # an underscore
    [A-Z]      # a single uppercase letter
    [a-z]*     # zero or more lowercase letters
  )+           # close the group; it must appear one or more times
$              # end of line
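
To see how the pieces of the pattern behave on individual names, here is a minimal sketch. The helper name is_valid is hypothetical (it is not part of grep), and the LC_ALL=C prefix is an assumption added so that the ranges [A-Z] and [a-z] are interpreted as plain ASCII, since bracket ranges can behave differently in some locales.

```shell
# Hypothetical helper: exit status 0 when the name matches the format.
# grep -q suppresses output; only the exit status is used.
is_valid() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Z][a-z]*(_[A-Z][a-z]*)+$'
}

# Try one well-formed name, one with a lowercase letter after an
# underscore, and one with no underscore at all.
for name in Alpha_Beta_Gamma Alpha_beta Alpha; do
    if is_valid "$name"; then
        echo "$name: valid"
    else
        echo "$name: invalid"
    fi
done
```

Only Alpha_Beta_Gamma is reported as valid. Alpha is rejected because the + after the group requires at least one underscore-separated part; if single words should also count as valid, replacing that + with * would be one way to allow them.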

Suppose you have a file test.dat that contains the 8 strings from your question:

grep -E -v "^[A-Z][a-z]*(_[A-Z][a-z]*)+$" test.dat

This returns:

alph_Beta_Gamma
Alpha_beta_Gamma
Alpha_beta
Alpha_theta_Pi_Chi_Ming
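
The run above can be reproduced end to end with the sketch below. It assumes a temporary file created with mktemp in place of test.dat, and the variable names tmpfile, pattern, invalid and valid are illustrative only. LC_ALL=C is again an assumption to keep the letter ranges ASCII-only.

```shell
# Write the eight sample strings from the question to a temporary file.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
Alpha_Beta_Gamma
alph_Beta_Gamma
Alpha_Beta_Gamma_Delta
Alpha_beta_Gamma
Alppha_Beta
Alpha_beta
Aliph_Theta_Pi_Chi_Ming
Alpha_theta_Pi_Chi_Ming
EOF

pattern='^[A-Z][a-z]*(_[A-Z][a-z]*)+$'

# -v inverts the match: keep only the lines that do NOT fit the format.
invalid=$(LC_ALL=C grep -E -v "$pattern" "$tmpfile")
# Without -v, the same pattern lists the well-formed lines instead.
valid=$(LC_ALL=C grep -E "$pattern" "$tmpfile")

printf 'Invalid:\n%s\n\nValid:\n%s\n' "$invalid" "$valid"
rm -f "$tmpfile"
```

grep prints lines in file order, so the four invalid names appear in the same order as in the output shown above.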