Check whether the number in column 1 equals column 2, where column 1 must start and end with a given format

Time: 2017-08-29 19:09:21

Tags: awk

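I want to check whether the number in column 1 is equal to column 2. Column 1 should start with "ABC" and end with "DEF", but sometimes it ends with "DEFZ#" instead. The digits between "ABC" and "DEF" should match the second column. Could someone please help me?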

My input:

ABC12345DEF|12345|23132331331|
ABC12345DEFZ1|12345|23132331331|
ABC12345DEFZ2|12345|23132331331|
ABC95678DEF|45678|23132331331|
ABC87887DEF|86187|23132331331|
ABC89043DEF|89043|23132331331|
ABC89043DEFZ1|89043|23132331331|
ABC89043DEFZ2|89043|23132331331|
ABC89043DEFZ3|89043|23132331331|

The output should be:

ABC12345DEF|12345|23132331331|
ABC12345DEFZ1|12345|23132331331|
ABC12345DEFZ2|12345|23132331331|
ABC89043DEFZ1|89043|23132331331|
ABC89043DEFZ2|89043|23132331331|
ABC89043DEFZ3|89043|23132331331|

I tried to solve this myself, but my attempt did not work.

Can someone help me? Thanks in advance.

1 answer:

Answer 0: (score: 0)

awk -v FS="|" '{tmpvar=$1;gsub(/^ABC|DEF(Z[0-9]+)?$/,"",tmpvar)}tmpvar == $2' infile

Input

akshay@db-3325:/tmp$ cat infile
ABC12345DEF|12345|23132331331|
ABC12345DEFZ1|12345|23132331331|
ABC12345DEFZ2|12345|23132331331|
ABC95678DEF|45678|23132331331|
ABC87887DEF|86187|23132331331|
ABC89043DEF|89043|23132331331|
ABC89043DEFZ1|89043|23132331331|
ABC89043DEFZ2|89043|23132331331|
ABC89043DEFZ3|89043|23132331331|

Output

akshay@db-3325:/tmp$ awk -v FS="|" '{tmpvar = $1; gsub(/^ABC|DEF(Z[0-9]+)?$/,"",tmpvar)} tmpvar == $2' infile
ABC12345DEF|12345|23132331331|
ABC12345DEFZ1|12345|23132331331|
ABC12345DEFZ2|12345|23132331331|
ABC89043DEF|89043|23132331331|
ABC89043DEFZ1|89043|23132331331|
ABC89043DEFZ2|89043|23132331331|
ABC89043DEFZ3|89043|23132331331|

Explanation

awk -v FS="|" '{                  # run awk with | as the field separator
                 tmpvar = $1;     # copy the first field into the variable tmpvar

                 # globally delete from tmpvar a leading ABC, and a trailing DEF
                 # that may be followed by Z and one or more digits,
                 # by replacing each match with the empty string,
                 # so that tmpvar holds only the number between ABC and DEF*
                 gsub(/^ABC|DEF(Z[0-9]+)?$/,"",tmpvar)
               }
               # a pattern with no action: when tmpvar equals the second field
               # the condition is true and awk prints the current record ($0)
               tmpvar == $2
              ' infile
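Note that the command above only strips the prefix and suffix and compares what is left, so a row whose first field is just the bare number would also be printed. If the ABC...DEF shape itself should be enforced, one possible variant (a sketch, not part of the original answer) builds the regex from the second field. The variable names sample and strict are made up for this illustration, as are the last two sample rows (XYZ89043DEF and the bare 89043), and the sketch assumes the second field contains only digits.

```shell
# Hypothetical sample rows: the last two are invented to show rejected shapes
sample='ABC12345DEF|12345|23132331331|
ABC12345DEFZ1|12345|23132331331|
ABC95678DEF|45678|23132331331|
XYZ89043DEF|89043|23132331331|
89043|89043|23132331331|'

# Keep a row only if field 2 is all digits and field 1 is exactly
# ABC + field 2 + DEF, optionally followed by Z and one or more digits.
strict=$(printf '%s\n' "$sample" |
  awk -F'|' '$2 ~ /^[0-9]+$/ && $1 ~ ("^ABC" $2 "DEF(Z[0-9]+)?$")')

printf '%s\n' "$strict"
```

With these sample rows only the first two lines should be printed. The digits-only check on field 2 also keeps regex metacharacters out of the dynamically built pattern.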

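Online regex explanation of /^ABC|DEF(Z[0-9]+)?$/

  • First alternative, ^ABC: ^ asserts the position at the start of the string, and ABC matches the characters ABC literally (case sensitive).

  • Second alternative, DEF(Z[0-9]+)?$: DEF matches the characters DEF literally (case sensitive). The first capturing group (Z[0-9]+) is followed by the ? quantifier, which matches between zero and one times, as many times as possible, giving back as needed (greedy). Inside the group, Z matches the character Z literally (case sensitive) and [0-9] matches a single character in the range 0 to 9.

  • The + quantifier matches between one and unlimited times, as many times as possible, giving back as needed (greedy). The final $ asserts the position at the end of the string.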
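To see what the pattern removes, here is a small sketch that applies the same gsub call to a few first-field values and prints each value next to what is left. The variable name stripped is made up for this example, and ABC89043DEFZ23 is an invented value added to show a multi-digit suffix.

```shell
# Apply the same substitution as in the answer to a few sample values
# and print each original value next to the stripped result.
stripped=$(printf '%s\n' ABC12345DEF ABC12345DEFZ1 ABC89043DEFZ23 |
  awk '{ t = $0; gsub(/^ABC|DEF(Z[0-9]+)?$/, "", t); print $0 " -> " t }')

printf '%s\n' "$stripped"
```

Each line should show only the digits on the right-hand side, for example ABC12345DEFZ1 -> 12345.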