awk: add specific fields to a file based on a field match

Time: 2017-07-06 12:39:29

Tags: awk

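I am trying to use awk to add the $4, $5 and $6 fields, along with the header, from the tab-delimited file2 to the lines of file1 where the $2 value in file2 matches the $3 value in file1. I have commented each line with my understanding of what is happening. Thank you :).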

file1 (tab-delimited)

ID  Name    Number
0-0 A,A 123456
2-2 B,B 789123
4-4 C,C 456789

file2 (tab-delimited)

ID  Number  Name    Info1   Info2   Info3   Info4
0-0 123456  A,A aaaaa   bbbbb   ccccc   eeeee
1-1 111111  Z,Z aaa bbb ccc eee
2-2 789123  B,B aaaaa   bb,bbb  ccccc   eeeee
3-3 222222  Y,Y aaa bb,bb   cc  e
4-4 456789  C,C aaa bb  ccc eeee

Desired output (tab-delimited)

ID  Name    Number  Info1   Info2   Info3
0-0 A,A 123456  aaaaa   bbbbb   ccccc
2-2 B,B 789123  aaaaa   bb,bbb  ccccc
4-4 C,C 456789  aaa bb  ccc

awk

awk -F"\t" '$3 in a{  # read $3 value of file1 into array a
 a[$3]=a[$2];   # match $3 array a from file1 with $2 value in file2
  next   # process next line
 }  # close block
  { print $1,$2,a[$2],$4,$5,$6  # print desired output
 }  # close block
    END {  # start block
 for ( i in a) {   # create for loop i to print
     print a[i]  # print for each matching line in i
  }  # close block
}' file1 file2

2 answers:

Answer 0 (score: 2)

$ awk -v OFS="\t" 'NR==FNR{a[$3]=$0;next}$2 in a{print a[$2],$4,$5,$6}' file1 file2
ID      Name    Number  Info1   Info2   Info3
0-0     A,A     123456  aaaaa   bbbbb   ccccc
2-2     B,B     789123  aaaaa   bb,bbb  ccccc
4-4     C,C     456789  aaa     bb      ccc

Explanation:

$ awk -v OFS="\t" '         # set OFS to tab so the output is tab-delimited
NR==FNR{                    # for file1
    a[$3]=$0                # hash $0 to a using $3 as key
    next                    # no further processing for this record
}
$2 in a {                   # if $2 found in a
    print a[$2],$4,$5,$6    # output as requested
}' file1 file2              # mind the file order
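
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the question's sample data and runs the same two-file lookup. Two things are my own additions rather than part of the answer: the temporary directory from `mktemp -d`, and `-F'\t'`, which I am assuming is wanted so that a field containing a space would not be split (the answer relies on default whitespace splitting, which is fine for this data). Note that the header line comes out right only because the column name Number appears in $3 of file1 and in $2 of file2.

```shell
#!/bin/sh
# Rebuild tab-delimited copies of the sample files in a temp dir (assumed location).
tmp=$(mktemp -d)
printf 'ID\tName\tNumber\n0-0\tA,A\t123456\n2-2\tB,B\t789123\n4-4\tC,C\t456789\n' > "$tmp/file1"
printf 'ID\tNumber\tName\tInfo1\tInfo2\tInfo3\tInfo4\n' > "$tmp/file2"
printf '0-0\t123456\tA,A\taaaaa\tbbbbb\tccccc\teeeee\n' >> "$tmp/file2"
printf '1-1\t111111\tZ,Z\taaa\tbbb\tccc\teee\n' >> "$tmp/file2"
printf '2-2\t789123\tB,B\taaaaa\tbb,bbb\tccccc\teeeee\n' >> "$tmp/file2"
printf '3-3\t222222\tY,Y\taaa\tbb,bb\tcc\te\n' >> "$tmp/file2"
printf '4-4\t456789\tC,C\taaa\tbb\tccc\teeee\n' >> "$tmp/file2"

# Same logic as the answer above, with FS also set to tab (my addition).
result=$(awk -F'\t' -v OFS='\t' '
NR==FNR { a[$3] = $0; next }           # file1: remember the whole line, keyed by Number
$2 in a { print a[$2], $4, $5, $6 }    # file2: append Info1..Info3 when Number is known
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The rows come out in file2 order, since file2 is the file being scanned when the print happens.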

Answer 1 (score: 1)

Try: one more approach, this time reading file2 first and then file1.

awk -F"\t" 'FNR==NR{a[$1,$3,$2]=$4 OFS $5 OFS $6;next} (($1,$2,$3) in a){print $1,$2,$3,a[$1,$2,$3]}' OFS="\t" file2 file1

Will add an explanation in a few minutes.

EDIT: Adding a non-one-liner form of the solution, along with an explanation.

awk -F"\t" 'FNR==NR{                              ####FNR==NR is true only while the first file (file2) is being read: FNR is the record number within the current file and resets for each new file, while NR keeps counting across all input files.
                a[$1,$3,$2]=$4 OFS $5 OFS $6;     ####Store $4, $5 and $6, joined by OFS, in array a under the composite key $1,$3,$2 (ID, Name, Number in file2). OFS defaults to a single space but is set to tab by the OFS="\t" assignment on the command line before file2 is read.
                next                              ####next skips the remaining rules and moves on to the next input record.
            }
     (($1,$2,$3) in a){                           ####Reached only while the second file (file1) is being read: check whether $1,$2,$3 (ID, Name, Number in file1) exists as a key in array a.
                        print $1,$2,$3,a[$1,$2,$3]####If so, print $1, $2 and $3 followed by the value stored under that key.
                      }
    ' OFS="\t" file2 file1                        ####The input files; file2 must come first.
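
The key is built as $1,$3,$2 from file2 but looked up as $1,$2,$3 from file1 because the two files store Name and Number in opposite order. A comma-separated subscript in awk is a single string joined by SUBSEP, so the order of the parts matters. Here is a small sketch of that behaviour; the literal values are just one sample row typed in by hand for illustration:

```shell
#!/bin/sh
# Show that a[x,y,z] is one string key (parts joined by SUBSEP) and that order matters.
result=$(awk 'BEGIN {
    a["0-0", "A,A", "123456"] = "x"              # stored under a single string key
    key = "0-0" SUBSEP "A,A" SUBSEP "123456"     # the same key, spelled out by hand
    if (key in a) print "same key"; else print "different key"
    # Same three parts in file2 column order (ID, Number, Name): not the same key.
    if (("0-0", "123456", "A,A") in a) print "order ignored"; else print "order matters"
}')
printf '%s\n' "$result"
```

This prints "same key" followed by "order matters", which is why the subscript order has to be swapped for one of the two files.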