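Merge files where certain columns match

Time: 2019-02-21 20:28:06

Tags: awk

I want to match the rows of two files on columns 1, 2, and 3.

For each row of file2 whose first three columns match a row of file1, append the column 4 value from file1 to the file2 row. If there is no match, append NA.

File 1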

31431 37150 100 10100
31431 37201 100 12100
31431 37471 100 14100

File 2

31431 37150 100 14100
31431 37131 100 14100
31431 37201 100 14100
31431 37478 100 14100
31431 37471 100 14100

Desired output:

31431 37150 100 14100 10100
31431 37131 100 14100 NA
31431 37201 100 14100 12100
31431 37478 100 14100 NA
31431 37471 100 14100 14100

I tried:

awk '
FNR==NR{
  a[$1 $2 $3]=$4
  next
}
($1 in a){
  $1=a[$1]
  found=1
}
{
  $0=found==1?$0",":$0",NA"
  sub(/^...../,"&,")
  $1=$1
  found=""
}
1
' FS=" " file1 FS=" " OFS="," file2

2 answers:

Answer 0: (score: 2)

$ awk '      {k=$1 FS $2 FS $3} 
     NR==FNR {a[k]=$4; next} 
             {$(NF+1)=k in a?a[k]:"NA"}1' file1 file2

31431 37150 100 14100 10100
31431 37131 100 14100 NA
31431 37201 100 14100 12100
31431 37478 100 14100 NA
31431 37471 100 14100 14100
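
One detail worth noting about the key `k=$1 FS $2 FS $3` used above: it keeps a separator between the fields, whereas the question's attempt indexes the array with `$1 $2 $3`, which concatenates them directly. Without a separator, different field combinations can collapse into the same key. A minimal sketch with made-up rows (the input values are hypothetical, not taken from the question):

```shell
# Two hypothetical rows whose first three fields differ but concatenate identically.
out=$(printf '12 3 4 x\n1 23 4 y\n' |
awk '{
  plain = $1 $2 $3          # no separator between the fields
  keyed = $1 FS $2 FS $3    # fields joined with FS (a single space by default)
  print plain "|" keyed
}')
printf '%s\n' "$out"
```

Both rows print the same value before the `|`, so they would overwrite each other in an array indexed that way, while the FS-joined keys after the `|` stay distinct.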

Answer 1: (score: 2)

Could you please try the following.

awk 'FNR==NR{a[$1,$2,$3]=$NF;next} {print $0,($1,$2,$3) in a?a[$1,$2,$3]:"NA"}' Input_file1  Input_file2

Or, as per Ed sir's comment, create a variable for the fields.

awk '{var=$1 OFS $2 OFS $3} FNR==NR{a[var]=$NF;next} {print $0,var in a?a[var]:"NA"}' Input_file1  Input_file2

The output is as follows.

31431 37150 100 14100 10100
31431 37131 100 14100 NA
31431 37201 100 14100 12100
31431 37478 100 14100 NA
31431 37471 100 14100 14100

Explanation: adding an explanation of the above code.

awk '
{
  var=$1 OFS $2 OFS $3              ##Build a variable named var from the first, second and third fields of the current line (this runs for both Input_file1 and Input_file2).
}
FNR==NR{                            ##FNR==NR is TRUE only while the first file, Input_file1, is being read.
  a[var]=$NF                        ##Store $NF (the last field) of the current line in array a, indexed by var.
  next                              ##next skips the remaining statements for this line and moves on to the next input line.
}
{
  print $0,var in a?a[var]:"NA"     ##Print the current line followed by a[var] if var is an index of array a, otherwise print NA.
}'  Input_file1  Input_file2        ##Mentioning Input_file names here.
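
To try this end to end, the sample data from the question can be recreated in a scratch directory. This is only a sketch: it assumes `mktemp -d` is available, the variable names `dir` and `result` are illustrative, and the ternary is wrapped in parentheses purely as a readability choice.

```shell
# Recreate the question's sample files in a temporary directory.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
31431 37150 100 10100
31431 37201 100 12100
31431 37471 100 14100
EOF
cat > "$dir/file2" <<'EOF'
31431 37150 100 14100
31431 37131 100 14100
31431 37201 100 14100
31431 37478 100 14100
31431 37471 100 14100
EOF

# First pass (FNR==NR): remember column 4 of file1 under the 3-field key.
# Second pass: print each file2 line plus the stored value, or NA if the key is absent.
result=$(awk 'FNR==NR { a[$1,$2,$3] = $NF; next }
              { print $0, (($1,$2,$3) in a ? a[$1,$2,$3] : "NA") }' \
         "$dir/file1" "$dir/file2")
printf '%s\n' "$result"

rm -rf "$dir"   # clean up the scratch directory
```

One caveat with the `FNR==NR` idiom: if the first file is empty, the condition is also true while the second file is read, so nothing is printed.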