Match a pattern by comparing two files and write out the matching lines?

Date: 2017-10-27 01:09:43

Tags: linux awk grep

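I searched and found very similar questions, but unfortunately none of the answers worked on my large dataset. What I want to do is compare fileA with fileB and write out the lines of fileB that match, appending the relevant information from fileA to each one.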

This is fileA:

TCC    Reg  
TGA    Reg  
TTG    Reg  
TAG    None  
AAA    None

and fileB:

1       GCT    1883127 302868  16.08  
2       GGG    1779189 284102  15.97  
3       TCC    1309842 217491  16.60  
4       TAA    1384070 168924  12.20  
5       TAG    892324  140634  15.76  

The output file I want to write is:

3       TCC    1309842 217491  16.60  Reg          
5       TAG    892324  140634  15.76  None

I have tried grep -f and awk 'FNR==NR{a[$1];next}($1 in a){print}' fileA fileB > outputfile separately, but neither worked.

2 answers:

Answer 0: (score: 1)

awk to the rescue!

$ awk 'NR==FNR {a[$1]=$2; next} 
       $2 in a {print $0,a[$2]}' fileA fileB

3       TCC    1309842 217491  16.60   Reg
5       TAG    892324  140634  15.76   None
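
To see why the attempt in the question printed nothing while the command above works, here is a minimal sketch that rebuilds the two sample files in a scratch directory and runs both commands. The scratch directory and the variable names (workdir, attempt, fixed) are made up for illustration. The only difference between the commands is which field of fileB is looked up: the attempt tests $1 (the row number), the command above tests $2 (the triplet).

```shell
# Scratch directory for the sample data (name is illustrative only).
workdir=$(mktemp -d)

# Sample data copied from the question.
printf '%s\n' 'TCC    Reg' 'TGA    Reg' 'TTG    Reg' 'TAG    None' 'AAA    None' > "$workdir/fileA"
printf '%s\n' \
  '1       GCT    1883127 302868  16.08' \
  '2       GGG    1779189 284102  15.97' \
  '3       TCC    1309842 217491  16.60' \
  '4       TAA    1384070 168924  12.20' \
  '5       TAG    892324  140634  15.76' > "$workdir/fileB"

# Attempt from the question: looks up $1 of fileB (1, 2, 3, ...) among the
# fileA keys (TCC, TGA, ...), so no line can ever match.
attempt=$(awk 'FNR==NR{a[$1];next}($1 in a){print}' "$workdir/fileA" "$workdir/fileB")

# Command from this answer: looks up $2 of fileB and appends the stored label.
fixed=$(awk 'NR==FNR {a[$1]=$2; next} $2 in a {print $0, a[$2]}' "$workdir/fileA" "$workdir/fileB")

if [ -z "$attempt" ]; then
  echo "attempt from the question: no lines matched"
fi
printf '%s\n' "$fixed"
```

The exact spacing before the appended label depends on any trailing whitespace in fileB, because $0 is printed unchanged and followed by a single output separator.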

Answer 1: (score: 1)

The following awk may also help you.

awk 'FNR==NR{a[$2]=$0;next} ($1 in a){print a[$1],$2}' fileB fileA

The output is as follows.

3       TCC    1309842 217491  16.60   Reg
5       TAG    892324  140634  15.76   None

EDIT: Adding a non-one-liner form of the solution, with an explanation.

awk '
FNR==NR{ ##Checking whether FNR and NR (both built-in awk variables) are equal.
         ##Both count the lines read so far; the only difference is that FNR is RESET whenever a new Input_file starts being read,
         ##whereas NR keeps increasing until all the Input_file(s) are read. So this condition is TRUE only while the very first Input_file
         ##is being read.
  a[$2]=$0;##Creating an array named a, indexed by $2 (second field) of the current line of fileB, with the whole current line as its value.
  next     ##next is an awk statement which skips all further statements for the current line.
}
($1 in a){ ##This condition is only reached once the first Input_file has been read and the second Input_file is being read.
           ##Checking here if $1 (first field) of the current line of Input_file (fileA) is present in array a; if yes, do the following.
  print a[$1],$2 ##Printing the value of array a at index $1 (the stored fileB line) followed by $2 of the current line, as per your requirement.
}
' fileB fileA ##Mentioning the Input_file(s) here: fileB first, then fileA.
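
One practical difference between the two answers: whichever file is named first is the one held in memory, and the output follows the order of the file named second. With the sample data both commands print the same thing, so the sketch below uses a hypothetical fileA whose rows are listed in a different order (an assumption made purely for illustration, as are the scratch directory and variable names) to make the difference visible.

```shell
# Scratch directory (name is illustrative only).
workdir=$(mktemp -d)

# Hypothetical fileA: same labels as in the question, but TAG listed before TCC.
printf '%s\n' 'TAG    None' 'TCC    Reg' > "$workdir/fileA"
printf '%s\n' \
  '3       TCC    1309842 217491  16.60' \
  '5       TAG    892324  140634  15.76' > "$workdir/fileB"

# fileA is stored in the array (only the label per key); output follows fileB order.
fileA_first=$(awk 'NR==FNR {a[$1]=$2; next} $2 in a {print $0, a[$2]}' "$workdir/fileA" "$workdir/fileB")

# fileB is stored in the array (a whole line per key); output follows fileA order.
fileB_first=$(awk 'FNR==NR {a[$2]=$0; next} ($1 in a) {print a[$1], $2}' "$workdir/fileB" "$workdir/fileA")

printf 'fileA read first:\n%s\n' "$fileA_first"
printf 'fileB read first:\n%s\n' "$fileB_first"
```

If fileB is the large dataset, reading fileA first keeps only the small lookup table in memory, which may matter at scale.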