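I searched and found very similar questions, but unfortunately none of the answers worked for my large dataset. What I want to do is compare fileA and fileB, and write out the matching lines of fileB with the relevant information from fileA appended.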
Here is fileA:
TCC Reg
TGA Reg
TTG Reg
TAG None
AAA None
And fileB:
1 GCT 1883127 302868 16.08
2 GGG 1779189 284102 15.97
3 TCC 1309842 217491 16.60
4 TAA 1384070 168924 12.20
5 TAG 892324 140634 15.76
The output file I want to write is:
3 TCC 1309842 217491 16.60 Reg
5 TAG 892324 140634 15.76 None
I have tried grep -f and awk 'FNR==NR{a[$1];next}($1 in a){print}' fileA fileB > outputfile separately, but neither worked.
Answer 0: (score: 1)
awk to the rescue!
$ awk 'NR==FNR {a[$1]=$2; next}
$2 in a {print $0,a[$2]}' fileA fileB
3 TCC 1309842 217491 16.60 Reg
5 TAG 892324 140634 15.76 None
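For anyone who wants to try this on the sample data, here is a self-contained sketch. It recreates the two files from the question in a temporary directory (the use of mktemp and the temporary directory are my own choices for illustration, not part of the question) and runs the same command. It may also explain why the attempts in the question printed nothing: grep -f fileA searches for whole patterns such as "TCC Reg", which never occur in fileB, and ($1 in a) tests the running number in the first column of fileB rather than the code in its second column.

```shell
#!/bin/sh
# Work in a throwaway directory so no existing file is overwritten (assumes mktemp is available).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data copied from the question.
cat > fileA <<'EOF'
TCC Reg
TGA Reg
TTG Reg
TAG None
AAA None
EOF

cat > fileB <<'EOF'
1 GCT 1883127 302868 16.08
2 GGG 1779189 284102 15.97
3 TCC 1309842 217491 16.60
4 TAA 1384070 168924 12.20
5 TAG 892324 140634 15.76
EOF

# First file (fileA): remember the label stored in field 2 for each code in field 1.
# Second file (fileB): print every line whose field 2 is a known code, followed by its label.
awk 'NR==FNR {a[$1]=$2; next}
     $2 in a {print $0, a[$2]}' fileA fileB > outputfile

cat outputfile
```

Only the small lookup table from fileA is held in memory here, which is probably what you want when fileB is the large file.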
Answer 1: (score: 1)
The following awk may also help you.
awk 'FNR==NR{a[$2]=$0;next} ($1 in a){print a[$1],$2}' fileB fileA
The output is as follows.
3 TCC 1309842 217491 16.60 Reg
5 TAG 892324 140634 15.76 None
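One thing worth knowing about this variant: because fileB is read first, every line of fileB is kept in the array, and the output follows the order of fileA. With the sample data both orders happen to coincide. The sketch below uses hypothetical two-line inputs (fileA deliberately lists TAG before TCC; the output file names are made up for illustration) to show the difference against a command that reads fileA first.

```shell
#!/bin/sh
# Work in a throwaway directory (assumes mktemp is available).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical inputs: same codes as the question, but fileA lists TAG before TCC.
cat > fileA <<'EOF'
TAG None
TCC Reg
EOF

cat > fileB <<'EOF'
3 TCC 1309842 217491 16.60
5 TAG 892324 140634 15.76
EOF

# Reads fileB first and stores every fileB line; the output follows the order of fileA.
awk 'FNR==NR{a[$2]=$0;next} ($1 in a){print a[$1],$2}' fileB fileA > order_of_fileA.txt

# Reads fileA first and stores only the labels; the output follows the order of fileB.
awk 'NR==FNR{a[$1]=$2;next} ($2 in a){print $0,a[$2]}' fileA fileB > order_of_fileB.txt

echo "fileB read first:"; cat order_of_fileA.txt
echo "fileA read first:"; cat order_of_fileB.txt
```

If fileB is the large file and its original order matters, reading fileA first is likely the safer choice; if fileB fits comfortably in memory, either form should do.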
EDIT: Adding a non-one-liner form of the solution now, along with an explanation.
awk '
FNR==NR{ ##Checking here whether FNR and NR (both built-in awk variables) are equal.
##Both FNR and NR count the lines read so far; the only difference is that FNR is RESET to 1 whenever a new Input_file starts being read.
##NR, on the other hand, keeps increasing until all the Input_file(s) are read. So this condition is TRUE only while the very first Input_file (fileB)
##is being read.
a[$2]=$0; ##Creating an array named a, indexed by $2 (second field) of the current line of fileB, with the whole current line as its value.
next ##next is a built-in awk statement (not a variable) which skips all further statements for the current line.
}
($1 in a){ ##This condition is only evaluated once the first Input_file is done and the second Input_file (fileA) is being read, because next skips it for fileB.
##Checking here if $1 (first field) of the current line of fileA is present as an index in array a; if yes then do the following.
print a[$1],$2 ##Printing the stored fileB line (array a at index $1) followed by $2 (second field) of the current fileA line, as per your requirement.
}
' fileB fileA ##Mentioning the Input_file(s) here: fileB is read first, then fileA.
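To see the FNR/NR behaviour described in the comments for yourself, the following small sketch may help. It prints both counters for every line of two throwaway files (first.txt, second.txt and nr_fnr.txt are hypothetical names used only for this illustration).

```shell
#!/bin/sh
# Work in a throwaway directory (assumes mktemp is available).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two tiny input files: 2 lines and 3 lines.
printf 'a\nb\n' > first.txt
printf 'c\nd\ne\n' > second.txt

# For every input line, print the file name, both counters,
# and whether the NR==FNR test would treat the line as part of the first file.
awk '{print FILENAME, "NR=" NR, "FNR=" FNR, (NR==FNR ? "first file" : "later file")}' \
    first.txt second.txt > nr_fnr.txt

cat nr_fnr.txt
```

NR runs from 1 to 5 across both files, while FNR restarts at 1 on the first line of second.txt, so NR==FNR holds only for the lines of first.txt. One caveat: if the first file is empty, the test is also true for the second file, so the idiom assumes a non-empty first input.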