I have the following three files:
list1.txt
AB0001 COG0593
AB0002 COG0592
AB0003 COG1195
AB0005 COG1005
AB0006 COG5621
AB0007 COG4591
AB0008 COG1136
AB0009 COG0071
AB0010 COG3212
list2.txt
AB0001 COG0593
AB0002 COG0592
AB0003 COG1195
AB0004
AB0005
AB0006 COG5621
AB0007 COG3127
AB0008 COG1136
AB0009 COG0071
AB0010 COG3212
list3.txt
AB0001 COG0593
AB0002 COG0592
AB0003 COG1195
AB0004 COG5146
AB0005 NOG84439
AB0006 COG5621
AB0007 COG0577
AB0008 COG1136
AB0009 COG0071
AB0010 NOG218375
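I want to fill in the missing values (for the first-column IDs AB00[01-10]) using the column 2 values from the other lists, where list1 has the highest priority, list2 the second highest, and list3 the lowest.
So the expected output is: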
AB0001 COG0593
AB0002 COG0592
AB0003 COG1195
AB0004 COG5146
AB0005 COG1005
AB0006 COG5621
AB0007 COG4591
AB0008 COG1136
AB0009 COG0071
AB0010 COG3212
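In other words, list1 should serve as the base; if a value is missing there, take it from list2, and if it is missing from list2 as well, take it from list3.
Answer 0: (score: 2)
Process the files in reverse order of priority, so that values from higher-priority files overwrite those from lower-priority ones. The NF > 1 condition ensures that lines with a missing value are skipped.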
$ awk 'BEGIN {FS=OFS="\t"} NF > 1 {a[$1] = $2} END {for (i in a) print i, a[i]}' list3.txt list2.txt list1.txt | sort
AB0001 COG0593
AB0002 COG0592
AB0003 COG1195
AB0004 COG5146
AB0005 COG1005
AB0006 COG5621
AB0007 COG4591
AB0008 COG1136
AB0009 COG0071
AB0010 COG3212
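The same idea also works with the files given in priority order, by keeping only the first non-empty value seen for each ID. The following is a minimal sketch, not part of the original answer: the function name merge_lists is hypothetical, and it assumes the default awk field splitting (any run of spaces or tabs), so values must not contain whitespace.

```shell
# merge_lists (hypothetical helper): pass the files from highest to
# lowest priority. The first non-empty value seen for an ID is kept.
merge_lists() {
  awk '
    # Skip lines with a missing value; keep only the first value per ID.
    NF > 1 && !($1 in a) { a[$1] = $2 }
    # for-in order is unspecified in awk, hence the sort afterwards.
    END { for (k in a) print k "\t" a[k] }
  ' "$@" | sort
}
```

Called as merge_lists list1.txt list2.txt list3.txt, this should produce the same ten lines as the command above, with the argument order now matching the priority order.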
Answer 1: (score: 0)
A short join + awk combination:
join -a2 list1.txt list2.txt | join -a2 - list3.txt | awk '{print $1,$2}' OFS='\t'
Output:
AB0001 COG0593
AB0002 COG0592
AB0003 COG1195
AB0004 COG5146
AB0005 COG1005
AB0006 COG5621
AB0007 COG4591
AB0008 COG1136
AB0009 COG0071
AB0010 COG3212
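Two caveats apply to the join approach: join expects its inputs to be sorted on the join field, and with only -a2 an ID that appears in an earlier list but not in a later one is dropped. Neither matters for the sample data, since the files are already sorted and list3 contains every ID. A more defensive sketch, under those stated assumptions, sorts each input first and keeps unpaired lines from both sides; the function name join_fill and the temporary-directory handling are illustrative choices, not part of the original answer.

```shell
# join_fill (hypothetical helper): arguments are the three files,
# from highest to lowest priority.
join_fill() {
  _jf_tmp=$(mktemp -d) || return 1
  # join needs input sorted on the key; LC_ALL=C keeps sort and join
  # in agreement about the collation order.
  LC_ALL=C sort -k1,1 "$1" > "$_jf_tmp/a"
  LC_ALL=C sort -k1,1 "$2" > "$_jf_tmp/b"
  LC_ALL=C sort -k1,1 "$3" > "$_jf_tmp/c"
  # -a1 -a2 keeps unpaired lines from both inputs. After both joins,
  # field 2 is the first non-empty value in priority order.
  LC_ALL=C join -a1 -a2 "$_jf_tmp/a" "$_jf_tmp/b" |
    LC_ALL=C join -a1 -a2 - "$_jf_tmp/c" |
    awk -v OFS='\t' '{ print $1, $2 }'
  rm -rf "$_jf_tmp"
}
```

This relies on the default whitespace field separator: a missing value simply leaves no field behind, so the next list's value shifts into column 2. With an explicit tab separator (join -t), empty fields would be preserved and the trick would no longer work.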