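How can I replace values in one file with values from another file?

Date: 2019-06-12 14:48:31

Tags: bash file awk

Let me explain my problem:

I have two files. The first looks like this (it is a .tsv file, and the rows do not all have the same number of columns):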

OTU0001 Archaea
OTU0002 Archaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon
OTU0003 Archaea;Altiarchaeales;uncultured euryarchaeote
OTU0004 Archaea;Bathyarchaeota;uncultured archaeon
OTU0005 Archaea;Diapherotrites;uncultured euryarchaeote
OTU0006 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured
OTU0007 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome

Each line starts with OTUXXXX, and this ID is always in the first column.

The other file is a .tsv file with 3 columns:

OTU3978 UniRef90_A0A010P3Z8 0.846
OTU0006 UniRef90_A0A010P3Z8 0.855
OTU4929 UniRef90_A0A010P3Z8 0.829
OTU4317 UniRef90_A0A011P550 0.85
OTU4816 UniRef90_A0A011P550 0.807
OTU3902 UniRef90_A0A011QPQ2 0.836
OTU3339 UniRef90_A0A011RKI6 0.835
OTU1359 UniRef90_A0A011RLA7 0.801
OTU2085 UniRef90_A0A011RLA7 0.843
OTU3542 UniRef90_A0A011RLA7 0.866

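I want to replace each OTUXXXX in the second file with the second column of the matching line in the first file. For example, the second line of the second file:

OTU0006 UniRef90_A0A010P3Z8 0.855 becomes: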

Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured UniRef90_A0A010P3Z8 0.855

Is this possible with bash?

Edit:

I can replace the column with:

awk 'FNR==NR{a[NR]=$2;next}{$1=a[FNR]}1' f1 f2

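But it is not 'automatic': the first line of file 1 is paired with the first line of file 2, and so on by position, instead of the lines being matched on their OTUXXXX ID.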

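1 answer:

Answer 0: (score: 1)

You are very close. Index the array by the OTU ID in the first column instead of by line number, and only replace when that ID is present in the first file. You may be able to use the following awk: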

awk 'NR == FNR {a[$1] = $2; next} $1 in a{$1 = a[$1]} 1' f1 f2

OTU3978 UniRef90_A0A010P3Z8 0.846
Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured UniRef90_A0A010P3Z8 0.855
OTU4929 UniRef90_A0A010P3Z8 0.829
OTU4317 UniRef90_A0A011P550 0.85
OTU4816 UniRef90_A0A011P550 0.807
OTU3902 UniRef90_A0A011QPQ2 0.836
OTU3339 UniRef90_A0A011RKI6 0.835
OTU1359 UniRef90_A0A011RLA7 0.801
OTU2085 UniRef90_A0A011RLA7 0.843
OTU3542 UniRef90_A0A011RLA7 0.866
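
One caveat: with awk's default whitespace splitting, a taxonomy that contains spaces (such as the "Deep Sea Euryarchaeotic Group(DSEG)" entry for OTU0002) would be cut at the first space. Assuming both files really are tab-separated, a minimal sketch that splits on tabs might look like the following. The file names f1.tsv and f2.tsv, the array name tax, and the third line of the sample f2.tsv are made up for illustration.

```shell
# Sketch only: builds two tiny tab-separated sample files in a temp directory.
# File names, the array name "tax", and the OTU0002 row in f2.tsv are hypothetical.
tmp=$(mktemp -d)

# f1.tsv: OTU id <TAB> taxonomy (the taxonomy may contain spaces)
printf 'OTU0002\tArchaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon\n' > "$tmp/f1.tsv"
printf 'OTU0006\tArchaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured\n' >> "$tmp/f1.tsv"

# f2.tsv: OTU id <TAB> UniRef id <TAB> score
printf 'OTU3978\tUniRef90_A0A010P3Z8\t0.846\n' > "$tmp/f2.tsv"
printf 'OTU0006\tUniRef90_A0A010P3Z8\t0.855\n' >> "$tmp/f2.tsv"
printf 'OTU0002\tUniRef90_A0A011P550\t0.85\n' >> "$tmp/f2.tsv"

result=$(awk 'BEGIN { FS = OFS = "\t" }   # split and join on tabs only
    NR == FNR { tax[$1] = $2; next }      # first file: remember taxonomy per OTU id
    $1 in tax { $1 = tax[$1] }            # second file: swap the id for its taxonomy if known
    1                                     # print every line, changed or not
' "$tmp/f1.tsv" "$tmp/f2.tsv")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Lines whose OTU ID does not appear in the first file (OTU3978 here) are printed unchanged, just as in the output above.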