Merging two text files on their common rows using awk

Time: 2018-05-17 20:24:20

Tags: awk

I have 2 tab-delimited text files, like the following examples:

Small example 1:

chr9    35689814    35689922    U2OS_Noco_input_peak_1972   77  .   4.84893 12.13092    7.77385 26
chr9    139793146   139793192   U2OS_Noco_input_peak_2029   49  .   6.30132 9.04134 4.96447 89
chr9    35748701    35748740    U2OS_Noco_input_peak_1974   197 .   10.68892    24.88541    19.76040    127
chr9    85677944    85678064    U2OS_Noco_input_peak_1980   44  .   3.93263 8.45104 4.42192 5
chr9    127631470   127631569   U2OS_Noco_input_peak_1997   148 .   11.29185    19.71885    14.86821    74
chr9    140512429   140512570   U2OS_Noco_input_peak_2045   113 .   9.54787 15.99886    11.37007    71

Small example 2:

chr9    35748701    35748740    GBA2    0   -   35748701    35749983    0   5   223,269,514,524,276,    
chr9    117880410   117880530   TNC 0   -   117853297   117880536   0   17  
chr9    85677944    85678064    RASEF   0   -   85677782    85678092    0   2   261,310,    0,0,
chr9    35689814    35689922    TPM2    0   -   35689814    35691017    0   6   83,86,96,105,108,1203,  
chr9    139793146   139793192   TRAF2   0   +   139776363   139793192   0   16

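I want to combine them into one file. Specifically, I want the rows whose first 3 fields are common to both files. Each output row should start with those 3 fields, followed by the remaining columns from the 2 files. Here is the expected output: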

chr9    35689814    35689922    U2OS_Noco_input_peak_1972   77  .   4.84893 12.13092    7.77385 26  chr9    35689814    35689922    TPM2    0   -   35689814    35691017    0   6   83,86,96,105,108,1203,  
chr9    35748701    35748740    U2OS_Noco_input_peak_1974   197 .   10.68892    24.88541    19.76040    127 chr9    35748701    35748740    GBA2    0   -   35748701    35749983    0   5   223,269,514,524,276,    
chr9    85677944    85678064    U2OS_Noco_input_peak_1980   44  .   3.93263 8.45104 4.42192 5   chr9    85677944    85678064    RASEF   0   -   85677782    85678092    0   2   261,310,    0,0,

I tried the following awk code, but it does not return what I want.

awk FS=OFS='\t' infile1.txt infile2.txt '$1 = $1, $2= $2, $3=$3 {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10"\t"$11"\t"$12"\t"$13"\t"$14"\t"$15"\t"$16"\t"$17"\t"$18"\t"$19}' > out.txt

Do you know how to fix it?

1 answer:

Answer 0 (score: 0)

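$ awk 'BEGIN   {FS=OFS="\t"}                  # read and write tab-separated fields
               {k=$1 FS $2 FS $3}             # key: the first three fields
       NR==FNR {sub(k,OFS); a[k]=$0; next}    # file.2: replace the key with a tab, store the rest
       k in a  {print $0,a[k]}' file.2 file.1 | column -t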

chr9  35689814   35689922   U2OS_Noco_input_peak_1972  77   .  4.84893   12.13092  7.77385   26   TPM2   0  -  35689814   35691017   0  6   83,86,96,105,108,1203,
chr9  139793146  139793192  U2OS_Noco_input_peak_2029  49   .  6.30132   9.04134   4.96447   89   TRAF2  0  +  139776363  139793192  0  16
chr9  35748701   35748740   U2OS_Noco_input_peak_1974  197  .  10.68892  24.88541  19.76040  127  GBA2   0  -  35748701   35749983   0  5   223,269,514,524,276,
chr9  85677944   85678064   U2OS_Noco_input_peak_1980  44   .  3.93263   8.45104   4.42192   5    RASEF  0  -  85677782   85678092   0  2   261,310,                0,0,

Your expected output is missing the second record (chr9 139793146 139793192), which also appears in both files.
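
For comparison, here is a minimal sketch of the same two-pass idea that stores each line of the lookup file unchanged, so the three key columns appear twice in each output row, as in the expected output in the question. The file names peaks.tsv and genes.tsv, the shortened rows, and the helper join_on_first3 are made up for illustration. The key is built with SUBSEP and only ever compared as a string, so nothing in it is interpreted as a regular expression.

```shell
#!/bin/sh
# Sketch: join two tab-separated files on their first three fields.
# peaks.tsv / genes.tsv are hypothetical names; the rows are shortened samples.
tmp=$(mktemp -d)
printf 'chr9\t35689814\t35689922\tpeak_1972\t77\n'    >  "$tmp/peaks.tsv"
printf 'chr9\t127631470\t127631569\tpeak_1997\t148\n' >> "$tmp/peaks.tsv"
printf 'chr9\t85677944\t85678064\tpeak_1980\t44\n'    >> "$tmp/peaks.tsv"
printf 'chr9\t85677944\t85678064\tRASEF\t0\t-\n'      >  "$tmp/genes.tsv"
printf 'chr9\t117880410\t117880530\tTNC\t0\t-\n'      >> "$tmp/genes.tsv"
printf 'chr9\t35689814\t35689922\tTPM2\t0\t-\n'       >> "$tmp/genes.tsv"

# $1 = lookup file (read first, held in memory)
# $2 = file whose row order drives the output
join_on_first3() {
    awk 'BEGIN { FS = OFS = "\t" }
         { key = $1 SUBSEP $2 SUBSEP $3 }      # composite key, compared as a plain string
         NR == FNR { seen[key] = $0; next }    # first file: remember the whole line
         key in seen { print $0, seen[key] }   # second file: print only the matching rows
        ' "$1" "$2"
}

result=$(join_on_first3 "$tmp/genes.tsv" "$tmp/peaks.tsv")
printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data, only the peak_1972 and peak_1980 rows are printed, each followed by the full matching row from genes.tsv. If a key occurs more than once in the lookup file, this sketch keeps only the last occurrence.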