I have 2 tab-separated text files, like the following examples:
Small example 1:
chr9 35689814 35689922 U2OS_Noco_input_peak_1972 77 . 4.84893 12.13092 7.77385 26
chr9 139793146 139793192 U2OS_Noco_input_peak_2029 49 . 6.30132 9.04134 4.96447 89
chr9 35748701 35748740 U2OS_Noco_input_peak_1974 197 . 10.68892 24.88541 19.76040 127
chr9 85677944 85678064 U2OS_Noco_input_peak_1980 44 . 3.93263 8.45104 4.42192 5
chr9 127631470 127631569 U2OS_Noco_input_peak_1997 148 . 11.29185 19.71885 14.86821 74
chr9 140512429 140512570 U2OS_Noco_input_peak_2045 113 . 9.54787 15.99886 11.37007 71
Small example 2:
chr9 35748701 35748740 GBA2 0 - 35748701 35749983 0 5 223,269,514,524,276,
chr9 117880410 117880530 TNC 0 - 117853297 117880536 0 17
chr9 85677944 85678064 RASEF 0 - 85677782 85678092 0 2 261,310, 0,0,
chr9 35689814 35689922 TPM2 0 - 35689814 35691017 0 6 83,86,96,105,108,1203,
chr9 139793146 139793192 TRAF2 0 + 139776363 139793192 0 16
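I want to combine them into one file. Specifically, I want the lines whose first 3 fields are common to both files. The output file should have the first 3 fields (for the lines that are common to the 2 files), followed by the remaining columns from both files. Here is an example of the output: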
chr9 35689814 35689922 U2OS_Noco_input_peak_1972 77 . 4.84893 12.13092 7.77385 26 chr9 35689814 35689922 TPM2 0 - 35689814 35691017 0 6 83,86,96,105,108,1203,
chr9 35748701 35748740 U2OS_Noco_input_peak_1974 197 . 10.68892 24.88541 19.76040 127 chr9 35748701 35748740 GBA2 0 - 35748701 35749983 0 5 223,269,514,524,276,
chr9 85677944 85678064 U2OS_Noco_input_peak_1980 44 . 3.93263 8.45104 4.42192 5 chr9 85677944 85678064 RASEF 0 - 85677782 85678092 0 2 261,310, 0,0,
I tried the following code in awk, but it does not return what I want.
awk FS=OFS='\t' infile1.txt infile2.txt '$1 = $1, $2= $2, $3=$3 {print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10"\t"$11"\t"$12"\t"$13"\t"$14"\t"$15"\t"$16"\t"$17"\t"$18"\t"$19}' > out.txt
Do you know how to fix it?
Answer 0 (score: 0)
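$ awk 'BEGIN {FS=OFS="\t"}
       {k=$1 FS $2 FS $3}                      # key: the first three fields
       NR==FNR {sub(k FS,""); a[k]=$0; next}   # file.2: drop the key, keep the rest
       k in a {print $0,a[k]}                  # file.1: append the stored fields on a match
      ' file.2 file.1 | column -t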
chr9 35689814 35689922 U2OS_Noco_input_peak_1972 77 . 4.84893 12.13092 7.77385 26 TPM2 0 - 35689814 35691017 0 6 83,86,96,105,108,1203,
chr9 139793146 139793192 U2OS_Noco_input_peak_2029 49 . 6.30132 9.04134 4.96447 89 TRAF2 0 + 139776363 139793192 0 16
chr9 35748701 35748740 U2OS_Noco_input_peak_1974 197 . 10.68892 24.88541 19.76040 127 GBA2 0 - 35748701 35749983 0 5 223,269,514,524,276,
chr9 85677944 85678064 U2OS_Noco_input_peak_1980 44 . 3.93263 8.45104 4.42192 5 RASEF 0 - 85677782 85678092 0 2 261,310, 0,0,
Your expected output is missing the second record (the TRAF2 line), which also matches on the first three fields.
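The expected output in the question repeats the three key columns from the second file, whereas the output above drops them. If the repeated columns are wanted, one possible variant is to store the whole line of the second file instead of stripping the key. The sketch below wraps that in a shell function; the name `join_on_first3` is hypothetical, and it assumes both inputs are genuinely tab-separated, that the second file is not empty (otherwise `NR==FNR` is also true for the first file), and that each key occurs at most once in the second file (a later duplicate would overwrite an earlier one).

```shell
# join_on_first3 FILE1 FILE2   (hypothetical helper name)
# Prints each line of FILE1 whose first three tab-separated fields also
# occur in FILE2, followed by the complete matching line of FILE2.
join_on_first3() {
    awk 'BEGIN { FS = OFS = "\t" }
         { k = $1 FS $2 FS $3 }          # key: chromosome, start, end
         NR == FNR { a[k] = $0; next }   # first file read (FILE2): store the whole line
         k in a { print $0, a[k] }       # second file read (FILE1): print on a match
    ' "$2" "$1"
}
```

With the file names from the question, it could be called as `join_on_first3 infile1.txt infile2.txt > out.txt`. The output follows the line order of the first file.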