file1:
scaffold2232_size19577 gene 8878 9258
scaffold2232_size19577 CDS 8878 9258
scaffold2232_size19577 gene 10631 14562
scaffold2232_size19577 intron 10693 11242
scaffold2232_size19577 intron 11343 14252
scaffold2232_size19577 intron 14346 14499
scaffold2232_size19577 CDS 10631 10692
scaffold2232_size19577 CDS 11243 11342
scaffold2232_size19577 CDS 14253 14345
scaffold2232_size19577 CDS 14500 14562
scaffold2232_size19577 gene 18807 19055
scaffold2232_size19577 CDS 18807 19055
file2:
scaffold2232_size19577 8878 9258 Os12t0508300-01
scaffold2232_size19577 8878 9258 Os12t0508300-01
scaffold2232_size19577 10631 14562 Os12t0508300-01
scaffold2232_size19577 10693 11242 Os12t0508300-01
scaffold2232_size19577 11343 14252 Os12t0508300-01
scaffold2232_size19577 14346 14499 Os12t0508400-00
scaffold2232_size19577 14346 14499 Os12t0508400-00
scaffold2232_size19577 14346 14499 Os12t0508400-00
scaffold2232_size19577 10631 10692 Os12t0508300-01
scaffold2232_size19577 11243 11342 Os12t0508300-01
scaffold2232_size19577 14253 14345 Os12t0508400-00
scaffold2232_size19577 14253 14345 Os12t0508400-00
scaffold2232_size19577 14253 14345 Os12t0508400-00
scaffold2232_size19577 14500 14562 Os12t0508400-00
scaffold2232_size19577 14500 14562 Os12t0508400-00
scaffold2232_size19577 14500 14562 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
Expected output:
scaffold2232_size19577 8878 9258 Os12t0508300-01 gene
scaffold2232_size19577 8878 9258 Os12t0508300-01 CDS
scaffold2232_size19577 10631 14562 Os12t0508300-01 gene
scaffold2232_size19577 10693 11242 Os12t0508300-01 intron
scaffold2232_size19577 11343 14252 Os12t0508300-01 intron
scaffold2232_size19577 14346 14499 Os12t0508400-00 intron
scaffold2232_size19577 10631 10692 Os12t0508300-01 CDS
scaffold2232_size19577 11243 11342 Os12t0508300-01 CDS
scaffold2232_size19577 14253 14345 Os12t0508400-00 CDS
scaffold2232_size19577 14500 14562 Os12t0508400-00 CDS
scaffold2232_size19577 18807 19055 Os12t0508400-00 gene
scaffold2232_size19577 18807 19055 Os12t0508400-00 CDS
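I tried: awk '{a[$1,$2,$3]=$0}END{for(i in a) print a[i]}' file2
But with this I lose one line of every gene/CDS pair, because both lines have the same coordinates in columns 2 and 3 and therefore map to the same array key. The output becomes: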
scaffold2232_size19577 8878 9258 Os12t0508300-01
scaffold2232_size19577 10631 14562 Os12t0508300-01
scaffold2232_size19577 10693 11242 Os12t0508300-01
scaffold2232_size19577 11343 14252 Os12t0508300-01
scaffold2232_size19577 14346 14499 Os12t0508400-00
scaffold2232_size19577 10631 10692 Os12t0508300-01
scaffold2232_size19577 11243 11342 Os12t0508300-01
scaffold2232_size19577 14253 14345 Os12t0508400-00
scaffold2232_size19577 14500 14562 Os12t0508400-00
scaffold2232_size19577 18807 19055 Os12t0508400-00
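I thought I could then append column 2 of file1 to file2, but after this awk step the number of lines is reduced, so the two files no longer line up and I cannot add the column. I would like to get the expected output shown above.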
Answer 0 (score: 1)
Something like this?
awk 'FNR==NR {a[$2FS$3]=$4;next} {print $1,$3,$4,a[$3FS$4],$2}' OFS="\t" file2 file1
scaffold2232_size19577 8878 9258 Os12t0508300-01 gene
scaffold2232_size19577 8878 9258 Os12t0508300-01 CDS
scaffold2232_size19577 10631 14562 Os12t0508300-01 gene
scaffold2232_size19577 10693 11242 Os12t0508300-01 intron
scaffold2232_size19577 11343 14252 Os12t0508300-01 intron
scaffold2232_size19577 14346 14499 Os12t0508400-00 intron
scaffold2232_size19577 10631 10692 Os12t0508300-01 CDS
scaffold2232_size19577 11243 11342 Os12t0508300-01 CDS
scaffold2232_size19577 14253 14345 Os12t0508400-00 CDS
scaffold2232_size19577 14500 14562 Os12t0508400-00 CDS
scaffold2232_size19577 18807 19055 Os12t0508400-00 gene
scaffold2232_size19577 18807 19055 Os12t0508400-00 CDS
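The one-liner above reads file2 first to build a lookup table from coordinates to name, then prints one line for every line of file1, which is why the gene and CDS lines with identical coordinates both survive. A slightly more defensive variant might look like the sketch below. It assumes the scaffold name should be part of the key (useful once there is more than one scaffold), and it prints "NA" when file1 has coordinates that file2 does not; both the temporary file layout and the "NA" placeholder are illustrative choices, not part of the original question.

```shell
# Minimal sketch: annotate file1 with names from file2, keyed on scaffold + start + end.
# The temp directory and the "NA" placeholder are assumptions made for illustration.
tmp=$(mktemp -d)

# A few lines taken from the question (file1: scaffold, feature, start, end).
cat > "$tmp/file1" <<'EOF'
scaffold2232_size19577 gene 8878 9258
scaffold2232_size19577 CDS 8878 9258
scaffold2232_size19577 intron 14346 14499
EOF

# file2: scaffold, start, end, name (duplicates are harmless here).
cat > "$tmp/file2" <<'EOF'
scaffold2232_size19577 8878 9258 Os12t0508300-01
scaffold2232_size19577 8878 9258 Os12t0508300-01
scaffold2232_size19577 14346 14499 Os12t0508400-00
scaffold2232_size19577 14346 14499 Os12t0508400-00
EOF

result=$(awk '
    # First file on the command line (file2): store the name per scaffold/start/end.
    FNR == NR { name[$1, $2, $3] = $4; next }
    # Second file (file1): emit one line per input line, so nothing is collapsed.
    {
        n = (($1, $3, $4) in name) ? name[$1, $3, $4] : "NA"
        print $1, $3, $4, n, $2
    }
' OFS="\t" "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Because the output is driven by file1, duplicated lines in file2 only overwrite the same array entry and do not change the number of lines printed.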