I would like to use awk, sed, or another tool to do the following.
For example:
First file: File1.txt
Contents (tab-separated table):
ID Match Length
100 OK 1000
200 OK 1000
300 OK 2000
400 OK 2000
500 OK 3000
Second file: File2.fasta
It contains the following:
>100
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTG
>200
CTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGA
>300
TGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAC
>400
GACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACT
>500
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTG
I want to add one more column to File1.txt, taken from File2.fasta, so that the final result is:
ID Match Length Sequence
100 OK 1000 ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTG
200 OK 1000 CTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGA
300 OK 2000 TGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAC
400 OK 2000 GACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACT
500 OK 3000 ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTG
Does anyone have a good idea for how to combine these two files?
Answer 0 (score: 2)
I believe you are looking for join.
First, you need to sort both files and bring them into a common format (same delimiter).
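# Requires GNU sed (for the \t and \n escapes); the extra sed drops the trailing tab and the leading empty line
sed 's/$/\t/' File2.fasta | tr -d '\n' | sed 's/>/\n/g' | sed -e 's/\t$//' -e '/^$/d' | sort > File2.fasta.sorted
sort File1.txt > File1.txt.sorted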
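Then you just need to join them like this:
TAB=$(printf '\t')
join -a1 -t "$TAB" File1.txt.sorted File2.fasta.sorted
Note that $TAB holds a literal tab character, which join uses as the field delimiter. It must be in double quotes; in single quotes the shell would pass the text $TAB unexpanded.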
This produces the following result:
100 OK 1000 ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTG
200 OK 1000 CTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGA
300 OK 2000 TGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAC
400 OK 2000 GACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACT
500 OK 3000 ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTG
ID Match Length
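That is what you want, apart from the column names and their position: the header row is sorted along with the data and has no match in File2.fasta, so it comes out last and without a Sequence label.
Answer 1 (score: 0)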
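The header placement noted above can be handled by sorting only the data rows and printing the header separately. The following is a minimal sketch rather than part of the original answer. It assumes that every FASTA record is exactly two lines (one ">ID" line and one sequence line) and that File1.txt is tab-separated. The scratch directory, the shortened sample data, and the output name merged.txt are made up for the demonstration.

```shell
#!/bin/sh
# Sketch: sort + join while keeping the header row on top.
set -eu

workdir=$(mktemp -d)   # scratch directory (assumption, keeps the demo self-contained)
cd "$workdir"
tab=$(printf '\t')     # literal tab used as the field separator

# Small sample inputs in the same shape as File1.txt and File2.fasta.
printf 'ID\tMatch\tLength\n200\tOK\t1000\n100\tOK\t1000\n' > File1.txt
printf '>100\nACTG\n>200\nCTGA\n' > File2.fasta

# Flatten each two-line FASTA record to "ID<TAB>sequence", then sort by ID.
paste - - < File2.fasta | sed 's/^>//' | sort -t "$tab" -k1,1 > File2.sorted

# Sort only the data rows of the table; the header is handled separately.
tail -n +2 File1.txt | sort -t "$tab" -k1,1 > File1.sorted

# Emit the extended header first, then the joined rows.
{
  printf '%s\tSequence\n' "$(head -n 1 File1.txt)"
  join -a 1 -t "$tab" File1.sorted File2.sorted
} > merged.txt

cat merged.txt
```

Because of -a 1, rows of File1.txt without a matching sequence are still printed, just without the fourth column.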
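i=1
while IFS= read -r a; do
  if ((i)); then printf '%s\tSequence\n' "$a"; i=0; continue; fi   # header row, handled once
  id=$(printf '%s\n' "$a" | awk '{print $1}')   # first column is the ID
  printf '%s\t%s\n' "$a" "$(sed -n "/^>$id\$/{n;p;}" File2.fasta)"   # exact ">ID" line, then print the next line
done < File1.txt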
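This loops over the file. On the first line only, it prints the new header; for every later line it uses sed to find the line that follows the matching ID in File2.fasta and prints it as a new column.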
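The loop above runs sed over File2.fasta once per row. The same lookup can be done in a single pass with an awk associative array, which also keeps the original row order and needs no sorting. This is a sketch under assumptions, not the answerer's method: it assumes each ">" header line contains only the ID and that File1.txt is tab-separated. The scratch directory, the shortened sample data, and the output name merged.txt are illustrative.

```shell
#!/bin/sh
# Sketch: append the sequence column with one awk pass over both files.
set -eu

workdir=$(mktemp -d)   # scratch directory (assumption, keeps the demo self-contained)
cd "$workdir"

# Sample inputs: ID 200 has a wrapped sequence, ID 300 has no FASTA record.
printf 'ID\tMatch\tLength\n100\tOK\t1000\n200\tOK\t1000\n300\tOK\t2000\n' > File1.txt
printf '>100\nACTG\n>200\nCT\nGA\n' > File2.fasta

awk -F '\t' -v OFS='\t' '
    # First file (FASTA): store each sequence under the most recent ">ID" header.
    NR == FNR {
        if (/^>/) { id = substr($0, 2) } else { seq[id] = seq[id] $0 }
        next
    }
    # Second file (table): the header row gets the new column name.
    FNR == 1 { print $0, "Sequence"; next }
    # Data rows: append the stored sequence (empty if the ID is unknown).
    { print $0, seq[$1] }
' File2.fasta File1.txt > merged.txt

cat merged.txt
```

Wrapped sequences are concatenated into one string, and a row whose ID is missing from the FASTA file is printed with an empty fourth column.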