I have a file like the one below. I want to count the number of characters in each sequence.
>1DMLA
MTDSPGGVAPASPVEDASDASLGQPEEGAPCQVVLQGAELNGILQAFAPLRTSLLDSLLVMGDRGILIHNTIFGEQVFLP
LEHSQFSRYRWRGPTAAFLSLVDQKRSLLSVFRANQYPDLRRVELAITGQAPFRTLVQRIWTTTSDGEAVELASETLMKR
ELTSFVVLVPQGTPDVQLRLTRPQLTKVLNATGADSATPTTFELGVNGKFSVFTTSTCVTFAAREEGVSSSTSTQVQILS
NALTKAGQAAANAKTVYGENTHRTFSVVVDDCSMRAVLRRLQVGGGTLKFFLTTPVPSLCVTATGPNAVSAVFLLKPQK
>1DMLB
DDVAARLRAAGFGAVGAGATAEETRRMLHRAFDTLA
>2BHDC
MTDSPGGVAPASPVEDASDASLGQPEEGAPCQVVLQGAELNGILQAFAPLRTSLLDSLLVMGDRGILIHNTIFGEQVFLP
LEHSQFSRYRWRGPTAAFLSLVDQKRSLLSVFRANQYPDLRRVELAITGQAPFRTLVQRIWTTTSDGEAVELASETLMKR
ELTSFVVLVPQGTPDVQLRLTRPQLTKVLNATGADSATPTTFELGVNGKFSVFTTSTCVTFAAREEGVSSSTSTQVQILS
I tried the following code.
awk '/^>/ { res=substr($0, 2); } /^[^>]/ { print res " - " length($0); }' <file
The output of the code above is
1DMLA - 80
1DMLA - 80
1DMLA - 80
1DMLA - 79
1DMLB - 36
2BHDC - 80
2BHDC - 80
2BHDC - 80
The output I want is
1DMLA - 319
1DMLB - 36
2BHDC - 240
How can I change the code above to get the desired output?
Answer 0 (score: 0)
Like this:
awk -F\> '/^>/ {if (seqlen != ""){print seqlen}printf("%s - ",$2);seqlen=0;next}seqlen != ""{seqlen +=length($0)}END{print seqlen}' infile
Or, formatted:
awk -F\> '
/^>/ {
    if (seqlen != "")
        print seqlen
    printf("%s - ", $2)
    seqlen = 0
    next
}
seqlen != "" { seqlen += length($0) }
END { print seqlen }' infile
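See: Sequence length of FASTA file
Besides producing the expected result, this also copes with unexpectedly formatted input such as the following, where sequence lines appear before the first header and one header has no sequence at all.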
$ cat infile
MTDSPGGVAPASPVEDASDASLGQPEEGAPCQVVLQGAELNGILQAFAPLRTSLLDSLLVMGDRGILIHNTIFGEQVFLP
LEHSQFSRYRWRGPTAAFLSLVDQKRSLLSVFRANQYPDLRRVELAITGQAPFRTLVQRIWTTTSDGEAVELASETLMKR
ELTSFVVLVPQGTPDVQLRLTRPQLTKVLNATGADSATPTTFELGVNGKFSVFTTSTCVTFAAREEGVSSSTSTQVQILS
NALTKAGQAAANAKTVYGENTHRTFSVVVDDCSMRAVLRRLQVGGGTLKFFLTTPVPSLCVTATGPNAVSAVFLLKPQK
>1DMLB
>2BHDC
MTDSPGGVAPASPVEDASDASLGQPEEGAPCQVVLQGAELNGILQAFAPLRTSLLDSLLVMGDRGILIHNTIFGEQVFLP
LEHSQFSRYRWRGPTAAFLSLVDQKRSLLSVFRANQYPDLRRVELAITGQAPFRTLVQRIWTTTSDGEAVELASETLMKR
ELTSFVVLVPQGTPDVQLRLTRPQLTKVLNATGADSATPTTFELGVNGKFSVFTTSTCVTFAAREEGVSSSTSTQVQILS
$ awk -F\> '/^>/ {if (seqlen != ""){print seqlen}printf("%s - ",$2);seqlen=0;next}seqlen != ""{seqlen +=length($0)}END{print seqlen}' infile
1DMLB - 0
2BHDC - 240
Answer 1 (score: 0)
Here is one way using awk:
awk '/^>/ && r { print r, "-", s; r=s="" } /^>/ { r = substr($0, 2); next } { s += length } END { print r, "-", s }' file
Result:
1DMLA - 319
1DMLB - 36
2BHDC - 240
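The same accumulate-and-print idea can be spelled out with comments. The following is only a sketch, not the answerer's code: the wrapper name fasta_lengths and the awk function emit are made up for illustration. It assumes every header line starts with ">", ignores any lines that come before the first header, strips a trailing carriage return, and prints 0 for a header with no sequence lines.

```shell
# Hypothetical helper: print "<header> - <total sequence length>" per record.
fasta_lengths() {
    awk '
        # Print the record collected so far, if any; len + 0 forces a numeric 0.
        function emit() { if (name != "") print name, "-", len + 0 }

        # Header line: finish the previous record and start a new one.
        /^>/ { emit(); name = substr($0, 2); len = 0; next }

        # Sequence line: drop a trailing CR (Windows line endings), add its length.
        name != "" { sub(/\r$/, ""); len += length($0) }

        # The last record has no following header, so print it at the end.
        END { emit() }
    ' "$@"
}

# Small inline sample; the middle header deliberately has no sequence lines.
printf '>seq1\nACGT\nAC\n>empty\n>seq2\nA\n' | fasta_lengths
```

It can also be called with a file name, for example fasta_lengths file.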
Answer 2 (score: 0)
awk -vRS='>' '$1{gsub( "[\r]", "",$1 );
printf "%s - %d\n", $1, length($0) - length($1) - NF + 1}' input
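For readers who find the length arithmetic above hard to follow, here is a sketch of the same record-splitting idea written as an explicit loop. It is a variant, not the answerer's code, and the name fasta_lengths_rs is made up for illustration. It assumes ">" appears only at the start of header lines and that the file begins with a header. Each record is then one header plus its sequence lines, and setting FS to a newline makes every line a field.

```shell
# Hypothetical helper: one awk record per FASTA entry (RS=">"), one field per line.
fasta_lengths_rs() {
    awk 'BEGIN { RS = ">"; FS = "\n" }
        # The text before the first ">" is an empty record; skip it.
        $1 != "" {
            total = 0
            # Fields 2..NF are the sequence lines (the last one is usually empty).
            for (i = 2; i <= NF; i++) {
                line = $i
                sub(/\r$/, "", line)   # tolerate Windows line endings
                total += length(line)
            }
            name = $1
            sub(/\r$/, "", name)
            print name, "-", total
        }' "$@"
}

# Small inline sample.
printf '>seq1\nACGT\nAC\n>seq2\nA\n' | fasta_lengths_rs
```

Working on copies of the fields (line, name) leaves $0 untouched, so the result does not depend on how awk rebuilds the record after a field is modified.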