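I have a FASTA file containing many protein sequences. I need to read the FASTA file, remove the headers, and store the sequences in separate variables. Any suggestions on how to do this in Perl (not BioPerl, please)?
Example of the FASTA file: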
gi|542264878|ref|XP_003460692.2| PREDICTED: myosin heavy chain, fast skeletal muscle-like, partial [Oreochromis niloticus|
KCFEKPKPAKGKAEAHFSLVHYAGTVDYNITGWLDKNKDPLNDSVVQLYQKSSNKLLALLYVAHAGGEEAGGGKKGGKKKGGSFQTVSALFRENLGKLMTNLRSTHPHFVRCLIPNETKTPGLMENFLVIHQLRCNGVLEGIRICRKGFPSRILYGDFKQRYKVLNASVIPEGQFIDNKKAS
I only want the sequence:
KCFEKPKPAKGKAEAHFSLVHYAGTVDYNITGWLDKNKDPLNDSVVQLYQKSSNKLLALLYVAHAGGEEAGGGKKGGKKKGGSFQTVSALFRENLGKLMTNLRSTHPHFVRCLIPNETKTPGLMENFLVIHQLRCNGVLEGIRICRKGFPSRILYGDFKQRYKVLNASVIPEGQFIDNKKAS
Answer 0 (score: 0)
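If awk is fine for you, this simple one-liner will do. It prints the last whitespace-separated field of each line, so it relies on the header and the sequence being on the same line, as in the test file here: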
# cat test
gi|542264878|ref|XP_003460692.2| PREDICTED: myosin heavy chain, fast skeletal muscle-like, partial [Oreochromis niloticus| KCFEKPKPAKGKAEAHFSLVHYAGTVDYNITGWLDKNKDPLNDSVVQLYQKSSNKLLALLYVAHAGGEEAGGGKKGGKKKGGSFQTVSALFRENLGKLMTNLRSTHPHFVRCLIPNETKTPGLMENFLVIHQLRCNGVLEGIRICRKGFPSRILYGDFKQRYKVLNASVIPEGQFIDNKKAS
# awk '{print $NF}' test
KCFEKPKPAKGKAEAHFSLVHYAGTVDYNITGWLDKNKDPLNDSVVQLYQKSSNKLLALLYVAHAGGEEAGGGKKGGKKKGGSFQTVSALFRENLGKLMTNLRSTHPHFVRCLIPNETKTPGLMENFLVIHQLRCNGVLEGIRICRKGFPSRILYGDFKQRYKVLNASVIPEGQFIDNKKAS
Here is the Perl way (`-a` splits each line on whitespace into `@F`, so `$F[-1]` is the last field):
# perl -lane 'print $F[-1]' test
KCFEKPKPAKGKAEAHFSLVHYAGTVDYNITGWLDKNKDPLNDSVVQLYQKSSNKLLALLYVAHAGGEEAGGGKKGGKKKGGSFQTVSALFRENLGKLMTNLRSTHPHFVRCLIPNETKTPGLMENFLVIHQLRCNGVLEGIRICRKGFPSRILYGDFKQRYKVLNASVIPEGQFIDNKKAS
See this link for an explanation of each one-liner option: https://blogs.oracle.com/ksplice/entry/the_top_10_tricks_of
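The one-liners above take the last field of each line, which only works when the header and sequence share a line. A minimal sketch for the more common layout follows, assuming each header line starts with `>` (the sample in the question does not show that marker) and that a sequence may be wrapped across several lines. The function name `fasta_sequences` and the array name `@seqs` are hypothetical, chosen only for illustration.

```shell
# Hypothetical helper: print one sequence per line, headers removed.
# Assumes header lines begin with ">"; wrapped sequence lines are joined.
fasta_sequences() {
    perl -ne '
        s/\s+\z//;                           # strip trailing newline / CR
        if (/^>/) { push @seqs, ""; next }   # header: start a new record
        next unless @seqs;                   # skip text before first header
        $seqs[-1] .= $_;                     # append this sequence line
        END { print "$_\n" for @seqs }       # each element is one sequence
    ' "$@"
}
```

It could be called as `fasta_sequences proteins.fasta`, where the file name is a placeholder. Inside a standalone Perl script, the same loop body leaves every sequence in its own element of `@seqs` (`$seqs[0]`, `$seqs[1]`, and so on), which is one way to keep the sequences in separate variables without BioPerl.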