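I downloaded some protein sequences and want to process them in Python. The first problem is formatting the sequences, which are shown below. Each record starts with a ">" header line, and the protein sequence is split across several separate lines. Before aligning the sequences, I need each one joined into a single line (one string). I tried "str.startswith", but it did not work. From searching, some people say this can be done with regular expressions. Can anyone help me solve this? Thanks a lot.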
>A/Nayarit/InDRE21/2017_EPI1215815
DTLCIGYHANNSTDTVDTVLEKNVTVTHSVNLLEDKHNGKLCKLRGVVPLHLGKCNIAGWILGNPECESLSTARSWSYIV
ETSNSDNGTCYPGDFINYEELREQLSSVSSFERFEIFPKTSSWPNHDSNKGVTAACPHAGAKSFYKNLIWLVKKGNSYPK
LNQTYINDKGKEVLVLWGIHHPPTTADQQSLYQNADAYVFVGTSRYSKKFKPEIATRPKVRDQEGRMNYYWTLVEPGDKI
TFEATGNLVVPRYAFTMERNAGSGIIISDTPVHDCNTTCQTPEGAINTSLPFQNVHPITIGKCPKYVKSTKLRLATGLRN
VPSIQSRGLFGAIAGFIEGGWTGMVDGWYGYHHQNEQGSGYAADLKSTQNAIDKITNKVNSVIEKMNTQFTAVGKEFNHL
EKRIENLNKKVDDGFLDIWTYNAELLVLLENERTLDYHDSNVKNLYEKVRNQLKNNAKEIGNGCFEFYHKCDNTCMESVK
NGTYDYPKYSEEAKLNREKIDGVKLESTRIYQILAIYSTVASSLVLVVSLGAISFWMCSNGSLQCRICI
>A/Ontario/RV2895/2018_EPI1215825
DTLCIGYHANNSTDTVDTVLEKNVTVTHSVNLLEDKHNGKLCKLRGVAPLHLGKCNIAGWILGNPECESLSTARSWSYIV
ETSNSDNGTCYPGDFINYEELREQLSSVSSFERFEIFPKTSSWPNHDSNKGVTAACSHAGAKSFYKNLIWLVKKGNSYPK
LNQTYINDKGKEVLVLWGIHHPPTTADQQSLYQNADAYVFVGTSRYSKKFKPEIATRPKVRDQEGRMNYYWTLVEPGDKI
TFEATGNLVVPRYAFTMERNAGSGIIISDTPVHDCNTTCQTPEGAINTSLPFQNVHPITIGKCPKYVKSTKLRLATGLRN
VPSIQSRGLFGAIAGFIEGGWTGMVDGWYGYHHQNEQGSGYAADLKSTQNAIDKITNKVNSVIEKMNTQFTAVGKEFNHL
EKRIENLNKKVDDGFLDIWTYNAELLVLLENERTLDYHDSNVKNLYEKVRTQLKNNAKEIGNGCFEFYHKCDNTCMESVK
NGTYDYPKYSEEAKLNREKIDGVKLESTRIYQILAIYSTVASSLVLVVSLGAISFWMCSNGSLQCRICI
>A/Ontario/RV2906/2018_EPI1215828
DTLCIGYHANNSTDTVDTVLEKNVTVTHSVNLLEDKHNGKLCKLGGVAPLHLGKCNIAGWILGNPECESLSTARSWSYIV
ETSNSDNGTCYPGDFINYEELREQLSSVSSFERFEIFPKTSSWPNHDSNKGVTAACPHAGAKSFYKNLIWLVKKGNSYPK
LNQTYINDKGKEVLVLWGIHHPPTTADQQSLYQNADAYVFVGTSRYSKKFKPEIATRPKVRDQEGRMNYYWTLVEPGDKI
TFEATGNLVAPRYAFTMERNAGSGIIISDTPVHDCNTTCQTAEGAINTSLPFQNVHPVTIGKCPKYVKSTKLRLVTGLRN
VPSIQSRGLFGAIAGFIEGGWTGMVDGWYGYHHQNEQGSGYAADLKSTQNAIDKITNKVNSVIEKMNTQFTAVGKEFNHL
EKRIENLNKKVDDGFLDIWTYNAELLVLLENERTLDYHDSNVKNLYEKVRNQLKNNAKEIGNGCFEFYHKCDNTCMESVK
NGTYDYPKYSEEAKLNREKIDGVKLESTRIYQILAIYSTAASSLVLVVSLGAISFWMCSNGSLQCRICI
>A/Nova Scotia/RV2907/2018_EPI1215830
DTLCIGYHANNSTDTVDTVLEKNVTVTHSVNLLEDKHNGKLCKLGGVAPLHLGKCNIAGWILGNPECESLSTARSWSYIV
ETSNSDNGTCYPGDFINYEELREQLSSVSSFERFEIFPKTSSWPNHDSNKGVTAACPHAGAKSFYKNLIWLVKKGNSYPK
LNQTYINDKGKEVLVLWGIHHPPTTADQQSLYQNADAYVFVGTSRYSKKFKPEIATRPKVRDQEGRMNYYWTLVGPGDKI
TFEATGNLVVPRYAFTMERNAGSGIIISDTPVHDCNTTCQTAEGAINTSLPFQNVHPVTIGKCPKYVKSTKLRLATGLRN
VPSIQSRGLFGAIAGFIEGGWTGMVDGWYGYHHQNEQGSGYAADLKSTQNAIDKITNKVNSVIEKMNTQFTAVGKEFNHL
EKRIENLNKKVDDGFLDIWTYNAELLVLLENERTLDYHDSNVKNLYEKVRNQLKNNAKEIGNGCFEFYHKCDNTCMESVK
NGTYDYPKYSEEAKLNREKIDGVKLESTRIYQILAIYSTVASSLVLVVSLGAISFWMCSNGSLQCRICI
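
For reference, here is a minimal sketch of the kind of result I am after, assuming the data is FASTA-style text where every header line starts with ">" and the sequence lines follow it. The function names `parse_fasta` and `parse_fasta_regex` and the short `example` records are hypothetical, made up for illustration and not taken from my real file. The first function uses "str.startswith" after stripping whitespace from each line (leading whitespace or quote characters before the ">" are one possible reason a plain startswith check fails, though that is only a guess). The second shows a regular-expression variant.

```python
import re


def parse_fasta(text):
    """Map each header (without the leading '>') to its sequence as one string."""
    records = {}
    header = None
    for raw in text.splitlines():
        line = raw.strip()          # drop surrounding whitespace and line endings
        if not line:
            continue                # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []    # start collecting lines for a new record
        elif header is not None:
            records[header].append(line)
    # Join the collected pieces so every sequence is a single string.
    return {name: "".join(parts) for name, parts in records.items()}


def parse_fasta_regex(text):
    """Same result with a regular expression instead of a line loop."""
    # Header: '>' up to the end of the line. Body: everything up to the next '>'.
    pattern = re.compile(r"^>(.*)\n([^>]*)", re.MULTILINE)
    return {
        m.group(1).strip(): re.sub(r"\s+", "", m.group(2))
        for m in pattern.finditer(text)
    }


# Hypothetical two-record input, shortened for readability.
example = """>seq1 demo
DTLCIGYHAN
NSTDTVDTVL
>seq2 demo
EKNVTVTHSV
"""

for name, seq in parse_fasta(example).items():
    print(name, seq)
```

Both functions return a dict, so records with identical headers would overwrite each other; a list of (header, sequence) pairs would avoid that. If Biopython is installed, `Bio.SeqIO.parse(handle, "fasta")` reads the same format and may be a safer choice than hand-written parsing.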