我有一个输出文件,我想将其转换为I文件,我可以提取整个序列。
输出是这样的:(但是然后没有“在>前面,并且在每行之后有一个输入
">comp2_c0_seq1 len=265 path=[1:0-264]
GTTTGAATGGTTTGTGGTTCTGCCTTTGACAAACTGATCATAGTGGAATAATAAGGGAAC
ATGAAGAAATTCCAAGCCCATTGATTTTCTCTTGAGACCAATTAGGTAAAGTCACTCAAA
ATTTTTGAGAGTGGATGCTCAGAGGTAACACTTTGGCATAGAATTGTTAAATAGCATGCA
CTTTAATGGAAGAATAGAATCATTAAAGATTGGTTGATAACAAGTCACAGTGTATTTAAC
CATCATCACAGCAGATGTAGACAGA
">comp2_c1_seq1 len=203 path=[2794:0-202]
CAGCAGATGTAGACAGAAATGGCACCACTGCTTATGAAGGAAACTGGAACCCAGAAGCAC
ACCCTCCCCTCATTCACCATGAGCATCATGAGGAAGAAGAGACCCCACATTCTACAAGCA
CAAGTAAGCAAGATGGCGGTCGGCAGTTCTGGGTTAGATGAATTAGTAAAGACATTCCAG
CAATAGGGAAGATTTTGTTTAGA
">comp6_c0_seq1 len=424 path=[1744:0-423]
CCAGCTCCTACTCACCAGTCTCTCCGCATGGAGAAGTGGCCGTCATGGTCGACCTGTTCC
CAAGGGTGGCCTTGTGAGTGCAGGCTCTCCTCACCAGAGCTGAGGGCTTTGTGAACCTCT
GATGTCAATAGATGCCCCTCATCTTCCAGGAGGACAAAACAGGGCAAAGCAAGACATGGG
GTGAGAACAGGAGTGCATCAGTGGGGTTCCCCAAGCCTGTGTCAGGTCCGGATCTGGGTG
GGAGTTCCCTTCTGCGTCATCCAGGCCAGGCGAGTGGGCATCCTCCCTGAGCACCTGTGC
TTGGGGCTTTGCCTGTGTCAGTCAGGAAGACAGAGTACACGGAAGAGTTACCATTGCTTT
CAGAGCAAACCTTCCTTTGACATGCATTTAACACAGCACGGAGTGATTGACATGTGTCCT
TGTG
">comp7_c0_seq1 len=208 path=[22:0-207]
GGAAGGACAGCATGTTTTCCATCTCAAAGACAGGAAAGAGTTATCTCTTCCTCTGGGATC
CATCAGCATCCTGCCTACTCCTGCGTCACAGCACAGATCCTAACTGGCAAAATTATTAAT
CTCTCTTCCACTGAAATAGATACATCAGACAGATTCCTTTCTGACTGAAACTGTTCTGCT
GTGAAAGACTAACAACAAAGCAGATGCT
">comp8_c0_seq1 len=537 path=[1925:0-536]
TTAATAATTTAATTTTACTTTGAATATGTGTATATAAAATGCCTAATGTGATAAAAGTAG
AATATGCCTGGTTGAAGGAAACATAGAAAATTGAATTGCCACTGATTTGGCCTTTCCTTC
ATCTTTCATGGGGAGCCAGAGAGAATCTGGTTCAGAAGACAGACTCTAGAGTCAAGCAGC
TGGGGTTCAAATCTTGGCAACATTTCAGGGTGATTTTAAAAATATTTAACAGCTGGTAAT
GCTAGATGTCGACTTGTCAGAATGGATAAAGCCTGACATGACGTATATAGCCACACCAGC
ATATAATCAGCCCTGTCTCCACCACTTACTAGTAGTGTCTTTATCTGTAAGATAAAGATA
GCAATAGGCATTATCTCATAGGGGTTTTATGAGGATTAGGTGTAATAATATATATAAAGC
ACTTATGACAATGTTTGGAAGAAAGTGTCATTCAACATTAGATATCATCATCATTGTCAT
CATCGTGACTAATACTTGAGGAATTCCAGAATGTTATGGTTAGAATGGTAAAGTTCT
我想要的是:
> ">comp2_c0_seq1 len=265 path=[1:0-264] GTTTGAATGGTTTGTGGTTCTGCCTTTGACAAACTGATCATAGTGGAATAATAAGGGAACATGAAGAAATTCCAAGCCCATTGATTTTCTCTTGAGACCAATTAGGTAAAGTCACTCAAAATTTTTGAGAGTGGATGCTCAGAGGTAACACTTTGGCATAGAATTGTTAAATAGCATGCACTTTAATGGAAGAATAGAATCATTAAAGATTGGTTGATAACAAGTCACAGTGTATTTAACCATCATCACAGCAGATGTAGACAGA
每个序列都在同一条线上,因此我可以通过grep轻松提取序列。希望这是可能的。
感谢
答案 0 :(得分:1)
此awk
应该:
awk '{printf (/comp/&&NR>1?"\n":"")"%s",$0}' file
">comp2_c1_seq1 len=203 path=[2794:0-202]CAGCAGATGTAGACAGAAATGGCACCACTGCTTATGAAGGAAACTGGAACCCAGAAGCACACCCTCCCCTCATTCACCATGAGCATCATGAGGAAGAAGAGACCCCACATTCTACAAGCACAAGTAAGCAAGATGGCGGTCGGCAGTTCTGGGTTAGATGAATTAGTAAAGACATTCCAGCAATAGGGAAGATTTTGTTTAGA
">comp6_c0_seq1 len=424 path=[1744:0-423]CCAGCTCCTACTCACCAGTCTCTCCGCATGGAGAAGTGGCCGTCATGGTCGACCTGTTCCCAAGGGTGGCCTTGTGAGTGCAGGCTCTCCTCACCAGAGCTGAGGGCTTTGTGAACCTCTGATGTCAATAGATGCCCCTCATCTTCCAGGAGGACAAAACAGGGCAAAGCAAGACATGGGGTGAGAACAGGAGTGCATCAGTGGGGTTCCCCAAGCCTGTGTCAGGTCCGGATCTGGGTGGGAGTTCCCTTCTGCGTCATCCAGGCCAGGCGAGTGGGCATCCTCCCTGAGCACCTGTGCTTGGGGCTTTGCCTGTGTCAGTCAGGAAGACAGAGTACACGGAAGAGTTACCATTGCTTTCAGAGCAAACCTTCCTTTGACATGCATTTAACACAGCACGGAGTGATTGACATGTGTCCTTGTG
">comp7_c0_seq1 len=208 path=[22:0-207]GGAAGGACAGCATGTTTTCCATCTCAAAGACAGGAAAGAGTTATCTCTTCCTCTGGGATCCATCAGCATCCTGCCTACTCCTGCGTCACAGCACAGATCCTAACTGGCAAAATTATTAATCTCTCTTCCACTGAAATAGATACATCAGACAGATTCCTTTCTGACTGAAACTGTTCTGCTGTGAAAGACTAACAACAAAGCAGATGCT
">comp8_c0_seq1 len=537 path=[1925:0-536]TTAATAATTTAATTTTACTTTGAATATGTGTATATAAAATGCCTAATGTGATAAAAGTAGAATATGCCTGGTTGAAGGAAACATAGAAAATTGAATTGCCACTGATTTGGCCTTTCCTTCATCTTTCATGGGGAGCCAGAGAGAATCTGGTTCAGAAGACAGACTCTAGAGTCAAGCAGCTGGGGTTCAAATCTTGGCAACATTTCAGGGTGATTTTAAAAATATTTAACAGCTGGTAATGCTAGATGTCGACTTGTCAGAATGGATAAAGCCTGACATGACGTATATAGCCACACCAGCATATAATCAGCCCTGTCTCCACCACTTACTAGTAGTGTCTTTATCTGTAAGATAAAGATAGCAATAGGCATTATCTCATAGGGGTTTTATGAGGATTAGGTGTAATAATATATATAAAGCACTTATGACAATGTTTGGAAGAAAGTGTCATTCAACATTAGATATCATCATCATTGTCATCATCGTGACTAATACTTGAGGAATTCCAGAATGTTATGGTTAGAATGGTAAAGTTCT
答案 1 :(得分:0)
您可以尝试此sed
,
sed '/">comp/{:loop; N; /\n">comp/{P;D}; s/\n//g; b loop;}' yourfile.txt