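I have 718 similarly formatted files that need a little cleanup before a program can use each of them. The first line of each original file begins with a space, and that space needs to be removed. The DNA sequences should also have no spaces between each block of 10 bases (the sequences can still be broken across multiple lines). Below, I first show what an original file looks like, then what it should look like.
Original: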
14 128
Alydidae_Micrelytrinae_Leptocorisini_Stenocoris_tipuloides_CMF_0174_S59_L005 caggacccga ggttcaacag cgagattgac atgaggacag gttacaagac
Coreidae_Coreinae_Acanthocephalini_Acanthocephala_thomasi_CMF0028_UQ caggacccgc gatttaacag tgagatagac atgcgaacag gctacaagac
Coreidae_Coreinae_Anisoscelini_Anisoscelis_alipes_CMF0018_UQ caggacccgc ggtttaacag tgagatagac atgcgaactg gctacaagac
Coreidae_Coreinae_Mictini_Anoplocnemis_sp_CMF0020_UQ caggacccgc gcttcaacag tgagatagac atgcgaacag gctataagac
Coreidae_Coreinae_Mictini_Mygdonia_tuberculosa_CMF0053_UQ caggacccgc gcttcaacag tgagatagac atgcgaacag gctataagac
Coreidae_Coreinae_Nematopodini_Mozena_nr_lineolata_CMF0026_UQ caggacccgc ggttcaacag cgatatagac atgcgaacag gctacaggac
Coreidae_Coreinae_Nematopodini_Thasus_neocalifornicus_CMF0190_UQ caggacccgc gtttcaacag cgagatcgat atgcggacag ggtacaagac
Coreidae_Pseudophloeinae_Clavigrallini_Clavigralla_sp_CMF_0335_S81_L005_UQ caggatccga ggttcaacag cgagatagac atgaggacag gttacaaaac
Coreidae_Pseudophloeinae_Pseudophloeini_Myla_sp_CMF_0091_S35_L005_UQ caggacccga ggttcaacag cgagatagac atgcggacag gctataaaac
Largidae_Largus_sp_CMF_0230_S65_L005_UQ caggacccga ggttcaacag cgaaatagac atgaggactg gctataagac
Pentatomidae_Halyomorpha_halys_halhal1 caggatccga ggttcaacag cgaaatcgac atgaggactg gctacaagac
Pyrrhocoridae_Dysdercus_mimus_CMF_0110_S42_L005_UQ caggatcctc gtttcaacag cgaaatcgac atgagaacag gttacaagac
Pyrrhocoridae_Dysdercus_suturellus_CMF_0305_S71_L005_UQ caggatcctc gtttcaacag cgaaatcgac atgagaacag gttacaagac
Rhopalidae_Serinethinae_Serinethini_Jadera_haematoloma_CMF_0281_S69_L005_UQ caggaccccc gttttaacag tgaaatagac atgcgaaccg gttacaagac
taacactatc ctctgcggcc ccatctctaa ctacgaaggt gatgtgattg
caacactatc ctctgtgggc ccatctctaa ctacgaagga gaggtgatag
caacaccatc ctctgtgggc ctatttctaa ctacgaaggg gaggtgatag
caacactata ctctgcgggc ctatatccaa ctacgaagga gaggtgattg
caacacgata ctctgtgggc ctatatctaa ctacgaagga gaggtgatag
gaacaccatc ctttgcgggc cgatctccaa ctacgagggg gaggtgatcg
caacaccatc ctctgcgggc ctatctccaa ctacgaaggg gaggtgatcg
caacaccatc ctctgtggac ccatctctaa ctacgaagga gaagtgatag
caacaccatc ctctgcgggc ccatctccaa ctacgaaggg gaggtgatcg
tcataccatt ctatgtgggc ctatttcaaa ttacgaaggg gaagtgatcg
taacaccatc ctctgcggcc ccatttccaa ctacgaaggc gaagtgattg
caacacaata ctctgcggac ccatatcgaa ctacgaaggt gaagtcatag
caacacaata ctctgcggac ccatatcgaa ctacgaaggt gaagtcatag
ccacaccatc ctctgcggac ccatctccaa ctacgaaggt gaggtgatag
gagttgccca gatcatcaac aagactga
gagtagctca gatcatcaac aagaccga
gggtagctca gatcatcaac aagacgga
gagtagctca gatcatcaat aagactga
gagtagctca gatcatcaat aagaccga
gggtggcaca gatcatcaac aagacgga
gagtggctca gatcatcaac aagacgga
gcgtcgcaca gatcatc--- --------
gcgtcgcaca gatcataaac aagaccga
gggtagccca gatcataaac aaaacaga
gagtcgccca gatcatcaac aaaactga
gagtggcgca gatcatcatt aaaaccga
gagtggcgca gatcatcaat aaaacgga
gagtagccca gatcatcaac aagacgga
What it should look like after processing:
14 128
Alydidae_Micrelytrinae_Leptocorisini_Stenocoris_tipuloides_CMF_0174_S59_L005 caggacccgaggttcaacagcgagattgacatgaggacaggttacaagac
Coreidae_Coreinae_Acanthocephalini_Acanthocephala_thomasi_CMF0028_UQ caggacccgcgatttaacagtgagatagacatgcgaacaggctacaagac
Coreidae_Coreinae_Anisoscelini_Anisoscelis_alipes_CMF0018_UQ caggacccgcggtttaacagtgagatagacatgcgaactggctacaagac
Coreidae_Coreinae_Mictini_Anoplocnemis_sp_CMF0020_UQ caggacccgcgcttcaacagtgagatagacatgcgaacaggctataagac
Coreidae_Coreinae_Mictini_Mygdonia_tuberculosa_CMF0053_UQ caggacccgcgcttcaacagtgagatagacatgcgaacaggctataagac
Coreidae_Coreinae_Nematopodini_Mozena_nr_lineolata_CMF0026_UQ caggacccgcggttcaacagcgatatagacatgcgaacaggctacaggac
Coreidae_Coreinae_Nematopodini_Thasus_neocalifornicus_CMF0190_UQ caggacccgcgtttcaacagcgagatcgatatgcggacagggtacaagac
Coreidae_Pseudophloeinae_Clavigrallini_Clavigralla_sp_CMF_0335_S81_L005_UQ caggatccgaggttcaacagcgagatagacatgaggacaggttacaaaac
Coreidae_Pseudophloeinae_Pseudophloeini_Myla_sp_CMF_0091_S35_L005_UQ caggacccgaggttcaacagcgagatagacatgcggacaggctataaaac
Largidae_Largus_sp_CMF_0230_S65_L005_UQ caggacccgaggttcaacagcgaaatagacatgaggactggctataagac
Pentatomidae_Halyomorpha_halys_halhal1 caggatccgaggttcaacagcgaaatcgacatgaggactggctacaagac
Pyrrhocoridae_Dysdercus_mimus_CMF_0110_S42_L005_UQ caggatcctcgtttcaacagcgaaatcgacatgagaacaggttacaagac
Pyrrhocoridae_Dysdercus_suturellus_CMF_0305_S71_L005_UQ caggatcctcgtttcaacagcgaaatcgacatgagaacaggttacaagac
Rhopalidae_Serinethinae_Serinethini_Jadera_haematoloma_CMF_0281_S69_L005_UQ caggacccccgttttaacagtgaaatagacatgcgaaccggttacaagac
taacactatcctctgcggccccatctctaactacgaaggtgatgtgattg
caacactatcctctgtgggcccatctctaactacgaaggagaggtgatag
caacaccatcctctgtgggcctatttctaactacgaaggggaggtgatag
caacactatactctgcgggcctatatccaactacgaaggagaggtgattg
caacacgatactctgtgggcctatatctaactacgaaggagaggtgatag
gaacaccatcctttgcgggccgatctccaactacgagggggaggtgatcg
caacaccatcctctgcgggcctatctccaactacgaaggggaggtgatcg
caacaccatcctctgtggacccatctctaactacgaaggagaagtgatag
caacaccatcctctgcgggcccatctccaactacgaaggggaggtgatcg
tcataccattctatgtgggcctatttcaaattacgaaggggaagtgatcg
taacaccatcctctgcggccccatttccaactacgaaggcgaagtgattg
caacacaatactctgcggacccatatcgaactacgaaggtgaagtcatag
caacacaatactctgcggacccatatcgaactacgaaggtgaagtcatag
ccacaccatcctctgcggacccatctccaactacgaaggtgaggtgatag
gagttgcccagatcatcaacaagactga
gagtagctcagatcatcaacaagaccga
gggtagctcagatcatcaacaagacgga
gagtagctcagatcatcaataagactga
gagtagctcagatcatcaataagaccga
gggtggcacagatcatcaacaagacgga
gagtggctcagatcatcaacaagacgga
gcgtcgcacagatcatc-----------
gcgtcgcacagatcataaacaagaccga
gggtagcccagatcataaacaaaacaga
gagtcgcccagatcatcaacaaaactga
gagtggcgcagatcatcattaaaaccga
gagtggcgcagatcatcaataaaacgga
gagtagcccagatcatcaacaagacgga
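
For illustration, here is a minimal Python sketch of the cleanup described above. It is not a definitive solution. It assumes that the first number on the header line is the number of sequences (14 in the example), that the first 14 non-blank lines after the header start with a name containing no spaces, and that every later line holds only sequence data. The function names, the `*.phy` pattern, and the folder names are hypothetical placeholders, not anything taken from the files themselves.

```python
from pathlib import Path


def clean_phylip(text: str) -> str:
    """Strip the header's leading space and the spaces inside sequences."""
    lines = text.splitlines()
    if not lines:
        return text
    header = lines[0].strip()          # drops the leading space on line 1
    ntax = int(header.split()[0])      # assumption: first field = number of sequences
    out = [header]
    named = 0                          # name-bearing lines seen so far
    for line in lines[1:]:
        parts = line.split()
        if not parts:
            out.append("")             # keep blank separator lines, if any
        elif named < ntax:
            # First block: keep the name, join the 10-base chunks after it.
            out.append(parts[0] + " " + "".join(parts[1:]))
            named += 1
        else:
            # Later blocks: sequence data only, so remove every space.
            out.append("".join(parts))
    return "\n".join(out) + "\n"


def clean_directory(src: str, dst: str, pattern: str = "*.phy") -> None:
    """Write a cleaned copy of every matching file in src into dst."""
    out_dir = Path(dst)
    out_dir.mkdir(parents=True, exist_ok=True)
    # sorted() materializes the file list before anything is written.
    for path in sorted(Path(src).glob(pattern)):
        (out_dir / path.name).write_text(clean_phylip(path.read_text()))
```

Calling something like `clean_directory("raw", "cleaned")` would process all the files in one pass and leave the originals untouched, which makes it easy to compare a few outputs by hand before trusting the rest.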