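Thanks for your help. I want to extract the FASTA sequences of specific introns, then merge the intron FASTA with the CDS FASTA to output my specific records. Can I do this with Biopython or plain Python?
An example of my GFF file: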
1 ensembl intron 7904 9192 . - . Parent=GRMZM2G059865_T01;Name=intron.71462
1 ensembl intron 6518 6638 . - . Parent=GRMZM2G059865_T01;Name=intron.71465
1 ensembl intron 6266 6361 . - . Parent=GRMZM2G059865_T01;Name=intron.71466
1 ensembl intron 5976 6107 . - . Parent=GRMZM2G059865_T01;Name=intron.71467
1 ensembl intron 5189 5341 . - . Parent=GRMZM2G059865_T01;Name=intron.71469
1 ensembl CDS 9193 9519 . - . Parent=GRMZM2G059865_T01;Name=CDS.71479
1 ensembl CDS 7594 7903 . - 0 Parent=GRMZM2G059865_T01;Name=CDS.71480
1 ensembl CDS 6918 7120 . - 1 Parent=GRMZM2G059865_T01;Name=CDS.71481
1 ensembl CDS 6639 6797 . - 0 Parent=GRMZM2G059865_T01;Name=CDS.71482
1 ensembl CDS 6362 6517 . - 0 Parent=GRMZM2G059865_T01;Name=CDS.71483
1 ensembl CDS 6108 6265 . - 0 Parent=GRMZM2G059865_T01;Name=CDS.71484
1 ensembl CDS 5857 5975 . - 2 Parent=GRMZM2G059865_T01;Name=CDS.71485
1 ensembl CDS 5342 5407 . - 1 Parent=GRMZM2G059865_T01;Name=CDS.71486
1 ensembl CDS 5127 5188 . - 1 Parent=GRMZM2G059865_T01;Name=CDS.71487
1 ensembl intron 39443409 39443716 . + . Parent=GRMZM2G441511_T01;Name=intron.100057
1 ensembl intron 39445109 39445314 . + . Parent=GRMZM2G441511_T01;Name=intron.100061
1 ensembl intron 39450586 39450706 . + . Parent=GRMZM2G441511_T01;Name=intron.100066
1 ensembl CDS 39443355 39443408 . + 0 Parent=GRMZM2G441511_T01;Name=CDS.100082
1 ensembl CDS 39443717 39443785 . + 0 Parent=GRMZM2G441511_T01;Name=CDS.100083
1 ensembl CDS 39444013 39444161 . + 0 Parent=GRMZM2G441511_T01;Name=CDS.100084
1 ensembl CDS 39444634 39444721 . + 2 Parent=GRMZM2G441511_T01;Name=CDS.100085
1 ensembl CDS 39445026 39445108 . + 0 Parent=GRMZM2G441511_T01;Name=CDS.100086
1 ensembl CDS 39445315 39445486 . + 2 Parent=GRMZM2G441511_T01;Name=CDS.100087
1 ensembl CDS 39447442 39447548 . + 0 Parent=GRMZM2G441511_T01;Name=CDS.100088
1 ensembl CDS 39449775 39449850 . + 2 Parent=GRMZM2G441511_T01;Name=CDS.100089
1 ensembl CDS 39449938 39450049 . + 0 Parent=GRMZM2G441511_T01;Name=CDS.100090
1 ensembl CDS 39450433 39450585 . + 1 Parent=GRMZM2G441511_T01;Name=CDS.100091
1 ensembl CDS 39450707 39450822 . + 1 Parent=GRMZM2G441511_T01;Name=CDS.100092
1 ensembl CDS 39450992 39451159 . + 0 Parent=GRMZM2G441511_T01;Name=CDS.100093
1 ensembl CDS 39451204 39451266 . + 0 Parent=GRMZM2G441511_T01;Name=CDS.100094
........
Answer 0 (score: 0)
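The question is too vague, so this answer is too. You can use a simple Seq
object from Biopython: load the initial or source (complete gene?) sequence and slice out the parts you need: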
from Bio.Seq import Seq
# Bio.Alphabet was removed in Biopython 1.78, so Seq takes only the string.
seq = Seq("ATCAGCATCAGCATCGACTAGCATCGCATCAGC")
# Select this ^^^^^^^          ^^^
print(seq[3:10] + seq[20:23])
# AGCATCAGCA
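
To connect the slicing idea to the GFF file in the question, here is a minimal sketch in plain Python. It is not a definitive implementation and rests on several assumptions: each feature line has nine whitespace-separated columns with no spaces inside the attribute column, the genome is available as a dict mapping sequence ID to a string, and all features of one Parent lie on the same strand. The function names (`parse_gff`, `merged_sequence`, `reverse_complement`), the toy genome, and the toy GFF lines are hypothetical and chosen only for illustration. GFF coordinates are 1-based and inclusive, so a feature spanning `start..end` maps to the Python slice `[start - 1:end]`.

```python
from collections import defaultdict

# Translation table used to complement DNA on the minus strand.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(dna):
    """Return the reverse complement of a plain DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def parse_gff(lines):
    """Group intron and CDS features by their Parent attribute."""
    features = defaultdict(list)
    for line in lines:
        fields = line.split()
        # Skip comments and anything that is not a nine-column feature line.
        if line.startswith("#") or len(fields) < 9:
            continue
        seqid, _src, ftype, start, end, _score, strand, _phase, attrs = fields[:9]
        if ftype not in ("intron", "CDS"):
            continue
        attr = dict(item.split("=", 1) for item in attrs.split(";") if "=" in item)
        if "Parent" not in attr:
            continue
        features[attr["Parent"]].append(
            (seqid, ftype, int(start), int(end), strand, attr.get("Name", ""))
        )
    return features


def merged_sequence(genome, feats, intron_names=None):
    """Join all CDS plus the selected introns in 5'->3' transcript order.

    intron_names=None keeps every intron; otherwise only introns whose
    Name attribute is in the given collection are kept.
    """
    chosen = sorted(
        (f for f in feats
         if f[1] == "CDS" or intron_names is None or f[5] in intron_names),
        key=lambda f: f[2],
    )
    if not chosen:
        return ""
    # 1-based inclusive GFF coordinates -> 0-based, end-exclusive slices.
    merged = "".join(genome[f[0]][f[2] - 1:f[3]] for f in chosen)
    # Joining in ascending genomic order and reverse-complementing the whole
    # string yields the transcript-order sequence for minus-strand features.
    return reverse_complement(merged) if chosen[0][4] == "-" else merged


# Toy data (hypothetical), laid out like the GFF in the question.
genome = {"1": "AAACCCGGGTTTACGT"}
gff_lines = [
    "1 toy CDS 1 3 . + 0 Parent=T1;Name=CDS.1",
    "1 toy intron 4 6 . + . Parent=T1;Name=intron.1",
    "1 toy CDS 7 9 . + 0 Parent=T1;Name=CDS.2",
    "1 toy CDS 10 12 . - 0 Parent=T2;Name=CDS.3",
    "1 toy intron 13 14 . - . Parent=T2;Name=intron.2",
]

features = parse_gff(gff_lines)
for parent, feats in features.items():
    # Write one FASTA record per transcript.
    print(f">{parent}\n{merged_sequence(genome, feats)}")
```

Passing `intron_names={"intron.1"}` keeps only that intron next to the CDS segments, and an empty set gives the CDS alone. If the genome is held as Biopython `Seq` objects instead of strings, `Seq.reverse_complement()` could stand in for the helper above.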