The code below only produces a result for the first piece of DNA; the append somehow does not append, and the list never grows.

    DNA = open("Ex72_genomic_dna2.txt", "r")
    EXON = open("Ex72_exons2.txt", "r")
    DNAST = []
    DNAseq = DNA.read()
    for li in EXON:
        positions = li.split(',')
        start = int(positions[0]) - 1
        stop = int(positions[1])
        # start = 1, stop = 8
        piece = str(DNAseq[start:stop])
        # piece = 'CGATCGT'
        DNAST.append(piece)
    print(DNAST)
    NewDNA = DNAST[0]
    rangemax = len(DNAST)
    if rangemax > 1:
        for num in range(1, rangemax):
            NewDNA = NewDNA + "" + DNAST[num]
    DNA.close()
    EXON.close()
    print(NewDNA)
    print("But should be CGATCGTCCGTCCGATGCCGATCG")
    # the content of Ex72_genomic_dna2.txt: TCGATCGTACCGTCGACGATGCTACGATCGTCGAT
    # the content of Ex72_exons2.txt: 2,8,10,14,17,22,25,30

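Answer 0 (score: 1)
I think the problem arises because you assume that for li in EXON: iterates over every item in Ex72_exons2.txt. It actually iterates over lines, and the file has only one line, so the loop body runs once and only the first start,stop pair is ever used. The correct approach is to iterate over each line as you already do, and then iterate over the split values within that line, two at a time.
Here is the corrected code for your case: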

    DNA = open("Ex72_genomic_dna2.txt", "r")
    EXON = open("Ex72_exons2.txt", "r")
    DNAST = []
    DNAseq = DNA.read()
    for li in EXON:
        positions = li.split(',')
        for i in range(0, len(positions), 2):
            start = int(positions[i]) - 1
            # .strip() because of trailing newline character.
            stop = int(positions[i+1].strip())
            piece = str(DNAseq[start:stop])
            DNAST.append(piece)
    print(DNAST)
    NewDNA = DNAST[0]
    rangemax = len(DNAST)
    if rangemax > 1:
        for num in range(1, rangemax):
            NewDNA = NewDNA + "" + DNAST[num]
    DNA.close()
    EXON.close()
    print(NewDNA)
    print("CGATCGTCCGTCCGATGCCGATCG")  # <-- correct output for comparison.

It is also worth pointing out that this part of the code:

    NewDNA = DNAST[0]
    rangemax = len(DNAST)
    if rangemax > 1:
        for num in range(1, rangemax):
            NewDNA = NewDNA + "" + DNAST[num]

can be replaced with:

    NewDNA = ''.join(DNAST)

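Putting the two suggestions together, a minimal sketch might look like the following. The function name splice_exons is my own invention, and it takes the two strings directly instead of reading the files, so it runs without them. I am assuming the positions are 1-based and inclusive, as the comments in the question suggest.

```python
def splice_exons(dna_seq, exon_line):
    """Join the exon pieces named by a comma-separated line of
    1-based, inclusive start,stop pairs (hypothetical helper)."""
    # int() ignores surrounding whitespace, including a trailing newline.
    positions = [int(p) for p in exon_line.split(',')]
    pieces = []
    # Walk the numbers two at a time: (start, stop), (start, stop), ...
    for start, stop in zip(positions[0::2], positions[1::2]):
        pieces.append(dna_seq[start - 1:stop])
    return ''.join(pieces)


# Sample data copied from the question.
dna = "TCGATCGTACCGTCGACGATGCTACGATCGTCGAT"
print(splice_exons(dna, "2,8,10,14,17,22,25,30\n"))
# prints CGATCGTCCGTCCGATGCCGATCG
```

Pairing the numbers with zip avoids the manual index arithmetic; if the line ever held an odd count of numbers, the unpaired last one would be dropped silently, so a length check may be worth adding.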
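Answer 1 (score: 0)
In your code, the value of li in for li in EXON: will be the entire content of EXON, because the file consists of a single line. That is, if the text of Ex72_exons2.txt is "1,2,3,4", then li is "1,2,3,4" and the loop iterates only once. The solution from torrent will fix the problem you are running into.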
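You can check this for yourself with a short experiment. In this sketch io.StringIO stands in for the opened file (an assumption on my part, so the snippet runs without Ex72_exons2.txt on disk); it iterates line by line in the same way a text file does.

```python
import io

# Stand-in for open("Ex72_exons2.txt", "r"); same content as in the question.
exon_file = io.StringIO("2,8,10,14,17,22,25,30\n")

lines = list(exon_file)       # iterating yields whole lines, not numbers
print(len(lines))             # 1
print(lines[0].split(','))    # ['2', '8', '10', '14', '17', '22', '25', '30\n']
```

One line means one pass through the loop, which is why only the first piece was ever appended.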