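Thanks in advance for your help, and apologies for my naivety. I need to generate permutations of some letters (ATGC), which are actually nucleotides: di-nucleotide words (e.g. AA AT AG AC), tri-nucleotide words (AAA AAT AAC AAG), tetra, penta and so on (one word length at a time). I then need to count how many times each permutation occurs in another file that contains sequences with associated values. I have generated the list of permutations. Now I only need to loop over the sequences (splitting each sequence from its value), count each of the permutations generated above, and write the output to a new file. However, I only get the counts for the first sequence, not for the others.
The program logic I am trying to follow is: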
Input test file = DNA_seq_val.txt
AAAATTTT#99
CCCCGGGG#77
ATATATCGCGCG#88
The output I get is:
2,0,0,1,0,0,0,0,0,0,0,0,0,0,2,99 AAAATTTT
77 CCCCGGGG
88 ATATATCGCGCG
The output I need is:
2,0,0,1,0,0,0,0,0,0,0,0,0,0,2,99 AAAATTTT
x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,77 CCCCGGGG
x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,88 ATATATCGCGCG
(where x = the corresponding count, as in the first line)
from itertools import product
import os
f2 = open('TRYYY', 'a')
#********Generate the permutations start********
per = product('ACGT', repeat=2)  # ACGT = nucleotides; 2 = di-nucleotides (replace 2 with 3 for tri-nucleotides and so on)
f = open('myfile', 'w')
p = ""
for p in per:
    p = "".join(p)
    f.write(p + "\n")
f.close()
#********Generate the permutations ENDS********
with open('DNA_seq_val.txt', 'r+') as SEQ, open('myfile', 'r+') as TET:  # open two files
    SEQ_lines = sum(1 for line in open('DNA_seq_val.txt'))  # count lines in sequences file
    #print (SEQ_lines)
    compo_lines = sum(1 for line in open('myfile'))  # count lines in composition file
    #print (compo_lines)
    for lines in SEQ:
        line, val1 = lines.split("#")
        val2 = val1.rstrip('\n')
        val = str(val2)
        line = line.rstrip('\n')
        length = len(line)
        #print (line)
        #print (val)
        LIN = line, val
        #print (LIN)
        newstr = "".join((line))
        print (newstr)
        #while True: # infinite loop
        for PER in TET:
            #print (line)
            PER = PER.rstrip('\n')
            length2 = len(PER)
            #print (length2)
            #print (line)
            # print (PER)
            C_PER = str(line.count(PER))
            # print (C_PER)
            for R in C_PER:
                R1 = "".join(R)
                f2.write(R1 + ",")
        f2.write(val,)
        f2.write('\t')
        f2.write(line)
        f2.write('\n')
        #exit()
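
Assuming the loops are nested as in the listing above, a likely cause of the missing counts is that the file object TET is an iterator: the inner loop reads it to the end for the first sequence, so for every later sequence there is nothing left to iterate over. The sketch below is one possible way to express the intended logic, not a definitive fix. It builds the word list once in memory and reuses it for every sequence. The helper names kmer_list, format_record and process are hypothetical and do not appear in the original code.

```python
from itertools import product


def kmer_list(k, alphabet="ACGT"):
    """Return every k-letter word over the alphabet, in product() order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def format_record(record, kmers):
    """Turn 'SEQ#VAL' into 'c1,c2,...,cN,VAL<TAB>SEQ' (layout used in the post)."""
    seq, val = record.rstrip("\n").split("#")
    # str.count counts non-overlapping occurrences, as in the original code.
    # Each count is written whole, so a count of 12 becomes "12", not "1,2".
    counts = [str(seq.count(kmer)) for kmer in kmers]
    return ",".join(counts) + "," + val + "\t" + seq


def process(records, k=2):
    """Build the word list once, then reuse it for every sequence."""
    kmers = kmer_list(k)
    return [format_record(rec, kmers) for rec in records if rec.strip()]


if __name__ == "__main__":
    # Sample records copied from the test file shown in the post.
    sample = ["AAAATTTT#99\n", "CCCCGGGG#77\n", "ATATATCGCGCG#88\n"]
    for row in process(sample):
        print(row)
```

To use this with the files from the post, the lines of DNA_seq_val.txt could be passed to process() and the returned rows written to the output file. A smaller change to the original code would be to call TET.seek(0) just before the inner loop so that each sequence starts reading the word file from the beginning again.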