Input:
ATTTGGC
TGCCTTA
CGGTATC
GAAAATT
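I want the 3-mers from each line, combined into a single final list of all 3-mers. The output should look like
[ATT, TTT, TTG, TGG, GGC, TGC, GCC...]
and should not include GC\n from the first line
or TA\n from the second line.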
def getKmersFromDna(Dna, k):
    kmer_list = []
    for i in range(len(Dna) - k + 1):
        kmer_list.append(Dna[i:i+k])
    return list(kmer_list)
This gives output I don't want, such as ['CC\n', 'C\nG', '\nGT'].
Answer 0 (score: 1)
data = '''
ATTTGGC
TGCCTTA
CGGTATC
GAAAATT
'''
for line in map(str.strip, data.splitlines()):
    if not line:
        continue  # skip blank lines
    # zip three shifted copies of the line; each tuple is one 3-mer
    print([''.join(c) for c in zip(line, line[1:], line[2:])])
Prints:
['ATT', 'TTT', 'TTG', 'TGG', 'GGC']
['TGC', 'GCC', 'CCT', 'CTT', 'TTA']
['CGG', 'GGT', 'GTA', 'TAT', 'ATC']
['GAA', 'AAA', 'AAA', 'AAT', 'ATT']
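The snippet above prints one list per line and is fixed at k = 3. Below is a minimal sketch of how the same zip idea might be generalised to any k and collected into a single list, which is what the question asks for. The helper name `kmers_flat` is hypothetical and not part of the original answer.

```python
def kmers_flat(data, k):
    """Return every k-mer from each non-empty line of data, in order."""
    result = []
    for line in map(str.strip, data.splitlines()):
        if not line:
            continue  # skip blank lines
        # zip over k shifted copies of the line; zip stops at the shortest
        # copy, so no window runs past the end of the line
        shifted = (line[i:] for i in range(k))
        result.extend(''.join(chars) for chars in zip(*shifted))
    return result


data = '''
ATTTGGC
TGCCTTA
'''
print(kmers_flat(data, 3))
# ['ATT', 'TTT', 'TTG', 'TGG', 'GGC', 'TGC', 'GCC', 'CCT', 'CTT', 'TTA']
```

Because each line is stripped before windowing, no k-mer can contain a newline or span two lines.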
Answer 1 (score: 0)
A very basic version looks like this:
def getKmersFromDna(Dna, k):
    dna_list = Dna.strip().split('\n')
    kmer_list = []
    for cur_dna in dna_list:  # iterate over each line of input
        for i in range(len(cur_dna) - k + 1):  # collect the k-mers of this line
            kmer_list.append(cur_dna[i:i+k])
    return kmer_list
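As a usage sketch, here is how this might be called on the question's input. The function is repeated so the snippet runs on its own. Using `splitlines()` plus a per-line `strip()` instead of `split('\n')` is my own substitution, not the answer's, made on the assumption that the input may contain Windows-style line endings or stray spaces.

```python
def getKmersFromDna(Dna, k):
    kmer_list = []
    # splitlines() is an assumption swapped in for split('\n'); it also handles '\r\n'
    for cur_dna in Dna.strip().splitlines():
        cur_dna = cur_dna.strip()  # drop stray whitespace around the sequence
        for i in range(len(cur_dna) - k + 1):
            kmer_list.append(cur_dna[i:i + k])
    return kmer_list


dna = "ATTTGGC\nTGCCTTA\nCGGTATC\nGAAAATT\n"
kmers = getKmersFromDna(dna, 3)
print(kmers[:7])   # ['ATT', 'TTT', 'TTG', 'TGG', 'GGC', 'TGC', 'GCC']
print(len(kmers))  # 20
```

Each 7-character line yields 7 - 3 + 1 = 5 windows, so four lines give 20 k-mers, none of which crosses a line boundary.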