Using each line of a multi-line input file in Python

Time: 2019-07-12 13:33:03

Tags: python-3.x

Input

 ATTTGGC

 TGCCTTA

 CGGTATC

 GAAAATT

I want to extract the 3-mers from each line and collect them all into one final list. The output should look like:

[ATT, TTT, TTG, TGG, GGC, TGC, GCC...]

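It should not contain fragments that run across a line break, such as GC\n from the first line or TA\n from the second.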

def getKmersFromDna(Dna, k):
    kmer_list = []
    for i in range(len(Dna) - k + 1):
        kmer_list.append(Dna[i:i+k])
    return list(kmer_list)

This gives output I don't want, such as ['CC\n', 'C\nG', '\nGT']

2 answers:

Answer 0 (score: 1)

data = '''

 ATTTGGC

 TGCCTTA

 CGGTATC

 GAAAATT
 '''

for line in map(str.strip, data.splitlines()):
    if not line:
        continue
    print([''.join(c) for c in zip(line, line[1:], line[2:])])

Prints:

['ATT', 'TTT', 'TTG', 'TGG', 'GGC']
['TGC', 'GCC', 'CCT', 'CTT', 'TTA']
['CGG', 'GGT', 'GTA', 'TAT', 'ATC']
['GAA', 'AAA', 'AAA', 'AAT', 'ATT']
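
The snippet above prints one list per line, while the question asks for a single list of all 3-mers. A minimal sketch of one way to flatten the per-line results, assuming the same four sequences; the helper name `kmers` and the variable `all_kmers` are made up for illustration and are not part of the original answer:

```python
from itertools import chain

def kmers(line, k=3):
    """Return all overlapping k-mers of a single sequence line."""
    return [line[i:i + k] for i in range(len(line) - k + 1)]

data = """
ATTTGGC
TGCCTTA
CGGTATC
GAAAATT
"""

# Strip each line, skip blank ones, and flatten the per-line lists into one.
lines = (ln.strip() for ln in data.splitlines())
all_kmers = list(chain.from_iterable(kmers(ln) for ln in lines if ln))

print(all_kmers[:7])
# ['ATT', 'TTT', 'TTG', 'TGG', 'GGC', 'TGC', 'GCC']
```

Because each line is sliced on its own, no k-mer can contain a newline or span two lines.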

Answer 1 (score: 0)

A very basic version of the code looks like this:

def getKmersFromDna(Dna, k):
    # Split the input into lines, dropping surrounding whitespace and blank lines
    dna_list = [line.strip() for line in Dna.splitlines() if line.strip()]
    kmer_list = []
    for cur_dna in dna_list:  # iterate over each line of input
        for i in range(len(cur_dna) - k + 1):  # take every k-mer from the current line
            kmer_list.append(cur_dna[i:i+k])
    return kmer_list
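
If the sequences live in a file, as the title suggests, the same idea can be applied while reading the file line by line. This is only a sketch: the function name `kmers_from_file` is hypothetical, and a temporary file stands in for the real input file so the example runs on its own.

```python
import os
import tempfile

def kmers_from_file(path, k):
    """Collect the k-mers of every non-empty line in a text file."""
    kmer_list = []
    with open(path) as handle:
        for line in handle:
            seq = line.strip()  # drop the trailing newline and any padding
            for i in range(len(seq) - k + 1):
                kmer_list.append(seq[i:i + k])
    return kmer_list

# Demo: a temporary file plays the role of the real input file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("ATTTGGC\nTGCCTTA\n\nCGGTATC\nGAAAATT\n")

result = kmers_from_file(tmp.name, 3)
os.remove(tmp.name)

print(result[:6])
# ['ATT', 'TTT', 'TTG', 'TGG', 'GGC', 'TGC']
```

Iterating over the file handle avoids loading the whole file into one string, which is what let the newline characters leak into the k-mers in the question.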