Suppose I have a class Kmerobj with two attributes: kmer (a string) and locationlist (a list).
class Kmerobj(object):
    def __init__(self, kmer, locationlist):
        self.kmer = kmer
        self.locationlist = locationlist
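Now suppose I have a string and iterate over it, creating every substring of length k and storing each one as a Kmerobj object in a list. The kmer attribute is the substring, and the locationlist attribute is the list of start positions of that substring. I have written a function to do this.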
def kmerizeseq(sequence, kmer_size):
    kmer_list = []
    sequence = sequence.upper()
    if 1 <= kmer_size <= len(sequence):
        for start in range(len(sequence) - kmer_size + 1):
            kmerseq = sequence[start:start + kmer_size]
            # First scan: has this k-mer been seen before?
            if all(kmerseq != kmerobj.kmer for kmerobj in kmer_list):
                kmerinst = Kmerobj(kmerseq, [start])
                kmer_list.append(kmerinst)
            else:
                # Second scan: find the matching object and record the position
                for kmerobj in kmer_list:
                    if kmerseq == kmerobj.kmer:
                        kmerobj.locationlist.append(start)
    return kmer_list
This works. If I run the function
kmerizeseq('ATCATC',3)
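I get a list of three objects. The first has kmer 'ATC' and locationlist [0, 3]. The second has kmer 'TCA' and locationlist [1]. The third has kmer 'CAT' and locationlist [2].
My question is: is there a more efficient way to get the same result? Right now I scan the whole list to check whether any object already has the same kmer attribute as the input, and then scan the list a second time to find the matching object and modify it.
Is there a way to iterate over the list, stop and modify the object if its kmer attribute matches the input, and append a new Kmerobj to the list if no match is found? Ideally I would only traverse the list once.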
Answer 0: (score: 0)
Question: Is there a more efficient way to get the same result?
You need random access to a Kmerobj, using the kmer sequence as the key. Use a dict.
Consider the following:
class Kmerobj2(object):
    def __init__(self, kmer):
        """Parameter 'kmer' is a tuple of (kmer,index), e.g ('ATC', 0)"""
        self.kmer = kmer[0]
        self.loc = [kmer[1]]

    def append(self, kmer):
        self.loc.append(kmer[1])

    def locations(self):
        return len(self.loc)

    def __str__(self):
        return "{} => {} location(s) at {}".format(self.kmer, self.locations(), self.loc)
def kmerizeseq2(sequence, kmer_size):
    l = []
    # Create len(sequence) tuples == (seq, i) with kmer_size in ONE loop
    for i, c in enumerate(sequence):
        l.append((sequence[i:i + kmer_size], i))
    print("[{}]{}".format(len(l), l))
    #>>>[6][('ATC', 0), ('TCA', 1), ('CAT', 2), ('ATC', 3), ('TC', 4), ('C', 5)]

    d = {}
    # Aggregate all equal kmer of len kmer_size
    for kmer in l[:(len(sequence) - kmer_size) + 1]:
        # kmer exists ?
        if kmer[0] in d:
            # Append kmer.loc to d[kmer]
            d[kmer[0]].append(kmer)
        else:
            # Create a new Kmerobj
            d[kmer[0]] = Kmerobj2(kmer)
    return d
if __name__ == "__main__":
    d = kmerizeseq2('ATCATC', 3)
    print("type:{}, {}".format(type(d), d))
    #>>> type:<class 'dict'>, {'CAT': <__main__.Kmerobj2 object at 0xf70634ec>, 'TCA': <__main__.Kmerobj2 object at 0xf70634cc>, 'ATC': <__main__.Kmerobj2 object at 0xf706348c>}
    for kmer in d:
        print("{}".format(d[kmer]))
Output:
CAT => 1 location(s) at [2]
TCA => 1 location(s) at [1]
ATC => 2 location(s) at [0, 3]
Tested with Python 3.4.2
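If you would rather keep the original return type (a list of Kmerobj objects in first-seen order), the same dict idea can sit next to the list as a lookup index. The following is only a minimal sketch under that assumption: the function name kmerize_with_index and the helper dict index are hypothetical names introduced here, and the class is copied from the question so the snippet runs on its own.

```python
class Kmerobj(object):
    """Same shape as the class in the question: a k-mer and its start positions."""

    def __init__(self, kmer, locationlist):
        self.kmer = kmer
        self.locationlist = locationlist


def kmerize_with_index(sequence, kmer_size):
    """Hypothetical variant of kmerizeseq: one pass, one dict lookup per k-mer."""
    sequence = sequence.upper()
    kmer_list = []  # objects in first-seen order, as in the question
    index = {}      # kmer string -> the Kmerobj already stored in kmer_list
    if not 1 <= kmer_size <= len(sequence):
        return kmer_list
    for start in range(len(sequence) - kmer_size + 1):
        kmerseq = sequence[start:start + kmer_size]
        kmerobj = index.get(kmerseq)
        if kmerobj is None:
            # First occurrence: create the object and register it in both places
            kmerobj = Kmerobj(kmerseq, [])
            index[kmerseq] = kmerobj
            kmer_list.append(kmerobj)
        kmerobj.locationlist.append(start)
    return kmer_list


if __name__ == "__main__":
    for obj in kmerize_with_index('ATCATC', 3):
        print(obj.kmer, obj.locationlist)
```

Each k-mer costs one dict lookup on average instead of a scan over the list, so the list itself is never searched at all.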