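Python: Searching a list of objects of the same class to determine whether object.attribute equals some value for any object in the list

Time: 2018-10-16 19:55:05

Tags: python python-3.x class iteration

Suppose I have a class Kmerobj with two attributes: kmer (a string) and locationlist (a list).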

class Kmerobj(object):

    def __init__(self,kmer,locationlist):
        self.kmer = kmer
        self.locationlist = locationlist

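Now suppose I have a string, and I iterate over it to create every substring of length k, storing them in a list as Kmerobj objects. The kmer attribute is the substring, and the locationlist attribute is the list of start positions of that substring. I have written a function that does this.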

def kmerizeseq(sequence,kmer_size):
    kmer_list = []
    sequence = sequence.upper()
    if (kmer_size <= len(sequence) and kmer_size >= 1):
        for start in range(0,len(sequence)-kmer_size+1,1):
            kmerseq = sequence[start:start+kmer_size]
            if all(kmerseq != kmerobj.kmer for kmerobj in kmer_list):
                kmerinst = Kmerobj(kmerseq,[start])
                kmer_list.append(kmerinst)
            else:
                for kmerobj in kmer_list:
                    if kmerseq == kmerobj.kmer:
                        kmerobj.locationlist.append(start)
    return kmer_list

This works. If I run the function

kmerizeseq('ATCATC',3)

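I get a list of three objects. The first has kmer attribute 'ATC' and locationlist attribute [0,3]. The second has kmer attribute 'TCA' and locationlist attribute [1]. The third has kmer attribute 'CAT' and locationlist attribute [2].

My question is: is there any way to achieve the same result more efficiently? Currently I iterate over the whole list to determine whether any object has the same kmer attribute as the input, and then iterate over the list again to find the matching object and modify it.

Is there a way to iterate over the list, stop and modify the current object if its kmer attribute matches the input, and append a new Kmerobj object to the list if no match is found? Ideally, I would only need to iterate over the list once.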
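The single-pass scan described above can be sketched with Python's for/else construct, whose else branch runs only when the loop ends without a break. This is a minimal illustration, not the original function: the helper name add_kmer is hypothetical, and Kmerobj is repeated so the snippet runs on its own.

```python
class Kmerobj(object):
    def __init__(self, kmer, locationlist):
        self.kmer = kmer
        self.locationlist = locationlist


def add_kmer(kmer_list, kmerseq, start):
    """Record one occurrence of kmerseq, scanning kmer_list at most once.

    add_kmer is a hypothetical helper, not part of the original code.
    """
    for kmerobj in kmer_list:
        if kmerobj.kmer == kmerseq:
            kmerobj.locationlist.append(start)
            break  # stop at the first match
    else:
        # Reached only when the loop finished without hitting break.
        kmer_list.append(Kmerobj(kmerseq, [start]))


kmer_list = []
for start, kmerseq in enumerate(['ATC', 'TCA', 'CAT', 'ATC']):
    add_kmer(kmer_list, kmerseq, start)
```

Each call still scans the list linearly, so the total work grows with the number of distinct k-mers times the sequence length. A keyed lookup avoids that scan.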

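1 answer:

Answer 0: (score: 0)

Question: Is there any way to achieve the same result more efficiently?

You need random access to each Kmerobj, using its kmer sequence as the key.

Using a dict, consider the following: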

class Kmerobj2(object):
    def __init__(self, kmer):
        """Parameter 'kmer' is a tuple of (kmer,index), e.g ('ATC', 0)"""
        self.kmer = kmer[0]
        self.loc = [kmer[1]]

    def append(self, kmer):
        self.loc.append(kmer[1])

    def locations(self):
        return len(self.loc)

    def __str__(self):
        return "{} => {} location(s) at {}".format(self.kmer, self.locations(), self.loc)

def kmerizeseq2(sequence, kmer_size):
    l = []
    # Create len(sequence) tuples == (seq, i) with kmer_size in ONE loop
    for i, c in enumerate(sequence):
        l.append((sequence[i:i + kmer_size], i))

    print("[{}]{}".format( len(l), l))
    #>>>[6][('ATC', 0), ('TCA', 1), ('CAT', 2), ('ATC', 3), ('TC', 4), ('C', 5)]

    d = {}
    # Aggregate all equal kmer of len kmer_size
    for kmer in l[:max(0, len(sequence) - kmer_size + 1)]:
        # kmer exists ?
        if kmer[0] in d:
            # Append kmer.loc to d[kmer]
            d[kmer[0]].append(kmer)
        else:
            # Create a new Kmerobj
            d[kmer[0]] = Kmerobj2(kmer)
    return d

if __name__ == "__main__":
    d = kmerizeseq2('ATCATC',3)
    print("type:{}, {}".format(type(d), d))
    #>>> type:<class 'dict'>, {'CAT': <__main__.Kmerobj2 object at 0xf70634ec>, 'TCA': <__main__.Kmerobj2 object at 0xf70634cc>, 'ATC': <__main__.Kmerobj2 object at 0xf706348c>}

    for kmer in d:
        print("{}".format(d[kmer]))
  

Output

CAT => 1 location(s) at [2]
TCA => 1 location(s) at [1]
ATC => 2 location(s) at [0, 3]

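Tested with Python: 3.4.2 (dict iteration order is arbitrary before Python 3.7, so the lines may print in a different order)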
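If the original Kmerobj class and the list return value should be kept, the same dict idea can be applied as an index alongside the list. The following is a minimal sketch under that assumption. The function name kmerizeseq_dict and the variable by_kmer are hypothetical, and Kmerobj is repeated so the snippet is self-contained.

```python
class Kmerobj(object):
    def __init__(self, kmer, locationlist):
        self.kmer = kmer
        self.locationlist = locationlist


def kmerizeseq_dict(sequence, kmer_size):
    """Hypothetical variant of kmerizeseq that looks k-mers up in a dict."""
    sequence = sequence.upper()
    kmer_list = []   # result, in order of first occurrence
    by_kmer = {}     # kmer string -> its Kmerobj, for constant-time lookup
    if kmer_size < 1:
        return kmer_list
    # range() is empty when kmer_size > len(sequence)
    for start in range(len(sequence) - kmer_size + 1):
        kmerseq = sequence[start:start + kmer_size]
        kmerobj = by_kmer.get(kmerseq)
        if kmerobj is None:
            # First time this k-mer is seen: create and index it.
            kmerobj = Kmerobj(kmerseq, [])
            by_kmer[kmerseq] = kmerobj
            kmer_list.append(kmerobj)
        kmerobj.locationlist.append(start)
    return kmer_list


result = kmerizeseq_dict('ATCATC', 3)
for kmerobj in result:
    print(kmerobj.kmer, kmerobj.locationlist)
```

Here the list preserves the order of first occurrence, while the dict replaces both list scans with a single keyed lookup per k-mer.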