Question

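I am parsing the output file of an NCBI Blast search for a bioinformatics application. Essentially, the search takes a template genetic sequence and finds a series of sequences (contigs) that have significant similarity to the template sequence.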

To extract the many matches for the contigs, my goal is to build a list of lists with the following format:

'[(contig #), (frame #), (first character # of the subject ("Sbjct")), (last character # of the subject ("Sbjct"))]'

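For example, for a section with contig #1568 and frame = -1 that starts at character #5509 of the subject and ends at character #3914 of the subject, the output sublist is: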

[1568,-1,5509,3914]

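For this question I have left out the last item of each sublist. The challenge is that there are several read files, and they sometimes contain the same contigs as other files, so the list of lists I am building sometimes gets extended with the same contig twice. Let me explain.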

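As the code block below shows, I try to add a new sublist only if it is unique (not already present). I think the problem is that every item of the new sublist is compared with every item of the existing sublists. Because the other parameters differ even when the contig number is the same, duplicates slip through. I only want to keep the first sublist with a given contig #, regardless of the other parameters.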
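A minimal illustration of the behaviour described above, using two sublists taken from the actual output shown further down (the variable names mirror the question's code and are only for illustration). List membership with `in` / `not in` compares whole sublists with `==`, so a sublist that repeats only the contig number is still treated as new:

```python
# `l1 not in x` compares each whole sublist with ==, item by item,
# so a sublist that repeats only the contig number still counts as new.
x = [['1568', '-1', '12']]
l1 = ['1568', '-1', '1']

print(l1 not in x)                     # True: '1' != '12', so l1 would be added
print(any(l1[0] == s[0] for s in x))   # True: contig '1568' is already present
```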

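import re
from linecache import getline

# Assumes `file` is the open Blast output file, `contents` holds its lines
# (e.g. contents = file.readlines()), and x = [] was created beforehand.
for ind, line in enumerate(contents, 1):
    if re.search(r"(.*)>(.*)", line):           # line containing '>'
        c1 = line.split('[')
        c2 = c1[1].split(']')
        c3 = c2[0]                              # contig number, e.g. '1568'
        my_line = getline(file.name, ind + 5)   # 5 lines below; holds "... = <frame>"
        f1 = my_line.split('= ')
        if '+' in f1[1]:
            f2 = f1[1].split('+')
            f3 = f2[1].split('\n')[0]           # positive frame without the '+'
        else:
            f3 = f1[1].split('\n')[0]           # negative frame, e.g. '-1'
        my_line2 = getline(file.name, ind + 7)
        q1 = my_line2.split(' ')[2]             # third space-separated field
        my_line3 = getline(file.name, ind - 3)  # read but not used below
        l1 = [c3, f3, q1]
        # Compares the whole sublist, so the same contig with a different
        # frame or position is treated as new (the problem described above).
        if l1 not in x:
            x.extend([l1])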

This is the actual output I get:

[['1568', '-1', '12'], ['0003', '1', '12'], ['0130', '3', '12'], ['0097', '1', '20'], ['0512', '3', '11'], ['0315', '-1', '296'], ['0118', '-2', '52'], ['0308', '-3', '488'], ['1568', '-1', '1'], ['0003', '1', '1'], ['0130', '3', '4'], ['0097', '1', '28'], ['0512', '3', '23'], ['0315', '-1', '21'], ['0118', '-2', '39'], ['0102', '-3', '293'], ['0495', '-1', '146'], ['0386', '-3', '146']]

This is what I expect:

[['1568', '-1', '12'], ['0003', '1', '12'], ['0130', '3', '12'], ['0097', '1', '20'], ['0512', '3', '11'], ['0315', '-1', '296'], ['0118', '-2', '52'], ['0308', '-3', '488'], ['0102', '-3', '293'], ['0495', '-1', '146'], ['0386', '-3', '146']]

How can I add a new sublist only if its first item is not already the first item of any other sublist? Please help!
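One possible approach, sketched under the assumption that the contig number is always item 0 of each sublist: remember the contig numbers already added in a set, and append only when the contig has not been seen. The helper name `append_if_new_contig` is hypothetical. A set lookup is O(1), whereas scanning `x` for every new line grows with the size of `x`.

```python
def append_if_new_contig(x, seen, l1):
    """Append l1 to x only if its contig number l1[0] is not in seen.

    Hypothetical helper: returns True if l1 was added, False otherwise.
    """
    if l1[0] in seen:
        return False          # a sublist for this contig is already kept
    seen.add(l1[0])           # remember the contig so later hits are skipped
    x.append(l1)              # same effect as x.extend([l1])
    return True


x, seen = [], set()
for l1 in [['1568', '-1', '12'], ['0003', '1', '12'], ['1568', '-1', '1']]:
    append_if_new_contig(x, seen, l1)

print(x)  # [['1568', '-1', '12'], ['0003', '1', '12']]
```

Inside the question's loop, the final `if l1 not in x:` block would become a single call, with `seen = set()` created next to `x = []` before the loop.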

Answer 1

This might be a quick solution. Replace the following line:

if l1 not in x:

with:

if not any(c3 == temp[0] for temp in x):

This checks whether c3 (the first element of your l1 list) is already the first element of any sublist temp in x, and only extends the list of lists when it is not.
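A quick way to check this fix without re-parsing the Blast file is to replay the question's actual output list through the same condition. The loop below is only a test harness (the name `hits` is illustrative); inside the real parser, `c3` comes from the header line as before. Note that `any(...)` scans all of `x` on every check, which is fine for lists of this size.

```python
# Actual output from the question, in the order the sublists were produced.
hits = [['1568', '-1', '12'], ['0003', '1', '12'], ['0130', '3', '12'],
        ['0097', '1', '20'], ['0512', '3', '11'], ['0315', '-1', '296'],
        ['0118', '-2', '52'], ['0308', '-3', '488'], ['1568', '-1', '1'],
        ['0003', '1', '1'], ['0130', '3', '4'], ['0097', '1', '28'],
        ['0512', '3', '23'], ['0315', '-1', '21'], ['0118', '-2', '39'],
        ['0102', '-3', '293'], ['0495', '-1', '146'], ['0386', '-3', '146']]

x = []
for l1 in hits:
    c3 = l1[0]                                # contig number
    if not any(c3 == temp[0] for temp in x):  # keep only the first hit per contig
        x.extend([l1])

print(x)  # matches the expected list in the question
```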
