Question

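There is a similar question on Stack Overflow, but it uses the Linux terminal (Search for specific characters in specific positions of line). I want to do something similar in Python, and I can't figure out a Pythonic way to do it without writing every membership check by hand.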

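I want to search for specific amino acids at specific positions in a multiple sequence alignment. I have defined the alignment positions in a list of indices,

e.g. Index = [1, 100, 235, 500].

I have also defined the amino acids I want at each of those positions.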

Res1 = ["A","G"]
Res2 = ["T","F"]
Res3 = ["S","W"]
Res4 = ["H","J"]

I am currently doing something like this:

for m in records_dict:
    seq = records_dict[m].seq
    # One hand-written membership check per position
    if (seq[Index[0]] in Res1 and seq[Index[1]] in Res2
            and seq[Index[2]] in Res3 and seq[Index[3]] in Res4):
        print(m)

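Now suppose I have a list of 40 residues to check. I know I have to write out the residue lists by hand, but surely there is an easier way to run the checks, with a while loop or something similar.

Also, is there a way to build in a fallback so that, if no sequence passes all 40 membership checks, I get the 5 best sequences that come closest to passing all 40, with output such as: sequence "m" has 30/40 matches, plus a list of the 30 positions that matched and the 10 that did not?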

Answer 1

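I assume you want to check Res1 at Index[0], Res2 at Index[1], and so on.

res = [Res1, Res2, Res3, Res4]
for m in records_dict:
    match = 0
    match_log = []
    # Pair each position with its own residue list; res[i] would
    # wrongly index res by the alignment position itself
    for i, allowed in zip(Index, res):
        if records_dict[m].seq[i] in allowed:
            match += 1
            match_log.append(i)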

With this short snippet you can count the matches and keep track of which indices matched for each entry of records_dict.
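As a self-contained sketch of this counting approach, here is a version that runs on toy data. The sequences are made up, plain strings stand in for records_dict[m].seq, and the helper name score_sequence is hypothetical. It pairs each position with its allowed residues via zip and records the mismatched positions as well as the matched ones.

```python
# Toy data (hypothetical): plain strings stand in for records_dict[m].seq
Index = [1, 3, 5]
res = [["A", "G"], ["T", "F"], ["S", "W"]]
sequences = {
    "seq1": "MATTQSL",
    "seq2": "MGKFQWL",
    "seq3": "MCKTQSL",
}

def score_sequence(seq, positions, allowed_lists):
    """Return (matched, mismatched) lists of positions for one sequence."""
    matched, mismatched = [], []
    for pos, allowed in zip(positions, allowed_lists):
        if seq[pos] in allowed:
            matched.append(pos)
        else:
            mismatched.append(pos)
    return matched, mismatched

results = {name: score_sequence(seq, Index, res) for name, seq in sequences.items()}
for name, (matched, mismatched) in results.items():
    print(f"{name}: {len(matched)}/{len(Index)} matched {matched}, missed {mismatched}")
```

With 40 positions, only Index and res grow; the loop stays the same.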

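If you want to check a ResX at several positions, or you don't want to keep the index list in the same order as the Res lists, I would define a dictionary whose keys are the ResX (as tuples, since lists cannot be dict keys) and whose values are lists of indices: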

to_check = {}
# dict keys must be hashable, so convert each Res list to a tuple
to_check[tuple(Res1)] = [index1, index2]
to_check[tuple(Res2)] = [index1, ..., indexN]
...
to_check[tuple(ResX)] = [indexI, ..., indexJ]

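Then use:

match_log = {}
for m in records_dict:
    match_log[m] = {}
    # .items() yields (key, value) pairs; iterating the dict alone yields only keys
    for res, indexes in to_check.items():
        match_log[m][res] = []
        for i in indexes:
            if records_dict[m].seq[i] in res:
                match_log[m][res].append(i)
        nb_match = len(match_log[m][res])  # matches for this residue set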
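One caveat with this layout: Python lists are unhashable, so they cannot serve as dictionary keys. A minimal runnable sketch, assuming tuple keys and made-up toy strings in place of records_dict[m].seq (the names sequences and total_matches are illustrative, not from the question):

```python
# Hypothetical toy data; plain strings stand in for records_dict[m].seq
sequences = {"seq1": "MATAQSL", "seq2": "MCKGQTL"}

# Keys are tuples because lists are unhashable and cannot be dict keys
to_check = {
    ("A", "G"): [1, 3],
    ("S", "W"): [5],
}

match_log = {}
for name, seq in sequences.items():
    match_log[name] = {}
    for residues, indexes in to_check.items():
        # Keep only the positions whose residue is in the allowed set
        match_log[name][residues] = [i for i in indexes if seq[i] in residues]

# Total number of matched positions per sequence, across all residue sets
total_matches = {
    name: sum(len(hits) for hits in per_res.values())
    for name, per_res in match_log.items()
}
print(total_matches)
```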

Or, in a more Pythonic way, using filter:

match_log = {}
for m in records_dict:
    match_log[m] = {}
    for res, indexes in to_check.items():
        # list() is needed because filter returns a lazy iterator in Python 3
        match_log[m][res] = list(filter(lambda i: records_dict[m].seq[i] in res, indexes))
        nb_match = len(match_log[m][res])
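For the second part of the question (falling back to the 5 closest sequences when none passes every check), one possible sketch ranks sequences by match count with heapq.nlargest from the standard library. The toy data, the helper match_report, and the use of plain strings in place of records_dict[m].seq are all assumptions for illustration.

```python
import heapq

# Hypothetical toy data; plain strings stand in for records_dict[m].seq
Index = [1, 3, 5]
res = [["A", "G"], ["T", "F"], ["S", "W"]]
sequences = {
    "seq1": "MATTQKL",
    "seq2": "MGKKQKL",
    "seq3": "MCKTQSL",
    "seq4": "MCKKQKL",
}

def match_report(seq):
    """Return (matched, mismatched) position lists for one sequence."""
    matched = [i for i, allowed in zip(Index, res) if seq[i] in allowed]
    mismatched = [i for i in Index if i not in matched]
    return matched, mismatched

reports = {name: match_report(seq) for name, seq in sequences.items()}

# Sequences that pass every check
perfect = [name for name, (hit, miss) in reports.items() if not miss]

# Up to 5 sequences with the most matches, best first
best = heapq.nlargest(5, reports, key=lambda name: len(reports[name][0]))

# Report full matches if any exist, otherwise fall back to the closest ones
for name in perfect or best:
    hit, miss = reports[name]
    print(f"{name}: {len(hit)}/{len(Index)} matches, matched {hit}, not matched {miss}")
```

Sorting the whole dictionary with sorted(..., reverse=True)[:5] would work equally well; nlargest just avoids a full sort when there are many sequences.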
