Question

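I'm trying to write a Python function that takes two lists as input: one containing the SMILES codes of some molecules and the other containing the molecule names.

It then computes the Tanimoto coefficient between all pairs of molecules (I already have a function for this) and returns two new lists containing, respectively, the SMILES and the names of all molecules whose Tanimoto coefficient with any other molecule is not higher than a given threshold.

This is what I have so far, but it gives wrong results (most of the molecules I get back are nearly identical...):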

def TanimotoFilter(molist,namelist,threshold):
    # molist is the smiles list
    # namelist is the name list (SURPRISE!) is this some global variable name?
    # threshold is the tanimoto threshold (SURPRISE AGAIN!)
    smilesout=[]
    names=[]
    tans=[]
    exclude=[]
    for i in range(1,len(molist)):
        if i not in exclude:
            smilesout.append(molist[i])
            names.append(namelist[i])
            for j in range(i,len(molist)):
                if i==j:
                   tans.append('SAME')
                else:
                   tanimoto=tanimoto_calc(molist[i],molist[j])
                   if tanimoto>threshold:
                      exclude.append(j)
                      #print 'breaking for '+str(i)+' '+str(j)
                      break
                   else:
                      tans.append(tanimoto)

    return smilesout, names, tans

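I'd be very grateful if the changes you propose were as basic as possible, since this code is meant for people who have hardly ever seen a terminal in their lives... It doesn't matter if it's full of loops that make it slow.

Thanks, everyone!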

Answer 1

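I made some changes to the logic of the function. As stated in the question, it returns two lists, with the SMILES and the names. I'm not sure what the purpose of tans is, since a Tanimoto value belongs to a pair of molecules rather than to a single molecule. I couldn't test the code without data, so please let me know whether it works.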

def TanimotoFilter(molist, namelist, threshold):
    # molist is the list of SMILES strings
    # namelist is the list of molecule names, in the same order as molist
    # threshold is the Tanimoto value above which two molecules count as too similar
    smilesout = []
    names = []
    exclude = []

    for i in range(len(molist)):
        if i not in exclude:
            # collect every later molecule that is too similar to molecule i
            temp_exclude = []
            for j in range(i + 1, len(molist)):
                tanimoto = tanimoto_calc(molist[i], molist[j])
                if tanimoto > threshold:
                    temp_exclude.append(j)
            if temp_exclude:
                # molecule i has a close neighbour: drop it together with those neighbours
                temp_exclude.append(i)
                exclude.extend(temp_exclude)
            else:
                smilesout.append(molist[i])
                names.append(namelist[i])

    return smilesout, names
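
One thing to watch in the loop above: an index that is already in exclude is skipped, so some pairs are never compared. If A is similar to B and B is similar to C, but A is not similar to C, then B is excluded while processing A, the pair (B, C) is never checked, and C is kept. If the goal is strictly "no other molecule above the threshold", a plain all-pairs version may be easier to reason about. The following is only a sketch under assumptions: it is written for Python 3, tanimoto_filter_strict and the similarity parameter are names made up here, and toy_tanimoto is a stand-in on character sets so the example runs without the asker's tanimoto_calc (pass the real function as similarity instead). It also returns the pairwise values keyed by index pair, which may be what tans was meant to hold.

```python
from itertools import combinations


def toy_tanimoto(a, b):
    """Hypothetical stand-in for tanimoto_calc: Tanimoto on the sets of characters."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b) / len(set_a | set_b)


def tanimoto_filter_strict(molist, namelist, threshold, similarity=toy_tanimoto):
    """Keep only molecules whose similarity to every other molecule is <= threshold."""
    too_similar = set()   # indices of molecules that have at least one close neighbour
    pair_scores = {}      # (i, j) -> similarity, one entry per pair with i < j

    # Compare every pair exactly once, regardless of what was excluded earlier.
    for i, j in combinations(range(len(molist)), 2):
        score = similarity(molist[i], molist[j])
        pair_scores[(i, j)] = score
        if score > threshold:
            too_similar.add(i)
            too_similar.add(j)

    keep = [i for i in range(len(molist)) if i not in too_similar]
    smilesout = [molist[i] for i in keep]
    names = [namelist[i] for i in keep]
    return smilesout, names, pair_scores


smiles, names, scores = tanimoto_filter_strict(
    ["CCO", "CCN", "c1ccccc1"],
    ["ethanol", "ethylamine", "benzene"],
    0.3,
)
print(smiles, names)  # ['c1ccccc1'] ['benzene']
```

This always computes all pairs, so it is slower than the loop above on large lists, which the question says is acceptable. If the intent is instead to keep one representative from each group of similar molecules, the filter would need a different rule than the one sketched here.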

A Python function that discards molecules that are too similar, based on the Tanimoto coefficient?
