Question

I have the following DataFrame:

    Activity    SMILES
0   1.0         CCN1CCCC1CNC(=O)c1cc([N+](=O)[O-])cc(O)c1OC
1   1.0         O=c1cc(-c2cccs2)oc2ccc(OCCCCCCN3CCC(O)CC3)cc12
2   1.0         CCCCCCCCCC(=O)N1c2ccc(Cl)cc2N=C(N2CCN(C)CC2)c2...
3   1.0         CCN1C(=O)c2ccccc2S(=O)c2ccc(C(=O)NCc3ccc4c(c3)...
4   1.0         CCN1CCc2cc(OCCF)cc3c2C1Cc1cccc(O)c1-3
    ...         ...

and I would like to obtain the following output:

    Activity    SMILES                                          cluster cluster set
0   1.0     CCN1CCCC1CNC(=O)c1cc([N+](=O)[O-])cc(O)c1OC         0.0     val
1   1.0     O=c1cc(-c2cccs2)oc2ccc(OCCCCCCN3CCC(O)CC3)cc12      898.0   test
2   1.0     CCCCCCCCCC(=O)N1c2ccc(Cl)cc2N=C(N2CCN(C)CC2)c2...   7.0     val
3   1.0     CCN1C(=O)c2ccccc2S(=O)c2ccc(C(=O)NCc3ccc4c(c3)...   4.0     train
5   1.0     FC(F)(F)c1cccc(N2CCN(Cc3cn(-c4ccccc4)c(-c4cccc...   856.0   val
    ...     ...                                                 ...     ...

I have three lists of tuples (train_points, test_points and val_points) that look like this:

[(4633, 0),
 (3935, 3907),
 (1410, 1409),
 (1120, 1121, 3, 3771),
 ...]

I tried to implement the following sequence of loops:

# Remove irrelevant information from the DataFrame
df_triplets = df[['Activity', 'SMILES']]

# Add clustering information
list_points = [train_points, test_points, val_points]
name_points = ['train', 'test', 'val']

# This loop should work, but it does not
for name, points in zip(name_points, list_points):
    for num, cluster in enumerate(points):
        for molecule in cluster:
            df_triplets.loc[molecule, 'cluster'] = num
            df_triplets.loc[molecule, 'cluster set'] = name

However, this only fills df_triplets with the last cluster set (val_points) instead of all three sets I am interested in. Note that each "molecule" is a unique number.
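For reference, one way to avoid row-by-row assignment is to flatten the three lists into a small lookup table and join it onto the DataFrame by index. This is only a sketch under assumptions: the toy `df` and the short point lists below are hypothetical stand-ins for the real data, the numbers in each tuple are assumed to be row labels of `df`, and the `.copy()` is a precaution against assigning into a view, not a confirmed diagnosis of the behaviour described above.

```python
import pandas as pd

# Hypothetical toy data standing in for the real DataFrame and point lists.
df = pd.DataFrame({
    "Activity": [1.0] * 6,
    "SMILES": ["C", "CC", "CCC", "CCCC", "CCCCC", "CCCCCC"],
})
train_points = [(0, 1), (2,)]
test_points = [(3,)]
val_points = [(4, 5)]

# Copy the selection so later changes do not target a view of df.
df_triplets = df[["Activity", "SMILES"]].copy()

# One record per molecule: (row label, cluster number within its set, set name).
records = [
    (molecule, num, name)
    for name, points in zip(["train", "test", "val"],
                            [train_points, test_points, val_points])
    for num, cluster in enumerate(points)
    for molecule in cluster
]
labels = pd.DataFrame(records, columns=["molecule", "cluster", "cluster set"])
labels = labels.set_index("molecule")

# Assumption from the question: every molecule number is unique across sets.
assert labels.index.is_unique

# Align on the index; rows that belong to no cluster get NaN in both columns.
df_triplets = df_triplets.join(labels)
print(df_triplets)
```

The uniqueness check is worth keeping on real data: if a molecule number appeared in more than one set, a loop would silently overwrite the earlier label, while the join would duplicate the row.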

Nested loop only gives the last loop in Python

0 answers: