Question

I have a DataFrame:

 sepallength sepalwidth petallength petalwidth        class   cluster
0         5.1        3.5         1.4        0.2  Iris-setosa  cluster1
1         4.9          3         1.4        0.2  Iris-setosa  cluster1
2         4.7        3.2         1.3        0.2  Iris-setosa  cluster1
3         4.6        3.1         1.5        0.2  Iris-setosa  cluster1
4           5        3.6         1.4        0.2  Iris-setosa  cluster1
5         5.4        3.9         1.7        0.4  Iris-setosa  cluster1
6         4.6        3.4         1.4        0.3  Iris-setosa  cluster1
7           5        3.4         1.5        0.2  Iris-setosa  cluster1
8         4.4        2.9         1.4        0.2  Iris-setosa  cluster1
9         4.9        3.1         1.5        0.1  Iris-setosa  cluster1

and a dictionary:

{'cluster2': 'Iris-virginica', 'cluster0': 'Iris-versicolor', 'cluster1': 'Iris-setosa'}

I need to add another column and fill it with the value from the dictionary where df['cluster'] == key.

I tried using np.where:

def countTruth(df):
    # dictionary mapping cluster to most frequent class

    clustersClass = df.groupby(['cluster'])['class'].agg(lambda x:x.value_counts().index[0]).to_dict()
    for eachKey in clustersClass:
        newv = clustersClass[eachKey]
        print df
        df['new'] = np.where(df['cluster']==eachKey , newv)

It crashes with an error saying that either both or neither of x and y should be given.

My end goal is to compute the true positives, true negatives, FP and FN from the cluster and class labels. This is a step toward that.

Answer 1

Call map and pass the dict:

In [326]:

d={'cluster2': 'Iris-virginica', 'cluster0': 'Iris-versicolor', 'cluster1': 'Iris-setosa'}
df['key'] = df['cluster'].map(d)
df
Out[326]:
   sepallength  sepalwidth  petallength  petalwidth        class   cluster  \
0          5.1         3.5          1.4         0.2  Iris-setosa  cluster1   
1          4.9         3.0          1.4         0.2  Iris-setosa  cluster1   
2          4.7         3.2          1.3         0.2  Iris-setosa  cluster1   
3          4.6         3.1          1.5         0.2  Iris-setosa  cluster1   
4          5.0         3.6          1.4         0.2  Iris-setosa  cluster1   
5          5.4         3.9          1.7         0.4  Iris-setosa  cluster1   
6          4.6         3.4          1.4         0.3  Iris-setosa  cluster1   
7          5.0         3.4          1.5         0.2  Iris-setosa  cluster1   
8          4.4         2.9          1.4         0.2  Iris-setosa  cluster1   
9          4.9         3.1          1.5         0.1  Iris-setosa  cluster1   

           key  
0  Iris-setosa  
1  Iris-setosa  
2  Iris-setosa  
3  Iris-setosa  
4  Iris-setosa  
5  Iris-setosa  
6  Iris-setosa  
7  Iris-setosa  
8  Iris-setosa  
9  Iris-setosa
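
For anyone who wants to run this outside the session above, here is a minimal self-contained sketch. The four-row frame is only an illustrative stand-in for the question's data, and the names clusters_class and is_setosa are made up for the example. It builds the dict with the same groupby as in the question, applies map, and also shows np.where with both the x and y arguments, since leaving them out appears to be what triggers the error in the question.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the question's frame (not the real data).
df = pd.DataFrame({
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    "cluster": ["cluster1", "cluster1", "cluster0", "cluster2"],
})

# Most frequent class per cluster, same groupby as in the question.
clusters_class = (
    df.groupby("cluster")["class"]
      .agg(lambda x: x.value_counts().index[0])
      .to_dict()
)

# map looks up every cluster label in the dict in one pass;
# labels that are not keys of the dict would become NaN.
df["new"] = df["cluster"].map(clusters_class)

# np.where used as a selector needs both the value for True (x)
# and the value for False (y).
df["is_setosa"] = np.where(df["cluster"] == "cluster1", "Iris-setosa", "other")

print(df)
```

Note that map replaces the whole loop: assigning df['new'] inside a loop over the keys would overwrite the column on every iteration, so only the last key would survive.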

Create new column and put conditional values in a pandas DataFrame
