I perform an operation on a masked subset of a pandas DataFrame:
pdxy = pd.DataFrame(data,columns=['X','Y','C','CC'])
mask = pdxy[:]['Y']==8
print("pdxy[mask]")
print(pdxy[mask][:10])
pdxy[mask]
       X  Y  C  CC
17    17  8  0   0
18    18  8  0   0
48    48  8  0   0
56    56  8  0   0
63    63  8  0   0
66    66  8  0   0
73    73  8  0   0
87    87  8  0   0
103  103  8  0   0
116  116  8  0   0
kmeans = KMeans(n_clusters=5,random_state=0).fit(pdxy[mask][['X','Y']])
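Selecting the two feature columns needs a list inside the brackets, [['X','Y']]. Writing ['X','Y'] passes the tuple ('X', 'Y') as a single column label, which on ordinary (non-MultiIndex) columns typically raises a KeyError. A minimal sketch with a small hypothetical frame (the values are made up for illustration only):

```python
import pandas as pd

# Hypothetical toy frame with the same column names as in the question
df = pd.DataFrame({"X": [17, 18, 48], "Y": [8, 8, 5], "C": 0, "CC": 0})
mask = df["Y"] == 8  # True for the first two rows

# A list of labels selects a two-column DataFrame, suitable for KMeans.fit
features = df[mask][["X", "Y"]]
print(features.shape)  # (2, 2)

# A bare tuple is looked up as ONE column label, which does not exist
tuple_key_failed = False
try:
    df[mask]["X", "Y"]
except (KeyError, pd.errors.InvalidIndexError):
    tuple_key_failed = True
print(tuple_key_failed)  # True
```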
Afterwards, I want to write the results (the cluster labels and the cluster centers) into columns of the pandas DataFrame:
pdxy.loc[mask]['C'] = np.array(kmeans.labels_)
pdxy.loc[mask]['CC'] = np.array(kmeans.cluster_centers_[kmeans.labels_])[:,0]
Unfortunately, the DataFrame is not modified; it is exactly as it was before the assignment:
print("pdxy[mask] labeled")
print(pdxy[mask][:10])
pdxy[mask] labeled
       X  Y  C  CC
17    17  8  0   0
18    18  8  0   0
48    48  8  0   0
56    56  8  0   0
63    63  8  0   0
66    66  8  0   0
73    73  8  0   0
87    87  8  0   0
103  103  8  0   0
116  116  8  0   0
What should I do?
Answer (score: 2)
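With .loc, the row selector and the column selector go inside one pair of brackets, separated by a comma: .loc[row, col], not .loc[row][col]. In the chained form, pdxy.loc[mask] first builds a new DataFrame holding copies of the selected rows, and ['C'] = ... then assigns to that temporary object, so pdxy itself never changes.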
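A minimal sketch contrasting the two forms, using a small hypothetical frame in place of the question's data. The warning suppression only keeps the output clean: pandas typically flags the chained form with a SettingWithCopyWarning, or with a ChainedAssignmentError warning when Copy-on-Write is active.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical toy frame with the question's column names
df = pd.DataFrame({"X": [17, 18, 48], "Y": [8, 8, 5], "C": 0, "CC": 0})
mask = df["Y"] == 8  # selects the first two rows

# Chained form: df.loc[mask] is a new object, so this write is lost
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.loc[mask]["C"] = np.array([3, 4])
chained_result = df["C"].tolist()

# Single .loc call: rows and column are selected together, df is updated
df.loc[mask, "C"] = np.array([3, 4])
single_call_result = df["C"].tolist()

print(chained_result)      # [0, 0, 0]
print(single_call_result)  # [3, 4, 0]
```

The array on the right must have one entry per selected row (here mask.sum() == 2). Its values are written to the masked rows in their existing order, which is also the order in which pdxy[mask] hands the rows to KMeans, so labels_ lines up row by row.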
Try this:
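import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# `data` is your own input (e.g. a list of rows or a 2-D array)
pdxy = pd.DataFrame(data, columns=['X', 'Y', 'C', 'CC'])
mask = pdxy['Y'] == 8
# Fit on the X and Y columns of the masked rows (note the list of column names)
kmeans = KMeans(n_clusters=5, random_state=0).fit(pdxy.loc[mask, ['X', 'Y']])
# Row mask and column label in ONE .loc call, so the assignment writes into pdxy
pdxy.loc[mask, 'C'] = np.array(kmeans.labels_)
pdxy.loc[mask, 'CC'] = np.array(kmeans.cluster_centers_[kmeans.labels_])[:, 0]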
print("pdxy[mask] labeled")
print(pdxy[mask][:10])
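The CC line relies on NumPy integer-array indexing: cluster_centers_ has one row per cluster, so indexing it with the per-row labels_ array yields each row's own center, and [:, 0] keeps that center's first coordinate (its X value, since the model was fit on X and Y). A sketch with hypothetical arrays standing in for the fitted model's attributes:

```python
import numpy as np

# Hypothetical stand-ins for kmeans.cluster_centers_ and kmeans.labels_
cluster_centers = np.array([[20.0, 8.0], [70.0, 8.0], [110.0, 8.0]])  # (n_clusters, 2)
labels = np.array([0, 0, 1, 2, 0])  # one cluster index per masked row

per_row_centers = cluster_centers[labels]  # shape (5, 2): center of each row's cluster
center_x = per_row_centers[:, 0]           # X coordinate of each row's center

print(center_x.tolist())  # [20.0, 20.0, 70.0, 110.0, 20.0]
```

Both labels_ and cluster_centers_ are already NumPy arrays, so the np.array(...) wrappers in the answer are harmless but not required.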