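I have data in a pandas DataFrame that looks like the following: - the name of a friend, in the 'Friend' column - the name of that friend's acquaintance - the distance between my friends and their acquaintances (in multiple places)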
Friend Acquaintance Distance Acq.Country
0 Lennon Martin 25 England
1 Lennon McCartney 10 England
2 Lennon McCartney 60 Scotland
3 Lennon Harrison 200 India
4 Lennon Starr 40 England
5 Lennon Ono 350 Japan
7 McCartney Eastman 110 United States
8 Harrison Lennon 200 England
8 Harrison McCartney 220 England
9 Harrison Starr 222 England
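I would like to reshape the data into a matrix of average distances, with friends as rows and acquaintances as columns. My current approach is essentially brute force. Any suggestions for more efficient code?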
import numpy as np

vectorR = data['Friend'].unique()        # unique friends (row labels)
vectorC = data['Acquaintance'].unique()  # unique acquaintances (column labels)
distance_matrix = np.zeros((len(vectorR), len(vectorC)))
for i in range(len(vectorR)):
    for j in range(len(vectorC)):
        # boolean mask selecting the rows for this friend/acquaintance pair
        inter = (data['Friend'] == vectorR[i]) & (data['Acquaintance'] == vectorC[j])
        # .mean() returns NaN for a pair with no rows, instead of dividing by zero
        distance_matrix[i, j] = data.loc[inter, 'Distance'].mean()
Answer 0 (score: 4)
This sounds like a job for pivot_table:
In [11]: df.pivot_table(index='Friend', columns='Acquaintance', values='Distance')
Out[11]:
Acquaintance Eastman Harrison Lennon Martin McCartney Ono Starr
Friend
Harrison NaN NaN 200 NaN 220 NaN 222
Lennon NaN 200 NaN 25 35 350 40
McCartney 110 NaN NaN NaN NaN NaN NaN
Note: aggfunc defaults to the mean (np.mean), which is what you want here, but you can set it to something else, e.g. 'sum'.
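For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question by hand (the variable names `data`, `mean_matrix`, `sum_matrix` and `via_groupby` are just illustrative), and also shows an equivalent groupby/unstack formulation, which is an alternative not mentioned in the answer above.

```python
import pandas as pd

# Sample data retyped from the question (the original index labels are not reproduced).
data = pd.DataFrame({
    "Friend": ["Lennon"] * 6 + ["McCartney"] + ["Harrison"] * 3,
    "Acquaintance": ["Martin", "McCartney", "McCartney", "Harrison", "Starr",
                     "Ono", "Eastman", "Lennon", "McCartney", "Starr"],
    "Distance": [25, 10, 60, 200, 40, 350, 110, 200, 220, 222],
    "Acq.Country": ["England", "England", "Scotland", "India", "England",
                    "Japan", "United States", "England", "England", "England"],
})

# Average distance per (Friend, Acquaintance) pair; pairs with no rows become NaN.
mean_matrix = data.pivot_table(index="Friend", columns="Acquaintance",
                               values="Distance")

# Same reshape, but summing the distances instead of averaging them.
sum_matrix = data.pivot_table(index="Friend", columns="Acquaintance",
                              values="Distance", aggfunc="sum")

# Alternative formulation: group by both keys, take the mean, then pivot the
# inner key (Acquaintance) into columns.
via_groupby = data.groupby(["Friend", "Acquaintance"])["Distance"].mean().unstack()

print(mean_matrix)
```

If a plain NumPy array is needed afterwards, as in the question's distance_matrix, `mean_matrix.to_numpy()` should give one, with rows and columns in sorted label order rather than order of first appearance.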