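I am trying to map values from one column onto a separate column. The calculate_distances function below measures the distance from each point to the nearest point in each Group. I also return the index value of each nearest point for identification.
This all works. But instead of the index values, I would like to map the corresponding ID values to the output inside the function.
If I don't map the ID values, both nearest_object columns show index values rather than the actual ID values.
My attempt at the mapping is commented out in the code below.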
from sklearn.neighbors import BallTree
import pandas as pd

df = pd.DataFrame({
    'Time' : [1,1,1,1,1,1,2,2,2,2,2,2],
    'ID' : ['A','B','C','X','U','V','A','B','C','X','U','V'],
    'Group' : ['Red','Red','Red','Grn','Grn','Grn','Red','Red','Red','Grn','Grn','Grn'],
    'X' : [2.0,3.0,4.0,2.0,2.0,1.0,1.0,6.0,4.0,2.0,5.0,3.0],
    'Y' : [3.0,1.0,0.0,0.0,2.0,1.0,2.0,0.0,1.0,1.0,0.0,0.0],
    })

def calculate_distances(df, group_column='Group'):
    '''
    Calculate distance and id to both red and green groups.
    '''
    # unq groups
    groups = df[group_column].unique()
    all_points = df[['X','Y']].values

    for group in groups:
        group_points = df[df[group_column] == group][['X','Y']]

        # calculate distance between points
        tree = BallTree(group_points, leaf_size=15, metric='minkowski')
        distance, index = tree.query(all_points, k=1)
        distances = distance[:,0]
        nearest_id = group_points.index[index[:,0]]

        distance_column_name = "distance_{}".format( group )
        df[ distance_column_name ] = distances

        distance_column_nearest_name = "nearest_object_{}".format( group )
        df[distance_column_nearest_name] = nearest_id

    # map ID values
    #df.iloc[:,-3] = df.iloc[:,-3].map(df.set_index('index')['ID'])
    #df.iloc[:,-1] = df.iloc[:,-1].map(df.set_index('index')['ID'])

    return df

df = df.groupby(['Time']).apply(calculate_distances).reset_index()
Out:
Time ID Group X Y distance_Red nearest_object_Red distance_Grn nearest_object_Grn
0 1 A Red 2.0 3.0 0.000000 0 1.000000 4
1 1 B Red 3.0 1.0 0.000000 1 1.414214 3
2 1 C Red 4.0 0.0 0.000000 2 2.000000 3
3 1 X Grn 2.0 0.0 1.414214 1 0.000000 3
4 1 U Grn 2.0 2.0 1.000000 0 0.000000 4
5 1 V Grn 1.0 1.0 2.000000 1 0.000000 5
6 2 A Red 1.0 2.0 0.000000 6 1.414214 9
7 2 B Red 6.0 0.0 0.000000 7 1.000000 10
8 2 C Red 4.0 1.0 0.000000 8 1.414214 10
9 2 X Grn 2.0 1.0 1.414214 6 0.000000 9
10 2 U Grn 5.0 0.0 1.000000 7 0.000000 10
11 2 V Grn 3.0 0.0 1.414214 8 0.000000 11
Expected output:
Time ID Group X Y distance_Red nearest_object_Red distance_Grn nearest_object_Grn
0 1 A Red 2.0 3.0 0.000000 A 1.000000 U
1 1 B Red 3.0 1.0 0.000000 B 1.414214 X
2 1 C Red 4.0 0.0 0.000000 C 2.000000 X
3 1 X Grn 2.0 0.0 1.414214 B 0.000000 X
4 1 U Grn 2.0 2.0 1.000000 A 0.000000 U
5 1 V Grn 1.0 1.0 2.000000 B 0.000000 V
6 2 A Red 1.0 2.0 0.000000 A 1.414214 X
7 2 B Red 6.0 0.0 0.000000 B 1.000000 U
8 2 C Red 4.0 1.0 0.000000 C 1.414214 U
9 2 X Grn 2.0 1.0 1.414214 A 0.000000 X
10 2 U Grn 5.0 0.0 1.000000 B 0.000000 U
11 2 V Grn 3.0 0.0 1.414214 C 0.000000 V
Answer 0 (score: 1)
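This is because the index is named Time while the DataFrame already has a column of the same name. When you call reset_index, pandas tries to turn the index into a regular column, which fails here because of the duplicate name. Try: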
df = df.groupby(['Time']).apply(calculate_distances).reset_index(drop=True)
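
To get ID values instead of index values in the nearest_object columns, one option is to look the IDs up positionally inside the function, since the positions returned by tree.query refer to rows of the group subset. The following is only a sketch under a few assumptions that are not in the original post: the id_column parameter and the names members, out and result are made up for illustration, and the per-Time split uses a plain loop with pd.concat rather than groupby(...).apply so that the reset_index question does not arise.

```python
import pandas as pd
from sklearn.neighbors import BallTree

df = pd.DataFrame({
    'Time':  [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'ID':    ['A', 'B', 'C', 'X', 'U', 'V', 'A', 'B', 'C', 'X', 'U', 'V'],
    'Group': ['Red', 'Red', 'Red', 'Grn', 'Grn', 'Grn',
              'Red', 'Red', 'Red', 'Grn', 'Grn', 'Grn'],
    'X':     [2.0, 3.0, 4.0, 2.0, 2.0, 1.0, 1.0, 6.0, 4.0, 2.0, 5.0, 3.0],
    'Y':     [3.0, 1.0, 0.0, 0.0, 2.0, 1.0, 2.0, 0.0, 1.0, 1.0, 0.0, 0.0],
})

def calculate_distances(frame, group_column='Group', id_column='ID'):
    """Add distance_<group> and nearest_object_<group> columns for one time step.

    id_column is a hypothetical parameter added for this sketch.
    """
    out = frame.copy()  # leave the caller's frame untouched
    all_points = frame[['X', 'Y']].to_numpy()

    for group in frame[group_column].unique():
        members = frame[frame[group_column] == group]

        # Nearest member of this group for every point in the frame
        tree = BallTree(members[['X', 'Y']].to_numpy(),
                        leaf_size=15, metric='minkowski')
        distance, index = tree.query(all_points, k=1)

        out[f'distance_{group}'] = distance[:, 0]
        # index holds row positions within `members`, so index the ID
        # values by position instead of returning index labels
        out[f'nearest_object_{group}'] = members[id_column].to_numpy()[index[:, 0]]

    return out

# Process each Time separately and stitch the pieces back together
result = pd.concat(
    [calculate_distances(frame) for _, frame in df.groupby('Time')]
).sort_index()

print(result)
```

Note that when two candidates are equally distant (for example, B at Time 1 is the same distance from X and U), which one is reported depends on how the tree breaks ties.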