Map values to a separate column - pandas

Time: 2021-01-08 02:27:45

Tags: python pandas

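I am trying to map values from one column to a separate column. The calculate_distances function below measures the distance from each point to the nearest point in each Group. I also return the index value of that nearest point so it can be identified.

This all works. However, instead of the index value, I would like to map the corresponding ID value to the output from inside the function.

If I don't map the ID values, both nearest_object columns show index values rather than the actual ID values.

I have commented out my attempt so that the output can be displayed.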

from sklearn.neighbors import BallTree
import pandas as pd

df = pd.DataFrame({              
    'Time' : [1,1,1,1,1,1,2,2,2,2,2,2],             
    'ID' : ['A','B','C','X','U','V','A','B','C','X','U','V'],      
    'Group' : ['Red','Red','Red','Grn','Grn','Grn','Red','Red','Red','Grn','Grn','Grn'],           
    'X' : [2.0,3.0,4.0,2.0,2.0,1.0,1.0,6.0,4.0,2.0,5.0,3.0],
    'Y' : [3.0,1.0,0.0,0.0,2.0,1.0,2.0,0.0,1.0,1.0,0.0,0.0],           
    })

def calculate_distances(df, group_column='Group'):    

    '''
    Calculate distance and id to both red and green groups.
    '''
    # unq groups
    groups = df[group_column].unique()

    all_points = df[['X','Y']].values

    for group in groups:
        group_points = df[df[group_column] == group][['X','Y']]
    
        # calculate distance between points
        tree = BallTree(group_points, leaf_size=15, metric='minkowski')

        distance, index = tree.query(all_points, k=1)
        distances = distance[:,0]
        nearest_id = group_points.index[index[:,0]]
                    
        distance_column_name = "distance_{}".format( group )
        df[ distance_column_name ] = distances
    
        distance_column_nearest_name = "nearest_object_{}".format( group )
        df[distance_column_nearest_name] = nearest_id   

    # map ID values
    #df.iloc[:,-3] = df.iloc[:,-3].map(df.set_index('index')['ID']) 
    #df.iloc[:,-1] = df.iloc[:,-1].map(df.set_index('index')['ID'])       

    return df

df = df.groupby(['Time']).apply(calculate_distances).reset_index()

Out:

    Time ID Group    X    Y  distance_Red  nearest_object_Red  distance_Grn  nearest_object_Grn
0      1  A   Red  2.0  3.0      0.000000                   0      1.000000                   4
1      1  B   Red  3.0  1.0      0.000000                   1      1.414214                   3
2      1  C   Red  4.0  0.0      0.000000                   2      2.000000                   3
3      1  X   Grn  2.0  0.0      1.414214                   1      0.000000                   3
4      1  U   Grn  2.0  2.0      1.000000                   0      0.000000                   4
5      1  V   Grn  1.0  1.0      2.000000                   1      0.000000                   5
6      2  A   Red  1.0  2.0      0.000000                   6      1.414214                   9
7      2  B   Red  6.0  0.0      0.000000                   7      1.000000                  10
8      2  C   Red  4.0  1.0      0.000000                   8      1.414214                  10
9      2  X   Grn  2.0  1.0      1.414214                   6      0.000000                   9
10     2  U   Grn  5.0  0.0      1.000000                   7      0.000000                  10
11     2  V   Grn  3.0  0.0      1.414214                   8      0.000000                  11

Expected output:

    Time ID Group    X    Y  distance_Red nearest_object_Red  distance_Grn nearest_object_Grn
0      1  A   Red  2.0  3.0      0.000000                  A      1.000000                  U
1      1  B   Red  3.0  1.0      0.000000                  B      1.414214                  X
2      1  C   Red  4.0  0.0      0.000000                  C      2.000000                  X
3      1  X   Grn  2.0  0.0      1.414214                  B      0.000000                  X
4      1  U   Grn  2.0  2.0      1.000000                  A      0.000000                  U
5      1  V   Grn  1.0  1.0      2.000000                  B      0.000000                  V
6      2  A   Red  1.0  2.0      0.000000                  A      1.414214                  X
7      2  B   Red  6.0  0.0      0.000000                  B      1.000000                  U
8      2  C   Red  4.0  1.0      0.000000                  C      1.414214                  U
9      2  X   Grn  2.0  1.0      1.414214                  A      0.000000                  X
10     2  U   Grn  5.0  0.0      1.000000                  B      0.000000                  U
11     2  V   Grn  3.0  0.0      1.414214                  C      0.000000                  V
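A minimal sketch of one way the index values could be turned into IDs, assuming the nearest_object columns hold labels of the DataFrame's own index, as they appear to in the output above. The small frame below is a hand-copied stand-in for the Time == 1 rows and is not produced by calculate_distances.

```python
import pandas as pd

# Stand-in for the Time == 1 rows of the output above (values copied by hand).
df = pd.DataFrame({
    'ID': ['A', 'B', 'C', 'X', 'U', 'V'],
    'nearest_object_Red': [0, 1, 2, 1, 0, 1],
    'nearest_object_Grn': [4, 3, 3, 3, 4, 5],
})

# df['ID'] is indexed by the same labels that nearest_object_* stores,
# so it can act as a label -> ID lookup table for Series.map.
id_lookup = df['ID']

for col in ['nearest_object_Red', 'nearest_object_Grn']:
    df[col] = df[col].map(id_lookup)

print(df)
```

Inside calculate_distances, the same lookup could probably be applied before the assignment, for example with df.loc[nearest_id, 'ID'].to_numpy(), so that the column is filled with IDs from the start.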

1 answer:

Answer 0 (score: 1)

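This happens because the index is named Time and the DataFrame already has a column with the same name. When you call reset_index, pandas tries to turn the index into a regular column, which fails here because of the duplicate name. Try: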

df = df.groupby(['Time']).apply(calculate_distances).reset_index(drop=True)
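The name collision can be reproduced in isolation. The sketch below builds a hypothetical frame whose index is named Time and which also has a Time column; it stands in for the groupby/apply result, whose exact index depends on the pandas version.

```python
import pandas as pd

# Hypothetical frame: the index name collides with an existing column name.
df = pd.DataFrame({'Time': [1, 1, 2], 'X': [2.0, 3.0, 1.0]})
df.index = pd.Index([1, 1, 2], name='Time')

try:
    df.reset_index()  # tries to insert the index as a second 'Time' column
    failed = False
except ValueError as exc:
    failed = True
    print('reset_index() failed:', exc)

# drop=True discards the old index instead of inserting it as a column.
flat = df.reset_index(drop=True)
print(flat)
```

With drop=True the old index is thrown away and replaced by a fresh 0..n-1 index, so no duplicate column is created.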