Merge pandas dataframes with respect to a function output

Time: 2020-08-29 12:44:47

Tags: python pandas dataframe

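Is there a convenient way to merge two dataframes based on the distance between their rows? In the following example, I want each row of df1 to get the color of the nearest row of df2. The distance should be computed as ((x1-x2)**0.5+(y1-y2)**0.5)**0.5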

import pandas as pd

df1 = pd.DataFrame({'x': [50,16,72,61,95,47],'y': [14,22,11,45,58,56],'size':[1,4,3,7,6,5]})
df2 = pd.DataFrame({'x': [10,21,64,31,25,55],'y': [54,76,68,24,34,19],'color':['red','green','blue','white','brown','black']})

2 answers:

Answer 0: (score: 7)

# Compare one row of df1 with every row of df2.
# Note the use of abs(): the square root of a negative number is not real,
# so without it the computation would produce NaN.
def compare(x, y):
    # Distance from the point (x, y) to each row of df2, kept in a local
    # Series so that df2 itself is not modified
    distance = (abs(x - df2['x'])**0.5 + abs(y - df2['y'])**0.5)**0.5
    return df2.loc[distance.idxmin(), 'color']

df1['color'] = df1.apply(lambda row: compare(row['x'], row['y']), axis=1)
print(df1)

    x   y  size  color
0  50  14     1  black
1  16  22     4  white
2  72  11     3  black
3  61  45     7   blue
4  95  58     6   blue
5  47  56     5    red
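
To illustrate the comment about abs() in this answer, here is a minimal sketch. The names diffs, raw, safe and inner are made up for the illustration and do not come from the thread. It shows a fractional power of a negative float becoming NaN in a pandas Series, and that the outer **0.5 in the distance formula does not change which row is nearest, because the square root is monotonic on non-negative values.

```python
import pandas as pd

# Hypothetical signed coordinate differences, e.g. x1 - x2
diffs = pd.Series([-5.0, 4.0])

raw = diffs ** 0.5          # negative base with a fractional power -> NaN
safe = diffs.abs() ** 0.5   # taking abs() first keeps every value real

print(raw.isna().tolist())     # [True, False]
print(safe.round(3).tolist())  # [2.236, 2.0]

# The outer square root preserves ordering, so the nearest row is the same
# with or without it.
inner = pd.Series([12.65, 4.47, 9.47])
print(inner.idxmin() == (inner ** 0.5).idxmin())  # True
```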

Answer 1: (score: 6)

Something using numpy broadcasting:

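import numpy as np

# The outer **0.5 is omitted: it is monotonic, so it does not change argmin
df1['color'] = df2.color.iloc[np.argmin(np.sum(np.abs(df1[['x','y']].values - df2[['x','y']].values[:, None])**0.5, 2), 0)].values
df1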
Out[79]: 
    x   y  size  color
0  50  14     1  black
1  16  22     4  white
2  72  11     3  black
3  61  45     7   blue
4  95  58     6   blue
5  47  56     5    red
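
The broadcasting one-liner can be hard to follow, so here is a step-by-step sketch of the same idea with the intermediate array shapes spelled out. The helper name nearest_color is hypothetical and is not part of pandas or numpy. It assumes both frames have numeric x and y columns and that the second frame has a color column, as in the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'x': [50, 16, 72, 61, 95, 47],
                    'y': [14, 22, 11, 45, 58, 56],
                    'size': [1, 4, 3, 7, 6, 5]})
df2 = pd.DataFrame({'x': [10, 21, 64, 31, 25, 55],
                    'y': [54, 76, 68, 24, 34, 19],
                    'color': ['red', 'green', 'blue', 'white', 'brown', 'black']})

def nearest_color(left, right):
    """Hypothetical helper: color of the nearest `right` row for each `left` row."""
    a = left[['x', 'y']].to_numpy(dtype=float)      # shape (n, 2)
    b = right[['x', 'y']].to_numpy(dtype=float)     # shape (m, 2)
    # Broadcast to every (left row, right row) pair
    diff = np.abs(a[:, None, :] - b[None, :, :])    # shape (n, m, 2)
    # Distance from the question: (|dx|**0.5 + |dy|**0.5)**0.5
    dist = np.sqrt(np.sqrt(diff).sum(axis=2))       # shape (n, m)
    nearest = dist.argmin(axis=1)                   # position in `right`, shape (n,)
    return right['color'].to_numpy()[nearest]

df1['color'] = nearest_color(df1, df2)
print(df1['color'].tolist())
# ['black', 'white', 'black', 'blue', 'blue', 'red']
```

The intermediate array holds n * m * 2 floats, so for large frames this trades memory for speed compared with the row-by-row apply approach.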