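Is there a convenient way to merge two data frames based on the distance between their rows? In the example below, I want each df1 row to take the color of the nearest df2 row. The distance should be computed as ((x1-x2)**0.5+(y1-y2)**0.5)**0.5.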
import pandas as pd
df1 = pd.DataFrame({'x': [50,16,72,61,95,47],'y': [14,22,11,45,58,56],'size':[1,4,3,7,6,5]})
df2 = pd.DataFrame({'x': [10,21,64,31,25,55],'y': [54,76,68,24,34,19],'color':['red','green','blue','white','brown','black']})
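Written out with absolute differences (the answers below take absolute values so the inner square roots stay real), the distance between a df1 point (x_1, y_1) and a df2 point (x_2, y_2) is given below, with a worked value for df1 row 0, (50, 14), against df2's black point, (55, 19):

```latex
d = \left( \lvert x_1 - x_2 \rvert^{1/2} + \lvert y_1 - y_2 \rvert^{1/2} \right)^{1/2}

d\big((50,14),(55,19)\big) = \left( \sqrt{5} + \sqrt{5} \right)^{1/2} = \left( 2\sqrt{5} \right)^{1/2} \approx 2.115
```

Because t ↦ t^{1/2} is increasing, the outer square root does not change which df2 row is nearest, so it can be skipped when only the argmin is needed.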
Answer 0 (score: 7)
# compare one row of df1 with every row of df2 and return the color of the nearest df2 row
# note the use of abs() here: the square root of a negative number would be complex,
# so the result of the computation would be NaN; abs() avoids that
def compare(x, y):
    # keep the distances in a local Series so df2 itself is not modified
    distance = (abs(x - df2['x'])**0.5 + abs(y - df2['y'])**0.5)**0.5
    return df2.loc[distance.idxmin(), 'color']

df1['color'] = df1.apply(lambda row: compare(row['x'], row['y']), axis=1)
print(df1)
    x   y  size  color
0  50  14     1  black
1  16  22     4  white
2  72  11     3  black
3  61  45     7   blue
4  95  58     6   blue
5  47  56     5    red
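Since the question asks for a merge, here is a sketch of the same nearest-row lookup done as an explicit pandas cross merge instead of apply. It assumes pandas 1.2 or later (where merge(how='cross') exists); the names pairs and nearest are illustrative, not from the answer above. It materialises len(df1) * len(df2) rows, so it suits small frames like these.

```python
import pandas as pd

df1 = pd.DataFrame({'x': [50, 16, 72, 61, 95, 47], 'y': [14, 22, 11, 45, 58, 56],
                    'size': [1, 4, 3, 7, 6, 5]})
df2 = pd.DataFrame({'x': [10, 21, 64, 31, 25, 55], 'y': [54, 76, 68, 24, 34, 19],
                    'color': ['red', 'green', 'blue', 'white', 'brown', 'black']})

# Pair every df1 row with every df2 row (assumes pandas >= 1.2 for how='cross').
# reset_index keeps df1's original row label in an 'index' column.
pairs = df1.reset_index().merge(df2, how='cross', suffixes=('_1', '_2'))

# Distance from the question, using abs() so the inner square roots stay real.
pairs['distance'] = ((pairs['x_1'] - pairs['x_2']).abs() ** 0.5
                     + (pairs['y_1'] - pairs['y_2']).abs() ** 0.5) ** 0.5

# For each original df1 row, keep the pair with the smallest distance.
nearest = pairs.loc[pairs.groupby('index')['distance'].idxmin()]

# Assignment aligns on df1's row labels via the 'index' column.
df1['color'] = nearest.set_index('index')['color']
print(df1)
```

The groupby/idxmin step plays the role of compare() above: it picks, per df1 row, the label of the closest pair.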
Answer 1 (score: 6)
Using numpy broadcasting:
import numpy as np
df1['color']=df2.color.iloc[np.argmin(np.sum(np.abs(df1[['x','y']].values-df2[['x','y']].values[:,None])**0.5,2),0)].values
df1
Out[79]:
    x   y  size  color
0  50  14     1  black
1  16  22     4  white
2  72  11     3  black
3  61  45     7   blue
4  95  58     6   blue
5  47  56     5    red
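The one-liner above packs several steps together. This sketch unpacks the same broadcasting computation with the intermediate shapes spelled out; the variable names a, b, diff, score and nearest are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'x': [50, 16, 72, 61, 95, 47], 'y': [14, 22, 11, 45, 58, 56],
                    'size': [1, 4, 3, 7, 6, 5]})
df2 = pd.DataFrame({'x': [10, 21, 64, 31, 25, 55], 'y': [54, 76, 68, 24, 34, 19],
                    'color': ['red', 'green', 'blue', 'white', 'brown', 'black']})

a = df1[['x', 'y']].to_numpy()   # shape (len(df1), 2): points to label
b = df2[['x', 'y']].to_numpy()   # shape (len(df2), 2): labelled points

# b[:, None] has shape (len(df2), 1, 2); subtracting it from a broadcasts to
# (len(df2), len(df1), 2), where entry [j, i] is df1 row i minus df2 row j.
diff = np.abs(a - b[:, None])

# Sum the square roots over the x/y axis -> shape (len(df2), len(df1)).
# The question's outer **0.5 is omitted: it is increasing, so argmin is unchanged.
score = np.sum(diff ** 0.5, axis=2)

# For each df1 row (column i), the position j of the closest df2 row.
nearest = np.argmin(score, axis=0)
df1['color'] = df2['color'].to_numpy()[nearest]
print(df1)
```

Dropping the outer square root is safe only because the result is used for argmin; if the actual distances are needed, apply ** 0.5 to score.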