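I have a dataset of points with a variable number of dimensions. Each row holds the coordinates of the original point followed by the coordinates of its corresponding center: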
point | c_1 | c_2 | ... | c_n | center_1 | center_2 | ... | center_n
--------------------------------------------------------------------
p_1 | 0.1 | 0.3 | ... | 0.5 | 1.2 | 1.1 | ... | 0.7
p_2 | 1.0 | 1.5 | ... | 1.7 | 3.1 | 2.0 | ... | 1.3
p_3 | 0.5 | 0.8 | ... | 1.0 | 2.0 | 1.2 | ... | 3.8
... | ... | ... | ... | ... | ... | ... | ... | ...
Now I need to compute the Euclidean distance from each point to its center.
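Written out, the distance for a point p with coordinates c_1 … c_n and center coordinates center_1 … center_n is the usual Euclidean norm of the difference vector (n is the number of dimensions):

```latex
d(p) = \sqrt{\sum_{i=1}^{n} \left(c_i - \mathrm{center}_i\right)^2}
```

For p_2 in the 3-D example, this gives sqrt(3 · (3.0 − 1.0)²) = sqrt(12) ≈ 3.464.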
For example, a simplified 3-D dataset with three points would look like this:
point | c_1 | c_2 | c_3 | center_1 | center_2 | center_3 | distance
-------------------------------------------------------------------
p_1 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.732
p_2 | 1.0 | 1.0 | 1.0 | 3.0 | 3.0 | 3.0 | 3.464
p_3 | 0.5 | 0.5 | 0.5 | 2.0 | 2.0 | 2.0 | 2.598
I can do this for a single dimension as follows:
import pandas as pd
import numpy as np
points = pd.DataFrame({
"point": ("p_1", "p_2", "p_3"),
"c_1": (0.0, 1.0, 0.5),
"c_2": (0.0, 1.0, 0.5),
"c_3": (0.0, 1.0, 0.5),
"center_1": (1.0, 3.0, 2.0),
"center_2": (1.0, 3.0, 2.0),
"center_3": (1.0, 3.0, 2.0)
})
# Only the first coordinate (c_1 vs. center_1) is used here; the other dimensions are ignored
points['distance'] = points.apply(lambda row:
    np.linalg.norm(row['c_1'] - row['center_1']), axis=1)
But how can I do this more cleanly for a variable number of dimensions, given as a range (e.g. 10)?
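One possible approach is sketched below. It is a minimal sketch that assumes the c_i / center_i column naming from the question; the helper name distance_to_center and its n_dims parameter are hypothetical. It builds the column lists for n dimensions, subtracts the two blocks as plain NumPy arrays and takes the norm along each row, so no row-wise apply is needed.

```python
import numpy as np
import pandas as pd


def distance_to_center(df: pd.DataFrame, n_dims: int) -> pd.Series:
    """Hypothetical helper: row-wise Euclidean distance between c_1..c_n and center_1..center_n."""
    # Column names for the requested number of dimensions
    # (assumes the c_i / center_i naming scheme from the question).
    coord_cols = [f"c_{i}" for i in range(1, n_dims + 1)]
    center_cols = [f"center_{i}" for i in range(1, n_dims + 1)]
    # .to_numpy() drops the column labels, so the subtraction is purely positional.
    diff = df[coord_cols].to_numpy() - df[center_cols].to_numpy()
    # Norm along axis=1 yields one distance per row without a Python-level loop.
    return pd.Series(np.linalg.norm(diff, axis=1), index=df.index)


points = pd.DataFrame({
    "point": ("p_1", "p_2", "p_3"),
    "c_1": (0.0, 1.0, 0.5),
    "c_2": (0.0, 1.0, 0.5),
    "c_3": (0.0, 1.0, 0.5),
    "center_1": (1.0, 3.0, 2.0),
    "center_2": (1.0, 3.0, 2.0),
    "center_3": (1.0, 3.0, 2.0),
})
points["distance"] = distance_to_center(points, n_dims=3)
print(points["distance"].round(3).tolist())  # [1.732, 3.464, 2.598]
```

For a 10-dimensional dataset with columns c_1 … c_10 and center_1 … center_10, the call would be distance_to_center(points, n_dims=10).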
Answer (score: 1)
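IIUC, you can use scipy's cdist and take the diagonal:
import numpy as np
from scipy.spatial import distance
# cdist computes the distance from every point to every center; the diagonal holds each point's distance to its own center
a = distance.cdist(points[['c_1', 'c_2', 'c_3']].values, points[['center_1', 'center_2', 'center_3']].values)
a[np.arange(len(a)), np.arange(len(a))]
Out[249]: array([1.73205081, 3.46410162, 2.59807621])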
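The diagonal trick works because entry (i, j) of the cdist result is the distance from point i to center j, so entry (i, i) is the point's distance to its own center. The cost is that the full n × n matrix is computed and then mostly discarded. The NumPy-only sketch below (scipy is replaced by a broadcasting stand-in purely for illustration) suggests that a direct row-wise norm gives the same values while touching only the n matching pairs.

```python
import numpy as np

# Coordinates and centers from the 3-D example, one row per point.
P = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.5, 0.5]])
C = np.array([[1.0, 1.0, 1.0], [3.0, 3.0, 3.0], [2.0, 2.0, 2.0]])

# Full pairwise matrix, like cdist builds: D[i, j] = ||P[i] - C[j]||.
# Broadcasting (n, 1, d) - (1, n, d) -> (n, n, d), so memory grows as n**2 * d.
D = np.sqrt(((P[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1))
diag = np.diag(D)  # equivalent to D[np.arange(n), np.arange(n)]

# Row-wise distance: only the n matching pairs, memory grows as n * d.
rowwise = np.linalg.norm(P - C, axis=1)

print(np.allclose(diag, rowwise))  # True
```

For a few thousand rows the difference is small, but for large datasets the row-wise form avoids allocating a quadratic-size matrix.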