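I have a DataFrame called origA:
X, Y
10, 20
11, 2
9, 35
8, 7
and another one called calcB:
Xc, Yc
1, 7
9, 22
For each (X, Y) pair in origA, I want to check whether there is an (Xc, Yc) pair in calcB whose Euclidean distance to it is less than delta. If so, I want to put True in the corresponding row of a new column Detected in origA.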
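Read the way both answers interpret it, the condition for row i of origA (rows of calcB indexed by j) is:

$$
\mathit{Detected}_i = \exists\, j:\ \sqrt{(X_i - \mathit{Xc}_j)^2 + (Y_i - \mathit{Yc}_j)^2} < \delta
$$

For example, origA row 0, (10, 20), is $\sqrt{(10-9)^2 + (20-22)^2} = \sqrt{5} \approx 2.24$ away from the calcB point (9, 22), so with delta = 5 (the value both answers use) that row should get True.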
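Answer 0 (score: 1)
You can use cdist from scipy:
import scipy.spatial.distance
delta = 5
# dfa is origA (X, Y) and dfb is calcB (Xc, Yc); ary[i, j] is the distance from origA row i to calcB row j
ary = scipy.spatial.distance.cdist(dfa, dfb, metric='euclidean')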
ary
Out[189]:
array([[15.8113883 , 2.23606798],
[11.18033989, 20.09975124],
[29.12043956, 13. ],
[ 7. , 15.03329638]])
dfa['detected'] = (ary < delta).any(axis=1)
dfa
Out[191]:
X Y detected
0 10 20 True
1 11 2 False
2 9 35 False
3 8 7 False
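If you would rather not depend on scipy, the same brute-force computation can be sketched with plain NumPy broadcasting. This is a minimal sketch that rebuilds the example frames from the question; like cdist, it materialises the full len(origA) × len(calcB) distance matrix, so it has the same scaling limits for large inputs.

```python
import numpy as np
import pandas as pd

# Example data from the question
origA = pd.DataFrame({'X': [10, 11, 9, 8], 'Y': [20, 2, 35, 7]})
calcB = pd.DataFrame({'Xc': [1, 9], 'Yc': [7, 22]})
delta = 5

a = origA[['X', 'Y']].to_numpy(dtype=float)    # shape (4, 2)
b = calcB[['Xc', 'Yc']].to_numpy(dtype=float)  # shape (2, 2)

# Broadcast to a (4, 2, 2) array of coordinate differences, then take the
# Euclidean norm over the last axis to get a (4, 2) distance matrix (like cdist).
dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

# A row is detected if any calcB point lies strictly closer than delta.
origA['detected'] = (dist < delta).any(axis=1)
print(origA)
```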
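Answer 1 (score: 1)
@Wen-Ben's solution works well for small datasets. However, it computes the distance between every pair of points (a len(origA) × len(calcB) matrix), so with many points you will quickly run into performance and memory problems. There are spatial-indexing algorithms that avoid computing most of those distances; one of them is the BallTree provided by scikit-learn: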
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree
# Prepare the data and the search radius:
origA = pd.DataFrame()
origA['X'] = [10, 11, 9, 8]
origA['Y'] = [20, 2, 35, 7]
calcB = pd.DataFrame()
calcB['Xc'] = [1, 9]
calcB['Yc'] = [7, 22]
delta = 5
# Stack the coordinates together:
pointsA = np.column_stack([origA.X, origA.Y])
pointsB = np.column_stack([calcB.Xc, calcB.Yc])
# Create the Ball Tree and search for close points:
tree = BallTree(pointsB)
detected = tree.query_radius(pointsA, r=delta, count_only=True)
# Add results as additional column:
origA['Detected'] = detected.astype(bool)
Output:
X Y Detected
0 10 20 True
1 11 2 False
2 9 35 False
3 8 7 False
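A related option, sketched here as an assumption rather than taken from the answer above, is scipy's KDTree with a nearest-neighbour query: for each origA point, find the single closest calcB point and compare that distance with delta. The strict `<` matches the "less than delta" wording of the question, and `nearest_idx` (a name introduced only for this sketch) also tells you which calcB row was closest.

```python
import numpy as np
import pandas as pd
from scipy.spatial import KDTree

# Example data from the question
origA = pd.DataFrame({'X': [10, 11, 9, 8], 'Y': [20, 2, 35, 7]})
calcB = pd.DataFrame({'Xc': [1, 9], 'Yc': [7, 22]})
delta = 5

pointsA = origA[['X', 'Y']].to_numpy(dtype=float)
pointsB = calcB[['Xc', 'Yc']].to_numpy(dtype=float)

# Build the tree on calcB and query the single nearest calcB point for every
# origA point; with k=1 the returned distance and index arrays are 1-D.
tree = KDTree(pointsB)
nearest_dist, nearest_idx = tree.query(pointsA, k=1)

# Strict comparison, matching "distance less than delta" in the question.
origA['Detected'] = nearest_dist < delta
print(origA)
```

Because only the nearest neighbour matters for a yes/no answer, this avoids counting every point inside the radius; whether it beats `query_radius` in practice depends on the data.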