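Based on the code from "calculating average distance of nearest neighbours in pandas dataframe", how can I adapt it so that the second and third nearest neighbours are returned in new columns?
(Or create an adjustable parameter that defines how many neighbours to return.)
Example code: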
import numpy as np
from sklearn.neighbors import NearestNeighbors
import pandas as pd

def nn(x):
    # n_neighbors=2: each point itself plus its single nearest neighbour
    nbrs = NearestNeighbors(
        n_neighbors=2,
        algorithm='auto',
        metric='euclidean'
    ).fit(x)
    distances, indices = nbrs.kneighbors(x)
    return distances, indices

time = [0, 0, 0, 1, 1, 2, 2]
x = [216, 218, 217, 280, 290, 130, 132]
y = [13, 12, 12, 110, 109, 3, 56]
car = [1, 2, 3, 1, 3, 4, 5]
df = pd.DataFrame({'time': time, 'x': x, 'y': y, 'car': car})

# This has the index of the nearest neighbor in the group, as well as the distance
# (drop(columns=...) and to_numpy() replace the removed drop('car', 1) and as_matrix())
nns = df.drop(columns='car').groupby('time').apply(lambda x: nn(x.to_numpy()))

groups = df.groupby('time')
nn_rows = []
# Iterate over the actual time keys instead of assuming they are 0, 1, 2, ...
for t, nn_set in nns.items():
    group = groups.get_group(t)
    for j, tup in enumerate(zip(nn_set[0], nn_set[1])):
        nn_rows.append({'time': t,
                        'car': group.iloc[j]['car'],
                        'nearest_neighbour': group.iloc[tup[1][1]]['car'],
                        'euclidean_distance': tup[0][1]})

nn_df = pd.DataFrame(nn_rows).set_index('time')
Resulting dataframe:
>>> nn_df
time  car  euclidean_distance  nearest_neighbour
0       1            1.414214                  3
0       2            1.000000                  3
0       3            1.000000                  2
1       1           10.049876                  3
1       3           10.049876                  1
2       4           53.037722                  5
2       5           53.037722                  4
How can I get the output for nearest neighbours 2, 3 and N and insert it into new columns?
Answer 0 (score: 1)
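See the scikit-learn documentation for NearestNeighbors.
I think the n_neighbors parameter solves your problem. It specifies the number of nearest neighbours for which indices and distances are returned.
When the goal is to find the single nearest neighbour other than the point itself, the value normally used is 2. That is because, when you query the same points the model was fitted on, each point's closest neighbour is always itself, at distance 0.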
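To make this concrete, here is a small sketch using the three cars at time 0 from the question. It shows that column 0 of the result is the point itself, and it also uses the fact that `kneighbors()` called without a query array does not count a point as its own neighbour. The variable names are only illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Positions of the three cars at time 0 in the question's data.
points = np.array([[216, 13], [218, 12], [217, 12]])

nbrs = NearestNeighbors(n_neighbors=2, metric='euclidean').fit(points)

# Querying with the fitted points themselves:
# column 0 is every point matched with itself at distance 0,
# column 1 is the nearest *other* point.
distances, indices = nbrs.kneighbors(points)
print(distances[:, 0])
print(indices[:, 1])

# Calling kneighbors() with no query array searches the fitted points
# and leaves each point out of its own neighbour list, so n_neighbors=1
# already gives the nearest other point.
dist_others, idx_others = nbrs.kneighbors(n_neighbors=1)
print(idx_others.ravel())
```

Either form works; the answer here continues with the first one, where the point itself occupies column 0.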
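To find the second and third nearest neighbours, set n_neighbors to 4. With n_neighbors = N, the result contains the point itself followed by its N-1 nearest neighbours. Note that n_neighbors cannot exceed the number of points being searched (scikit-learn raises a ValueError otherwise), so a group with fewer than 4 rows, as in the sample data, needs a smaller value.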
# Argument
n_neighbors = 4
# Indices (one row per query point)
[point_itself, neighbor_1, neighbor_2, neighbor_3]
# Distances (one row per query point)
[0, distance_1, distance_2, distance_3]
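Putting this together for the question's data, the following is a minimal sketch of an adjustable version, not a definitive implementation. The function name `k_nearest_per_group` and the output column names (`neighbour_1`, `distance_1`, and so on) are made up for illustration. It assumes no two cars share exactly the same position within a time step, so that column 0 of the result is always the point itself. Because n_neighbors cannot exceed the group size, small groups get NaN for the ranks they cannot fill.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors


def k_nearest_per_group(df, k=3, group_col='time', id_col='car',
                        coord_cols=('x', 'y')):
    """Return one row per car with its k nearest neighbours in the same group."""
    pieces = []
    for key, group in df.groupby(group_col):
        coords = group[list(coord_cols)].to_numpy()
        ids = group[id_col].to_numpy()
        # k + 1 because each point comes back as its own nearest neighbour;
        # capped at the group size, which n_neighbors may not exceed.
        n = min(k + 1, len(group))
        nbrs = NearestNeighbors(n_neighbors=n, metric='euclidean').fit(coords)
        distances, indices = nbrs.kneighbors(coords)

        out = pd.DataFrame({group_col: key, id_col: ids})
        for rank in range(1, k + 1):
            if rank < n:
                # Map row positions inside the group back to car ids.
                out[f'neighbour_{rank}'] = ids[indices[:, rank]]
                out[f'distance_{rank}'] = distances[:, rank]
            else:
                # The group is too small to have a neighbour of this rank.
                out[f'neighbour_{rank}'] = np.nan
                out[f'distance_{rank}'] = np.nan
        pieces.append(out)
    return pd.concat(pieces, ignore_index=True).set_index(group_col)


df = pd.DataFrame({
    'time': [0, 0, 0, 1, 1, 2, 2],
    'x': [216, 218, 217, 280, 290, 130, 132],
    'y': [13, 12, 12, 110, 109, 3, 56],
    'car': [1, 2, 3, 1, 3, 4, 5],
})

result = k_nearest_per_group(df, k=3)
print(result)
```

With the sample data, only the time 0 group has three cars, so it is the only one with a second neighbour, and no group is large enough to have a third. Changing `k` changes how many neighbour and distance columns are produced.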