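I am trying to extract the second and third K nearest neighbours using this code. When they exist, I can get them. When they don't, I get an error like: IndexError: index 3 is out of bounds for axis 0 with size 3.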
import numpy as np
from sklearn.neighbors import NearestNeighbors
import pandas as pd
def nn(x):
    nbrs = NearestNeighbors(
        n_neighbors=3,
        algorithm='auto',
        metric='euclidean'
    ).fit(x)
    distances, indices = nbrs.kneighbors(x)
    return distances, indices
df = pd.DataFrame({'time': updated_df['upd_time_code'], 'x': updated_df['x'], 'y': updated_df['y'], 'id': updated_df['id']})
#This has the index of the nearest neighbor in the group, as well as the distance
nns = df.drop('id', 1).groupby('time').apply(lambda x: nn(x.as_matrix()))
groups = df.groupby('time')
nn_rows = []
for i, nn_set in enumerate(nns):
    group = groups.get_group(i)
    print("processing group at: ", group.time)
    for j, tup in enumerate(zip(nn_set[0], nn_set[1])):
        nn_rows.append({'time': i,
                        'id': group.iloc[j]['id'],
                        'nearest_neighbour1': group.iloc[tup[1][1]]['id'],
                        'nearest_neighbour2': group.iloc[tup[1][2]]['id'],
                        'nearest_neighbour3': group.iloc[tup[1][3]]['id'],
                        'euclidean_distance1': tup[0][1],
                        'euclidean_distance2': tup[0][2],
                        'euclidean_distance3': tup[0][2]})
nn_df = pd.DataFrame(nn_rows).set_index('time')
nn_df
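How do I handle the fact that sometimes the neighbours exist and sometimes they don't? Can this code be adapted to ignore the missing ones?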
Answer 0: (score: 1)
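You are accessing a fourth nearest neighbour. With n_neighbors=3, kneighbors returns exactly three entries per row, at positions 0 to 2, and position 0 is normally the query point itself, so tup[1][3] does not exist.
This is a classic array indexing error in your code, and it needs to be fixed there.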
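One possible way to handle groups that are too small, as a minimal sketch rather than a definitive fix: request k + 1 neighbours (the extra one being the point itself), but never more than the group contains, and pad whatever is missing. The helper name `padded_neighbours`, the NaN padding for distances and the -1 padding for indices are all assumptions made for illustration; they are not part of the original code or of scikit-learn.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def padded_neighbours(points, k=3):
    """Return (distances, indices), each of shape (n_points, k).

    Hypothetical helper: slots with no neighbour available are filled
    with NaN (distances) and -1 (indices).
    """
    points = np.asarray(points, dtype=float)
    n_points = len(points)
    # Ask for k + 1 because the query point itself comes back too, but
    # never ask for more neighbours than there are points in the group.
    n_query = min(k + 1, n_points)

    nbrs = NearestNeighbors(n_neighbors=n_query, metric='euclidean').fit(points)
    dist, idx = nbrs.kneighbors(points)

    # Drop column 0 (normally the point itself), then pad up to k columns.
    found = n_query - 1
    distances = np.full((n_points, k), np.nan)
    indices = np.full((n_points, k), -1, dtype=int)
    distances[:, :found] = dist[:, 1:]
    indices[:, :found] = idx[:, 1:]
    return distances, indices


# A group with only three points: each point has two real neighbours.
distances, indices = padded_neighbours([[0, 0], [1, 0], [3, 0]], k=3)
print(indices[0])    # third slot is the -1 placeholder
print(distances[0])  # third slot is NaN
```

In the loop from the question, a slot whose index is -1 could then be written out as None instead of being looked up with group.iloc. Note that dropping column 0 assumes the point itself is returned first, which may not hold if a group contains exact duplicate coordinates.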