Question

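I am trying to pull the second and third K-nearest neighbours with this code. When they exist, I am able to get them. When they don't, I get an error like: IndexError: index 3 is out of bounds for axis 0 with size 3.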

import numpy as np 
from sklearn.neighbors import NearestNeighbors
import pandas as pd

def nn(x):
    nbrs = NearestNeighbors(
        n_neighbors=3, 
        algorithm='auto', 
        metric='euclidean'
    ).fit(x)
    distances, indices = nbrs.kneighbors(x)
    return distances, indices

df = pd.DataFrame({'time': updated_df['upd_time_code'], 'x': updated_df['x'], 'y': updated_df['y'], 'id': updated_df['id']})

#This has the index of the nearest neighbor in the group, as well as the distance

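# drop(columns=...) and to_numpy() replace the positional axis argument and as_matrix(), which newer pandas versions no longer accept
nns = df.drop(columns='id').groupby('time').apply(lambda x: nn(x.to_numpy()))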

groups = df.groupby('time')
nn_rows = []

for i, nn_set in enumerate(nns):
    group = groups.get_group(i)
    print("processing group at: ", group.time)
    for j, tup in enumerate(zip(nn_set[0], nn_set[1])):
        nn_rows.append({'time': i,
                    'id': group.iloc[j]['id'],
                    'nearest_neighbour1': group.iloc[tup[1][1]]['id'],
                    'nearest_neighbour2': group.iloc[tup[1][2]]['id'],
                    'nearest_neighbour3': group.iloc[tup[1][3]]['id'],
                    'euclidean_distance1': tup[0][1],
                    'euclidean_distance2': tup[0][2],
                    'euclidean_distance3': tup[0][3]})

nn_df = pd.DataFrame(nn_rows).set_index('time')
nn_df

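How can I handle the fact that these neighbours sometimes exist and sometimes don't? Can this code be adjusted to ignore the missing ones?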

Answer 1

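You are accessing the fourth nearest neighbour. With n_neighbors=3, each row of indices returned by kneighbors has only three entries, at positions 0, 1 and 2, so tup[1][3] is out of bounds. Because the query points are the same points the model was fitted on, position 0 is normally the point itself, which leaves positions 1 and 2 for its two closest other points.

This is a classic array indexing error in the code and needs to be fixed.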
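
One possible way to make the lookup tolerant of small groups is sketched below. It is not the asker's code: the helper names k_nearest and neighbour_table, the -1 / NaN / None placeholders, and the demo data are all assumptions made for illustration. The idea is to request at most as many neighbours as the group has points (kneighbors raises a ValueError if n_neighbors exceeds the number of fitted samples), drop column 0 (assumed to be the point itself, which holds when a group has no duplicate coordinates), and pad whatever is missing. Iterating over the groupby directly also avoids relying on the time codes being 0, 1, 2, ...

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors


def k_nearest(points, k=3):
    """Return (distances, indices), each of shape (n_points, k).

    Slots that cannot be filled because the group is too small
    hold NaN (distance) or -1 (index).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # One extra neighbour is requested because each point is expected to
    # come back as its own nearest neighbour in column 0 (assumption:
    # no duplicate coordinates within a group).
    n_query = min(k + 1, n)
    model = NearestNeighbors(n_neighbors=n_query, metric='euclidean').fit(points)
    dist, idx = model.kneighbors(points)

    out_dist = np.full((n, k), np.nan)
    out_idx = np.full((n, k), -1, dtype=int)
    out_dist[:, :n_query - 1] = dist[:, 1:]  # skip column 0 (the point itself)
    out_idx[:, :n_query - 1] = idx[:, 1:]
    return out_dist, out_idx


def neighbour_table(df, k=3):
    """Build one row per point with up to k neighbour ids and distances."""
    rows = []
    for time, group in df.groupby('time'):
        ids = group['id'].to_numpy()
        dist, idx = k_nearest(group[['x', 'y']].to_numpy(), k)
        for j in range(len(group)):
            row = {'time': time, 'id': ids[j]}
            for m in range(k):
                found = idx[j, m] >= 0
                row[f'nearest_neighbour{m + 1}'] = ids[idx[j, m]] if found else None
                row[f'euclidean_distance{m + 1}'] = dist[j, m]
            rows.append(row)
    return pd.DataFrame(rows).set_index('time')


# Hypothetical demo data: four points at time 0, only two at time 1.
demo = pd.DataFrame({
    'time': [0, 0, 0, 0, 1, 1],
    'x': [0, 1, 3, 7, 0, 0],
    'y': [0, 0, 0, 0, 0, 2],
    'id': ['a', 'b', 'c', 'd', 'e', 'f'],
})
table = neighbour_table(demo)
print(table)
```

In the demo, the points at time 1 have only one other point in their group, so their second and third neighbour columns are left empty instead of raising an error. If only two neighbours are wanted, as the question's wording suggests, calling neighbour_table(demo, k=2) should be enough.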
