How to get the actual index of my dataframe rows when getting the top-k nearest neighbors?

Time: 2017-08-04 12:42:12

Tags: python pandas numpy scikit-learn

This is the sample dataframe to be fit

from sklearn.neighbors import NearestNeighbors
neigh = NearestNeighbors(n_neighbors=3, radius=0.4)
neigh.fit(df)
neighbor_index = neigh.kneighbors([[1.3,4.5,2.5]],return_distance=False)
print(neighbor_index)

Output: here are the indices of my 3 nearest neighbors --> array([[0, 1, 3]], dtype=int64)

I want the actual index labels from the dataframe, like array([['a', 'b', 'd']]). How can I get this?

2 answers:

Answer 0: (score: 0)

This is easy to do. You just need a bit of pandas indexing magic.

Do this:

from sklearn.neighbors import NearestNeighbors
import pandas as pd

#load the data
df = pd.read_csv('data.csv')
print(df)

#build the model and fit it
neigh = NearestNeighbors(n_neighbors=3, radius=0.4)
neigh.fit(df)

#get the index
neighbor_index = neigh.kneighbors([[1.3,4.5,2.5]],return_distance=False)
print(neighbor_index)

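#get the row index (the row names) of the dataframe
#index the underlying label array, since recent pandas versions reject a 2-D indexer on an Index
names = list(df.index.to_numpy()[neighbor_index])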
print(names)

Result:

   0  1  2
a  1  2  3
b  3  4  5
c  5  2  3
d  4  3  5

[[0 1 3]]

[array(['a', 'b', 'd'], dtype=object)]
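The snippet above reads data.csv, which is not part of this thread. As a rough, self-contained sketch of the same idea, the frame below is rebuilt inline from the printed result, and a second query point is added (an assumption, purely for illustration) to show that the label lookup keeps the (n_queries, n_neighbors) shape that kneighbors returns. The names queries, positions and labels are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Sample frame with the same values as the printed result above.
df = pd.DataFrame(
    [[1, 2, 3], [3, 4, 5], [5, 2, 3], [4, 3, 5]],
    index=["a", "b", "c", "d"],
)

# Keyword arguments keep the call valid on scikit-learn versions that
# reject positional constructor arguments.
neigh = NearestNeighbors(n_neighbors=3, radius=0.4)
neigh.fit(df.to_numpy())

# Two query points; the second one is hypothetical and only illustrates
# handling several queries at once.
queries = np.array([[1.3, 4.5, 2.5], [5.0, 2.0, 3.0]])
positions = neigh.kneighbors(queries, return_distance=False)

# positions holds row positions, one row per query. Fancy indexing on the
# label array translates them to labels and preserves the 2-D shape.
labels = df.index.to_numpy()[positions]
print(labels)
```

Each row of labels then lists the index labels of the neighbors for the matching query point, ordered from nearest to farthest.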

Answer 1: (score: -1)

See the pandas documentation on using numeric indices with a pandas DataFrame.

Below is an example recreating the dataframe in your question. The .iloc indexer returns rows of a dataframe based on their integer position. Selecting the rows by position and then reading .index gives the labels as they appear in the dataframe.

import pandas as pd

df = pd.DataFrame([[1, 2, 3], [3, 4, 5], [5, 3, 2], [4, 3, 5]], index=['a', 'b', 'c', 'd'])
df.iloc[[0, 1, 3]].index

which returns a pandas Index holding the labels 'a', 'b' and 'd'.
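The distinction between positions and labels matters most when the index itself is made of integers. The sketch below uses a hypothetical frame labelled 10, 20, 30, 40 (not from the question) to show that the positions returned by kneighbors must go through .iloc, while the same numbers passed to .loc are treated as labels and are not found.

```python
import pandas as pd

# Hypothetical frame whose integer labels differ from the row positions.
df = pd.DataFrame(
    [[1, 2, 3], [3, 4, 5], [5, 3, 2], [4, 3, 5]],
    index=[10, 20, 30, 40],
)

# Positions as kneighbors would return them for a single query point.
positions = [0, 1, 3]

# Position-based lookup: rows 0, 1 and 3, whatever their labels are.
rows = df.iloc[positions]
labels = rows.index.tolist()
print(labels)

# Label-based lookup with the same numbers fails here, because no row
# is labelled 0, 1 or 3.
try:
    df.loc[positions]
    loc_failed = False
except KeyError:
    loc_failed = True
print(loc_failed)
```

With the default RangeIndex the two lookups happen to agree, which is why the mix-up tends to go unnoticed until the index is changed.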