This is the sample dataframe to be fit:
from sklearn.neighbors import NearestNeighbors
neigh = NearestNeighbors(n_neighbors=3, radius=0.4)
neigh.fit(df)
neighbor_index = neigh.kneighbors([[1.3,4.5,2.5]],return_distance=False)
print(neighbor_index)
Output: the positions of my 3 nearest neighbors --> array([[0, 1, 3]], dtype=int64)
I want the actual index labels from the dataframe instead, like array([['a', 'b', 'd']]). How can I get them?
Answer 0 (score: 0)
This is easy to do. You just need a bit of pandas indexing magic.
Do this:
from sklearn.neighbors import NearestNeighbors
import pandas as pd
# load the data (assumes the first CSV column holds the row labels a, b, c, d)
df = pd.read_csv('data.csv', index_col=0)
print(df)
# build the model and fit it
neigh = NearestNeighbors(n_neighbors=3, radius=0.4)
neigh.fit(df)
# get the positional indices of the 3 nearest neighbors
neighbor_index = neigh.kneighbors([[1.3, 4.5, 2.5]], return_distance=False)
print(neighbor_index)
# map the positions of the single query point to the row labels of the dataframe
names = list(df.index[neighbor_index[0]])
print(names)
Result:
   0  1  2
a  1  2  3
b  3  4  5
c  5  2  3
d  4  3  5
[[0 1 3]]
['a', 'b', 'd']
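The lookup above handles a single query point. Below is a minimal sketch for several query points at once, assuming the frame is built in memory rather than read from data.csv. The second query point and the names queries, positions and labels are illustrative and not part of the question.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# In-memory copy of the frame printed above (stands in for data.csv).
df = pd.DataFrame(
    [[1, 2, 3], [3, 4, 5], [5, 2, 3], [4, 3, 5]],
    index=["a", "b", "c", "d"],
)

neigh = NearestNeighbors(n_neighbors=3, radius=0.4)
# Fit on the raw values so that plain-list queries match the training input type.
neigh.fit(df.to_numpy())

# Two query points; kneighbors returns one row of positions per query.
queries = [[1.3, 4.5, 2.5], [5.0, 2.0, 3.0]]  # second point is an assumption
positions = neigh.kneighbors(queries, return_distance=False)

# Indexing the label array with the 2D position array keeps the same shape,
# so labels[i] holds the row labels of the neighbors of queries[i].
labels = df.index.to_numpy()[positions]
print(labels)
```

Converting the index to a NumPy array first avoids indexing a pandas Index with a 2D array, which recent pandas versions reject.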
Answer 1 (score: -1)
See the pandas documentation about using integer positions with a pandas DataFrame.
Below is an example recreating the dataframe in your question. The .iloc indexer returns rows of a dataframe by their integer position. Select the rows by position, then read .index to get the labels as they appear in the dataframe.
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [3, 4, 5], [5, 3, 2], [4, 3, 5]], index=['a', 'b', 'c', 'd'])
df.iloc[[0, 1, 3]].index
which returns an Index holding the labels 'a', 'b', 'd' (call .tolist() on it for a plain list ['a', 'b', 'd'])
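To tie the .iloc approach back to the question, here is a rough sketch that feeds the kneighbors output straight into .iloc and pairs each label with its distance. It assumes the default Euclidean metric. The names neighbor_rows and nearest are illustrative, and keeping the distances is an addition the question did not ask for.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors

df = pd.DataFrame(
    [[1, 2, 3], [3, 4, 5], [5, 3, 2], [4, 3, 5]],
    index=["a", "b", "c", "d"],
)

neigh = NearestNeighbors(n_neighbors=3).fit(df.to_numpy())

# With return_distance left at its default (True), kneighbors returns
# a (distances, positions) pair, each with one row per query point.
distances, positions = neigh.kneighbors([[1.3, 4.5, 2.5]])

# .iloc selects the neighbor rows by position; their .index holds the labels.
neighbor_rows = df.iloc[positions[0]]
nearest = pd.Series(distances[0], index=neighbor_rows.index, name="distance")
print(nearest)
```

The resulting Series is keyed by the dataframe's own labels, so nearest.index.tolist() gives the label list directly.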