How to fix the sklearn KNN "not fitted" error

Posted: 2019-06-30 11:59:03

Tags: python scikit-learn knn sklearn-pandas

I'm trying to implement KNN classification on some RMS_features that were extracted from sensor data. The labeled sensor data looks like this:

RMS_x   RMS_y   RMS_z   RMS_euclidian   labels
0.137221994086372451    0.141361458137922474    0.373367693426083891    0.422156809730974525    1
0.653967197231734354    0.523601431745291057    0.857427471986578205    1.19875494747598155 0
0.547301970096429224    0.510460963300706561    0.851980921284600901    1.13401116915058431 1
0.200317415034924756    0.137815296326320835    0.353579753893964288    0.429113930129869203    1
0.802069910360720617    0.752364652538367706    0.909861874144165417    1.42731122797950638 1
0.879041000013726426    0.746218766636731257    0.88728425792715937 1.45493260385191925 1
0.144637160351783728    0.117846411938445361    0.445677862167030925    0.483152607141023704    0
0.142457833655985133    0.0730350196404254831   0.287273765845172724    0.328868613593180703    0
0.0866202724953416131   0.0616184109162635982   0.266749047302988929    0.287149707309732383    1
0.839153663116914195    0.714433206853633651    0.785256227002287477    1.35322615235723642 0
0.112852384316477455    0.113895536346822021    0.298205076872631036    0.338576611298323393    1
1.03867993617356702 0.860906249377046295    0.826493656885982309    1.58212115367273398 1
1.08309298701834544 0.777872116663065438    0.107827834335941439    1.33783492638956725 0
0.269545256634713071    0.173020210546502379    0.396383770058648055    0.509618221610782407    0
2.82554170256769766 2.75559888003772846 2.72907654403846411 4.79842368740352843 0
0.956220220626555983    0.849082605233856036    1.16655931706066363 1.73094165732610805 0
0.393801166109265799    0.283932207763270439    0.591509176401210479    0.765231966661861884    0
0.809556622304495543    0.540659060535479075    0.909773758642383967    1.3324347775296399  0
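As a minimal sketch of the reading step, the snippet below parses the header and the first three rows of the table above with the same read_csv call that file_read_in() uses. It assumes the real file is tab-separated (as sep='\t' in the code suggests) and reads from an in-memory io.StringIO buffer instead of labeled_session_data/out/labeled.csv; the names rows and text are only for illustration.

```python
import io

import pandas as pd

# Header plus the first three rows of the table above, joined with tabs.
# Assumption: the real labeled.csv is tab-separated, matching sep='\t'.
rows = [
    ["RMS_x", "RMS_y", "RMS_z", "RMS_euclidian", "labels"],
    ["0.137221994086372451", "0.141361458137922474", "0.373367693426083891", "0.422156809730974525", "1"],
    ["0.653967197231734354", "0.523601431745291057", "0.857427471986578205", "1.19875494747598155", "0"],
    ["0.547301970096429224", "0.510460963300706561", "0.851980921284600901", "1.13401116915058431", "1"],
]
text = "\n".join("\t".join(r) for r in rows)

# Same read_csv arguments as file_read_in(), but from an in-memory buffer.
df = pd.read_csv(io.StringIO(text), sep="\t", low_memory=False, skiprows=0)
data = df.apply(pd.to_numeric, axis=0)

X = data.drop(columns=["labels"])  # the four RMS feature columns
y = data["labels"].values          # integer class labels as a NumPy array
print(X.shape, y)
```

The long decimal strings are parsed as float64, so digits beyond double precision are dropped; that does not matter for KNN distances.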

The code I use to read the data and run KNN on it is:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.exceptions import NotFittedError


def file_read_in(filename):
    df = pd.read_csv(filename, sep='\t', low_memory=False, skiprows=0)  # set the separator character, e.g. '\t' or ';'
    data = df.apply(lambda x: pd.to_numeric(x), axis=0)

    return data


def knn_alg(X_train, y_train, X_test, y_test, N):
    knn = KNeighborsClassifier(n_neighbors=N)
    knn.fit = (X_train, y_train)

    try:
        knn.predict(X_test)
    except NotFittedError as e:
        print(repr(e))

    # print(knn.predict(X_test))


def split_dataset(X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1, stratify=y)
    return X_train, X_test, y_train, y_test


def main():
    filename_labeled = "labeled_session_data/out/labeled.csv"
    filename_unlabeled = "unlabeled_session_data/out/unlabeled.csv"
    # column_nr = 4
    data_labeled = file_read_in(filename_labeled)
    data_unlabeled = file_read_in(filename_unlabeled)

    X = data_labeled.drop(columns=['labels'])
    y = data_labeled['labels'].values

    X_train, X_test, y_train, y_test = split_dataset(X, y)
    n_neighbors = 3

    print("X_train " + "\n" + str(X_train))
    print("X_test " + "\n" + str(X_test))
    print("y_train " + "\n" + str(y_train))
    print("y_test " + "\n" + str(y_test))

    knn_alg(X_train, y_train, X_test, y_test, n_neighbors)


if __name__ == '__main__':
    main()

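First, I read the data from the CSV file into a pandas DataFrame. Then I extract the labels and split the dataset into training and test sets. In the last step I want to see whether the fitted KNN model can predict my test set, but even though I fit the data, the model raises an exception: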


NotFittedError("This KNeighborsClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.",)

Am I fitting the data the wrong way? Thanks for any help.

1 Answer:

Answer 0 (score: 1)

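You never actually fit the KNeighborsClassifier. The line knn.fit = (X_train, y_train) does not call fit; it assigns the tuple (X_train, y_train) to the attribute knn.fit, which replaces the method on that instance. The model is therefore never trained, and predict raises NotFittedError. Call the method instead, as in the examples on the Scikit-learn website.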
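A minimal sketch that reproduces the mistake, assuming a tiny made-up dataset (X_demo and y_demo are hypothetical): after the assignment, knn.fit is a plain tuple and the estimator is still untrained, so predict raises the same NotFittedError as in the question.

```python
from sklearn.exceptions import NotFittedError
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data, used only to reproduce the error.
X_demo = [[0.1], [0.2], [0.9], [1.0]]
y_demo = [0, 0, 1, 1]

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit = (X_demo, y_demo)  # assignment, not a call: fit() never runs

# The instance attribute now shadows the fit() method with a tuple.
print(type(knn.fit), callable(knn.fit))

raised = False
try:
    knn.predict([[0.15]])
except NotFittedError as e:
    raised = True  # predict() refuses to run on an untrained estimator
    print(repr(e))
```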

Try this:

def knn_alg(X_train, y_train, X_test, y_test, N):
    knn = KNeighborsClassifier(n_neighbors=N)
    knn.fit(X_train, y_train)

    try:
        knn.predict(X_test)
    except NotFittedError as e:
        print(repr(e))
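To check the corrected fit/predict sequence end to end, here is a sketch that reuses the question's split settings on the rows from the labeled table, rounded to three decimals for brevity (so it only approximates the real file). With only 18 rows the accuracy value says little about the model; the point is that predict now returns one label per test row instead of raising.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Rows from the labeled table in the question, rounded to 3 decimals.
columns = ["RMS_x", "RMS_y", "RMS_z", "RMS_euclidian", "labels"]
rows = [
    [0.137, 0.141, 0.373, 0.422, 1], [0.654, 0.524, 0.857, 1.199, 0],
    [0.547, 0.510, 0.852, 1.134, 1], [0.200, 0.138, 0.354, 0.429, 1],
    [0.802, 0.752, 0.910, 1.427, 1], [0.879, 0.746, 0.887, 1.455, 1],
    [0.145, 0.118, 0.446, 0.483, 0], [0.142, 0.073, 0.287, 0.329, 0],
    [0.087, 0.062, 0.267, 0.287, 1], [0.839, 0.714, 0.785, 1.353, 0],
    [0.113, 0.114, 0.298, 0.339, 1], [1.039, 0.861, 0.826, 1.582, 1],
    [1.083, 0.778, 0.108, 1.338, 0], [0.270, 0.173, 0.396, 0.510, 0],
    [2.826, 2.756, 2.729, 4.798, 0], [0.956, 0.849, 1.167, 1.731, 0],
    [0.394, 0.284, 0.592, 0.765, 0], [0.810, 0.541, 0.910, 1.332, 0],
]
data = pd.DataFrame(rows, columns=columns)

X = data.drop(columns=["labels"])
y = data["labels"].values

# Same arguments as split_dataset() in the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)  # call the method; do not assign to it

y_pred = knn.predict(X_test)          # works now that the model is fitted
accuracy = knn.score(X_test, y_test)  # mean accuracy on the held-out rows
print(y_pred, y_test, accuracy)
```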