All of my machine learning models get 100% accuracy. What is wrong with my models?

Time: 2020-02-25 19:11:21

Tags: python database scikit-learn

I am working with a dataset of 5 hand-signed letters. I have uploaded the dataset to Kaggle; feel free to take a look if you are interested.

https://www.kaggle.com/shayanriyaz/gesture-recognition

So far I have trained and tested several models, but I keep getting 100% accuracy.

Here is my code.

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
# importing all the necessary packages to use the various classification algorithms
from sklearn.linear_model import LogisticRegression  # for Logistic Regression algorithm
from sklearn.model_selection import train_test_split #to split the dataset for training and testing
from sklearn.neighbors import KNeighborsClassifier  # for K nearest neighbours
from sklearn import svm  #for Support Vector Machine (SVM) Algorithm
from sklearn import metrics #for checking the model accuracy
from sklearn.tree import DecisionTreeClassifier #for using Decision Tree Algorithm
from mpl_toolkits.mplot3d import Axes3D
import os # accessing directory structure

from subprocess import check_output

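# list the files available in the Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

nRowsRead = None  # None reads the whole file

df = pd.read_csv('/kaggle/input/ASL_DATA.csv', delimiter=',', nrows = nRowsRead)
# drop the unused columns; this has to run after df has been loaded
df = df.drop(['Id','Time', 'Wrist_Pitch','Wrist_Roll'],axis = 1)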

df.dataframeName = 'ASL_DATA.csv'
nRow, nCol = df.shape

print(f'There are {nRow} rows and {nCol} columns')

plt.figure(figsize=(30,20)) 
# draws a heatmap of the correlation matrix of the numeric columns
# (the string column Letter is excluded, since recent pandas versions raise on it)
sns.heatmap(df.select_dtypes('number').corr(),annot=True,cmap='cubehelix_r')
plt.show()

train, test = train_test_split(df, test_size = 0.2)# the main data is split into train and test
# test_size=0.2 splits the data in an 80/20 ratio. train=80% and test=20%
print(train.shape)
print(test.shape)

train_X = train[['Thumb_Pitch','Thumb_Roll','Index_Pitch','Index_Roll','Middle_Pitch','Middle_Roll','Ring_Pitch','Ring_Roll','Pinky_Pitch','Pinky_Roll']]# taking the training data features
train_y=train.Letter# output of our training data
test_X= test[['Thumb_Pitch','Thumb_Roll','Index_Pitch','Index_Roll','Middle_Pitch','Middle_Roll','Ring_Pitch','Ring_Roll','Pinky_Pitch','Pinky_Roll']] # taking test data features
test_y =test.Letter   #output value of test data

from sklearn import preprocessing
mm_scaler = preprocessing.RobustScaler()
train_X = mm_scaler.fit_transform(train_X)
test_X = mm_scaler.transform(test_X)


model=DecisionTreeClassifier()
model.fit(train_X,train_y)
prediction=model.predict(test_X)
print('The accuracy of the Decision Tree is',metrics.accuracy_score(prediction,test_y))


model=KNeighborsClassifier(n_neighbors=3) #this examines 3 neighbours for putting the new data into a class
model.fit(train_X,train_y)
prediction=model.predict(test_X)
print('The accuracy of the KNN is',metrics.accuracy_score(prediction,test_y))

1 Answer:

Answer 0: (score: 1)

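There is nothing wrong with your models; this is simply a trivial problem for them to solve. When all of the features are considered, these letters look nothing like one another. Had you chosen all of the letters, or letters that all look alike, you would probably see some errors.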
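
One way to check that a perfect score reflects an easy problem rather than a flaw in the evaluation is to compare the model against a chance-level baseline under cross-validation. The following is only a sketch under stated assumptions: it uses synthetic, well-separated clusters from `make_blobs` in place of ASL_DATA.csv, and the helper name `compare_to_baseline` is hypothetical.

```python
from sklearn.datasets import make_blobs
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier


def compare_to_baseline(X, y, n_splits=5, seed=0):
    """Return mean cross-validated accuracy of (baseline, decision tree)."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # The baseline always predicts the most frequent class seen in training.
    baseline = DummyClassifier(strategy="most_frequent")
    tree = DecisionTreeClassifier(random_state=seed)
    baseline_acc = cross_val_score(baseline, X, y, cv=cv, scoring="accuracy").mean()
    tree_acc = cross_val_score(tree, X, y, cv=cv, scoring="accuracy").mean()
    return baseline_acc, tree_acc


# Synthetic stand-in for the glove data (an assumption, not the real readings):
# 5 well-separated classes described by 10 features.
X_demo, y_demo = make_blobs(n_samples=500, centers=5, n_features=10,
                            cluster_std=0.5, random_state=0)
baseline_acc, tree_acc = compare_to_baseline(X_demo, y_demo)
print(f"baseline: {baseline_acc:.2f}, decision tree: {tree_acc:.2f}")
```

If the tree scores far above the baseline on every fold, the classes are probably just easy to separate with these features.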

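Rerun the model using only Index_Pitch and Index_Roll. You will still get around 95% AUC. Doing so at least lets you infer that the only loss comes from B, D, and K, which, judging by images of those signs, are the only three letters that could plausibly be confused when looking at the index finger alone. That turns out to be the case.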
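
A minimal sketch of that experiment might look like the following. It assumes the column names Index_Pitch, Index_Roll and Letter from the question; the helper `evaluate_feature_subset` is hypothetical, and the demo frame is synthetic placeholder data, not the real glove readings. It reports accuracy and a confusion matrix rather than AUC, since the confusion matrix shows directly which letters get mixed up.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def evaluate_feature_subset(df, features, label_col="Letter", seed=0):
    """Train on a subset of columns; return accuracy, confusion matrix, label order."""
    train, test = train_test_split(df, test_size=0.2, random_state=seed,
                                   stratify=df[label_col])
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(train[features], train[label_col])
    prediction = model.predict(test[features])
    labels = sorted(df[label_col].unique())
    acc = accuracy_score(test[label_col], prediction)
    # Rows are true letters, columns are predicted letters.
    cm = confusion_matrix(test[label_col], prediction, labels=labels)
    return acc, cm, labels


# Synthetic placeholder data (made-up values): three deliberately overlapping
# classes, so that the confusion matrix has off-diagonal entries to inspect.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Index_Pitch": rng.normal(loc=np.repeat([0.0, 1.0, 2.0], 100), scale=1.0),
    "Index_Roll": rng.normal(loc=0.0, scale=1.0, size=300),
    "Letter": np.repeat(["B", "D", "K"], 100),
})
acc, cm, labels = evaluate_feature_subset(demo, ["Index_Pitch", "Index_Roll"])
print(f"accuracy: {acc:.2f}")
print(pd.DataFrame(cm, index=labels, columns=labels))
```

On the real data, you would pass the loaded DataFrame in place of `demo`; off-diagonal counts would then show which pairs of letters are confused.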

It is simply a problem that, given your dataset, is actually solvable.
