I ran into problems completing my code: training the model and evaluating the results

Time: 2021-02-07 07:14:18

Tags: python pandas matplotlib

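What should I use to work with the two datasets and make predictions? The task is to preprocess the data (encode the columns that contain text, normalize the numeric columns, visualize the data) and then formulate the problem statement: choose a model, train the model, and evaluate the results. www.kaggle.com/alfrednieves/notebooke0d8d2a5e2/edit https://www.kaggle.com/questions-and-answers/217299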
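The "normalize the numeric columns" step is not covered by the code in this question. One possible way to do it is sketched below, using scikit-learn's `MinMaxScaler`. The toy DataFrame, its values, and the `ALT_scaled` column name are assumptions made for illustration only; they are not taken from the real CSV files.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for the laser-incident data; these values are invented.
df = pd.DataFrame({
    "ALT": [1000, 2500, 4000, 10000],
    "STATE": ["TX", "CA", "TX", "FL"],
})

# Coerce to numeric first: a real CSV column may contain stray text,
# which errors="coerce" turns into NaN instead of raising.
df["ALT"] = pd.to_numeric(df["ALT"], errors="coerce")

# MinMaxScaler rescales each column linearly to the [0, 1] range.
scaler = MinMaxScaler()
# fit_transform expects a 2-D input, hence the double brackets;
# ravel() flattens the (n, 1) result back into a single column.
df["ALT_scaled"] = scaler.fit_transform(df[["ALT"]]).ravel()

print(df)
```

If a model is trained afterwards, it is usually safer to fit the scaler on the training split only and reuse it on the test split, so that no information from the test data leaks into preprocessing.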

I need help with the solution.

import pandas as pd
import matplotlib.pyplot as plt

data2013=pd.read_csv('laser_incidents_2013.csv')
data2014=pd.read_csv('laser_incidents_2014.csv')
data2013.head()
data2014.head()
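# `.all` without parentheses only references the method and shows nothing useful;
# info() prints the column names, dtypes, and non-null counts instead.
data2013.info()
data2014.info()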

data2013.columns
data2014.columns

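# Rename the columns to short, year-specific names.
# 'COLOR' and 'CITY' must map to different names: the original mapped both to 'C1'/'C2',
# which produces duplicate column labels. 'CT1'/'CT2' are used for 'CITY' here.
data2013 = data2013.rename(columns={'DATE': 'D1', 'Time_(UTC)': 'T1', 'Aircraft ID': 'AI1', 'No. A/C': 'N1',
            'ALT': 'A1', 'MAJOR CITY': 'MC1', 'COLOR': 'C1', 'Injury Reported': 'IR1', 'CITY': 'CT1', 'STATE': 'S1'})

data2014 = data2014.rename(columns={'DATE': 'D2', 'Time_(UTC)': 'T2', 'Aircraft ID': 'AI2', 'No. A/C': 'N2',
            'ALT': 'A2', 'MAJOR CITY': 'MC2', 'COLOR': 'C2', 'Injury Reported': 'IR2', 'CITY': 'CT2', 'STATE': 'S2'})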

# First import the preprocessing module
from sklearn import preprocessing
# Create a label encoder
le = preprocessing.LabelEncoder()
# Fit the encoder once on the state labels of both years. Calling fit() a second time
# would discard the classes learned by the first call.
# astype(str) turns missing values into the string 'nan' so that all labels are comparable.
le.fit(pd.concat([data2013['S1'], data2014['S2']]).astype(str))

# The encoder scans the column and collects the unique labels (the classes).
# The following line lists the labels it found
list(le.classes_)

# Convert the labels to integer codes and store them back in each DataFrame
data2013['S1'] = le.transform(data2013['S1'].astype(str))
data2014['S2'] = le.transform(data2014['S2'].astype(str))
data2013.head()
data2014.head()
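# Visualize data
# NOTE: `data`, 'X1'..'X4' and 'L' are never defined in this script. In the plotting and
# train/test split code below they need to be replaced with one of the DataFrames above
# and its actual feature and label columns.
# plot not stressed class
plt.scatter(data['X1'], data['X2'], c=data['S1'])

# One scatter call per class so that each class gets its own legend entry
plt.scatter(data['X1'][data['L']==0], data['X2'][data['L']==0], label='class1', color='blue')
plt.scatter(data['X1'][data['L']==1], data['X2'][data['L']==1], label='class2', color='red')
plt.scatter(data['X1'][data['L']==2], data['X2'][data['L']==2], label='class3', color='green')
plt.legend()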

from sklearn.model_selection import train_test_split
X_training, X_testing, Y_training, Y_testing = train_test_split(data[['X1','X2','X3','X4']], data['L'], test_size=0.3)
plt.scatter(X_training['X1'][Y_training==0],X_training['X2'][Y_training==0],label='class1',color='blue')
plt.scatter(X_training['X1'][Y_training==1],X_training['X2'][Y_training==1],label='class2',color='red')
plt.scatter(X_training['X1'][Y_training==2],X_training['X2'][Y_training==2],label='class3',color='green')
plt.title('Training Data')
plt.legend()
plt.figure()
plt.scatter(X_testing['X1'][Y_testing==0],X_testing['X2'][Y_testing==0],label='class1',color='blue')
plt.scatter(X_testing['X1'][Y_testing==1],X_testing['X2'][Y_testing==1],label='class2',color='red')
plt.scatter(X_testing['X1'][Y_testing==2],X_testing['X2'][Y_testing==2],label='class3',color='green')
plt.title('Test Data')
plt.legend()

MDC classifier: use the code described in the lecture and classify the data with MDC.

Then count the number of misclassified points in the training data and in the test data.
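The lecture code is not included here, so the following is only a sketch under two assumptions: that "MDC" stands for a minimum distance classifier (each point is assigned to the class with the nearest mean), and that scikit-learn's `NearestCentroid` is an acceptable stand-in for it. The data comes from `make_blobs` and is synthetic; it only mimics the shape of a four-feature, three-class problem.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid

# Synthetic stand-in data: 4 features, 3 classes (not the laser-incident data).
X, y = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
X_training, X_testing, Y_training, Y_testing = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# NearestCentroid stores one mean vector per class and predicts the class
# whose mean is closest (Euclidean distance by default).
mdc = NearestCentroid()
mdc.fit(X_training, Y_training)

# A point is misclassified when the prediction differs from the true label.
train_errors = int((mdc.predict(X_training) != Y_training).sum())
test_errors = int((mdc.predict(X_testing) != Y_testing).sum())

print("Misclassified training points:", train_errors, "of", len(Y_training))
print("Misclassified test points:", test_errors, "of", len(Y_testing))
```

With real data, `X` and `y` would be replaced by the encoded and normalized feature columns and the label column.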

K-nearest-neighbors classifier: use 5NN (the 5 nearest neighbors).

Then count the number of misclassified points in the training data and in the test data.
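A minimal sketch of the 5NN step, assuming scikit-learn's `KNeighborsClassifier` is allowed for the assignment. As before, the data is synthetic (`make_blobs`) and only stands in for the real features and labels.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: 4 features, 3 classes (not the laser-incident data).
X, y = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)
X_training, X_testing, Y_training, Y_testing = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# n_neighbors=5 gives the 5NN classifier: each point receives the majority
# label among its five nearest training points.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_training, Y_training)

# Count the points whose predicted label differs from the true label.
train_errors = int((knn.predict(X_training) != Y_training).sum())
test_errors = int((knn.predict(X_testing) != Y_testing).sum())

print("Misclassified training points:", train_errors, "of", len(Y_training))
print("Misclassified test points:", test_errors, "of", len(Y_testing))
```

Because k-nearest neighbors relies on distances, features on very different scales can dominate the result, which is one reason the normalization step matters before this classifier.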

0 answers:

No answers