ValueError: Number of features of the model must match the input. Model n_features is 356 and input n_features is 164

Time: 2019-02-22 01:44:32

Tags: python scikit-learn

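The error is as stated in the title. I think it may have something to do with my use of get_dummies, but I'm very new to this, so I'm not sure. Any help for a clueless newbie would be greatly appreciated.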

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn import tree

df = pd.read_csv("D:/Machine Learning/Kaggle/Loan Prediction/train.csv")

df = df.dropna()

print(df.isnull().sum())

train, test = train_test_split(df, test_size=0.3, random_state=0)

xTrain = train.drop('Loan_Status', axis=1)
yTrain = train['Loan_Status']

xTest = test.drop('Loan_Status', axis=1)
yTest = test['Loan_Status']

xTrain = pd.get_dummies(xTrain)
xTest = pd.get_dummies(xTest)

model = BaggingClassifier(tree.DecisionTreeClassifier(random_state=1))
model.fit(xTrain,yTrain)
score = model.score(xTest,yTest)
print(score)

1 Answer:

Answer 0: (score: 0)

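Calling get_dummies separately on the training and test sets only creates columns for the categories that happen to appear in each set, so the two frames can end up with different numbers of features. One possible solution to your problem is to create the dummies before splitting into train and test: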

df = pd.read_csv("D:/Machine Learning/Kaggle/Loan Prediction/train.csv")

df = df.dropna()
df_X = df.drop('Loan_Status', axis=1)
df_X = pd.get_dummies(df_X)
df_y = df['Loan_Status']

train_X, test_X, train_y, test_y = train_test_split(df_X, df_y, test_size=0.3, random_state=0)

model = BaggingClassifier(tree.DecisionTreeClassifier(random_state=1))
model.fit(train_X,train_y)
score = model.score(test_X, test_y)
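
If the split has to happen before encoding, another option is to align the test columns to the training columns afterwards. The following is only a minimal sketch on made-up toy data (the frames and the column names Income and Area are hypothetical, not taken from the loan CSV); it shows the mismatch and then fixes it with DataFrame.reindex:

```python
import pandas as pd

# Toy frames standing in for xTrain / xTest (hypothetical data, not the loan CSV).
x_train = pd.DataFrame({"Income": [10, 20, 30],
                        "Area": ["Urban", "Rural", "Semiurban"]})
x_test = pd.DataFrame({"Income": [15, 25],
                       "Area": ["Urban", "Urban"]})

train_dummies = pd.get_dummies(x_train)
test_dummies = pd.get_dummies(x_test)

# Encoded separately, the two frames disagree on the number of columns,
# which is the situation that triggers the ValueError in model.score().
print(train_dummies.shape[1], test_dummies.shape[1])

# Align the test columns to the training columns: dummies missing from the
# test set are added and filled with 0, and any column that exists only in
# the test set is dropped.
test_aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)
print(list(test_aligned.columns))
```

A feature count as large as 356 also suggests that some column with many distinct values, such as an identifier, is being one-hot encoded. If the data contains such a column, dropping it before calling get_dummies may be worth considering.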