Number of features of the model must match the input. Model n_features is 40 and input n_features is 38

Asked: 2017-06-05 07:20:29

Tags: python machine-learning scikit-learn random-forest sklearn-pandas

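I am getting this error and would appreciate any suggestions for resolving it. My code is below. I take the training data from train.csv and the test data from a separate file, test.csv. I am new to machine learning, so I can't work out what the problem is.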

import quandl,math    
import numpy as np    
import pandas as pd    
import matplotlib.pyplot as plt
from matplotlib import style
import datetime
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import metrics
train = pd.read_csv("train.csv", index_col=None)
test = pd.read_csv("test.csv", index_col=None)
vectorizer = CountVectorizer(min_df=1)
X1 = vectorizer.fit_transform(train['question'])
Y1 = vectorizer.fit_transform(test['testing'])
X=X1.toarray()
Y=Y1.toarray()
#print(Y.shape)
number=LabelEncoder()
train['answer']=number.fit_transform(train['answer'].astype('str'))
features = ['question','answer']
y = train['answer']
clf=RandomForestClassifier(n_estimators=100)
clf.fit(X[:25],y)
predicted_result=clf.predict(Y[17])
p_result=number.inverse_transform(predicted_result)
f = open('output.txt', 'w')
t=str(p_result)
f.write(t)
print(p_result)

1 Answer:

Answer 0 (score: 1)

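Your code has several problems. The one relevant to this error is that you fit the CountVectorizer (vectorizer) on both the train and the test data. Each call to fit_transform learns a new vocabulary from the text it is given, so the test matrix ends up with a different number of columns (features) than the matrix the model was trained on.

What you should do is: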
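To illustrate why the two matrices disagree, here is a minimal sketch. The two lists of sentences are made-up stand-ins for train['question'] and test['testing'], not the asker's data; the point is only that refitting the same vectorizer replaces its vocabulary.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for train['question'] and test['testing'].
train_texts = ["what is your name", "where do you live", "how old are you"]
test_texts = ["what is your age", "where are you"]

vectorizer = CountVectorizer(min_df=1)

# First fit: the vocabulary is learned from the training text.
X1 = vectorizer.fit_transform(train_texts)
train_vocab_size = len(vectorizer.vocabulary_)

# Second fit on the same object: the vocabulary is relearned from the
# test text, so the column count (and column meaning) changes.
Y1 = vectorizer.fit_transform(test_texts)
test_vocab_size = len(vectorizer.vocabulary_)

print("train features:", X1.shape[1])
print("test features: ", Y1.shape[1])
```

A classifier fitted on X1 expects X1.shape[1] columns, so passing it rows of Y1 raises the feature-count error from the question.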

X1 = vectorizer.fit_transform(train['question'])

# The following line is changed
Y1 = vectorizer.transform(test['testing'])
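For context, a rough end-to-end sketch of the corrected flow might look like the following. The questions and answers are invented toy data, the variable names are illustrative, and random_state is set only to make the run repeatable; none of this comes from the asker's CSV files.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; the real code reads these from train.csv / test.csv.
train_questions = ["what is your name", "where do you live", "how old are you"]
train_answers = ["name", "place", "age"]
test_questions = ["where do you live now", "what is your name please"]

# Learn the vocabulary once, on the training text only.
vectorizer = CountVectorizer(min_df=1)
X_train = vectorizer.fit_transform(train_questions)

# Reuse that vocabulary for the test text; unseen words are ignored.
X_test = vectorizer.transform(test_questions)

# Encode the string labels as integers for the classifier.
encoder = LabelEncoder()
y_train = encoder.fit_transform(train_answers)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Slice with 0:1 so the single sample stays two-dimensional,
# which is the shape predict() expects.
predicted = clf.predict(X_test[0:1])
labels = encoder.inverse_transform(predicted)

print(X_train.shape[1] == X_test.shape[1])
print(labels)
```

Because transform reuses the fitted vocabulary, X_train and X_test always have the same number of columns, whatever words appear in the test text.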