Number of features of the model must match the input. Model n_features is 20 and input n_features is 4

Time: 2017-06-03 07:16:05

Tags: python machine-learning scikit-learn random-forest sklearn-pandas

I get this error when using a random forest classifier. Here is my code:

import quandl, math    
import numpy as np    
import pandas as pd    
import matplotlib.pyplot as plt    
from matplotlib import style   
import datetime    
from sklearn.ensemble import RandomForestClassifier    
from sklearn.preprocessing import LabelEncoder    
from sklearn.feature_extraction.text import CountVectorizer

train = pd.read_csv("train.csv", index_col=None)    
vectorizer = CountVectorizer(min_df=1)    
X1 = vectorizer.fit_transform(train['question'])    
X=X1.toarray()    
corpus=['tell me your name']    
t1= vectorizer.fit_transform(corpus)    
t=t1.toarray()    
number=LabelEncoder()   
train['answer']=number.fit_transform(train['answer'].astype('str'))    
features = ['question','answer']    
y= train['question'].values    
clf=RandomForestClassifier(n_estimators=20)    
clf.fit(X,y)    
predicted_result=clf.predict(t)
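A minimal reproduction of the mismatch, as a sketch: the three training questions below are hypothetical stand-ins for train.csv. Calling fit_transform a second time replaces the learned vocabulary, so the test matrix gets a different number of columns than the training matrix.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical training questions standing in for train.csv
questions = [
    "what is your name",
    "how old are you",
    "where do you live",
]

vectorizer = CountVectorizer(min_df=1)
X = vectorizer.fit_transform(questions).toarray()  # 11-word vocabulary

# Calling fit_transform again discards the training vocabulary and
# learns a new one from this single sentence (4 words)
t = vectorizer.fit_transform(["tell me your name"]).toarray()

print(X.shape, t.shape)  # (3, 11) (1, 4)
```

A classifier fitted on X expects 11 columns, so predicting on t fails with the same kind of n_features error.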

1 answer:

Answer 0 (score: 0)

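Use the same fitted vectorizer for both the training and the test data. If you call fit_transform again on the test data, the vectorizer is refitted and builds its vocabulary from that new data alone: 'tell me your name' yields only 4 features, while the model was trained on the 20-word vocabulary learned from train.csv. Fit once on the training questions and only call transform on the test data: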

X1 = vectorizer.fit_transform(train['question'])  # learn the vocabulary from the training questions
t1 = vectorizer.transform(corpus)  # reuse that vocabulary; do not refit
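One way to make this mistake impossible is to wrap the vectorizer and the classifier in a scikit-learn Pipeline (a swapped-in technique, not part of the original answer): fit() fits the vectorizer, and predict() only calls transform() on it. This is a sketch on hypothetical question/answer pairs in place of train.csv, and it assumes the answer column is meant to be the label (the question's code passes train['question'] as y, which looks unintended).

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical question/answer pairs standing in for train.csv
questions = ["what is your name", "how old are you", "where do you live", "tell me your name"]
answers = ["name", "age", "location", "name"]

# fit() fits the vectorizer on the training text; predict() reuses its vocabulary
model = make_pipeline(
    CountVectorizer(min_df=1),
    RandomForestClassifier(n_estimators=20, random_state=0),
)
model.fit(questions, answers)  # string labels work directly, no LabelEncoder needed

prediction = model.predict(["tell me your name"])
print(prediction)
```

Because the test text goes through the same fitted vectorizer, the forest always receives as many columns as it was trained on.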