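I am getting this error. Please give me any suggestions for resolving it. In my code, I read the training data from train.csv and the test data from a separate file, test.csv. I am new to machine learning, so I cannot work out what the problem is. Any suggestions are welcome.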
import quandl,math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
import datetime
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import metrics
train = pd.read_csv("train.csv", index_col=None)
test = pd.read_csv("test.csv", index_col=None)
vectorizer = CountVectorizer(min_df=1)
X1 = vectorizer.fit_transform(train['question'])
Y1 = vectorizer.fit_transform(test['testing'])
X=X1.toarray()
Y=Y1.toarray()
#print(Y.shape)
number=LabelEncoder()
train['answer']=number.fit_transform(train['answer'].astype('str'))
features = ['question','answer']
y = train['answer']
clf=RandomForestClassifier(n_estimators=100)
clf.fit(X[:25],y)
predicted_result=clf.predict(Y[17])
p_result=number.inverse_transform(predicted_result)
f = open('output.txt', 'w')
t=str(p_result)
f.write(t)
print(p_result)
Answer 0 (score: 1)
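Your code has several problems.
The one relevant to this question is that you fit the CountVectorizer (vectorizer) on both the training data and the test data. Each fit builds its own vocabulary, which is why you end up with a different set of features for each.
What you should do instead is: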
X1 = vectorizer.fit_transform(train['question'])
# The following line is changed
Y1 = vectorizer.transform(test['testing'])
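
To make the difference concrete, here is a minimal sketch using small in-memory lists in place of train.csv and test.csv. The sentences, labels, and variable names are made up for illustration and are not taken from the question's data. It shows that fitting twice yields matrices with different column counts, while fitting once and calling transform on the test data keeps the columns aligned. It also passes a 2-D slice to predict, on the assumption that a single 1-D row such as Y[17] is one of the other problems in the original code.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for train.csv / test.csv
train_questions = ["what is your name", "how old are you", "where do you live"]
train_answers = ["name", "age", "city"]
test_questions = ["tell me how old you are", "what name do you have"]

# Fitting separately builds two unrelated vocabularies,
# so the two matrices have different numbers of columns
wrong_train = CountVectorizer().fit_transform(train_questions)
wrong_test = CountVectorizer().fit_transform(test_questions)
print(wrong_train.shape, wrong_test.shape)

# Fit once on the training text, then reuse that vocabulary for the test text
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_questions)
X_test = vectorizer.transform(test_questions)
print(X_train.shape, X_test.shape)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, train_answers)

# predict expects a 2-D input; slicing with [0:1] keeps the single row 2-D
predicted = clf.predict(X_test[0:1])
print(predicted)
```

Words that appear only in the test text are simply ignored by transform, because they are not in the vocabulary learned from the training text.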