How do I use a saved model to make predictions?

Time: 2020-01-26 11:07:38

Tags: python machine-learning scikit-learn

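I built a model with RandomForest that reaches 98% accuracy. I saved the model with pickle and now want to use it to predict on a new dataset. My input is plain-text strings, so I cannot pass it to the model directly. I tried vectorizing and parsing the text, but that did not help.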

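import os
import pickle
import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Load the pickled RandomForest model
modelFile = os.path.join(r'D:\PYPrograms', 'Data', 'model')
with open(modelFile, 'rb') as training_model:
    model = pickle.load(training_model)
# Note: this is a new, unfitted vectorizer, not the one used during training
tf2 = CountVectorizer()
# Read the test data and normalize the column names
File = os.path.join(r'D:\PYPrograms', 'Data', 'POS', 'TestData.csv')
data = pd.read_csv(File)
data.columns = (data.columns.str.strip().str.lower()
                .str.replace(' ', '_', regex=False)
                .str.replace('(', '', regex=False)
                .str.replace(')', '', regex=False))
# Keep only word characters in the title
data.loc[:, "title"] = data.title.apply(lambda x: " ".join(re.findall(r'[\w]+', x)))
df2 = data

df3 = df2["tickettype"] + " " + df2["title"]
#cv_data2 = tf1.transform(df2["type"])
cv_data = tf2.transform(df3)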

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-205-c0ac8462bce6> in <module>
----> 1 model.predict(test)

~\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\ensemble\forest.py in predict(self, X)
    543             The predicted classes.
    544         """
--> 545         proba = self.predict_proba(X)
    546 
    547         if self.n_outputs_ == 1:

~\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\ensemble\forest.py in predict_proba(self, X)
    586         check_is_fitted(self, 'estimators_')
    587         # Check data
--> 588         X = self._validate_X_predict(X)
    589 
    590         # Assign chunk of trees to jobs

~\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\ensemble\forest.py in _validate_X_predict(self, X)
    357                                  "call `fit` before exploiting the model.")
    358 
--> 359         return self.estimators_[0]._validate_X_predict(X, check_input=True)
    360 
    361     @property

~\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\tree\tree.py in _validate_X_predict(self, X, check_input)
    400                              "match the input. Model n_features is %s and "
    401                              "input n_features is %s "
--> 402                              % (self.n_features_, n_features))
    403 
    404         return X

ValueError: Number of features of the model must match the input. Model n_features is 6639 and input n_features is 3 
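
The ValueError above says the forest was trained on 6639 features but received 3. That would be consistent with the text at prediction time not going through the same fitted vectorizer that was used at training time, although the training code is not shown, so this is an assumption. A minimal sketch of one common way to keep the two in sync is to fit the vectorizer and the classifier together in a scikit-learn Pipeline and pickle that single object. The toy strings, labels, and variable names below are hypothetical and only stand in for the "tickettype + title" text from the question:

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for "tickettype + title" strings and labels.
train_text = [
    "incident printer not working",
    "incident laptop will not boot",
    "request new user account",
    "request software license",
]
train_labels = ["hardware", "hardware", "access", "access"]

# Training side: the vectorizer learns its vocabulary here, and the pipeline
# keeps it together with the classifier.
pipeline = make_pipeline(
    CountVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
pipeline.fit(train_text, train_labels)

# Persist vectorizer + model as one object (bytes here; a file works the same way).
blob = pickle.dumps(pipeline)

# Prediction side: load and pass raw strings. The stored vocabulary is reused,
# so the feature count matches what the forest was trained on.
loaded = pickle.loads(blob)
new_text = ["incident printer jammed", "request account for new hire"]
predictions = loaded.predict(new_text)

# Compare the training vocabulary size with the width of the transformed new data.
vectorizer = loaded.named_steps["countvectorizer"]
n_train_features = len(vectorizer.vocabulary_)
n_new_features = vectorizer.transform(new_text).shape[1]
```

If the model has already been trained and only the classifier was pickled, the same idea would apply by also pickling the fitted CountVectorizer from the training script and calling its transform (not fit_transform) on the new text before model.predict.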

The data is available at https://drive.google.com/open?id=1xaKKSXzpr7THezqU_8jycfvAueg0nnCQ

0 answers:

No answers yet