Question

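I'm trying to use a GradientBoosting classifier. I turned my training data (X_train) into a sparse TF-IDF matrix and passed it to GradientBoostingClassifier.fit, which works fine. But when I try to apply the model to my test data, it fails with:

"A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array."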

I also tried X_test.toarray(), but that raised a MemoryError. Is there any way around this?
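The MemoryError is what you would expect here: toarray() allocates storage for every cell, including all the zeros that the sparse format never stores, and a unigram+bigram vocabulary is usually very wide. A rough back-of-the-envelope estimate follows. It assumes float64 values and int32 sparse indices, and the corpus sizes are purely hypothetical, not the asker's data:

```python
# Rough memory estimate: dense array vs. CSR sparse matrix.
# All sizes below are hypothetical, chosen only to illustrate the scale.

GIB = 1024 ** 3
MIB = 1024 ** 2


def dense_bytes(n_rows: int, n_cols: int, itemsize: int = 8) -> int:
    """Bytes for a dense n_rows x n_cols array (itemsize=8 assumes float64)."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows: int, nnz: int, itemsize: int = 8, index_size: int = 4) -> int:
    """Approximate bytes for a CSR matrix: values + column indices + row pointer.

    Assumes float64 values and int32 indices.
    """
    return nnz * (itemsize + index_size) + (n_rows + 1) * index_size


n_docs, n_features = 20_000, 200_000  # hypothetical test-set size and vocabulary
nnz = n_docs * 100                    # hypothetical: ~100 distinct n-grams per document

print(f"dense: {dense_bytes(n_docs, n_features) / GIB:.1f} GiB")
print(f"CSR:   {csr_bytes(n_docs, nnz) / MIB:.1f} MiB")
```

With these made-up numbers the dense matrix needs about 29.8 GiB while the CSR form needs about 23 MiB, and the gap widens as the vocabulary grows.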

I'm using a Pipeline, as follows:

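    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.pipeline import Pipeline
    text_clf_1 = Pipeline([('vect', CountVectorizer(stop_words=STOPWORDS, ngram_range=(1, 2))),
                           ('tfidf', TfidfTransformer()),
                           ('clf', GradientBoostingClassifier(verbose=100, n_estimators=100))])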

Training the model works fine:

    text_clf = text_clf_1.fit(X_train, y_train)

But testing gives the error above:

    text_clf_1predicted = text_clf.predict(X_test)

Error: "A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array."

I found a similar question, but when I try "X_test.toarray()", it gives the error

    AttributeError: 'list' object has no attribute 'toarray'
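The AttributeError says X_test is still the raw list of strings: the sparse matrix only exists inside the pipeline, after vect and tfidf have run. So toarray() has to be applied to the transformed features, and to avoid the MemoryError it can be applied to a slice of rows at a time. The following is a minimal sketch, assuming a fitted pipeline whose last step is the classifier and whose earlier steps are transformers. predict_in_batches and batch_size are hypothetical names, and the toy documents are made up:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline


def predict_in_batches(pipeline, raw_docs, batch_size=1000):
    """Run a fitted text pipeline, densifying only `batch_size` rows at a time.

    Assumes every step except the last is a fitted transformer (here vect and
    tfidf) and the last step is the fitted classifier.
    """
    X = raw_docs
    for _, step in pipeline.steps[:-1]:
        X = step.transform(X)  # list of strings -> sparse tf-idf matrix
    clf = pipeline.steps[-1][1]
    preds = []
    for start in range(0, X.shape[0], batch_size):
        batch = X[start:start + batch_size].toarray()  # only this slice is dense
        preds.append(clf.predict(batch))
    return np.concatenate(preds)


# Toy usage with made-up documents.
train_docs = ["good movie", "great film", "bad movie", "awful film", "good plot", "bad plot"]
train_labels = [1, 1, 0, 0, 1, 0]
text_clf = Pipeline([
    ("vect", CountVectorizer(ngram_range=(1, 2))),
    ("tfidf", TfidfTransformer()),
    ("clf", GradientBoostingClassifier(n_estimators=10, random_state=0)),
]).fit(train_docs, train_labels)

new_docs = ["good film", "bad film", "great plot"]
print(predict_in_batches(text_clf, new_docs, batch_size=2))
```

Peak memory then scales with batch_size rather than with the number of test documents, so batch_size can be lowered until one dense slice fits.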

GradientBoostingClassifier.fit accepts a sparse X, but .predict does not.
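Since predict is the step that rejects sparse input, one workaround is to make sure the classifier never receives a sparse matrix at all. This is a swapped-in technique, not part of the original setup: a TruncatedSVD step after tfidf accepts the sparse tf-idf matrix and returns a small dense array, so GradientBoostingClassifier gets dense input in both fit and predict. The sketch below uses made-up data, and n_components=2 only suits this tiny toy vocabulary. Real text would need many more components (something on the order of 100 is a typical starting point, to be tuned):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.pipeline import Pipeline

# Toy data (made up); in the question these would be X_train / y_train / X_test.
train_docs = ["good movie", "great film", "bad movie", "awful film", "good plot", "bad plot"]
train_labels = [1, 1, 0, 0, 1, 0]
test_docs = ["good film", "bad film", "great plot"]

svd_clf = Pipeline([
    ("vect", CountVectorizer(ngram_range=(1, 2))),
    ("tfidf", TfidfTransformer()),
    # TruncatedSVD takes the sparse tf-idf matrix and returns a dense
    # (n_samples, n_components) array, so the classifier never sees sparse input.
    ("svd", TruncatedSVD(n_components=2, random_state=0)),
    ("clf", GradientBoostingClassifier(n_estimators=10, random_state=0)),
])
svd_clf.fit(train_docs, train_labels)
svd_pred = svd_clf.predict(test_docs)
print(svd_pred)
```

The trade-off is that the classifier learns from the SVD projection rather than from the raw n-gram weights, so accuracy may differ from the original pipeline.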
