I'm running a classifier with scikit-learn's StackingClassifier and I'm getting an error I can't resolve. Here is the code:
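import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, RidgeCV
from sklearn.metrics import homogeneity_score
from sklearn.model_selection import train_test_split

testset = pd.read_csv('testset_200.csv').fillna(0.0)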
X = testset.iloc[:, 0]
y = testset.iloc[:, 1:10]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, shuffle = True)
vec = TfidfVectorizer(stop_words = 'english')
X_train = vec.fit_transform(X_train)
X_test = vec.transform(X_test)
estimators = [('rcv', RidgeCV()),
('rfc', RandomForestClassifier(n_estimators = 10))]
classifier = StackingClassifier(estimators = estimators, final_estimator = LogisticRegression())
y_pred = classifier.fit(X_train, y_train)
y_pred = y_pred.predict(X_test)
y_pred = np.argmax(y_pred, axis = 1)
y_test = np.argmax(y_test.values, axis = 1)
homogeneity_score(y_test, y_pred)  # signature is (labels_true, labels_pred)
This is the error message I get:
ValueError Traceback (most recent call last)
<ipython-input-433-4bc84f21b661> in <module>
3
4 classifier = StackingClassifier(estimators = estimators, final_estimator = LogisticRegression())
----> 5 y_pred = classifier.fit(X_train, y_train)
6 y_pred = y_pred.predict(X_test)
7
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/ensemble/_stacking.py in fit(self, X, y, sample_weight)
409 """
410 check_classification_targets(y)
--> 411 self._le = LabelEncoder().fit(y)
412 self.classes_ = self._le.classes_
413 return super().fit(X, self._le.transform(y), sample_weight)
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/preprocessing/_label.py in fit(self, y)
233 self : returns an instance of self.
234 """
--> 235 y = column_or_1d(y, warn=True)
236 self.classes_ = _encode(y)
237 return self
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/utils/validation.py in column_or_1d(y, warn)
795 return np.ravel(y)
796
--> 797 raise ValueError("bad input shape {0}".format(shape))
798
799
ValueError: bad input shape (139, 9)
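Any help would be greatly appreciated. One thing that may be a hint: when I set the classifier variable to just the RidgeCV() linear model, the code runs correctly.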
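For reference, the traceback points at LabelEncoder().fit(y) inside StackingClassifier.fit, which calls column_or_1d and rejects the 2-D target of shape (139, 9). In the scikit-learn version shown in the traceback, StackingClassifier therefore seems to want a 1-D vector of class labels. Below is a minimal sketch of one possible workaround, not a definitive fix. It assumes each row carries exactly one label, so the nine label columns can be collapsed with np.argmax before fitting (the question already does this to y_test for scoring). The synthetic documents stand in for testset_200.csv and are purely hypothetical. The sketch also swaps RidgeCV, which is a regressor, for RidgeClassifierCV; that substitution is my assumption, not something taken from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, RidgeClassifierCV
from sklearn.metrics import homogeneity_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for testset_200.csv: one text column and
# nine one-hot label columns (exactly one 1 per row is assumed).
n_classes = 9
docs, labels = [], []
for k in range(n_classes):
    for i in range(20):
        docs.append(f"topic{k} word{k} filler{i % 3}")
        labels.append(k)
Y_onehot = np.eye(n_classes, dtype=int)[labels]  # shape (180, 9)

# Collapse the 2-D one-hot target into a 1-D vector of class indices
# BEFORE splitting and fitting.
y = np.argmax(Y_onehot, axis=1)  # shape (180,)

X_train, X_test, y_train, y_test = train_test_split(
    docs, y, test_size=0.3, shuffle=True, stratify=y, random_state=0
)

vec = TfidfVectorizer(stop_words="english")
X_train_vec = vec.fit_transform(X_train)
X_test_vec = vec.transform(X_test)

estimators = [
    # RidgeClassifierCV replaces the question's RidgeCV (an assumption).
    ("rcv", RidgeClassifierCV()),
    ("rfc", RandomForestClassifier(n_estimators=10, random_state=0)),
]
classifier = StackingClassifier(
    estimators=estimators,
    final_estimator=LogisticRegression(max_iter=1000),
)
classifier.fit(X_train_vec, y_train)

# predict() already returns 1-D class indices, so no argmax is needed here.
y_pred = classifier.predict(X_test_vec)
score = homogeneity_score(y_test, y_pred)
print(score)
```

If the rows can actually carry several labels at once, argmax would discard information and this approach would not apply; in that case a multilabel strategy would be needed instead. This may also explain why RidgeCV alone runs: as a regressor it accepts a 2-D target directly.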