Question

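I am building a program that assigns multiple tags/labels to text descriptions. I am using MultiOutputRegressor to label the text descriptions. When I predict on the array of vectorized text, the following error is raised on the last line (y_pred = clf.predict(yTest)):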

ValueError: shapes (74,28) and (3532,2) not aligned: 28 (dim 1) != 3532 (dim 0)

Here is my code:

textList = df.Text
vectorizer2 = TfidfVectorizer(stop_words=stopWords)
vectorizer2.fit(textList)
x = vectorizer2.transform(textList)

tagList = df.Tags
vectorizer = MultiLabelBinarizer()
vectorizer.fit(tagList)
y = vectorizer.transform(tagList)

print("x.shape = " + str(x.shape))
print("y.shape = " + str(y.shape))

xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.50)

nb_clf = MultinomialNB()
sgd = SGDClassifier()
lr = LogisticRegression()
mn = MultinomialNB()

xTrain = csr_matrix(xTrain).toarray()
xTest = csr_matrix(xTest).toarray()
yTrain = csr_matrix(yTrain).toarray()

print("xTrain.shape = " + str(xTrain.shape))
print("xTest.shape = " + str(xTest.shape))
print("yTrain.shape = " + str(yTrain.shape))
print("yTest.shape = " + str(yTest.shape))

for classifier in [nb_clf, sgd, lr, mn]:
    clf = MultiOutputRegressor(classifier)
    clf.fit(xTrain, yTrain)
    y_pred = clf.predict(yTest)

Here is the printed output for the shapes:

x.shape = (147, 3532)
y.shape = (147, 28)
xTrain.shape = (73, 3532)
xTest.shape = (74, 3532)
yTrain.shape = (73, 28)
yTest.shape = (74, 28)

Answer 1

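This is most likely just because you are passing yTest, rather than xTest, as the input to clf.predict(). The model was fitted on xTrain, which has 3532 feature columns, while yTest has only 28 columns, hence the shape mismatch in the error. The last line should read y_pred = clf.predict(xTest).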
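
As a rough illustration of the corrected call, here is a minimal, self-contained sketch. The toy texts and tags are hypothetical stand-ins for df.Text and df.Tags, and it uses MultiOutputClassifier with MultinomialNB instead of the question's MultiOutputRegressor, on the assumption that a classifier wrapper is the more natural fit for 0/1 tag columns. The only part that matters for the error is that predict() receives the feature matrix xTest, not the label matrix yTest.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data standing in for df.Text and df.Tags.
texts = [
    "the cat sat on the mat",
    "dogs chase the ball in the park",
    "stock prices rose sharply today",
    "the bank cut interest rates again",
] * 6
tags = [
    ["animals", "pets"],
    ["animals", "outdoors"],
    ["finance", "markets"],
    ["finance", "banking"],
] * 6

# x has one column per vocabulary term, y has one column per tag.
x = TfidfVectorizer().fit_transform(texts)
y = MultiLabelBinarizer().fit_transform(tags)

xTrain, xTest, yTrain, yTest = train_test_split(
    x, y, test_size=0.50, random_state=0
)

# Assumption: MultiOutputClassifier replaces MultiOutputRegressor here.
clf = MultiOutputClassifier(MultinomialNB())
clf.fit(xTrain, yTrain)

# Predict from the test FEATURES; yTest is only used to compare afterwards.
y_pred = clf.predict(xTest)

print("xTest.shape  =", xTest.shape)
print("y_pred.shape =", y_pred.shape)
print("yTest.shape  =", yTest.shape)
```

With this change y_pred has the same shape as yTest, so the two can be compared directly when scoring the model.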

SkLearn: ValueError shapes not aligned during prediction
