SKlearn Tfidfvectorizer with Keras: expected dense_input_1 to have shape

Date: 2017-01-18 15:28:55

Tags: python scikit-learn keras

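I am trying to use SKLearn's TfidfVectorizer with Keras, but I get the following error:

Exception: Error when checking model input: expected dense_input_1 to have shape (None, 126) but got array with shape (700, 116)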

I know it has something to do with the shapes of the matrices, but I can't figure out how to fix it.

vectorizer = TfidfVectorizer(analyzer=self.identity, use_idf=True, max_features=2000)

#a list of sentences
x_train_vec = vectorizer.fit_transform(x_train).toarray()
x_test_vec = vectorizer.fit_transform(self.x_test[i]).toarray()

#labels
y_train = np_utils.to_categorical(y_train, self.nb_classes)
y_test = np_utils.to_categorical(y_test, self.nb_classes)

#get model
model = self.build_model(x_train_vec.shape[1])
model.fit(x_train_vec, y_train, nb_epoch=self.n_epochs, batch_size=self.batch_size, shuffle='batch', verbose=1, validation_data=(x_test_vec, y_test), )

Building the model:

def build_model(self, nb_features):
    print("Building model...")

    model = Sequential()
    model.add(Dense(input_dim = nb_features, output_dim = self.hidden_units_1))
    model.add(Activation('relu'))
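The mismatch can be reproduced with a minimal sketch. It assumes scikit-learn is installed, uses made-up pre-tokenized documents, and uses a hypothetical identity function as a stand-in for the question's self.identity. Calling fit_transform separately on each set builds two different vocabularies, so the two matrices end up with different widths, just like the 126 vs. 116 columns in the error.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def identity(tokens):
    # Hypothetical stand-in for self.identity: the documents are already
    # tokenized, so the analyzer passes the tokens through unchanged.
    return tokens


# Made-up, pre-tokenized documents for illustration only.
x_train = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "bird", "sang"]]
x_test = [["the", "cat", "ran"], ["a", "fish", "swam"]]

vectorizer = TfidfVectorizer(analyzer=identity, use_idf=True, max_features=2000)

# Reproducing the bug: fit_transform on each set refits the vocabulary,
# so each matrix gets one column per distinct token in *its own* set.
x_train_vec = vectorizer.fit_transform(x_train).toarray()
x_test_vec = vectorizer.fit_transform(x_test).toarray()

print(x_train_vec.shape)  # 8 distinct training tokens -> 8 columns
print(x_test_vec.shape)   # 6 distinct test tokens -> 6 columns
```

A Dense layer built with input_dim = x_train_vec.shape[1] (8 here) would reject the 6-column test matrix passed as validation_data, in the same way Keras rejects the 116-column array above.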

2 Answers:

Answer 0 (score: 1)

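The problem was the difference in dimensions between x_train and x_test: the vectorized matrices had different numbers of columns. Changing max_features in the TfidfVectorizer solved it.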

vectorizer = TfidfVectorizer(analyzer=self.identity, use_idf=True, max_features=100)
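One thing worth checking with toy data: capping max_features only makes the widths match when each fitted vocabulary has at least max_features terms. Also, because fit_transform is still called on both sets, each call keeps its own most frequent terms, so a given column need not mean the same token in both matrices. The sketch below assumes scikit-learn is installed, uses made-up pre-tokenized data chosen so the top terms are unambiguous, substitutes max_features=3 for max_features=100 at toy scale, and uses a hypothetical identity function in place of self.identity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def identity(tokens):
    # Hypothetical stand-in for self.identity (documents are pre-tokenized).
    return tokens


# Made-up data: in each set one token appears 3 times and two appear twice,
# so the top-3 terms by frequency are unambiguous.
x_train = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "ran"]]
x_test = [["a", "fish", "swam"], ["a", "fish", "sank"], ["a", "bird", "swam"]]

# max_features=3 plays the role of max_features=100 at this scale.
vectorizer = TfidfVectorizer(analyzer=identity, max_features=3)

x_train_vec = vectorizer.fit_transform(x_train).toarray()
train_vocab = set(vectorizer.vocabulary_)  # terms kept when fitting on train

# Refitting on the test set still builds a separate, capped vocabulary.
x_test_vec = vectorizer.fit_transform(x_test).toarray()
test_vocab = set(vectorizer.vocabulary_)   # terms kept when fitting on test

print(x_train_vec.shape, x_test_vec.shape)     # same width now
print(sorted(train_vocab), sorted(test_vocab))  # but different terms per column
```

Here both matrices are 3 columns wide, but the training columns correspond to cat, ran and the, while the test columns correspond to a, fish and swam. The shapes agree, yet the model would be validated on features that have nothing to do with the ones it was trained on.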

Answer 1 (score: 1)

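When vectorizing the test set, you need to call transform instead of fit_transform, so that the test set is encoded with the vocabulary and IDF weights learned from the training set: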

x_train_vec = vectorizer.fit_transform(x_train).toarray()
x_test_vec = vectorizer.transform(self.x_test[i]).toarray()
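A minimal sketch of this fix, assuming scikit-learn is installed, made-up pre-tokenized documents, and a hypothetical identity function standing in for self.identity. The vectorizer is fitted once on the training set and only applied to the test set, so both matrices share one vocabulary and therefore one width.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def identity(tokens):
    # Hypothetical stand-in for self.identity (documents are pre-tokenized).
    return tokens


# Made-up, pre-tokenized documents for illustration only.
x_train = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "bird", "sang"]]
x_test = [["the", "cat", "ran"], ["a", "fish", "swam"]]

vectorizer = TfidfVectorizer(analyzer=identity, use_idf=True, max_features=2000)

# Learn the vocabulary and IDF weights from the training set only...
x_train_vec = vectorizer.fit_transform(x_train).toarray()
# ...then reuse them for the test set; tokens never seen in training are ignored.
x_test_vec = vectorizer.transform(x_test).toarray()

print(x_train_vec.shape, x_test_vec.shape)  # both 8 columns wide
```

Tokens that only appear in the test set (fish and swam here) have no column and are dropped, so the second test row ends up with a single nonzero entry, for a. Because the widths now always agree, x_train_vec.shape[1] is a safe value to pass to build_model as the input dimension.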