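I want to compress a matrix of shape (m, n) into a matrix of shape (m, 300). I am using TruncatedSVD(n_components=300), which works on my dataset, but when I try to use an LSTM model with the reduced data, the result is an error: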
# Imports inferred from the names used below (Keras 2-style API assumed)
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense
from keras.preprocessing.text import Tokenizer
max_nb_words = 50000
max_length = 12903
tokenizer = Tokenizer(num_words=max_nb_words, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~', lower=True)
tokenizer.fit_on_texts(train_df.values)
word_index = tokenizer.word_index
vectorizer = TfidfVectorizer(ngram_range=(3,3), norm=None)
x = vectorizer.fit_transform(train_df)
# TruncatedSVD accepts the sparse TF-IDF matrix directly, so no toarray() is needed
svd = TruncatedSVD(n_components=300)
x = svd.fit_transform(x)
X_train, X_test, Y_train, Y_test = train_test_split(x,y, test_size = 0.30, random_state = 42)
hidden_nodes=4
word_vec_length=word_index
char_vec_length=max_length
model = Sequential()
model.add(Embedding(max_nb_words, 100, input_length=300))
model.add(LSTM(hidden_nodes, return_sequences=True, input_shape=(word_vec_length, char_vec_length)))
model.add(LSTM(hidden_nodes, activation='softmax'))
model.add(Dropout(0.4, noise_shape=None, seed=None))
model.add(Dense(3859, activation='softmax'))
model.add(Dropout(0.3, noise_shape=None, seed=None))
model.add(Dense(3859, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print('Train...')
history = model.fit(X_train, Y_train, batch_size=200, epochs=2, verbose=1, shuffle=True)
I would like the (m, 300) input matrix to fit the model's parameters.
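For reference, here is a minimal sketch of one way an (m, 300) SVD output could be fed to an LSTM, assuming each of the 300 components is treated as one timestep with a single feature. The data is random, and the names n_labels, X_sparse, X_red and X_seq, as well as the small sizes, are placeholders rather than values from the code above. The Embedding layer is left out of this sketch because the SVD output is real-valued, whereas Embedding expects integer token ids.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

# Placeholder sizes for illustration only (not the sizes from the question).
m, n, n_components, n_labels = 400, 1000, 300, 5

# Stand-in for the sparse TF-IDF matrix and the multi-label targets (random data).
X_sparse = sparse.random(m, n, density=0.05, format="csr", random_state=0)
rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(m, n_labels)).astype("float32")

# Reduce (m, n) -> (m, 300); TruncatedSVD works on sparse input directly.
X_red = TruncatedSVD(n_components=n_components, random_state=0).fit_transform(X_sparse)

# LSTM layers expect 3-D input: (samples, timesteps, features).
# Assumption: 300 timesteps with 1 feature each.
X_seq = X_red.reshape(m, n_components, 1).astype("float32")

model = Sequential([
    Input(shape=(n_components, 1)),         # matches X_seq.shape[1:]
    LSTM(4),                                # returns the last hidden state only
    Dropout(0.4),
    Dense(n_labels, activation="sigmoid"),  # one independent probability per label
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X_seq, Y, batch_size=200, epochs=1, verbose=0)

print(X_red.shape, X_seq.shape, model.output_shape)
```

Whether (300, 1) or (1, 300) is the better layout is a modeling choice. SVD components have no natural ordering, so a plain Dense network on the 2-D matrix may work just as well as an LSTM here.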