LSTM autoencoder for variable-length text input in Keras

Date: 2018-04-11 13:17:16

Tags: keras lstm autoencoder

Here padded_docs.shape = (736, 50). Because this is an autoencoder, the input and the target are the same array. However, the output of the last LSTM layer is 3-dimensional, while padded_docs, which I pass as the target, is 2-dimensional. How can I fix this?
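The mismatch can be seen from the shapes alone; a minimal sketch (shapes taken from the question, no Keras needed):

```python
# The final LSTM layer below has return_sequences=True and 50 units,
# so it emits one 50-dimensional vector per timestep: a 3-D tensor.
batch, timesteps, units = 736, 50, 50
model_output_shape = (batch, timesteps, units)   # (736, 50, 50)

# The training target, however, is padded_docs itself: a 2-D array.
target_shape = (batch, timesteps)                # (736, 50)

print(len(model_output_shape), len(target_shape))  # 3 vs. 2 dimensions
```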

import pandas as pd
from keras.models import Sequential
from keras.layers import Embedding, LSTM, RepeatVector
from keras.preprocessing import sequence
from keras.preprocessing.text import one_hot

df1 = pd.read_csv('snapdeal_data.csv')
df1 = df1.head(1000)
df2 = df1['Review_Text']
labels = df1['B_Helpfulness']

# Encode each full sentence as a sequence of word indices
# (X_train, vocab_size and embedding_matrix are defined earlier in my script)
encoded_docs = [one_hot(d, vocab_size) for d in X_train]
print(encoded_docs)

# Pad the encoded word sequences to a fixed length
max_length = 50
padded_docs = sequence.pad_sequences(encoded_docs, maxlen=max_length, padding='pre')
print(padded_docs)

model = Sequential()
timesteps = padded_docs.shape[1]
input_dim = max_length
model.add(Embedding(vocab_size + 1, 100, weights=[embedding_matrix],
                    input_length=max_length, trainable=False))
# Encoder
model.add(LSTM(200, return_sequences=True))
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(50))
model.add(RepeatVector(timesteps))
# Decoder
model.add(LSTM(100, return_sequences=True))
model.add(LSTM(200, return_sequences=True))
model.add(LSTM(input_dim, return_sequences=True))

model.compile(loss='mean_squared_error', optimizer='Adam')
model.summary()
model.fit(padded_docs, padded_docs, epochs=100, batch_size=1, shuffle=True, verbose=2)


ValueError: Error when checking target: expected lstm_6 to have 3 dimensions, but got array with shape (736, 50)

1 answer:

Answer 0 (score: 0)

I don't think return_sequences=True should be kept on the last layer of any LSTM-based architecture. With it removed, the final LSTM returns only the last timestep's output, so the model's output becomes 2-dimensional and matches the shape of padded_docs.
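To illustrate the answer's point with the question's shapes (a shape check only, not a full Keras run):

```python
batch, timesteps, units = 736, 50, 50
target_shape = (736, 50)  # padded_docs.shape

# return_sequences=True  -> one vector per timestep (3-D output)
out_with_sequences = (batch, timesteps, units)     # (736, 50, 50)
# return_sequences=False -> only the last timestep's vector (2-D output)
out_without_sequences = (batch, units)             # (736, 50)

print(out_without_sequences == target_shape)  # True: the 2-D output fits the target
```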