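Embedding input shape error: expected embedding_1_input to have shape (25,) but got array with shape (1,)

Date: 2020-02-13 10:53:37

Tags: python tensorflow keras keras-layer embedding

I'm not sure why I keep getting this error. I checked the length of my tokenized and encoded text data, and it matches the input length I chose. Here is the code: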

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

import numpy as np

training_samples = 6603 
max_words = 10000  # We will only consider the top 10,000 words in the dataset

tokenizer = Tokenizer(num_words=max_words)
tokenizer.fit_on_texts(X_train) 
sequences = tokenizer.texts_to_sequences(X_train) 
word_index = tokenizer.word_index
print('Found %s unique tokens.' % len(word_index))
print('Shape of data tensor:', X_train.shape)

max_length = 25
padded_s = pad_sequences(sequences, maxlen=max_length, padding='post')
print(padded_s)

print(padded_s.shape)

y_train = np.array(y_train)
y_test = np.array(y_test)

The output of this is:

Found 10759 unique tokens.
Shape of data tensor: (5942,)
[[  17  119  154 ...    0    0    0]
 [  31  116   40 ...    0    0    0]
 [1925 1711   15 ...  184    0    0]
 ...
 [   6 1915  375 ...    0    0    0]
 [ 693  190   24 ...    0    0    0]
 [   1  570    2 ...    0    0    0]]
(5942, 25)

As the printed shape (5942, 25) shows, each sample has length 25, not 1!

import os

glove_dir = '/Users/xxx/Downloads/glove.6B'
embeddings_index = {}
f = open(os.path.join(glove_dir, 'glove.6B.100d.txt'))
for line in f:
    values = line.split()
    word = values[0]
    coefs = np.asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()
print('Found %s word vectors.' % len(embeddings_index))

embedding_dim = 100

embedding_matrix = np.zeros((max_words, embedding_dim))

for word, i in word_index.items():
    embedding_vector = embeddings_index.get(word)
    if i < max_words:
        if embedding_vector is not None:
            # Words not found in embedding index will be all-zeros.
            embedding_matrix[i] = embedding_vector

from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense
model = Sequential()
model.add(Embedding(max_words, embedding_dim, weights=[embedding_matrix], input_length=max_length, trainable=False))
model.add(Flatten())
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(padded_s, y_train,
                    epochs=10,
                    batch_size=32,
                    validation_data=(X_val, y_val))
model.save_weights('pre_trained_glove_model.h5')

This returns the error:

ValueError: Error when checking input: expected embedding_1_input to have shape (25,) but got array with shape (1,)

Any help would be greatly appreciated. Thanks very much!

1 Answer:

Answer 0 (score: 0)

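Fixed: I forgot to convert the validation set to sequences and pad it, so X_val, passed via validation_data, did not have the (n, 25) shape the Embedding layer expects.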
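To illustrate the fix, here is a minimal, dependency-free sketch. The helpers fit_vocab and encode_and_pad are hypothetical stand-ins for Keras's Tokenizer and pad_sequences (simplified: IDs are assigned in order of first appearance, and long texts are cut at the end), and the sample texts are made up. The point is only that the validation text goes through the same encoding and padding as the training text.

```python
# Hypothetical stand-ins for Keras's Tokenizer and pad_sequences, used only
# to illustrate the fix. They are simplified and are NOT the Keras API.

def fit_vocab(texts):
    """Map each word to an integer ID starting at 1 (0 is reserved for padding)."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def encode_and_pad(texts, vocab, max_length):
    """Turn raw strings into fixed-length rows of IDs, post-padded with 0."""
    rows = []
    for text in texts:
        # Words that were never seen during fitting are dropped.
        ids = [vocab[w] for w in text.lower().split() if w in vocab]
        ids = ids[:max_length]
        rows.append(ids + [0] * (max_length - len(ids)))
    return rows


max_length = 25
X_train = ["the movie was great", "the plot was dull"]  # made-up sample data
X_val = ["great plot", "an unknown movie"]              # made-up sample data

vocab = fit_vocab(X_train)  # fit on the training text only
train_rows = encode_and_pad(X_train, vocab, max_length)
val_rows = encode_and_pad(X_val, vocab, max_length)  # the step that was missing

# Every row, training or validation, now has length max_length.
print(len(train_rows[0]), len(val_rows[0]))
```

With the objects from the question, the corresponding step would presumably be to run tokenizer.texts_to_sequences(X_val), pass the result through pad_sequences with maxlen=max_length and padding='post', and hand that padded array to validation_data in place of the raw X_val.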