Keras input shape error when training a CBOW model

Time: 2020-08-12 08:23:32

Tags: numpy keras nlp word-embedding

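I am training a continuous bag-of-words (CBOW) model for word embeddings, where each one-hot vector is a column vector of shape (V, 1). I am using a generator to produce training examples and labels from the corpus, but the input shape is wrong.

(Here V = 5778)

Here is my code: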

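import numpy as np
from tensorflow import keras  # assumption: the post omits its imports; plain `import keras` may have been used

def windows(words, C):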
    i = C
    while len(words) - i > C:
        center = words[i]
        context_words = words[i-C:i] + words[i+1:i+C+1]
        i += 1
        yield context_words, center

def one_hot_rep(word, word_to_index, V):
    vec = np.zeros((V, 1))
    vec[word_to_index[word]] = 1
    return vec

def context_to_one_hot(words, word_to_index, V):
    arr = [one_hot_rep(w, word_to_index, V) for w in words]
    return np.mean(arr, axis=0)

def get_training_examples(words, C, words_to_index, V):
    for context_words, center_word in windows(words, C):
        yield context_to_one_hot(context_words, words_to_index, V), one_hot_rep(center_word, words_to_index, V)

V = len(vocab)
N = 50

w2i, i2w = build_dict(vocab)

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(V, )),
    keras.layers.Dense(units=N, activation='relu'),
    keras.layers.Dense(units=V, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

model.fit_generator(get_training_examples(data, 2, w2i, V), epochs=5, steps_per_epoch=20)

Error I'm getting

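2 answers:

Answer 0: (score: 0)

The Flatten layer expects a NumPy array with at least 3 dimensions, but you are giving it a 2-dimensional one.

Answer 1: (score: 0)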

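I figured out what was causing the error. The model expects input of shape (None, V), where None stands for the batch size that Keras fills in when training starts. I was sending arrays of shape (1, V), though, and once these are batched they pick up an extra leading dimension. For example, (128, 1, V) gets sent, which conflicts with the expected input_shape.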
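
For anyone hitting the same thing, here is a minimal sketch of one way to get rid of the extra dimension. It assumes the fix is to build flat one-hot vectors of shape (V,) and stack them into batches of shape (batch_size, V) inside the generator. The names one_hot_flat and batch_generator and the batch_size parameter are hypothetical, not from the original post, and only NumPy is used so the shapes can be checked without a model.

```python
import numpy as np

def windows(words, C):
    """Yield (context_words, center_word) for every position with C words on each side."""
    for i in range(C, len(words) - C):
        yield words[i - C:i] + words[i + 1:i + C + 1], words[i]

def one_hot_flat(word, word_to_index, V):
    """Hypothetical helper: one-hot vector of shape (V,) instead of (V, 1)."""
    vec = np.zeros(V)
    vec[word_to_index[word]] = 1.0
    return vec

def batch_generator(words, C, word_to_index, V, batch_size):
    """Hypothetical helper: loop over the corpus forever, yielding (x, y) batches.

    Both x and y have shape (batch_size, V), matching a model whose
    input shape is declared as (V,).
    """
    if len(words) <= 2 * C:
        raise ValueError("corpus is too short to form a single window")
    xs, ys = [], []
    while True:
        for context, center in windows(words, C):
            # Average the flat one-hot vectors of the context words: shape (V,)
            xs.append(np.mean([one_hot_flat(w, word_to_index, V) for w in context], axis=0))
            ys.append(one_hot_flat(center, word_to_index, V))
            if len(xs) == batch_size:
                # Stacking (V,) vectors gives (batch_size, V), with no extra axis
                yield np.stack(xs), np.stack(ys)
                xs, ys = [], []
```

With a generator like this, something along the lines of model.fit(batch_generator(data, 2, w2i, V, 128), steps_per_epoch=20, epochs=5) should line up with an input shape of (V,). Recent TensorFlow versions accept generators in fit directly, while older ones use fit_generator as in the question.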