So I'm trying to build a word embedding model, but I keep getting this error. During training, the accuracy does not change and the val_loss stays nan.
The raw shape of the data is:
x.shape, y.shape
((94556,), (94556, 2557))
Then I reshaped it to:
xr= np.asarray(x).astype('float32').reshape((-1,1))
yr= np.asarray(y).astype('float32').reshape((-1,1))
((94556, 1), (241779692, 1))
Then I ran it through the model:
model = Sequential()
model.add(Embedding(2557, 64, input_length=150, embeddings_initializer='glorot_uniform'))
model.add(Flatten())
model.add(Reshape((64,), input_shape=(94556, 1)))
model.add(Dense(512, activation='sigmoid'))
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='sigmoid'))
model.add(Dense(1, activation='relu'))
# compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# summarize the model
print(model.summary())
plot_model(model, show_shapes = True, show_layer_names=False)
After training, I get a constant accuracy and a val_loss of nan in every epoch:
history=model.fit(xr, yr, epochs=20, batch_size=32, validation_split=3/9)
Epoch 1/20
WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
1960/1970 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.9996WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 2/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
[epochs 3 through 20 print identical lines: loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996]
I think it must have something to do with the input/output shapes, but I'm not sure. I've tried modifying the model in all sorts of ways, adding layers, removing layers, using different optimizers and different batch sizes, but nothing has worked so far.
Answer 0 (score: 5)
OK, here is how I understand it; please correct me if I'm wrong:

- x contains 94556 integers, each of them the index of one of 2557 words.
- y contains 94556 vectors of 2557 integers, each also encoding the index of one word, but this time one-hot encoded rather than categorically encoded.
- Corresponding pairs from x and y represent two words that appear near each other in the original text.

If I'm right so far, then the following runs correctly:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *

# dummy data with the shapes described above
x = np.random.randint(0, 2557, 94556)
y = np.eye(2557)[np.random.randint(0, 2557, 94556)]  # one-hot rows
xr = x.reshape((-1, 1))
print("x.shape: {}\nxr.shape: {}\ny.shape: {}".format(x.shape, xr.shape, y.shape))

model = Sequential()
model.add(Embedding(2557, 64, input_length=1, embeddings_initializer='glorot_uniform'))
model.add(Reshape((64,)))
model.add(Dense(512, activation='sigmoid'))
model.add(Dense(2557, activation='softmax'))  # output size matches the second dimension of y
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

history = model.fit(xr, y, epochs=20, batch_size=32, validation_split=3/9)
The most important modifications:

- Reshaping y to (-1, 1) was losing the relationship between the elements of x and y: the (94556, 2557) one-hot matrix gets flattened into 94556 × 2557 = 241779692 scalar rows, which is exactly the (241779692, 1) shape you printed.
- The input_length of the Embedding layer should correspond to the second dimension of xr.
- The output dimension of the last layer should be the same as the second dimension of y.

I'm actually surprised the code ran without crashing.
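To see why that reshape is destructive, here is a minimal sketch in plain numpy, with a toy 3-word vocabulary standing in for the real 2557-word one (all names here are illustrative only):

import numpy as np

# toy one-hot targets: 4 samples over a 3-word vocabulary
y = np.eye(3)[[0, 2, 1, 0]]
print(y.shape)       # (4, 3): one row per sample

yr = y.reshape((-1, 1))
print(yr.shape)      # (12, 1): 4 * 3 scalar rows
# Each row of yr is now a lone 0.0 or 1.0; nothing records which sample
# or which word it came from. With the real data this is exactly
# (94556 * 2557, 1) = (241779692, 1).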
Finally, from what I have read, it seems people don't actually train skip-grams this way; instead they try to predict whether a given training example is correct (i.e. whether the two words really occur near each other). Perhaps that is why you ended up with a one-dimensional output in the first place.
Here is a model inspired by https://github.com/PacktPublishing/Deep-Learning-with-Keras/blob/master/Chapter05/keras_skipgram.py:
# The Merge layer from older Keras no longer exists in tf.keras; the
# functional API with a Dot layer is the equivalent formulation.
word_input = Input(shape=(1,))
word_vec = Reshape((64,))(Embedding(2557, 64, embeddings_initializer="glorot_uniform", input_length=1)(word_input))

context_input = Input(shape=(1,))
context_vec = Reshape((64,))(Embedding(2557, 64, embeddings_initializer="glorot_uniform", input_length=1)(context_input))

dot_product = Dot(axes=1)([word_vec, context_vec])
prediction = Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid")(dot_product)

model = Model(inputs=[word_input, context_input], outputs=prediction)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
In this case you would have three vectors, all of the same size, (94556, 1) (or possibly even larger than 94556, since you may have to generate additional negative samples; see the sketch after the training call below):

- x contains integers between 0 and 2556
- y contains integers between 0 and 2556
- output contains 0s and 1s, depending on whether each pair from x and y is a negative or a positive example

Training then looks like:
history = model.fit([x, y], output, epochs=20, batch_size=32, validation_split=3/9)
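For generating those triples, including the negative samples, a sketch along these lines could work. It relies on Keras' skipgrams helper, and the toy corpus here is only a stand-in for the real word-index sequence:

import numpy as np
from tensorflow.keras.preprocessing.sequence import skipgrams

# toy corpus of word indices; skipgrams reserves index 0 as "non-word"
corpus = np.random.randint(1, 2557, 1000).tolist()

# couples: [word, context] index pairs; labels: 1 for real neighbours,
# 0 for randomly drawn negatives (negative_samples=1.0 gives a 1:1 ratio)
couples, labels = skipgrams(corpus, vocabulary_size=2557,
                            window_size=4, negative_samples=1.0)

pairs = np.asarray(couples)
x = pairs[:, 0].reshape((-1, 1))
y = pairs[:, 1].reshape((-1, 1))
output = np.asarray(labels).astype('float32')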