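I want to create embeddings from my own word dataset, and train and test a model on my own labelled data. To do this I created my own word embeddings with word2vec, but I run into a problem when training the model on the labelled data.
Training the model raises an error. My model-creation code: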
# create the tokenizer
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train)
encoded_docs = tokenizer.texts_to_sequences(X_train)
max_length = max([len(s.split()) for s in X_train])
X_train = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_test)
encoded_docs = tokenizer.texts_to_sequences(X_test)
X_test = pad_sequences(encoded_docs, maxlen=max_length, padding='post')
# setup the embedding layer
embeddings = Embedding(input_dim=embedding_matrix.shape[0], output_dim=embedding_matrix.shape[1],
                       weights=[embedding_matrix], input_length=max_length, trainable=False)
new_model = Sequential()
new_model.add(embeddings)
new_model.add(Conv1D(filters=128, kernel_size=5, activation='relu'))
new_model.add(MaxPooling1D(pool_size=2))
new_model.add(Flatten())
new_model.add(Dense(1, activation='sigmoid'))
This is how I create the embedding matrix:
embedding_matrix = np.zeros((len(model.wv.vocab), vector_dim))
for i in range(len(model.wv.vocab)):
    embedding_vector = model.wv[model.wv.index2word[i]]
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
Running this gives the following error:
WARNING:tensorflow:From /Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:1290: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Epoch 1/10
Traceback (most recent call last):
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,2] = 1049 is not in [0, 1045)
[[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/GatherV2/axis)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/embedding-tut/src/main.py", line 359, in <module>
custom_keras_model(embedding_matrix, model.wv)
File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/Collaboration/embedding-tut/src/main.py", line 295, in custom_keras_model
new_model.fit(X_train, y_train, epochs=10, verbose=2)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/models.py", line 867, in fit
initial_epoch=initial_epoch)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/engine/training.py", line 1598, in fit
validation_steps=validation_steps)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/engine/training.py", line 1183, in _fit_loop
outs = f(ins_batch)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2273, in __call__
**self.session_kwargs)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,2] = 1049 is not in [0, 1045)
[[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/GatherV2/axis)]]
Caused by op 'embedding_1/GatherV2', defined at:
File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/Collaboration/embedding-tut/src/main.py", line 359, in <module>
custom_keras_model(embedding_matrix, model.wv)
File "/Users/faysal/Desktop/My Computer/D/Code Workspace/Research-IoT/Collaboration/embedding-tut/src/main.py", line 278, in custom_keras_model
new_model.add(embeddings)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/models.py", line 442, in add
layer(x)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/engine/topology.py", line 602, in __call__
output = self.call(inputs, **kwargs)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/layers/embeddings.py", line 134, in call
out = K.gather(self.embeddings, inputs)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1134, in gather
return tf.gather(reference, indices)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 2736, in gather
return gen_array_ops.gather_v2(params, indices, axis, name=name)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3065, in gather_v2
"GatherV2", params=params, indices=indices, axis=axis, name=name)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/Users/faysal/anaconda2/envs/python3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): indices[27,2] = 1049 is not in [0, 1045)
[[Node: embedding_1/GatherV2 = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/GatherV2/axis)]]
Process finished with exit code 1
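The error occurs when fitting the training data to the model. I think I am computing the shape of the training data, or feeding it into the model, incorrectly.
Answer 0 (score: 1)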
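You are using two different Tokenizers and fitting them separately on the training and test sets, so the token ids do not match between training and testing. The error is raised because an index (1049) occurs that lies outside the range the embedding layer can look up, [0, 1045), which is the number of rows in embedding_matrix. Even if you fix that, your model will not work properly as long as you have two tokenizers.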
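To see why separately fitted tokenizers disagree, here is a minimal sketch in plain Python. `fit_word_index` and `encode` are hypothetical stand-ins that only imitate the idea of a frequency-ranked, 1-based word index; they are not the Keras `Tokenizer`, and the example sentences are made up.

```python
from collections import Counter


def fit_word_index(texts):
    """Map each word to a 1-based id, most frequent first (0 is left for padding)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 1 for i, word in enumerate(ordered)}


def encode(texts, word_index):
    """Turn each text into a list of ids, skipping words missing from the index."""
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]


# Made-up example data.
train_texts = ["the sensor is hot", "the sensor is cold"]
test_texts = ["cold weather breaks the sensor"]

train_index = fit_word_index(train_texts)
test_index = fit_word_index(test_texts)

# The same word gets a different id depending on which corpus the index was fitted on.
print(train_index["cold"], test_index["cold"])  # prints: 4 2

# So the same test sentence encodes differently under the two indexes.
print(encode(test_texts, train_index))
print(encode(test_texts, test_index))
```

A model trained on ids from the first index would read the second encoding as entirely different words, which is why a single shared index matters.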
What you should do is fit a single Tokenizer on all the data (X_train and X_test) and use only that one.
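One related point that may be worth checking, although the answer above does not spell it out: the rows of the embedding matrix probably need to follow the ids of that single tokenizer, not the word2vec model's own ordering. The sketch below assumes `word_index` is a 1-based word-to-id mapping like the Keras Tokenizer's `word_index` attribute, and that `vectors` supports `in` and `[]` lookups by word, as gensim's `model.wv` does. `build_embedding_matrix` is a hypothetical helper and the toy data is invented for illustration.

```python
import numpy as np


def build_embedding_matrix(word_index, vectors, vector_dim):
    """Return a matrix whose row i holds the vector of the word with id i.

    Row 0 stays all zeros for the padding id, and words that have no
    pretrained vector also keep an all-zero row.
    """
    matrix = np.zeros((len(word_index) + 1, vector_dim))
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix


# Invented toy data standing in for tokenizer.word_index and model.wv.
word_index = {"sensor": 1, "hot": 2, "cold": 3}
vectors = {"sensor": np.array([0.1, 0.2]), "cold": np.array([0.3, 0.4])}

embedding_matrix = build_embedding_matrix(word_index, vectors, vector_dim=2)

# Padded id sequences, as pad_sequences(..., padding='post') would produce.
padded = np.array([[1, 2, 0], [3, 1, 0]])

# Every id must be a valid row of the matrix, otherwise the embedding
# lookup fails with an "is not in [0, N)" error like the one in the question.
ids_in_range = padded.max() < embedding_matrix.shape[0]
print(embedding_matrix.shape, ids_in_range)
```

With a matrix built this way, `embedding_matrix.shape[0]` can still be passed as `input_dim`, and it is one larger than the vocabulary size to leave room for the padding id 0.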