我正在使用tf.distribute.MirroredStrategy()训练textcnn模型,但是当我设置vocab_size = 0或其他错误值时,在此模式下不会报告任何错误。当不使用tf.distribute.MirroredStrategy()时,错误的vocab_size将立即报告错误。 vocab_size使用错误的值:
model=TextCNN(padding_size,vocab_size-10,embed_size,filter_num,num_classes)
model.compile(loss='sparse_categorical_crossentropy',optimizer=tf.keras.optimizers.Adam(),metrics=['accuracy'])
model.fit(train_dataset, epochs=epoch,validation_data=valid_dataset, callbacks=callbacks)
错误:
2 root error(s) found.
(0) Invalid argument: indices[63,10] = 4726 is not in [0, 4726)
[[node text_cnn_1/embedding/embedding_lookup (defined at <ipython-input-7-6ef8a4397184>:37) ]]
[[Adam/Adam/update/AssignSubVariableOp/_45]]
(1) Invalid argument: indices[63,10] = 4726 is not in [0, 4726)
[[node text_cnn_1/embedding/embedding_lookup (defined at <ipython-input-7-6ef8a4397184>:37) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_234431]
但是使用strategy.scope()没有错误,并且运行良好:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
print(vocab_size)
model=TextCNN(padding_size,vocab_size-1000,embed_size,filter_num,num_classes)
model.compile(loss='sparse_categorical_crossentropy',optimizer=tf.keras.optimizers.Adam(),metrics=['accuracy'])
model.fit(train_dataset, epochs=epoch,validation_data=valid_dataset, callbacks=callbacks)
这样的日志(看起来很好):
Learning rate for epoch 1 is 0.0010000000474974513
2813/2813 [==============================] - 16s 6ms/step - loss: 0.8097 - accuracy: 0.7418 - val_loss: 0.4567 - val_accuracy: 0.8586 - lr: 0.0010
Epoch 2/15
2813/2813 [==============================] - ETA: 0s - loss: 0.4583 - accuracy: 0.8560
Learning rate for epoch 2 is 0.0010000000474974513
2813/2813 [==============================] - 14s 5ms/step - loss: 0.4583 - accuracy: 0.8560 - val_loss: 0.4051 - val_accuracy: 0.8756 - lr: 0.0010
Epoch 3/15
2810/2813 [============================>.] - ETA: 0s - loss: 0.3909 - accuracy: 0.8768
Learning rate for epoch 3 is 0.0010000000474974513
2813/2813 [==============================] - 14s 5ms/step - loss: 0.3909 - accuracy: 0.8767 - val_loss: 0.3853 - val_accuracy: 0.8844 - lr: 0.0010
Epoch 4/15
2811/2813 [============================>.] - ETA: 0s - loss: 0.2999 - accuracy: 0.9047
Learning rate for epoch 4 is 9.999999747378752e-05
2813/2813 [==============================] - 14s 5ms/step - loss: 0.2998 - accuracy: 0.9047 - val_loss: 0.3700 - val_accuracy: 0.8865 - lr: 1.0000e-04
Epoch 5/15
2807/2813 [============================>.] - ETA: 0s - loss: 0.2803 - accuracy: 0.9114
Learning rate for epoch 5 is 9.999999747378752e-05
2813/2813 [==============================] - 15s 5ms/step - loss: 0.2803 - accuracy: 0.9114 - val_loss: 0.3644 - val_accuracy: 0.8888 - lr: 1.0000e-04
Epoch 6/15
2803/2813 [============================>.] - ETA: 0s - loss: 0.2639 - accuracy: 0.9162
Learning rate for epoch 6 is 9.999999747378752e-05
2813/2813 [==============================] - 14s 5ms/step - loss: 0.2636 - accuracy: 0.9163 - val_loss: 0.3615 - val_accuracy: 0.8896 - lr: 1.0000e-04
Epoch 7/15
2805/2813 [============================>.] - ETA: 0s - loss: 0.2528 - accuracy: 0.9188
Learning rate for epoch 7 is 9.999999747378752e-05
2813/2813 [==============================] - 14s 5ms/step - loss: 0.2526 - accuracy: 0.9189 - val_loss: 0.3607 - val_accuracy: 0.8909 - lr: 1.0000e-04
更简单地,像这样运行并没有错误:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
model = Sequential()
model.add(Embedding(1000, 64, input_length=20))
test_array=np.random.randint(10000,size=(32,20))
model.predict(test_array)
为什么?