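I am running the shell commands from the walkthrough to train a good English-German translation model with the Transformer model, but I ran into a problem. This is the output: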
INFO:tensorflow:Total trainable variables size: 60276736
INFO:tensorflow:Total embedding variables size: 16384
INFO:tensorflow:Total non-embedding variables size: 60260352
INFO:tensorflow:Computing gradients for global model_fn.
INFO:tensorflow:Global model_fn finished.
INFO:tensorflow:Create CheckpointSaverHook.
2017-06-30 15:05:58.562782: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 15:05:58.562814: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 15:05:58.562820: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 15:05:58.562824: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 15:05:58.562829: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
return fn(*args)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
status, run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/contextlib.py", line 66, in __exit__
next(self.gen)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,4,0] = -1 is not in [0, 31488)
[[Node: symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Squeeze)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 532, in run_locally
exp.train()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 505, in run
run_metadata=run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 842, in run
run_metadata=run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 952, in run
run_metadata=run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,4,0] = -1 is not in [0, 31488)
[[Node: symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Squeeze)]]
Caused by op 'symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Gather', defined at:
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 532, in run_locally
exp.train()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 955, in _train_model
model_fn_ops = self._get_train_ops(features, labels)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1162, in _get_train_ops
return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.TRAIN)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features, labels, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 424, in model_fn
len(hparams.problems) - 1)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 751, in _cond_on_index
return fn(cur_idx)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 406, in nth_model
features, skip=(skipping_is_on and skip_this_one))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/t2t_model.py", line 377, in model_fn
sharded_features[key], dp)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/modality.py", line 91, in bottom_sharded
return data_parallelism(self.bottom, xs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/expert_utils.py", line 294, in __call__
outputs.append(fns[i](*my_args[i], **my_kwargs[i]))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/models/modalities.py", line 88, in bottom
return self.bottom_simple(x, "shared", reuse=None)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/models/modalities.py", line 80, in bottom_simple
ret = tf.gather(var, x)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1179, in gather
validate_indices=validate_indices, name=name)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): indices[27,4,0] = -1 is not in [0, 31488)
[[Node: symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Squeeze)]]
INFO:tensorflow:Creating experiment, storing model files in /root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Using config: {'_task_type': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f884bf21630>, '_model_dir': '/root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base', '_save_checkpoints_secs': 600, '_save_summary_steps': 100, '_session_config': allow_soft_placement: true
graph_options {
optimizer_options {
}
}
, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
}
, '_task_id': 0, '_tf_random_seed': None, '_num_ps_replicas': 0, '_evaluation_master': '', '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_master': '', '_is_chief': True, '_num_worker_replicas': 0, '_save_checkpoints_steps': None, '_environment': 'local'}
INFO:tensorflow:Performing Decoding from a file.
INFO:tensorflow:Getting sorted inputs
Traceback (most recent call last):
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 544, in run_locally
decode_from_file(estimator, FLAGS.decode_from_file)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 648, in decode_from_file
as_iterable=True)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
How can I solve this problem? Thanks.
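
For reference, the error message says that a token ID of -1 reached the embedding lookup (`tf.gather`), which only accepts IDs in [0, 31488). The snippet below is only a sketch of that range check in plain Python, so that a batch of IDs could be inspected before it is fed to the model. The function name `find_out_of_range_ids` and the sample batch are hypothetical and are not part of tensor2tensor or TensorFlow.

```python
VOCAB_SIZE = 31488  # upper bound taken from the error message above


def find_out_of_range_ids(ids, vocab_size=VOCAB_SIZE, prefix=()):
    """Return (index, value) pairs for every ID outside [0, vocab_size).

    `ids` may be an arbitrarily nested list/tuple of integers; `index` is
    the tuple of positions leading to the offending value.
    """
    bad = []
    for i, item in enumerate(ids):
        if isinstance(item, (list, tuple)):
            # Recurse into nested dimensions, remembering the path so far.
            bad.extend(find_out_of_range_ids(item, vocab_size, prefix + (i,)))
        elif not 0 <= item < vocab_size:
            bad.append((prefix + (i,), item))
    return bad


# Hypothetical batch shaped [28, 5, 1] with a single invalid ID,
# placed at the position reported in the error message.
batch = [[[0] for _ in range(5)] for _ in range(28)]
batch[27][4][0] = -1

print(find_out_of_range_ids(batch))  # [((27, 4, 0), -1)]
```

If a check like this reports negative IDs in the generated training data, the data or vocabulary files are the first place to look; that is an assumption drawn from the error text, not a confirmed diagnosis.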