Question

I have been trying to run the LibriSpeech problem with tensor2tensor on a Google Colab GPU runtime, but training gets stuck before it actually starts. This is the code, which I got from here:

Python version: 3.6.8

TensorFlow version: 1.14.0

tensor2tensor version: 1.14.0

CUDA version: 10.1

Operating system: Ubuntu 18.04
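
For anyone trying to reproduce this, here is a minimal sketch of one way to read the Python and package versions from a notebook cell. The helper name `report_versions` is hypothetical, and the sketch assumes Python 3.8 or newer for `importlib.metadata`; on an older interpreter, `pip show` reports the same information.

```python
import platform
from importlib import metadata


def report_versions(packages=("tensorflow", "tensor2tensor")):
    """Return a dict mapping 'python' and each package name to a version string."""
    # Version of the interpreter that is executing this cell.
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            # Looks up the installed distribution's metadata without importing it.
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version}")
```

This only reports what the notebook's own interpreter sees; a command launched with `!` may resolve to a different Python installation.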

!t2t-trainer \
    --tmp_dir='/content/gdrive/My Drive/TCC/T2T LibriSpeech/tmp/' \
    --problem='librispeech_clean_small' \
    --model='transformer' \
    --train_steps=2 \
    --eval_steps=1 \
    --hparams_set='transformer_librispeech' \
    --data_dir='/content/gdrive/My Drive/TCC/T2T LibriSpeech/data/' \
    --output_dir='/content/gdrive/My Drive/TCC/T2T LibriSpeech/output/'

This is the output:

WARNING: Logging before flag parsing goes to stderr.
W0826 18:25:13.545084 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/expert_utils.py:68: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

W0826 18:25:14.192255 139977848022912 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0826 18:25:15.506344 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/adafactor.py:27: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

W0826 18:25:15.506757 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/multistep_optimizer.py:32: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

W0826 18:25:15.512568 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/mesh_tensorflow/ops.py:4237: The name tf.train.CheckpointSaverListener is deprecated. Please use tf.estimator.CheckpointSaverListener instead.

W0826 18:25:15.512784 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/mesh_tensorflow/ops.py:4260: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

W0826 18:25:15.551204 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_lib.py:105: The name tf.OptimizerOptions is deprecated. Please use tf.compat.v1.OptimizerOptions instead.

W0826 18:25:16.042501 139977848022912 deprecation_wrapper.py:119] From /usr/local/bin/t2t-trainer:32: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

W0826 18:25:16.042751 139977848022912 deprecation_wrapper.py:119] From /usr/local/bin/t2t-trainer:32: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

W0826 18:25:16.042839 139977848022912 deprecation_wrapper.py:119] From /usr/local/bin/t2t-trainer:33: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

:::MLPv0.5.0 transformer 1566843916.257424116 (/usr/local/bin/t2t-trainer:28) run_set_random_seed
I0826 18:25:16.257472 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843916.257424116 (/usr/local/bin/t2t-trainer:28) run_set_random_seed
W0826 18:25:16.257829 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_lib.py:789: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

W0826 18:25:16.259287 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_lib.py:142: The name tf.gfile.Exists is deprecated. Please use tf.io.gfile.exists instead.

W0826 18:25:16.260453 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_lib.py:117: The name tf.GraphOptions is deprecated. Please use tf.compat.v1.GraphOptions instead.

W0826 18:25:16.260685 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_lib.py:123: The name tf.GPUOptions is deprecated. Please use tf.compat.v1.GPUOptions instead.

W0826 18:25:16.260875 139977848022912 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_lib.py:278: __init__ (from tensorflow.contrib.learn.python.learn.estimators.run_config) is deprecated and will be removed in a future version.
Instructions for updating:
When switching to tf.estimator.Estimator, use tf.estimator.RunConfig instead.
W0826 18:25:16.261037 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_lib.py:301: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

I0826 18:25:16.261096 139977848022912 trainer_lib.py:301] Configuring DataParallelism to replicate the model.
I0826 18:25:16.261162 139977848022912 devices.py:76] schedule=continuous_train_and_eval
I0826 18:25:16.261207 139977848022912 devices.py:77] worker_gpu=1
I0826 18:25:16.261248 139977848022912 devices.py:78] sync=False
W0826 18:25:16.261311 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/devices.py:139: The name tf.logging.warn is deprecated. Please use tf.compat.v1.logging.warn instead.

W0826 18:25:16.261358 139977848022912 devices.py:141] Schedule=continuous_train_and_eval. Assuming that training is running on a single machine.
I0826 18:25:16.261914 139977848022912 devices.py:170] datashard_devices: ['gpu:0']
I0826 18:25:16.261970 139977848022912 devices.py:171] caching_devices: None
I0826 18:25:16.262233 139977848022912 devices.py:172] ps_devices: ['gpu:0']
I0826 18:25:16.262940 139977848022912 estimator.py:209] Using config: {'_save_checkpoints_secs': None, '_num_ps_replicas': 0, '_keep_checkpoint_max': 20, '_task_type': None, '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f4ec60062d0>, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_protocol': None, '_save_checkpoints_steps': 1000, '_keep_checkpoint_every_n_hours': 10000, '_session_config': gpu_options {
  per_process_gpu_memory_fraction: 0.95
}
allow_soft_placement: true
graph_options {
  optimizer_options {
    global_jit_level: OFF
  }
}
, '_model_dir': '/content/gdrive/My Drive/TCC/T2T LibriSpeech/output/', 'use_tpu': False, '_tf_random_seed': None, '_master': '', '_device_fn': None, '_num_worker_replicas': 0, '_task_id': 0, '_log_step_count_steps': 100, '_experimental_max_worker_delay_secs': None, '_evaluation_master': '', '_eval_distribute': None, 'data_parallelism': <tensor2tensor.utils.expert_utils.Parallelism object at 0x7f4ec6006310>, '_environment': 'local', '_save_summary_steps': 100, 't2t_device_info': {'num_async_replicas': 1}}
W0826 18:25:16.263118 139977848022912 model_fn.py:630] Estimator's model_fn (<function wrapping_model_fn at 0x7f4ec6005500>) includes params argument, but params are not passed to Estimator.
W0826 18:25:16.263513 139977848022912 trainer_lib.py:722] ValidationMonitor only works with --schedule=train_and_evaluate
W0826 18:25:16.264281 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_trainer.py:310: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.

W0826 18:25:16.266737 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_trainer.py:326: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.

I0826 18:25:16.276473 139977848022912 estimator_training.py:186] Not using Distribute Coordinator.
I0826 18:25:16.276839 139977848022912 training.py:612] Running training and evaluation locally (non-distributed).
I0826 18:25:16.277188 139977848022912 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 1000 or save_checkpoints_secs None.
W0826 18:25:16.282742 139977848022912 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/training/training_util.py:236: initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
:::MLPv0.5.0 transformer 1566843916.300036907 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/problem.py:759) input_max_length: 1240000
I0826 18:25:16.300060 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843916.300036907 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/problem.py:759) input_max_length: 1240000
I0826 18:25:16.300235 139977848022912 problem.py:614] Reading data files from /content/gdrive/My Drive/TCC/T2T LibriSpeech/data/librispeech_clean_small-train*
I0826 18:25:16.321804 139977848022912 problem.py:644] partition: 0 num_data_files: 100
:::MLPv0.5.0 transformer 1566843916.322568893 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/problem.py:872) input_order
I0826 18:25:16.322586 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843916.322568893 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/problem.py:872) input_order
W0826 18:25:16.324836 139977848022912 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/problem.py:654: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
W0826 18:25:16.324963 139977848022912 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow/contrib/data/python/ops/interleave_ops.py:77: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0826 18:25:16.361968 139977848022912 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/layers/common_audio.py:92: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0826 18:25:16.458050 139977848022912 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/layers/common_audio.py:115: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0826 18:25:16.617947 139977848022912 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/problem.py:172: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
W0826 18:25:17.132765 139977848022912 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/problem.py:893: output_shapes (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(dataset)`.
W0826 18:25:17.133136 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/problem.py:896: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.

W0826 18:25:17.133249 139977848022912 problem.py:897] Shapes are not fully defined. Assuming batch_size means tokens.
W0826 18:25:17.153662 139977848022912 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/problem.py:947: bucket_by_sequence_length (from tensorflow.contrib.data.python.ops.grouping) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.bucket_by_sequence_length(...)`.
W0826 18:25:17.174197 139977848022912 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/data/experimental/ops/grouping.py:193: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0826 18:25:17.212933 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/problem.py:1209: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

I0826 18:25:17.264877 139977848022912 estimator.py:1145] Calling model_fn.
I0826 18:25:17.276007 139977848022912 t2t_model.py:1905] Unsetting shared_embedding_and_softmax_weights.
:::MLPv0.5.0 transformer 1566843917.277432919 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:59) model_hp_embedding_shared_weights: {"vocab_size": 256, "hidden_size": 384}
I0826 18:25:17.277457 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843917.277432919 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:59) model_hp_embedding_shared_weights: {"vocab_size": 256, "hidden_size": 384}
I0826 18:25:17.277616 139977848022912 t2t_model.py:1905] Setting T2TModel mode to 'train'
W0826 18:25:17.333712 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py:167: The name tf.summary.text is deprecated. Please use tf.compat.v1.summary.text instead.

:::MLPv0.5.0 transformer 1566843919.260778904 (/tmp/tmpurXynh.py:100) model_hp_initializer_gain: 1.0
W0826 18:25:21.170526 139977848022912 ag_logging.py:145] Entity <bound method PythonHandler.emit of <absl.logging.PythonHandler object at 0x7f4eea764510>> could not be transformed and will be executed as-is. Please report this to the AutgoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: The global keyword is not yet supported.
I0826 18:25:20.087136 139977848022912 api.py:452] :::MLPv0.5.0 transformer 1566843919.260778904 (/tmp/tmpurXynh.py:100) model_hp_initializer_gain: 1.0
I0826 18:25:21.171293 139977848022912 api.py:255] Using variable initializer: uniform_unit_scaling
I0826 18:25:21.585083 139977848022912 t2t_model.py:1905] Transforming feature 'inputs' with speech_recognition_modality.bottom
W0826 18:25:21.586750 139977848022912 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/layers/modalities.py:585: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
I0826 18:25:21.853492 139977848022912 t2t_model.py:1905] Transforming feature 'targets' with symbol_modality_256_384.targets_bottom
I0826 18:25:21.967683 139977848022912 t2t_model.py:1905] Building model body
:::MLPv0.5.0 transformer 1566843922.029903889 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:186) model_hp_layer_postprocess_dropout: 0.2
I0826 18:25:22.029930 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843922.029903889 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:186) model_hp_layer_postprocess_dropout: 0.2
W0826 18:25:22.030196 139977848022912 deprecation.py:506] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:92: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
:::MLPv0.5.0 transformer 1566843922.038706064 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:101) model_hp_hidden_layers: 6
I0826 18:25:22.038723 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843922.038706064 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:101) model_hp_hidden_layers: 6
:::MLPv0.5.0 transformer 1566843922.039638042 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:101) model_hp_attention_dropout: 0.1
I0826 18:25:22.039654 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843922.039638042 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:101) model_hp_attention_dropout: 0.1
:::MLPv0.5.0 transformer 1566843922.040515900 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:101) model_hp_attention_dense: {"num_heads": 2, "use_bias": "false", "hidden_size": 384}
I0826 18:25:22.040528 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843922.040515900 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:101) model_hp_attention_dense: {"num_heads": 2, "use_bias": "false", "hidden_size": 384}
W0826 18:25:22.067670 139977848022912 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/layers/common_layers.py:2926: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
W0826 18:25:22.656837 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/layers/common_attention.py:1171: The name tf.summary.image is deprecated. Please use tf.compat.v1.summary.image instead.

:::MLPv0.5.0 transformer 1566843925.825310946 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:101) model_hp_norm: {"hidden_size": 384}
I0826 18:25:25.825342 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843925.825310946 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:101) model_hp_norm: {"hidden_size": 384}
:::MLPv0.5.0 transformer 1566843925.911139011 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:202) model_hp_layer_postprocess_dropout: 0.2
I0826 18:25:25.911170 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843925.911139011 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:202) model_hp_layer_postprocess_dropout: 0.2
:::MLPv0.5.0 transformer 1566843925.919984102 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:153) model_hp_hidden_layers: 4
I0826 18:25:25.920005 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843925.919984102 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:153) model_hp_hidden_layers: 4
:::MLPv0.5.0 transformer 1566843925.920943975 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:153) model_hp_attention_dropout: 0.1
I0826 18:25:25.920955 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843925.920943975 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:153) model_hp_attention_dropout: 0.1
:::MLPv0.5.0 transformer 1566843925.921816111 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:153) model_hp_attention_dense: {"num_heads": 2, "use_bias": "false", "hidden_size": 384}
I0826 18:25:25.921828 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843925.921816111 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:153) model_hp_attention_dense: {"num_heads": 2, "use_bias": "false", "hidden_size": 384}
:::MLPv0.5.0 transformer 1566843929.180166006 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:153) model_hp_norm: {"hidden_size": 384}
I0826 18:25:29.180202 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843929.180166006 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/transformer.py:153) model_hp_norm: {"hidden_size": 384}
I0826 18:25:29.203882 139977848022912 t2t_model.py:1905] Transforming body output with symbol_modality_256_384.top
:::MLPv0.5.0 transformer 1566843929.399625063 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py:582) opt_learning_rate: "DEFERRED: acbbcfbd-a251-4ef2-8405-8a4a42bb03d7"
I0826 18:25:29.399658 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843929.399625063 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py:582) opt_learning_rate: "DEFERRED: acbbcfbd-a251-4ef2-8405-8a4a42bb03d7"
:::MLPv0.5.0 transformer 1566843929.400494099 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py:582) opt_learning_rate_warmup_steps: 8000
I0826 18:25:29.400509 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843929.400494099 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py:582) opt_learning_rate_warmup_steps: 8000
W0826 18:25:29.400695 139977848022912 deprecation_wrapper.py:119] From /usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/learning_rate.py:100: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

I0826 18:25:29.401774 139977848022912 learning_rate.py:29] Base learning rate: 2.000000
I0826 18:25:29.414273 139977848022912 optimize.py:251] Trainable Variables Total size: 70441856
I0826 18:25:29.414625 139977848022912 optimize.py:251] Non-trainable variables Total size: 5
I0826 18:25:29.415019 139977848022912 optimize.py:89] Using optimizer Adam
:::MLPv0.5.0 transformer 1566843929.415780067 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/optimize.py:53) opt_name: "Adam"
I0826 18:25:29.415796 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843929.415780067 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/optimize.py:53) opt_name: "Adam"
:::MLPv0.5.0 transformer 1566843929.416517019 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/optimize.py:53) opt_hp_Adam_beta1: 0.9
I0826 18:25:29.416527 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843929.416517019 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/optimize.py:53) opt_hp_Adam_beta1: 0.9
:::MLPv0.5.0 transformer 1566843929.417274952 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/optimize.py:53) opt_hp_Adam_beta2: 0.997
I0826 18:25:29.417284 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843929.417274952 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/optimize.py:53) opt_hp_Adam_beta2: 0.997
:::MLPv0.5.0 transformer 1566843929.418004036 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/optimize.py:53) opt_hp_Adam_epsilon: 1e-09
I0826 18:25:29.418014 139977848022912 mlperf_log.py:156] :::MLPv0.5.0 transformer 1566843929.418004036 (/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/optimize.py:53) opt_hp_Adam_epsilon: 1e-09
I0826 18:25:40.607661 139977848022912 estimator.py:1147] Done calling model_fn.
I0826 18:25:40.609323 139977848022912 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
I0826 18:25:45.471376 139977848022912 monitored_session.py:240] Graph was finalized.
2019-08-26 18:25:45.471909: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX512F
2019-08-26 18:25:45.477428: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2000155000 Hz
2019-08-26 18:25:45.477710: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x562891013c00 executing computations on platform Host. Devices:
2019-08-26 18:25:45.477743: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-08-26 18:25:45.480285: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-08-26 18:25:45.634041: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-26 18:25:45.634841: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x562891013f80 executing computations on platform CUDA. Devices:
2019-08-26 18:25:45.634873: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2019-08-26 18:25:45.635107: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-26 18:25:45.635709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-08-26 18:25:45.636103: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-08-26 18:25:45.637595: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-08-26 18:25:45.639006: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-08-26 18:25:45.639369: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-08-26 18:25:45.640924: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-08-26 18:25:45.642059: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-08-26 18:25:45.645505: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-08-26 18:25:45.645641: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-26 18:25:45.646182: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-26 18:25:45.646660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-26 18:25:45.646722: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-08-26 18:25:45.647904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-26 18:25:45.647930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-08-26 18:25:45.647940: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-08-26 18:25:45.648047: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-26 18:25:45.648618: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-26 18:25:45.649113: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:40] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2019-08-26 18:25:45.649167: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14325 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
2019-08-26 18:25:49.626228: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
I0826 18:25:53.582676 139977848022912 session_manager.py:500] Running local_init_op.
I0826 18:25:53.897416 139977848022912 session_manager.py:502] Done running local_init_op.
I0826 18:26:06.040513 139977848022912 basic_session_run_hooks.py:606] Saving checkpoints for 0 into /content/gdrive/My Drive/TCC/T2T LibriSpeech/output/model.ckpt.
2019-08-26 18:26:28.882768: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0

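It gets stuck on the last line. Both TensorFlow and tensor2tensor are at version 1.14.0. Any ideas on how to solve this? I posted an issue on tensor2tensor's GitHub repository a week ago, but have not received an answer yet.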

tensor2tensor training does not start on Google Colab with a GPU runtime

0 answers: