Training TensorFlow BERT fine-tuning on GPU

Date: 2019-11-12 13:42:37

Tags: python tensorflow nvidia nvidia-docker

I am running BERT fine-tuning with the official run_pretraining.py script.

It makes use of TPUEstimator:

estimator = tf.contrib.tpu.TPUEstimator(
      use_tpu=FLAGS.use_tpu,
      model_fn=model_fn,
      config=run_config,
      train_batch_size=FLAGS.train_batch_size,
      eval_batch_size=FLAGS.eval_batch_size)
estimator.train(input_fn=train_input_fn, max_steps=FLAGS.num_train_steps)

where model_fn builds the model:

model = modeling.BertModel(
        config=bert_config,
        is_training=is_training,
        input_ids=input_ids,
        input_mask=input_mask,
        token_type_ids=segment_ids,
        use_one_hot_embeddings=use_one_hot_embeddings)

and creates the train op with the optimizer, returning a TPUEstimatorSpec:

train_op = optimization.create_optimizer(
    total_loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu)

output_spec = tf.contrib.tpu.TPUEstimatorSpec(
    mode=mode,
    loss=total_loss,
    train_op=train_op,
    scaffold_fn=scaffold_fn)
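
For context, the use_tpu flag inside optimization.create_optimizer only decides whether the optimizer gets wrapped for cross-shard gradient aggregation; the train op itself is device-agnostic. Roughly paraphrasing the relevant part of BERT's optimization.py:

optimizer = AdamWeightDecayOptimizer(  # defined in BERT's optimization.py
    learning_rate=learning_rate,
    weight_decay_rate=0.01,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-6,
    exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"])

# Only on TPU is the optimizer wrapped; with use_tpu=False it is used as-is.
if use_tpu:
    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)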

According to the documentation, TPUEstimator is supposed to fall back to CPU/GPU when no TPU is used.
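
run_pretraining.py builds its RunConfig roughly as follows; with use_tpu=False the cluster resolver is None and the TPU-specific settings should be ignored, leaving the estimator free to place ops on local devices:

run_config = tf.contrib.tpu.RunConfig(
    cluster=tpu_cluster_resolver,  # None when use_tpu=False
    master=FLAGS.master,
    model_dir=FLAGS.output_dir,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps,
    tpu_config=tf.contrib.tpu.TPUConfig(
        iterations_per_loop=FLAGS.iterations_per_loop,
        num_shards=FLAGS.num_tpu_cores,
        per_host_input_for_training=tf.contrib.tpu.InputPipelineConfig.PER_HOST_V2))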

The script does no other TensorFlow session setup. When I run it in a Docker container with docker run --runtime=nvidia, training starts on the CPU:

|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:01:00.0  On |                  N/A |
|  0%   47C    P8     7W / 200W |    593MiB /  8111MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
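
(As a sanity check, separate from what nvidia-smi reports, TensorFlow 1.x can print the devices it has actually registered; a GPU-enabled build should list a /device:GPU:0 entry:)

# List the devices this TensorFlow build has registered.
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())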

To make sure the GPU device is visible I added os.environ["CUDA_VISIBLE_DEVICES"]="0" and created a session:

def load_tensorflow_shared_session(self):
        """ Load a Tensorflow/Keras shared session """

        N_CPU = multiprocessing.cpu_count()

        # OMP_NUM_THREADS controls MKL's intra-op parallelization
        # Default to available physical cores
        os.environ['OMP_NUM_THREADS'] = str( max(1, N_CPU) )

        # LP: set Tensorflow logging level
        MXM_DIST = os.getenv('MXM_DIST', 'prod')
        if MXM_DIST == 'prod':
            tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
        else:
            tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.DEBUG)

        # LP: create a config by gpu cpu backend
        config = tf.ConfigProto(
                device_count={ 'GPU' : 1, 'CPU': N_CPU },
                intra_op_parallelism_threads = 0,
                inter_op_parallelism_threads = N_CPU,
                allow_soft_placement=True
            )
        config.gpu_options.allow_growth = True
        config.gpu_options.per_process_gpu_memory_fraction = 0.6

        # LP: create session by config
        self.tf_session = tf.Session(config=config)

        return self.tf_session

and then wrapped everything in it:

train_input_fn = input_fn_builder(
    input_files=input_files,
    max_seq_length=FLAGS.max_seq_length,
    max_predictions_per_seq=FLAGS.max_predictions_per_seq,
    is_training=True)

with session.as_default():
    with session.graph.as_default():
        estimator.train(input_fn=train_input_fn, max_steps=FLAGS.num_train_steps)
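
Note that tf.estimator.Estimator (and hence TPUEstimator) constructs its own tf.Session internally during train(), so a session entered with as_default() outside the estimator is not the one training runs in; a session configuration is normally handed over through RunConfig instead. A minimal sketch, assuming the same FLAGS as above:

session_config = tf.ConfigProto(
    allow_soft_placement=True,
    log_device_placement=True)  # logs the device each op is assigned to
session_config.gpu_options.allow_growth = True

run_config = tf.contrib.tpu.RunConfig(
    model_dir=FLAGS.output_dir,
    session_config=session_config,  # used by the estimator's internal session
    tpu_config=tf.contrib.tpu.TPUConfig(
        iterations_per_loop=FLAGS.iterations_per_loop,
        num_shards=FLAGS.num_tpu_cores))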

I can see GPU:0 in the device list, but the processing nevertheless still runs on the CPU:

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                           
15225 root      20   0 37.956g 0.017t 271720 S 951.8 28.5 904:59.30 run_pretraining  
