I am running BERT fine-tuning with the official run_pretraining.py script. It makes use of TPUEstimator:
estimator = tf.contrib.tpu.TPUEstimator(
    use_tpu=FLAGS.use_tpu,
    model_fn=model_fn,
    config=run_config,
    train_batch_size=FLAGS.train_batch_size,
    eval_batch_size=FLAGS.eval_batch_size)

estimator.train(input_fn=train_input_fn, max_steps=FLAGS.num_train_steps)
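For reference, the run_config passed in above is built earlier in the script roughly like this (paraphrased from run_pretraining.py; tpu_cluster_resolver is None when --use_tpu is false):

run_config = tf.contrib.tpu.RunConfig(
    cluster=tpu_cluster_resolver,  # None when no TPU is configured
    master=FLAGS.master,
    model_dir=FLAGS.output_dir,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps,
    tpu_config=tf.contrib.tpu.TPUConfig(
        iterations_per_loop=FLAGS.iterations_per_loop,
        num_shards=FLAGS.num_tpu_cores))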
Inside model_fn the model is created:
model = modeling.BertModel(
    config=bert_config,
    is_training=is_training,
    input_ids=input_ids,
    input_mask=input_mask,
    token_type_ids=segment_ids,
    use_one_hot_embeddings=use_one_hot_embeddings)
and a TPUEstimatorSpec is built with the optimizer:
train_op = optimization.create_optimizer(
    total_loss, learning_rate, num_train_steps, num_warmup_steps, use_tpu)

output_spec = tf.contrib.tpu.TPUEstimatorSpec(
    mode=mode,
    loss=total_loss,
    train_op=train_op,
    scaffold_fn=scaffold_fn)
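For context, optimization.create_optimizer only wraps the optimizer for TPU execution when use_tpu is set; paraphrased from the repository's optimization.py:

optimizer = AdamWeightDecayOptimizer(
    learning_rate=learning_rate,
    weight_decay_rate=0.01,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-6,
    exclude_from_weight_decay=["LayerNorm", "layer_norm", "bias"])
if use_tpu:
    # Cross-shard aggregation is TPU-only; with use_tpu=False the plain
    # optimizer is returned, so this path should not block GPU placement.
    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)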
According to the documentation, TPUEstimator is supposed to fall back to the GPU when no TPU is available. The script does not set up the TensorFlow session in any other way.
When I run the script in a Docker container with docker run --runtime=nvidia, training starts on the CPU:
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    On   | 00000000:01:00.0  On |                  N/A |
|  0%   47C    P8     7W / 200W |    593MiB /  8111MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
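As a quick sanity check (my own snippet, not part of run_pretraining.py), I can ask TensorFlow which devices it has registered:

import tensorflow as tf
from tensorflow.python.client import device_lib

# A working CUDA setup should list a /device:GPU:0 entry next to the CPU.
for device in device_lib.list_local_devices():
    print(device.name, device.device_type)

# True only if TensorFlow can actually place ops on a CUDA-capable GPU.
print(tf.test.is_gpu_available())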
To make sure the GPU device is visible, I added os.environ["CUDA_VISIBLE_DEVICES"] = "0" and created a session:
import os
import multiprocessing
import tensorflow as tf

def load_tensorflow_shared_session(self):
    """Load a shared TensorFlow/Keras session."""
    N_CPU = multiprocessing.cpu_count()
    # OMP_NUM_THREADS controls MKL's intra-op parallelization.
    # Default to the number of available cores.
    os.environ['OMP_NUM_THREADS'] = str(max(1, N_CPU))
    # LP: set TensorFlow logging level
    MXM_DIST = os.getenv('MXM_DIST', 'prod')
    if MXM_DIST == 'prod':
        tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
    else:
        tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.DEBUG)
    # LP: create a config with both GPU and CPU backends
    config = tf.ConfigProto(
        device_count={'GPU': 1, 'CPU': N_CPU},
        intra_op_parallelism_threads=0,
        inter_op_parallelism_threads=N_CPU,
        allow_soft_placement=True)
    config.gpu_options.allow_growth = True
    config.gpu_options.per_process_gpu_memory_fraction = 0.6
    # LP: create the session from the config
    self.tf_session = tf.Session(config=config)
    return self.tf_session
and then wrapped everything in it:
train_input_fn = input_fn_builder(
    input_files=input_files,
    max_seq_length=FLAGS.max_seq_length,
    max_predictions_per_seq=FLAGS.max_predictions_per_seq,
    is_training=True)

with session.as_default():
    with session.graph.as_default():
        estimator.train(input_fn=train_input_fn, max_steps=FLAGS.num_train_steps)
I can see GPU:0 in the device list, but the processing nevertheless stays on the CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15225 root 20 0 37.956g 0.017t 271720 S 951.8 28.5 904:59.30 run_pretraining
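One thing I am not sure about: an Estimator builds its own sessions from its RunConfig, so the default session created above may never be used for training. A minimal sketch of passing the same ConfigProto through the RunConfig instead, assuming the session_config parameter that tf.contrib.tpu.RunConfig inherits from tf.estimator.RunConfig:

# Sketch: hand the ConfigProto to the estimator via RunConfig so the
# sessions it creates internally use the GPU options configured above.
run_config = tf.contrib.tpu.RunConfig(
    model_dir=FLAGS.output_dir,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps,
    session_config=config,  # the ConfigProto from load_tensorflow_shared_session
    tpu_config=tf.contrib.tpu.TPUConfig(
        iterations_per_loop=FLAGS.iterations_per_loop,
        num_shards=FLAGS.num_tpu_cores))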