I was given a working Cloud TPU server, but I'm not sure how to train my model on it. I have a Jupyter notebook on the server, but when I run:
import tensorflow as tf

try:
    tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
except ValueError:
    tpu = None
gpus = tf.config.experimental.list_logical_devices("GPU")

# Select appropriate distribution strategy
if tpu:
    tf.tpu.experimental.initialize_tpu_system(tpu)
    # Going back and forth between TPU and host is expensive.
    # Better to run 128 batches on the TPU before reporting back.
    strategy = tf.distribute.experimental.TPUStrategy(tpu, steps_per_run=128)
    print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
elif len(gpus) > 1:
    strategy = tf.distribute.MirroredStrategy([gpu.name for gpu in gpus])
    print('Running on multiple GPUs ', [gpu.name for gpu in gpus])
elif len(gpus) == 1:
    strategy = tf.distribute.get_strategy()  # default strategy that works on CPU and single GPU
    print('Running on single GPU ', gpus[0].name)
else:
    strategy = tf.distribute.get_strategy()  # default strategy that works on CPU and single GPU
    print('Running on CPU')
print("Number of accelerators: ", strategy.num_replicas_in_sync)
the output is:

Running on CPU
Number of accelerators:  1
However, when I run tf.tpu.core(0), the output is device:TPU_REPLICATED_CORE:0. So I'm sure the server has a TPU, but again, I'm not sure how to train my model on it.
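From what I've read in the TF2 docs, a Cloud TPU usually has to be connected to explicitly (the parameterless TPUClusterResolver() only works when a TPU address is discoverable, e.g. via the TPU_NAME environment variable). Here is a minimal sketch of what I think that should look like; reading the address from TPU_NAME is my assumption, and the code falls back to the default strategy when no address is set:

```python
import os
import tensorflow as tf

# Hypothetical: the TPU address may come from the TPU_NAME env var, or be
# passed explicitly, e.g. TPUClusterResolver(tpu='grpc://10.0.0.2:8470').
tpu_name = os.environ.get("TPU_NAME")

if tpu_name:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_name)
    # Attach this TF runtime to the remote TPU workers, then initialize them.
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.experimental.TPUStrategy(resolver)
else:
    # No TPU address available: fall back to the default (CPU / single-GPU) strategy.
    strategy = tf.distribute.get_strategy()

print("Number of accelerators: ", strategy.num_replicas_in_sync)
```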
I have a Keras model and am only missing the model.fit call on my data. Here is my model script:
base_model = keras.applications.ResNet50(
    input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3),
    include_top=False,  # set to False to replace the classification head
    weights='imagenet')
base_model.trainable = True

model = keras.Sequential([
    base_model,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(2, activation='sigmoid')
])
# Print out model summary
model.summary()
I am stuck after this stage.
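For reference, this is roughly what I imagine the training step should look like once the strategy is set up: build and compile the model inside strategy.scope(), then call model.fit as usual. To make the sketch self-contained I use the default CPU strategy, weights=None (to skip the ImageNet download), a small IMAGE_SIZE, and random arrays standing in for my real dataset; all of those are stand-ins, not my actual setup:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

IMAGE_SIZE = 32  # stand-in; my real image size is larger

strategy = tf.distribute.get_strategy()  # stand-in for the TPUStrategy above

# Model creation and compilation must happen inside strategy.scope() so the
# variables are created on the strategy's replicas.
with strategy.scope():
    base_model = keras.applications.ResNet50(
        input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3),
        include_top=False,
        weights=None)  # None here only to avoid downloading ImageNet weights
    base_model.trainable = True
    model = keras.Sequential([
        base_model,
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(2, activation='sigmoid')
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

# Hypothetical random data standing in for the real dataset.
x = np.random.rand(8, IMAGE_SIZE, IMAGE_SIZE, 3).astype("float32")
y = np.random.randint(0, 2, size=(8,))

history = model.fit(x, y, batch_size=4, epochs=1, verbose=0)
print("training loss:", history.history["loss"][0])
```

With a real TPUStrategy, only the strategy line would change; model.fit itself distributes the batches across replicas automatically.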