How do I manually use a Google TPU with Tensorflow's Object Detection API?

Asked: 2018-11-13 03:08:42

Tags: tensorflow object-detection-api google-cloud-tpu tpu

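I have successfully trained models with Tensorflow's Object Detection API, both locally on GPUs (using model_main.py) and on Google's ML Engine (GPUs and TPUs). However, when I run on Google Cloud with a manually configured VM and TPU, I can't seem to train a model with model_tpu_main.py.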

When I launch model_tpu_main.py with something like

python -m object_detection.model_tpu_main --model_dir=gs://bucket/training --tpu_zone us-central1-b --pipeline_config_path=gs://bucket/training/pipeline.config --job-dir gs://bucket/training --tpu_name mytpu_name

it gets stuck at:

...
W1113 03:05:16.628712 139998232708864 variables_helper.py:144] Variable [resnet_v1_50/fpn/smoothing_2/BatchNorm/moving_mean] is not available in checkpoint
W1113 03:05:16.629062 139998232708864 variables_helper.py:144] Variable [resnet_v1_50/fpn/smoothing_2/BatchNorm/moving_variance] is not available in checkpoint
W1113 03:05:16.629330 139998232708864 variables_helper.py:144] Variable [resnet_v1_50/fpn/smoothing_2/weights] is not available in checkpoint
2018-11-13 03:06:08.618834: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:349] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
...
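For readability, the invocation above can also be written one flag per line. The following is only a sketch that assembles and prints the argument list from the values already shown; `FLAGS` and `build_command` are names made up for illustration, the bucket and TPU names are the placeholders from the command, and nothing here addresses the hang itself.

```python
import shlex

# Flag values copied from the command in the question.
# "gs://bucket/..." and "mytpu_name" are placeholders, not real resources.
FLAGS = [
    ("--model_dir", "gs://bucket/training"),
    ("--tpu_zone", "us-central1-b"),
    ("--pipeline_config_path", "gs://bucket/training/pipeline.config"),
    ("--job-dir", "gs://bucket/training"),
    ("--tpu_name", "mytpu_name"),
]


def build_command(flags=FLAGS):
    """Return the argv list for the model_tpu_main invocation (hypothetical helper)."""
    argv = ["python", "-m", "object_detection.model_tpu_main"]
    for name, value in flags:
        # Use the --flag=value form uniformly; the original command mixes
        # "--flag=value" and "--flag value".
        argv.append(f"{name}={value}")
    return argv


if __name__ == "__main__":
    # Print a shell-safe version of the command instead of running it.
    print(" ".join(shlex.quote(arg) for arg in build_command()))
```

Printing the command rather than executing it makes it easy to compare the exact flags against a run that works, for example the ML Engine job.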

Looking at the TPU logs, pretty much all I get is:

...
Start master session b9186abfa4e15b1d with config: isolate_session_state: true A 
Start master session 48b812f9ca0d3ebf with config: isolate_session_state: true A 
Start master session 33048226cb131f4c with config: isolate_session_state: true A 
Start master session cab95e277a429f9d with config: isolate_session_state: true A 
Start master session 56b5d3296c9bfe15 with config: isolate_session_state: true A 
Start master session 3fdac64b285c365d with config: isolate_session_state: true A 
Start master session ec1fa14806ad9351 with config: isolate_session_state: true A 
...

Any idea what I'm doing wrong?

0 answers:

No answers yet