I am running my code in distributed mode. It works fine in asynchronous mode, but it fails to run in synchronous mode.
opt = tf.train.MomentumOptimizer(learning_rate=lr_placeholder, momentum=0.9)
opt = tf.train.SyncReplicasOptimizer(opt, replicas_to_aggregate=len(worker_hosts), total_num_replicas=len(worker_hosts), use_locking=True)
train_op = opt.minimize(full_loss, global_step=global_step)
val_op = validation_op(validation_step, vali_top1_error, vali_loss)
sync_replicas_hook = opt.make_session_run_hook(True)
init = tf.global_variables_initializer()
with training.MonitoredTrainingSession(master=server.target, is_chief=True, hooks=[sync_replicas_hook]) as sess:
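The snippet passes True both to make_session_run_hook and to is_chief. In a between-graph setup where every worker runs the same script, the chief flag is usually derived from the worker's task index instead. The following is a minimal sketch of that derivation in plain Python. The helper name sync_settings, the task_index argument, and the host names are hypothetical and do not appear in the original question, and this is not claimed to be the cause of the error.

```python
# Hypothetical helper, not part of the original question: derive the
# per-worker values used by the SyncReplicasOptimizer setup above.
def sync_settings(worker_hosts, task_index):
    """Return the replica counts and the chief flag for one worker task."""
    if not 0 <= task_index < len(worker_hosts):
        raise ValueError("task_index must refer to an entry in worker_hosts")
    num_workers = len(worker_hosts)
    return {
        "replicas_to_aggregate": num_workers,
        "total_num_replicas": num_workers,
        # Assumed convention: worker task 0 acts as the chief.
        "is_chief": task_index == 0,
    }


# Example with made-up host names.
worker_hosts = ["worker0.example:2222", "worker1.example:2222"]
for index in range(len(worker_hosts)):
    print(index, sync_settings(worker_hosts, index))
```

The resulting is_chief value would then be passed to both opt.make_session_run_hook(is_chief) and MonitoredTrainingSession(is_chief=is_chief), so that the two stay consistent on each worker.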
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/python3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef missing attr 'reduction_type' from Op<name=ConditionalAccumulator; signature= -> handle:Ref(string); attr=dtype:type,allowed=[DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, ..., DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]; attr=shape:shape; attr=container:string,default=""; attr=shared_name:string,default=""; attr=reduction_type:string,default="MEAN",allowed=["MEAN", "SUM"]; is_stateful=true>; NodeDef: {{node sync_replicas/conditional_accumulator}} = ConditionalAccumulator[_class=["loc:@sync_replicas/SetGlobalStep"], container="", dtype=DT_FLOAT, shape=[3,3,3,16], shared_name="conv0/conv:0/grad_accum", _device="/job:ps/replica:0/task:0/device:CPU:0"]()

During handling of the above exception, another exception occurred:
Answer 0 (score: 0)
tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef missing attr 'reduction_type' from Op<name=ConditionalAccumulator; signature= -> handle:Ref(string); attr=dtype:type,allowed=[DT_FLOAT, DT_DOUBLE, DT_INT32, DT_UINT8, DT_INT16, ..., DT_UINT16, DT_COMPLEX128, DT_HALF, DT_UINT32, DT_UINT64]; attr=shape:shape; attr=container:string,default=""; attr=shared_name:string,default=""; attr=reduction_type:string,default="MEAN",allowed=["MEAN", "SUM"]; is_stateful=true>; NodeDef: {{node sync_replicas/conditional_accumulator}} = ConditionalAccumulator[_class=["loc:@sync_replicas/SetGlobalStep"], container="", dtype=DT_FLOAT, shape=[3,3,3,16], shared_name="conv0/conv:0/grad_accum", _device="/job:ps/replica:0/task:0/device:CPU:0"]()
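The message says the op definition known to the process running the node (on /job:ps) includes a reduction_type attr, while the NodeDef it received does not carry one. One thing that may be worth checking, as an assumption rather than a confirmed diagnosis, is whether every ps and worker host reports the same tf.__version__. The sketch below only compares version strings that you have collected by hand. The function name and the sample values are made up for illustration.

```python
# Hypothetical diagnostic, not from the original post: group cluster tasks by
# the TensorFlow version string each one reported (e.g. from tf.__version__).
def find_version_mismatch(versions_by_task):
    """Return {version: [tasks]} if more than one version is present, else {}."""
    groups = {}
    for task, version in versions_by_task.items():
        groups.setdefault(version, []).append(task)
    return groups if len(groups) > 1 else {}


# Made-up sample values; replace with what each host actually prints.
reported = {
    "ps:0": "1.12.0",
    "worker:0": "1.11.0",
    "worker:1": "1.11.0",
}
mismatch = find_version_mismatch(reported)
if mismatch:
    for version, tasks in sorted(mismatch.items()):
        print(version, "->", ", ".join(sorted(tasks)))
else:
    print("All tasks report the same version.")
```

If the hosts do report different versions, aligning them would be a reasonable first step before looking further into the SyncReplicasOptimizer configuration.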