I am trying to do the computation at the instance level. For efficiency, the current plan is to fetch a batch of data (batch_size = 1024 for now), split the batch up, update the local weights instance by instance (1024 times), and then push the result to the ps to update the weights there.
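Conceptually, the per-batch flow I have in mind looks roughly like this (a toy sketch with placeholder names and shapes, not my actual model; the 1024 per-instance updates are collapsed into one line):

import tensorflow as tf

model_size = 10  # toy size, just for the sketch
# the full weight vector that would live on the ps
w_ps = tf.get_variable("w_ps", shape=[model_size], initializer=tf.zeros_initializer())

# pretend these are the non-zero feature ids of one batch
batch_feature_ids = tf.constant([2, 5, 5, 7], dtype=tf.int64)
active_idx, _ = tf.unique(batch_feature_ids)

w_local = tf.gather(w_ps, active_idx)              # pull only the touched weights
w_local_updated = w_local + 1.0                    # stand-in for the per-instance updates
delta = w_local_updated - w_local                  # accumulated local change
push_op = tf.scatter_add(w_ps, active_idx, delta)  # push the deltas back to the ps

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(push_op))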
A simplified code snippet is below:
def build(self, input_paths, epochs=1, mode='train', variable_partitions=8, config=None):
    variable_partitions = 1
    self.global_step = tf.train.get_or_create_global_step()

    dataset = self.get_dataset(input_paths, mode=mode, epochs=epochs).repeat()
    dataset = dataset.prefetch(1)
    self.next_batch = dataset.make_one_shot_iterator().get_next()
    label, features = self.next_batch

    self.non_zero_i = features.values
    self.idx, _ = tf.unique(self.non_zero_i)
    self.sorted_idx = tf.contrib.framework.sort(self.idx)
    self.shape = self.sorted_idx.shape

    partitioner = tf.min_max_variable_partitioner(
        max_partitions=variable_partitions,
        min_slice_size=64 << 20)
    with tf.variable_scope(
            'linear',
            # values=tuple(six.itervalues(self.next_batch)),
            partitioner=partitioner):
        self.ps_parameters = tf.get_variable(
            name="psconstants", shape=(3, self.model_size),
            initializer=tf.zeros_initializer())

    # pull the partial variables from ps_parameters
    self.local_parameter = tf.gather(self.ps_parameters, self.sorted_idx, axis=1)
    # kept updated during training
    w_init = tf.reshape(tf.gather(self.local_parameter, [0]), [-1])
    self.w_init_var = tf.Variable(w_init, trainable=False, validate_shape=False)
    # kept clean so the final deltas can be computed
    init_w = tf.gather(self.local_parameter, [0])

    self.ops_list = []
    for i in range(self.batch_size):
        # fetch each record via indices;
        # features is a sparse tensor with the non-zero feature indices in values
        line = tf.sparse_slice(features, [i, 0, 0], [i, 1, self.model_size])
        self.ops_list.append(line)
        feas = line.values
        self.ops_list.append(feas)

        # inner loop
        lens = tf.shape(feas, out_type=tf.int32)[0]
        initial_outputs = tf.TensorArray(dtype=tf.int64, size=lens)
        t = tf.constant(0)

        def cond(t, *args):
            return t < lens

        def body(t, *args):
            # some computation
            ...

        t, _, outputs = tf.while_loop(cond, body, [t, ...])  # plus the other loop vars
        outputs = outputs.stack()
        self.ops_list.append(outputs)

    # OTHER COMPUTATIONS
I found that if I do not use ops_list = [] to collect all the ops from the outer for loop, the computation runs correctly, but only once rather than batch_size times. However, when I append all the ops to the list and finally call sess.run(self.ops_list), the following error is returned:
UnimplementedError (see above for traceback): TensorArray has size zero, but element shape <unknown> is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
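If it helps, I can reproduce what looks like the same error in a tiny standalone snippet whenever the TensorArray's size evaluates to zero (e.g. a record that has no non-zero features) and no element_shape is given; supplying a static element_shape makes the toy .stack() succeed (toy dtype/shape, not my real graph):

import tensorflow as tf

lens = tf.constant(0)  # e.g. a record that ends up with no non-zero features

# stacking this one fails with the UnimplementedError above,
# because the element shape is unknown and the size is zero
ta_bad = tf.TensorArray(dtype=tf.int64, size=lens)

# giving a static element_shape avoids it in this toy case
ta_ok = tf.TensorArray(dtype=tf.int64, size=lens,
                       element_shape=tf.TensorShape([]))

with tf.Session() as sess:
    print(sess.run(ta_ok.stack()))    # -> []
    # sess.run(ta_bad.stack())        # -> UnimplementedError: TensorArray has size zero ...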
I don't know how to fix this in my actual graph. I also tried a nested tf.while_loop instead, which leads to a different error from the tf.sparse_slice op:
TypeError: Expected int64, got list containing Tensors of type '_Message' instead.
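My guess for that one is that inside the while_loop the start/size I pass to tf.sparse_slice end up as Python lists that mix the loop-counter Tensor with plain ints; in a standalone toy snippet, building them as int64 tensors with tf.stack / tf.constant avoids the TypeError (toy sparse tensor, not my real features):

import tensorflow as tf

sp = tf.SparseTensor(indices=[[0, 1], [1, 3]],
                     values=tf.constant([7, 9], dtype=tf.int64),
                     dense_shape=[2, 5])

i = tf.constant(0, dtype=tf.int64)               # stands in for the while_loop counter
start = tf.stack([i, tf.constant(0, tf.int64)])  # int64 Tensor instead of a mixed Python list
size = tf.constant([1, 5], dtype=tf.int64)
row = tf.sparse_slice(sp, start, size)

with tf.Session() as sess:
    print(sess.run(row.values))                  # -> [7]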
My use case is incremental training, but so far I have not been able to find a good example of it.
Thanks.