I am trying to do the computation at the instance level. For efficiency, the current plan is to fetch a batch of data (batch_size = 1024 for now), split the batch up, update the local weights instance by instance (1024 times), and then push the result to the ps to update the weights there.
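Conceptually, the per-batch flow I have in mind looks roughly like this (a toy sketch with placeholder names and shapes, not my actual model; the 1024 per-instance updates are collapsed into one line):

import tensorflow as tf

model_size = 10  # toy size, just for the sketch
# the full weight vector that would live on the ps
w_ps = tf.get_variable("w_ps", shape=[model_size], initializer=tf.zeros_initializer())

# pretend these are the non-zero feature ids of one batch
batch_feature_ids = tf.constant([2, 5, 5, 7], dtype=tf.int64)
active_idx, _ = tf.unique(batch_feature_ids)

w_local = tf.gather(w_ps, active_idx)              # pull only the touched weights
w_local_updated = w_local + 1.0                    # stand-in for the per-instance updates
delta = w_local_updated - w_local                  # accumulated local change
push_op = tf.scatter_add(w_ps, active_idx, delta)  # push the deltas back to the ps

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(push_op))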
A simplified code snippet is below:
def build(self, input_paths, epochs=1, mode='train', variable_partitions=8, config=None):
    variable_partitions = 1
    self.global_step = tf.train.get_or_create_global_step()

    dataset = self.get_dataset(input_paths, mode=mode, epochs=epochs).repeat()
    dataset = dataset.prefetch(1)
    self.next_batch = dataset.make_one_shot_iterator().get_next()
    label, features = self.next_batch

    self.non_zero_i = features.values
    self.idx, _ = tf.unique(self.non_zero_i)
    self.sorted_idx = tf.contrib.framework.sort(self.idx)
    self.shape = self.sorted_idx.shape

    partitioner = tf.min_max_variable_partitioner(
        max_partitions=variable_partitions,
        min_slice_size=64 << 20)
    with tf.variable_scope(
            'linear',
            # values=tuple(six.itervalues(self.next_batch)),
            partitioner=partitioner):
        self.ps_parameters = tf.get_variable(
            name="psconstants", shape=(3, self.model_size),
            initializer=tf.zeros_initializer())

    # pull the partial variables from ps_parameters
    self.local_parameter = tf.gather(self.ps_parameters, self.sorted_idx, axis=1)
    # kept updated during training
    w_init = tf.reshape(tf.gather(self.local_parameter, [0]), [-1])
    self.w_init_var = tf.Variable(w_init, trainable=False, validate_shape=False)
    # kept clean so the final deltas can be computed
    init_w = tf.gather(self.local_parameter, [0])

    self.ops_list = []
    for i in range(self.batch_size):
        # fetch each record via indices;
        # features is a sparse tensor with the non-zero feature indices in values
        line = tf.sparse_slice(features, [i, 0, 0], [i, 1, self.model_size])
        self.ops_list.append(line)
        feas = line.values
        self.ops_list.append(feas)

        # inner loop
        lens = tf.shape(feas, out_type=tf.int32)[0]
        initial_outputs = tf.TensorArray(dtype=tf.int64, size=lens)
        t = tf.constant(0)

        def cond(t, *args):
            return t < lens

        def body(t, *args):
            # some computation
            ...

        t, _, outputs = tf.while_loop(cond, body, [t, ...])  # plus the other loop vars
        outputs = outputs.stack()
        self.ops_list.append(outputs)

    # OTHER COMPUTATIONS
I found that if I do not use ops_list = [] to collect all the ops from the outer for loop, the computation runs correctly, but only once rather than batch_size times. However, when I append all the ops to the list and finally call sess.run(self.ops_list), the following error is returned:
UnimplementedError (see above for traceback): TensorArray has size zero, but element shape <unknown> is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
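If it helps, I can reproduce what looks like the same error in a tiny standalone snippet whenever the TensorArray's size evaluates to zero (e.g. a record that has no non-zero features) and no element_shape is given; supplying a static element_shape makes the toy .stack() succeed (toy dtype/shape, not my real graph):

import tensorflow as tf

lens = tf.constant(0)  # e.g. a record that ends up with no non-zero features

# stacking this one fails with the UnimplementedError above,
# because the element shape is unknown and the size is zero
ta_bad = tf.TensorArray(dtype=tf.int64, size=lens)

# giving a static element_shape avoids it in this toy case
ta_ok = tf.TensorArray(dtype=tf.int64, size=lens,
                       element_shape=tf.TensorShape([]))

with tf.Session() as sess:
    print(sess.run(ta_ok.stack()))    # -> []
    # sess.run(ta_bad.stack())        # -> UnimplementedError: TensorArray has size zero ...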
I don't know how to fix this in my actual graph. I also tried a nested tf.while_loop instead, which leads to a different error from the tf.sparse_slice op:
TypeError: Expected int64, got list containing Tensors of type '_Message' instead.
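My guess for that one is that inside the while_loop the start/size I pass to tf.sparse_slice end up as Python lists that mix the loop-counter Tensor with plain ints; in a standalone toy snippet, building them as int64 tensors with tf.stack / tf.constant avoids the TypeError (toy sparse tensor, not my real features):

import tensorflow as tf

sp = tf.SparseTensor(indices=[[0, 1], [1, 3]],
                     values=tf.constant([7, 9], dtype=tf.int64),
                     dense_shape=[2, 5])

i = tf.constant(0, dtype=tf.int64)               # stands in for the while_loop counter
start = tf.stack([i, tf.constant(0, tf.int64)])  # int64 Tensor instead of a mixed Python list
size = tf.constant([1, 5], dtype=tf.int64)
row = tf.sparse_slice(sp, start, size)

with tf.Session() as sess:
    print(sess.run(row.values))                  # -> [7]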
My use case is incremental training, but so far I have not been able to find a good example of it.
Thanks.