Question

I am running into efficiency problems with the tensorflow function py_func.

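Context

In my project, I have a batch of tensors input_features of size [? max_items m]. The first dimension is set to ? because it is a dynamic shape (the batch is read by a custom tensorflow reader and shuffled with tf.train.shuffle_batch_join()). The second dimension is an upper bound (the maximum number of items I can take for an example), and the third dimension is the feature space dimension. I also have a tensor num_items with the size of the batch (so its shape is (?,)), which gives the number of items in each example; the remaining entries are set to 0 (in numpy notation, input_feature[k, num_items[k]:, :] = 0).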
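
The padding convention described above can be sketched in NumPy. This is only an illustration: the helper name zero_padding and the toy shapes are assumptions, not part of the original project.

```python
import numpy as np

def zero_padding(input_features, num_items):
    """Zero every row at or beyond num_items[k] for each example k.

    input_features: array of shape [batch, max_items, m]
    num_items: integer array of shape [batch]
    """
    max_items = input_features.shape[1]
    # valid[k, j] is True when item j is a real item of example k.
    valid = np.arange(max_items)[None, :] < num_items[:, None]
    # Broadcast the [batch, max_items] mask over the feature axis.
    return input_features * valid[:, :, None]

# Hypothetical toy batch: 2 examples, up to 4 items, 3 features each.
batch = np.ones((2, 4, 3), dtype=np.float32)
num_items = np.array([1, 3])
padded = zero_padding(batch, num_items)
```

The same kind of length-based boolean mask is what tf.sequence_mask builds on the TensorFlow side, so this step does not need py_func.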

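Issue

My workflow needs some custom python operations (especially for index handling; for instance, I need to run clustering operations on chunks of some examples), and I use a few numpy functions wrapped in the py_func function. This works well, but training becomes very slow (around 50 times slower than a model without this py_func), even though the function itself is not time consuming.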

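Questions

1 - Is this increase in computation time normal? The function wrapped in py_func gives me a new tensor that is used further along in the process. Could that explain the computation time? (I mean that gradients may be harder to compute with such a function.)

2 - I am trying to change my processing to avoid py_func. However, extracting data with numpy indexing was very convenient (especially given my data format), and I am having trouble expressing it the TF way. For example, say I have a tensor t1 with shape [-1, n_max, m] (the first dimension is the dynamic batch_size) and a tensor t2 with shape [-1,2] containing integers. Is there a simple way to do a mean operation in tensorflow that yields t_mean_chunk with shape (-1, m), where (in numpy notation) t_mean_chunk[i, :] = np.mean(t1[i, t2[i, 0]:t2[i, 1], :], axis=0)? This is (among other operations) the kind of thing I was doing in the wrapped function.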
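
For reference, the per-example chunk mean asked about in question 2 might look like this as a plain NumPy loop, which is roughly the kind of code that ends up inside py_func. The function name range_mean_loop and the toy inputs are assumptions made for illustration.

```python
import numpy as np

def range_mean_loop(t1, t2):
    """Mean of t1[i, t2[i, 0]:t2[i, 1], :] over the item axis, per example."""
    out = np.empty((t1.shape[0], t1.shape[2]), dtype=t1.dtype)
    for i in range(t1.shape[0]):
        start, stop = t2[i]
        # Average the selected rows of example i.
        out[i, :] = np.mean(t1[i, start:stop, :], axis=0)
    return out

# Hypothetical toy inputs: batch of 2, n_max = 5, m = 3.
t1 = np.arange(2 * 5 * 3, dtype=np.float64).reshape(2, 5, 3)
t2 = np.array([[0, 2], [1, 4]])
t_mean_chunk = range_mean_loop(t1, t2)
```

The Python-level loop over the batch is the part a vectorized TensorFlow formulation would need to replace.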

Answer 1

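Question 1 is hard to answer without the exact py_func, but as hpaulj mentioned in his comment, it is not too surprising that it slows things down. As a worst-case fallback, tf.scan or tf.while_loop with a TensorArray might be somewhat faster. The best case, though, is a vectorized solution built from TensorFlow ops, which I think is possible here.

As for question 2, I am not sure it counts as simple, but here is a function that computes your indexing expression: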

import tensorflow as tf

def range_mean(index_ranges, values):
  """Take the mean of `values` along ranges specified by `index_ranges`.

  return[i, ...] = tf.reduce_mean(
    values[i, index_ranges[i, 0]:index_ranges[i, 1], ...], axis=0)

  Args:
    index_ranges: An integer Tensor with shape [N x 2]
    values: A Tensor with shape [N x M x ...].
  Returns:
    A Tensor with shape [N x ...] containing the means of `values` having
    indices in the ranges specified.
  """
  m_indices = tf.range(tf.shape(values)[1])[None]
  # Determine which parts of `values` will be in the result
  selected = tf.logical_and(tf.greater_equal(m_indices, index_ranges[:, :1]),
                            tf.less(m_indices, index_ranges[:, 1:]))
  n_indices = tf.tile(tf.range(tf.shape(values)[0])[..., None],
                      [1, tf.shape(values)[1]])
  segments = tf.where(selected, n_indices + 1, tf.zeros_like(n_indices))
  # Throw out segment 0, since that's our "not included" segment
  segment_sums = tf.unsorted_segment_sum(
      data=values,
      segment_ids=segments, 
      num_segments=tf.shape(values)[0] + 1)[1:]
  divisor = tf.cast(index_ranges[:, 1] - index_ranges[:, 0],
                    dtype=values.dtype)
  # Pad the shape of `divisor` so that it broadcasts against `segment_sums`.
  divisor_shape_padded = tf.reshape(
      divisor,
      tf.concat([tf.shape(divisor), 
                 tf.ones([tf.rank(values) - 2], dtype=tf.int32)], axis=0))
  return segment_sums / divisor_shape_padded

Example usage:

import numpy

index_range_tensor = tf.constant([[2, 4], [1, 6], [0, 3], [0, 9]])
values_tensor = tf.reshape(tf.range(4 * 10 * 5, dtype=tf.float32), [4, 10, 5])
with tf.Session():
  tf_result = range_mean(index_range_tensor, values_tensor).eval()
  index_range_np = index_range_tensor.eval()
  values_np = values_tensor.eval()

for i in range(values_np.shape[0]):
  print("Slice {}: ".format(i),
        tf_result[i],
        numpy.mean(values_np[i, index_range_np[i, 0]:index_range_np[i, 1], :],
                   axis=0))

Prints:

Slice 0:  [ 12.5  13.5  14.5  15.5  16.5] [ 12.5  13.5  14.5  15.5  16.5]
Slice 1:  [ 65.  66.  67.  68.  69.] [ 65.  66.  67.  68.  69.]
Slice 2:  [ 105.  106.  107.  108.  109.] [ 105.  106.  107.  108.  109.]
Slice 3:  [ 170.  171.  172.  173.  174.] [ 170.  171.  172.  173.  174.]
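
The masking idea behind range_mean can also be checked without a TensorFlow session. The sketch below is a NumPy mirror, not the answer's code: it swaps the unsorted segment sum for a simpler multiply-by-mask followed by a sum, handles only 3-D values, and assumes values contains no NaN or inf (those would leak through the multiplication). The name range_mean_masked is an assumption.

```python
import numpy as np

def range_mean_masked(index_ranges, values):
    """Vectorized mean of values[i, start_i:stop_i, :] for 3-D `values`."""
    m_indices = np.arange(values.shape[1])[None, :]
    # selected[i, j] is True when start_i <= j < stop_i.
    selected = (m_indices >= index_ranges[:, :1]) & (m_indices < index_ranges[:, 1:])
    # Zero out the unselected rows, then sum over the item axis.
    sums = (values * selected[:, :, None]).sum(axis=1)
    # Number of selected rows per example, used as the divisor.
    counts = (index_ranges[:, 1] - index_ranges[:, 0]).astype(values.dtype)
    return sums / counts[:, None]

# Same inputs as the usage example above.
index_ranges = np.array([[2, 4], [1, 6], [0, 3], [0, 9]])
values = np.arange(4 * 10 * 5, dtype=np.float32).reshape(4, 10, 5)
result = range_mean_masked(index_ranges, values)
```

On these inputs the rows of result should match the four slices printed above. An empty range (start equal to stop) would divide by zero, so callers would need to guard against it.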

tensorflow py_func is handy but makes my training step very slow.