Question

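I am currently doing deep learning with Tensorflow and Keras (Python 3), and I am trying to implement an efficient data generator. Since Keras also provides a data generator implementation, I compared the training speed it achieves with that of my own generator.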

Using the Keras cifar10 example, training with my generator turned out to be about 20% slower than training with the Keras data generator.

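So I experimented with the Keras generator code to find out where the speedup comes from, because my generator is basically just a simple iterator that returns a batch of images without any preprocessing.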
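
For illustration, a pass-through generator of this kind might look roughly like the sketch below. The actual generator is not shown in this post, so the name `simple_batch_generator` and its parameters are hypothetical; the only assumption is that the images and labels are NumPy arrays already held in memory.

```python
import numpy as np


def simple_batch_generator(images, labels, batch_size=32, shuffle=True):
    """Yield (x_batch, y_batch) tuples forever, with no preprocessing.

    Hypothetical sketch: `images` and `labels` are assumed to be NumPy
    arrays with the same length along axis 0.
    """
    n = len(images)
    while True:
        # Draw a new sample order at the start of every epoch.
        order = np.random.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # The last batch of an epoch may be smaller than batch_size.
            yield images[idx], labels[idx]
```

Such a generator does nothing per batch except index into the arrays, which is why a slowdown relative to the augmenting Keras generator is surprising.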

I traced it back to the function random_transform:

def random_transform(self, x):
  # x is a single image, so it doesn't have image number at index 0
  img_row_index = self.row_index - 1
  img_col_index = self.col_index - 1
  img_channel_index = self.channel_index - 1

  # use composition of homographies to generate final transform that needs to be applied
  if self.rotation_range:
    theta = np.pi / 180 * np.random.uniform(-self.rotation_range, self.rotation_range)
  else:
    theta = 0
  rotation_matrix = np.array([[np.cos(theta), -np.sin(theta), 0],
                            [np.sin(theta), np.cos(theta), 0],
                            [0, 0, 1]])
  if self.height_shift_range:
    tx = np.random.uniform(-self.height_shift_range, self.height_shift_range) * x.shape[img_row_index]
  else:
    tx = 0

  if self.width_shift_range:
    ty = np.random.uniform(-self.width_shift_range, self.width_shift_range) * x.shape[img_col_index]
  else:
    ty = 0

  translation_matrix = np.array([[1, 0, tx],
                               [0, 1, ty],
                               [0, 0, 1]])
  if self.shear_range:
    shear = np.random.uniform(-self.shear_range, self.shear_range)
  else:
    shear = 0
  shear_matrix = np.array([[1, -np.sin(shear), 0],
                         [0, np.cos(shear), 0],
                         [0, 0, 1]])

  if self.zoom_range[0] == 1 and self.zoom_range[1] == 1:
    zx, zy = 1, 1
  else:
    zx, zy = np.random.uniform(self.zoom_range[0], self.zoom_range[1], 2)
  zoom_matrix = np.array([[zx, 0, 0],
                        [0, zy, 0],
                        [0, 0, 1]])

  transform_matrix = np.dot(np.dot(np.dot(rotation_matrix, translation_matrix), shear_matrix), zoom_matrix)

  h, w = x.shape[img_row_index], x.shape[img_col_index]
  transform_matrix = transform_matrix_offset_center(transform_matrix, h, w)
  x = apply_transform(x, transform_matrix, img_channel_index,
                    fill_mode=self.fill_mode, cval=self.cval)
  if self.channel_shift_range != 0:
    x = random_channel_shift(x, self.channel_shift_range, img_channel_index)

  if self.horizontal_flip:
    if np.random.random() < 0.5:
      x = flip_axis(x, img_col_index)

  if self.vertical_flip:
    if np.random.random() < 0.5:
      x = flip_axis(x, img_row_index)

  # TODO:
  # channel-wise normalization
  # barrel/fisheye
  return x

If I remove certain parts of the code in this function, training gets slower.

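This confuses me, because I did not expect simple preprocessing for augmentation purposes (i.e. random transformations of the original images) to affect training speed (e.g. how quickly one epoch finishes).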

Can someone explain to me how the code in the random_transform function can affect training speed?

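To reproduce it, take the cifar10 example mentioned above and copy-paste the image generator code into that file. For example, when the call to apply_transform inside random_transform is removed, the time needed to finish one epoch increases.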

I used:

Python 3.4
Tensorflow 0.11.0rc1
Keras 1.1.0
Cuda 8.0
1 GPU Nvidia GeForce 1070 GTX with Nvidia 367.44 Driver

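With the original code, one epoch with a batch size of 32 took 22 seconds; after removing apply_transform (and likewise with my own generator), this increased to 28 seconds.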
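
One way to check whether the difference comes from the generator itself or from somewhere else in the training loop is to time the generator on its own, without the model. The helper below is only a sketch; `time_batches` is a hypothetical name, and it works with any Python generator or iterator.

```python
import time


def time_batches(generator, num_batches):
    """Return the seconds needed to pull `num_batches` batches.

    Hypothetical helper: it only measures data generation, not training.
    """
    start = time.perf_counter()
    for _ in range(num_batches):
        next(generator)
    return time.perf_counter() - start
```

With the 50,000 CIFAR-10 training images and a batch size of 32, one epoch is 1563 batches, so calling `time_batches(gen, 1563)` for each generator variant gives the data-side cost of an epoch, which can then be compared against the 22 and 28 second epoch times.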

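Edit: some additional behavior that I cannot explain: with the default parameters, the shear, rotation, and zoom matrices are identity matrices, so they should not have any effect on the result. As expected, performance stays the same after removing the dot products. However, if I remove the computation of the three matrices, performance gets worse (25 seconds instead of 22)! Even though the matrices are not used anywhere else... Is this a Python/NumPy code optimization bug?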
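
The identity claim can be checked in isolation. The sketch below rebuilds the three matrices exactly as in random_transform, assuming theta = 0, shear = 0, and zx = zy = 1 (the values the function falls back to when the corresponding ranges are unset), and also measures how long building them takes. The function name `build_default_matrices` is hypothetical and exists only for this check.

```python
import timeit

import numpy as np


def build_default_matrices(theta=0.0, shear=0.0, zx=1.0, zy=1.0):
    """Build the rotation, shear, and zoom matrices as in random_transform."""
    rotation_matrix = np.array([[np.cos(theta), -np.sin(theta), 0],
                                [np.sin(theta), np.cos(theta), 0],
                                [0, 0, 1]])
    shear_matrix = np.array([[1, -np.sin(shear), 0],
                             [0, np.cos(shear), 0],
                             [0, 0, 1]])
    zoom_matrix = np.array([[zx, 0, 0],
                            [0, zy, 0],
                            [0, 0, 1]])
    return rotation_matrix, shear_matrix, zoom_matrix


rotation_matrix, shear_matrix, zoom_matrix = build_default_matrices()
product = np.dot(np.dot(rotation_matrix, shear_matrix), zoom_matrix)

# True if each matrix, and their product, equals the 3x3 identity.
all_identity = all(
    np.allclose(m, np.eye(3))
    for m in (rotation_matrix, shear_matrix, zoom_matrix, product)
)

# Average cost of building the three matrices once (seconds per call).
seconds_per_call = timeit.timeit(build_default_matrices, number=10000) / 10000
```

Multiplying `seconds_per_call` by the number of images per epoch gives a rough estimate of how much time this part of the function accounts for, which can be compared against the 3 second difference observed above.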

Keras: does data augmentation affect training speed?

0 answers: