Clarification between epochs and iterations

Date: 2017-01-19 12:49:23

Tags: neural-network, epoch

This answer points out the difference between an epoch and an iteration when training a neural network. However, when I look at the source code for the solver API in the Stanford CS231n course (and I assume this is the case for most libraries), in each iteration batch_size examples are selected at random with replacement. So there is no guarantee that all of the examples are seen in every epoch, is there?

Does an epoch then only mean that all of the examples are seen in expectation? Or am I misunderstanding something?

Relevant source code:

  def _step(self):
    """
    Make a single gradient update. This is called by train() and should not
    be called manually.
    """
    # Make a minibatch of training data
    num_train = self.X_train.shape[0]
    batch_mask = np.random.choice(num_train, self.batch_size)
    X_batch = self.X_train[batch_mask]
    y_batch = self.y_train[batch_mask]

    # Compute loss and gradient
    loss, grads = self.model.loss(X_batch, y_batch)
    self.loss_history.append(loss)

    # Perform a parameter update
    for p, w in self.model.params.iteritems():
      dw = grads[p]
      config = self.optim_configs[p]
      next_w, next_config = self.update_rule(w, dw, config)
      self.model.params[p] = next_w
      self.optim_configs[p] = next_config

  def train(self):
    """
    Run optimization to train the model.
    """
    num_train = self.X_train.shape[0]
    iterations_per_epoch = max(num_train / self.batch_size, 1)
    num_iterations = self.num_epochs * iterations_per_epoch

    for t in xrange(num_iterations):
      self._step()

      # Maybe print training loss
      if self.verbose and t % self.print_every == 0:
        print '(Iteration %d / %d) loss: %f' % (
               t + 1, num_iterations, self.loss_history[-1])

      # At the end of every epoch, increment the epoch counter and decay the
      # learning rate.
      epoch_end = (t + 1) % iterations_per_epoch == 0
      if epoch_end:
        self.epoch += 1
        for k in self.optim_configs:
          self.optim_configs[k]['learning_rate'] *= self.lr_decay

      # Check train and val accuracy on the first iteration, the last
      # iteration, and at the end of each epoch.
      first_it = (t == 0)
      last_it = (t == num_iterations - 1)
      if first_it or last_it or epoch_end:
        train_acc = self.check_accuracy(self.X_train, self.y_train,
                                        num_samples=1000)
        val_acc = self.check_accuracy(self.X_val, self.y_val)
        self.train_acc_history.append(train_acc)
        self.val_acc_history.append(val_acc)

        if self.verbose:
          print '(Epoch %d / %d) train acc: %f; val_acc: %f' % (
                 self.epoch, self.num_epochs, train_acc, val_acc)

        # Keep track of the best model
        if val_acc > self.best_val_acc:
          self.best_val_acc = val_acc
          self.best_params = {}
          for k, v in self.model.params.iteritems():
            self.best_params[k] = v.copy()

    # At the end of training swap the best params into the model
    self.model.params = self.best_params
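
To make the concern concrete, here is a small standalone check (not part of the course code; the sizes are made up) of what np.random.choice with its default replace=True does over one nominal epoch:

    import numpy as np

    # Hypothetical sizes, just for illustration.
    num_train, batch_size = 50000, 100
    iterations_per_epoch = num_train // batch_size

    rng = np.random.default_rng(0)
    seen = np.zeros(num_train, dtype=bool)
    for _ in range(iterations_per_epoch):
        batch_mask = rng.choice(num_train, batch_size)  # samples WITH replacement by default
        seen[batch_mask] = True

    print(seen.mean())  # roughly 0.63, i.e. only about 1 - 1/e of the examples get seen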

Thanks.

1 answer:

Answer 0 (score: 0)

I believe that, as you say, in the Stanford course they are effectively using "epoch" with the less strict meaning of "the expected number of times each example is seen during training". However, in my experience most implementations treat an epoch as one pass through every example in the training set, and I would say they simply chose sampling with replacement here because it is easier to implement. If you have a reasonable amount of data, chances are you will not see a difference, but sampling without replacement until there are no examples left is still the more correct thing to do.
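
For comparison, here is a minimal sketch (the names are illustrative, not taken from any particular library) of that more common scheme: shuffle the indices once per epoch and slice them into batches, so every example is visited exactly once per epoch:

    import numpy as np

    def run_epoch(X_train, y_train, batch_size, step_fn, rng):
        """One epoch without replacement: shuffle the indices, then slice into batches."""
        num_train = X_train.shape[0]
        perm = rng.permutation(num_train)  # every index appears exactly once
        for start in range(0, num_train, batch_size):
            batch_idx = perm[start:start + batch_size]
            step_fn(X_train[batch_idx], y_train[batch_idx])  # e.g. one gradient update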

You can check, for example, how training is done in the Keras source code; it is a bit more involved, but the important part is that make_batches is called to split the (possibly shuffled) examples into batches, which matches your initial notion of an "epoch".
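
Roughly speaking, a make_batches-style helper just computes the (start, end) index ranges that cut one pass over the data into batches; the version below is my own simplified sketch, not the actual Keras code:

    def make_batches(size, batch_size):
        """Return (start, end) index pairs covering range(size) in batch_size chunks."""
        num_batches = (size + batch_size - 1) // batch_size  # ceiling division
        return [(i * batch_size, min(size, (i + 1) * batch_size))
                for i in range(num_batches)]

    # For example, make_batches(10, 4) gives [(0, 4), (4, 8), (8, 10)].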