Clarification between epochs and iterations

Date: 2017-01-19 12:49:23

Tags: neural-network, epoch

This answer points out the difference between an epoch and an iteration when training a neural network. However, when I look at the source code for the solver API in the Stanford CS231n course (and I assume this is the case for most libraries), in each iteration batch_size examples are selected at random with replacement. So there is no guarantee that all of the examples are seen in every epoch, is there?

Does an epoch then only mean that all of the examples are seen in expectation? Or am I misunderstanding something?

Relevant source code:

  def _step(self):
    """
    Make a single gradient update. This is called by train() and should not
    be called manually.
    """
    # Make a minibatch of training data
    num_train = self.X_train.shape[0]
    batch_mask = np.random.choice(num_train, self.batch_size)
    X_batch = self.X_train[batch_mask]
    y_batch = self.y_train[batch_mask]

    # Compute loss and gradient
    loss, grads = self.model.loss(X_batch, y_batch)
    self.loss_history.append(loss)

    # Perform a parameter update
    for p, w in self.model.params.iteritems():
      dw = grads[p]
      config = self.optim_configs[p]
      next_w, next_config = self.update_rule(w, dw, config)
      self.model.params[p] = next_w
      self.optim_configs[p] = next_config

  def train(self):
    """
    Run optimization to train the model.
    """
    num_train = self.X_train.shape[0]
    iterations_per_epoch = max(num_train / self.batch_size, 1)
    num_iterations = self.num_epochs * iterations_per_epoch

    for t in xrange(num_iterations):
      self._step()

      # Maybe print training loss
      if self.verbose and t % self.print_every == 0:
        print '(Iteration %d / %d) loss: %f' % (
               t + 1, num_iterations, self.loss_history[-1])

      # At the end of every epoch, increment the epoch counter and decay the
      # learning rate.
      epoch_end = (t + 1) % iterations_per_epoch == 0
      if epoch_end:
        self.epoch += 1
        for k in self.optim_configs:
          self.optim_configs[k]['learning_rate'] *= self.lr_decay

      # Check train and val accuracy on the first iteration, the last
      # iteration, and at the end of each epoch.
      first_it = (t == 0)
      last_it = (t == num_iterations - 1)
      if first_it or last_it or epoch_end:
        train_acc = self.check_accuracy(self.X_train, self.y_train,
                                        num_samples=1000)
        val_acc = self.check_accuracy(self.X_val, self.y_val)
        self.train_acc_history.append(train_acc)
        self.val_acc_history.append(val_acc)

        if self.verbose:
          print '(Epoch %d / %d) train acc: %f; val_acc: %f' % (
                 self.epoch, self.num_epochs, train_acc, val_acc)

        # Keep track of the best model
        if val_acc > self.best_val_acc:
          self.best_val_acc = val_acc
          self.best_params = {}
          for k, v in self.model.params.iteritems():
            self.best_params[k] = v.copy()

    # At the end of training swap the best params into the model
    self.model.params = self.best_params
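
To make the concern concrete, here is a small standalone check (not part of the course code; the sizes are made up) of what np.random.choice with its default replace=True does over one nominal epoch:

    import numpy as np

    # Hypothetical sizes, just for illustration.
    num_train, batch_size = 50000, 100
    iterations_per_epoch = num_train // batch_size

    rng = np.random.default_rng(0)
    seen = np.zeros(num_train, dtype=bool)
    for _ in range(iterations_per_epoch):
        batch_mask = rng.choice(num_train, batch_size)  # samples WITH replacement by default
        seen[batch_mask] = True

    print(seen.mean())  # roughly 0.63, i.e. only about 1 - 1/e of the examples get seen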

Thanks.

1 answer:

Answer 0 (score: 0)

I believe that, as you say, in the Stanford course they are effectively using "epoch" with the less strict meaning of "the expected number of times each example is seen during training". However, in my experience most implementations treat an epoch as one pass through every example in the training set, and I would say they simply chose sampling with replacement here because it is easier to implement. If you have a reasonable amount of data, chances are you will not see a difference, but sampling without replacement until there are no examples left is still the more correct thing to do.
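
For comparison, here is a minimal sketch (the names are illustrative, not taken from any particular library) of that more common scheme: shuffle the indices once per epoch and slice them into batches, so every example is visited exactly once per epoch:

    import numpy as np

    def run_epoch(X_train, y_train, batch_size, step_fn, rng):
        """One epoch without replacement: shuffle the indices, then slice into batches."""
        num_train = X_train.shape[0]
        perm = rng.permutation(num_train)  # every index appears exactly once
        for start in range(0, num_train, batch_size):
            batch_idx = perm[start:start + batch_size]
            step_fn(X_train[batch_idx], y_train[batch_idx])  # e.g. one gradient update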

You can check, for example, how training is done in the Keras source code; it is a bit more involved, but the important part is that make_batches is called to split the (possibly shuffled) examples into batches, which matches your initial notion of an "epoch".
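
Roughly speaking, a make_batches-style helper just computes the (start, end) index ranges that cut one pass over the data into batches; the version below is my own simplified sketch, not the actual Keras code:

    def make_batches(size, batch_size):
        """Return (start, end) index pairs covering range(size) in batch_size chunks."""
        num_batches = (size + batch_size - 1) // batch_size  # ceiling division
        return [(i * batch_size, min(size, (i + 1) * batch_size))
                for i in range(num_batches)]

    # For example, make_batches(10, 4) gives [(0, 4), (4, 8), (8, 10)].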