This answer points to the difference between an epoch and an iteration when training a neural network. However, when I look at the source code for the solver API in the Stanford CS231n course (and I assume most libraries out there do the same), during each iteration, batch_size examples are selected at random with replacement. So there is no guarantee that every example is seen during each epoch, is there?
Does an epoch then just mean that every example is seen in expectation? Or am I misunderstanding this?
Relevant source code:
def _step(self):
    """
    Make a single gradient update. This is called by train() and should not
    be called manually.
    """
    # Make a minibatch of training data
    num_train = self.X_train.shape[0]
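    # Note: np.random.choice samples WITH replacement by default (replace=True),
    # so a single "epoch" is not guaranteed to visit every training example.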
    batch_mask = np.random.choice(num_train, self.batch_size)
    X_batch = self.X_train[batch_mask]
    y_batch = self.y_train[batch_mask]

    # Compute loss and gradient
    loss, grads = self.model.loss(X_batch, y_batch)
    self.loss_history.append(loss)

    # Perform a parameter update
    for p, w in self.model.params.iteritems():
        dw = grads[p]
        config = self.optim_configs[p]
        next_w, next_config = self.update_rule(w, dw, config)
        self.model.params[p] = next_w
        self.optim_configs[p] = next_config

def train(self):
    """
    Run optimization to train the model.
    """
    num_train = self.X_train.shape[0]
    iterations_per_epoch = max(num_train / self.batch_size, 1)
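    # e.g. num_train = 50000 and batch_size = 100 give 500 iterations per "epoch",
    # i.e. num_train samples are drawn per epoch, but drawn with replacement.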
    num_iterations = self.num_epochs * iterations_per_epoch

    for t in xrange(num_iterations):
        self._step()

        # Maybe print training loss
        if self.verbose and t % self.print_every == 0:
            print '(Iteration %d / %d) loss: %f' % (
                t + 1, num_iterations, self.loss_history[-1])

        # At the end of every epoch, increment the epoch counter and decay the
        # learning rate.
        epoch_end = (t + 1) % iterations_per_epoch == 0
        if epoch_end:
            self.epoch += 1
            for k in self.optim_configs:
                self.optim_configs[k]['learning_rate'] *= self.lr_decay

        # Check train and val accuracy on the first iteration, the last
        # iteration, and at the end of each epoch.
        first_it = (t == 0)
        last_it = (t == num_iterations - 1)
        if first_it or last_it or epoch_end:
            train_acc = self.check_accuracy(self.X_train, self.y_train,
                                            num_samples=1000)
            val_acc = self.check_accuracy(self.X_val, self.y_val)
            self.train_acc_history.append(train_acc)
            self.val_acc_history.append(val_acc)

            if self.verbose:
                print '(Epoch %d / %d) train acc: %f; val_acc: %f' % (
                    self.epoch, self.num_epochs, train_acc, val_acc)

            # Keep track of the best model
            if val_acc > self.best_val_acc:
                self.best_val_acc = val_acc
                self.best_params = {}
                for k, v in self.model.params.iteritems():
                    self.best_params[k] = v.copy()

    # At the end of training swap the best params into the model
    self.model.params = self.best_params
Thanks.
Answer 0 (score: 0):
I believe that, as you say, in the Stanford course they are effectively using "epoch" in the less strict sense of "the expected number of times each example is seen during training". However, in my experience most implementations take an epoch to be one pass through every example in the training set, and I would say they simply chose sampling with replacement here for its simplicity. If you have a reasonably large amount of data, chances are you will not notice a difference, but it is still more correct to sample without replacement until no examples are left.
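To make that concrete, here is a minimal sketch (written in the same Python 2 style as the quoted code; num_train and batch_size are made-up values for illustration, not taken from the course) of how much of the training set one Solver-style "epoch" actually touches when every minibatch is drawn with replacement:

import numpy as np

num_train, batch_size = 50000, 100        # illustrative sizes only
iterations_per_epoch = num_train // batch_size

# One "epoch" as the quoted Solver defines it: each iteration samples with replacement.
seen = set()
for _ in xrange(iterations_per_epoch):
    batch_mask = np.random.choice(num_train, batch_size)  # replace=True by default
    seen.update(batch_mask)

# Drawing N samples with replacement from N examples covers about
# 1 - (1 - 1/N)**N of them, i.e. roughly 63%, not all of them.
print 'fraction of examples seen in one "epoch": %.3f' % (len(seen) / float(num_train))

So "seen in expectation" is exactly the right reading: each example is drawn once on average per epoch, but a sizeable fraction is not drawn at all in any particular epoch.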
For example, you can check how Keras runs training in its source code; it is a bit convoluted, but the important part is that make_batches is called to split the (possibly shuffled) examples into batches, which matches your original idea of an "epoch".
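As a minimal sketch of that shuffle-then-split pattern (this is not Keras's actual implementation, just the general idea, again with made-up sizes), the stricter notion of an epoch looks like this:

import numpy as np

num_train, batch_size = 50000, 100         # illustrative sizes only
perm = np.random.permutation(num_train)    # shuffle the indices once per epoch

# Carve the shuffled indices into consecutive minibatches (no replacement),
# so every training example lands in exactly one batch per epoch.
batches = [perm[i:i + batch_size] for i in xrange(0, num_train, batch_size)]

assert sum(len(b) for b in batches) == num_train
print '%d batches covering all %d examples' % (len(batches), num_train)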