Question

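I am working with a very large dataset containing hundreds of long videos to use for training, and I am using Google Colab to run some tests. The code I wrote is very simple and uses PyTorch. When I try to train on more than 200 videos at once, RAM fills up during training and Colab crashes. I noticed that this does not happen when I train with fewer videos.

For this reason, I thought my model could be trained incrementally, with a structure like the following: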

import os
import numpy as np
import torch
from torch import nn

model = torch.nn.Sequential(  # create a model
    ...
    nn.Softmax(dim=1)
)

MAX_VIDEOS_PER_BATCH = 100
learning_rate = 1e-4
video_file_names = sorted(os.listdir(VIDEOS_DIR))  # fixed order so the chunks never overlap
for current_batch in range(TOTAL_BATCHES):  # perform TOTAL_BATCHES rounds of training
    videos = []
    labels = []
    start = MAX_VIDEOS_PER_BATCH * current_batch
    # Read at most MAX_VIDEOS_PER_BATCH videos as the training set for this round
    for video_file_name in video_file_names[start:start + MAX_VIDEOS_PER_BATCH]:
        ...  # read the video and add it to videos
        ...  # add the corresponding label to labels

    video_training = torch.tensor(np.asarray(videos)).float()  # (batch x frames x channels x height x width)
    label_training = torch.tensor(labels)
    for t in range(ITERATIONS):  # train the model; it is not reset between rounds
        y_pred = model(video_training)

        loss = loss_fn(y_pred, label_training)
        print("#" + str(t), " loss:" + str(loss.item()))

        model.zero_grad()
        loss.backward()
        with torch.no_grad():  # manual gradient-descent step
            for param in model.parameters():
                param -= learning_rate * param.grad

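My question is: is this approach correct? Am I training the network the right way, or can this batch-wise approach damage or bias the model?

When I move from batch 1 to batch 2, the model does not lose what it learned earlier, right?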

Answer 1

This is correct, but the best approach is to do it the other way around.
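
One possible reading of "the other way around" is to swap the loops: instead of running many iterations on one chunk of videos before moving on, let each pass (epoch) visit every video in small batches, and repeat the passes. The sketch below illustrates that reading under stated assumptions and is not the answerer's actual code. LazyVideoDataset is a hypothetical class that fabricates random tensors in place of real video decoding, the tiny linear model and the hyperparameters are placeholders, and nn.CrossEntropyLoss is assumed as the loss because the question does not say what loss_fn is.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


class LazyVideoDataset(Dataset):
    """Hypothetical dataset: builds one sample on demand instead of keeping all videos in RAM."""

    def __init__(self, num_videos, frames=4, channels=3, height=8, width=8, num_classes=2):
        self.num_videos = num_videos
        self.shape = (frames, channels, height, width)
        self.num_classes = num_classes

    def __len__(self):
        return self.num_videos

    def __getitem__(self, index):
        # Stand-in for decoding video `index` from disk: deterministic random data.
        generator = torch.Generator().manual_seed(index)
        video = torch.rand(self.shape, generator=generator)
        label = index % self.num_classes
        return video, label


def train(model, dataset, epochs=2, batch_size=4, learning_rate=1e-2):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    loss_fn = nn.CrossEntropyLoss()  # assumed loss; it expects raw logits, so no Softmax layer here
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    epoch_losses = []
    for epoch in range(epochs):        # outer loop: full passes over the dataset
        total = 0.0
        for videos, labels in loader:  # inner loop: one parameter update per small batch
            optimizer.zero_grad()
            loss = loss_fn(model(videos), labels)
            loss.backward()
            optimizer.step()
            total += loss.item() * len(labels)
        epoch_losses.append(total / len(dataset))
    return epoch_losses


dataset = LazyVideoDataset(num_videos=10)
model = nn.Sequential(nn.Flatten(), nn.Linear(4 * 3 * 8 * 8, 2))  # placeholder model
losses = train(model, dataset)
print(losses)
```

Because the DataLoader asks the dataset for only batch_size samples at a time, peak memory depends on the batch size rather than on the total number of videos. The model object is never re-created inside the loops, so its parameters carry over from one batch and one epoch to the next.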

Incremental neural network training
