Can we train the LSTM by feeding it continuous video from thousands of such videos, containing both positive and negative sequences?
My overall goal is to tag specific video scenes in real time (for example, if I have frames 0-100 and frames 30-60 contain a yoga scene, I need to tag that range).
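For reference, the post-processing step of turning per-frame scores into tagged frame ranges can be sketched like this (a hypothetical helper, not part of the code below; the threshold value is illustrative):

```python
import numpy as np

def tag_ranges(scores, threshold=0.5):
    """Turn per-frame positive-class scores into contiguous (start, end) frame ranges."""
    mask = np.asarray(scores) >= threshold
    ranges, start = [], None
    for i, m in enumerate(mask):
        if m and start is None:
            start = i                      # a tagged range begins
        elif not m and start is not None:
            ranges.append((start, i - 1))  # the range ended at the previous frame
            start = None
    if start is not None:
        ranges.append((start, len(mask) - 1))
    return ranges

# 100 frames, with frames 30-60 scored high (toy data)
scores = np.zeros(100)
scores[30:61] = 0.9
print(tag_ranges(scores))  # → [(30, 60)]
```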
The approach I am following is to split the videos into two classes, positive and negative sequences, and train an LSTM on top of a MobileNet CNN, with the FC layer replaced by an LSTM layer.
But compared to MobileNet alone, this gives hardly any improvement when we evaluate on unsegmented videos.
MobileNet and the LSTM are trained separately: I save the outputs of MobileNet (with the FC layer removed) as numpy arrays, then read those arrays back to train the LSTM.
Here is a code sample for this approach:
import numpy as np
import torch
import torch.nn as nn
import torchvision

epochs = 250
batch_size = 128

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        in_size = 1024       # MobileNet feature dimension
        classes_no = 2
        hidden_size = 512
        layer_no = 2
        self.lstm = nn.LSTM(in_size, hidden_size, layer_no, batch_first=True)
        self.linear = nn.Linear(hidden_size, classes_no)

    def forward(self, input_seq):
        output_seq, _ = self.lstm(input_seq)
        last_output = output_seq[:, -1]   # hidden state at the last time step
        class_predictions = self.linear(last_output)
        return class_predictions

def nploader(npfile):
    return np.load(npfile)  # features are assumed to be saved as float32

def train():
    npdataloader = torchvision.datasets.DatasetFolder('./featrs/',
        nploader, ['npy'], transform=None, target_transform=None)
    data_loader = torch.utils.data.DataLoader(npdataloader,
                                              batch_size=batch_size,
                                              shuffle=False,
                                              num_workers=1)
    model = Model().cuda()
    loss = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.8)

    model.train()
    for epoch in range(epochs):
        for input_seq, target in data_loader:
            optimizer.zero_grad()
            output = model(input_seq.cuda())
            err = loss(output, target.cuda())  # output is already on the GPU
            err.backward()
            optimizer.step()
        scheduler.step()  # decay the learning rate once per epoch
    torch.save(model.state_dict(), 'lstm.ckpt')
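To evaluate on an unsegmented video, one option is to slide a window over the saved per-frame features and score each window with the trained LSTM. A sketch (the `Model` class is repeated from above with random weights; the window and stride values are illustrative, and the features are random stand-ins):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    """Same LSTM head as in the training code above."""
    def __init__(self, in_size=1024, hidden_size=512, layer_no=2, classes_no=2):
        super().__init__()
        self.lstm = nn.LSTM(in_size, hidden_size, layer_no, batch_first=True)
        self.linear = nn.Linear(hidden_size, classes_no)

    def forward(self, input_seq):
        output_seq, _ = self.lstm(input_seq)
        return self.linear(output_seq[:, -1])

model = Model()  # in practice: model.load_state_dict(torch.load('lstm.ckpt'))
model.eval()

# Stand-in for MobileNet features of a 100-frame unsegmented video.
features = torch.randn(100, 1024)

window, stride = 16, 8
probs = []
with torch.no_grad():
    for s in range(0, features.shape[0] - window + 1, stride):
        logits = model(features[s:s + window].unsqueeze(0))      # (1, 2)
        probs.append(torch.softmax(logits, dim=1)[0, 1].item())  # positive-class score
# probs[i] scores frames [i*stride, i*stride + window); thresholding these
# scores gives the tagged frame ranges.
```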