I am trying to train a 1D ConvNet for time series classification as shown in this paper (see the FCN in Figure 1b): https://arxiv.org/pdf/1611.06455.pdf
The Keras implementation gives me vastly superior performance. Can someone explain why that is?
The PyTorch code is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv1d(x_train.shape[1], 128, 8)
        self.bnorm1 = nn.BatchNorm1d(128)
        self.conv2 = nn.Conv1d(128, 256, 5)
        self.bnorm2 = nn.BatchNorm1d(256)
        self.conv3 = nn.Conv1d(256, 128, 3)
        self.bnorm3 = nn.BatchNorm1d(128)
        self.dense = nn.Linear(128, nb_classes)

    def forward(self, x):
        c1 = self.conv1(x)
        b1 = F.relu(self.bnorm1(c1))
        c2 = self.conv2(b1)
        b2 = F.relu(self.bnorm2(c2))
        c3 = self.conv3(b2)
        b3 = F.relu(self.bnorm3(c3))
        output = torch.mean(b3, 2)  # global average pooling over the length dimension
        dense1 = self.dense(output)
        return F.softmax(dense1, dim=1)
model = Net()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.99)

losses = []
for t in range(1000):
    y_pred_1 = model(x_train.float())
    loss_1 = criterion(y_pred_1, y_train.long())
    print(t, loss_1.item())
    optimizer.zero_grad()
    loss_1.backward()
    optimizer.step()
For comparison, I use the following code in Keras:
import keras

x = keras.layers.Input(x_train.shape[1:])
conv1 = keras.layers.Conv1D(128, 8, padding='valid')(x)
conv1 = keras.layers.BatchNormalization()(conv1)
conv1 = keras.layers.Activation('relu')(conv1)
conv2 = keras.layers.Conv1D(256, 5, padding='valid')(conv1)
conv2 = keras.layers.BatchNormalization()(conv2)
conv2 = keras.layers.Activation('relu')(conv2)
conv3 = keras.layers.Conv1D(128, 3, padding='valid')(conv2)
conv3 = keras.layers.BatchNormalization()(conv3)
conv3 = keras.layers.Activation('relu')(conv3)
full = keras.layers.GlobalAveragePooling1D()(conv3)
out = keras.layers.Dense(nb_classes, activation='softmax')(full)
model = keras.models.Model(inputs=x, outputs=out)
optimizer = keras.optimizers.SGD(lr=0.5, decay=0.0, momentum=0.99)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
hist = model.fit(x_train, Y_train, batch_size=x_train.shape[0], nb_epoch=2000)
The only difference I see is in the initialization, yet the results are very different. For reference, I use the same preprocessing for both, with one minor difference in the input shape: PyTorch expects (Batch_Size, Channels, Length) while Keras expects (Batch_Size, Length, Channels).
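For concreteness, this is roughly how I build the two inputs from the same preprocessed array (x_train_np and y_train_np are just illustrative names for the NumPy arrays after preprocessing, shaped (batch, length, channels)):

import torch

# Keras Conv1D expects (batch, length, channels), so the array is used as-is
x_train_keras = x_train_np                                        # (B, L, C)

# PyTorch Conv1d expects (batch, channels, length), so swap the last two axes
x_train_torch = torch.from_numpy(x_train_np.transpose(0, 2, 1))   # (B, C, L)
y_train_torch = torch.from_numpy(y_train_np)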
Answer 0 (score: 1)
The reason the results differ is that the default parameters of the layers and the optimizer are different. For example, in pytorch the decay-rate of the batch-norm is taken to be 0.9, whereas in keras it is 0.99. In the same way, there may be other differences in the default parameters.
If you initialize with the same parameters and a fixed random seed, the results of the two libraries will not differ much.
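As one concrete example, here is a sketch of how the PyTorch batch-norm layers from the question could be aligned with Keras's defaults (Keras BatchNormalization uses momentum=0.99 and epsilon=1e-3, while PyTorch BatchNorm1d defaults to momentum=0.1, i.e. a decay of 0.9, and eps=1e-5):

import torch.nn as nn

# PyTorch's momentum weights the *new* batch statistics, so a Keras
# momentum (decay) of 0.99 corresponds to a PyTorch momentum of 0.01.
bnorm1 = nn.BatchNorm1d(128, momentum=0.01, eps=1e-3)
bnorm2 = nn.BatchNorm1d(256, momentum=0.01, eps=1e-3)
bnorm3 = nn.BatchNorm1d(128, momentum=0.01, eps=1e-3)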