Caffe: predicting multiple labels at once (in Python)

Asked: 2015-12-30 03:37:57

Tags: python neural-network classification hdf5 caffe

In Caffe, I want to be able to predict multiple labels at once, as with keyboard arrow keys: two keys can be pressed at the same time. I am trying to drive a virtual F1 car in the game TM Nation Forever with a convolutional neural network; I plan to collect and shape the training data soon, and I would like to know whether I am doing this right.

I think this post can serve as a good example of how to do such a classification in Python, as I have not found any satisfying example of it elsewhere.

Could someone confirm that this way of collecting and representing the data for the neural network will behave as I expect:

HDF5 Python code

import h5py
import numpy as np

comp_kwargs = {'compression': 'gzip', 'compression_opts': 1}

# Use `h5f` for the file handle so it does not shadow the forward-label
# array `f` used below.
with h5py.File(train_filename, 'w') as h5f:
    h5f.create_dataset('data_img', data=X, **comp_kwargs)
    h5f.create_dataset('data_speed', data=S.astype(np.float_), **comp_kwargs)

    h5f.create_dataset('label_forward', data=f.astype(np.int_), **comp_kwargs)
    h5f.create_dataset('label_backward', data=b.astype(np.int_), **comp_kwargs)
    h5f.create_dataset('label_left', data=l.astype(np.int_), **comp_kwargs)
    h5f.create_dataset('label_right', data=r.astype(np.int_), **comp_kwargs)

# Caffe's HDF5Data layer takes a text file listing the HDF5 files to read.
with open(train_filename_list_txt, 'w') as list_f:
    list_f.write(train_filename + '\n')
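As a quick sanity check (an illustrative addition, not part of the original pipeline), the written datasets can be read back with h5py to verify the shapes described in the next section:

with h5py.File(train_filename, 'r') as h5f:
    for name in h5f:
        print(name, h5f[name].shape, h5f[name].dtype)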

Information about the HDF5 data shapes

Inputs:

data_img: 
-> number N x channel K x height H x width W

data_speed:
-> number N  x  1 float number (from 0.0 to 1.0)

Outputs:

Note: I use numpy's int_ type for the label classes to be classified.

label_forward:
-> number N  x  1 integer number (0 or 1)

label_backward:
-> number N  x  1 integer number (0 or 1)

label_left:
-> number N  x  1 integer number (0 or 1)

label_right:
-> number N  x  1 integer number (0 or 1)
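To make those shapes concrete, here is a minimal sketch of how the arrays could be allocated before the HDF5 write above; the sizes (1000 frames of 3-channel 64x64 screenshots) are hypothetical placeholders, not values from my actual setup:

import numpy as np

N, K, H, W = 1000, 3, 64, 64                   # hypothetical sizes

X = np.zeros((N, K, H, W), dtype=np.float32)   # data_img: N x K x H x W
S = np.zeros((N, 1), dtype=np.float_)          # data_speed: N x 1, in [0.0, 1.0]

# One N x 1 binary column per arrow key (0 = released, 1 = pressed),
# so any combination of simultaneously pressed keys can be encoded.
f = np.zeros((N, 1), dtype=np.int_)            # label_forward
b = np.zeros((N, 1), dtype=np.int_)            # label_backward
l = np.zeros((N, 1), dtype=np.int_)            # label_left
r = np.zeros((N, 1), dtype=np.int_)            # label_right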

Convolutional neural network architecture

I have put some semi-related comments in the code below; I would appreciate any input on the network architecture, in case it could be made more efficient :)

import caffe
from caffe import layers as L
from caffe import params as P

def cnn(hdf5, batch_size):
    n = caffe.NetSpec()
    n.data_img, n.data_speed, n.label_forward, n.label_backward, n.label_left, n.label_right = (
        L.HDF5Data(batch_size=batch_size, source=hdf5, ntop=6)
    )

    n.conv1 = L.Convolution(n.data_img, kernel_size=7, num_output=32, weight_filler=dict(type='xavier'))
    n.pool1 = L.Pooling(n.conv1, kernel_size=3, stride=2, pool=P.Pooling.MAX)
    n.drop1 = L.Dropout(n.pool1, in_place=True)
    n.relu1 = L.ReLU(n.drop1, in_place=True)

    n.conv2 = L.Convolution(n.relu1, kernel_size=5, num_output=42, weight_filler=dict(type='xavier'))
    n.pool2 = L.Pooling(n.conv2, kernel_size=3, stride=2, pool=P.Pooling.MAX)
    n.drop2 = L.Dropout(n.pool2, in_place=True)
    n.relu2 = L.ReLU(n.drop2, in_place=True)

    n.conv3 = L.Convolution(n.relu2, kernel_size=5, num_output=50, weight_filler=dict(type='xavier'))
    n.pool3 = L.Pooling(n.conv3, kernel_size=3, stride=2, pool=P.Pooling.MAX)
    n.drop3 = L.Dropout(n.pool3, in_place=True)
    n.relu3 = L.ReLU(n.drop3, in_place=True)

    n.conv4 = L.Convolution(n.relu3, kernel_size=3, num_output=64, weight_filler=dict(type='xavier'))
    n.pool4 = L.Pooling(n.conv4, kernel_size=3, stride=2, pool=P.Pooling.AVE)
    # Data of shape `batch_size*64*3*3` comes out of this layer (ignoring dropout),
    # for a total of `batch_size*576` neurons.
    # Would you recommend downsizing this `3*3` feature map to `2*2`,
    # or even `1*1`, and removing dropout at this level?
    n.drop4 = L.Dropout(n.pool4, in_place=True)
    n.relu4 = L.ReLU(n.drop4, in_place=True)

    n.join_speed = L.Concat(n.relu4, n.data_speed)
    # Note that I might be wrong about how the parameters are passed to the
    # concat layer (it cannot be in-place, and the two bottom blobs may need
    # matching dimensions).
    n.ip1 = L.InnerProduct(n.join_speed, num_output=512, weight_filler=dict(type='xavier'))
    n.sig1 = L.Sigmoid(n.ip1, in_place=True)

    # One independent two-class head per key, so any subset of the four keys
    # can be predicted as "pressed" at the same time.
    n.ip_f = L.InnerProduct(n.sig1, num_output=2, weight_filler=dict(type='xavier'))
    n.accuracy_f = L.Accuracy(n.ip_f, n.label_forward)
    n.loss_f = L.SoftmaxWithLoss(n.ip_f, n.label_forward)

    n.ip_b = L.InnerProduct(n.sig1, num_output=2, weight_filler=dict(type='xavier'))
    n.accuracy_b = L.Accuracy(n.ip_b, n.label_backward)
    n.loss_b = L.SoftmaxWithLoss(n.ip_b, n.label_backward)

    n.ip_l = L.InnerProduct(n.sig1, num_output=2, weight_filler=dict(type='xavier'))
    n.accuracy_l = L.Accuracy(n.ip_l, n.label_left)
    n.loss_l = L.SoftmaxWithLoss(n.ip_l, n.label_left)

    n.ip_r = L.InnerProduct(n.sig1, num_output=2, weight_filler=dict(type='xavier'))
    n.accuracy_r = L.Accuracy(n.ip_r, n.label_right)
    n.loss_r = L.SoftmaxWithLoss(n.ip_r, n.label_right)

    return n.to_proto()

with open('cnn_train.prototxt', 'w') as f:
    f.write(str(
        cnn(train_filename_list_txt, 100)
    ))
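For completeness, here is a minimal sketch of how the four heads could be read out at driving time. The deploy prototxt and the output blob names (cnn_deploy.prototxt, prob_f, etc.) are assumptions, not files produced above: a deploy net would replace the HDF5Data and loss layers with plain inputs and one Softmax output per head.

import caffe

# Assumed file names; the .caffemodel would come from training the net above.
net = caffe.Net('cnn_deploy.prototxt', 'cnn_iter_10000.caffemodel', caffe.TEST)

net.blobs['data_img'].data[...] = frame    # one preprocessed K x H x W screenshot
net.blobs['data_speed'].data[...] = speed  # current speed, in [0.0, 1.0]
out = net.forward()

# Each head is an independent two-class decision, so several keys can be
# "pressed" in the same frame.
press = {key: out['prob_' + key][0].argmax() == 1
         for key in ('f', 'b', 'l', 'r')}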

Moreover, I would like only one of the left and right arrow keys to be pressed at a time. Considering that I will be using SoftmaxWithLoss layers, would it be better to merge label_left and label_right into a single label like this, rather than post-processing the two separate predictions afterwards?

label_left_right:
-> number N  x  1 integer number (0 for left or 1 for right)
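Here is one way such a merged label could be built from the two binary columns; note that the third "neutral" class is my own addition (it would need num_output=3 on that head) to cover frames where neither key is pressed:

import numpy as np

# 0 = neither key, 1 = left, 2 = right (assumes l and r are never both 1).
lr = np.zeros_like(l)
lr[l == 1] = 1
lr[r == 1] = 2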

1 answer:

Answer 0 (score: 0):

In the end, what I did turned out to be the right thing for this task, except for the concatenation layer, which probably does not work because of the differing shapes of the concatenated blobs. I tested the approach on the CIFAR-100 dataset, which has both coarse and fine labels, and it worked well.
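A sketch of one way that shape mismatch could be resolved (my addition, not part of the original answer): flatten the N x 64 x 3 x 3 conv output to N x 576 first, then concatenate it with the N x 1 speed blob along axis 1:

from caffe import layers as L

# `n` is the NetSpec from the cnn() function above.
n.flat4 = L.Flatten(n.relu4)                            # N x 64 x 3 x 3  ->  N x 576
n.join_speed = L.Concat(n.flat4, n.data_speed, axis=1)  # N x 576 + N x 1 ->  N x 577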