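I am building a gesture recognition system using the 20BN-Jester dataset. For now I am working with only 4 classes. The dataset consists of images extracted from videos at 12 frames per second. I built two models, a 3D-CNN and a CNN-LSTM, but with Keras and TensorFlow both reach only about 25-30% accuracy.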
The dataset shapes are:
X_train = (651, 1, 128, 128, 22)
X_valid = (260, 1, 128, 128, 22)
Y_train = (651, 4)
Y_valid = (260, 4)
Each image is 128 * 128 with 1 channel, I stack 22 images per sample, and there are 651 training samples in total.
3D-CNN architecture
from keras.models import Sequential
from keras.layers import Convolution3D, MaxPooling3D, Dropout, Flatten, Dense, Activation

# img_rows, img_cols, img_depth = 128, 128, 22 per the shapes above; nb_classes = 4
model = Sequential()
model.add(Convolution3D(32, (3, 3, 3), strides=(1, 1, 1), input_shape=(1, img_rows, img_cols, img_depth),
                        activation='relu', data_format='channels_first'))
model.add(MaxPooling3D((3, 3, 3), data_format='channels_first'))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(512, activation='sigmoid'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes, kernel_initializer='normal'))
model.add(Activation('softmax'))
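To get a feel for how large this 3D-CNN is relative to 651 training samples, the layer shapes can be worked out by hand. The following is only a rough sketch in plain Python, assuming the Keras defaults that the code above does not override (padding='valid' for the convolution, pooling stride equal to the pool size); the helper name conv_out is made up for illustration.

```python
def conv_out(size, kernel, stride=1, padding="valid"):
    """Output length along one axis of a conv/pool layer (Keras-style rules)."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1

# One sample in channels_first layout: (1, 128, 128, 22)
in_channels = 1
dims = (128, 128, 22)
filters = 32

# Convolution3D with a 3x3x3 kernel, stride 1, assumed padding='valid'
dims = tuple(conv_out(d, 3) for d in dims)
conv_params = (3 * 3 * 3 * in_channels + 1) * filters

# MaxPooling3D with a 3x3x3 pool, assumed stride equal to the pool size
dims = tuple(conv_out(d, 3, stride=3) for d in dims)

# Flatten followed by Dense(512) and Dense(4)
flat = filters * dims[0] * dims[1] * dims[2]
dense1_params = (flat + 1) * 512
dense2_params = (512 + 1) * 4

print("shape after pooling:", dims)
print("flattened features:", flat)
print("total parameters:", conv_params + dense1_params + dense2_params)
```

Under these assumptions almost all of the parameters sit in the first Dense layer, which may be worth keeping in mind when judging whether the model can learn from a few hundred clips.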
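CNN-LSTM model architecture
For this model the shapes (in the same order as above) are:
X_train = (651, 22, 128, 128, 1)
X_valid = (260, 22, 128, 128, 1)
Y_train = (651, 4)
Y_valid = (260, 4)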
from keras.models import Sequential
from keras.layers import TimeDistributed, Conv2D, MaxPooling2D, Flatten, Dropout, LSTM, Dense

# input_shape is (frames, rows, cols, channels), i.e. (22, 128, 128, 1) per the shapes above
model = Sequential()
model.add(TimeDistributed(Conv2D(32, (7, 7), strides=(2, 2),
                                 activation='relu', padding='same'), input_shape=input_shape))
model.add(TimeDistributed(Conv2D(32, (3, 3),
                                 kernel_initializer="he_normal", activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Conv2D(64, (3, 3),
                                 padding='same', activation='relu')))
model.add(TimeDistributed(Conv2D(64, (3, 3),
                                 padding='same', activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Conv2D(128, (3, 3),
                                 padding='same', activation='relu')))
model.add(TimeDistributed(Conv2D(128, (3, 3),
                                 padding='same', activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Conv2D(256, (3, 3),
                                 padding='same', activation='relu')))
model.add(TimeDistributed(Conv2D(256, (3, 3),
                                 padding='same', activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Conv2D(512, (3, 3),
                                 padding='same', activation='relu')))
model.add(TimeDistributed(Conv2D(512, (3, 3),
                                 padding='same', activation='relu')))
model.add(TimeDistributed(MaxPooling2D((2, 2), strides=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(Dropout(0.5))
model.add(LSTM(256, return_sequences=False, dropout=0.5))
model.add(Dense(nb_classes, activation='softmax'))
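As a sanity check on the CNN-LSTM, it can help to trace how the 128 * 128 spatial size shrinks through the per-frame convolution stack before it reaches the LSTM. This is a hand calculation sketched in plain Python, assuming Keras defaults where the code above is silent (padding='valid' on the second convolution and on every pooling layer); out_size is a hypothetical helper, not a Keras function.

```python
def out_size(size, kernel, stride=1, padding="valid"):
    """Output length along one spatial axis (Keras-style rules)."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1

size = 128
trace = [size]

# Block 1: 7x7 conv (stride 2, 'same'), 3x3 conv (assumed 'valid'), 2x2 pool
size = out_size(size, 7, stride=2, padding="same")
trace.append(size)
size = out_size(size, 3)
trace.append(size)
size = out_size(size, 2, stride=2)
trace.append(size)

# Blocks 2-5: the 'same'-padded 3x3 convs keep the size, each 2x2 pool shrinks it
for _ in range(4):
    size = out_size(size, 2, stride=2)
    trace.append(size)

features_per_frame = 512 * size * size  # 512 filters in the last conv block

print("spatial size per stage:", trace)
print("features fed to the LSTM per frame:", features_per_frame)
```

If the trace ends at a 1 * 1 map, each frame is summarized by a single 512-dimensional vector, so the LSTM sees a sequence of 22 such vectors per clip.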
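The dataset is balanced, with 100 gestures per class, and I trained the models for 100 epochs. Is the architecture correct? I also have a doubt about how I feed the images. Since I am working with video, I need an extra dimension to extract temporal features, so I append 22 images into one array and keep adding samples the same way. Is that the right approach? I first tried 12 frames instead of 22, but the results were the same.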
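For reference, the frame stacking described above can be sketched with NumPy as follows. This is only an illustration with synthetic frames, not the actual loading code; the variable names are made up, and each fake frame is filled with its own index so the frame order can be checked after reshaping.

```python
import numpy as np

num_frames, height, width = 22, 128, 128

# Stand-in for 22 grayscale frames of one clip (synthetic data, frame i is filled with i)
frames = [np.full((height, width), i, dtype=np.float32) for i in range(num_frames)]

# Stack along a new leading axis: (frames, rows, cols)
clip = np.stack(frames, axis=0)

# Layout used by the TimeDistributed CNN-LSTM: (frames, rows, cols, channels)
clip_lstm = clip[..., np.newaxis]

# Layout used by the channels_first 3D-CNN: (channels, rows, cols, depth)
clip_3d = np.transpose(clip, (1, 2, 0))[np.newaxis, ...]

# A batch is a further stack of clips along axis 0
batch_lstm = np.stack([clip_lstm, clip_lstm], axis=0)

print(clip_lstm.shape, clip_3d.shape, batch_lstm.shape)
```

The important point is that the frame axis ends up where each model expects it and that frames stay in temporal order; a plain reshape instead of a transpose would silently scramble them.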
Answer 0 (score: 1)
Here are some suggestions: