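I am working on gesture recognition, and I split each video into a sequence of frames.
I am using a 3D CNN with the following architecture: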
Layer (type)                 Output Shape              Param #
=================================================================
conv3d_1 (Conv3D)            (None, 10, 116, 116, 32)  12032
_________________________________________________________________
max_pooling3d_1 (MaxPooling3 (None, 5, 58, 58, 32)     0
_________________________________________________________________
dropout_1 (Dropout)          (None, 5, 58, 58, 32)     0
_________________________________________________________________
conv3d_2 (Conv3D)            (None, 3, 56, 56, 64)     55360
_________________________________________________________________
max_pooling3d_2 (MaxPooling3 (None, 1, 28, 28, 64)     0
_________________________________________________________________
dropout_2 (Dropout)          (None, 1, 28, 28, 64)     0
_________________________________________________________________
flatten_1 (Flatten)          (None, 50176)             0
_________________________________________________________________
dense_1 (Dense)              (None, 256)               12845312
_________________________________________________________________
dropout_3 (Dropout)          (None, 256)               0
_________________________________________________________________
dense_2 (Dense)              (None, 128)               32896
_________________________________________________________________
dropout_4 (Dropout)          (None, 128)               0
_________________________________________________________________
dense_3 (Dense)              (None, 5)                 645
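As a sanity check on the summary above, here is a minimal sketch that recomputes the parameter counts with plain arithmetic. The kernel sizes (5x5x5 for conv3d_1, 3x3x3 for conv3d_2) and the 3 input channels are assumptions inferred from the reported counts, not settings stated in the post, and the helper names `conv3d_params` and `dense_params` are hypothetical.

```python
# Recompute the parameter counts listed in the model summary.
# ASSUMPTION: kernel sizes and the input channel count are inferred from
# the reported numbers; they are not given explicitly in the post.

def conv3d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a 3D convolution."""
    kd, kh, kw = kernel
    return (kd * kh * kw * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a dense layer."""
    return (in_units + 1) * out_units

params = {
    "conv3d_1": conv3d_params((5, 5, 5), 3, 32),    # assumed 5x5x5 kernel, RGB input
    "conv3d_2": conv3d_params((3, 3, 3), 32, 64),   # assumed 3x3x3 kernel
    "dense_1": dense_params(1 * 28 * 28 * 64, 256), # flatten output is 50176
    "dense_2": dense_params(256, 128),
    "dense_3": dense_params(128, 5),
}

total = sum(params.values())
dense_1_share = params["dense_1"] / total

for name, count in params.items():
    print(f"{name:10s} {count:>12,d}")
print(f"{'total':10s} {total:>12,d}")
print(f"dense_1 share of all parameters: {dense_1_share:.1%}")
```

Under these assumptions the recomputed values match the summary line by line, and dense_1 alone holds roughly 99% of the trainable parameters, which may be worth keeping in mind when reading the accuracy curves.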
The plot of categorical_accuracy versus val_categorical_accuracy looks like this:
What could be causing this? What am I missing or doing wrong?