I'm trying to raise my CNN's validation accuracy from its current 76% to above 90%. I'll show all the relevant information about the CNN's performance and configuration below.
Essentially, I want my CNN to distinguish between two classes of mel spectrograms:
Here is the plot of loss vs. epoch:
Finally, here is the model architecture configuration:
import keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(3, 640, 480)))  # channels-first input
model.add(Conv2D(64, (3, 3), activation='relu', dim_ordering="th"))  # legacy flag for Theano-style (channels-first) ordering
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))  # two-class output
Here are my calls to model.compile() and model.fit():
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(lr=0.001),
              metrics=['accuracy'])
print("Compiled model")
history = model.fit(X_train, Y_train,
                    batch_size=8,
                    epochs=50,
                    verbose=1,
                    validation_data=(X_test, Y_test))
How can I change my CNN configuration to improve the validation accuracy?
Things I've tried:
Answer 0 (score: 1)
First, you have to get music_tagger_cnn.py and put it in your project path. After that you can build the model:
from music_tagger_cnn import *
input_tensor = Input(shape=(1, 18, 119))
model = MusicTaggerCNN(input_tensor=input_tensor, include_top=False, weights='msd')
You can change the input tensor to whatever dimensions you want... I usually use the Theano dimension ordering but TensorFlow as the backend, so that is why:
from keras import backend as K
K.set_image_dim_ordering('th')
Using the Theano dimension ordering, you have to take into account that the order of the sample dimensions has to change:
X_train = X_train.transpose(0, 3, 2, 1)
X_val = X_val.transpose(0, 3, 2, 1)
After that, you have to freeze the layers you don't want to be updated:
for layer in model.layers:
    layer.trainable = False
Now you can set your own output, for example:
last_layer = model.get_layer('pool3').output
out = Flatten()(last_layer)
out = Dense(128, activation='relu', name='fc2')(out)
out = Dropout(0.5)(out)
out = Dense(n_classes, activation='softmax', name='fc3')(out)
model = Model(input=model.input, output=out)
After that you should be able to train it:
sgd = SGD(lr=0.01, momentum=0, decay=0.002, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
history = model.fit(X_train, labels_train,
                    validation_data=(X_val, labels_val), nb_epoch=100, batch_size=5)
Note that the labels should be one-hot encoded.
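For example, a minimal sketch using Keras' own helper (assuming your labels start out as integer class indices):
from keras.utils.np_utils import to_categorical  # Keras 1.x location of the helper

labels_train = to_categorical([0, 1, 1, 0])  # example integer labels; result has shape (4, 2)
labels_val = to_categorical([0, 1])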
I hope this helps!!
UPDATE: Posting the code so I can help debug these lines and prevent the crash.
input_tensor = Input(shape=(3, 640, 480))
model = MusicTaggerCNN(input_tensor=input_tensor, include_top=False, weights='msd')
for layer in model.layers:
    layer.trainable = False
last_layer = model.get_layer('pool3').output
out = Flatten()(last_layer)
out = Dense(128, activation='relu', name='fc2')(out)
out = Dropout(0.5)(out)
out = Dense(n_classes, activation='softmax', name='fc3')(out)  # n_classes is not defined above; 2 for the question's task
model = Model(input=model.input, output=out)
sgd = SGD(lr=0.01, momentum=0, decay=0.002, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
history = model.fit(X_train, labels_train,
                    validation_data=(X_test, Y_test), nb_epoch=100, batch_size=5)
EDIT #2
# -*- coding: utf-8 -*-
'''MusicTaggerCNN model for Keras.
# Reference:
- [Automatic tagging using deep convolutional neural networks](https://arxiv.org/abs/1606.00298)
- [Music-auto_tagging-keras](https://github.com/keunwoochoi/music-auto_tagging-keras)
'''
from __future__ import print_function
from __future__ import absolute_import
from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model
from keras.layers import Dense, Dropout, Flatten
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D, ZeroPadding2D
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import ELU
from keras.utils.data_utils import get_file
TH_WEIGHTS_PATH = 'https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/data/music_tagger_cnn_weights_theano.h5'
TF_WEIGHTS_PATH = 'https://github.com/keunwoochoi/music-auto_tagging-keras/blob/master/data/music_tagger_cnn_weights_tensorflow.h5'
def MusicTaggerCNN(weights='msd', input_tensor=None,
                   include_top=True):
    '''Instantiate the MusicTaggerCNN architecture,
    optionally loading weights pre-trained
    on the Million Song Dataset. Note that when using TensorFlow,
    for best performance you should set
    `image_dim_ordering="tf"` in your Keras config
    at ~/.keras/keras.json.

    The model and the weights are compatible with both
    TensorFlow and Theano. The dimension ordering
    convention used by the model is the one
    specified in your Keras config file.

    For preparing mel-spectrogram input, see
    `audio_conv_utils.py` in [applications](https://github.com/fchollet/keras/tree/master/keras/applications).
    You will need to install [Librosa](http://librosa.github.io/librosa/)
    to use it.

    # Arguments
        weights: one of `None` (random initialization)
            or "msd" (pre-training on the Million Song Dataset).
        input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
            to use as image input for the model.
        include_top: whether to include the 1 fully-connected
            layer (output layer) at the top of the network.
            If False, the network outputs 256-dim features.

    # Returns
        A Keras model instance.
    '''
    if weights not in {'msd', None}:
        raise ValueError('The `weights` argument should be either '
                         '`None` (random initialization) or `msd` '
                         '(pre-training on Million Song Dataset).')
    # Determine proper input shape (both branches were edited to the
    # question's input size)
    if K.image_dim_ordering() == 'th':
        input_shape = (3, 640, 480)
    else:
        input_shape = (3, 640, 480)

    if input_tensor is None:
        melgram_input = Input(shape=input_shape)
    else:
        if not K.is_keras_tensor(input_tensor):
            melgram_input = Input(tensor=input_tensor, shape=input_shape)
        else:
            melgram_input = input_tensor

    # Determine input axis
    if K.image_dim_ordering() == 'th':
        channel_axis = 1
        freq_axis = 2
        time_axis = 3
    else:
        channel_axis = 3
        freq_axis = 1
        time_axis = 2

    # Input block
    x = BatchNormalization(axis=freq_axis, name='bn_0_freq')(melgram_input)

    # Conv block 1
    x = Convolution2D(64, 3, 3, border_mode='same', name='conv1')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn1')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(2, 4), name='pool1')(x)

    # Conv block 2
    x = Convolution2D(128, 3, 3, border_mode='same', name='conv2')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn2')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(2, 4), name='pool2')(x)

    # Conv block 3
    x = Convolution2D(128, 3, 3, border_mode='same', name='conv3')(x)
    x = BatchNormalization(axis=channel_axis, mode=0, name='bn3')(x)
    x = ELU()(x)
    x = MaxPooling2D(pool_size=(2, 4), name='pool3')(x)

    # Output
    x = Flatten()(x)
    if include_top:
        x = Dense(50, activation='sigmoid', name='output')(x)

    # Create model
    model = Model(melgram_input, x)
    if weights is None:
        return model
    else:
        # Load weights
        if K.image_dim_ordering() == 'tf':
            raise RuntimeError("Please set image_dim_ordering == 'th'. "
                               "You can set it at ~/.keras/keras.json")
        model.load_weights('data/music_tagger_cnn_weights_%s.h5' % K._BACKEND,
                           by_name=True)
        return model
EDIT #3
I tried the Keras example that uses MusicTaggerCRNN as a feature extractor for the melgrams. Then I trained a simple NN with 2 dense layers and a binary output. The samples used in my example don't apply in your case, but it's also a binary classifier. I used keras==1.2.2 and tensorflow-gpu==1.0.0 and it worked for me.
Here is the code:
from keras.applications.music_tagger_crnn import MusicTaggerCRNN
from keras.applications.music_tagger_crnn import preprocess_input, decode_predictions
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model
from keras.layers import Dense, Dropout, Flatten
from keras.optimizers import SGD
model = MusicTaggerCRNN(weights='msd', include_top=False)
#Samples simulation
audio_paths_train = ['data/genres/blues/blues.00000.au','data/genres/classical/classical.00000.au','data/genres/classical/classical.00002.au', 'data/genres/blues/blues.00003.au']
audio_paths_test = ['data/genres/blues/blues.00001.au', 'data/genres/classical/classical.00001.au', 'data/genres/blues/blues.00002.au', 'data/genres/classical/classical.00003.au']
labels_train = [0,1,1,0]
labels_test = [0, 1, 0, 1]
melgrams_train = [preprocess_input(audio_path) for audio_path in audio_paths_train]
melgrams_test = [preprocess_input(audio_path) for audio_path in audio_paths_test]
feats_train = [model.predict(np.expand_dims(melgram, axis=0)) for melgram in melgrams_train]
feats_test = [model.predict(np.expand_dims(melgram, axis=0)) for melgram in melgrams_test]
feats_train = np.array(feats_train)
feats_test = np.array(feats_test)
_input = Input(shape=(1,32))
x = Flatten(name='flatten')(_input)
x = Dense(128, activation='relu', name='fc6')(x)
x = Dense(64, activation='relu', name='fc7')(x)
x = Dense(1, activation='sigmoid', name='fc8')(x)  # sigmoid, not softmax: softmax over a single unit always outputs 1
class_model = Model(_input, x)
sgd = SGD(lr=0.01, momentum=0, decay=0.02, nesterov=True)
class_model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
history = class_model.fit(feats_train, labels_train, validation_data=(feats_test, labels_test), nb_epoch=100, batch_size=5, class_weight='auto')
print(history.history['acc'])
# Final evaluation of the model
scores = class_model.evaluate(feats_test, labels_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1] * 100))
Answer 1 (score: 0)
Dropout: It turns out that plain Dropout is not very effective with CNNs. Try changing the first Dropout layer to SpatialDropout2D. The second Dropout only applies to a standard Dense layer, so that is the right kind of Dropout there. A CNN network creates a series of overlapping pyramids. Standard Dropout just turns some of those pyramids into stumps. SpatialDropout zeroes out entire pyramids, which is a better way of "forgetting".
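A minimal sketch of that swap on the question's model, assuming the same Keras 2-style API as in the question (SpatialDropout2D must come before Flatten, which it does here):
from keras import backend as K
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, SpatialDropout2D, Dropout, Flatten, Dense

K.set_image_data_format('channels_first')  # match the question's (channels, height, width) input

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(3, 640, 480)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(SpatialDropout2D(0.25))  # zeroes whole feature maps rather than scattered activations
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))            # plain Dropout is correct after a Dense layer
model.add(Dense(2, activation='softmax'))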
Repeating structure: Use several different copies of the Conv2D -> Conv2D -> SpatialDropout chain, shrinking the image while using more filters. All the successful image-processing designs use multiple repeating blocks (see the sketch after the next paragraph).
Your data: A spectrogram has two different measures as the dimensions of the "image". The standard design for image processing treats all adjacent pixels as contributing equally to one output feature. It may be that you want many pixels along a row to contribute to a feature map, but only one or two rows. So perhaps each pyramid needs a long, narrow base.
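Combining the last two suggestions, a hypothetical sketch: repeated Conv2D -> Conv2D -> SpatialDropout2D blocks with growing filter counts, and wide, short kernels so each feature sees many columns but few rows. The filter counts, kernel shapes, and the choice of which axis is the long one are all assumptions to tune, and it again assumes the question's channels-first input:
from keras import backend as K
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, SpatialDropout2D, Dropout, Flatten, Dense

K.set_image_data_format('channels_first')  # (channels, height, width), as in the question

model = Sequential()
# Block 1: (3, 7) kernels span many columns but few rows of the spectrogram
model.add(Conv2D(32, (3, 7), activation='relu', padding='same', input_shape=(3, 640, 480)))
model.add(Conv2D(32, (3, 7), activation='relu', padding='same'))
model.add(SpatialDropout2D(0.2))
model.add(MaxPooling2D(pool_size=(2, 4)))  # shrink the long axis faster
# Block 2: same pattern, more filters on a smaller image
model.add(Conv2D(64, (3, 7), activation='relu', padding='same'))
model.add(Conv2D(64, (3, 7), activation='relu', padding='same'))
model.add(SpatialDropout2D(0.2))
model.add(MaxPooling2D(pool_size=(2, 4)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(2, activation='softmax'))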