I am training a neural network for speech emotion recognition:
Input layer size: 100.
Hidden layer size: 25.
6 labels (output layer).
I split the dataset into training and test sets, then extracted features from the speech with MFCC (Mel-frequency cepstral coefficients), which returns matrices of varying sizes, so each time I only use the first 100 features.
The accuracy on the training set is 100%, but on the test set it is only about 30-40%.
I am still new to this field, but this looks like an overfitting problem (maybe not, but that is what I have learned so far). I made some adjustments to avoid it: increasing lambda, reducing the number of features, and adding an extra hidden layer. The accuracy got better, but it never exceeded 40%.
What could be going wrong?
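One likely culprit is how the variable-size MFCC matrices are reduced to a fixed-length input: keeping only the first N values discards everything after the opening fraction of each utterance. A common alternative is to summarize each cepstral coefficient over all frames. A minimal sketch (plain Python, illustrative only; the real pipeline is in MATLAB):

```python
def fixed_length_features(cepstra):
    """Summarize a variable-length MFCC matrix into a fixed-length vector.

    `cepstra` is a list with one row per cepstral coefficient; each row
    holds that coefficient's value in every frame, so rows can have any
    length. Per-coefficient mean and standard deviation summarize the
    whole utterance instead of just its first frames.
    """
    feats = []
    for row in cepstra:
        n = len(row)
        mu = sum(row) / n
        var = sum((x - mu) ** 2 for x in row) / n
        feats.extend([mu, var ** 0.5])
    return feats

# Utterances with different frame counts yield same-sized vectors.
short = fixed_length_features([[1.0, 3.0], [5.0, 5.0]])
long_ = fixed_length_features([[1.0, 2.0, 3.0, 2.0], [4.0, 6.0, 5.0, 5.0]])
assert len(short) == len(long_) == 4
```

This way every file contributes a vector of the same size (2 × numcep) no matter how long the recording is.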
Here is the melfcc implementation:
function [cepstra,aspectrum,pspectrum] = melfcc(samples, sr, varargin)
if nargin < 2; sr = 16000; end
% Parse out the optional arguments
[wintime, hoptime, numcep, lifterexp, sumpower, preemph, dither, ...
minfreq, maxfreq, nbands, bwidth, dcttype, fbtype, usecmp, modelorder, ...
broaden, useenergy] = ...
process_options(varargin, 'wintime', 0.025, 'hoptime', 0.010, ...
'numcep', 13, 'lifterexp', 0.6, 'sumpower', 1, 'preemph', 0.97, ...
'dither', 0, 'minfreq', 0, 'maxfreq', 4000, ...
'nbands', 40, 'bwidth', 1.0, 'dcttype', 2, ...
'fbtype', 'mel', 'usecmp', 0, 'modelorder', 0, ...
'broaden', 0, 'useenergy', 0);
if preemph ~= 0
samples = filter([1 -preemph], 1, samples);
end
% Compute FFT power spectrum
[pspectrum,logE] = powspec(samples, sr, wintime, hoptime, dither);
aspectrum = audspec(pspectrum, sr, nbands, fbtype, minfreq, maxfreq, sumpower, bwidth);
if (usecmp)
% PLP-like weighting/compression
aspectrum = postaud(aspectrum, maxfreq, fbtype, broaden);
end
if modelorder > 0
if (dcttype ~= 1)
disp(['warning: plp cepstra are implicitly dcttype 1 (not ', num2str(dcttype), ')']);
end
% LPC analysis
lpcas = dolpc(aspectrum, modelorder);
% convert lpc to cepstra
cepstra = lpc2cep(lpcas, numcep);
% Return the auditory spectrum corresponding to the cepstra?
% aspectrum = lpc2spec(lpcas, nbands);
% else return the aspectrum that the cepstra are based on, prior to PLP
else
% Convert to cepstra via DCT
cepstra = spec2cep(aspectrum, numcep, dcttype);
end
cepstra = lifter(cepstra, lifterexp);
if useenergy
cepstra(1,:) = logE;
end
Here is my implementation:
clear ; close all; clc
[input,output]=gettingPatterns;
input_layer_size = 70;
hidden_layer_size = 100;
hidden2_layer_size = 25;
num_labels = 6;
fu = [input output]; size(fu)
fu = fu(randperm(size(fu,1)),:);   % shuffle the samples
input = fu(:,1:70);
output = fu(:,71:76);
crossIn  = input(201:240,:);
crossOut = output(201:240,:);
trainIn  = input(1:200,:);
trainOut = output(1:200,:);
Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
Theta2 = randInitializeWeights(hidden_layer_size,hidden2_layer_size);
Theta3 =randInitializeWeights(hidden2_layer_size,num_labels);
initial_nn_params = [Theta1(:) ; Theta2(:);Theta3(:)];
size(initial_nn_params)
options = optimset('MaxIter',1000);
% You should also try different values of lambda
lambda=1;
costFunction = @(p) nnCostFunction(p, ...
input_layer_size, ...
hidden_layer_size, ...
hidden2_layer_size,num_labels, trainIn, trainOut, lambda);
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):( (hidden_layer_size * (input_layer_size + 1)))+(hidden2_layer_size*(hidden_layer_size+1))), ...
hidden2_layer_size, (hidden_layer_size + 1));
Theta3 = reshape(nn_params(((1 + (hidden_layer_size * (input_layer_size + 1)))+(hidden2_layer_size*(hidden_layer_size+1))):end), ...
num_labels, (hidden2_layer_size + 1));
%[error_train, error_val] = learningCurve(trainIn, trainOut, crossIn, crossOut, lambda,input_layer_size,hidden_layer_size,num_labels);
pred = predict(Theta1, Theta2, Theta3, trainIn);
[dummy, p] = max(trainOut, [], 2);
[pred trainOut]
fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == p)) * 100);
pred = predict(Theta1, Theta2, Theta3, crossIn);
[pred crossOut]
[dummy, p] = max(crossOut, [], 2);
fprintf('\nCross Validation Set Accuracy: %f\n', mean(double(pred == p)) * 100);
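The commented-out learningCurve call above points at the right diagnostic. One systematic way to apply the "increase lambda" adjustment is to train once per candidate value and keep the one with the best cross-validation accuracy. A sketch in plain Python, where `train_and_eval` is a hypothetical stand-in for the fmincg/nnCostFunction/predict pipeline:

```python
def pick_lambda(train_and_eval, candidates=(0, 0.01, 0.03, 0.1, 0.3, 1, 3, 10)):
    """train_and_eval(lam) -> cross-validation accuracy after training
    with regularization strength lam. Returns the best candidate."""
    best_lam, best_acc = None, -1.0
    for lam in candidates:
        acc = train_and_eval(lam)
        if acc > best_acc:
            best_lam, best_acc = lam, acc
    return best_lam, best_acc

# Toy stand-in for the real pipeline: accuracy here peaks at lambda = 0.3.
toy = {0: 0.35, 0.01: 0.36, 0.03: 0.38, 0.1: 0.40, 0.3: 0.45, 1: 0.42, 3: 0.37, 10: 0.30}
lam, acc = pick_lambda(lambda l: toy[l])
assert lam == 0.3
```

The key point is that the selection criterion is validation accuracy, never training accuracy, which is already at 100%.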
Here is the code for gettingPatterns:
function [ input,output ] = gettingPatterns()
myFolder='C:\Users\ahmed\Documents\MATLAB\New Folder (3)\homeWork\speech';
filePattern=fullfile(myFolder,'*.wav');
wavFiles=dir(filePattern);
output=[];
input=[];
for k = 1:length(wavFiles)
sampleOutput=zeros(1,6);
baseFileName = wavFiles(k).name;
if strcmp(baseFileName(3:5), 'ang'), sampleOutput(1) = 1; end
if strcmp(baseFileName(3:5), 'fea'), sampleOutput(2) = 1; end
if strcmp(baseFileName(3:5), 'bor'), sampleOutput(3) = 1; end
if strcmp(baseFileName(3:5), 'sad'), sampleOutput(4) = 1; end
if strcmp(baseFileName(3:5), 'joy'), sampleOutput(5) = 1; end
if strcmp(baseFileName(3:5), 'neu'), sampleOutput(6) = 1; end
output(k,:)=sampleOutput;
fullFileName = fullfile(myFolder, baseFileName);
wavArray = audioread(fullFileName); % wavread was removed in recent MATLAB releases
[cepstra,xxx]=melfcc(wavArray);
[m,n]=size(cepstra);
reshapedArray=reshape(cepstra,m*n,1);
smalledArray=small(reshapedArray,70);
% Normalize features: x(i) = (x(i) - mean) / std
normalizedFeatures = (smalledArray - mean(smalledArray)) / std(smalledArray);
input(k,:)=normalizedFeatures;
end
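Note that this normalizes each sample with its own mean and standard deviation, so the same feature value maps to different numbers in different files, and each test sample is implicitly scaled using information about itself. The usual alternative is per-feature z-scoring with statistics computed on the training set only. A sketch in plain Python (rows are samples, columns are features; illustrative only):

```python
def zscore_fit(train_rows):
    """Compute per-column mean and std from the training rows only."""
    n = len(train_rows)
    d = len(train_rows[0])
    means = [sum(r[j] for r in train_rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in train_rows) / n
        stds.append(var ** 0.5 or 1.0)  # guard against zero-variance columns
    return means, stds

def zscore_apply(rows, means, stds):
    """Apply the training-set statistics to any split (train, CV, or test)."""
    return [[(r[j] - means[j]) / stds[j] for j in range(len(r))] for r in rows]

train = [[1.0, 10.0], [3.0, 30.0]]
m, s = zscore_fit(train)
normed = zscore_apply(train, m, s)
# Each column is now centered: here every column maps to -1 and +1.
```

The same `means`/`stds` computed from the training split are then reused for the cross-validation and test splits.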
Please note the following:
I ran the same test with the Neural Network Toolbox and got the same results. The only reason I implemented it myself was to be able to add extra hidden layers.
The implementations of the cost function, forward propagation, and backpropagation are correct, so I have not included them in this question.