Low accuracy in speech emotion recognition

Date: 2016-03-04 08:43:52

Tags: matlab neural-network

I am training a neural network for speech emotion recognition:

Input layer size: 100.

Hidden layer size: 25.

6 labels (output layer).

I split the dataset into a training set and a test set, then extracted features from the speech with melfcc (Mel-frequency cepstral coefficients), which returns a matrix of a different size for each utterance. So each time I take 100 of the features.
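For reference, one common alternative to truncating each utterance to a fixed count of raw values is to pool per-coefficient statistics over all frames, which yields a fixed-length vector regardless of utterance length. A minimal hypothetical sketch, assuming wavArray comes from reading a wav file as in gettingPatterns below:

% Hypothetical sketch: fixed-length features via per-coefficient statistics
% pooled over frames, rather than truncating the raw cepstral matrix.
cepstra = melfcc(wavArray, 16000);                     % numcep x nframes
featureVec = [mean(cepstra, 2); std(cepstra, 0, 2)];   % 2*numcep x 1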

The accuracy on the training set is 100%, but on the test set it is only about 30-40%.

I am still fairly new to this field, but this looks like overfitting (maybe not, but that is what I have learned so far). I made some adjustments to avoid the problem:

Increasing lambda, reducing the number of features, and adding an extra hidden layer. The accuracy got better, but never above 40%.
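A more systematic way to tune lambda is to sweep it and score each value on the held-out set. A minimal sketch, assuming the same nnCostFunction, fmincg, and predict used in the implementation below:

% Hypothetical sketch: pick lambda by held-out accuracy instead of by hand.
lambdas = [0 0.01 0.1 1 3 10];
bestAcc = 0; bestLambda = lambdas(1);
t1 = hidden_layer_size * (input_layer_size + 1);
t2 = t1 + hidden2_layer_size * (hidden_layer_size + 1);
for li = 1:length(lambdas)
  cf = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                           hidden2_layer_size, num_labels, ...
                           trainIn, trainOut, lambdas(li));
  params = fmincg(cf, initial_nn_params, optimset('MaxIter', 200));
  Theta1 = reshape(params(1:t1), hidden_layer_size, input_layer_size + 1);
  Theta2 = reshape(params(t1+1:t2), hidden2_layer_size, hidden_layer_size + 1);
  Theta3 = reshape(params(t2+1:end), num_labels, hidden2_layer_size + 1);
  [~, p] = max(crossOut, [], 2);
  acc = mean(double(predict(Theta1, Theta2, Theta3, crossIn) == p)) * 100;
  if acc > bestAcc, bestAcc = acc; bestLambda = lambdas(li); end
end
fprintf('best lambda = %g (held-out accuracy %.1f%%)\n', bestLambda, bestAcc);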

What could be going wrong?

Here is the implementation of melfcc:

function [cepstra,aspectrum,pspectrum] = melfcc(samples, sr, varargin)


if nargin < 2;   sr = 16000;    end

% Parse out the optional arguments
[wintime, hoptime, numcep, lifterexp, sumpower, preemph, dither, ...
 minfreq, maxfreq, nbands, bwidth, dcttype, fbtype, usecmp, modelorder, ...
 broaden, useenergy] = ...
    process_options(varargin, 'wintime', 0.025, 'hoptime', 0.010, ...
        'numcep', 13, 'lifterexp', 0.6, 'sumpower', 1, 'preemph', 0.97, ...
        'dither', 0, 'minfreq', 0, 'maxfreq', 4000, ...
        'nbands', 40, 'bwidth', 1.0, 'dcttype', 2, ...
        'fbtype', 'mel', 'usecmp', 0, 'modelorder', 0, ...
        'broaden', 0, 'useenergy', 0);

if preemph ~= 0
  samples = filter([1 -preemph], 1, samples);
end

% Compute FFT power spectrum
[pspectrum,logE] = powspec(samples, sr, wintime, hoptime, dither);

aspectrum = audspec(pspectrum, sr, nbands, fbtype, minfreq, maxfreq, sumpower, bwidth);

if (usecmp)
  % PLP-like weighting/compression
  aspectrum = postaud(aspectrum, maxfreq, fbtype, broaden);
end

if modelorder > 0

  if (dcttype ~= 1) 
    disp(['warning: plp cepstra are implicitly dcttype 1 (not ', num2str(dcttype), ')']);
  end

  % LPC analysis 
  lpcas = dolpc(aspectrum, modelorder);

  % convert lpc to cepstra
  cepstra = lpc2cep(lpcas, numcep);

  % Return the auditory spectrum corresponding to the cepstra?
%  aspectrum = lpc2spec(lpcas, nbands);
  % else return the aspectrum that the cepstra are based on, prior to PLP

else

  % Convert to cepstra via DCT
  cepstra = spec2cep(aspectrum, numcep, dcttype);

end

cepstra = lifter(cepstra, lifterexp);

if useenergy
  cepstra(1,:) = logE;
end
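For reference, a typical call looks like this (the option names and values shown match the defaults parsed by process_options above; samples and the 16 kHz rate are placeholders):

% Example call; all name/value options shown are the defaults parsed above.
[cepstra, aspectrum, pspectrum] = melfcc(samples, 16000, ...
    'numcep', 13, 'nbands', 40, 'minfreq', 0, 'maxfreq', 4000);
% cepstra is numcep x nframes; with 'useenergy', 1 its first row is log energy.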

Here is my implementation:

clear ; close all; clc

[input,output]=gettingPatterns; 

input_layer_size   = 70;   % features per pattern
hidden_layer_size  = 100;  % first hidden layer
hidden2_layer_size = 25;   % second hidden layer
num_labels = 6;            % emotion classes

% Shuffle patterns and labels together, then split them apart again
fu = [input output]; size(fu)
fu = fu(randperm(size(fu,1)),:);
input = fu(:,1:70);
output = fu(:,71:76);


% Hold out the last 40 shuffled patterns for testing; train on the first 200
crossIn  = input(201:240,:);
crossOut = output(201:240,:);
trainIn  = input(1:200,:);
trainOut = output(1:200,:);

Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
Theta2 = randInitializeWeights(hidden_layer_size, hidden2_layer_size);
Theta3 = randInitializeWeights(hidden2_layer_size, num_labels);

% Unroll the initial weights into a single parameter vector for fmincg
initial_nn_params = [Theta1(:); Theta2(:); Theta3(:)];
size(initial_nn_params)
options = optimset('MaxIter', 1000);

% You should also try different values of lambda
lambda = 1;

costFunction = @(p) nnCostFunction(p, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   hidden2_layer_size, num_labels, ...
                                   trainIn, trainOut, lambda);

[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% Unroll the optimized parameter vector back into the three weight matrices
offset1 = hidden_layer_size * (input_layer_size + 1);
offset2 = offset1 + hidden2_layer_size * (hidden_layer_size + 1);
Theta1 = reshape(nn_params(1:offset1), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params(offset1 + 1:offset2), ...
                 hidden2_layer_size, (hidden_layer_size + 1));
Theta3 = reshape(nn_params(offset2 + 1:end), ...
                 num_labels, (hidden2_layer_size + 1));
 %[error_train, error_val] =  learningCurve(trainIn, trainOut, crossIn, crossOut, lambda,input_layer_size,hidden_layer_size,num_labels);    

% Accuracy on the training set
pred = predict(Theta1, Theta2, Theta3, trainIn);
[dummy, p] = max(trainOut, [], 2);
[pred trainOut]
fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == p)) * 100);

% Accuracy on the held-out test set
pred = predict(Theta1, Theta2, Theta3, crossIn);
[pred crossOut]
[dummy, p] = max(crossOut, [], 2);
fprintf('\nTest Set Accuracy: %f\n', mean(double(pred == p)) * 100);
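randInitializeWeights is not shown in the question; for completeness, a typical implementation (the symmetric epsilon initialization from the Coursera ML exercises, included here only as an assumption about what it does) looks like:

function W = randInitializeWeights(L_in, L_out)
% Assumed sketch: draw weights uniformly from [-epsilon, epsilon] to break
% symmetry, with epsilon scaled to the fan-in and fan-out of the layer.
epsilon_init = sqrt(6) / sqrt(L_in + L_out);
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
end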

Here is the code for gettingPatterns:

function [ input,output ] = gettingPatterns()
myFolder='C:\Users\ahmed\Documents\MATLAB\New Folder (3)\homeWork\speech'; 
filePattern=fullfile(myFolder,'*.wav');
wavFiles=dir(filePattern);
output=[];
input=[];
for k = 1:length(wavFiles)
  % Build a one-hot label from characters 3-5 of the file name
  sampleOutput = zeros(1,6);

  baseFileName = wavFiles(k).name;
  if strcmp(baseFileName(3:5),'ang'), sampleOutput(1) = 1; end
  if strcmp(baseFileName(3:5),'fea'), sampleOutput(2) = 1; end
  if strcmp(baseFileName(3:5),'bor'), sampleOutput(3) = 1; end
  if strcmp(baseFileName(3:5),'sad'), sampleOutput(4) = 1; end
  if strcmp(baseFileName(3:5),'joy'), sampleOutput(5) = 1; end
  if strcmp(baseFileName(3:5),'neu'), sampleOutput(6) = 1; end

  output(k,:) = sampleOutput;
  fullFileName = fullfile(myFolder, baseFileName);
  wavArray = wavread(fullFileName);   % audioread in newer MATLAB releases
  [cepstra,xxx] = melfcc(wavArray);
  [m,n] = size(cepstra);
  reshapedArray = reshape(cepstra, m*n, 1);

  % Truncate to a fixed 70 features (small is a user-defined helper)
  smalledArray = small(reshapedArray, 70);

  % Normalize features per sample: x(i) = (x(i) - mean) / std
  normalizedFeatures = (smalledArray - mean(smalledArray)) / std(smalledArray);
  input(k,:) = normalizedFeatures;
end
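Note that the normalization above is per sample: each pattern is centered and scaled by its own mean and standard deviation. A minimal sketch of the usual alternative, per-feature statistics computed on the training rows only and reused on the held-out rows (variable names follow the main script):

% Hypothetical sketch: per-feature z-score using training statistics only,
% applied identically to the held-out set to avoid test-set leakage.
mu = mean(trainIn, 1);        % 1 x 70 feature means
sigma = std(trainIn, 0, 1);   % 1 x 70 feature std devs
sigma(sigma == 0) = 1;        % guard against constant features
trainIn = (trainIn - repmat(mu, size(trainIn,1), 1)) ./ repmat(sigma, size(trainIn,1), 1);
crossIn = (crossIn - repmat(mu, size(crossIn,1), 1)) ./ repmat(sigma, size(crossIn,1), 1);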

Please note the following:

I ran this test with the NN toolbox as well and got the same results. The only reason I implemented it myself was to be able to add an extra hidden layer.
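For what it's worth, the toolbox's patternnet accepts a vector of hidden layer sizes, so a second hidden layer does not require a hand-rolled network. A minimal sketch, assuming the input and output matrices produced by gettingPatterns:

% Hypothetical sketch: two hidden layers (100 and 25 units) with patternnet.
% The toolbox expects one sample per column, hence the transposes.
net = patternnet([100 25]);
net.divideParam.trainRatio = 200/240;   % roughly the same split as above
net.divideParam.valRatio   = 0;
net.divideParam.testRatio  = 40/240;
net = train(net, input', output');
pred = net(input');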

The implementations of the cost function, forward propagation, and backpropagation are 100% correct, so I have not included them in this question.
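One quick way to back up that claim is a numerical gradient check against nnCostFunction; a minimal sketch, checking only the first few parameters since a full check is slow:

% Hypothetical sketch: analytic vs. finite-difference gradients should agree
% to roughly 1e-9 relative error if backpropagation is correct.
eps_fd = 1e-4;
[~, grad] = nnCostFunction(initial_nn_params, input_layer_size, ...
    hidden_layer_size, hidden2_layer_size, num_labels, trainIn, trainOut, lambda);
for i = 1:10
    e = zeros(size(initial_nn_params)); e(i) = eps_fd;
    c1 = nnCostFunction(initial_nn_params - e, input_layer_size, ...
        hidden_layer_size, hidden2_layer_size, num_labels, trainIn, trainOut, lambda);
    c2 = nnCostFunction(initial_nn_params + e, input_layer_size, ...
        hidden_layer_size, hidden2_layer_size, num_labels, trainIn, trainOut, lambda);
    fprintf('param %d: analytic %g, numeric %g\n', i, grad(i), (c2 - c1) / (2 * eps_fd));
end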

0 answers

No answers yet