Question

I am building a speech recognizer based on time-alignment distance, and I have data like this:

tes = 1 x 160 double
refr = 1 x 54 double

I then do the following:

[rows,A] = size(tes);
[rows,B] = size(refr);    
matr = zeros(A);
for b = 1:A
    for e = b+1:A
        tes_pot = tes(1,b:e);
        matr(b,e) = TA(tes_pot,refr);
    end
end

where TA is the following function:

function ans = TA(test, ref)

[rows,N] = size(test);
[rows,M] = size(ref);

if(N>M)
    for ix = 1:N
    Y{ix} = fix(M/N*ix);
    end
else
    for ix = 1:M
    Y{ix} = fix(N/M*ix);
    end
end

Y = cell2mat(Y);
Y(Y == 0) = 1;

if(N>M)
    for j=1:N
    d(j)=abs(test(j)-ref(Y(1,j)));
    end
else
    for j=1:M
    d(j)=abs(ref(j)-test(Y(1,j)));
    end
end

ans = sum(d);

It works fine, but when I run it for many refr data sets it takes a very long time (roughly more than 15 minutes for 100 refr data sets). Any help simplifying this code?

Answer 1

Your function TA can be rewritten as

function val = TA(test, ref)

    N = size(test,2);
    M = size(ref,2);

    Y = fix( (1:max(M,N)) * min(M,N)/max(M,N) );
    Y(Y == 0) = 1;

    if (N>M)    
        d = abs(test-ref(Y));
    else    
        d = abs(ref-test(Y));    
    end

    val = sum(d);

end

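This will be faster because:

You were not preallocating Y and d before the loops, so they had to be resized on every iteration, which is slow. The vectorized version creates each array in a single step.
At least one if has been eliminated; branches are relatively slow on most modern CPUs. Where they are easy to avoid, avoid them.
You were thinking bottom-up (a matrix is nothing more than a container of values) rather than top-down (a matrix is the value, which happens to be composite). MATLAB specializes in the latter, so thinking top-down lets you take larger steps.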
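The first and third points can be illustrated with a small comparison. This is only a sketch under stated assumptions: the random inputs and the names d_grow, d_pre and d_vec are made up for illustration, and only the N > M case is shown.

```matlab
% Sketch: three ways to build the per-sample distances d for the N > M case.
test = rand(1, 160);          % random stand-in for a test segment (assumption)
ref  = rand(1, 54);           % random stand-in for the reference (assumption)
N = size(test, 2);
M = size(ref, 2);

Y = fix((1:N) * M / N);       % index map from the rewritten TA (N > M here)
Y(Y == 0) = 1;

% (a) growing d inside the loop, as in the question: d is resized each iteration
d_grow = [];
for j = 1:N
    d_grow(j) = abs(test(j) - ref(Y(j)));
end

% (b) same loop with d preallocated
d_pre = zeros(1, N);
for j = 1:N
    d_pre(j) = abs(test(j) - ref(Y(j)));
end

% (c) vectorized: one array operation, no loop
d_vec = abs(test - ref(Y));

same = isequal(d_grow, d_pre, d_vec);   % true when all three hold the same values
```

Wrapping each variant in tic and toc is one way to see how the three compare on your own machine.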

Answer 2

Without knowing exactly why you wrote the code the way you did, here are some basic things that should speed it up:

Compute M/N or N/M only once (division is slow)
Vectorize instead of using for loops (for loops are much slower)

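With these two changes, I believe the following does the same thing as the original code, but it may be noticeably faster because there are no for loops: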

[rows,N]=size(test);
[rows,M]=size(ref);
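if (N>M) 
  nn = N;
  mult = M/N;
  longer = test;   % the longer signal is used element by element
  shorter = ref;   % the shorter signal is indexed through Y
else
  nn = M;
  mult = N/M;
  longer = ref;
  shorter = test;
end
Y = fix(mult * (1:nn));
Y(Y == 0) = 1;
ans = sum(abs(longer - shorter(Y))); % both operands are 1 x nn, so the shapes match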

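It may also be possible to speed up your outer loops, but it is a little harder to work out what you are doing there. Since you run the outer loop 160 times and the inner loop "up to" 160 times, any saving inside the inner loop should help. With that in mind, you may be able to gain a little more with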

[rows,A]=size(tes);
[rows,B]=size(refr);    
matr=zeros(A);
for b = 1:A
    for e = b+1:A
        matr(b,e) = TA(tes(1,b:e),refr); % don't make a separate variable each time
    end
end

Let me know if this is faster!
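One further way the outer loops might be tightened is sketched below. It rests on an observation that is not in the original answer: the index map Y depends only on the segment length, not on where the segment starts, so it can be built once per length and reused for every start position. The random data and the names len and seg are assumptions for illustration, and the index formula is the one from the vectorized TA in Answer 1.

```matlab
% Sketch: build the index map once per segment length instead of once per (b, e) pair.
tes  = rand(1, 160);                  % random stand-in data (assumption)
refr = rand(1, 54);
A = size(tes, 2);
M = size(refr, 2);
matr = zeros(A);

for len = 2:A                         % len = e - b + 1, the segment length
    L = max(len, M);
    Y = fix((1:L) * min(len, M) / L); % same index map as in the vectorized TA
    Y(Y == 0) = 1;
    for b = 1:A-len+1                 % every start position for this length
        e = b + len - 1;
        seg = tes(b:e);
        if len > M
            matr(b, e) = sum(abs(seg - refr(Y)));
        else
            matr(b, e) = sum(abs(refr - seg(Y)));
        end
    end
end
```

This fills the same (b, e) entries as the double loop above; whether it is actually faster is worth measuring with tic and toc.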

Answer 3

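Here is a different approach: rather than computing "by hand" the indices you want to match, compute the resampled shape of the reference signal once (for every size), then match it at every possible position in the signal. It does the same thing "in spirit", but it is clearly not an identical calculation. It takes only about 0.6 seconds, though. It may be worth a look to see whether it actually works for you: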

N = 160;
M = 54;
tes = rand(1, N);
refr = rand(1, M);
tes(35:35+M-1)=refr; % copy the template into the signal to create a point where correlation is good

% refr is resampled to every possible length, 2 to N, and compared with
% each segment of tes of that length; the mean of the absolute differences is taken

tic
matr = zeros(N, N);

% for each segment length e (shorter or longer than refr):
for e=2:N
    xx = linspace(1,M,e);
    rr = interp1(1:M, refr, xx); % compute linear interpolation just once for each size
    for b = 1:N-e                % match this size at each possible point in the signal
        matr(b,e) = mean(abs(tes(b+(1:e))-rr)); % using mean to remove difference due to # of samples
    end
end

figure
imagesc(matr)                    % show the distribution of values: a cold spot (low value) indicates a "match"
axis image; axis xy; axis off    % get "right way around", square, no labels
toc

Timing:

Elapsed time is 0.551464 seconds

Image (not reproduced here): the imagesc plot of matr produced by the code above.

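Clearly, if there is good correlation between the signals, you will see a "cold spot" at the corresponding position in the image; I simulated this by copying the template into the signal.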
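The cold spot can also be read off numerically instead of by eye. The following is a minimal sketch, assuming the same setup as the example above; the names filled, masked, best, b_best and e_best are hypothetical. Entries of matr that the loops never fill stay at zero, so they are masked out before taking the minimum.

```matlab
% Sketch: locate the best match in matr without inspecting the image.
N = 160;
M = 54;
tes = rand(1, N);
refr = rand(1, M);
tes(35:35+M-1) = refr;                 % template copied into the signal, as in the example

matr = zeros(N, N);
filled = false(N, N);                  % marks the entries the loops actually compute
for e = 2:N
    rr = interp1(1:M, refr, linspace(1, M, e));
    for b = 1:N-e
        matr(b, e) = mean(abs(tes(b+(1:e)) - rr));
        filled(b, e) = true;
    end
end

masked = matr;
masked(~filled) = Inf;                 % entries never computed must not win
[best, idx] = min(masked(:));
[b_best, e_best] = ind2sub(size(masked), idx);
% The matched segment is tes(b_best + (1:e_best)):
% it starts at sample b_best + 1 and has length e_best.
```

With the template copied to samples 35 to 88 as above, this should point to b_best = 34 and e_best = 54, with best at or very close to zero.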

Simplifying for loops for large data
