How to use parfor in a 4-level nested for-loop block

Posted: 2015-09-28 12:19:16

Tags: matlab for-loop parallel-processing

I have to compute the std and mean of a large dataset for quite a lot of models. The final loop block is nested four levels deep.

This is what it looks like:

count = 1;
alpha  = 0.5;
%%%Below if each individual block is to be posterior'd and then average taken 
c = 1;
for i = 1:numel(writers) %no. of writers
    for j = 1: numel(test_feats{i}) %no. of images
        for k = 1: numel(gmm) %no. of models
            for n = 1: size(test_feats{i}{j},1)
                [~, scores(c)] = posterior(gmm{k}, test_feats{i}{j}(n,:));
                c = c + 1;
            end
            c = 1;
            index_kek=find(abs(scores-mean(scores))>alpha*std(scores));
            avg = mean(scores(index_kek)); %using std instead of mean... because of ..reasons
            NLL(count) = avg;
            count = count + 1;
        end
        count = 1; %reset count
        NLL_scores{i}(j,:) = NLL; 

    end
    fprintf('***score for model_%d done***\n', i)
end
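The core of the inner block (the index_kek and avg lines) keeps only the scores that lie more than alpha standard deviations away from their mean, and averages those. A minimal sketch of that step on made-up numbers, where scores is a hypothetical stand-in for the posterior outputs of one writer/image/model combination; it also checks that wrapping the mask in find selects exactly the same values:

```matlab
% Made-up stand-in for the posterior scores of one (writer, image, model)
scores = [1 2 3 4 10];
alpha = 0.5;

mu = mean(scores);                          % 4
sigma = std(scores);                        % sample standard deviation (N-1)
keep = abs(scores - mu) > alpha * sigma;    % logical mask of the "far" scores

avg = mean(scores(keep));                   % average of the kept scores only

% find() only converts the mask to indices; the selected values are the same
same_values = isequal(scores(find(keep)), scores(keep));
```

Here the mask keeps 1, 2 and 10, so avg is 13/3.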

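It works and gives the desired results, but even on my i7 processor it takes 3 days to finish the final computation. While it runs, Task Manager shows that only 20% of the CPU is being used, so I would rather put more load on the CPU to get the results faster.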

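According to the official help here, if I want to make the outermost loop a parfor while keeping the rest as normal for loops, I have to use integer limits for the nested loops instead of function calls such as numel or size.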

So, with these changes, the code above becomes:

count = 1;
alpha = 0.5;
%%%Below if each individual block is to be posterior'd and then average taken
c = 1;
num_writers = numel(writers);
num_images = numel(test_feats{1});
num_models = numel(gmm);
num_feats = size(test_feats{1}{1},1);
parfor i = 1:num_writers %no. of writers
    for j = 1: num_images %no. of images
        for k = 1: num_models %no. of models
            for n = 1: num_feats
                [~, scores(c)] = posterior(gmm{k}, test_feats{i}{j}(n,:));
                c = c + 1;
            end
            c = 1;
            index_kek=find(abs(scores-mean(scores))>alpha*std(scores));
            avg = mean(scores(index_kek)); %using std instead of mean... because of ..reasons
            NLL(count) = avg;
            count = count + 1;
        end
        count = 1; %reset count
        NLL_scores{i}(j,:) = NLL;
    end
    fprintf('***score for model_%d done***\n', i)
end

In my case, is this the best way to implement parfor? Can it be improved or optimized further?
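The rule quoted in the question concerns for-loops nested inside a parfor: their ranges should be plain numbers or variables computed before the parfor. A second point worth watching is that MATLAB treats variables assigned inside a parfor body as temporaries that do not carry over from before the loop, so counters such as c and count that are initialized outside the parfor are likely to cause trouble; indexing by the loop variables avoids them. A minimal sketch of that pattern on toy data (num_outer, num_inner, row and result are hypothetical names, not from the question). Without the Parallel Computing Toolbox, parfor simply runs as an ordinary for loop:

```matlab
% Toy sizes, for illustration only
num_outer = 4;
num_inner = 3;   % computed before the parfor, so it is a broadcast value

result = zeros(num_outer, num_inner);
parfor i = 1:num_outer
    % temporary: created inside every iteration, no shared counter needed
    row = zeros(1, num_inner);
    for j = 1:num_inner          % nested range uses a broadcast value
        row(j) = i * j;          % index by loop variables, not by counters
    end
    result(i, :) = row;          % sliced output: one row per iteration
end
```

Each iteration writes only its own row of result, which is what lets MATLAB hand iterations to different workers independently.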

1 Answer:

Answer 0 (score: 0)

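I can't test this in Matlab right now, but it should be close to a working solution. It reduces the number of loops and changes some implementation details, but overall it should run about as fast as the previous code (or even slower).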

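If gmm and test_feats take up a lot of memory, it is important that parfor can determine which pieces of data need to be sent to which workers. The IDE should warn you if it detects inefficient memory access. This modification is especially useful if num_writers is much smaller than the number of cores in your CPU, or only slightly larger (e.g. 5 writers on 4 cores take about as long as 8 writers would), because flattening the three outer loops produces num_writers*num_images*num_models smaller tasks that can be spread evenly over the cores.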
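The 5-versus-8 comparison follows from simple arithmetic. A rough sketch under an idealized model (an assumption: every task takes the same time and each worker runs one task at a time; real parfor scheduling and data transfer add overhead), using hypothetical counts of 20 images and 3 models per writer:

```matlab
% Idealized model (an assumption, not a measurement): every task takes the
% same time and each of num_workers workers runs one task at a time.
num_workers = 4;
rounds = @(num_tasks) ceil(num_tasks / num_workers);

rounds_5_writers = rounds(5);   % 2 rounds, 3 workers idle in the second
rounds_8_writers = rounds(8);   % also 2 rounds

% Flattened loop: hypothetical 20 images and 3 models per writer give
% 5*20*3 = 300 tasks, each 1/60 of a writer's work.
tasks_per_writer = 20 * 3;
flat_rounds_in_writers = rounds(5 * tasks_per_writer) / tasks_per_writer;
```

Under these assumptions the flattened loop needs about 1.25 writer-lengths of time instead of 2, which is why splitting the work finer helps when the writer count does not divide evenly by the core count.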

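% flatten the writer/image/model loops into one list of (i, j, k) work items
% (num_writers, num_images, num_models, num_feats and alpha as defined in the question)
[i_writer, i_image, i_model] = ndgrid(1:num_writers, 1:num_images, 1:num_models);
idx_combined = [i_writer(:) i_image(:) i_model(:)];
n_combined = size(idx_combined, 1);

% one result per (writer, image, model) combination
NLL_scores = zeros(n_combined, 1);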

parfor i_for = 1:n_combined
    % recover the writer, image and model indices for this work item
    i = idx_combined(i_for, 1);
    j = idx_combined(i_for, 2);
    k = idx_combined(i_for, 3);

    % pre-allocate
    scores = zeros(num_feats, 1);

    for i_feat = 1:num_feats
        [~, scores(i_feat)] = posterior(gmm{k}, test_feats{i}{j}(i_feat,:));
    end

    % "find" is redundant here and performs a bit slower, might be insignificant though
    index_kek = abs(scores - mean(scores)) > alpha * std(scores);
    NLL_scores(i_for) = mean(scores(index_kek));
end
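The loop above leaves NLL_scores as one flat vector, while the question builds NLL_scores{i}(j,:). Because ndgrid followed by (:) lists the writer index fastest, then the image, then the model, the flat vector can be reshaped back into that layout. A minimal sketch under that assumption, with toy sizes and a made-up NLL_flat (hypothetical name) standing in for the parfor output; each value encodes its own (i, j, k) so the mapping can be checked:

```matlab
% Toy sizes, chosen only for illustration
num_writers = 2; num_images = 3; num_models = 4;
[i_writer, i_image, i_model] = ndgrid(1:num_writers, 1:num_images, 1:num_models);
idx_combined = [i_writer(:) i_image(:) i_model(:)];

% Stand-in for the flat parfor result: value 100*i + 10*j + k
NLL_flat = 100*idx_combined(:,1) + 10*idx_combined(:,2) + idx_combined(:,3);

% Writer varies fastest, then image, then model (column-major order),
% so a plain reshape gives a writers x images x models array
A = reshape(NLL_flat, num_writers, num_images, num_models);

% Rebuild the question's layout: one images x models matrix per writer
NLL_scores = cell(num_writers, 1);
for i = 1:num_writers
    % reshape instead of squeeze, so the result stays 2-D even if
    % num_images or num_models happens to be 1
    NLL_scores{i} = reshape(A(i, :, :), num_images, num_models);
end
```

With real data, NLL_flat would be the vector filled by the parfor loop, and NLL_scores{i}(j,k) then holds the score of writer i, image j, model k.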