Most efficient way to find a combination of submatrices of a matrix [matlab]

Time: 2018-04-26 21:02:32

Tags: matlab recursion matrix combinations

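Suppose we have a matrix P of zeros and ones, and we want to find the best(*) combination of non-overlapping submatrices of P, such that each submatrix:

  • contains at least L zeros and L ones (min area = 2*L);
  • contains at most H elements (max area = H).

(*) The best combination is the one that maximizes the total number of elements in all the submatrices.

Note that the best combination may not be unique, and that it is not always possible to cover all the elements of P with the submatrices of the best combination, e.g.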

L = 1
H = 6
P =
     0     0     0     0
     1     1     0     0
     0     0     0     0

one of the best combinations is given by the two submatrices

 1     2     2     4   % top left corner 1 2 - bottom right corner 2 4
 1     3     1     1   % top left corner 1 1 - bottom right corner 3 1

which cover only 9 of the 12 elements of P.
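As a quick sanity check of the example, the following sketch verifies that both submatrices satisfy the two properties, do not overlap, and together cover 9 cells. The variable names (`comb`, `cover`, `valid`) are made up for this illustration and are not taken from the code in the question; each row of `comb` is assumed to be `[row1 row2 col1 col2]`.

```matlab
% Check the example: each row of comb is [row1 row2 col1 col2].
P = [0 0 0 0; 1 1 0 0; 0 0 0 0];
L = 1; H = 6;
comb = [1 2 2 4;    % rows 1-2, columns 2-4
        1 3 1 1];   % rows 1-3, column 1

cover = zeros(size(P));   % how many submatrices cover each cell
valid = true;
for n = 1:size(comb,1)
    r = comb(n,1):comb(n,2);
    c = comb(n,3):comb(n,4);
    block = P(r,c);
    n_ones  = sum(block(:));
    n_zeros = numel(block) - n_ones;
    % at least L ones, at least L zeros, at most H elements
    valid = valid && n_ones >= L && n_zeros >= L && numel(block) <= H;
    cover(r,c) = cover(r,c) + 1;
end
non_overlapping = all(cover(:) <= 1);
covered = sum(cover(:) > 0);
fprintf('valid: %d, non-overlapping: %d, covered: %d of %d\n', ...
        valid, non_overlapping, covered, numel(P));
```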

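The code I wrote to solve the problem is divided into two parts:

  1. first, all the possible submatrices of P that satisfy the first two properties (at least L zeros and L ones, at most H elements) are found using the built-in function conv2 (more info in the code below); this part is easy and takes only a few seconds;
  2. then the set of all the submatrices is analyzed using a recursive technique; this is the core of the problem and it can take hours to find a solution (unless the computer freezes first).

The recursion is done this way: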

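    • the recursive function is called with the set A of all the submatrices and a copy of it, B, as input
    • the first submatrix of A is fixed and added to the current combination
    • the set C of all the submatrices that do not overlap it is found
    • the recursive function is called with A and C as input
    • the first submatrix of C is fixed and added to the current combination
    • the set D of all the submatrices that do not overlap it is found
    • the recursive function is called with A and D as input
    • and so on, until the set Z of non-overlapping submatrices is empty; the final combination is a vector containing all the submatrices, a counter (i.e. the number of submatrices in the combination), and the number of elements left after removing the elements contained in the submatrices of the combination
    • then the second submatrix of Y is fixed and replaces the last submatrix of the previous combination
    • if its set of non-overlapping submatrices is empty, the new final combination is compared with the previous one; if that set is not empty, its first submatrix is added to the combination, and so on
    • the script terminates when a perfect combination is found or when the first loop over all the submatrices of A ends.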
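The "find the set of non-overlapping submatrices" step in the list above can be sketched as follows. This is a minimal illustration with a made-up candidate set `A`, where each row is assumed to be `[row1 row2 col1 col2]`; it uses the same interval test as the full code further down, without the extra bookkeeping.

```matlab
% Each row of A is a candidate submatrix: [row1 row2 col1 col2].
A = [1 2 2 4;
     1 3 1 1;
     2 3 1 2;
     3 3 2 4];
fixed = A(1,:);   % the submatrix just added to the combination

% Two rectangles overlap iff their row ranges intersect
% and their column ranges intersect.
overlaps = A(:,1) <= fixed(2) & A(:,2) >= fixed(1) & ...
           A(:,3) <= fixed(4) & A(:,4) >= fixed(3);
C = A(~overlaps,:);   % candidates that can still be added
disp(C)
```

Here the fixed submatrix removes itself and the third candidate (they share cell (2,2)), leaving the second and fourth candidates in `C`.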

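    This method is fast for small matrices (smaller than 10x10), but it can be a nightmare for bigger ones; I tested a 200x200 matrix and after a few minutes my computer froze. The problem is that if the set of all the submatrices contains thousands of elements, the recursion generates hundreds of nested for loops, consuming a lot of RAM and CPU.

    I would like to know the most efficient way to achieve my goal, since my approach performs very poorly.

    Here is my code: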

    %% PROBLEM
    %
    %  let P be a matrix whose elements are zeros and ones
    %  find the best(*) combination of non-overlapping submatrices of P
    %  so that each submatrix respect these properties:
    %   - contains at least L zeros and L ones (min area=2*L)
    %   - contains at most H elements (max area=H)
    %
    %  (*) the best is the one which maximizes the total number of elements in all the submatrices
    %
    %  notes: the best combination may not be unique
    %         it is not always possible to cover all the elements of P with the submatrices of the best combination
    %
    %% INPUT
    P=round(rand(8,8)); L=1; H=5;
    %P=dlmread('small.txt'); L=1; H=5;  % small can be found here https://pastebin.com/RTc5L8We
    %P=dlmread('medium.txt'); L=2; H=8; % medium can be found here https://pastebin.com/qXJEiZTX
    %P=dlmread('big.txt'); L=4; H=12;   % big can be found here https://pastebin.com/kBFFYg3K
    %P=[0 0 0 0 0 1;0 0 0 0 0 1;0 1 0 1 0 1;0 0 0 0 0 0;0 0 0 0 0 0]; L=1; H=6;
    P=[0 0 0 0 0;0 1 1 1 0;0 0 0 0 0]; L=1; H=6;
    %P=[1,0,0,0,0;1,1,1,1,1;1,0,0,0,0]; L=1; H=5;
    
    %% FIND ALL THE SUBMATRICES OF AREA >= 2*L & <= H
    %
    %  conv2(input_matrix,shape_matrix,'valid')
    %  creates a matrix, where each element is the sum of all the elements contained in
    %  the submatrix (contained in input_matrix and with the shape given by shape_matrix)
    %  having its top left corner at said element
    % 
    %  ex.  conv2([0,1,2;3,4,5;6,7,8],ones(2,2),'valid')
    %       ans =
    %             8    12
    %            20    24
    %       where 8=0+1+3+4 12=1+2+4+5  20=3+4+6+7  24=4+5+7+8
    %
    s=[]; % will contain the indexes and the area of each submatrix
          % i.e.  1 3 2 5 9  is the submatrix with area 9 and corners in 1 2 and in 3 5 
    for sH = H:-1:2*L
        div_sH = divisors(sH);
        fprintf('_________AREA %d_________\n',sH)
        for k = 1:length(div_sH)
            a = div_sH(k);
            b = div_sH(end-k+1);
            convP = conv2(P,ones(a,b),'valid');
            [i,j] = find((convP >= L) & (convP <= sH-L));
            if ~isempty([i,j])
                if size([i,j],1) ~= 1
    %                        rows     columns           area
                    s = [s;[i,i-1+a , j,j-1+b , a*b*ones(numel(i),1)]];
                else
                    s = [s;[i',i'-1+a,j',j'-1+b,a*b*ones(numel(i),1)]];
                end
                fprintf('[%dx%d] submatrices: %d\n',a,b,size(s,1))
            end
        end
    end
    fprintf('\n')
    s(:,6)=1:size(s,1);
    
    %% FIND THE BEST COMBINATION
    tic
    [R,C]=size(P); % rows and columns of P
    no_ones=sum(P(:)); % counts how many ones are in P
    % a combination of submatrices cannot contain more than max_no_subm submatrices
    if no_ones <= R*C-no_ones
        max_no_subm=floor(no_ones/L);
    else
        max_no_subm=floor((R*C-no_ones)/L); % number of zeros divided by L
    end
    comb(2,1)=R*C+1; % will contain the best combination
    s_copy=s; % save a copy of s
    [comb,perfect]=recursion(s,s_copy,comb,0,0,R,C,0,false,H,[],size(s,1),false,[0,0,0],0,0,0,0,0,0,max_no_subm);
    fprintf('\ntime: %2.2fs\n\n',toc)
    if perfect
        disp('***********************************')
        disp('***  PERFECT COMBINATION FOUND  ***')
        disp('***********************************')
    end
    
    %% PRINT RESULTS
    if (R < 12 && C < 12)
        for i = 1:length(find(comb(2,3:end)))
            optimal_comb_slices(i,:)=s(comb(2,i+2),:);
        end
        optimal_comb_slices(:,1:5)
        P
    end
    

    using the following function:

    function [comb,perfect,counter,area,v,hold_on,ijk,printed,first_for_i,second_for_i,third_for_i] = recursion(s,s_copy,comb,counter,area,R,C,k,hold_on,H,v,size_s,perfect,ijk,size_s_ovrlppd,size_s_ovrlppd2,printed,third_for_i,second_for_i,first_for_i,max_no_subm)
    %
    % OUTPUT (that is actually going to be used in the main script)
    % comb [matrix] a matrix of two rows, the first one contains the current combination
    %        the second row contains the best combination found
    % perfect [boolean] says if the combination found is perfect (a combination is perfect if
    %           the submatrices cover all the elements in P and if it is made up with
    %           the minimum number of submatrices possible)
    %
    % OUTPUT (only needed in the function itself)
    % counter [integer] int that keeps track of how many submatrices are in the current combination
    % area [integer] area covered with all the submatrices of the current combination
    % v [vector] keeps track of the for loops that are about to end
    % hold_on [boolean] helps v to remove submatrices from the current combination
    %
    % OUTPUT (only needed to print results)
    % ijk [vector] contains the indexes of the first three nested for loops (useful to see where the function is working)
    % printed [boolean] used to print text on different lines
    % first_for_i second_for_i third_for_i [integers] used by ijk
    %
    %
    % INPUT
    % s [matrix] the set of all the submatrices of P
    % s_copy [matrix] the set of all the submatrices that don't overlap the ones in the current combination
    %                 (is equal to s when the function is called for the first time)
    % R,C [integers] rows and columns of P
    % k [integer] area of the current submatrix
    % H [integer] maximum number of cells that a submatrix can contains
    % size_s [integer] number of rows of s i.e. number of submatrices in s
    % size_s_ovrlppd [integer] used by ijk
    % size_s_ovrlppd2 [integer] used by ijk
    % max_no_subm [integer] maximum number of submatrices contained in a combination
    %
    %
    %  the function starts considering the first submatrix (call it sub1) in the set 's' of all the submatrices
    %  and adds it to the combination
    %  then it finds 's_ovrlppd' i.e. the set of all the submatrices that don't overlap sub1
    %  and the function calls itself considering the first submatrix (call it sub2) in the set 's_ovrlppd'
    %  and adds it to the combination
    %  then it finds the set of all the submatrices that don't overlap sub2 and
    %  so on until there are no more non-overlapping submatrices
    %  then it replaces the last submatrix in the combination with the second one of the last set of non-overlapping
    %  submatrices and saves the combination which covers more elements in P
    %  and so on for all the submatrices of the set 's'
    %
    %  DOWNSIDE OF THIS METHOD
    %    if 's' contains thousands of submatrices, the function will create hundreds of nested for loops
    %    so both time and space complexities can be really high and the computer might freeze
    %
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%%   SAVE AND RESET COMBINATIONS   %%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %   s_copy is empty when no more submatrices can be added to the current
    %   combination, in this case we have to check if this combination is
    %   better than the best combination previously found, if so then we overwrite it
    %
    %   then we have to remove one or more submatrices from the combination (depending on
    %   how many nested for loops are about to be closed)
    %   and compute another combination
    %   to 'remove one or more submatrices from the combination' it is necessary to do these things:
    %    - reduce the area
    %    - reduce the combination
    %    - reduce the counter
    %
        if isempty(s_copy)
            comb(1,2)=counter;  % final no of submatrices in the combination
            comb(1,1)=R*C-area; % no. of cells remained in P after removing the cells contained in the submatrices of the combination
    %       if the combination just found is better than the previous overwrite it
            if comb(1,1)<comb(2,1) || (comb(1,1)==comb(2,1) && comb(1,2)<comb(2,2))
                comb(2,:)=comb(1,:);
                disp(['[area_left] ', num2str(comb(2,1)), ' [slices] ', num2str(comb(2,2))])
                printed=true;
            end
    
            area=area-k; % tot area of the combination excluding the last submatrix
            if ~isempty(v) && ~hold_on % more than one submatrix must be removed
                i=size(v,2);
                if i>length(find(v)) % if v contains at least one 0
                    while v(i)==1 % find the index of the last 0
                        i=i-1;
                    end
                    last_i_counter=size(v(i+1:end),2); % no. of consecutive for loop that are about to end
                    v=v(1:i-1);
                else
                    last_i_counter=i;
                    v=[];
                end
                for i=1:last_i_counter
                    area=area-s(comb(1,2+counter-i),5); % reduce the area
                end
                comb(1,2+counter-last_i_counter:2+counter)=0; % remove submatrices from the combination
                counter=counter-(last_i_counter+1); % reduce the counter
                hold_on=true;
            else % exactly one submatrix must be removed
                comb(1,2+counter)=0;
                counter=counter-1;
            end
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%   FIND COMBINATIONS   %%%
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%
        else
            for i = 1:size(s_copy,1) % fix the i-th submatrix
    
    %           if the combination covers all elements of P & the no. of submatrices is the minimum possible
                if comb(2,1)==0 && comb(2,2)==ceil(R*C/H)
                    perfect=true;
                    return
                end
    
                pos=s_copy(i,6); % position of current submatrix in s
                comb(1,3+counter)=pos; % add the position to the combination
                k=s(pos,5); % save the area of the current submatrix
                area=area+k; % area covered with all the submatrices of the combination up now
                counter=counter+1;
    
    %           if the area in P covered by the current combination could not
    %           be bigger than the best combination found up now, then discard
    %           the current combination and consider the next one
                if R*C-area-(max_no_subm-counter)*H > comb(2,1) && i < size(s_copy,1)
    %              R*C-area-(max_no_subm-counter)*H can be negative if ceil(R*C/H) < max_no_subm
                    counter=counter-1;
                    comb(1,3+counter)=0;
                    area=area-k;
                else
                    s_ovrlppd=s_copy; % initializing the set of non-overlapping submatrices
                    s_ovrlppd(s_copy(:,1)<=s_copy(i,2) & s_copy(:,2)>=s_copy(i,1) & s_copy(:,3)<=s_copy(i,4) & s_copy(:,4)>=s_copy(i,3),:)=[]; % delete submatrices that overlap the i-th one
                    s_ovrlppd(s_ovrlppd(:,6)<pos,:)=[]; % remove submatrices that will generate combinations already studied
    %               KEEP TRACK OF THE NESTED 'FOR' LOOPS ENDS REACHED
                    if i==size(s_copy,1) % if i is the last cycle of the current for loop
                        v(size(v,2)+1)=1; % a 1 means that the code entered the last i of a 'for' loop
                        if size(s_ovrlppd,1)~=0 % hold on until an empty s_ovrlppd is found
                            hold_on=true;
                        else
                            hold_on=false;
                        end
                    elseif ~isempty(v) && size(s_ovrlppd,1)~=0
                        v(size(v,2)+1)=0; % a 0 means that s_ovrlppd in the last i of a 'for' loop is not empty => a new 'for' loop is created
                    end
    %%%%%%%%%%%%%%%%%%%%%%%%
    %%%   PRINT STATUS   %%%
    %%%%%%%%%%%%%%%%%%%%%%%%
                    if size(s_copy,1)==size_s
                        ijk(1)=i;
                        ijk(2:3)=0;
                        fprintf('[%d,%d,%d]\n',ijk)
                        size_s_ovrlppd=size(s_ovrlppd,1);
                        first_for_i=i;
                        second_for_i=0;
                    elseif size(s_copy,1)==size_s_ovrlppd
                        ijk(2)=i;
                        ijk(3)=0;
                        if ~printed
                            fprintf(repmat('\b',1,numel(num2str(first_for_i))+numel(num2str(second_for_i))+numel(num2str(third_for_i))+2+2+1)) % [] ,, return
                        else
                            printed=false;
                        end
                        fprintf('[%d,%d,%d]\n',ijk)
                        size_s_ovrlppd2=size(s_ovrlppd,1);
                        second_for_i=i;
                        third_for_i=0;
                    elseif size(s_copy,1)==size_s_ovrlppd2
                        ijk(3)=i;
                        if ~printed
                            fprintf(repmat('\b',1,numel(num2str(first_for_i))+numel(num2str(second_for_i))+numel(num2str(third_for_i))+2+2+1))
                        else
                            printed=false;
                        end
                        fprintf('[%d,%d,%d]\n',ijk)
                        third_for_i=i;
                    end
                    [comb,perfect,counter,area,v,hold_on,ijk,printed,first_for_i,second_for_i,third_for_i]=recursion(s,s_ovrlppd,comb,counter,area,R,C,k,hold_on,H,v,size_s,perfect,ijk,size_s_ovrlppd,size_s_ovrlppd2,printed,third_for_i,second_for_i,first_for_i,max_no_subm);
                end
            end
        end
    end
    

1 answer:

Answer 0 (score: 2)

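You are basically trying to solve a nonlinear integer programming problem, which is generally very (very!) hard to solve, if it can be solved at all. In this context, 200*200 is not a small problem; it is very large.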

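My best suggestion is to use some method that reduces the search space, or to apply a heuristic if you can accept an approximate solution. I have not tested it, but I believe a tree-search algorithm could perform well, since many of the submatrices overlap and can therefore be pruned from the search tree.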
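To illustrate what a heuristic with an approximate result could look like, here is a minimal greedy sketch. It is an assumption for illustration, not something tested in this answer: candidates are taken largest-first and kept whenever they fit. The small candidate list `s` is hand-made, with rows assumed to be `[row1 row2 col1 col2 area]`, and a greedy pass like this gives no optimality guarantee in general.

```matlab
% Greedy heuristic sketch: largest candidates first, keep those that fit.
P = [0 0 0 0; 1 1 0 0; 0 0 0 0];
% Hand-made candidates [row1 row2 col1 col2 area] (valid for L=1, H=6).
s = [1 3 1 1 3;
     1 2 2 4 6;
     1 2 1 1 2;
     2 3 1 2 4];
[~,order] = sort(s(:,5),'descend');   % indices by decreasing area
used = false(size(P));                % cells already covered
chosen = [];
for n = order'
    r = s(n,1):s(n,2);
    c = s(n,3):s(n,4);
    if ~any(any(used(r,c)))           % no overlap with what was kept so far
        used(r,c) = true;
        chosen(end+1,:) = s(n,:);     %#ok<AGROW>
    end
end
covered = nnz(used);
disp(chosen)
```

On this tiny input the greedy pass happens to keep the two submatrices from the example in the question; on larger inputs it only provides a quick lower bound that a tree search could use for pruning.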

I tried MATLAB's built-in genetic algorithm, ga, which performs okay, but better solutions probably exist.

For ga you can define the objective function:

function [left] = toMin(rect,use)
% number of cells of P left uncovered by the selected rectangles
left = numel(rect(1).mat);
for i = 1:length(rect)
    left = left - use(i)*sum(rect(i).mat(:)==1);
end

which you can minimize, and the constraint function:

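function [val,tmp] = constraint(rect,use)
% val is zero only when the selected rectangles do not overlap
tot = zeros(size(rect(1).mat));
totSum=0;
for i = 1:length(rect)
    if use(i)==1
        tot = tot|rect(i).mat;
        totSum = totSum + sum(rect(i).mat(:)==1);
    end
end
tmp = []; % no equality constraints
val = abs(sum(tot(:)==1)-totSum); % area of the union vs. sum of the areas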

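I created a structure array of rectangles, rect, with a field .mat, which is a matrix indicating where the rectangle lies.

To use it (s is the same as in your algorithm):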

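% preallocate the struct array (column 5 of s holds the area)
rect(size(s,1)).size = s(end,5);
rect(size(s,1)).mat = [];
for i = 1:size(s,1)
    rect(i).size = s(i,5);
    cmp = zeros(size(P));
    % mark the cells of the i-th submatrix: rows s(i,1):s(i,2), columns s(i,3):s(i,4)
    cmp(s(i,1):s(i,2), s(i,3):s(i,4)) = 1;
    rect(i).mat=cmp;
end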

Now you can apply ga:

sol = ga(@(use)toMin(rect,use),length(rect),[],[],[],[],zeros(length(rect),1),ones(length(rect),1),@(x) constraint(rect,x),1:length(rect));
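As a side note, and as an assumption rather than something tried in this answer: the non-overlap condition can also be written as linear inequalities, one per cell of P, of the form "at most one selected rectangle covers this cell". The sketch below builds such a system for a tiny hand-made `s` (rows assumed to be `[row1 row2 col1 col2 area]`) and checks a hand-picked selection. The names `M`, `A`, `b`, `f` are illustrative; the final call is guarded because `intlinprog` requires the Optimization Toolbox.

```matlab
% Sketch: non-overlap as linear constraints A*use <= b.
P = [0 0 0 0; 1 1 0 0; 0 0 0 0];
s = [1 2 2 4 6;     % [row1 row2 col1 col2 area]
     2 3 1 2 4;
     1 3 1 1 3];
n = size(s,1);
M = zeros(n,numel(P));          % M(i,:) is the flattened mask of rectangle i
for i = 1:n
    mask = zeros(size(P));
    mask(s(i,1):s(i,2), s(i,3):s(i,4)) = 1;
    M(i,:) = mask(:)';
end
A = M';                         % one row per cell of P
b = ones(numel(P),1);           % each cell covered at most once
f = -s(:,5);                    % minimizing -area maximizes the covered area

use = [1; 0; 1];                % hand-picked selection: rectangles 1 and 3
feasible = all(A*use <= b);     % true when the selection has no overlaps
covered  = -f'*use;             % total covered area of the selection

% Optional: solve the same system directly, if the toolbox is available.
if exist('intlinprog','file')
    x = intlinprog(f,1:n,A,b,[],[],zeros(n,1),ones(n,1));
end
```

The same `A` and `b` could in principle be passed in the third and fourth arguments of the ga call above instead of the nonlinear constraint function; whether that is faster on the 200x200 case has not been measured here.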