Vectorizing this strfind loop

Time: 2014-03-08 01:02:43

Tags: matlab vectorization string-matching template-matching string-math

I would like to vectorize this loop:

needle = [1 2 3];

haystack  = [0 0 1 2 3 0 1 2 3;
             0 1 2 3 0 1 2 3 0;
             0 0 0 1 2 3 0 0 0];

for ii = 1:3

    indices{ii} = strfind (haystack(ii,:), needle);

end

indices{:}

indices then contains, for each row of haystack, the starting positions of needle (each row may have a different number of occurrences):

3 7
2 6
4

Any command is fine as long as it is vectorized; it does not have to be strfind.

4 answers:

Answer 0: (score: 0)

If you don't want to use a for loop, you can do this:

 result = cellfun(@(row) strfind(row, needle), num2cell(haystack, 2), 'UniformOutput', 0);
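
To make the one-liner concrete, here is a minimal sketch that runs it on the sample data from the question. The intermediate variable `rows` is introduced only for illustration. Note that `result` comes out as a 3x1 (column) cell array, whereas the loop in the question builds a 1x3 (row) cell array.

```matlab
needle = [1 2 3];
haystack = [0 0 1 2 3 0 1 2 3;
            0 1 2 3 0 1 2 3 0;
            0 0 0 1 2 3 0 0 0];

% Split haystack into a 3x1 cell array holding one 1x9 row vector per cell
rows = num2cell(haystack, 2);

% Apply strfind to every row; the output is not uniform in size,
% so it has to be collected in a cell array
result = cellfun(@(row) strfind(row, needle), rows, 'UniformOutput', false);

celldisp(result)   % show the start positions found in each row
```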

Answer 1: (score: 0)

You can concatenate the whole haystack variable and then search for needle in it, like this:

totalWhiteSpaces=isspace(haystack); %finds white space locations
totalWhiteSpaces=sum(totalWhiteSpaces(1,:),2); %Assumes that "haystack" has equal number 
                                     %of characters (including whitespaces) in each row.

realColumns=size(haystack,2)-totalWhiteSpaces; %gets how many characters are 
                                               %there in a row excluding whitespaces
needle(needle==' ')='';
haystack1=haystack';
haystack2=(haystack1(:))';
haystack2(haystack2==' ')='';  %removes whitespace
result=strfind(haystack2,needle);  %find the pattern
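rowsOfResult=floor((result-1)/realColumns)+1; %row of each match; necessary since we had concatenated the array
                                              %(uint32() would round instead of truncating)
resultValue=mod(result-1,realColumns)+1;      %start column of each match within its row
valid=resultValue<=realColumns-numel(needle)+1; %drop matches that span two consecutive rows
rowsOfResult=rowsOfResult(valid);
resultValue=resultValue(valid);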

I guess you can build your final matrix from here.

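Timing results: the advantage of this code shows as haystack gets larger. In my experiments, for a size of 300000x9, your code takes about 0.38 seconds, my code takes about 0.23 seconds, and the code using cellfun takes 2.23 seconds. I guess that is because of the num2cell operation. Also, cellfun uses a for-loop internally, so it is not truly vectorized.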
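
A rough way to repeat this kind of comparison is sketched below. It is not the script used for the numbers above: the random test data, the variable names, and the use of tic/toc are all assumptions, and the timings will differ by machine and MATLAB version.

```matlab
needle = [1 2 3];
haystack = randi([0 3], 300000, 9);   % random test data (an assumption)
[m, n] = size(haystack);

% 1) Plain loop over rows, with the output cell array preallocated
tic
indices = cell(m, 1);
for ii = 1:m
    indices{ii} = strfind(haystack(ii,:), needle);
end
tLoop = toc;

% 2) cellfun over the rows produced by num2cell
tic
viaCellfun = cellfun(@(row) strfind(row, needle), num2cell(haystack, 2), ...
                     'UniformOutput', false);
tCellfun = toc;

% 3) One strfind call on the row-wise linearized haystack
tic
lin  = reshape(haystack.', 1, []);        % rows laid end to end
pos  = strfind(lin, needle);              % matches in the long vector
col  = mod(pos-1, n) + 1;                 % start column within the row
row  = floor((pos-1)/n) + 1;              % row number
keep = col <= n - numel(needle) + 1;      % drop matches spanning two rows
row  = row(keep);
col  = col(keep);
tLinear = toc;

fprintf('loop: %.3f s, cellfun: %.3f s, linearized: %.3f s\n', ...
        tLoop, tCellfun, tLinear);
```

A single tic/toc measurement is noisy; wrapping each variant in a function and timing it with timeit would give steadier numbers.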

Answer 2: (score: 0)

If you can accept the result in a different format (better suited to vectorization):

[m n] = size(haystack);
haystackLin = haystack.';
haystackLin = haystackLin(:).'; %// linearize haystack row-wise
ind = strfind(haystackLin,needle); %// find matches
[jj ii] = ind2sub([n m],ind); %// convert to row and column
valid = jj<=n-numel(needle)+1; %// remove false matches (spanning several rows)
result = [ii(valid).' jj(valid).'];

The result has the format [row, starting column]:

result =

     1     3
     1     7
     2     2
     2     6
     3     4
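
If the per-row cell layout from the question is still wanted, the two-column matrix can be regrouped afterwards. The following is a minimal sketch using accumarray; the variable `m` (number of haystack rows) and the output name `indices` are chosen here to mirror the question, and the `result` matrix is typed in by hand from the output above.

```matlab
result = [1 3;
          1 7;
          2 2;
          2 6;
          3 4];     % [row, start column] pairs, as printed above
m = 3;              % number of rows in haystack

% Group the start columns by row number. The function handle wraps each
% group in a cell; sort() guards against accumarray reordering values.
% Rows without any match end up as empty cells.
indices = accumarray(result(:,1), result(:,2), [m 1], @(x) {sort(x).'});

celldisp(indices)
```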

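Answer 3: (score: 0)

If you are fine with getting the column numbers in one vector and the corresponding row numbers in another, as Luis suggested, you can also use this -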

%// Main portion (relies on num2str, so it assumes single-digit non-negative integers)
haystack_t = haystack';
num1 = strfind(num2str(haystack_t(:))',num2str(needle(:))');
col = rem(num1-1,size(haystack,2))+1;      %// 1-based start column; also correct for matches in the last column
ind = floor((num1-1)/size(haystack,2))+1;  %// row number of each match

%// Remove matches that only arise because all the numbers were concatenated into one long string
rm_ind = col> (size(haystack,2) - numel(needle))+1;
col(rm_ind)=[];
ind(rm_ind)=[];

Runs with various needle inputs -

RUN1 (Original values):
needle =
     1     2     3
haystack =
     0     0     1     2     3     0     1     2     3
     0     1     2     3     0     1     2     3     0
     0     0     0     1     2     3     0     0     0
col =
     3     7     2     6     4
ind =
     1     1     2     2     3

RUN2 :
needle =
     1     2     3     0     1
haystack =
     0     0     1     2     3     0     1     2     3
     0     1     2     3     0     1     2     3     0
     0     0     0     1     2     3     0     0     0
col =
     3     2
ind =
     1     2
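
The runs above only use single-digit values, which the num2str conversion depends on. As a sketch of an alternative that avoids string conversion entirely (a different technique, not the method of this answer), the needle can be compared against shifted column blocks of haystack. The loop runs over the elements of needle, not over the rows of haystack. The sample data and the names `nStart` and `match` are made up for illustration.

```matlab
needle = [10 2 33];
haystack = [ 0 10  2 33  0 10  2 33;
            10  2 33  0  0  0 10  2;
            33  0  0 10  2 33  0  0];

[m, n] = size(haystack);
L = numel(needle);
nStart = n - L + 1;            % number of candidate start columns per row

% match(r,c) stays true only if needle occurs in row r starting at column c
match = true(m, nStart);
for k = 1:L
    match = match & (haystack(:, k:k+nStart-1) == needle(k));
end

% Transposing before find orders the hits row by row
[col, ind] = find(match.');    % col: start column, ind: row number
disp([ind col])
```

Because only windows that fit inside a row are tested, matches spanning two rows cannot occur and no cleanup step is needed.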