Question

Suppose I have a sequence:

    Seq = 'hello my name'

and a string:

    Str = 'hello hello my friend, my awesome name is John, oh my god!'

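I then look up each word of my sequence in the string, so I get a cell array holding the "word" indices of every match for every word of the sequence: the first element contains the matches for 'hello', the second the matches for 'my', and the third the matches for 'name'.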

    Match = {[1 2];      %'hello' matches
             [3 5 11];   %'my' matches
             [7]}        %'name' matches
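
For reference, here is a minimal sketch of how a Match cell array like this could be built. It assumes that words are maximal runs of word characters (so punctuation such as the comma in 'friend,' is ignored) and that matching is case-sensitive. The names seqWords and strWords are illustrative only and not part of the original code.

```matlab
Seq = 'hello my name';
Str = 'hello hello my friend, my awesome name is John, oh my god!';

seqWords = strsplit(Seq);                  % words of the sequence
strWords = regexp(Str, '\w+', 'match');    % words of the string, punctuation dropped (assumption)

% For each sequence word, collect the positions where it occurs in the string.
Match = cellfun(@(w) find(strcmp(w, strWords)), seqWords(:), 'UniformOutput', false);
```

With the inputs above, Match comes out as a 3-by-1 cell array with the same contents as the one shown.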

I need code that somehow produces an answer listing the possible subsequence matches:

    Answer = [1 3 7;     %[hello my name]
              1 5 7;     %[hello my name]
              2 3 7;     %[hello my name]
              2 5 7;]    %[hello my name]

That is, "Answer" contains all possible ordered sequences. This is why 'my' (word 11) never appears in "Answer": a 'name' match would have to exist after position 11.
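
The ordering requirement can be stated as a small check: a row of word indices is a valid match only if the indices strictly increase from left to right. The helper name isOrdered below is hypothetical and is used only to illustrate the criterion.

```matlab
% A candidate row is valid when every index is larger than the one before it.
isOrdered = @(row) all(diff(row) > 0);

ok  = isOrdered([1 5 7]);    % 'hello'(1), 'my'(5), 'name'(7): in order
bad = isOrdered([1 11 7]);   % 'my'(11) comes after 'name'(7): rejected
```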

Note: the length of "Seq" and the number of matches may vary.

Answer 1

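Since the length of Match may vary, you need to use comma-separated lists together with ndgrid to generate all combinations (this approach is similar to the one used in this other answer). Then use diff and logical indexing to filter out the combinations whose indices are not increasing: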

    cc = cell(1, numel(Match));                  % preallocate cell array for the ndgrid outputs
    [cc{end:-1:1}] = ndgrid(Match{end:-1:1});    % comma-separated lists as input and output
    cc = cellfun(@(v) v(:), cc, 'uni', 0);       % linearize each cell
    combs = [cc{:}];                             % concatenate into a matrix, one combination per row
    ind = all(diff(combs, [], 2) > 0, 2);        % rows whose indices strictly increase
    combs = combs(ind, :);                       % remove unwanted combinations

The desired result is in the variable combs. For your example,

    combs =
         1     3     7
         1     5     7
         2     3     7
         2     5     7
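
The ndgrid approach builds the full Cartesian product before filtering, which can get large when there are many words with many matches. As a possible alternative (a sketch, not the method used above), the combinations can be grown one word at a time, discarding out-of-order candidates at each step. The variable names cand, r, c and keep are illustrative.

```matlab
Match = {[1 2]; [3 5 11]; [7]};

combs = Match{1}(:);                       % one row per match of the first word
for k = 2:numel(Match)
    cand = Match{k}(:);                    % candidate indices for word k (column)
    % Pair every partial sequence (row of combs) with every candidate.
    [r, c] = ndgrid(1:size(combs, 1), 1:numel(cand));
    r = r(:);
    c = c(:);
    keep = combs(r, end) < cand(c);        % candidate must come after the previous word
    combs = [combs(r(keep), :), cand(c(keep))];
end
combs = sortrows(combs);                   % same row order as the ndgrid version
```

For the example, this yields the same four rows as above. If some word has no valid continuation, combs ends up empty with numel(Match) columns.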
