Question

I have a MATLAB cell array of strings and a second array of partial strings:

base = {'a','b','c','d'}
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2','q8','r15'}

The output is:

base = 

    'a'    'b'    'c'    'd'


all2 = 

    'a1'    'b1'    'c1'    'd1'    'a2'    'b2'    'c2'    'd2'    'q8'    'r15'

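Problem/requirement

If any of 'a1', 'b1', 'c1', 'd1' and any of 'a2', 'b2', 'c2', 'd2' are present in the all2 array, then return the variable numb=2.

If any of 'a1', 'b1', 'c1', 'd1' and any of 'a2', 'b2', 'c2', 'd2' and any of 'a3', 'b3', 'c3', 'd3' are present in the all2 array, then return the variable numb=3.

Attempts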

1

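Based on strfind (this approach), I tried matches = strfind(all2,base); but I got this error:

`Error using strfind`

`Input strings must have one row.`
....

2

This other approach using strfind seemed better, but it just gave me:

fun = @(s)~cellfun('isempty',strfind(all2,s));
out = cellfun(fun,base,'UniformOutput',false)
idx = all(horzcat(out{:}));
idx(1,1)

out = 

    [1x10 logical]    [1x10 logical]    [1x10 logical]    [1x10 logical]


ans =

     0

Neither of these attempts worked. I think my logic is incorrect.

3

This answer makes it possible to find all the indices of an array of partial strings in an array of strings. It returns:

base = regexptranslate('escape', base);
matches = false(size(all2));
for k = 1:numel(all2)
    matches(k) = any(~cellfun('isempty', regexp(all2{k}, base)));
end
matches

Output:

matches =

     1     1     1     1     1     1     1     1     0     0

My problem with this approach: how do I use the output matches to calculate numb=2? I am not sure this is the most relevant logic for my specific problem, since it only gives the matching indices.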

Question

Is there a way to do this in MATLAB?

Edit

Additional information:

The array all2 will always be contiguous. A case with a gap in the numbering is not possible.

Answer 1

Use a regular expression to find the unique suffixes of the base elements:

base = {'a','b','c','d'};
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2', 'a4', 'q8','r15'};

% Use sprintf to build the expression so we can concatenate all the values
% of base into a single string; this is the [c1c2c3] metacharacter.
% Assumes the values of base are going to be one character
%
% This regex looks for one or more digits preceded by a character from
% base and returns only the digits that match this criterion.
regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);

% Use once to eliminate a cell array level
test = regexp(all2, regexstr, 'match', 'once');

% Convert the digits to a double array
digits = str2double(test);

% Return the number of unique digits. With isnan() we can use logical indexing
% to ignore the NaN values
num = numel(unique(digits(~isnan(digits))));

Which returns:

>> num

num =

     3
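
The code above assumes every element of base is a single character. As a rough sketch of how that assumption might be relaxed, the pattern can be built as an alternation instead of a character class and the digits captured as a token. The multi-character base and all2 values below are hypothetical, and anchoring the pattern with ^ and $ is my own choice rather than part of the original approach:

```matlab
% Hypothetical data: prefixes longer than one character
base = {'ab','cd'};
all2 = {'ab1','cd1','ab2','cd10','q8','r15'};

% Escape any regex metacharacters, then join the prefixes with '|'
escaped = regexptranslate('escape', base);
pattern = ['^(?:' strjoin(escaped, '|') ')(\d+)$'];   % ^(?:ab|cd)(\d+)$

% 'tokens' with 'once' gives a 1x1 cell per match and an empty cell otherwise
tok = regexp(all2, pattern, 'tokens', 'once');
hasMatch = ~cellfun('isempty', tok);

% Convert the captured digits of the matching elements only
digits = cellfun(@(t) str2double(t{1}), tok(hasMatch));
num = numel(unique(digits))   % 3 (suffixes 1, 2 and 10)
```

Capturing a token avoids relying on a lookbehind over an alternation, at the cost of one extra cellfun call.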

If you need the digits to be contiguous, then something like this should work:

base = {'a','b','c','d'};
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2', 'a4', 'q8','r15'};

regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);
test = regexp(all2, regexstr, 'match', 'once');
digits = str2double(test);

% Find the unique digits, with isnan() we can use logical indexing to ignore the
% NaN values
unique_digits = unique(digits(~isnan(digits)));

% Because unique returns sorted values, we can use this to find where the
% first difference between digits is greater than 1. Append Inf at the end to
% handle the case where all values are continuous.
num = find(diff([unique_digits Inf]) > 1, 1);  % Thanks @gnovice :)

Which returns:

>> num

num =

     2
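
To see why Inf is appended, it may help to run the same steps on the all2 array from the question, which has no 'a4' and therefore no gap. This is only a check of the code above with the intermediate values written out:

```matlab
base = {'a','b','c','d'};
all2 = {'a1','b1','c1','d1','a2','b2','c2','d2','q8','r15'};  % array from the question

regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);
digits = str2double(regexp(all2, regexstr, 'match', 'once'));
unique_digits = unique(digits(~isnan(digits)));   % [1 2]

% Without the trailing Inf, diff would be [1] and find would return empty
gaps = diff([unique_digits Inf]);                 % [1 Inf]
num = find(gaps > 1, 1)                           % 2
```

Note that find returns the position of the first gap, so the result equals the number of contiguous groups only when the numbering starts at 1.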

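Breaking down the regexp and sprintf lines: because we know base contains only single characters, we can use the [c1c2c3] metacharacter, which matches any one of the characters inside the brackets. So with '[rp]ain' we match 'rain' or 'pain', but not 'gain'.

base{:} returns what MATLAB calls a comma-separated list. Adding brackets concatenates the result into a single character array.

Without brackets: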

>> base{:}

ans =

    'a'


ans =

    'b'


ans =

    'c'


ans =

    'd'

With brackets:

>> [base{:}]

ans =

    'abcd'

We can use sprintf to insert this into the expression string. This gives us (?<=[abcd])(\d+), which matches one or more digits preceded by one of a, b, c, d.
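
A small sketch of the expression in action on single strings. The test inputs 'b12' and 'q8' are only illustrative:

```matlab
base = {'a','b','c','d'};

% '\\d' in the format string becomes '\d' in the output
regexstr = sprintf('(?<=[%s])(\\d+)', [base{:}]);   % (?<=[abcd])(\d+)

m1 = regexp('b12', regexstr, 'match', 'once');      % '12': digits follow 'b'
m2 = regexp('q8',  regexstr, 'match', 'once');      % empty: 'q' is not in base
```

The lookbehind (?<=...) checks the preceding character without including it in the match, which is why only the digits are returned.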

MATLAB: find the number of replicates of an array of substrings in a cell array of strings
