For example, suppose the variable strings is a cell array of strings such as the following:
strings = {'alpha' 'basis' 'colic' 'druid' 'even' 'fluff' 'golf'};
I want to filter strings so that I end up with only those strings whose first and last characters match. IOW, the result of this operation should be
{'alpha' 'colic' 'druid' 'fluff'}
More generally, I want to filter a cell array of strings to remove all the strings that don't match a given regular expression.
For the example above, I can get the desired result with the following logical array:
~~cellfun(@numel, regexp(strings, '^(.).*\1$'))
IOW,
>> strings(~~cellfun(@numel, regexp(strings, '^(.).*\1$')))
ans =
'alpha' 'colic' 'druid' 'fluff'
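To see what the one-liner above actually does, it can be unpacked into named steps. The intermediate names starts, counts and mask are illustrative only; the logic is the same. Without an output keyword, regexp returns the start index of each match (an empty array when there is none), cellfun(@numel, ...) counts those indices per string, and ~~ converts the counts to a logical mask (counts > 0 would do the same).

```matlab
strings = {'alpha' 'basis' 'colic' 'druid' 'even' 'fluff' 'golf'};
pattern = '^(.).*\1$';                 % first character must equal the last

starts = regexp(strings, pattern);     % cell array: start indices of matches, [] if none
counts = cellfun(@numel, starts);      % number of matches per string
mask = ~~counts;                       % double negation: nonzero count -> true
filtered = strings(mask)               % keep only the matching strings
```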
But ~~cellfun(@numel, regexp(strings, '^(.).*\1$')) is an incomprehensible monster.
Is there a clearer way to filter a cell array so that only the strings matching a regular expression are kept?
EDIT: Based on excaza's answer, I defined the following functions:
% grep.m
function filtered = grep(pattern, cellarray)
%GREP find matches to PATTERN in a cell array of strings.
% GREP(PATTERN, CELLARRAY) returns a cell array
% containing all the strings in CELLARRAY that match the
% regular expression PATTERN. CELLARRAY is expected to
% be a cell array of strings.
filtered = cellarray(matchq(cellarray, pattern));
end
% matchq.m
function yn = matchq(string, pattern)
%MATCHQ predicate stating whether STRING matches PATTERN.
% If STRING is a single string, MATCHQ(STRING, PATTERN)
% returns a logical value corresponding to whether or not
% STRING matches PATTERN. If STRING is a cell array of
% strings, MATCHQ(STRING, PATTERN) returns a logical vector
% whose i-th entry equals MATCHQ(STRING{i}, PATTERN).
if ischar(string)
yn = ~isempty(regexp(reshape(string, 1, []), pattern, 'match'));
else
assert(iscellstr(string));
yn = cellfun(@(s) matchq(s, pattern), string);
end
end
With these definitions:
>> grep('^(.).*\1$', strings)
ans =
'alpha' 'colic' 'druid' 'fluff'
FWIW, grep still "works" even if strings consists of character arrays of arbitrary shape:
>> grep('^(.).*\1$', {['aus';'tra';'lia'], ['basis']', ['ce';'lt';'ic'], ...
['dia';'led'], 'early', ['foo';'lpr';'oof'], ...
['gyp';'sum']})
ans =
[3x3 char] [3x2 char] [2x3 char] [3x3 char]
>> cellfun(@(c) reshape(c', [], 1)', ans, 'UniformOutput', false)
ans =
'australia' 'celtic' 'dialed' 'foolproof'
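One reason "works" deserves its quotes: matchq flattens each character array with reshape(string, 1, []), which reads it in column-major order, whereas the display above transposes first so the words read row by row. For '^(.).*\1$' the two orders agree, because only the first element c(1,1) and the last element c(end,end) matter, but a pattern that depends on interior characters can give different answers. A small illustration (the variable names are hypothetical):

```matlab
c = ['ce'; 'lt'; 'ic'];                % 3x2 char array, reads 'celtic' row by row

colmajor = reshape(c, 1, [])           % what matchq tests: 'clietc'
rowmajor = reshape(c', 1, [])          % row-by-row reading: 'celtic'

% Same first and last character, so '^(.).*\1$' matches both readings...
both = ~isempty(regexp(colmajor, '^(.).*\1$', 'once')) && ...
       ~isempty(regexp(rowmajor, '^(.).*\1$', 'once'))
% ...but a prefix test depends on the reading order
prefixCol = ~isempty(regexp(colmajor, '^cel', 'once'))   % false
prefixRow = ~isempty(regexp(rowmajor, '^cel', 'once'))   % true
```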
Answer (score: 2)
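According to the regexp documentation, you can use the 'match' output keyword to request only the text that matches your expression. regexp operates natively on cell arrays, so there is no need to call it through cellfun. However, for robustness, regexp has the (often annoying) behavior of returning a cell array of cells, where each inner cell holds the regexp output for the corresponding cell of the input.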
This gives the following:
strings = {'alpha' 'basis' 'colic' 'druid' 'even' 'fluff' 'golf'};
matches = regexp(strings, '^(.).*\1$', 'match');
Returns:
matches =
1×7 cell array
{1×1 cell} {} {1×1 cell} {1×1 cell} {} {1×1 cell} {}
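The extra layer can be inspected directly. In this sketch (variable names are illustrative), each element of matches is itself a cell array holding the matched text, or an empty cell when the string did not match:

```matlab
strings = {'alpha' 'basis' 'colic' 'druid' 'even' 'fluff' 'golf'};
matches = regexp(strings, '^(.).*\1$', 'match');

firstEntry = matches{1}          % a 1x1 cell array, not a character vector
inner = matches{1}{1}            % 'alpha': the matched text is one level deeper
noMatch = isempty(matches{2})    % true: 'basis' produced an empty cell
```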
To get rid of the empty cells, you can use a basic loop or cellfun (which is essentially equivalent to a loop):
strings = {'alpha' 'basis' 'colic' 'druid' 'even' 'fluff' 'golf'};
matches = regexp(strings, '^(.).*\1$', 'match');
emptymask = cellfun('isempty', matches);
matches(emptymask) = [];
Returns:
matches =
1×4 cell array
{1×1 cell} {1×1 cell} {1×1 cell} {1×1 cell}
One more step is needed to unwrap the inner cells. Again, this can be done with a simple loop or with cellfun (essentially equivalent to a loop):
strings = {'alpha' 'basis' 'colic' 'druid' 'even' 'fluff' 'golf'};
matches = regexp(strings, '^(.).*\1$', 'match');
emptymask = cellfun('isempty', matches);
matches(emptymask) = [];
matches = cellfun(@(x) x{:}, matches, 'UniformOutput', false);
Returns:
matches =
1×4 cell array
'alpha' 'colic' 'druid' 'fluff'
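An alternative to the mask-then-cellfun steps, offered as a sketch rather than as part of the original answer, is to concatenate the inner cells with [matches{:}]. Empty cells contribute nothing to a concatenation, so this drops the non-matches and unwraps the layer in one step. It relies on each string producing at most one match, which holds here because the pattern is anchored with ^ and $; with an unanchored pattern, several matches from one string would all be spliced into the result.

```matlab
strings = {'alpha' 'basis' 'colic' 'druid' 'even' 'fluff' 'golf'};
matches = regexp(strings, '^(.).*\1$', 'match');

% Horizontal concatenation of the inner cell arrays flattens one level;
% the empty cells (no match) simply disappear from the result.
flat = [matches{:}]
```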
If you can assume that each cell of the input cell array (or string array) has at most one match, you can use the 'once' search option to eliminate one layer of nesting:
strings = {'alpha' 'basis' 'colic' 'druid' 'even' 'fluff' 'golf'};
matches = regexp(strings, '^(.).*\1$', 'match', 'once');
Returns:
matches =
1×7 cell array
'alpha' '' 'colic' 'druid' '' 'fluff' ''
This output can then be filtered with the same empty mask as in the naive approach:
strings = {'alpha' 'basis' 'colic' 'druid' 'even' 'fluff' 'golf'};
matches = regexp(strings, '^(.).*\1$', 'match', 'once');
emptymask = cellfun('isempty', matches);
matches(emptymask) = [];
Returns:
matches =
1×4 cell array
'alpha' 'colic' 'druid' 'fluff'
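The 'match' output is the matched text, which equals the whole string here only because the pattern is anchored at both ends. To keep the original strings for an arbitrary pattern (which is what the grep function in the question does, since it indexes cellarray with the logical result of matchq), the same mask can be applied to strings instead of to matches. The sketch below uses a hypothetical unanchored pattern 'l.' (an 'l' followed by any character) to show the difference:

```matlab
strings = {'alpha' 'basis' 'colic' 'druid' 'even' 'fluff' 'golf'};
pattern = 'l.';                                          % hypothetical unanchored pattern

fragments = regexp(strings, pattern, 'match', 'once');   % only the matched text
keep = ~cellfun('isempty', fragments);                   % true where a match was found

kept = fragments(keep)      % {'lp' 'li' 'lu' 'lf'}: fragments, not whole strings
filtered = strings(keep)    % {'alpha' 'colic' 'fluff' 'golf'}: the original strings
```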