Question

I have the following time series:

b = [2 5 110 113 55 115 80 90 120 35 123];

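Each number in b is one data point at one time instant. I computed duration values from b. A duration is a run of consecutive numbers in b that are greater than or equal to 100 (all other numbers are discarded). A gap of at most one number smaller than 100 is allowed inside a run. This is what the duration code looks like: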

 N = 2;     % a run ends at N or more consecutive values below 100, so gaps of up to N-1 values are bridged
 duration = cellfun(@numel, regexp(char((b>=100)+'0'), [repmat('0',1,N) '+'], 'split'));

This gives the following duration values for b:

duration = [4 3];

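I want to find the positions (the timeline) within b that belong to each value in duration. Next, I want to replace all other positions, which lie outside any duration, with zeros. The result would look like this: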

result = [0 0 3 4 5 6 0 0 9 10 11];

If anyone could help, that would be great.

Answer 1

Answer to the original question: pattern with at most one value below 100

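Here is an approach that uses a regular expression to detect the desired pattern. I'm assuming that a single value < 100 is allowed only between (not after) values >= 100. The pattern is therefore: one or more values >= 100, optionally followed by one value < 100 and then one or more further values >= 100.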

b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
[s, e] = regexp(B, '1+(.1+|)', 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(@ge, y, s(:)) & bsxfun(@le, y, e(:)), 1); %// filter by locations of pattern (dim 1 keeps c a row even for a single match)
y = y.*c; %// result

This gives

y =
     0     0     3     4     5     6     0     0     9    10    11
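
As a cross-check, the start and end indices returned by regexp also reproduce the run lengths from the question. The sketch below is an illustrative addition rather than part of the original answer; the names m and duration_check are hypothetical.

```matlab
b = [2 5 110 113 55 115 80 90 120 35 123];   % data from the question
B = char((b >= 100) + '0');                  % '00110100101'
% also request the matched substrings to see what the pattern captures
[s, e, m] = regexp(B, '1+(.1+|)', 'start', 'end', 'match');
duration_check = e - s + 1;                  % length of each matched run
% s is [3 9], e is [6 11], m is {'1101', '101'}, duration_check is [4 3]
```

Each matched substring contains at most one '0', which is the single permitted value below 100.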

Answer to the edited question: pattern with at most n values in a row below 100

The regular expression needs to be modified, and it has to be built dynamically as a function of n:

b = [2 5 110 113 55 115 80 90 120 35 123]; %// data
n = 2;
B = char((b>=100)+'0'); %// convert to string of '0' and '1'
r = sprintf('1+(.{1,%i}1+)*', n); %// build the regular expression from n
[s, e] = regexp(B, r, 'start', 'end'); %// find pattern
y = 1:numel(B);
c = any(bsxfun(@ge, y, s(:)) & bsxfun(@le, y, e(:)), 1); %// filter by locations of pattern; dim 1 is required because a single match would otherwise collapse c to a scalar
y = y.*c; %// result
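
The output for this version is not shown, so here is a small sketch that runs the same steps for n = 1 and n = 2 and stores one result row per n. It is an illustrative addition; the names pos and Y are hypothetical. Passing the dimension argument 1 to any keeps the mask a row vector even when regexp finds only one run, which is the case for n = 2 with this data.

```matlab
b = [2 5 110 113 55 115 80 90 120 35 123];   % data from the question
B = char((b >= 100) + '0');                  % '00110100101'
pos = 1:numel(B);
Y = zeros(2, numel(B));                      % one row of results per n
for n = 1:2
    r = sprintf('1+(.{1,%i}1+)*', n);        % allow up to n low values in a row
    [s, e] = regexp(B, r, 'start', 'end');
    c = any(bsxfun(@ge, pos, s(:)) & bsxfun(@le, pos, e(:)), 1);
    Y(n, :) = pos .* c;
end
% Y(1,:) is  0 0 3 4 5 6 0 0 9 10 11   (n = 1: only single-value gaps are bridged)
% Y(2,:) is  0 0 3 4 5 6 7 8 9 10 11   (n = 2: the two-value gap 80 90 is bridged as well)
```

With n = 1 the result matches the one requested in the question; note that n here counts the low values allowed in a row, whereas N in the question's split-based code is the number of low values that ends a run.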

Answer 2

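Here is another solution that does not use regexp. It generalizes naturally to arbitrary gap sizes and thresholds. I'm not sure whether there is a better way to fill the gaps. Explanation in the comments: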

% maximum step between consecutive valid samples (a step of N bridges N-1
% values below the threshold) and threshold
N = 2;
threshold = 100;
% data
b = [2 5 110 113 55 115 80 90 120 35 123];

% find valid data
B = b >= threshold;
B_ind = find(B);
% find lengths of gaps
step_size = diff(B_ind);
% find acceptable steps (and ignore step size 1)
permissible_steps = 1 < step_size & step_size <= N;
% find beginning and end of runs
good_begin = B_ind([permissible_steps, false]);
good_end = good_begin + step_size(permissible_steps);
% fill gaps in B
for ii = 1:numel(good_begin)
    B(good_begin(ii):good_end(ii)) = true;
end
% find durations of runs in B. This finds the points where we switch from 0
% to 1 and vice versa. Due to the padding, the first match is always the
% start of a run and the last one always an end. There is an even number of
% matches, so we can reshape and diff to find the durations
durations = diff(reshape(find(diff([false, B, false])), 2, []));

% get positions of 'good' data
outpos = zeros(size(b));
outpos(B) = find(B);
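
On the open point about filling the gaps: one possible loop-free variant is sketched below. It is an illustrative addition under stated assumptions, not the answer author's method. It assumes a MATLAB release that provides cummax and cummin, and the names idx, prev_true, next_true, fillable, B_filled and outpos_vec are hypothetical. For every sample it looks up the nearest valid sample on each side and fills the position when those two neighbours are at most N steps apart.

```matlab
N = 2;                                        % maximum step between valid samples
threshold = 100;
b = [2 5 110 113 55 115 80 90 120 35 123];   % data from the question

B = b >= threshold;                           % valid samples
idx = 1:numel(B);

% index of the nearest valid sample at or before each position (0 if none)
prev_true = cummax(idx .* B);
% index of the nearest valid sample at or after each position (Inf if none)
nxt = idx;
nxt(~B) = Inf;
next_true = fliplr(cummin(fliplr(nxt)));

% a low sample is filled if it lies between two valid samples that are at
% most N steps apart; leading and trailing low samples are never filled
fillable = ~B & prev_true > 0 & (next_true - prev_true) <= N;
B_filled = B | fillable;

outpos_vec = idx .* B_filled;                 % positions of kept samples, 0 elsewhere
% outpos_vec is  0 0 3 4 5 6 0 0 9 10 11
```

The run durations can then be computed from B_filled with the same reshape and diff step as above.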
