Is it possible to find the non-NaN values of a vector while also allowing n NaNs? For example, if I have the following data:
X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24]; %// input array
thres = 1; % this is the number of nans to allow
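I want to keep only the longest sequence of non-NaN values, but allow 'n' NaNs to remain in the data. So, say I am willing to keep 1 NaN, I would get an output of
X_out = [8 10 11 nan 9 14 6 1 4 23 24]; %// output array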
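That is, the two NaNs at the beginning have been removed because together they exceed 'thres', but the third NaN stands alone and can therefore be kept in the data. I would like to develop a method where thres can be set to any value.
I can find the non-NaN values with
Y = ~isnan(X); %// convert to zeros and ones
Any ideas?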
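Answer 0 (score: 8)
To find the longest sequence containing at most threshold NaNs, we have to find the start and the end of that sequence.
To generate all possible starting points, we can use hankel: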
H = hankel(X)
H =
18 3 NaN NaN 8 10 11 NaN 9 14 6 1 4 23 24
3 NaN NaN 8 10 11 NaN 9 14 6 1 4 23 24 0
NaN NaN 8 10 11 NaN 9 14 6 1 4 23 24 0 0
NaN 8 10 11 NaN 9 14 6 1 4 23 24 0 0 0
8 10 11 NaN 9 14 6 1 4 23 24 0 0 0 0
10 11 NaN 9 14 6 1 4 23 24 0 0 0 0 0
11 NaN 9 14 6 1 4 23 24 0 0 0 0 0 0
NaN 9 14 6 1 4 23 24 0 0 0 0 0 0 0
9 14 6 1 4 23 24 0 0 0 0 0 0 0 0
14 6 1 4 23 24 0 0 0 0 0 0 0 0 0
6 1 4 23 24 0 0 0 0 0 0 0 0 0 0
1 4 23 24 0 0 0 0 0 0 0 0 0 0 0
4 23 24 0 0 0 0 0 0 0 0 0 0 0 0
23 24 0 0 0 0 0 0 0 0 0 0 0 0 0
24 0 0 0 0 0 0 0 0 0 0 0 0 0 0
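To see why hankel enumerates the starting points, here is a minimal sketch on a made-up four-element vector (v and ok are illustrative names, not part of the answer). It checks that row k of hankel(v) is v(k:end) followed by zero padding:

```matlab
% Illustrative vector, not taken from the question.
v = [5 NaN 7 9];
H = hankel(v);

% Row k of H should equal v(k:end) padded on the right with k-1 zeros.
n = numel(v);
ok = false(1, n);
for k = 1:n
    expected = [v(k:end) zeros(1, k-1)];
    ok(k) = isequaln(H(k,:), expected);   % isequaln treats NaN as equal to NaN
end
disp(ok)
```

Each row therefore represents one candidate subsequence, identified by its starting index.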
Now we need to find the last valid element in each row.
For this we can use cumsum:
C = cumsum(isnan(H),2)
C =
0 0 1 2 2 2 2 3 3 3 3 3 3 3 3
0 1 2 2 2 2 3 3 3 3 3 3 3 3 3
1 2 2 2 2 3 3 3 3 3 3 3 3 3 3
1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
The end point of each row is the last position at which the corresponding element of C is still at most threshold:
threshold = 1;
T = C<=threshold
T =
1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
The last valid element is found using:
[~,idx]=sort(T,2);
lastone=idx(:,end)'
lastone =
3 2 1 4 15 15 15 15 15 15 15 15 15 15 15
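Because C never decreases along a row, each row of T is a block of ones followed by a block of zeros, so the position of the last one can also be obtained by summing the row. The sketch below is an alternative, not the answer's method, and the names last_by_sort and last_by_sum are illustrative. It compares the two on the question's data. It also looks at threshold = 0, where some rows of T are entirely zero and the sort-based index, which relies on sort keeping equal elements in their original order, points at the last column instead of reporting that nothing is valid:

```matlab
X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24];
C = cumsum(isnan(hankel(X)), 2);

% threshold = 1: both ways of locating the last valid column agree.
T1 = C <= 1;
[~, idx] = sort(T1, 2);
last_by_sort = idx(:, end)';
last_by_sum  = sum(T1, 2)';
agree = isequal(last_by_sort, last_by_sum);

% threshold = 0: rows that start with a NaN contain no valid element.
T0 = C <= 0;
[~, idx0] = sort(T0, 2);
last0_by_sort = idx0(:, end)';   % gives the last column for all-zero rows
last0_by_sum  = sum(T0, 2)';     % gives 0 for all-zero rows
```

If threshold = 0 has to be supported, the row sum appears to be the safer choice.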
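We have to make sure the actual length of each row is respected, because the zeros that hankel pads onto each row are not NaN and would otherwise count as valid elements: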
lengths = length(X):-1:1;
real_length = min(lastone,lengths);
[max_length,max_idx] = max(real_length)
max_length =
11
max_idx =
5
If several sequences share the maximum length, we simply take the first one and display it:
selected_max_idx = max_idx(1);
H(selected_max_idx, 1:max_length)
ans =
8 10 11 NaN 9 14 6 1 4 23 24
Full script
X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24];
H = hankel(X);
C = cumsum(isnan(H),2);
threshold = 1;
T = C<=threshold;
[~,idx]=sort(T,2);
lastone=idx(:,end)';
lengths = length(X):-1:1;
real_length = min(lastone,lengths);
[max_length,max_idx] = max(real_length);
selected_max_idx = max_idx(1);
H(selected_max_idx, 1:max_length)
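Answer 1 (score: 5)
One possible approach is to convolve Y = double(~isnan(X)) with a window of n ones, where n is decreased until an acceptable subsequence is found. "Acceptable" means that the subsequence contains at least n-thres ones, that is, the convolution gives at least n-thres.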
Y = double(~isnan(X));
for n = numel(Y):-1:1 %// try all possible sequence lengths
    w = find(conv(Y,ones(1,n),'valid')>=n-thres, 1); %// start of the first acceptable subsequence, if any
    if ~isempty(w)
        break
    end
end
result = X(w:w+n-1);
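Convolving Y with a window of n ones (as in approach 1) is equivalent to computing the cumulative sum of Y and then taking differences with spacing n. This is more efficient in terms of the number of operations.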
Y = double(~isnan(X));
Z = cumsum(Y);
for n = numel(Y):-1:1
    w = find([Z(n) Z(n+1:end)-Z(1:end-n)]>=n-thres, 1); %// start of the first acceptable subsequence, if any
    if ~isempty(w)
        break
    end
end
result = X(w:w+n-1);
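The equivalence between the windowed convolution and the spaced differences of the cumulative sum can be checked directly. This is a small sketch on the question's data. The window length n = 4 is an arbitrary choice for illustration, and by_conv, by_cumsum and same are illustrative names:

```matlab
X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24];
Y = double(~isnan(X));
Z = cumsum(Y);

n = 4;  % window length, chosen only for illustration

% Number of non-NaN values in every window of length n, computed two ways.
by_conv   = conv(Y, ones(1,n), 'valid');
by_cumsum = [Z(n) Z(n+1:end) - Z(1:end-n)];

% Compare with a small tolerance rather than exact equality.
same = all(abs(by_conv - by_cumsum) < 1e-12);
```

The cumulative sum is computed once, so each window length afterwards costs only one subtraction per window.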
This essentially computes all iterations of the loop in approach 1 at once.
Y = double(~isnan(X));
z = conv2(Y, tril(ones(numel(Y))));
[nn, ww] = find(bsxfun(@ge, z, (1:numel(Y)).'-thres)); %'
[n, ind] = max(nn);
w = ww(ind)-n+1;
result = X(w:w+n-1);
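As an independent cross-check on the results above, the same window can be found with a plain two-pointer loop. This is a different technique from the ones in this answer, offered only as a sketch. The names left, right, nans_in_window, best_len and best_start are illustrative:

```matlab
X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24];
thres = 1;

isn = isnan(X);
left = 1;              % left edge of the current window
nans_in_window = 0;    % NaNs currently inside X(left:right)
best_len = 0;
best_start = 1;

for right = 1:numel(X)
    nans_in_window = nans_in_window + isn(right);
    % Shrink from the left until the window holds at most thres NaNs.
    while nans_in_window > thres
        nans_in_window = nans_in_window - isn(left);
        left = left + 1;
    end
    % Keep the first window that reaches a new maximum length.
    if right - left + 1 > best_len
        best_len = right - left + 1;
        best_start = left;
    end
end
result = X(best_start:best_start+best_len-1);
```

Each element enters and leaves the window at most once, so the loop makes a single pass over X.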
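Answer 2 (score: 1)
Let's try my favorite tool: RLE. Matlab has no built-in function for it, so use my "seqle", posted on the MATLAB Central File Exchange. By default, seqle returns the run-length encoding. So: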
>> foo = [ nan 1 2 3 nan nan 4 5 6 nan 5 5 5 ];
>> seqle(isnan(foo))
ans =
run: [1 3 2 3 1 3]
val: [1 0 1 0 1 0]
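"run" is the length of the current run; "val" is its value. In this case, val==1 means the value is nan and val==0 means it is numeric. You can see that it is relatively easy to extract the longest sequence of "run" values satisfying the condition val==0 | run < 2, i.e. with no more than one nan in a row. Then just take the cumulative indices of that run, and that is the subset of foo you want.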
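For readers without seqle, the run and val vectors shown above can be reproduced with built-in functions. This is a minimal sketch of run-length encoding with diff and find, not the seqle implementation. The name ends is illustrative:

```matlab
foo = [ nan 1 2 3 nan nan 4 5 6 nan 5 5 5 ];
x = double(isnan(foo));   % 1 where foo is NaN, 0 elsewhere

% A run ends wherever the next element differs, and at the last element.
ends = [find(diff(x) ~= 0) numel(x)];
run  = diff([0 ends]);    % length of each run
val  = x(ends);           % value of each run
```

The cumulative sum of run gives the index in foo at which each run ends, which is what is needed to map a chosen run back to a slice of foo.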
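Edit:
Sadly, what is easy to pick out by eye is not necessarily as easy to extract with code. I suspect there is a faster way to use the indices identified by longrun to get the desired subsequence.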
>> foo = [ nan 1 2 3 nan nan 4 5 6 nan nan 5 5 nan 5 nan 4 7 4 nan ];
>> sfoo= seqle(isnan(foo))
sfoo =
run: [1 3 2 3 2 2 1 1 1 3 1]
val: [1 0 1 0 1 0 1 0 1 0 1]
>> longrun = sfoo.run<2 |sfoo.val==0;
% longrun identifies which indices might be part of a run
% longlong identifies the longest sequence of valid run
>> longlong = seqle(longrun)
longlong =
run: [2 1 1 1 6]
val: [1 0 1 0 1]
>> lfoo = find(sfoo.run<2 |sfoo.val==0);
>> sbar = seqle(lfoo,1);
>> maxind=find(sbar.run==max(sbar.run),1,'first');
>> getlfoo = lfoo( sum(sbar.run(1:(maxind-1)))+1 );
% first value in longrun , which is part of max run
% getbar finds end of run indices
>> getbar = getlfoo:(getlfoo+sbar.run(maxind)-1);
>> getsbar = sfoo.run(getbar);
% retrieve indices of input vector
>> startit = sum(sfoo.run(1:(getbar(1)-1))) +1;
>> endit = startit+ ((sum(sfoo.run(getbar(1):getbar(end ) ) ) ) )-1;
>> therun = foo( startit:endit )
therun =
5 5 NaN 5 NaN 4 7 4 NaN
Answer 3 (score: 0)
X = [18 3 nan nan 8 10 11 nan 9 14 6 1 4 23 24]; %// input array
thresh =1;
X(isnan(X))= 0 ;
for i = 1:thresh
Y(i,:) = circshift(X',-i); %//circular shift
end
For some reason, the Matlab transpose operator " ' " makes the formatting look weird.
D = X + sum(Y,1);
Discard = find(D==0)+thresh; %//give you the index of the part that needs to be discarded
chunk = find(X==0); %//Segment the Vector into segments delimited by NaNs
seriesOfZero = circshift(chunk',-1)' - chunk;
bigchunk =[1 chunk( find(seriesOfZero ~= 1)) size(X,2)]; %//Convert series of NaNs into 1 chunk
[values,DiscardChunk] = intersect(bigchunk,Discard);
DiscardChunk = sort(DiscardChunk,'descend')
for t = 1:size(DiscardChunk,2)
X(bigchunk(DiscardChunk(t)-1):bigchunk(DiscardChunk(t))) = []; %//Discard the data
end
X(X == 0) = NaN
%//End of Code
8 10 11 NaN 9 14 6 1 4 23 24
When:
X = [18 3 nan nan nan 8 10 11 nan nan 9 14 6 1 nan nan nan 4 23 24]; %// input array
thresh = 2;
8 10 11 NaN 4 23 24