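Merging groups of numbers separated by zeros

Time: 2016-04-04 16:12:13

Tags: matlab

I am recording from an analog device, and suppose the data look like this example vector: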

A = [1 4 2 0 4 5 8 8 1 0 0 0 4 7 1 9 0 0 0 8 1 2]

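I would like to:

1) count the number of groups of non-zero elements
2) merge groups that probably belong to the same condition

For 1), we might split it into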

1 4 2
4 5 8 8 1
4 7 1 9
8 1 2

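For 2), however, values separated by only a single 0 may actually come from the same condition, unlike values separated by more zeros, which means the vector might actually be split as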

1 4 2 4 5 8 8 1
4 7 1 9
8 1 2

A past solution for (1) can be found here, namely:

count = sum(diff([A 0]==0)==1)          % Number of non-zero groups
a0 = (A~=0);
d = diff(a0);
start = find([a0(1) d]==1)              % Start index of each group
finish = find([d -a0(end)]==-1)         % Last index of each group
len = finish-start+1                    % Length, number of indexes in each group
count = length(start);
B = cell(count,1);
for i = 1:count
    B{i} = A(start(i):finish(i));       % Extract the i-th group
end

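Since I don't want to hijack the old thread, I would like to know whether there is a way to make the grouping more robust, so that values separated by a single or double zero are not split into entirely new groups.

1 answer:

Answer 0 (score: 2)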

The case of integer values in the range [0:9]

There is a very elegant one-line solution using regular expressions.
First of all, convert the vector to a string:

A = [1 4 2 0 4 5 8 8 1 0 0 0 4 7 1 9 0 0 0 8 1 2];
As=num2str(A);
As(As==' ')=[];

The last line is needed because num2str() separates the numbers with blank spaces, which must be removed. As will therefore have the form:

As =

1420458810004719000812

and will be a string.

Now the regexp():

out = regexp(As,'0{2,}','split');

The expression basically says: find every run of two or more consecutive zeros in As and, thanks to 'split', return the non-matching pieces between those runs (i.e. we do not want the zeros, we want the non-zero parts of the sequence). A single zero does not match, so it stays inside its piece.

However, out will be a cell array of strings, because that is what the 'split' option returns. If you want it back to numeric, just add:

out=cellfun(@str2num,out);

in order to convert the cell array with strings into a matrix (with numbers of course). Indeed now out has the form:

out =

   142045881        4719         812 
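
Each entry of out above is a single number whose digits are the concatenated samples, and the lone separator zero is still inside the first one. If the individual samples are wanted back, a minimal sketch could look like the following. It is an addition rather than part of the original answer: the names chunks, samples and merged are illustrative, and sprintf('%d', A) is swapped in for num2str plus space removal. It assumes every element of A is a single digit.

```matlab
A = [1 4 2 0 4 5 8 8 1 0 0 0 4 7 1 9 0 0 0 8 1 2];

As     = sprintf('%d', A);                 % digits printed back to back
chunks = regexp(As, '0{2,}', 'split');     % split on runs of two or more zeros

% Turn each chunk of characters into a vector of digits
samples = cellfun(@(s) s - '0', chunks, 'UniformOutput', false);

% Optionally drop the lone separator zeros kept inside a chunk
merged = cellfun(@(v) v(v ~= 0), samples, 'UniformOutput', false);
```

With the example vector, merged{1} is 1 4 2 4 5 8 8 1, which is the merged first group described in the question.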

The floating point case

A = [0.01 0.04 0.02 0.00 0.04 0.05 0.08 0.08 0.01 0.00 0.00 0.00 0.04 0.07 0.01 0.09 0.00 0.00 0.00 0.08 0.01 0.02];
As=num2str(A);
As(As==' ')=[];

Now As has the form:

As =

0.010.040.0200.040.050.080.080.010000.040.070.010.090000.080.010.02

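Runs of two or more zeros are now harder to find. However, a pattern emerges: such a run has a non-zero digit before it (the last decimal of the previous number) and another zero right after it (the leading 0 of the next number), whereas a lone zero between two numbers produces only two consecutive zero characters. This relies on every non-zero value printing with exactly two decimals, the last of which is non-zero.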

[sID,eID]=regexp(As,'[1-9]00{2,}');

where sID and eID are the start and end indices of our substring target(s), respectively [1]. Now let's split As thanks to the above indices [2]:

C{1}=As(1:sID(1));
for ii=2:length(sID)
    C{end+1}=As(eID(ii-1):sID(ii));
end
C{end+1}=As(eID(end):end);
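
To see what the pattern actually matches, here is a small check on the literal string shown above (an illustrative addition; the variable matches is a name introduced only for this sketch):

```matlab
As = '0.010.040.0200.040.050.080.080.010000.040.070.010.090000.080.010.02';

[sID, eID] = regexp(As, '[1-9]00{2,}');    % start and end index of every match

% Extract the matched substrings to inspect them
matches = arrayfun(@(s, e) As(s:e), sID, eID, 'UniformOutput', false);
% matches is {'10000', '90000'}, with sID = [33 52] and eID = [37 56]
```

Each match starts on the last decimal of the number before the gap and ends on the leading 0 of the number after it, which is why the split uses As(eID(ii-1):sID(ii)) with both end points included.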

The cell array C is still rather messy: a lone zero in A is printed as 0 rather than 0.00 (MATLAB treats 0.00 as simply 0), so we must append .00 to it in order to restore the fixed four-character width of every number:

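for i=1:length(C)
    idx=strfind(C{i},'00');              % lone zero followed by the leading 0 of the next number
    for k=numel(idx):-1:1                % go backwards so earlier indices stay valid
        C{i}=[C{i}(1:idx(k)) '.00' C{i}(idx(k)+1:end)];
    end
    C{i}=reshape(C{i},4,[])';            % one four-character number per row
end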

In the above code, we also reshaped the long strings into matrices, so now we can easily convert them into numeric

C1=cellfun(@str2num,C,'UniformOutput',0);

Now C1 is still a cell array where every cell is a chunk of sequence (in numeric array form). Obviously now we cannot rely on matrices and we are forced to use cell arrays due to the fact that chunks might have different lengths.
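
The string round trip above depends on how the values are printed. As an alternative, here is a sketch that works directly on the numeric vector by measuring the length of each run of zeros. It is not part of the original answer; minGap and the other variable names are assumptions introduced for illustration.

```matlab
A = [0.01 0.04 0.02 0 0.04 0.05 0.08 0.08 0.01 0 0 0 0.04 0.07 0.01 0.09 0 0 0 0.08 0.01 0.02];
minGap = 2;                              % assumed: runs of >= minGap zeros separate groups

isZero  = (A == 0);
d       = diff([0 isZero 0]);            % +1 where a zero run starts, -1 just past its end
zStart  = find(d == 1);
zEnd    = find(d == -1) - 1;
longRun = (zEnd - zStart + 1) >= minGap; % zero runs long enough to act as separators

sep = false(size(A));                    % true for elements inside a separating run
for k = find(longRun)
    sep(zStart(k):zEnd(k)) = true;
end

d2     = diff([0 ~sep 0]);               % runs of non-separator elements are the groups
gStart = find(d2 == 1);
gEnd   = find(d2 == -1) - 1;
groups = arrayfun(@(s, e) A(s:e), gStart, gEnd, 'UniformOutput', false);
```

Zero runs shorter than minGap stay inside their group, as in the regexp approach; setting minGap to 3 would also merge groups separated by a double zero.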

Final note

If A has numbers in the range {0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09} you can just as well multiply A by 100: A then becomes a vector of integers and you can use the more elegant integer approach. You can then convert the chunks back to floating point with the little snippet below, which expects out to still be the cell array of strings returned by regexp (i.e. before the cellfun(@str2num,out) step):

for ii=1:length(out)
    out{ii}=num2str(out{ii});                             %convert to string, so we can enumerate and treat digits separately
    out{ii}=[repmat('0.0',length(out{ii}),1) out{ii}(:)]; %put '0.0' in front of every number
    out{ii}=str2num(out{ii});                             %roll-back to numeric
end

Finally, it is worth noting that, with A defined as at the beginning of the floating point case, both the floating point approach itself and the multiply-by-100 approach lead to the same results.
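
Putting the multiply-by-100 route together, a compact sketch might be as follows. It is an illustrative variant rather than the answer's exact code: round() is added as a precaution against floating-point error in 100*A, sprintf('%d', ...) replaces num2str plus space removal, and Ai, chunks and back are hypothetical names.

```matlab
A = [0.01 0.04 0.02 0 0.04 0.05 0.08 0.08 0.01 0 0 0 0.04 0.07 0.01 0.09 0 0 0 0.08 0.01 0.02];

Ai     = round(100 * A);                 % integers 0..9; round absorbs floating-point error
As     = sprintf('%d', Ai);              % digits printed back to back
chunks = regexp(As, '0{2,}', 'split');   % split on runs of two or more zeros

% Convert each chunk back to a numeric row vector on the original scale
back = cellfun(@(s) (s - '0') / 100, chunks, 'UniformOutput', false);
```

This only applies when every value of 100*A is a single digit, as stated in the final note.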

[1] suggested by @LuisMendo
[2] improved thanks to @Adiel