Question

Suppose I have a binary vector of length N, and I want to find the frequency of each of the following 16 sequences within that vector:

0000, 0001, 0010, 0011, ..., 1111

What is the simplest way to count how often each of these sequences occurs in the vector? Ideally, I would like to know how to do this in MATLAB.

Answer 1

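A simple way to solve this is to convert the binary numbers to decimal numbers and then use hist or accumarray to count the occurrences. I start by reshaping the array into an (N-3)-by-4 array, which allows all of the computation to be vectorized.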

%# make up some test data
data = [0 0 1 1 0 1 0 1 1 1 1 1 0 0 1 1];

%# reshape into a (N-3)-by-4 array
%# idx is [1 2 3 4;2 3 4 5;...]
idx = bsxfun(@plus,(1:length(data)-3)',0:3);
data = data(idx);

%# convert binary numbers to decimals
%# use matrix multiplication
decData = data * [8;4;2;1];

%# count number of occurrences - possible values are 0 through 15
counts = hist(decData,0:15);

counts(1) is the number of times the sequence 0 0 0 0 appears in the list.
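The answer mentions accumarray as an alternative to hist but only shows hist. A minimal sketch of the accumarray variant follows, reusing the same test data; the variable name windows is introduced here for illustration and is not part of the original answer.

%# same test data as above
data = [0 0 1 1 0 1 0 1 1 1 1 1 0 0 1 1];

%# (N-3)-by-4 matrix of sliding windows
idx = bsxfun(@plus,(1:length(data)-3)',0:3);
windows = data(idx);

%# decimal code of each window, 0 through 15
decData = windows * [8;4;2;1];

%# accumarray needs positive integer subscripts, hence the +1;
%# the [16 1] size keeps a bin for codes that never occur
counts = accumarray(decData+1, 1, [16 1])';

As with hist, counts(k+1) holds the number of windows whose decimal code is k.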

Answer 2

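These are just the numbers 0x0 through 0xF, so use each one as an index into an array A of 16 counters (one per value from 0x0 to 0xF) and increment the matching element for every window. A[i] divided by the number of windows (N-3 when the windows overlap) is then the frequency of sequence i.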
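One possible reading of this answer in MATLAB is sketched below. The rolling update of the 4-bit value is an assumption about how the index could be maintained, not something the answer spells out; the names v, A, code and freq are illustrative, and the vector is assumed to have at least four elements.

v = [1 0 0 1 1 1 1 0 0];     % illustrative test vector of 0s and 1s
N = length(v);

A = zeros(1,16);             % one counter per value 0x0..0xF
code = v(1:3) * [4;2;1];     % value of the first three bits

for k = 4:N
    % shift in the next bit and keep only the lowest four bits
    code = mod(2*code + v(k), 16);
    A(code+1) = A(code+1) + 1;   % +1 because MATLAB indices start at 1
end

freq = A / (N-3);            % relative frequency over the N-3 windows

Each step reuses the previous window's value, so no window is converted from scratch.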

Answer 3

count = zeros(1,16);
vector = [1 0 0 1 1 1 1 0 0];
N = length(vector);

for ii = 1:(N-3)
    cur_seq = vector(ii:ii+3);        % Grab the running set of four entries
    cur_num = cur_seq*[8; 4; 2; 1];   % Convert these entries to non-binary.

    % Update the count of the sequence that has cur_num
    % as its non-binary integer representation. Note that
    % you must use cur_num+1 as the index since Matlab is
    % 1-based, but the integers from your sequences are 0
    % through 15.

    count(cur_num+1) = count(cur_num+1) + 1;
end

Now count(1) holds the number of occurrences of [0,0,0,0], count(2) holds the number of occurrences of [0,0,0,1], and so on.

Answer 4

Define the data and the block length as

x = [ 1 0 1 0 0 0 0 0 1 1];
M = 4;

The result can then be obtained with a single line:

result = histc(conv(x, 2.^(0:M-1), 'valid'), 0:2^M-1);

In this example,

result =
     2   1   0   1   1   0   0   0   1   0   1   0   0   0   0   0

meaning that there are 2 occurrences of [0 0 0 0], 1 occurrence of [0 0 0 1], and so on.

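How it works:

1. Compute the convolution (using conv) with powers of 2 to find the decimal representation of each sliding length-M binary number.
2. Count the occurrences of each number obtained in step 1 (using histc).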
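To make the two steps visible, the one-liner can be split so that the intermediate values can be inspected. This is only a restatement of the code above; the name codes is introduced here for illustration.

x = [ 1 0 1 0 0 0 0 0 1 1];
M = 4;

% Step 1: decimal value of every length-M window.
% conv flips the kernel, so the first bit of each window
% is weighted by 2^(M-1) and the last bit by 2^0.
codes = conv(x, 2.^(0:M-1), 'valid');   % gives 10 4 8 0 0 1 3

% Step 2: count how often each value 0..2^M-1 occurs.
result = histc(codes, 0:2^M-1);

For instance, the first window [1 0 1 0] yields 8 + 2 = 10, and the value 0 appears twice in codes, which matches result(1) = 2.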

Answer 5

If a holds your data:

c = [];   % a is assumed to be a row vector, so the loop visits one element at a time
for el = a,
  c = [c, sum(a==el)];
end

This is quadratic, but the counts in c line up with the indices of a. It also works if you do not know the range of values in advance.
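If the quadratic cost matters, the same per-element counts can probably be obtained with unique and accumarray instead of the loop. This is a different technique from the one in the answer, sketched here with made-up data; the names vals, pos and n are illustrative.

a = [3 6 13 3 15 15];          % made-up data, row vector

[vals, ~, pos] = unique(a);    % a equals vals(pos)
n = accumarray(pos(:), 1)';    % occurrences of each distinct value
c = n(pos(:)');                % one count per element of a, as in the loop

Here vals lists the distinct values in sorted order and n their counts, so the range still does not need to be known in advance.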
