How can I efficiently count the occurrences of a given list of items in a dataset without using loops?

Time: 2016-10-30 19:37:16

Tags: matlab

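I have a dataset M in which items and their category types are stored in columns 1 and 2, respectively. The vector cat stores the unique category types present in M. The vector Y is a subset of the items in M. I want to find how many times each category type is associated with the items in Y. This is the code I wrote: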

cat(:,1) = unique(M(:,2)); % Unique category types in M
cat(:,2) = zeros(size(cat,1),1); % initialize column 2 of cat to 0s
N = size(Y,1);
for i=1:N
    item = Y(i,1);
    temp = M(M(:,1)==item,:);
    C(:,1) = unique(temp(:,2));
    C(:,2) = histc(temp(:,2), unique(temp(:,2))); % Frequency of items in temp(:,2)
    for j=1:size(cat,1)
        for k=1:size(C,1)
            if cat(j,1)==C(k,1)
                cat(j,2) = cat(j,2)+C(k,2);
            end
        end
    end
    clear C; clear temp; clear item;
end

But this is clearly slow, even for moderately sized M, Y and cat. How can I make it faster?

For example:

M=[3    2
   4    12
   1    7
   3    4
   2    10
   1    6
   4    19
   4    6
   3    12
   1    10
   2    12];

and Y=[2;3];

Then I want the output cat to be:

cat=[2  1
     4  1
     6  0
     7  0
    10  1
    12  2
    19  0];

1 Answer:

Answer 0 (score: 1)

If I understand correctly, you want the histogram of the categories of the items in M that also appear in Y.

Using ismember, you can find the indices of the items in M that also appear in Y:

idx = ismember(M(:,1), Y);

Use that index to select the desired rows and save them in temp:

temp = M(idx, :);

Use the unique values in Cat(:,1) to form the histogram of temp:

Cat(:,2) = histc(temp(:, 2), Cat(:, 1));

Skipping the intermediate result simplifies the code above to:

idx = ismember(M(:,1),Y);
Cat(:,2) = histc(M(idx, 2), Cat(:,1));

Or all in one line:

Cat(:,2) = histc(M(ismember(M(:,1),Y), 2), Cat(:,1));

Note: cat is the name of a built-in function in MATLAB, so I renamed the variable cat to Cat.
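
For anyone who wants to check the approach end to end, here is a minimal sketch that runs on the example data from the question. It is not part of the original answer: it assumes Cat(:,1) is built with unique(M(:,2)) as in the question, and it swaps histc for a second ismember call plus accumarray, since newer MATLAB documentation marks histc as not recommended. The name loc is only illustrative.

```matlab
% Example data from the question: column 1 = item, column 2 = category type.
M = [3 2; 4 12; 1 7; 3 4; 2 10; 1 6; 4 19; 4 6; 3 12; 1 10; 2 12];
Y = [2; 3];

% Unique category types, sorted ascending (named Cat to avoid shadowing cat).
Cat = unique(M(:,2));

% Logical mask of the rows of M whose item appears in Y.
idx = ismember(M(:,1), Y);

% Map each selected category to its row position in Cat (assumed variant:
% every selected category is present in Cat, so loc has no zeros).
[~, loc] = ismember(M(idx,2), Cat);

% Count how often each row position occurs; categories never hit stay at 0.
Cat(:,2) = accumarray(loc, 1, [size(Cat,1) 1]);

disp(Cat)
```

On this data the result should match the cat matrix the question asks for. Both this variant and the histc version depend on Cat(:,1) holding each category value exactly once.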