Question

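I have a .mat file with two columns, "Product" and "Customer". A customer number is repeated several times when that customer buys different products. The table looks like: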

  Product Customer    
      114        1    
      112        2    
      112        1   
      113        4    
      115        3    
      113        2   
      111        2    
      113        3

I need to turn it into this:

    Customer 111 112 113 114 115
           1   0   1   0   1   0
           2   1   1   1   0   0
           3   0   0   1   0   1
           4   0   0   1   0   0

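The new table must have a "Customer" column plus 5 more columns, one per product. If customer "1" bought product "112", the entry should be 1; if he did not buy it, the entry should be 0. How can I do this in MATLAB? Any help would be great!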

Answer 1

This is a classic case for accumarray.

>> product = [114, 112, 112, 113, 115, 113, 111, 113]';
>> customer = [1, 2, 1, 4, 3, 2, 2, 3]';
>> [~,~,ic] = unique(product);
>> accumarray([customer, ic], 1)

ans =

     0     1     0     1     0
     1     1     1     0     0
     0     0     1     0     1
     0     0     1     0     0

Here we use unique to work out the unique product IDs; its third output is the mapping from the product vector to those unique IDs.
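To get the labeled layout from the question, with the customer number in the first column, something along these lines should work. This is only a sketch: the names uProd, uCust, counts, bought and result are mine, and the `counts > 0` step is there because accumarray counts repeated purchases rather than flagging them.

```matlab
product  = [114, 112, 112, 113, 115, 113, 111, 113]';
customer = [1, 2, 1, 4, 3, 2, 2, 3]';

% Sorted unique IDs plus, for every input row, its index into that list.
[uProd, ~, ip] = unique(product);
[uCust, ~, ic] = unique(customer);

% Count purchases per (customer, product) pair; (:) forces column vectors.
counts = accumarray([ic(:), ip(:)], 1, [numel(uCust), numel(uProd)]);

% Clamp counts to 0/1 so a repeated purchase still shows up as 1.
bought = double(counts > 0);

% First column is the customer ID, remaining columns follow uProd.
result = [uCust, bought]
```

Mapping the customers through unique as well means the customer numbers do not have to be 1, 2, 3, ... without gaps. The column order of result is given by uProd.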

Answer 2

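Let N_of_pr be the total number of products, N_of_cus the total number of customers, and tab the two-column table you started with. The resulting binary matrix is M.

% assumes customer numbers run from 1 to N_of_cus
M=zeros(N_of_cus,N_of_pr);
% map each product ID to a column index 1..N_of_pr
[~,~,pr_idx]=unique(tab(:,1));

s=size(tab);

for j=1:s(1)
    % customer tab(j,2) bought product tab(j,1)
    M(tab(j,2),pr_idx(j))=1;
end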
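For comparison, the binary matrix can also be filled without a loop by turning each (customer, product) pair into a linear index with sub2ind. A minimal sketch, assuming customer numbers run from 1 up to the number of customers; the names prodIDs, row and col are mine:

```matlab
% First column: product ID, second column: customer number.
tab = [114 1; 112 2; 112 1; 113 4; 115 3; 113 2; 111 2; 113 3];

% Map product IDs to column indices 1..number of distinct products.
[prodIDs, ~, col] = unique(tab(:,1));
col = col(:);
row = tab(:,2);

% One row per customer, one column per distinct product.
M = zeros(max(row), numel(prodIDs));

% Set every purchased (customer, product) cell to 1 in a single assignment.
M(sub2ind(size(M), row, col)) = 1
```

Assigning 1 to the same cell twice is harmless, so repeated purchases need no special handling here.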

Answer 3

You can use basic MATLAB commands such as sparse

M = sparse(Customer, Product, 1);

or something like grpstats from the Statistics Toolbox.

t = table(Product, Customer);
grpstats(t, {'Customer','Product'})

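This does not give exactly the table you want, but I guess you can still get there from it.

There is also a File Exchange submission called pivottable that will do what you want: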

pivottable([Customer, Product, ones(size(Product))], 1, 2, 3, @sum)
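Regarding the sparse route at the start of this answer: sparse indexes its columns by the raw product ID, so with IDs 111 to 115 the result is 4-by-115 and mostly empty. A rough sketch of trimming it down to the five product columns, using the data from the question (S, prodIDs and M are names I made up):

```matlab
Product  = [114; 112; 112; 113; 115; 113; 111; 113];
Customer = [1; 2; 1; 4; 3; 2; 2; 3];

% sparse adds up the values of repeated (row, column) pairs, so S holds counts.
S = sparse(Customer, Product, 1);

% Keep only the product columns that actually occur, in sorted order.
prodIDs = unique(Product);

% Convert to a full matrix of 0/1 flags.
M = double(full(S(:, prodIDs) > 0))
```

This relies on both Customer and Product being positive integers, since sparse uses them directly as subscripts.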

How to aggregate one column based on another column in MATLAB?
