What is the MATLAB-ish way to "histogram" a sorted cell array?

Time: 2014-02-28 02:36:25

Tags: matlab cell-array

Suppose I run a 7/11 store, and the following 100x3 cell array, sorted by the time in the first column, is my sales record.

12:32:01 customer1 12
12:32:02 customer2 13
12:32:04 customer6 4
12:32:06 customer8 6
12:32:07 customer1 9
12:32:07 customer1 6
12:32:12 customer2 1
...

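As you may have noticed, each customer can shop multiple times. For example, customer1 actually made three separate payments.

I now want to compute the average payment for each customer. For example, suppose customer1 made only the three payments shown above. Then his average payment is (12+9+6)/3 = 9.

I could write a for loop that goes through all the entries and keeps track of each customer. However, I feel that is not how it is supposed to be done in MATLAB.

So what is the most MATLAB-ish way of accomplishing this task?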
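For comparison, the loop-based bookkeeping described above might look roughly like the following. This is only a sketch under assumptions: the variable names (`totals`, `counts`, `names`, `avgs`) are illustrative, and only the seven sample rows shown above are used rather than the full 100x3 array.

```matlab
% Sample rows from the sales record above (the full array is not shown).
data = {'12:32:01','customer1',12; '12:32:02','customer2',13; ...
        '12:32:04','customer6',4;  '12:32:06','customer8',6; ...
        '12:32:07','customer1',9;  '12:32:07','customer1',6; ...
        '12:32:12','customer2',1};

% Running sum and count per customer, keyed by customer name.
totals = containers.Map('KeyType','char','ValueType','double');
counts = containers.Map('KeyType','char','ValueType','double');

for k = 1:size(data,1)
    name = data{k,2};
    if ~isKey(totals, name)
        totals(name) = 0;
        counts(name) = 0;
    end
    totals(name) = totals(name) + data{k,3};
    counts(name) = counts(name) + 1;
end

% Average payment per customer.
names = keys(totals);
avgs  = zeros(1, numel(names));
for k = 1:numel(names)
    avgs(k) = totals(names{k}) / counts(names{k});
    fprintf('%s: %.2f\n', names{k}, avgs(k));
end
```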

2 Answers:

Answer 0: (score: 5)

Start with unique to get an integer "key" for each customer, then feed that into accumarray with the @mean function handle:

data = {'12:32:01','customer1',12; '12:32:02','customer2',13;...
        '12:32:04','customer6',4; '12:32:06','customer8',6;...
        '12:32:07','customer1',9; '12:32:07','customer1',6;...
        '12:32:12','customer2',1};
[customers,~,ic] = unique(data(:,2));
avePayment = accumarray(ic,[data{:,3}],[],@mean);

Then assemble the output:

>> custAvgTab = [customers num2cell(avePayment)]

custAvgTab = 

    'customer1'    [9]
    'customer2'    [7]
    'customer6'    [4]
    'customer8'    [6]

IMHO, this is quite MATLAB-ish and actually very intuitive.
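To see why this works, it can help to inspect the index vector that unique returns. The sketch below reuses the seven sample rows; the names `payments` and `manualMean` are illustrative additions, and the explicit reshaping to column vectors is a defensive assumption rather than part of the original answer.

```matlab
% Same seven sample rows as above.
data = {'12:32:01','customer1',12; '12:32:02','customer2',13; ...
        '12:32:04','customer6',4;  '12:32:06','customer8',6; ...
        '12:32:07','customer1',9;  '12:32:07','customer1',6; ...
        '12:32:12','customer2',1};

% ic(k) is the position of row k's customer in the sorted list of unique names.
[customers,~,ic] = unique(data(:,2));
ic = ic(:);                    % guard against a row-vector ic
payments = [data{:,3}].';      % numeric column vector of payments

% accumarray groups the payments by ic and applies @mean to each group.
avePayment = accumarray(ic, payments, [], @mean);

% Cross-check for the first customer using a logical mask.
manualMean = mean(payments(ic == 1));
```

Here `ic` is `[1;2;3;4;1;1;2]`, so rows 1, 5 and 6 all fall into group 1 (customer1), and `manualMean` matches `avePayment(1)`.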

Note: I replaced cell2mat(data(:,3)) with [data{:,3}], because I think it is better to use built-in MATLAB operations where possible.
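The two forms return the same numbers but not the same shape, which is worth knowing when swapping one for the other. A small sketch with three of the sample rows (the names `colVec`, `rowVec` and `sameValues` are illustrative):

```matlab
data = {'12:32:01','customer1',12; '12:32:02','customer2',13; ...
        '12:32:04','customer6',4};

colVec = cell2mat(data(:,3));   % 3x1 column vector
rowVec = [data{:,3}];           % 1x3 row vector

% Same values, different orientation.
sameValues = isequal(colVec, rowVec.');
```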

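Note 2: For large data, sprintfc('%d',avePayment) may be faster than num2cell(avePayment). Keep in mind that sprintfc is an undocumented function and returns strings rather than numeric values.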

Answer 1: (score: 0)

If your cell array looks like the following, you can do this:

    dataC = { ...
        {'12:32:01', 'customer1', 12}, ...
        {'12:32:02', 'customer2', 13}, ...
        {'12:32:04', 'customer6', 4}, ...
        {'12:32:06', 'customer8', 6}, ...
        {'12:32:07', 'customer1', 9}, ...
        {'12:32:07', 'customer1', 6}, ...
        {'12:32:12', 'customer2', 1}, ...
    };



    % pull the customer names and payment amounts out of the nested cells
    names    = cellfun(@(c) c{2}, dataC, 'UniformOutput', false);
    payments = cellfun(@(c) c{3}, dataC);

    % get unique customer names
    uniqueCustomers = unique(names);

    for i = 1:numel(uniqueCustomers)
        customer = uniqueCustomers{i};

        % logical mask selecting this customer's entries
        % (a mask keeps genuine zero-amount payments in the average)
        isCustomer = strcmp(names, customer);

        % mean payment for this customer
        meanValue = mean(payments(isCustomer));

        fprintf(1, 'Mean for %s is %.2f\n', customer, meanValue);
    end

This results in:

Mean for customer1 is 9.00
Mean for customer2 is 7.00
Mean for customer6 is 4.00
Mean for customer8 is 6.00