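I have a simple matrix with repeated values in some columns. I need to group the data by name and week, and sum the prices spent in each week. Here is an example: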
name day week price
John 12 12 200
John 14 12 70
John 25 13 150
John 1 14 10
Ann 13 12 100
Ann 15 12 100
Ann 20 13 50
The desired output is:
name week sum
John 12 270
John 13 150
John 14 10
Ann 12 200
Ann 13 50
Is there a good way to do this? I used loops, but I'm not sure that is the best approach:
names = unique(data(:,1));   % unique names (assumes names are stored as numeric IDs)
n = size(names, 1);          % number of unique names
m = size(data, 1);           % total number of rows
result = [];                 % output matrix (renamed from sum to avoid shadowing the built-in)
s = 1;                       % next free row in result
for i = 1:n
    temp = [];               % temporary matrix for the current name
    k = 1;
    for j = 1:m
        if names(i) == data(j,1)     % collect all rows with the same name
            temp(k,:) = data(j,:);
            k = k + 1;
        end
    end
    count = 0;
    for l = 1:size(temp,1)           % walk through the rows of one name (e.g. John)
        count = count + temp(l,4);   % add the price (4th column) to the running total
        % the data is sorted by name and week, so a group ends on the last
        % row or when the week (3rd column) of the next row differs
        if l == size(temp,1) || temp(l,3) ~= temp(l+1,3)
            result(s, 1:3) = [names(i) temp(l,3) count];   % write the weekly total
            count = 0;
            s = s + 1;
        end
    end
end
Answer 0: (score: 3)
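Use accumarray. It groups and aggregates values in exactly this way. You can use the third output argument of unique(data(:,1)) to pass numeric indices to the subs argument of accumarray. See the accumarray documentation for details.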
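A minimal sketch of this idea, under a few assumptions: the names are encoded as numeric IDs (1 = John, 2 = Ann) so that everything fits in one numeric matrix, and unique is called with the 'rows' option on the name and week columns together, which is a small variation on the unique(data(:,1)) call mentioned above so that each (name, week) pair becomes one group. The variable names groups, idx, totals and result are illustrative only.

```matlab
% Sample data from the question: [name day week price]
% Assumption: names are encoded as numeric IDs, 1 = John, 2 = Ann
data = [1 12 12 200;
        1 14 12  70;
        1 25 13 150;
        1  1 14  10;
        2 13 12 100;
        2 15 12 100;
        2 20 13  50];

% One group per distinct (name, week) pair.
% idx(r) is the group number of row r and serves as the subs argument.
[groups, ~, idx] = unique(data(:, [1 3]), 'rows');

% Sum the price column (4th column) within each group
totals = accumarray(idx, data(:, 4));

% Columns: name ID, week, weekly sum
result = [groups totals];
```

Because unique sorts its output, the rows of result come out ordered by name ID and then by week, regardless of the order of the input rows.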
Answer 1: (score: 1)
The simplest way is probably to use the GRPSTATS function from the Statistics Toolbox. You first have to combine name and week to form the groups:
[name_week priceSum] = grpstats(price, strcat(name(:), '@', week(:)), {'gname','sum'});
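For the call above to work, name and week both have to be cell arrays of strings. The following is a sketch of how the inputs might be prepared, assuming name is a cell array of strings and week and price are numeric column vectors; the helper variables weekStr and labels are illustrative and not part of the original answer. It requires the Statistics Toolbox.

```matlab
% Sample data from the question, split into separate variables (assumed layout)
name  = {'John'; 'John'; 'John'; 'John'; 'Ann'; 'Ann'; 'Ann'};
week  = [12; 12; 13; 14; 12; 12; 13];
price = [200; 70; 150; 10; 100; 100; 50];

% Convert the numeric weeks to strings so they can be concatenated with the names
weekStr = arrayfun(@num2str, week, 'UniformOutput', false);

% One label per row, e.g. 'John@12'
labels = strcat(name, '@', weekStr);

% 'gname' returns the group labels, 'sum' the total price per group
[name_week, priceSum] = grpstats(price, labels, {'gname', 'sum'});
```

The name and week can be recovered from each label afterwards by splitting on the '@' separator.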