Question

Say I have the following data S =

Year    Week Postcode
2009    24  2035
2009    24  4114
2009    24  4127
2009    26  4114
2009    26  4556
2009    27  7054
2009    27  6061
2009    27  4114
2009    27  2092
2009    27  2315
2009    27  7054
2009    27  4217
2009    27  4551
2009    27  2035
2010    1   4132
2010    1   2155
2010    5   4114 ... (>60000 rows)

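In Matlab, I would like to create a matrix with:

Column 1: year (2006-2014)

Column 2: week (1-52 for each year)

The next n columns are then the unique postcodes, where each column counts the matching entries in my data S.

For example: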

year  week  2035    4114    4127    4556    7054
2009    24   1        1       1       0       0
2009    25   0        0       0       0       0
2009    26   0        1       0       1       0
2009    27   1        1       0       0       2
2009    28   0        0       0       0       0

Thanks if you can help!

Answer 1

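Here is a working script that builds this listing. The output is in the data table. You should:

Read the documentation on unique, tables, logical indexing and sortrows, as these are the key tools I use below.
Adapt the script to handle your data. This may involve changing matrices to cell arrays to handle string input, etc.
If you use this regularly or with different data, consider turning it into a function for cleaner use.

Code, fully commented: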

% Use rng for repeatability in rand, n = num data entries
rng('default')
n = 100;

% Set up test data. You would use 3 equal length vectors of real data here
years = floor(rand(n,1)*9 + 2006);        % random integer between 2006,2014
weeks = floor(rand(n,1)*52 + 1);          % random integer between 1, 52
postcodes = floor(rand(n,1)*10)*7 + 4000; % arbitrary integers over 4000

% Create year/week values like 2017.13, get unique indices
[~, idx, ~] = unique(years + weeks/100);

% Set up table with year/week data
data = table();
data.Year = years(idx);
data.Week = weeks(idx);
% Get columns
uniquepostcodes = unique(postcodes);
% Cycle over unique columns, assign data
for ii = 1:numel(uniquepostcodes)
    % Variable names cannot start with a numeric value, make start with 'p'
    postcode = ['p', num2str(uniquepostcodes(ii))];
    % Create data column variable for each unique postcode
    data.(postcode) = zeros(size(data.Year,1),1);
    % Count occurrences of postcode in each date row
    % This uses logical indexing of original data, looking for all rows
    % which satisfy year and week of current row, and postcode of column.
    for jj = 1:numel(data.Year)
        data.(postcode)(jj) = sum(years == data.Year(jj) & ...
                                  weeks == data.Week(jj) & ...
                                  postcodes == uniquepostcodes(ii));
    end
end

% Sort week/year data so all is chronological
data = sortrows(data, [1,2]);

% To check all original data was counted, you could run
% sum(sum(table2array(data(:,3:end))))
% ans = n, means that all data points were counted somewhere

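On my computer, the script with n = 60,000 takes under 2.4 seconds. It could almost certainly be optimised, but for something that is probably not run often, this seems acceptable.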
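One possible optimisation, sketched here as an alternative rather than as part of the script, is to drop both loops and let accumarray do the counting. The idea is to map every data row to a (year/week) group index and a postcode group index with unique, then count how often each index pair occurs. The variable names yw, pc, counts and result are illustrative, and the sample rows are taken from the question.

```matlab
% Sample rows copied from the question's data S (year, week, postcode)
S = [2009 24 2035; 2009 24 4114; 2009 24 4127; 2009 26 4114; 2009 26 4556; ...
     2009 27 7054; 2009 27 6061; 2009 27 4114; 2009 27 2092; 2009 27 2315; ...
     2009 27 7054; 2009 27 4217; 2009 27 4551; 2009 27 2035];

% Map each data row to a year/week group and to a postcode group
[yw, ~, rowIdx] = unique(S(:,1:2), 'rows');   % sorted unique year/week pairs
[pc, ~, colIdx] = unique(S(:,3));             % sorted unique postcodes

% Count how often each (year/week, postcode) pair occurs
counts = accumarray([rowIdx, colIdx], 1, [size(yw,1), numel(pc)]);

% Numeric result: year, week, then one count column per postcode in pc
result = [yw, counts];
disp(pc.')
disp(result)
```

This returns a plain numeric matrix instead of a table, so the postcode belonging to each count column has to be looked up in pc. Like the script, it only contains year/week pairs that are present in the data.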

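In the looped script, processing time grows linearly with the number of unique postcodes, because of the loop structure. So if you double the number of unique postcodes (20 instead of the 10 in my example), the time is close to 4.8 seconds, twice as long.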
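To check this scaling on your own machine, a rough timing sketch along the following lines could be used. It is a simplification and an assumption on my part: the counts go into a plain matrix rather than a table, the test data is generated with randi, and the absolute times will differ from the figures quoted above.

```matlab
rng('default')
n = 60000;
years = randi([2006 2014], n, 1);   % random years 2006-2014
weeks = randi([1 52], n, 1);        % random weeks 1-52

% Unique year/week rows, found the same way as in the script
[~, idx] = unique(years + weeks/100);
rowYear = years(idx);
rowWeek = weeks(idx);

for nPost = [10 20]
    postcodes = randi(nPost, n, 1)*7 + 4000;   % nPost arbitrary postcodes
    uniquepostcodes = unique(postcodes);
    counts = zeros(numel(rowYear), numel(uniquepostcodes));
    tic
    for ii = 1:numel(uniquepostcodes)
        for jj = 1:numel(rowYear)
            counts(jj,ii) = sum(years == rowYear(jj) & ...
                                weeks == rowWeek(jj) & ...
                                postcodes == uniquepostcodes(ii));
        end
    end
    fprintf('%d unique postcodes: %.2f s\n', numel(uniquepostcodes), toc);
end
```

Each pass of the inner loop scans all n data rows, so the total work is roughly (unique postcodes) x (year/week rows) x n, which is why doubling the postcodes should roughly double the time.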

If this solves your problem, please consider accepting it as the answer.
