Question

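(Looking for an efficient alternative to the Matlab code below.) I want to obtain the matrix "data", of size (N-60) × 60, from the Matlab code below. Because of the for loop, and because N is very large, this takes a lot of computation time. Can anyone recommend a faster way to obtain the data matrix?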

import random

line_we_want = random.randrange(5)
with open('keywords.txt', 'r') as input_file:
    content = input_file.readlines()
    for number_of_line, line in enumerate(content):
        if number_of_line == line_we_want:
            print(line)

Thanks!

Answer 1

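As Sardar said, preallocating data will give you the biggest improvement. However, you can also remove the for loop with some clever indexing. The comments should explain most of what I did.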

n = 1e6;
% Modified a to be an incrementing list to better understand how data is
% constructed
a = (1:n)';
order = 60;

%% Original code with pre allocation added
data = zeros(n-order+1, order);
for i = order:length(a)
    data(i-order+1,:) = a([i:-1:i-order+1])';
end

%% Vectorized code
% The above code was vectorized by building a large array to index into
% the a vector with.

% Get the indices of a going down the first column of data
%   Went down the column instead of across the row to avoid a transpose
%   after the reshape function
idx = uint32(order:n);

% As we go across the columns we use the same indexing but -1 as we move to
% the right. uint32 saturates negative values to 0, so build the offsets as
% the non-negative values 0:order-1 and subtract them below. Then expand to
% the correct number of elements using kron
offset = kron(uint32(0:order-1), ones(1, n-order+1, 'uint32'));

% Replicate the column indexing for how many columns we have and subtract
% the offset
fullIdx = repmat(idx, 1, order) - offset;

% Then use the large indexing array to get all the data as a large vector
% and then reshape it to the matrix
data2 = reshape(a(fullIdx), n-order+1, order);

% idx, offset, and fullIdx will take up a fair amount of memory so it is
% probably best to clear them
clear idx offset fullIdx;

assert(isequal(data, data2));

Note: using uint32 is not required, but it does reduce memory usage and gives a slight performance improvement.
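
For illustration, the same index matrix can probably be built more compactly with implicit expansion, without repmat, kron, or reshape. This is only a sketch and not part of the original answer: it assumes MATLAB R2016b or later (on older releases, bsxfun(@minus, ...) should fill the same role), and the small values of n and order, as well as the names idxMatrix and data3, are made up so the result is easy to check by eye.

% Small illustrative sizes (assumed values, not from the original post)
n = 8;
order = 3;
a = (1:n)';

% A column vector of starting indices minus a row vector of offsets.
% Implicit expansion turns this into an (n-order+1)-by-order index matrix
% whose row k is [k+order-1, k+order-2, ..., k].
idxMatrix = (order:n)' - (0:order-1);

% Indexing a vector with a matrix returns a result shaped like the matrix,
% so no reshape is needed
data3 = a(idxMatrix);

% First row is [3 2 1], last row is [8 7 6]
disp(data3([1 end], :));

With the full-size problem the index matrix is stored as doubles here, so this version trades some of the memory saved by uint32 for brevity.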
