Question

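I have to calculate the distance between every pair of UK postcodes and then total the population of all postcodes within 1 mile. The list of postcodes and populations is stored in a text file. I am most familiar with MATLAB, but I also have Stata and PSPP available. The program is currently projected to take about 2 weeks. Is there any way to speed this up? Here is my code. MATLAB generated the script that imports the text data. The distance function comes from the Mapping Toolbox and applies the great-circle formula.

Any help is much appreciated.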

function pcdistance(postcode, pop, lat, lon)
%Finds total population for UK postcode within 1 mile radius


fid = fopen('PPC.txt','a');

n = length(postcode);

%Calculates distance of 1 postcode at a time, against all others
%All data that doesn't meet rules is deleted


for i = 1:n;

    dist = [];
    dist(:,1)= pop;

    for j = 1:n;
        dist(j,2) = distance(lat(i),lon(i),lat(j),lon(j),3963.17);
        good = dist(1:j,2)<= 1;
    end

    dist = dist(good,:);
    tot = sum(dist(:,1));


    fprintf(1,'%s,%d;',postcode{i},tot)

end

%Find sum of population within 1 mile

fclose(fid);

end

Here is a small sample of the input from the txt file. The columns are "postcode, pop, lat, long".

"BD7 1DB",749,53.79,-1.76
"M15 6AA",748,53.46,-2.24
"WR2 6AJ",748,52.19,-2.24
"M15 6PF",745,53.46,-2.23
"IP7 7RA",741,52.12,0.96
"CF62 4WA",740,51.41,-3.41
"M1 2AR",738,53.47,-2.22
"NG1 4BR",737,52.95,-1.14
"ST16 3AW",735,52.81,-2.11
"AB25 1LE",733,57.15,-2.10
"WF2 9AG",730,53.68,-1.50
"DT11 8RH",730,50.86,-2.12
"CW1 5NP",729,53.09,-2.41
"TR12 7RH",724,50.08,-5.25
"ST5 5DY",723,53.00,-2.27
"HA1 3HP",723,51.57,-0.33
"DL10 7NP",722,54.37,-1.62
"M1 7HR",719,53.47,-2.23
"B18 4AS",719,52.49,-1.93
"OX13 6JB",716,51.68,-1.30

Below is the corrected code.

function pcdistance4(postcode, pop, lat, lon)
%Finds total population for UK postcode within 1 mile radius


fid = fopen('PPC.txt','A');

n = length(postcode);


% Pre-allocation
dist = zeros(n,2);
tot = zeros(n,1);

tic

for i = 1:n;


    dist(:,1)= pop;

    dist(:,2) = distance(lat(i),lon(i),lat(:),lon(:),3963.17);

      good = dist(:, 2) <= 1 & dist(:,2) ~=0;

    tot(i) = sum(dist(good, 1));
    tot(i) = tot(i) + pop(i);

end

toc
tic

for j = 900001:n;
    fprintf(fid,'%s,%d;\n',postcode{j},tot(j));
end

toc

fclose(fid);

end

Answer 1

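Some general tips to get you started:

For your own education: run your code with the profiler to see where most of the computation time is spent. That will be your first clue about where to start optimizing. (see the documentation)
You should not write to the hard drive at every step of the loop, because I/O is very expensive in terms of computation time. Instead, keep a batch of strings in memory and write these "chunks" out every once in a while.
You could try parfor instead of for (see the documentation). Or maybe even CUDA, if that is an option for you. (see the documentation)
Consider using the Geodetic Toolbox. It may be easier to convert the lat/lng coordinates to UTM (i.e. Cartesian coordinates) and then use some standard function to find the distances.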
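As an illustration of the I/O tip, here is a minimal sketch that keeps all results in memory and writes them with a single fprintf call instead of one call per loop iteration. The sample values and the output path are made up for the example; it relies only on the documented behaviour that fprintf reuses its format string until all arguments are consumed.

```matlab
% Hypothetical sample results; in the real run these come from the main loop.
postcode = {'BD7 1DB'; 'M15 6AA'; 'WR2 6AJ'};
tot      = [749; 2212; 748];

% Interleave names and totals: row 1 holds the postcodes, row 2 the totals.
% Expanding records{:} then yields postcode1, tot1, postcode2, tot2, ...
records = [postcode(:)'; num2cell(tot(:)')];

outfile = fullfile(tempdir, 'PPC_example.txt');  % hypothetical output path
fid = fopen(outfile, 'w');
fprintf(fid, '%s,%d;\n', records{:});            % one call writes every line
fclose(fid);

written = fileread(outfile);                     % read back to check the result
```

For 1.7 million rows the cell array still fits comfortably in memory, so a single write at the end is probably simpler than managing chunks by hand.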

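Additionally:

My advice: don't use i and j as indices for your loops - this is commonly regarded as bad practice (because of possible confusion with the imaginary unit).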

Answer 2

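You should also consider memory complexity. Consider preallocating the variable dist outside the for-loops and overwriting it element by element.

For example (see the comments in the modified function):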

function pcdistance(postcode, pop, lat, lon)

fid = fopen('PPC.txt','a');

n = length(postcode);

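% Pre-allocation: only two columns are used (population and distance);
% an n-by-n matrix would not fit in memory for 1.7 million postcodes
dist = zeros(n,2);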

for i = 1:n;

    % Avoid "deleting" the variable, you can overwrite it as the number of
    % elements is always the same
    % dist = [];
    dist(:,1)= pop;

    for j = 1:n;

        % Unfortunately I do not have the mentioned toolbox, but there is a
        % high chance that you can avoid the for-loop. Probably something
        % like:
        %    dist(:, 2) = distance(lat(i), lon(i), lat, lon, ...)
        % Try to vectorize it.
        dist(j,2) = distance(lat(i),lon(i),lat(j),lon(j),3963.17);

        % There is no need for this operation; it is highly redundant and
        % computationally expensive:
        %   - in the first loop you will check 1
        %   - in the second loop you will check two elements (1 redundant)
        %   - in the jth loop you will check j elements (j-1 redundant)
        % The total redundant operations are 1+2+3+...+n-1.
        %good = dist(1:j,2)<= 1;
    end

    % better do this
    good = dist(:, 2) <= 1;

    % also memory expensive.
    % dist = dist(good,:);

    % Better do the indexing directly
    tot = sum(dist(good, 1));

end

% Write outside as recommended by Dev-iL

%Find sum of population within 1 mile

fclose(fid);

end

Answer 3

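I can't believe that distance() can only compare two points at a time, when vectorizing a few sine and cosine calls is no problem at all. So here is a trimmed-down vectorized version that I wrote a while ago for my own purposes. Maybe I wrote it because I don't have that toolbox, or because I didn't know about it. Frankly, it does not give exactly the same results as the distance() I have just tested. If you need the exact results of distance(), you had better not use this vectorized version.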

function dist = distance_on_earth(lat0, lon0, lats, lons, radius)
degree2radians = pi/180;

% phi = 90 - latitude
phi0 = (90-lat0)*degree2radians;
phis = (90-lats)*degree2radians;

% theta = longitude
theta0 = lon0*degree2radians;
thetas = lons*degree2radians;

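% spherical distance (element-wise products, so lats/lons may be vectors):
cosine = sin(phi0).*sin(phis).*cos(theta0-thetas)+cos(phi0).*cos(phis);
% clamp rounding overshoot so that acos stays real
cosine = min(1, max(-1, cosine));
arc = acos(cosine);
dist = arc*radius;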
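For distances as short as 1 mile, the arccosine form loses precision because the cosine is extremely close to 1. A possible alternative, sketched here and not taken from this answer, is the haversine form of the same great-circle distance. The name haversine and the four sample points (rows from the question's data) are just for illustration, and the result is not guaranteed to match the toolbox's distance() exactly either.

```matlab
d2r = pi / 180;   % degrees to radians

% Haversine great-circle distance from one point to many points.
% lats/lons may be vectors; min(1, ...) guards asin against rounding overshoot.
haversine = @(lat0, lon0, lats, lons, radius) 2 * radius * asin(min(1, sqrt( ...
    sin((lats - lat0) * d2r / 2).^2 + ...
    cos(lat0 * d2r) .* cos(lats * d2r) .* sin((lons - lon0) * d2r / 2).^2)));

% Sample coordinates from the question: M15 6AA, M15 6PF, M1 2AR, AB25 1LE.
lat = [53.46; 53.46; 53.47; 57.15];
lon = [-2.24; -2.23; -2.22; -2.10];

d = haversine(lat(1), lon(1), lat, lon, 3963.17);  % miles from M15 6AA
within = d <= 1;                                   % logical mask of neighbours
```

The first entry of d is the distance from the point to itself, which comes out as exactly zero here, so no complex values can appear.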

Apart from what Dev-iL suggested, you can at least take the following out of the inner loop:

good = dist(1:j,2)<= 1;

Good luck! NRAS

Answer 4

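The UK is so small that you can still get reasonable results without worrying about the curvature of the Earth. You can use the differences in latitude and longitude to estimate the distance.

This example is somewhat oversimplified, but it suggests that once the data has been read in, you can do the actual calculation within an hour.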

x=rand(1.7e6,1);                %Fake x data
y=x;                            %Fake y data
tic
for t=1:1.7e3                   % One thousandth part of the work to be done
    (x-0.5).^2+(y-0.2).^2>0.01; %Simple distance calculation from a point (0.5,0.2), then comparing to threshold
end
toc                             %Runs for about 2 seconds

Using real distances will probably take longer, but it should not take more than 1 or 2 hours to finish.
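To make the latitude/longitude-difference idea concrete, here is a rough sketch of a flat-map (equirectangular) estimate applied to a few rows from the question's sample data. It assumes the Earth radius of 3963.17 miles used in the question and scales longitude by the cosine of the reference latitude; the variable names are only for illustration, and the totals are approximations, not the values distance() would give.

```matlab
% Sample rows taken from the question's input file.
postcode = {'M15 6AA'; 'M15 6PF'; 'M1 2AR'; 'M1 7HR'; 'AB25 1LE'};
pop = [748; 745; 738; 719; 733];
lat = [53.46; 53.46; 53.47; 53.47; 57.15];
lon = [-2.24; -2.23; -2.22; -2.23; -2.10];

milesPerDegree = 3963.17 * pi / 180;   % arc length of one degree, in miles
n = numel(pop);
tot = zeros(n, 1);                     % preallocate the result

for k = 1:n
    % Local flat-map offsets in miles; longitude shrinks with cos(latitude).
    dy = (lat - lat(k)) * milesPerDegree;
    dx = (lon - lon(k)) * milesPerDegree * cos(lat(k) * pi / 180);
    near = dx.^2 + dy.^2 <= 1;         % squared distance vs. (1 mile)^2, no sqrt
    tot(k) = sum(pop(near));           % includes postcode k itself
end
```

Each pass through the loop is one vectorized comparison against all postcodes, which is the same shape of work as the timing test above.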

Optimizing a function that compares 1.7 million entries against themselves
