Question

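I am trying to write a function (see the code below) that creates a list of the airports with more than 1,000 flights departing or arriving, and then returns a vector containing the index of each of those airports in (x, y) format. The airport and flight data live in separate structs of size 1x6267 and 1x136010 respectively, and every airport and flight has a unique ID number that corresponds to its column in the struct. The code I wrote does correctly identify the airports with more than 1000 combined arrivals/departures, but the function takes nearly 20 minutes to run, and it only returns the airports' ID numbers rather than their indices in the airports struct. What do I need to change to get the run time under 5 minutes, and how do I create the vector of (x, y) indices for the airports? Any help would be greatly appreciated. Thanks!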

P.S. I am very new to MATLAB, so I apologize in advance if these questions seem silly or obvious.

function list = Problem10( flights, aircraft, airlines, airports )
a=zeros(1,(length(airports)));
for id=1:length(airports)
    a(id)= FlightsToFrom(flights, id);
end
a(a==0) = [];
[list]=[sort(a)]
end

function tofrom = FlightsToFrom(flights, ID)
sum=0;
for ii=1:length(flights)
    if (isequal (flights(1,ii).from_id, ID))||(isequal (flights(1,ii).to_id, ID))
        sum=sum+1;
        if sum > 1000
            break;
        end
    end
end
if sum <=1000
    tofrom=0;
else
    tofrom=ID;
end
end

Here are some examples of what the data looks like and how it behaves:

(In Workspace)
airports <1x6267 struct>
aircraft <1x384 struct>
airlines <1x1559 struct>
flights <1x136010 struct>

(Inside of struct)
flights(1,6) <1x1 struct>
**Field**       **Value**
airline_id    60
from_id       967
to_id         6252
aircraft_id   18
distance      32
airtime       19
passengers    0
month         1

flights(1,6).from_id <1x1 double>
967


airport(1,176) <1x1 struct>
**Field**       **Value**
code          'AEX'
name          'Alexandria, LA: Alexandria International'

airport(1,176).name <1x40 char>
'Alexandria, LA: Alexandria International'

(In Command Window)

>> airports (2866)

ans = 

    code: 'LAX'
    name: 'Los Angeles, CA: Los Angeles International'

>> airports (1, 2866)

ans = 

    code: 'LAX'
    name: 'Los Angeles, CA: Los Angeles International'

>> airports (4703)

ans = 

    code: 'SEA'
    name: 'Seattle, WA: Seattle/Tacoma International'

>> airports (1, 4703)

ans = 

    code: 'SEA'
    name: 'Seattle, WA: Seattle/Tacoma International'

>> flights (4736)

ans = 

     airline_id: 31
        from_id: 1635
          to_id: 1062
    aircraft_id: 194
       distance: 118
        airtime: 1792
     passengers: 1657
          month: 1

>> flights (1, 4736)

ans = 

     airline_id: 31
        from_id: 1635
          to_id: 1062
    aircraft_id: 194
       distance: 118
        airtime: 1792
     passengers: 1657
          month: 1

>> flights(1,7369).to_id

ans =

   830

>> flights(1,7369).from_id

ans =

        1047

Answer 1

Your entire FlightsToFrom() function can be fully vectorized, which gives the following (along with a few minor improvements):

function a = Problem10( flights, aircraft, airlines, airports )
    numAirports = length(airports);
    a(numAirports) = 0;  % preallocate: faster than zeros()
    from_ids = [flights.from_id];
    to_ids   = [flights.to_id];
    for id = 1 : numAirports
        a(id) = id * (sum(from_ids==id | to_ids==id) > 1000);
    end
    a(~a) = [];
    %a = sort(a);  % unnecessary - a is already sorted at this stage, by ascending ID!
end

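This could probably be vectorized further, but I think the speed-up from these small changes alone should be enough to make any further investment in performance tuning academic rather than practical.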
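For reference, the further vectorization mentioned above could look roughly like the sketch below. It is not part of the original answer: the function name `busyAirports` and its `numAirports` and `threshold` parameters are hypothetical, and it assumes every `from_id` and `to_id` is a positive integer no larger than `numAirports`. It uses `accumarray` to count departures and arrivals per airport ID without looping over airports at all.

```matlab
function ids = busyAirports(flights, numAirports, threshold)
    % Hypothetical helper (not from the original answer): IDs of airports
    % with more than `threshold` combined departures and arrivals.
    from_ids = [flights.from_id];
    to_ids   = [flights.to_id];

    % Count departures and arrivals per airport ID.
    dep = accumarray(from_ids(:), 1, [numAirports 1]);
    arr = accumarray(to_ids(:),   1, [numAirports 1]);

    % The loop version uses a logical OR, so a flight whose origin equals
    % its destination counts once there; subtract the double count here.
    selfLoops = accumarray(from_ids(:), double(from_ids(:) == to_ids(:)), ...
                           [numAirports 1]);

    counts = dep + arr - selfLoops;
    ids = find(counts > threshold).';   % row vector, ascending by ID
end
```

With the data from the question this would be called as `busyAirports(flights, length(airports), 1000)`.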

Answer 2

To speed things up, I would suggest:

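function tofrom = FlightsToFrom(flights, ID)
% Guard: make sure no variable named 'sum' shadows the built-in function
assert(~exist('sum','var'))
% Field names follow the question's data (from_id / to_id)
nflights=sum([flights.from_id]==ID)+sum([flights.to_id]==ID);
if nflights <=1000
    tofrom=0;
else
    tofrom=ID;
end
end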

As for the second question, what do the airports look like? Right now you are using the loop index; is that the same as the airport ID?
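If the airport ID really is the column index into the 1xN `airports` struct, as the question states, then the (x, y) index of an airport is simply `(1, ID)`. Under that assumption, a minimal sketch for turning a list of IDs into index pairs might be the following (the helper name `airportIndexPairs` is hypothetical):

```matlab
function xy = airportIndexPairs(ids)
    % Hypothetical helper: convert airport IDs into (row, column) index
    % pairs, assuming the ID equals the column in a 1xN struct array.
    ids = ids(:);                       % force a column vector
    xy  = [ones(numel(ids), 1), ids];   % one [1, ID] pair per row
end
```

Each row of `xy` can then be used as `airports(xy(k,1), xy(k,2))`.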

If possible, please provide some sample data. For example, a snippet like this generates random input data shaped like your real data:

numOfAirports = 100;
numOfFlights = 10000;
for idx = 1:numOfFlights
    flights(idx).from_id = randi(numOfAirports);  % field names as in the question
    flights(idx).to_id   = randi(numOfAirports);
end

Answer 3

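What makes your code slow is that it loops over the entire flight list 6267 times, because you call the FlightsToFrom function once for every airport. If you can solve the problem with a single pass over the flights, that loop does roughly 6267 times less work, which is a substantial saving. I am not entirely sure what you mean by a "vector of airport indices x, y": what should this vector contain? In any case, you can at least write the loop like this: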

% a(i,j) counts the flights from airport i to airport j
a = zeros(length(airports), length(airports));
for id = 1:length(flights)
    f = flights(1,id).from_id;
    t = flights(1,id).to_id;
    a(f,t) = a(f,t) + 1;   % MATLAB indexes with parentheses, not brackets
end

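In the end, a is a matrix containing the number of flights between each pair of airports. For example, a(5,6) is the number of flights from airport 5 to airport 6. You can then operate on this matrix with MATLAB's built-in functions, e.g.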

[row, col] = find(a>1000);

will give you the from/to coordinates of the routes that were flown more than 1000 times.

highdepartures = find(sum(a,2) > 1000);  % row sums: flights leaving each airport
higharrivals   = find(sum(a,1) > 1000);  % column sums: flights arriving at each airport

will give you the lists of airports with more than 1000 departures and more than 1000 arrivals, respectively.
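Since the question asks about combined arrivals and departures, the row and column sums can also be added together. The sketch below is one possible way to do that, not the answer's own code: the function name `busyFromMatrix` and its parameters are hypothetical, it assumes all IDs are positive integers no larger than `numAirports`, and it builds the matrix with `sparse` (which sums the values of repeated index pairs) instead of a loop, because most airport pairs never occur.

```matlab
function [busy, a] = busyFromMatrix(flights, numAirports, threshold)
    % Hypothetical helper: build the from/to count matrix without a loop
    % and return the airports whose combined traffic exceeds `threshold`.
    from_ids = [flights.from_id].';
    to_ids   = [flights.to_id].';

    % sparse() accumulates repeated (i,j) pairs, so a(i,j) is the number
    % of flights from airport i to airport j.
    a = sparse(from_ids, to_ids, 1, numAirports, numAirports);

    departures = full(sum(a, 2));      % row sums
    arrivals   = full(sum(a, 1)).';    % column sums, as a column vector

    % Flights from an airport to itself sit on the diagonal and would be
    % counted twice, so subtract them once.
    total = departures + arrivals - full(diag(a));
    busy  = find(total > threshold).';
end
```

For the question's data the call would be `busyFromMatrix(flights, length(airports), 1000)`.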

MATLAB function: improving the speed/performance of code
