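I have a project in which I am trying to write a program that takes a CSV dataset from www.transtats.gov, a dataset of US airline flights. My goal is to find the flight from one airport to another that has the worst delays overall, which would make it the "worst flight". So far I have this: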
`import csv
with open('826766072_T_ONTIME.csv') as csv_infile: #import and open CSV
    reader = csv.DictReader(csv_infile)
    total_delay = 0
    flight_count = 0
    flight_numbers = []
    delay_totals = []
    dest_list = [] #create empty list of destinations
    for row in reader:
        if row['ORIGIN'] == 'BOS': #only take flights leaving BOS
            if row['FL_NUM'] not in flight_numbers:
                flight_numbers.append(row['FL_NUM'])
            if row['DEST'] not in dest_list: #if the dest is not already in the list
                dest_list.append(row['DEST']) #append the dest to dest_list
    for number in flight_numbers:
        for row in reader:
            if row['ORIGIN'] == 'BOS': #for flights leaving BOS
                if row['FL_NUM'] == number:
                    if float(row['CANCELLED']) < 1: #if the flight is not cancelled
                        if float(row['DEP_DELAY']) >= 0: #and the delay is greater or equal to 0 (some flights had negative delay?)
                            total_delay += float(row['DEP_DELAY']) #add time of delay to total delay
                            flight_count += 1 #add the flight to total flight count
    for row in reader:
        for number in flight_numbers:
            delay_totals.append(sum(row['DEP_DELAY']))`
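I thought I could build a list of flight numbers and a list of total delays for those flight numbers, then compare the two to see which flight has the highest total delay. What is the best way to compare the two lists?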
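Answer 0 (score: 2)
I'm not sure I understood you correctly, but I think you should use a dictionary (`dict`) for this, where the key is `'FL_NUM'` and the value is the total delay.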
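A minimal sketch of that idea, assuming the column names from the question (`ORIGIN`, `FL_NUM`, `CANCELLED`, `DEP_DELAY`). The function name `total_delay_by_flight` and the sample rows are made up for illustration, and treating a blank `DEP_DELAY` as zero is an assumption rather than something the dataset guarantees:

```python
from collections import defaultdict

def total_delay_by_flight(rows, origin="BOS"):
    """Sum departure delays per flight number for flights leaving `origin`."""
    totals = defaultdict(float)
    for row in rows:
        if row["ORIGIN"] != origin:
            continue
        if float(row["CANCELLED"]) >= 1:  # skip cancelled flights
            continue
        delay = float(row["DEP_DELAY"] or 0)  # assumption: blank delay counts as 0
        if delay > 0:  # ignore early departures, as the question does
            totals[row["FL_NUM"]] += delay
    return dict(totals)

# Made-up rows standing in for what csv.DictReader would yield
sample_rows = [
    {"ORIGIN": "BOS", "FL_NUM": "100", "CANCELLED": "0.00", "DEP_DELAY": "15.00"},
    {"ORIGIN": "BOS", "FL_NUM": "100", "CANCELLED": "0.00", "DEP_DELAY": "30.00"},
    {"ORIGIN": "BOS", "FL_NUM": "200", "CANCELLED": "0.00", "DEP_DELAY": "40.00"},
    {"ORIGIN": "BOS", "FL_NUM": "200", "CANCELLED": "1.00", "DEP_DELAY": ""},
    {"ORIGIN": "JFK", "FL_NUM": "300", "CANCELLED": "0.00", "DEP_DELAY": "90.00"},
    {"ORIGIN": "BOS", "FL_NUM": "100", "CANCELLED": "0.00", "DEP_DELAY": "-5.00"},
]

totals = total_delay_by_flight(sample_rows)
worst = max(totals, key=totals.get)  # flight number with the largest total
print(worst, totals[worst])
```

With a single dictionary there are no longer two parallel lists to compare: `max` with `key=totals.get` picks the flight number whose total is largest.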
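Answer 1 (score: 1)
In general, I try to eliminate loops in my Python code. For files that aren't large, I usually read through the data file once and build some `dict`s that I can analyze at the end. The following code is untested because I don't have the original data, but it follows the general pattern I would use.
Since a flight is identified by its origin, destination, and flight number, I would capture those as a `tuple` and use that as the key in my dictionary.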
import csv
from collections import defaultdict

flight_delays = defaultdict(list) # look this up if you aren't familiar
with open('826766072_T_ONTIME.csv') as csv_infile:
    reader = csv.DictReader(csv_infile)
    for row in reader:
        if row['ORIGIN'] == 'BOS': #only take flights leaving BOS
            # keep flights that were not cancelled and have a recorded delay
            if float(row['CANCELLED']) < 1 and row['DEP_DELAY']:
                flight = (row['ORIGIN'], row['DEST'], row['FL_NUM'])
                flight_delays[flight].append(float(row['DEP_DELAY']))

# Finished reading through data, now I want to calculate average delays
worst_flight = ""
worst_delay = 0
for flight, delays in flight_delays.items():
    average_delay = sum(delays) / len(delays)
    if average_delay > worst_delay:
        worst_flight = flight[0] + " to " + flight[1] + " on FL#" + flight[2]
        worst_delay = average_delay
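Since I don't have the real file, one way to exercise this pattern is to run it against a small in-memory CSV. The sketch below is only that: the sample data is invented, the helper name `average_delays` is hypothetical, and it assumes the same column names the question uses.

```python
import csv
import io
from collections import defaultdict

# Invented sample with the same columns the question relies on
SAMPLE_CSV = """ORIGIN,DEST,FL_NUM,CANCELLED,DEP_DELAY
BOS,JFK,100,0.00,10.00
BOS,JFK,100,0.00,50.00
BOS,ORD,200,0.00,20.00
BOS,ORD,200,1.00,
LAX,BOS,300,0.00,120.00
"""

def average_delays(csv_file, origin="BOS"):
    """Return {(origin, dest, flight_number): average departure delay}."""
    delays = defaultdict(list)
    for row in csv.DictReader(csv_file):
        if row["ORIGIN"] != origin or float(row["CANCELLED"]) >= 1:
            continue  # wrong airport or cancelled flight
        if row["DEP_DELAY"]:  # skip rows with no recorded delay
            key = (row["ORIGIN"], row["DEST"], row["FL_NUM"])
            delays[key].append(float(row["DEP_DELAY"]))
    return {flight: sum(d) / len(d) for flight, d in delays.items()}

averages = average_delays(io.StringIO(SAMPLE_CSV))
worst_flight = max(averages, key=averages.get)  # key with the highest average
print(worst_flight, averages[worst_flight])
```

Passing a file object instead of a filename means the same function works on `io.StringIO` for a quick check and on `open('826766072_T_ONTIME.csv')` for the real data.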
Answer 2 (score: 0)
A very simple solution. Add two new variables:
max_delay = 0
delay_flight = 0
# Replace `if float(row['DEP_DELAY']) >= 0:` with:
if float(row['DEP_DELAY']) > max_delay:
    max_delay = float(row['DEP_DELAY'])
    delay_flight = row['FL_NUM']  # save the row number or flight number for reference
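Put together as a single pass, the idea might look like the sketch below. Note that it finds the single largest delay, not the largest total per flight. The function name `worst_single_delay` and the sample rows are made up, and the column names are assumed to match the question:

```python
def worst_single_delay(rows, origin="BOS"):
    """Return (flight_number, delay) for the largest single departure delay."""
    max_delay = 0.0
    delay_flight = None  # stays None if no delayed flight is found
    for row in rows:
        if row["ORIGIN"] != origin or float(row["CANCELLED"]) >= 1:
            continue  # wrong airport or cancelled flight
        if not row["DEP_DELAY"]:
            continue  # no recorded delay for this row
        delay = float(row["DEP_DELAY"])
        if delay > max_delay:
            max_delay = delay
            delay_flight = row["FL_NUM"]
    return delay_flight, max_delay

# Made-up rows standing in for csv.DictReader output
sample_rows = [
    {"ORIGIN": "BOS", "FL_NUM": "100", "CANCELLED": "0.00", "DEP_DELAY": "15.00"},
    {"ORIGIN": "BOS", "FL_NUM": "200", "CANCELLED": "0.00", "DEP_DELAY": "75.00"},
    {"ORIGIN": "BOS", "FL_NUM": "100", "CANCELLED": "0.00", "DEP_DELAY": "60.00"},
    {"ORIGIN": "BOS", "FL_NUM": "300", "CANCELLED": "1.00", "DEP_DELAY": ""},
    {"ORIGIN": "JFK", "FL_NUM": "400", "CANCELLED": "0.00", "DEP_DELAY": "200.00"},
]

result = worst_single_delay(sample_rows)
print(result)
```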