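I'm working with a .csv file and wrote this code to count how many times each value in the year column appears in the dataset.
Whenever I run the code on my own machine, I get IndexError: list index out of range
on line 10, row_year = suspension[5],
but when I run the same code on the Dataquest website, it works fine.
The csv dataset has 7 columns, and the 5th column holds the year.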
import csv
file = open("nfl_suspensions_data.csv")
nfl_suspensions = list(csv.reader(file))
nfl_suspensions = nfl_suspensions[1:]
years = {}
for suspension in nfl_suspensions:
    row_year = suspension[5]
    if row_year in years:
        years[row_year] = years[row_year] + 1
    else:
        years[row_year] = 1
print(years)
Answer:
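Your data is too short: you are indexing past the end of the list. If year really is
the 5th column, you should use suspension[4]
to access it, because Python indexing is 0-based.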
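A minimal sketch of both points, using a made-up in-memory sample rather than the real file (the header and values are hypothetical): the 5th column sits at index 4, and `csv.reader` returns an empty list for a blank line, so any index on that row raises IndexError. A blank or truncated line is one possible source of short rows; whether that is what happens with this particular file is not confirmed here.

```python
import csv
import io

# Hypothetical sample: a header, one complete 7-column row, and a blank line.
sample = "a,b,c,d,year,f,g\n1,2,3,4,2014,6,7\n\n"
rows = list(csv.reader(io.StringIO(sample)))[1:]  # drop the header row

print(rows)        # the blank line comes back as an empty list
print(rows[0][4])  # the 5th column is at index 4

# Collect rows that are too short to have an index 5 (a 6th column).
short_rows = []
for row in rows:
    try:
        _ = row[5]
    except IndexError:
        short_rows.append(row)

print(short_rows)
```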
import csv
file = open("nfl_suspensions_data.csv")
nfl_suspensions = list(csv.reader(file))
nfl_suspensions = nfl_suspensions[1:]
years = {}
for line_nr, suspension in enumerate(nfl_suspensions):
    try:
        row_year = suspension[5]
    except IndexError:
        # 0 based line_nr, line_nr + 1 due to removed header line
        print("Data corrupt: less then 6 entries. Line:", line_nr+1)
        print(suspension)
        # skip this data
        continue
    if row_year in years:
        years[row_year] = years[row_year] + 1
    else:
        years[row_year] = 1
print(years)
This follows Python's "ask forgiveness, not permission" philosophy (EAFP).
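To make the contrast concrete, here is a small sketch on hypothetical rows comparing the two styles: checking the length first ("look before you leap") versus just indexing and handling the IndexError (EAFP). Both skip the short rows and keep the same years.

```python
# Hypothetical rows: one complete, one truncated, one empty.
rows = [["x", "x", "x", "x", "x", "2014", "x"], ["x", "x"], []]

# LBYL: check that index 5 exists before using it.
lbyl = [row[5] for row in rows if len(row) > 5]

# EAFP: just try the index and skip the row if it fails.
eafp = []
for row in rows:
    try:
        eafp.append(row[5])
    except IndexError:
        continue  # row too short, skip it

print(lbyl, eafp)
```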
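You should also switch to
with open("nfl_suspensions_data.csv") as file:
    nfl_suspensions = list(csv.reader(file))[1:]
This is the preferred way to read files, because the file is closed automatically when the block ends. See python.org - reading and writing files (the second code example block).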
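A self-contained sketch of that pattern, writing a throwaway CSV with hypothetical contents to a temporary directory: after the `with` block the file object reports `closed` as True. It also passes `newline=""`, which the `csv` module documentation recommends for files given to `csv.reader`/`csv.writer`; that argument is an addition, not part of the answer's snippet.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")  # hypothetical file name

    # Write a small CSV with a header row.
    with open(path, "w", newline="") as out:
        csv.writer(out).writerows([["name", "year"], ["a", "2014"], ["b", "2015"]])

    # Read it back, dropping the header row.
    with open(path, newline="") as file:
        rows = list(csv.reader(file))[1:]

    print(file.closed)  # True: the with block closed the file
    print(rows)
```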
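You could also use collections.defaultdict
:
from collections import defaultdict
years = defaultdict(int)  # replaces years = {} above
and drop the if/else around the increment:
# if row_year in years:
years[row_year] += 1  # works with defaultdict(int): a missing key starts at 0
# else:
#     years[row_year] = 1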
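Put together, the counting loop with `defaultdict(int)` looks roughly like this; the rows are hypothetical stand-ins for what `csv.reader` would return after the header is removed.

```python
from collections import defaultdict

# Hypothetical rows already read from the CSV (header removed).
rows = [
    ["p1", "t", "g", "c", "d", "2014", "x"],
    ["p2", "t", "g", "c", "d", "2014", "x"],
    ["p3", "t", "g", "c", "d", "2015", "x"],
]

years = defaultdict(int)  # a missing key is created with int() == 0
for row in rows:
    years[row[5]] += 1    # no membership check needed

print(dict(years))
```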
Shorter code that does the job, including generating demo data (the year is at index [5], i.e. the 6th column):
import csv
from collections import Counter

# Create a demo data file with errors.
# Note: this overwrites nfl_suspensions_data.csv, so run it in a scratch directory.
with open("nfl_suspensions_data.csv", "w") as f:
    for inter in range(1, 10):
        for y in range(1980, 2001, inter):
            f.write(f"na,na,na,na,na,{y},na,na\n")
        # corrupt line
        f.write("na,na,na,na\n")

# process and count the years:
with open("nfl_suspensions_data.csv") as file:
    nfl_suspensions = list(csv.reader(file))[1:]
as_columns = list(zip(*[l for l in nfl_suspensions if len(l) > 6]))
print(Counter(as_columns[5]))
Output:
Counter({'1980': 8, '1992': 5, '1998': 5, '1986': 4, '1988': 4, '1996': 4,
'2000': 4, '1984': 3, '1989': 3, '1990': 3, '1994': 3, '1995': 3,
'1982': 2, '1983': 2, '1985': 2, '1987': 2, '1981': 1, '1991': 1,
'1993': 1, '1997': 1, '1999': 1})
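Transposing with `zip(*...)` is not strictly needed to count a single column; a generator can feed `Counter` directly. This sketch uses a small hypothetical in-memory sample instead of the generated file, and keeps any row with at least 6 fields (`len(row) > 5`), which is slightly looser than the `len(l) > 6` filter above.

```python
import csv
import io
from collections import Counter

# Hypothetical data: a header, three good rows and one short (corrupt) row.
data = io.StringIO(
    "c0,c1,c2,c3,c4,year,c6\n"
    "na,na,na,na,na,1980,na\n"
    "na,na,na,na\n"
    "na,na,na,na,na,1981,na\n"
    "na,na,na,na,na,1980,na\n"
)
rows = list(csv.reader(data))[1:]  # drop the header row

# Count index 5 directly, skipping rows too short to have it.
counts = Counter(row[5] for row in rows if len(row) > 5)
print(counts)
print(counts.most_common(1))
```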
Your logic, fixed, applied to the data generated above:
def your_code_fixed(sus):
    years = {}
    for line_nr, suspension in enumerate(sus):
        try:
            row_year = suspension[5]
        except IndexError:
            # 0 based line_nr, line_nr + 1 due to removed header line
            print("Data corrupt: less then 6 entries. Line:", line_nr+1)
            print(suspension)
            # skip this data
            continue
        if row_year in years:
            years[row_year] = years[row_year] + 1
        else:
            years[row_year] = 1
    print(years)

with open("nfl_suspensions_data.csv") as file:
    nfl_suspensions = list(csv.reader(file))[1:]
your_code_fixed(nfl_suspensions)
Output for the data file above:
Data corrupt: less then 6 entries. Line: 21
['na', 'na', 'na', 'na']
Data corrupt: less then 6 entries. Line: 33
['na', 'na', 'na', 'na']
Data corrupt: less then 6 entries. Line: 41
['na', 'na', 'na', 'na']
Data corrupt: less then 6 entries. Line: 48
['na', 'na', 'na', 'na']
Data corrupt: less then 6 entries. Line: 54
['na', 'na', 'na', 'na']
Data corrupt: less then 6 entries. Line: 59
['na', 'na', 'na', 'na']
Data corrupt: less then 6 entries. Line: 63
['na', 'na', 'na', 'na']
Data corrupt: less then 6 entries. Line: 67
['na', 'na', 'na', 'na']
Data corrupt: less then 6 entries. Line: 71
['na', 'na', 'na', 'na']
{'1981': 1, '1982': 2, '1983': 2, '1984': 3, '1985': 2, '1986': 4, '1987': 2,
'1988': 4, '1989': 3, '1990': 3, '1991': 1, '1992': 5, '1993': 1, '1994': 3,
'1995': 3, '1996': 4, '1997': 1, '1998': 5, '1999': 1, '2000': 4, '1980': 8}
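If the header row really names the column `year` (an assumption: the question only mentions a "year column"), `csv.DictReader` lets you look the field up by name instead of by position, which sidesteps the 0-based index question. In this sketch on hypothetical data, a short row gets `None` for its missing fields (DictReader's default `restval`), and DictReader skips completely blank lines on its own.

```python
import csv
import io
from collections import Counter

# Hypothetical file; the "year" header name is an assumption.
data = io.StringIO(
    "name,team,games,category,desc,year,source\n"
    "a,NE,4,Personal conduct,x,2014,src\n"
    "\n"
    "b,NYJ\n"
    "c,DAL,1,Substance abuse,y,2014,src\n"
)

counts = Counter()
skipped = []
for row in csv.DictReader(data):
    year = row["year"]  # None when the row has too few fields
    if year is None:
        skipped.append(row)
        continue
    counts[year] += 1

print(counts)
print(len(skipped), "short row(s) skipped")
```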