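I am trying to run a regression of year against the average grade for each year. My plan is to build a dictionary of year: grade, then derive another dictionary of year: sum_of_grade, and so on.
The data comes from a CSV file with two columns: Year and Grade.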
Year Grade
2001 100
2002 99
2001 88
2003 11
2005 55
There are more rows, but I don't think the whole data set is needed here.
def construct_values(file):
    """
    Construct the values needed to graph the average grade of the class over time

    Parameters
    ----------
    file_path: A string. Absolute path to file.

    Returns
    -------
    years: array of integers
    average_grades: array of floats
    """
    years, average_grades = [], []
    grades = []
    d = {}
    with open(file,'r') as f:
        next(f)
        for line in f:
            year, grade = (s.strip() for s in line.split(','))
            years.append(year)    # array year
            grades.append(grade)  # array grade
    d = dict(zip(years,grades))  # dict year:grade
    for i,j in d:
        # i for count frequencies of years
        # j for summation of grades
        # j/i for average grade and extract as array
    return years, average_grades
I tried to make this clear, but let me know if anything is still unclear.
Answer 0 (score: 1)
The problem comes from using:
d = dict(zip(years,grades)) # dict year:grade
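With your sample input, it produces a dictionary like:
{'2001': '88', '2002': '99', '2003': '11', '2005': '55'}
because when a key is repeated while the dictionary is being built, the later value overwrites the earlier one.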
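A quick way to see the overwrite is the minimal sketch below. It assumes the five sample rows from the question, typed in by hand as the strings the parsing loop would produce:

```python
# Sample rows from the question, as strings (hand-copied for illustration).
years = ['2001', '2002', '2001', '2003', '2005']
grades = ['100', '99', '88', '11', '55']

d = dict(zip(years, grades))

# '2001' appears twice, but a dict keeps one value per key: the last one seen.
print(d)       # {'2001': '88', '2002': '99', '2003': '11', '2005': '55'}
print(len(d))  # 4, although there were 5 rows
```

The grade 100 for 2001 is silently lost, which is why an average computed from this dictionary cannot be right.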
So, to get the averages, I suggest building the dictionary differently, something like this:
def construct_values(file):
    """
    Construct the values needed to graph the average grade of the class over time

    Parameters
    ----------
    file: A string. Absolute path to file.

    Returns
    -------
    years: array of integers
    average_grades: array of floats
    """
    years, average_grades = [], []
    # grades = []  # this variable is no longer needed
    d = {}  # year -> [sum_of_grade, times_appeared_in_the_year]
    with open(file,'r') as f:
        next(f)  # skip the header line
        for line in f:
            year, grade = (s.strip() for s in line.split(','))
            # the changes from your code start here
            if year not in d:
                d[year] = [int(grade), 1]
            else:
                d[year][0] += int(grade)
                d[year][1] += 1
    for year, grade_info in d.items():
        years.append(int(year))  # the docstring promises integers
        average_grades.append(grade_info[0] / grade_info[1])
    # the changes from your code end here
    return years, average_grades
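In the intermediate dictionary d, each value holds [sum_of_grade, times_appeared_in_the_year], so when you iterate over the dictionary you can compute the average simply as sum_of_grade / times_appeared_in_the_year.
As a result, the extra variable grades is no longer needed.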
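The same running sum and count idea can be written a little more compactly with the standard library's csv module and collections.defaultdict. This is only a sketch under some assumptions: the function name average_by_year is made up for illustration, the file is comma-separated with one header row, and both columns hold integers.

```python
import csv
from collections import defaultdict


def average_by_year(file_path):
    """Return (years, average_grades), with years sorted ascending."""
    # year -> [sum_of_grades, count]; missing years start at [0, 0]
    totals = defaultdict(lambda: [0, 0])
    with open(file_path, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the "Year,Grade" header
        for row in reader:
            if not row:  # tolerate blank lines
                continue
            year, grade = int(row[0]), int(row[1])
            totals[year][0] += grade
            totals[year][1] += 1
    years = sorted(totals)
    average_grades = [totals[y][0] / totals[y][1] for y in years]
    return years, average_grades
```

Sorting the years before returning them keeps the two lists aligned and in chronological order, which is convenient for plotting or for fitting a regression afterwards.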
Answer 1 (score: 1)
As soon as you see a table (and a CSV file is one), you should think of pandas (in my opinion).
Here is a pandas solution:
import pandas as pd
import io

csv = """Year,Grade
2001,100
2002,99
2001,88
2003,11
2005,55"""

df = pd.read_csv(io.StringIO(csv))
# year -> list of all grades recorded for that year
year_grade = {k: list(v) for k, v in df.groupby("Year")["Grade"]}
# year -> mean grade for that year
year_avg_grade = df.groupby("Year")["Grade"].mean().to_dict()

year_grade:
{2001: [100, 88], 2002: [99], 2003: [11], 2005: [55]}

year_avg_grade:
{2001: 94.0, 2002: 99.0, 2003: 11.0, 2005: 55.0}
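Answer 2 (score: 0)
When the dictionary is created with dict(zip(years, grades)), duplicate keys are not allowed, so only one grade per year survives. It is better to use an approach other than a plain year: grade dictionary.
Something like this: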
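from itertools import groupby

# years and grades are the parallel lists built in the question's loop
combined = zip(years, grades)
# groupby only groups adjacent items, so sort by year first
for year, group in groupby(sorted(combined, key=lambda x: x[0]), key=lambda x: x[0]):
    values = [int(g) for _, g in group]
    print('year : %s average : %s' % (year, sum(values) / len(values)))

Result:
year : 2001 average : 94.0
year : 2002 average : 99.0
year : 2003 average : 11.0
year : 2005 average : 55.0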
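The sorted() call matters because itertools.groupby only merges adjacent items that share a key. A small sketch, assuming the question's sample rows typed in by hand and a helper named first that exists only for this illustration, shows the difference:

```python
from itertools import groupby

# Sample rows from the question as (year, grade) string pairs.
rows = [('2001', '100'), ('2002', '99'), ('2001', '88'),
        ('2003', '11'), ('2005', '55')]


def first(pair):
    """Key function: group on the year."""
    return pair[0]


# Without sorting, the two 2001 rows are not adjacent, so 2001 shows up twice.
unsorted_keys = [year for year, _ in groupby(rows, key=first)]
print(unsorted_keys)  # ['2001', '2002', '2001', '2003', '2005']

# After sorting by year, each year forms exactly one group.
sorted_keys = [year for year, _ in groupby(sorted(rows, key=first), key=first)]
print(sorted_keys)    # ['2001', '2002', '2003', '2005']
```

Skipping the sort would therefore report two separate "averages" for 2001 instead of one.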