Creating a dictionary and extracting an array of averages

Time: 2017-10-16 06:53:18

Tags: python dictionary

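I am trying to return the years and the average grade for each year. What I am trying to do is build a dictionary of year: grade, then get another dictionary of year: sum_of_grade, and so on.

The data comes from a csv file with two headers: Year and Grade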

Year  Grade
2001  100
2002  99
2001  88
2003  11
2005  55

There are more rows, but I don't think it is necessary to show the whole data set.

def construct_values(file):
    """
    Construct the values needed to graph the average grade of the class over time

    Parameters
    ----------
    file_path: A string. Absolute path to file.

    Returns
    -------
    years: array of integers
    average_grades: array of floats
    """
    years, average_grades = [], []
    grades = []
    d = {}
    with open(file,'r') as f:
        next(f)
        for line in f:
            year, grade = (s.strip() for s in line.split(','))
            years.append(year) # array year
            grades.append(grade) # array grade
            d = dict(zip(years,grades)) # dict year:grade

        for i,j in d:
            # i for count frequencies of years
            # j for summation of grades
            # j/i for average grade and extract as array


        return years, average_grades

I have tried to be clear, but if anything is still unclear, please let me know.

3 answers:

Answer 0 (score: 1)

The problem is in this line:

d = dict(zip(years,grades)) # dict year:grade

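Taking your input data as an example, it produces a dictionary like this (keys and values are strings, because they come straight from line.split(',')):

{'2001': '88', '2002': '99', '2003': '11', '2005': '55'}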

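This is because, when a dictionary is built from pairs that contain duplicate keys, each later value overwrites the earlier one.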
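A minimal sketch of that overwriting behaviour, using the five sample rows from the question. The lists are written out by hand here as strings, on the assumption that this is what the reading loop would have collected:

```python
# Sample rows from the question, as strings, the way line.split(',') yields them
years = ['2001', '2002', '2001', '2003', '2005']
grades = ['100', '99', '88', '11', '55']

d = dict(zip(years, grades))

# '2001' occurs twice; the second pair ('2001', '88') replaces the first
print(d)       # {'2001': '88', '2002': '99', '2003': '11', '2005': '55'}
print(len(d))  # 4 keys for 5 rows, so one grade has been lost
```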

So, to get what you want, I suggest building the dictionary in a different way, something like this:

def construct_values(file):
    """
    Construct the values needed to graph the average grade of the class over time

    Parameters
    ----------
    file: A string. Absolute path to file.

    Returns
    -------
    years: array of integers
    average_grades: array of floats
    """
    years, average_grades = [], []
    # grades = []      This variable is no longer needed
    d = {}
    with open(file,'r') as f:
        next(f)
        for line in f:
            year, grade = (s.strip() for s in line.split(','))

            # the changes from your code begin here
            if year not in d:
                d[year] = [int(grade), 1]
            else:
                d[year][0] += int(grade)
                d[year][1] += 1

        for year, grade_info in d.items():
            years.append(int(year))  # the docstring promises integers
            average_grades.append(grade_info[0] / grade_info[1])
            # the changes from your code end here

        return years, average_grades

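In the intermediate dictionary d, each value holds [sum_of_grade, times_appeared_in_the_year], so when you iterate over the dictionary you can easily compute the average as sum_of_grade / times_appeared_in_the_year.

As a result, you no longer need the extra variable grades.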
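The same [sum, count] idea can also be sketched with collections.defaultdict and the csv module, which removes the explicit "year not in d" check. The helper name average_grades_by_year and the choice to take an iterable of lines instead of a path are assumptions made for illustration, not part of the answer above. The sketch also assumes Python 3, where / is true division.

```python
import csv
from collections import defaultdict


def average_grades_by_year(lines):
    """Return (years, average_grades) from an iterable of 'Year,Grade' csv lines.

    Hypothetical helper, not part of the original answer.
    """
    totals = defaultdict(lambda: [0, 0])  # year -> [sum_of_grade, count]
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        year, grade = int(row[0]), int(row[1])
        totals[year][0] += grade
        totals[year][1] += 1

    years = sorted(totals)
    average_grades = [totals[y][0] / totals[y][1] for y in years]
    return years, average_grades


sample = ["Year,Grade", "2001,100", "2002,99", "2001,88", "2003,11", "2005,55"]
print(average_grades_by_year(sample))
# ([2001, 2002, 2003, 2005], [94.0, 99.0, 11.0, 55.0])
```

Because csv.reader accepts any iterable of strings, an open file object can be passed in place of the sample list.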

Answer 1 (score: 1)

As soon as you see a table (and a csv file is one), you should think of pandas (in my opinion).

Here is a pandas solution:

It builds two dictionaries, year_grade and year_avg_grade:

import pandas as pd
import io

csv = """Year,Grade
2001,100
2002,99
2001,88
2003,11
2005,55"""

df = pd.read_csv(io.StringIO(csv))

year_grade = {k: list(v) for k,v in df.groupby("Year")["Grade"]}
year_avg_grade = df.groupby("Year")["Grade"].mean().to_dict()

year_grade:

{2001: [100, 88], 2002: [99], 2003: [11], 2005: [55]}

year_avg_grade:

{2001: 94.0, 2002: 99.0, 2003: 11.0, 2005: 55.0}

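Answer 2 (score: 0)

When the dictionary is created with dict(zip(years,grades)), duplicate keys are not allowed, so only one grade per year survives. It is therefore better to use an alternative to a dictionary.

Something like this.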

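from itertools import groupby

# years and grades are the lists built while reading the csv file
combined = zip(years, grades)
by_year = lambda pair: pair[0]
for n, g in groupby(sorted(combined, key=by_year), key=by_year):
    grades_in_year = [int(i[1]) for i in g]
    # Python 3: / is true division, so the averages are floats
    print('year : %s average : %s' % (n, sum(grades_in_year) / len(grades_in_year)))

Result:

year : 2001 average : 94.0
year : 2002 average : 99.0
year : 2003 average : 11.0
year : 2005 average : 55.0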
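The sorted() call in the snippet above matters, because itertools.groupby only groups consecutive items that share a key. A small sketch of the difference, using three hand-written pairs taken from the question's sample data:

```python
from itertools import groupby

pairs = [('2001', '100'), ('2002', '99'), ('2001', '88')]
first = lambda pair: pair[0]

# Without sorting, '2001' shows up as two separate groups
unsorted_keys = [key for key, _ in groupby(pairs, key=first)]
print(unsorted_keys)  # ['2001', '2002', '2001']

# After sorting by year, each year forms exactly one group
sorted_keys = [key for key, _ in groupby(sorted(pairs, key=first), key=first)]
print(sorted_keys)    # ['2001', '2002']
```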