Python: extracting the same value across years

Time: 2017-10-03 14:50:35

Tags: python

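Question: Write a function that extracts the same value across multiple years and computes the difference between consecutive values, to show whether births are increasing or decreasing. For example, how did the number of births on Saturdays change from year to year between 1994 and 2003?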

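I have extracted the data by day of week and by year (see the code below). However, I need it in the form year / day of week / births. Once I have that output, I would like to see the year-over-year change (for example, Sunday births in 1995 compared with Sunday births in 1994).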

Data source: https://raw.githubusercontent.com/fivethirtyeight/data/master/births/US_births_1994-2003_CDC_NCHS.csv

For example:

         {1994, 1, 124567,
          1994, 2, 524652...
      ....2003, 7, 452456} 

...Earlier work...

Code: by day of week

def dow_births(lst_of_lsts):
    birth_per_day = dict()

    for row in lst_of_lsts:
        day_of_week = row[3]
        births = row[4]

        if day_of_week in birth_per_day:
            birth_per_day[day_of_week] += births
        else:
            birth_per_day[day_of_week] = births

    return birth_per_day  

cdc_day_births = dow_births(cdc_list)  

Input:

cdc_day_births

Output:

{1: 5789166,
 2: 6446196,
 3: 6322855,
 4: 6288429,
 5: 6233657,
 6: 4562111,
 7: 4079723}

Code: by year

def calc_counts(data, column):
    sum_dict = dict()

    for row in data:
        column_value = row[column]
        births = row[4]
        if column_value in sum_dict:
            sum_dict[column_value] += births
        else:
            sum_dict[column_value] = births

    return sum_dict

# Assumed call (not shown in the original post): column 0 holds the year
cdc_year_births = calc_counts(cdc_list, 0)

Input:

cdc_year_births

Output:

{1994: 3952767,
 1995: 3899589,
 1996: 3891494,
 1997: 3880894,
 1998: 3941553,
 1999: 3959417,
 2000: 4058814,
 2001: 4025933,
 2002: 4021726,
 2003: 4089950}

1 answer:

Answer 0: (score: 0)

If I understand correctly, you want the total number of births on a given day of the week, for each year.

This should get you what you want:

from collections import defaultdict

saturday_births = defaultdict(list)
for row in data:
    if int(row[3]) == 6:  # day of the week (6 = Saturday)
        # Collect each Saturday's births under its year
        saturday_births[int(row[0])].append(int(row[4]))

# After all rows have been read, sum the births for each year
sum_births_per_year = [[year, sum(births)] for year, births in saturday_births.items()]

Output:

[[1994, 474732],
[1995, 459580],
[1996, 456261], 
[1997, 450840], 
[1998, 453776], 
[1999, 449985], 
[2000, 469794], 
[2001, 453928], 
[2002, 445770], 
[2003, 447445]]
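
The question also asks for every day of the week and for the change from one year to the next, which the snippet above does not cover. A minimal sketch of one way to do both follows. It assumes each row is laid out as [year, month, date_of_month, day_of_week, births], matching the indices used above. The function names `births_by_year_and_day` and `yearly_changes` and the `sample_rows` values are made up for illustration and are not real CDC figures.

```python
from collections import defaultdict


def births_by_year_and_day(rows):
    """Sum births for every (year, day_of_week) pair.

    Assumes row[0] is the year, row[3] the day of week, row[4] the births.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[(int(row[0]), int(row[3]))] += int(row[4])
    return dict(totals)


def yearly_changes(totals, day_of_week):
    """Return [year, births, change_from_previous_year] rows for one weekday.

    The first year has no previous year, so its change is None.
    """
    years = sorted(year for year, day in totals if day == day_of_week)
    result = []
    previous = None
    for year in years:
        births = totals[(year, day_of_week)]
        change = None if previous is None else births - previous
        result.append([year, births, change])
        previous = births
    return result


# Made-up rows for illustration only (not real CDC figures)
sample_rows = [
    [1994, 1, 1, 6, 100],
    [1994, 1, 8, 6, 110],
    [1994, 1, 2, 7, 90],
    [1995, 1, 7, 6, 190],
    [1995, 1, 1, 7, 95],
]

totals = births_by_year_and_day(sample_rows)
print(yearly_changes(totals, 6))
# [[1994, 210, None], [1995, 190, -20]]
```

A negative change means fewer births than in the previous year. Run on the full data, `yearly_changes(births_by_year_and_day(data), 6)` should reproduce the Saturday totals listed above; taking those totals at face value, the first change would be 459580 - 474732 = -15152.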