Sum all previous values in a DataFrame in Python

Time: 2018-11-08 21:01:17

Tags: python python-3.x pandas dataframe pandas-groupby

My data looks like this:

Year         Month          Region           Value
1978           1             South             1
1990           1             North             22
1990           2             South             33
1990           2             Mid W             12
1998           1             South             1
1998           1             North             12
1998           2             South             2
1998           3             South             4
1998           1             Mid W             2
.
.

up to
2010
2010

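My end date is 2010, but I want to aggregate each region and month by adding together the values from all previous years.

I don't want a regular cumulative sum. I want a per-month cumulative total by region, where month 1 for region South is the accumulated total of all earlier month 1 values for region South, and so on.

The desired output is something like: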

Month          Region        Cum_Value
 1             South            2
 2             South            34
 3             South            4
 .
 .
 1             North            34
 2             North            10
 .
 .
 1             MidW              2
 2             MidW              12

2 Answers:

Answer 0: (score: 1)

Use pd.DataFrame.groupby together with cumsum, called on the grouped Value column:

df1['cumsum'] = df1.groupby(['Month', 'Region'])['Value'].cumsum()

Result:

   Year  Month Region  Value  cumsum
0  1978      1  South    1.0     1.0
1  1990      1  North   22.0    22.0
2  1990      2  South   33.0    33.0
3  1990      2  Mid W   12.0    12.0
4  1998      1  South    1.0     2.0
5  1998      1  North   12.0    34.0
6  1998      2  South    2.0    35.0
7  1998      3  South    4.0     4.0
8  1998      1  Mid W    2.0     2.0
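
The one-liner above assumes df1 already exists and is ordered by year. A minimal self-contained sketch follows, using the sample rows from the question. The explicit sort by Year and the `totals` variable are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question.
df1 = pd.DataFrame({
    "Year":   [1978, 1990, 1990, 1990, 1998, 1998, 1998, 1998, 1998],
    "Month":  [1, 1, 2, 2, 1, 1, 2, 3, 1],
    "Region": ["South", "North", "South", "Mid W", "South",
               "North", "South", "South", "Mid W"],
    "Value":  [1, 22, 33, 12, 1, 12, 2, 4, 2],
})

# Assumption: real data may not be ordered by Year, so sort first.
# mergesort is stable, so rows within the same year keep their order.
df1 = df1.sort_values("Year", kind="mergesort").reset_index(drop=True)

# Running total of Value within each (Month, Region) pair.
df1["cumsum"] = df1.groupby(["Month", "Region"])["Value"].cumsum()

# The last row of each pair carries the total accumulated over all years.
totals = df1.groupby(["Month", "Region"], as_index=False)["cumsum"].last()
print(totals)
```

Taking the last running total per group collapses the row-per-year result into one row per month and region, which is closer to the layout the question asks for.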

Answer 1: (score: 1)

Here is another solution that matches your expected output more closely.

import pandas as pd

df = pd.DataFrame({'Year': [1978,1990,1990,1990,1998,1998,1998,1998,1998],
              'Month': [1,1,2,2,1,1,2,3,1],
              'Region': ['South','North','South','Mid West','South','North','South','South','Mid West'],
              'Value' : [1,22,33,12,1,12,2,4,2]})

#DataFrame Result
    Year  Month Region  Value
0   1978    1   South    1
1   1990    1   North    22
2   1990    2   South    33
3   1990    2   Mid West 12
4   1998    1   South    1
5   1998    1   North    12
6   1998    2   South    2
7   1998    3   South    4
8   1998    1   Mid West 2

Code to run:

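# Sum every numeric column within each (Month, Region) group
df1 = df.groupby(['Month','Region']).sum()
# 'Year' was summed as well, which is meaningless here, so drop it
df1 = df1.drop('Year',axis=1)
# Order the rows by the Month and Region index levels
df1 = df1.sort_values(['Month','Region'])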

#Final Result

Month   Region  Value
1      Mid West  2
1      North     34
1      South     2
2      Mid West  12
2      South     35
3      South     4
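
A possible variant of the same idea, sketched under a few assumptions: selecting only Value before aggregating means Year is never summed and does not need to be dropped, and grouping by Region first orders the rows by region and then month, as in the question's desired output. The `result` variable and the Cum_Value column name (taken from the question's desired output) are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    "Year":   [1978, 1990, 1990, 1990, 1998, 1998, 1998, 1998, 1998],
    "Month":  [1, 1, 2, 2, 1, 1, 2, 3, 1],
    "Region": ["South", "North", "South", "Mid West", "South",
               "North", "South", "South", "Mid West"],
    "Value":  [1, 22, 33, 12, 1, 12, 2, 4, 2],
})

result = (
    # as_index=False keeps Region and Month as ordinary columns.
    df.groupby(["Region", "Month"], as_index=False)["Value"]
      .sum()
      # Rename to match the column name used in the question.
      .rename(columns={"Value": "Cum_Value"})
      # Put the columns in the order shown in the desired output.
      [["Month", "Region", "Cum_Value"]]
)
print(result)
```

Because the group keys are sorted by default, the rows come out grouped by region with months ascending inside each region.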