我的数据如下:
Year Month Region Value
1978 1 South 1
1990 1 North 22
1990 2 South 33
1990 2 Mid W 12
1998 1 South 1
1998 1 North 12
1998 2 South 2
1998 3 South 4
1998 1 Mid W 2
.
.
up to
2010
2010
我的结束日期是2010年,但是我想通过将所有以前的年份值加在一起来汇总地区和月份的所有值。 / p>
我不希望有一个常规的累积总和,而是一个按区域划分的每月累积总金额,其中,区域南的第1个月是该区域前一个月所有前1个月的区域南的累积第1个月,等等....
所需的输出类似于:
Month Region Cum_Value
1 South 2
2 South 34
3 South 4
.
.
1 North 34
2 North 10
.
.
1 MidW 2
2 MidW 12
答案 0 :(得分:1)
将pd.DataFrame.groupby
与pd.DataFrame.cumsum
一起使用
df1['cumsum'] = df1.groupby(['Month', 'Region'])['Value'].cumsum()
结果:
Year Month Region Value cumsum
0 1978 1 South 1.0 1.0
1 1990 1 North 22.0 22.0
2 1990 2 South 33.0 33.0
3 1990 2 Mid W 12.0 12.0
4 1998 1 South 1.0 2.0
5 1998 1 North 12.0 34.0
6 1998 2 South 2.0 35.0
7 1998 3 South 4.0 4.0
8 1998 1 Mid W 2.0 2.0
答案 1 :(得分:1)
这是另一个与您的预期输出更相符的解决方案。
df = pd.DataFrame({'Year': [1978,1990,1990,1990,1998,1998,1998,1998,1998],
'Month': [1,1,2,2,1,1,2,3,1],
'Region': ['South','North','South','Mid West','South','North','South','South','Mid West'],
'Value' : [1,22,33,12,1,12,2,4,2]})
#DataFrame Result
Year Month Region Value
0 1978 1 South 1
1 1990 1 North 22
2 1990 2 South 33
3 1990 2 Mid West 12
4 1998 1 South 1
5 1998 1 North 12
6 1998 2 South 2
7 1998 3 South 4
8 1998 1 Mid West 2
要运行的代码:
df1 = df.groupby(['Month','Region']).sum()
df1 = df1.drop('Year',axis=1)
df1 = df1.sort_values(['Month','Region'])
#Final Result
Month Region Value
1 Mid West 2
1 North 34
1 South 2
2 Mid West 12
2 South 35
3 South 4