I have a data frame that looks like this:
YEAR | REGION | POWER |
2009 | West | 1.66 |
2009 | West | 1.77 |
2009 | East | 10.6 |
2009 | East | 8.7 |
2010 | West | 11.9 |
2010 | North | 14.8 |
2010 | North | 4.6 |
2010 | West | 3.0 |
2011 | East | 7.0 |
2011 | East | 9.66 |
I want to sum the POWER values grouped by year and region, so that I get something like this:
YEAR | REGION | POWER |
2009 | West | 3.43 |
2009 | East | 19.3 |
2010 | West | 11.9 |
2010 | North | 19.4 |
2010 | West | 3.0 |
2011 | East | 16.66 |
I tried:
df.groupby(['YEAR', 'REGION'])['POWER'].sum()
But I get a Series with the values listed alongside POWER rather than the summed table I was expecting.
Can anyone help me do this?
Answer 0 (score: 2)
Call sum on the groupby, then flatten the result with reset_index(). Like this:
df.groupby(['YEAR', 'REGION']).sum().reset_index()
# YEAR REGION POWER
# 0 2009 East 19.30
# 1 2009 West 3.43
# 2 2010 North 19.40
# 3 2010 West 14.90
# 4 2011 East 16.66
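As a possible alternative to calling reset_index(), groupby accepts as_index=False, which keeps the grouping keys as ordinary columns. The sketch below rebuilds the sample frame from the question (the variable name totals is only illustrative). Note that, like the snippet above, it merges the two separate 2010 West runs into one row, which differs from the output the question asks for.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "YEAR": [2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010, 2011, 2011],
    "REGION": ["West", "West", "East", "East", "West",
               "North", "North", "West", "East", "East"],
    "POWER": [1.66, 1.77, 10.6, 8.7, 11.9, 14.8, 4.6, 3.0, 7.0, 9.66],
})

# as_index=False returns YEAR and REGION as regular columns,
# so no reset_index() call is needed afterwards
totals = df.groupby(["YEAR", "REGION"], as_index=False)["POWER"].sum()
print(totals)
```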
Answer 1 (score: 0)
Use shift and cumsum to create a helper grouping column:
# transform keeps the original row index, so the result aligns with df on assignment
df['grp'] = df.groupby(['YEAR'])['REGION'].transform(lambda x: (x != x.shift(1).bfill()).cumsum())
df_out = df.groupby(['YEAR','REGION','grp'], sort=False).sum().reset_index()
df_out = df_out.drop('grp', axis=1)
Output:
YEAR REGION POWER
0 2009 West 3.43
1 2009 East 19.30
2 2010 West 11.90
3 2010 North 19.40
4 2010 West 3.00
5 2011 East 16.66
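Details: here is what the grouper column, grp, looks like before aggregation. Within each year, each row's region is compared with the previous row's region, and the counter increases by 1 whenever they differ. Summing over YEAR, REGION and grp then treats each run of consecutive identical regions within a year as its own group.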
YEAR REGION POWER grp
0 2009 West 1.66 0
1 2009 West 1.77 0
2 2009 East 10.60 1
3 2009 East 8.70 1
4 2010 West 11.90 0
5 2010 North 14.80 1
6 2010 North 4.60 1
7 2010 West 3.00 2
8 2011 East 7.00 0
9 2011 East 9.66 0
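The same idea can also be sketched with a single run id computed over the whole frame instead of per year. This is a variant, not the code from the answer above: a new run starts whenever YEAR or REGION differs from the previous row. The names keys, starts, run_id and out are illustrative, and the frame is rebuilt from the question so the snippet is self-contained.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "YEAR": [2009, 2009, 2009, 2009, 2010, 2010, 2010, 2010, 2011, 2011],
    "REGION": ["West", "West", "East", "East", "West",
               "North", "North", "West", "East", "East"],
    "POWER": [1.66, 1.77, 10.6, 8.7, 11.9, 14.8, 4.6, 3.0, 7.0, 9.66],
})

keys = df[["YEAR", "REGION"]]
# True wherever YEAR or REGION differs from the previous row
# (the first row compares against NaN, so it always starts a run)
starts = (keys != keys.shift()).any(axis=1)
# Cumulative sum of the run starts gives one id per consecutive run
run_id = starts.cumsum()

out = (
    df.groupby(run_id, sort=False)
      .agg(YEAR=("YEAR", "first"), REGION=("REGION", "first"), POWER=("POWER", "sum"))
      .reset_index(drop=True)
)
print(out)
```

Because the run id already encodes both keys, grouping by it alone is enough, and no helper column has to be added to df and dropped afterwards. This assumes the rows are already in the order in which consecutive runs should be detected.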