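I want to group my data by country and year and sum the value column using pandas. I am currently reading a CSV file and using the following:
data_cleaned = df.groupby(['Country', 'year'], as_index=False).sum()
Here is a sample of my dataset: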
Country year value
Angola 2009 0
Angola 2009 0
Angola 2010 0
Angola 2010 0
Angola 2010 0
Angola 2010 0
Angola 2011 0
Angola 2011 0
Angola 2011 0
Angola 2011 0
Angola 2012 118
Angola 2012 0
Angola 2012 0
Angola 2012 0
Angola 2013 0
Angola 2013 0
Angola 2013 0
Angola 2013 0
Angola 2014 0
Angola 2014 0
Angola 2014 0
Angola 2014 0
Angola 2015 0
Angola 2015 0
Angola 2015 0
Angola 2015 0
Angola 2016 0
Angola 2016 0
Angola 2016 0
Angola 2016 0
Angola 2017 0
Australia 2009 0
Australia 2009 14
Australia 2009 0
Australia 2009 12
Australia 2010 0
Australia 2010 0
Australia 2010 54
Australia 2010 6
Australia 2011 0
Australia 2011 4
Australia 2011 17
Australia 2011 13
Australia 2012 8
Australia 2012 2
Australia 2012 4
Australia 2012 105
Australia 2013 0
Australia 2013 5
Australia 2013 0
Australia 2013 0
Australia 2014 0
Australia 2014 0
Australia 2014 0
Australia 2014 0
Australia 2015 0
Australia 2015 0
Australia 2015 0
Australia 2015 0
Australia 2016 0
Australia 2016 0
Australia 2016 0
Australia 2016 0
Australia 2017 0
But I get the following result:
Partner Country year value
0 Angola 2009 0.00
1 Angola 2010 0.00
2 Angola 2011 0.00
3 Angola 2012 86,280.00
4 Angola 2013 0.00
5 Angola 2014 0.00
6 Angola 2015 0.00
7 Angola 2016 0.00
8 Angola 2017 0.00
9 Australia 2009 54,879.00
10 Australia 2010 67,899.00
11 Australia 2011 50,965.00
12 Australia 2012 332,128.00
13 Australia 2013 16,515.00
14 Australia 2014 0.00
15 Australia 2015 0.00
16 Australia 2016 0.00
17 Australia 2017 0.00
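This is clearly wrong: Angola has only one non-zero value in 2012. The year is correct, but I expected 118 rather than 86,280.00. Could someone point out what I am doing wrong and how to correctly sum the value column by country and year?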
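For reference, here is a minimal sketch that runs the same grouping on a few rows copied from the sample above. The inline string and `io.StringIO` are stand-ins for my real CSV file, and selecting `["value"]` before `.sum()` is an addition that is not in my original call. The `dtypes` and `size()` checks are only diagnostics I am assuming might be useful.

```python
import io

import pandas as pd

# A few rows copied from the sample above, whitespace-separated.
# This inline string stands in for the real CSV file (an assumption).
sample = """Country year value
Angola 2012 118
Angola 2012 0
Angola 2012 0
Angola 2012 0
Australia 2009 0
Australia 2009 14
Australia 2009 0
Australia 2009 12
"""

df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

# Show how each column was parsed (numeric vs. object/string).
print(df.dtypes)

# Same grouping as in the question, restricted to the value column.
data_cleaned = df.groupby(["Country", "year"], as_index=False)["value"].sum()
print(data_cleaned)

# Number of rows that fall into each (Country, year) group.
group_sizes = df.groupby(["Country", "year"]).size()
print(group_sizes)
```

On these rows the sum for Angola 2012 is 118, as I expect, so the difference presumably comes from the full file, for example more rows per group than the sample shows or a value column that is parsed differently. Running the same `dtypes` and `size()` checks on the real DataFrame might help narrow that down.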