Group by 2 categories and then take the sum

Time: 2017-11-18 06:34:09

Tags: python

The output above comes from:

df.groupby('croho subonderdeel').sum()

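This gives me the total number of graduates per category, but I also want to be able to do this for a single column. For example, I want the output for only the first column, '2011 MAN'.

I tried the following: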

df.groupby('croho subonderdeel','2011 MAN').sum() 

Then I get the following error:

ValueError: No axis named 2011 MAN for object type <class 'pandas.core.frame.DataFrame'>

Then I thought that maybe, instead of grouping twice, I need to take the sum of '2011 MAN'. So I tried:

df.groupby('croho subonderdeel').sum('2011 MAN')

Then I get this error:

TypeError: f() takes 1 positional argument but 2 were given

Can someone explain to me why neither of the two approaches I tried works? Maybe then I can solve the problem myself.

1 Answer:

Answer 0: (score: 1)

You need to specify the column in [] after the groupby, like:

df.groupby('croho subonderdeel')['2011 MAN'].sum()

You can also select multiple columns by passing a list:

df.groupby('croho subonderdeel')[['2011 MAN', '2012 MAN']].sum()

If you need a 2-column DataFrame as output, add the parameter as_index=False:

df.groupby('croho subonderdeel', as_index=False)['2011 MAN'].sum()

Or:

df.groupby('croho subonderdeel')['2011 MAN'].sum().reset_index()

But if you want to group by 2 categories (2 columns), pass them to groupby as a list in []:

df.groupby(['croho subonderdeel', 'another col'])['2011 MAN'].sum()

Sample:

import pandas as pd

df = pd.DataFrame({'another col':list('efefef'),
                   '2011 MAN':[4,5,4,5,5,4],
                   '2011 WROUW':[7,8,9,4,2,3],
                   '2012 MAN':[1,3,5,7,1,0],
                   '2012 WROUW':[5,3,6,9,2,4],
                   'croho subonderdeel':list('aaabbb')})

print (df)
   2011 MAN  2011 WROUW  2012 MAN  2012 WROUW another col croho subonderdeel
0         4           7         1           5           e                  a
1         5           8         3           3           f                  a
2         4           9         5           6           e                  a
3         5           4         7           9           f                  b
4         5           2         1           2           e                  b
5         4           3         0           4           f                  b

print(df.groupby('croho subonderdeel')['2011 MAN'].sum())
croho subonderdeel
a    13
b    14
Name: 2011 MAN, dtype: int64

print(df.groupby('croho subonderdeel', as_index=False)['2011 MAN'].sum())
  croho subonderdeel  2011 MAN
0                  a        13
1                  b        14

print(df.groupby('croho subonderdeel')['2011 MAN'].sum().reset_index())
  croho subonderdeel  2011 MAN
0                  a        13
1                  b        14
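
The two printed results above look identical. A minimal sketch to check that programmatically, assuming a cut-down copy of the sample data (the names `small`, `via_flag` and `via_reset` are made up for this illustration):

```python
import pandas as pd

# Cut-down copy of the sample frame, for illustration only.
small = pd.DataFrame({'2011 MAN': [4, 5, 4, 5, 5, 4],
                      'croho subonderdeel': list('aaabbb')})

# Variant 1: keep the group key as a regular column.
via_flag = small.groupby('croho subonderdeel', as_index=False)['2011 MAN'].sum()

# Variant 2: aggregate to a Series, then move the index back into a column.
via_reset = small.groupby('croho subonderdeel')['2011 MAN'].sum().reset_index()

# Compare the column labels and the values of both results.
same = (list(via_flag.columns) == list(via_reset.columns)
        and via_flag.values.tolist() == via_reset.values.tolist())
print(same)  # True
```

Which form to use is mostly a matter of taste; `as_index=False` avoids the extra step of building an indexed Series first.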

print(df.groupby('croho subonderdeel')[['2011 MAN', '2012 WROUW']].sum())
                    2011 MAN  2012 WROUW
croho subonderdeel                      
a                         13          14
b                         14          15

print(df.groupby('croho subonderdeel', as_index=False)[['2011 MAN', '2012 WROUW']].sum())
  croho subonderdeel  2011 MAN  2012 WROUW
0                  a        13          14
1                  b        14          15

print (df.groupby(['croho subonderdeel', 'another col'])['2011 MAN'].sum())
croho subonderdeel  another col
a                   e              8
                    f              5
b                   e              5
                    f              9
Name: 2011 MAN, dtype: int64

print (df.groupby(['croho subonderdeel', 'another col'], as_index=False)['2011 MAN'].sum())
  croho subonderdeel another col  2011 MAN
0                  a           e         8
1                  a           f         5
2                  b           e         5
3                  b           f         9
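
On the "why" part of the question, here is a hedged reading of the two error messages. In the asker's pandas version, the second positional argument of `groupby` appears to have been interpreted as `axis`, not as a second grouping key. Likewise, `sum()` on a groupby appears to have accepted no positional argument, so it could not be used to pick a column. The sketch below only inspects the signature of whichever pandas version is installed (parameter names differ between versions, so no output is shown for that part), then uses the `[]` forms instead. The frame `demo` is a made-up miniature of the data, not the asker's real frame.

```python
import inspect

import pandas as pd

# Parameter names of DataFrame.groupby in the installed pandas version.
# The first entry after `self` is `by`; what follows is version-dependent
# (historically `axis`), but it is not a second grouping key.
groupby_params = list(inspect.signature(pd.DataFrame.groupby).parameters)
print(groupby_params[:3])

# Hypothetical miniature of the data, for illustration only.
demo = pd.DataFrame({
    'croho subonderdeel': list('aaabbb'),
    'another col': list('efefef'),
    '2011 MAN': [4, 5, 4, 5, 5, 4],
})

# Grouping by two columns: pass the keys as one list in the first argument.
by_two = demo.groupby(['croho subonderdeel', 'another col'])['2011 MAN'].sum()

# Summing one column: select it with [] before calling sum().
one_col = demo.groupby('croho subonderdeel')['2011 MAN'].sum()

print(by_two)
print(one_col)
```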