Pandas: how to sum by groupby values

Time: 2018-04-23 13:20:25

Tags: python pandas dataframe pandas-groupby multi-index

Using this:

import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
         'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
         'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
         'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

df.groupby(['Team',"Rank"]).sum()

returns:

             Points
Team   Rank        
Devils 2        863
       3        673
Kings  1       1544
       3        741
       4        812
Riders 1        876
       2       2173
Royals 1        804
       4        701

How can I extract the values (Points) where Rank equals 1, i.e. 1544 + 876 + 804? The same goes for ranks 2 and 3.

6 Answers:

Answer 0 (score: 3)

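I think you need DataFrame.xs (here df is the grouped result from the question, i.e. df = df.groupby(['Team', "Rank"]).sum()):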

print (df.xs(1, level=1))

        Points
Team          
Kings     1544
Riders     876
Royals     804

print (df.xs(2, level=1))

        Points
Team          
Devils     863
Riders    2173

To select by multiple criteria, use slicers:

idx = pd.IndexSlice
print (df.loc[idx[:, [1,2]], :])

             Points
Team   Rank        
Devils 2        863
Kings  1       1544
Riders 1        876
       2       2173
Royals 1        804
print (df.loc[idx['Riders', [1,2]], :])

             Points
Team   Rank        
Riders 1        876
       2       2173

If you want the sum of all groups per Rank, change the groupby from ['Team',"Rank"] to Rank:

s = df.groupby("Rank")['Points'].sum()
print (s)
Rank
1    3224
2    3036
3    1414
4    1513
Name: Points, dtype: int64

If you also need df1, sum over the second index level (level=1):

df1 = df.groupby(['Team',"Rank"]).sum()
print (df1)
             Points
Team   Rank        
Devils 2        863
       3        673
Kings  1       1544
       3        741
       4        812
Riders 1        876
       2       2173
Royals 1        804
       4        701

s1 = df1.groupby(level=1).sum()  # df1.sum(level=1) was removed in pandas 2.0
print (s1)
      Points
Rank        
1       3224
2       3036
3       1414
4       1513
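The two routes above (grouping by Rank directly, or summing the already grouped frame over its Rank level) should give the same totals. A small sketch to check this, where the variable names are only illustrative and the level is selected by name:

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Route 1: group the raw rows by Rank only.
direct = df.groupby('Rank')['Points'].sum()

# Route 2: start from the Team/Rank totals and collapse the Team level.
team_rank = df.groupby(['Team', 'Rank'])['Points'].sum()
from_grouped = team_rank.groupby(level='Rank').sum()

print(direct.equals(from_grouped))
print(from_grouped.to_dict())
```

The second route is mainly useful when the Team/Rank totals are needed anyway; otherwise the direct groupby is shorter.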

Answer 1 (score: 1)

One option:

>>> df_group = df.groupby(['Team',"Rank"]).sum().reset_index()
     Team  Rank  Points
0  Devils     2     863
1  Devils     3     673
2   Kings     1    1544
3   Kings     3     741
4   Kings     4     812
5  Riders     1     876
6  Riders     2    2173
7  Royals     1     804
8  Royals     4     701

Now you can simply filter on 'Rank':

>>> df_group.loc[df_group['Rank']==1,'Points']
2    1544
5     876
7     804

Another option is to group by Rank again and aggregate into lists:

>>> df.groupby(['Team','Rank']).sum().reset_index().groupby('Rank')['Points'].agg(lambda x: list(x))
Rank
1    [1544, 876, 804]
2         [863, 2173]
3          [673, 741]
4          [812, 701]

Or maybe you just want to sort by rank. It's hard to tell, since you haven't provided the desired output:

>>> df.groupby(['Team','Rank']).sum().reset_index().sort_values('Rank')
     Team  Rank  Points
2   Kings     1    1544
5  Riders     1     876
7  Royals     1     804
0  Devils     2     863
6  Riders     2    2173
1  Devils     3     673
3   Kings     3     741
4   Kings     4     812
8  Royals     4     701

Answer 2 (score: 1)

df[df['Rank'] == 1] # Filter by rank before summing
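The line above only filters the rows. One way to finish the idea, sketched here with illustrative variable names, is to sum the filtered Points either per team or as a whole:

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)

# Keep only the Rank == 1 rows before doing any aggregation.
rank1_rows = df[df['Rank'] == 1]

# Per-team totals within that rank.
per_team = rank1_rows.groupby('Team')['Points'].sum()

# Overall total for the rank.
total = rank1_rows['Points'].sum()

print(per_team)
print(total)
```

Filtering first avoids building the MultiIndex at all, which may be simpler when only one rank is of interest.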

Answer 3 (score: 1)

I like to use the axis argument in .loc:

df.groupby(['Team',"Rank"]).sum().loc(axis=0)[:,1]

Output:

             Points
Team   Rank        
Kings  1       1544
Riders 1        876
Royals 1        804

df.groupby(['Team',"Rank"]).sum().loc(axis=0)[:,2]

             Points
Team   Rank        
Devils 2        863
Riders 2       2173

Or, as in @Jezrael's answer, but without pd.IndexSlice:

df.groupby(['Team',"Rank"]).sum().loc(axis=0)[:,[1,2]]

             Points
Team   Rank        
Devils 2        863
Kings  1       1544
Riders 1        876
       2       2173
Royals 1        804
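One practical difference between this approach and DataFrame.xs is what happens to the index: .loc(axis=0) keeps both levels, while xs drops the selected level by default. A short sketch comparing the two, with grouped, kept and dropped as names invented for the example:

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby(['Team', 'Rank']).sum()

# .loc(axis=0) keeps the full (Team, Rank) MultiIndex.
kept = grouped.loc(axis=0)[:, 1]

# xs removes the Rank level, leaving an index of Team only.
dropped = grouped.xs(1, level='Rank')

print(list(kept.index.names))
print(list(dropped.index.names))

# Either way, the Points for Rank 1 add up to the same total.
print(kept['Points'].sum(), dropped['Points'].sum())
```

If the level should survive an xs call as well, passing drop_level=False is an option.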

Answer 4 (score: 1)

You can reorder by rank after summing:

import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
         'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
         'Rank': [1, 2, 2, 3, 3,4 ,1 ,1,2 , 4,1,2],
         'Points':[876,789,863,673,741,812,756,788,694,701,804,690]}
df = pd.DataFrame(ipl_data)

result = df.groupby(['Team', 'Rank']).sum().swaplevel().sort_index()
# Or just:
result = df.groupby(['Rank', 'Team']).sum()

print(result)

Output:

             Points
Rank Team          
1    Kings     1544
     Riders     876
     Royals     804
2    Devils     863
     Riders    2173
3    Devils     673
     Kings      741
4    Kings      812
     Royals     701

Answer 5 (score: 1)

You can try swapping the columns in the groupby to ["Rank", "Team"]:

grouped = df.groupby(["Rank", "Team"]).sum()
print(grouped)

Result:

             Points
Rank Team          
1    Kings     1544
     Riders     876
     Royals     804
2    Devils     863
     Riders    2173
3    Devils     673
     Kings      741
4    Kings      812
     Royals     701

Then, to get the sum for any rank, you can use loc. For example, for the first rank it would be:

grouped.loc[1].Points.sum()
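The same lookup can be repeated for every rank that appears in the index. A minimal sketch, assuming a plain dict of totals is an acceptable result (the name totals is made up for this example):

```python
import pandas as pd

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
                     'Kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
            'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
            'Points': [876, 789, 863, 673, 741, 812, 756, 788, 694, 701, 804, 690]}
df = pd.DataFrame(ipl_data)
grouped = df.groupby(['Rank', 'Team']).sum()

# Distinct rank labels found in the first index level.
ranks = grouped.index.get_level_values('Rank').unique()

# For each rank, select its block of teams and add up the Points.
totals = {int(rank): int(grouped['Points'].loc[rank].sum()) for rank in ranks}
print(totals)
```

For a large number of ranks, a single groupby on the Rank level would likely be more efficient than looping; the loop is shown only because it mirrors the loc lookup above.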