python - Summing values from columnA into a new column using conditions on columnB

Time: 2016-11-07 02:11:18

Tags: python pandas

I have a pandas DataFrame like this:

    team  W   L  GF  GA       date  home_ind  last10
67   ARI  1   0   3   2 2016-11-01         1       1
99   ARI  1   0   2   2 2016-11-03         1       1
129  ARI  1   0   4   3 2016-10-15         1       1
171  ARI  1   0   5   4 2016-10-27         0       1
241  ARI  0  10   1   5 2016-11-04         0       0
316  ARI  0  10   3   5 2016-10-25         0       1
331  ARI  0  10   2   3 2016-10-21         0       1
334  ARI  0  10   2   3 2016-10-29         1       1
335  ARI  0  10   2   5 2016-10-20         0       1
340  ARI  0  10   4   7 2016-10-18         0       1
341  ARI  0  10   2   3 2016-10-23         0       1

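I have this information for 30 different teams.

What I want to do is sum the values in one column based on conditions on other columns.

For example, I want a new column that sums the values of GF, but only where home_ind = 1 AND last10 = 1 AND team = ARI. The resulting value is repeated down the column for each team. So, for the example I listed, the result would look like this: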

    team  W   L  GF  GA       date  home_ind  last10   GF_H_10
67   ARI  1   0   3   2 2016-11-01         1       1        11
99   ARI  1   0   2   2 2016-11-03         1       1        11
129  ARI  1   0   4   3 2016-10-15         1       1        11
171  ARI  1   0   5   4 2016-10-27         0       1         0
241  ARI  0  10   1   5 2016-11-04         0       0         0
316  ARI  0  10   3   5 2016-10-25         0       1         0
331  ARI  0  10   2   3 2016-10-21         0       1         0
334  ARI  0  10   2   3 2016-10-29         1       1        11
335  ARI  0  10   2   5 2016-10-20         0       1         0
340  ARI  0  10   4   7 2016-10-18         0       1         0
341  ARI  0  10   2   3 2016-10-23         0       1         0

2 Answers:

Answer 0: (score: 0)

How about:

First build a boolean mask called criteria, then use assignment:

criteria = (df['home_ind'] == 1) & (df['last10'] == 1) & (df['team'] == 'ARI')
df.loc[criteria, 'GF_H_10'] = df.loc[criteria, 'GF'].sum()

Which gives:

     GA  GF   L  W        date  home_ind  last10 team  GF_H_10
67    2   3   0  1  2016-11-01         1       1  ARI  11.0000
99    2   2   0  1  2016-11-03         1       1  ARI  11.0000
129   3   4   0  1  2016-10-15         1       1  ARI  11.0000
171   4   5   0  1  2016-10-27         0       1  ARI      nan
241   5   1  10  0  2016-11-04         0       0  ARI      nan
316   5   3  10  0  2016-10-25         0       1  ARI      nan
331   3   2  10  0  2016-10-21         0       1  ARI      nan
334   3   2  10  0  2016-10-29         1       1  ARI  11.0000
335   5   2  10  0  2016-10-20         0       1  ARI      nan
340   7   4  10  0  2016-10-18         0       1  ARI      nan
341   3   2  10  0  2016-10-23         0       1  ARI      nan

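Then turn the NaN values into 0.0:

# assign the result back; inplace=True on a column selection may not update df in newer pandas versions
df['GF_H_10'] = df['GF_H_10'].fillna(0.0)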
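
For anyone who wants to run these steps end to end, here is a minimal self-contained sketch. The sample frame is a cut-down reconstruction of the question's rows (only the columns the mask needs). As a variation that is an assumption of this sketch rather than part of the answer, `numpy.where` writes the sum and the 0 fill in a single step, which also keeps the new column as integers.

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's sample rows (only the columns needed here).
df = pd.DataFrame({
    'team': ['ARI'] * 11,
    'GF': [3, 2, 4, 5, 1, 3, 2, 2, 2, 4, 2],
    'home_ind': [1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    'last10': [1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
}, index=[67, 99, 129, 171, 241, 316, 331, 334, 335, 340, 341])

# Boolean mask: ARI rows that are home games and fall within the last 10.
criteria = (df['home_ind'] == 1) & (df['last10'] == 1) & (df['team'] == 'ARI')

# Sum GF over the matching rows only.
total = df.loc[criteria, 'GF'].sum()

# Write the sum on matching rows and 0 everywhere else, in one step.
df['GF_H_10'] = np.where(criteria, total, 0)
```

Because the mask names a single team, this still has to be repeated (or generalised with a groupby) to cover the other teams.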

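Answer 1: (score: 0)

The other solution here is specific to the ARI team. This one performs a groupby on team, so the operation is carried out for all 30 teams at once. I'm not sure which of the two you are after.

The main idea behind this solution is to do a groupby on team and then join the result back onto the original DataFrame. Afterwards, the result column is cleaned up according to the eligibility criteria you defined.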

import pandas as pd

# sample data
df = pd.DataFrame({'team':['ARI']*11+['BWI']*4,
                   'W':[1]*4+[0]*7+[1,1,0,0],
                   'GF':[3,2,4,5,1,3,2,2,2,4,2,2,2,2,2],
                   'GA':[2,2,3,4,5,5,3,3,5,7,3,1,1,1,1],
                   'home_ind':[1,1,1,0,0,0,0,1,0,0,0,1,1,0,0],
                   'last10':[1]*4+[0]+[1]*6+[1,0,1,1]})

# define a mask
df2 = df.assign(elig=(df['home_ind'] == 1) & (df['last10'] == 1))


# group on team and join the results to the original dataframe
df2 = df2.join(df2[df2['elig']].groupby('team')['GF'].sum(), on='team', rsuffix='_H_10')

# clean up the result column
df2.loc[~df2['elig'], 'GF_H_10'] = 0

Given the DataFrame

    GA  GF  W  home_ind  last10 team
0    2   3  1         1       1  ARI
1    2   2  1         1       1  ARI
2    3   4  1         1       1  ARI
3    4   5  1         0       1  ARI
4    5   1  0         0       0  ARI
5    5   3  0         0       1  ARI
6    3   2  0         0       1  ARI
7    3   2  0         1       1  ARI
8    5   2  0         0       1  ARI
9    7   4  0         0       1  ARI
10   3   2  0         0       1  ARI
11   1   2  1         1       1  BWI
12   1   2  1         1       0  BWI
13   1   2  0         0       1  BWI
14   1   2  0         0       1  BWI

Output

    GA  GF  W  home_ind  last10 team   elig  GF_H_10
0    2   3  1         1       1  ARI   True       11
1    2   2  1         1       1  ARI   True       11
2    3   4  1         1       1  ARI   True       11
3    4   5  1         0       1  ARI  False        0
4    5   1  0         0       0  ARI  False        0
5    5   3  0         0       1  ARI  False        0
6    3   2  0         0       1  ARI  False        0
7    3   2  0         1       1  ARI   True       11
8    5   2  0         0       1  ARI  False        0
9    7   4  0         0       1  ARI  False        0
10   3   2  0         0       1  ARI  False        0
11   1   2  1         1       1  BWI   True        2
12   1   2  1         1       0  BWI  False        0
13   1   2  0         0       1  BWI  False        0
14   1   2  0         0       1  BWI  False        0
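
The same per-team result can probably also be reached without the join, using `groupby(...).transform('sum')`. The sketch below is an alternative, not this answer's own method. It reuses the sample data from above and, as an assumption of the sketch, keeps the eligibility mask as a standalone Series named `elig` instead of a helper column.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({'team': ['ARI'] * 11 + ['BWI'] * 4,
                   'W': [1] * 4 + [0] * 7 + [1, 1, 0, 0],
                   'GF': [3, 2, 4, 5, 1, 3, 2, 2, 2, 4, 2, 2, 2, 2, 2],
                   'GA': [2, 2, 3, 4, 5, 5, 3, 3, 5, 7, 3, 1, 1, 1, 1],
                   'home_ind': [1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0],
                   'last10': [1] * 4 + [0] + [1] * 6 + [1, 0, 1, 1]})

# Eligibility mask: home game and within the last 10.
elig = (df['home_ind'] == 1) & (df['last10'] == 1)

# Zero out GF on ineligible rows so the per-team sum only counts eligible games.
eligible_gf = df['GF'].where(elig, 0)

# transform('sum') broadcasts each team's total back onto every row of that team.
team_total = eligible_gf.groupby(df['team']).transform('sum')

# Keep the total on eligible rows only; all other rows get 0.
df['GF_H_10'] = team_total.where(elig, 0)
```

This avoids the suffix handling that the join needs and leaves no extra `elig` column in the frame. A team with no eligible rows simply ends up with 0 throughout.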