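I am analyzing data from Excel that contains many columns, and I have extracted the columns I am working with. I want to create some new columns based on conditions on the existing ones.
First, my sample DataFrame is as follows: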
import pandas as pd

df = pd.DataFrame()
df['Match'] = ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
df['HomeGoal'] = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
df['AwayGoal'] = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
df['AOS'] = [0.12, 0.12, 0.12, 0.12, 0.12, 0.06, 0.06, 0.06, 0.06, 0.06]
df['% Prob'] = [0.15, 0.12, 0.10, 0.08, 0.05, 0.18, 0.15, 0.10, 0.08, 0.05]
The DataFrame contains the columns Match, HomeGoal, AwayGoal, AOS and % Prob.
I want to create the following columns:
HomeGoal <1
HomeGoal <2
HomeGoal <3
HomeGoal >=1
HomeGoal >=2
HomeGoal >=3
Each column contains the sum of % Prob over the rows that satisfy the following conditions:
HomeGoal <1 ==> sum of the column % Prob where HomeGoal is less than 1
HomeGoal <2 ==> sum of the column % Prob where HomeGoal is less than 2
HomeGoal <3 ==> sum of the column % Prob where HomeGoal is less than 3
HomeGoal >=1 ==> sum of the column % Prob, plus AOS, where HomeGoal is 1 or more
HomeGoal >=2 ==> sum of the column % Prob, plus AOS, where HomeGoal is 2 or more
HomeGoal >=3 ==> sum of the column % Prob, plus AOS, where HomeGoal is 3 or more
All of these calculations are done per match.
How can I do this?
I have attached the expected result.
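To make the rules concrete, here is a minimal sketch for match A alone with a threshold of 2. It assumes that AOS is added once per match (not once per row), which is how the accepted answer's output reads. The names `match_a`, `below_2` and `at_least_2` are made up for illustration. By hand, HomeGoal <2 gives 0.15 + 0.12 = 0.27, and HomeGoal >=2 gives 0.10 + 0.08 + 0.05 + 0.12 = 0.35.

```python
import pandas as pd

# Rows of match A only (hypothetical helper frame for illustration)
match_a = pd.DataFrame({
    'HomeGoal': [0, 1, 2, 3, 4],
    'AOS': [0.12] * 5,
    '% Prob': [0.15, 0.12, 0.10, 0.08, 0.05],
})

# Sum of % Prob over rows with fewer than 2 home goals
below_2 = match_a.loc[match_a['HomeGoal'] < 2, '% Prob'].sum()

# Sum of % Prob over rows with 2 or more home goals, plus AOS once
# (assumption: AOS is constant within a match and added a single time)
at_least_2 = (match_a.loc[match_a['HomeGoal'] >= 2, '% Prob'].sum()
              + match_a['AOS'].iloc[0])

print(round(below_2, 2), round(at_least_2, 2))
```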
Answer 0 (score: 2)
Use:
L = [1, 2, 3]
for v in L:
    # new column name
    col = 'HG>={}'.format(v)
    # filter by condition
    df1 = df[df['HomeGoal'] >= v]
    # new Series filled with aggregated values per group, plus column AOS
    df[col] = df1.groupby('Match')['% Prob'].transform('sum') + df['AOS']
    # keep only the first non-missing value per group, set the rest to 0
    mask = ~df.dropna(subset=[col]).duplicated(subset=[col, 'Match'])
    df[col] = df[col].mask(~mask, 0)

for v in L:
    # the condition here is "less than", so the column is named HG<v
    col = 'HG<{}'.format(v)
    df[col] = df[df['HomeGoal'] < v].groupby('Match')['% Prob'].transform('sum')
    mask = ~df.dropna(subset=[col]).duplicated(subset=[col, 'Match'])
    df[col] = df[col].mask(~mask, 0)

print(df)
  Match  HomeGoal  AwayGoal   AOS  % Prob  HG>=1  HG>=2  HG>=3  HG<1  HG<2  \
0     A         0         0  0.12    0.15   0.00   0.00   0.00  0.15  0.27
1     A         1         1  0.12    0.12   0.47   0.00   0.00  0.00  0.00
2     A         2         2  0.12    0.10   0.00   0.35   0.00  0.00  0.00
3     A         3         3  0.12    0.08   0.00   0.00   0.25  0.00  0.00
4     A         4         4  0.12    0.05   0.00   0.00   0.00  0.00  0.00
5     B         0         0  0.06    0.18   0.00   0.00   0.00  0.18  0.33
6     B         1         1  0.06    0.15   0.44   0.00   0.00  0.00  0.00
7     B         2         2  0.06    0.10   0.00   0.29   0.00  0.00  0.00
8     B         3         3  0.06    0.08   0.00   0.00   0.19  0.00  0.00
9     B         4         4  0.06    0.05   0.00   0.00   0.00  0.00  0.00

   HG<3
0  0.37
1  0.00
2  0.00
3  0.00
4  0.00
5  0.43
6  0.00
7  0.00
8  0.00
9  0.00
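The loops above write each sum into a single row per match and fill the other rows with 0. If one row per match is enough, the same sums could also be collected in a small summary table. The following is only a sketch of that alternative, not part of the answer: the name `summary` is made up here, and it assumes AOS is constant within each match so that taking the first value per group is safe.

```python
import pandas as pd

df = pd.DataFrame({
    'Match': list('AAAAABBBBB'),
    'HomeGoal': [0, 1, 2, 3, 4] * 2,
    'AOS': [0.12] * 5 + [0.06] * 5,
    '% Prob': [0.15, 0.12, 0.10, 0.08, 0.05, 0.18, 0.15, 0.10, 0.08, 0.05],
})

# One row per match; columns are added in the loop below
summary = pd.DataFrame(index=pd.Index(sorted(df['Match'].unique()), name='Match'))

# Assumption: AOS is the same on every row of a match
aos = df.groupby('Match')['AOS'].first()

for v in [1, 2, 3]:
    below = df['HomeGoal'] < v
    # where() keeps % Prob on rows matching the condition and uses 0 elsewhere,
    # so the grouped sum only counts the matching rows
    summary['HG<{}'.format(v)] = df['% Prob'].where(below, 0).groupby(df['Match']).sum()
    summary['HG>={}'.format(v)] = df['% Prob'].where(~below, 0).groupby(df['Match']).sum() + aos

print(summary.round(2))
```

The summary can be joined back onto the original rows with `df.join(summary, on='Match')` if the values are needed on every row.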
Answer 1 (score: -1)
You can use numpy.where.
Example for a threshold of 1:
import numpy as np
df['HG>=1'] = np.where(df['HomeGoal'] >= 1,
                       'insert your pass condition logic calculation here',
                       'insert your fail condition logic calculation here')
I don't understand your pass/fail calculation logic, so you will have to supply it yourself.
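One possible way to fill in the placeholders, going by the rules stated in the question, is sketched below. It is an interpretation, not the answerer's code: the "pass" value is taken to be the row's % Prob, the "fail" value is 0, and the per-match sum plus AOS is then broadcast to every row of the match (unlike answer 0, which keeps the value on one row only). The name `kept` is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Match': list('AAAAABBBBB'),
    'HomeGoal': [0, 1, 2, 3, 4] * 2,
    'AOS': [0.12] * 5 + [0.06] * 5,
    '% Prob': [0.15, 0.12, 0.10, 0.08, 0.05, 0.18, 0.15, 0.10, 0.08, 0.05],
})

# Keep % Prob where the condition holds, 0 otherwise (np.where returns an ndarray)
kept = np.where(df['HomeGoal'] >= 1, df['% Prob'], 0.0)

# Sum the kept values per match, repeat the sum on every row, then add AOS
df['HG>=1'] = (pd.Series(kept, index=df.index)
               .groupby(df['Match'])
               .transform('sum')
               + df['AOS'])

print(df[['Match', 'HomeGoal', 'HG>=1']].round(2))
```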