I have a table with the following data:
team1,team2,outcome
AA,BB,BB won by 90 runs
AA,CC,AA won by 19 runs (D/L method)
CC,BB,CC won by 26 runs (D/L method)
AA,BB,BB won by 56 runs
CC,BB,CC won by 18 runs
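I need to pull the numeric value out of outcome and compute its average, grouped by team1 and team2.
This is what I have so far. The data contains a lot of junk rows, so I keep only the records I need: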
df[df['outcome'].str.contains('runs',na=False)].head()
The result I want:
team1 , team2 , AVG(NUMERIC COLUMN FROM 'OUTCOME')
Any suggestions?
Answer 0 (score: 1)
You can first use str.extract to pull out the digits, convert them with astype(int), and then combine groupby with mean:
df['outcome'] = df['outcome'].str.extract(r'(\d+)', expand=False).astype(int)
print (df.groupby(['team1','team2'], as_index=False)['outcome'].mean())
team1 team2 outcome
0 AA BB 73
1 AA CC 19
2 CC BB 22
A similar solution that leaves the original column untouched:
s = df['outcome'].str.extract(r'(\d+)', expand=False).astype(int)
print (s.groupby([df['team1'],df['team2']]).mean().reset_index())
team1 team2 outcome
0 AA BB 73
1 AA CC 19
2 CC BB 22
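Both snippets above assume every remaining row contains a number; astype(int) raises an error if extract returns NaN for a row without digits. As a minimal sketch of one way around that, the example below rebuilds the sample data, adds one made-up junk row ("No result") for illustration, and uses pd.to_numeric plus dropna in place of astype(int). The runs column name and the stricter pattern r'(\d+)\s+runs' are my own choices, not part of the original answer.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical junk row.
df = pd.DataFrame({
    "team1": ["AA", "AA", "CC", "AA", "CC", "AA"],
    "team2": ["BB", "CC", "BB", "BB", "BB", "CC"],
    "outcome": [
        "BB won by 90 runs",
        "AA won by 19 runs (D/L method)",
        "CC won by 26 runs (D/L method)",
        "BB won by 56 runs",
        "CC won by 18 runs",
        "No result",  # made-up row to show how non-matching text is handled
    ],
})

# Capture only a number that is followed by the word "runs";
# rows that do not match come back as NaN instead of raising.
runs = df["outcome"].str.extract(r"(\d+)\s+runs", expand=False)

result = (
    df.assign(runs=pd.to_numeric(runs))   # strings -> numbers, NaN preserved
      .dropna(subset=["runs"])            # drop rows without a run margin
      .groupby(["team1", "team2"], as_index=False)["runs"]
      .mean()
)
print(result)
```

With the sample data this gives averages of 73, 19 and 22 for (AA, BB), (AA, CC) and (CC, BB), matching the output above; the junk row is simply excluded, so the separate str.contains filter is no longer needed.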