Extract numbers from a sentence and compute the average

Asked: 2017-04-10 09:18:56

Tags: python python-2.7 pandas numpy

I have a table containing the following data:

team1,team2,outcome
AA,BB,BB won by 90 runs
AA,CC,AA won by 19 runs (D/L method)
CC,BB,CC won by 26 runs (D/L method)
AA,BB,BB won by 56 runs
CC,BB,CC won by 18 runs

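I need to extract the numeric value from the outcome column and compute its average, grouped by team1 and team2.

This is what I have so far. The data contains a lot of junk, so I filter only the records I need: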

 df[df['outcome'].str.contains('runs',na=False)].head()

The result I want:

team1 , team2 , AVG(NUMERIC COLUMN FROM 'OUTCOME')

Please advise!

1 Answer:

Answer 0 (score: 1)

You can first use str.extract to pull out the digits and cast them to int, then groupby and aggregate with mean:

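# Extract the first run of digits from each outcome and cast it to int
df.outcome = df.outcome.str.extract(r'(\d+)', expand=False).astype(int)
# Average the extracted values for each (team1, team2) pair
print (df.groupby(['team1','team2'], as_index=False)['outcome'].mean())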
  team1 team2  outcome
0    AA    BB       73
1    AA    CC       19
2    CC    BB       22
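
To try this on the sample data from the question, a minimal self-contained sketch might look like the following. It assumes Python 3 and reads the table from an in-memory CSV string; the `csv_text` variable and the `runs` column name are illustrative choices, not part of the original answer. Writing the numbers to a new column keeps the original outcome text intact. Recent pandas versions print the means as floats (for example 73.0).

```python
import io

import pandas as pd

# Sample data copied from the question.
csv_text = """team1,team2,outcome
AA,BB,BB won by 90 runs
AA,CC,AA won by 19 runs (D/L method)
CC,BB,CC won by 26 runs (D/L method)
AA,BB,BB won by 56 runs
CC,BB,CC won by 18 runs
"""
df = pd.read_csv(io.StringIO(csv_text))

# Pull the first run of digits out of each outcome into a new column.
# "runs" is a hypothetical column name chosen for this sketch.
df["runs"] = df["outcome"].str.extract(r"(\d+)", expand=False).astype(int)

# Average the extracted numbers for each (team1, team2) pair.
result = df.groupby(["team1", "team2"], as_index=False)["runs"].mean()
print(result)
```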

A similar solution, which leaves the original column unchanged:

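# Keep the extracted numbers in a separate Series instead of overwriting df.outcome
s = df.outcome.str.extract(r'(\d+)', expand=False).astype(int)
# Group the Series by the two team columns and average
print (s.groupby([df['team1'],df['team2']]).mean().reset_index())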
  team1 team2  outcome
0    AA    BB       73
1    AA    CC       19
2    CC    BB       22
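
The question mentions that the real data contains many junk rows. When str.extract finds no digits it returns NaN, and astype(int) then raises an error. One possible way to handle this, sketched below under the assumption that rows without a run count should simply be dropped, is to anchor the pattern on the word "runs" and discard the missing values before averaging. The junk rows ("Match abandoned", the missing value, and "won by 5 wickets") and the `runs` column name are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "team1": ["AA", "AA", "CC", "AA", "CC"],
    "team2": ["BB", "CC", "BB", "BB", "BB"],
    "outcome": [
        "BB won by 90 runs",
        "AA won by 19 runs (D/L method)",
        "Match abandoned",       # invented junk row: no number at all
        None,                    # invented junk row: missing value
        "CC won by 5 wickets",   # invented junk row: a number, but not runs
    ],
})

# Capture digits only when they are followed by the word "runs";
# rows that do not match become NaN.
extracted = df["outcome"].str.extract(r"(\d+)\s+runs", expand=False)

# to_numeric converts the digit strings to numbers and keeps NaN as NaN.
df["runs"] = pd.to_numeric(extracted)

# Drop the rows without a run count, then average per team pair.
clean = df.dropna(subset=["runs"])
result = clean.groupby(["team1", "team2"], as_index=False)["runs"].mean()
print(result)
```

Team pairs whose rows are all junk (here CC versus BB) do not appear in the result at all.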