Adding calculated columns to a DataFrame in pandas

Time: 2015-09-05 01:09:22

Tags: python pandas dataframe mean calculated-columns

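I imported a large CSV file. Below is the output, where Flavor_Score and Overall_Score are the result of applying df.groupby('Beer_name').mean() across many testers. I would like to add a Std Deviation column to the right of each of the Flavor_Score and Overall_Score mean columns. Computing the standard deviation itself is clear, but how do I add the column to the display? Of course, I could generate an array and append it (right?), but that seems like a cumbersome way to do it.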

  Beer_name        Beer_Style     Flavor_Score         Overall_Score

  Coors               Light          2.0                    3.0
  Sam Adams           Dark           4.0                    4.5
  Becks               Light          3.5                    3.5
  Guinness            Dark           2.0                    2.2
  Heineken            Light          3.5                    3.7
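The "compute it separately and append it" approach mentioned in the question does work. Below is a minimal sketch. The ratings are made-up values because the original CSV isn't shown. The sketch computes the mean and the standard deviation as two frames, joins them with `pd.concat`, and reorders the columns so that each std sits right after its mean:

```python
import pandas as pd

# Hypothetical per-tester ratings; the question's real CSV is not shown.
df = pd.DataFrame({
    'Beer_name': ['Coors', 'Coors', 'Becks', 'Becks', 'Guinness', 'Guinness'],
    'Flavor_Score': [1.0, 3.0, 3.0, 4.0, 2.0, 2.0],
    'Overall_Score': [2.0, 4.0, 3.0, 4.0, 2.0, 2.4],
})

scores = ['Flavor_Score', 'Overall_Score']
grouped = df.groupby('Beer_name')[scores]

# Compute each statistic on its own, tagging the column names.
means = grouped.mean().add_suffix('_mean')
stds = grouped.std().add_suffix('_std')

# "Append" the std columns next to the means (aligned on Beer_name),
# then interleave so every std column follows its mean column.
result = pd.concat([means, stds], axis=1)
result = result[[f'{c}_{stat}' for c in scores for stat in ('mean', 'std')]]
print(result)
```

This works, but it takes more steps than a single `agg` call.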

2 answers:

Answer 0 (score: 0)

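You can use

df.groupby('Beer_name')[['Flavor_Score', 'Overall_Score']].agg(['mean', 'std'])

This computes the mean and the standard deviation of each selected column for every group. Selecting the numeric score columns first matters because recent pandas versions raise an error when asked to average a string column such as Beer_Style.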

For example,

import numpy as np
import pandas as pd
np.random.seed(2015)

N = 100
beers = ['Coors', 'Sam Adams', 'Becks', 'Guinness', 'Heineken']
style = ['Light', 'Dark', 'Light', 'Dark', 'Light']
df = pd.DataFrame({'Beer_name': np.random.choice(beers, N),
                   'Flavor_Score': np.random.uniform(0, 10, N),
                   'Overall_Score': np.random.uniform(0, 10, N)})
df['Beer_Style'] = df['Beer_name'].map(dict(zip(beers, style)))

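# Select the numeric score columns; the string Beer_Style column can't be averaged
print(df.groupby('Beer_name')[['Flavor_Score', 'Overall_Score']].agg(['mean', 'std']))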

yields

          Flavor_Score           Overall_Score          
                  mean       std          mean       std
Beer_name                                               
Becks         5.779266  3.033939      6.995177  2.697787
Coors         6.521966  2.008911      4.066374  3.070217
Guinness      4.836690  2.644291      5.577085  2.466997
Heineken      4.622213  3.108812      6.372361  2.904932
Sam Adams     5.443279  3.311825      4.697961  3.164757
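The result above has two-level (MultiIndex) column labels. If you want flat column names, one option is pandas' named aggregation (available in pandas 0.25 and later). Each keyword argument becomes one output column, in the order given. The column names below (`Flavor_mean` and so on) are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

# Same kind of synthetic data as the example above (random scores).
np.random.seed(2015)
N = 100
beers = ['Coors', 'Sam Adams', 'Becks', 'Guinness', 'Heineken']
df = pd.DataFrame({'Beer_name': np.random.choice(beers, N),
                   'Flavor_Score': np.random.uniform(0, 10, N),
                   'Overall_Score': np.random.uniform(0, 10, N)})

# Named aggregation: new_column=(source_column, function).
# Keyword order fixes the column order, so each std follows its mean.
stats = df.groupby('Beer_name').agg(
    Flavor_mean=('Flavor_Score', 'mean'),
    Flavor_std=('Flavor_Score', 'std'),
    Overall_mean=('Overall_Score', 'mean'),
    Overall_std=('Overall_Score', 'std'),
)
print(stats.round(2))
```

You can also keep the list form and flatten the MultiIndex afterwards, for example by joining each `(column, stat)` pair with `'_'`.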

Answer 1 (score: 0)

groupby.agg([fun1, fun2]) computes any number of functions in one step:

from random import choice, random
import pandas as pd

beers = ['Coors', 'Sam Adams', 'Becks', 'Guinness', 'Heineken']
styles = ['Light', 'Dark']

def generate():
    # Yield 100 random ratings, one row (dict) at a time
    for _ in range(100):
        yield dict(beer=choice(beers), style=choice(styles),
                   flavor_score=random() * 10.0,
                   overall_score=random() * 10.0)

pd.options.display.float_format = ' {:,.1f}  '.format
df = pd.DataFrame(generate())
# Group on both keys; 'mean' and 'std' are computed for every score column
print(df.groupby(['beer', 'style']).agg(['mean', 'std']))

=>

               flavor_score        overall_score       
                        mean    std          mean    std
beer      style                                         
Becks     Dark         7.1    3.6           1.9    1.6  
          Light        4.7    2.4           2.0    1.0  
Coors     Dark         5.5    3.2           2.6    1.1  
          Light        5.3    2.5           1.9    1.1  
Guinness  Dark         3.3    1.4           2.1    1.1  
          Light        4.7    3.6           2.2    1.1  
Heineken  Dark         4.4    3.0           2.7    1.0  
          Light        6.0    2.3           2.1    1.3  
Sam Adams Dark         3.4    3.0           1.7    1.2  
          Light        5.2    3.6           1.6    1.3  
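The list passed to `agg` can also mix built-in names with your own Python functions. You can also pass a dict to give each column its own list of functions. The following is a small sketch. `score_range` is a hypothetical user-defined aggregate chosen only for illustration, and the random data mirrors the generator above:

```python
import random
import pandas as pd

beers = ['Coors', 'Sam Adams', 'Becks', 'Guinness', 'Heineken']
styles = ['Light', 'Dark']

random.seed(0)
df = pd.DataFrame({
    'beer': [random.choice(beers) for _ in range(100)],
    'style': [random.choice(styles) for _ in range(100)],
    'flavor_score': [random.random() * 10.0 for _ in range(100)],
    'overall_score': [random.random() * 10.0 for _ in range(100)],
})

def score_range(s):
    """Hypothetical user-defined aggregate: spread between max and min."""
    return s.max() - s.min()

# Dict form: each column gets its own list of functions.
# The callable's __name__ ('score_range') becomes its column label.
summary = df.groupby(['beer', 'style']).agg({
    'flavor_score': ['mean', 'std', score_range],
    'overall_score': ['mean', 'max'],
})
print(summary.round(1))
```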
  

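What if I need to apply a user-defined function to the flavor_score column? Say I want to subtract 0.5 from the flavor_score mean for every row except Heineken, where I want to add 0.25 instead: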

grouped[grouped.beer != 'Heineken']['flavor_score']['mean'] - 0.5
grouped[grouped.beer == 'Heineken']['flavor_score']['mean'] + 0.25
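One possible way to do this is sketched below. It uses a small hypothetical dataset in place of the real ratings. After `groupby(['beer', 'style']).agg(...)`, `beer` is an index level rather than a column, so `grouped.beer` does not exist. The condition can instead be built from the index, and the result columns are addressed by `(column, statistic)` tuples:

```python
import numpy as np
import pandas as pd

# Small hypothetical stand-in for the question's ratings.
df = pd.DataFrame({
    'beer': ['Becks', 'Becks', 'Heineken', 'Heineken', 'Coors'],
    'style': ['Dark', 'Dark', 'Light', 'Light', 'Light'],
    'flavor_score': [6.0, 8.0, 4.0, 6.0, 5.0],
    'overall_score': [2.0, 3.0, 1.0, 2.0, 4.0],
})

grouped = df.groupby(['beer', 'style']).agg(['mean', 'std'])

# 'beer' lives in the row index of the result, so test the index level.
is_heineken = grouped.index.get_level_values('beer') == 'Heineken'

# Adjust only the flavor_score mean: +0.25 for Heineken, -0.5 otherwise.
col = ('flavor_score', 'mean')
grouped[col] = np.where(is_heineken, grouped[col] + 0.25, grouped[col] - 0.5)
print(grouped)
```

Because every row of a group shares the same beer, the adjustment is a constant within each group. Applying it to `df['flavor_score']` before grouping would therefore give the same means and leave the standard deviations unchanged.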