Question

I have a Pandas DataFrame as follows:

   a      b      c      d
0  Apple  3      5      7
1  Banana 4      4      8
2  Cherry 7      1      3
3  Apple  3      4      7

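I want to group the rows by column 'a', replace the values in column 'c' with the mean of the values in each group, and add another column holding the standard deviation of the 'c' values that went into that mean. The values in columns 'b' and 'd' are constant across all rows in a group. So the desired output would be: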

   a      b      c      d      e
0  Apple  3      4.5    7      0.707107
1  Banana 4      4      8      0
2  Cherry 7      1      3      0

What is the best way to achieve this?

Answer 1

You could use a groupby-agg operation:

In [38]: result = df.groupby(['a'], as_index=False).agg(
                      {'c':['mean','std'],'b':'first', 'd':'first'})

Then rename and reorder the columns:

In [39]: result.columns = ['a','c','e','b','d']

In [40]: result.reindex(columns=sorted(result.columns))
Out[40]: 
        a  b    c  d         e
0   Apple  3  4.5  7  0.707107
1  Banana  4  4.0  8       NaN
2  Cherry  7  1.0  3       NaN
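
As a self-contained variant of the steps above, here is a minimal sketch that builds the question's DataFrame and uses named aggregation (available in pandas 0.25 and later) so that the columns come out already named and ordered. The `fillna(0)` step is an assumption on my part: it matches the zeros shown in the question's desired output, but whether 0 is the right value for a single-row group depends on what you need.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "a": ["Apple", "Banana", "Cherry", "Apple"],
    "b": [3, 4, 7, 3],
    "c": [5, 4, 1, 4],
    "d": [7, 8, 3, 7],
})

# Named aggregation: each keyword is an output column,
# each value is (input column, aggregation function).
result = df.groupby("a", as_index=False).agg(
    b=("b", "first"),
    c=("c", "mean"),
    d=("d", "first"),
    e=("c", "std"),   # sample std (ddof=1); NaN for single-row groups
)

# Assumption: treat the undefined std of a single-row group as 0,
# as in the question's desired output.
result["e"] = result["e"].fillna(0)

print(result)
```

This avoids the separate rename and reindex steps, at the cost of requiring a newer pandas version.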

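Pandas computes the sample std by default (ddof=1), which is undefined for a group with a single row, hence the NaN values. To compute the population std: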

def pop_std(x):
    return x.std(ddof=0)

result = df.groupby(['a'], as_index=False).agg({'c':['mean',pop_std],'b':'first', 'd':'first'})

result.columns = ['a','c','e','b','d']
result.reindex(columns=sorted(result.columns))

yields

        a  b    c  d    e
0   Apple  3  4.5  7  0.5
1  Banana  4  4.0  8  0.0
2  Cherry  7  1.0  3  0.0
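
The difference between the two outputs comes down to the `ddof` argument of `Series.std`. The short sketch below checks this on the 'c' values of the Apple group and on a single-value group; the variable names are only illustrative.

```python
import math
import pandas as pd

# The 'c' values of the two Apple rows from the question.
c_apple = pd.Series([5, 4])

sample_std = c_apple.std()            # ddof=1: divide by n - 1
population_std = c_apple.std(ddof=0)  # ddof=0: divide by n

# A group with a single row, such as Banana.
c_single = pd.Series([4])

single_sample_std = c_single.std()            # NaN: division by n - 1 = 0
single_population_std = c_single.std(ddof=0)  # 0.0

print(sample_std, population_std)
print(single_sample_std, single_population_std)
```

With two values, the sample std is larger than the population std by a factor of sqrt(2), which accounts for 0.707107 versus 0.5 in the tables above.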
