Question

我的目标是执行简单的groupby，然后使用.transform('mean')将组平均值存储为新列。然后事情变得复杂了。问题是我真正想要的是平均n-1值，其中'n'是属于每个组的行数。示例数据，其中 RESULT 列是我想要的输出：

import pandas as pd

list_of_tuples = [('A', 3, 4.5),
                  ('A', 2, 4.75),
                  ('A', 5, 4),
                  ('A', 4, 4.25),
                  ('A', 7, 3.5),
                  ('B', 6, 6.75),
                  ('B', 9, 6),
                  ('B', 8, 6.25),
                  ('B', 4, 7.25),
                  ('B', 6, 6.75)]

df = pd.DataFrame.from_records(data=list_of_tuples, columns=['ID', 'VALUE', 'RESULT'])

>>> df
  ID  VALUE  RESULT
0  A      3    4.50
1  A      2    4.75
2  A      5    4.00
3  A      4    4.25
4  A      7    3.50
5  B      6    6.75
6  B      9    6.00
7  B      8    6.25
8  B      4    7.25
9  B      6    6.75

您可以看到，在第一行中， RESULT 的值是[2,5,4,7]的平均值，即4.5。同样，最后一行的 RESULT 值是[6,9,8,4]的平均值，即6.75。

因此，对于每一行，结果的值应为 VALUE 的组平均值（ ID 分组）该特定行的 VALUE 中的数字。

Answer 1

从上面的评论中得到答案。

list_of_tuples = [('A', 3, 4.5),
                  ('A', 2, 4.75),
                  ('A', 5, 4),
                  ('A', 4, 4.25),
                  ('A', 7, 3.5),
                  ('B', 6, 6.75),
                  ('B', 9, 6),
                  ('B', 8, 6.25),
                  ('B', 4, 7.25),
                  ('B', 6, 6.75)]

df = pd.DataFrame(list_of_tuples)

df.drop(2, axis = 1, inplace = True)

n = df.groupby(0)[1].transform('count')
m = df.groupby(0)[1].transform('mean')
df['result'] = (m*n - df[1])/(n-1)

df

如何从pandas中的groupby（）。transform（）中排除单行？

1 个答案: