Question

I am trying to implement row-wise mean normalization in pandas: find the mean of each row, then subtract that mean from every element of the same row.

Code:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,100,size=(4, 5)), columns=list('ABCDE'))
print (df)


    A   B   C   D   E
0  53  77  34  51  41
1  44  46   6  70  31
2  52  22  95  88  13
3  77  18  88  86  20


x = pd.DataFrame(df.mean(axis = 1),columns=['mean'])

for index,rows in df.iterrows():
  for i in range(len(x)):
     df.loc[index] = df.loc[index] - x.loc[i]
print (df)


Output:

     A   B   C   D   E
  0 NaN NaN NaN NaN NaN
  1 NaN NaN NaN NaN NaN
  2 NaN NaN NaN NaN NaN
  3 NaN NaN NaN NaN NaN

Any suggestions as to what is going wrong?

Answer 1

You can use apply like this:

df = df.apply(lambda x: x - df.mean(axis = 1))

Output:

      A     B     C     D     E
0   1.8  25.8 -17.2  -0.2 -10.2
1   4.6   6.6 -33.4  30.6  -8.4
2  -2.0 -32.0  41.0  34.0 -41.0
3  19.2 -39.8  30.2  28.2 -37.8
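
For context, the NaN values in the question most likely come from label alignment: df.loc[index] is a Series labelled A..E, while x.loc[i] is a Series labelled 'mean', so the two share no labels and every difference is NaN. The following is a minimal sketch, not part of the original answer, that reproduces the mismatch and then uses DataFrame.sub with axis=0 so the row means are computed once and aligned on the row index. The sample values are copied from the question; the names row_means, centered and misaligned are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [[53, 77, 34, 51, 41],
     [44, 46, 6, 70, 31],
     [52, 22, 95, 88, 13],
     [77, 18, 88, 86, 20]],
    columns=list("ABCDE"),
)

# Why the loop produced NaN: labels A..E never match the label 'mean'.
row = df.loc[0]                              # Series indexed by A..E
mean_row = pd.Series({"mean": row.mean()})   # Series indexed by 'mean'
misaligned = row - mean_row                  # every entry is NaN

# Row-wise centering: compute the means once, align them on the row index.
row_means = df.mean(axis=1)
centered = df.sub(row_means, axis=0)
print(centered)
```

Compared with apply, this avoids recomputing df.mean(axis=1) for every column; the result should match the output shown above.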

Answer 2

You can use NumPy to perform this calculation in a vectorized way:

A = df.values
A = A - A.mean(axis=1)[:, None]

res = pd.DataFrame(A, index=df.index, columns=df.columns)

print(repr(df.values))  # the original values, before the row means are subtracted

array([[11, 31, 78, 55, 71],
       [89, 39, 39, 16, 45],
       [26, 10, 85, 68, 93],
       [55, 19, 78, 30, 41]])

print(res)

      A     B     C     D     E
0 -38.2 -18.2  28.8   5.8  21.8
1  43.4  -6.6  -6.6 -29.6  -0.6
2 -30.4 -46.4  28.6  11.6  36.6
3  10.4 -25.6  33.4 -14.6  -3.6
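
As a self-contained variant of the same idea (a sketch, not the answerer's exact code), keepdims=True can replace the [:, None] indexing: it keeps the row means as a column of shape (4, 1), which then broadcasts across the five columns. The input values are taken from the array printed in this answer; the name centered is illustrative.

```python
import numpy as np
import pandas as pd

# Values taken from the array printed in this answer.
df = pd.DataFrame(
    [[11, 31, 78, 55, 71],
     [89, 39, 39, 16, 45],
     [26, 10, 85, 68, 93],
     [55, 19, 78, 30, 41]],
    columns=list("ABCDE"),
)

A = df.to_numpy()

# keepdims=True leaves the means with shape (4, 1), so the subtraction
# broadcasts each row's mean across that row's columns.
centered = A - A.mean(axis=1, keepdims=True)

res = pd.DataFrame(centered, index=df.index, columns=df.columns)

# After centering, every row should average to (numerically) zero.
assert np.allclose(res.mean(axis=1), 0.0)
print(res)
```

The final assertion is a quick sanity check that the normalization was applied along rows rather than columns.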
