Question

I want to apply a simple operation to a pandas DataFrame to get a percentage change for each row. My data looks like this:

colA    colB    colC    colD
39.5    41      41.5    40.5
15.5    17      17.5    16.5
21.5    23      23.5    22.5
40.5    42      42.5    41.5
9.5     11      11.5    10.5
26.5    28      28.5    27.5
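A minimal sketch for reproducing the question without data.csv, assuming the file holds exactly these four columns with a header row. It also shows which columns df.iloc[:, 2:] picks, since every expression in this thread depends on that positional slice.

```python
import pandas as pd

# Sample data from the question, built in memory instead of read from data.csv
# (assumes data.csv contains exactly these four columns)
df = pd.DataFrame({
    "colA": [39.5, 15.5, 21.5, 40.5, 9.5, 26.5],
    "colB": [41, 17, 23, 42, 11, 28],
    "colC": [41.5, 17.5, 23.5, 42.5, 11.5, 28.5],
    "colD": [40.5, 16.5, 22.5, 41.5, 10.5, 27.5],
})

# iloc[:, 2:] selects every column from position 2 onward, i.e. colC and colD
print(df.iloc[:, 2:].columns.tolist())  # ['colC', 'colD']
```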

My code:

import pandas as pd
import numpy as np

df = pd.read_csv('data.csv')
print(((df.colA/ np.mean(df.iloc[:,2:], axis=1))-1)*100)

df['change'] = df.apply(lambda x: (((x.colA/ np.mean(x.iloc[:,2:], axis=1))-1)*100))

When I print the result, it gives me what I want, but when I use df.apply to create a column, I get the following error:

Traceback (most recent call last):
  File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
  File "pandas\src\hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:8543)
TypeError: an integer is required

Any suggestions? What am I doing wrong?

Answer 1

Your expression already returns a Series with the same index as df, so simply assign it to a new column:

df['change'] = (((df.colA/ np.mean(df.iloc[:,2:], axis=1))-1)*100)
print (df)
   colA  colB  colC  colD     change
0  39.5    41  41.5  40.5  -3.658537
1  15.5    17  17.5  16.5  -8.823529
2  21.5    23  23.5  22.5  -6.521739
3  40.5    42  42.5  41.5  -3.571429
4   9.5    11  11.5  10.5 -13.636364
5  26.5    28  28.5  27.5  -5.357143
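As for the error: DataFrame.apply defaults to axis=0, so the lambda receives one column at a time as a one-dimensional Series indexed by row labels. Neither x.colA nor the two-dimensional x.iloc[:, 2:] makes sense for such a Series, so the call fails. If a row-wise apply is really wanted, a minimal sketch with axis=1 looks like this; it is typically slower than the vectorized assignment above because the lambda runs once per row in Python.

```python
import pandas as pd

# Sample data from the question (built in memory instead of read from data.csv)
df = pd.DataFrame({
    "colA": [39.5, 15.5, 21.5, 40.5, 9.5, 26.5],
    "colB": [41, 17, 23, 42, 11, 28],
    "colC": [41.5, 17.5, 23.5, 42.5, 11.5, 28.5],
    "colD": [40.5, 16.5, 22.5, 41.5, 10.5, 27.5],
})

# axis=1 passes each row as a Series indexed by column name, so x["colA"]
# is a scalar and x.iloc[2:] holds that row's colC and colD values.
# The apply runs before 'change' exists, so iloc[2:] does not include it.
df["change"] = df.apply(lambda x: (x["colA"] / x.iloc[2:].mean() - 1) * 100, axis=1)
print(df)
```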

Or use assign:

df = df.assign(change=(((df.colA/ np.mean(df.iloc[:,2:], axis=1))-1)*100))
print (df)
   colA  colB  colC  colD     change
0  39.5    41  41.5  40.5  -3.658537
1  15.5    17  17.5  16.5  -8.823529
2  21.5    23  23.5  22.5  -6.521739
3  40.5    42  42.5  41.5  -3.571429
4   9.5    11  11.5  10.5 -13.636364
5  26.5    28  28.5  27.5  -5.357143
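assign also accepts a callable, which is evaluated on the DataFrame being assigned to. A minimal sketch of the same calculation written that way, which can be handy inside a method chain; note that assign returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Sample data from the question (built in memory instead of read from data.csv)
df = pd.DataFrame({
    "colA": [39.5, 15.5, 21.5, 40.5, 9.5, 26.5],
    "colB": [41, 17, 23, 42, 11, 28],
    "colC": [41.5, 17.5, 23.5, 42.5, 11.5, 28.5],
    "colD": [40.5, 16.5, 22.5, 41.5, 10.5, 27.5],
})

# The lambda receives the DataFrame being assigned to (here df itself),
# so the expression does not need to refer to the outer variable name.
result = df.assign(change=lambda d: (d.colA / d.iloc[:, 2:].mean(axis=1) - 1) * 100)
print(result)
```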

You can also use only pandas methods: div + iloc + mean + sub + mul:

df['change'] = df.colA.div(df.iloc[:,2:].mean(1)).sub(1).mul(100)
print (df)
   colA  colB  colC  colD     change
0  39.5    41  41.5  40.5  -3.658537
1  15.5    17  17.5  16.5  -8.823529
2  21.5    23  23.5  22.5  -6.521739
3  40.5    42  42.5  41.5  -3.571429
4   9.5    11  11.5  10.5 -13.636364
5  26.5    28  28.5  27.5  -5.357143
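The chained version can be unpacked step by step to show what each method contributes; the intermediate names ref, ratio and change are only illustrative. mean(1) is the row-wise mean (axis=1), and div, sub and mul are the method forms of /, - and *.

```python
import pandas as pd

# Sample data from the question (built in memory instead of read from data.csv)
df = pd.DataFrame({
    "colA": [39.5, 15.5, 21.5, 40.5, 9.5, 26.5],
    "colB": [41, 17, 23, 42, 11, 28],
    "colC": [41.5, 17.5, 23.5, 42.5, 11.5, 28.5],
    "colD": [40.5, 16.5, 22.5, 41.5, 10.5, 27.5],
})

# Intermediate names are illustrative only
ref = df.iloc[:, 2:].mean(1)     # row-wise mean of colC and colD
ratio = df.colA.div(ref)         # colA / mean, aligned on the row index
change = ratio.sub(1).mul(100)   # (ratio - 1) * 100, in percent
df["change"] = change
print(df)
```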

Calculating a new column in Python with pandas
