I want to apply a simple operation to a pandas DataFrame to get the percentage change for each row. My data looks like this:
colA colB colC colD
39.5 41 41.5 40.5
15.5 17 17.5 16.5
21.5 23 23.5 22.5
40.5 42 42.5 41.5
9.5 11 11.5 10.5
26.5 28 28.5 27.5
My code:
import pandas as pd
import numpy as np
df = pd.read_csv('data.csv')
print(((df.colA/ np.mean(df.iloc[:,2:], axis=1))-1)*100)
df['change'] = df.apply(lambda x: (((x.colA/ np.mean(x.iloc[:,2:], axis=1))-1)*100))
When I print the result, I get what I want, but when I use df.apply to create a column, I get the following error:
Traceback (most recent call last):
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)
File "pandas\src\hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:8543)
TypeError: an integer is required
Any suggestions? What am I doing wrong?
Answer 0 (score: 2)
The expression you print already returns a Series with the same index as df, so you can simply assign it to a new column:
df['change'] = (df.colA / np.mean(df.iloc[:, 2:], axis=1) - 1) * 100
print(df)
colA colB colC colD change
0 39.5 41 41.5 40.5 -3.658537
1 15.5 17 17.5 16.5 -8.823529
2 21.5 23 23.5 22.5 -6.521739
3 40.5 42 42.5 41.5 -3.571429
4 9.5 11 11.5 10.5 -13.636364
5 26.5 28 28.5 27.5 -5.357143
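As for why the original df.apply call failed: by default df.apply works with axis=0, so the lambda receives each column as a Series rather than each row. A column Series has no colA entry and cannot take the two-dimensional indexer iloc[:, 2:], which is most likely what triggers the error. If you do want apply, a minimal sketch of a row-wise version follows. It assumes the sample data is built inline instead of read from data.csv:

```python
import pandas as pd

# Sample data from the question, built inline (assumption: same content as data.csv)
df = pd.DataFrame({
    'colA': [39.5, 15.5, 21.5, 40.5, 9.5, 26.5],
    'colB': [41, 17, 23, 42, 11, 28],
    'colC': [41.5, 17.5, 23.5, 42.5, 11.5, 28.5],
    'colD': [40.5, 16.5, 22.5, 41.5, 10.5, 27.5],
})

# axis=1 passes each row as a Series, so iloc takes a single positional slice
df['change'] = df.apply(
    lambda row: (row['colA'] / row.iloc[2:].mean() - 1) * 100,
    axis=1,
)
print(df)
```

This gives the same values as the direct assignment, but it calls the lambda once per row, so the vectorized expression is usually the better choice on larger frames.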
Alternatively, use assign:
df = df.assign(change=(df.colA / np.mean(df.iloc[:, 2:], axis=1) - 1) * 100)
print(df)
colA colB colC colD change
0 39.5 41 41.5 40.5 -3.658537
1 15.5 17 17.5 16.5 -8.823529
2 21.5 23 23.5 22.5 -6.521739
3 40.5 42 42.5 41.5 -3.571429
4 9.5 11 11.5 10.5 -13.636364
5 26.5 28 28.5 27.5 -5.357143
It is also possible to use only pandas methods, chaining div + iloc + mean + sub + mul:
df['change'] = df.colA.div(df.iloc[:, 2:].mean(axis=1)).sub(1).mul(100)
print(df)
colA colB colC colD change
0 39.5 41 41.5 40.5 -3.658537
1 15.5 17 17.5 16.5 -8.823529
2 21.5 23 23.5 22.5 -6.521739
3 40.5 42 42.5 41.5 -3.571429
4 9.5 11 11.5 10.5 -13.636364
5 26.5 28 28.5 27.5 -5.357143
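One thing to watch with the positional slice iloc[:, 2:]: once the change column exists, rerunning the same line would include change in the mean. A minimal sketch that selects the reference columns by name instead is shown below. The list name ref_cols is only an illustration, and the data is again built inline rather than read from data.csv:

```python
import pandas as pd

# Sample data from the question, built inline (assumption: same content as data.csv)
df = pd.DataFrame({
    'colA': [39.5, 15.5, 21.5, 40.5, 9.5, 26.5],
    'colB': [41, 17, 23, 42, 11, 28],
    'colC': [41.5, 17.5, 23.5, 42.5, 11.5, 28.5],
    'colD': [40.5, 16.5, 22.5, 41.5, 10.5, 27.5],
})

# Hypothetical name: the columns that iloc[:, 2:] selects in the original 4-column frame
ref_cols = ['colC', 'colD']

# Percentage difference of colA relative to the row-wise mean of the reference columns
df['change'] = df['colA'].div(df[ref_cols].mean(axis=1)).sub(1).mul(100)

# Running the same line again is safe, because 'change' is not part of ref_cols
df['change'] = df['colA'].div(df[ref_cols].mean(axis=1)).sub(1).mul(100)
print(df)
```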