How to apply multiple functions to a DataFrame:
I want to do something like this:
features_df[features_columns].apply(lambda x: np.mean(x), lambda x: np.std(x), lambda x: np.skew(x))
Answer 0 (score: 4)
I think you need DataFrame.aggregate (pandas 0.20.0+) or DataFrame.apply:
features_df[features_columns].agg(lambda x: pd.Series([np.mean(x),np.std(x)]))
features_df[features_columns].apply(lambda x: pd.Series([np.mean(x),np.std(x)]))
df = features_df[features_columns].agg(['mean', 'std', 'skew'])
df = features_df[features_columns].apply(['mean', 'std', 'skew'])
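The lambda variants above return a pd.Series without an index, so the result rows are numbered 0, 1, ... A minimal sketch, assuming you want labelled rows instead: give the returned Series an index (here via a dict). The small frame below is hypothetical and only mirrors columns B and C of the sample; it also uses the pandas methods, so `std` here has ddof=1 like `agg(['mean', 'std', 'skew'])`.

```python
import pandas as pd

# Hypothetical data mirroring columns B and C of the sample below.
features_df = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                            'C': [7, 8, 9, 4, 2, 3]})
features_columns = ['B', 'C']

# Returning a Series with an explicit index labels each statistic,
# so the result rows read 'mean', 'std', 'skew' instead of 0, 1, 2.
stats = features_df[features_columns].apply(
    lambda x: pd.Series({'mean': x.mean(),
                         'std': x.std(),     # pandas default ddof=1
                         'skew': x.skew()}))
print(stats)
```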
Sample:
import numpy as np
import pandas as pd

features_df = pd.DataFrame({'A': list('abcdef'),
                            'B': [4, 5, 4, 5, 5, 4],
                            'C': [7, 8, 9, 4, 2, 3],
                            'D': [1, 3, 5, 7, 1, 0],
                            'E': [5, 3, 6, 9, 2, 4],
                            'F': list('aaabbb')})
print (features_df)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b
features_columns = ['B','C']
print (features_df[features_columns].agg(lambda x: pd.Series([np.mean(x),np.std(x)])))
     B         C
0  4.5  5.500000
1  0.5  2.629956
print (features_df[features_columns].apply(lambda x: pd.Series([np.mean(x),np.std(x)])))
     B         C
0  4.5  5.500000
1  0.5  2.629956
print (features_df[features_columns].agg(['mean', 'std', 'skew']))
             B         C
mean  4.500000  5.500000
std   0.547723  2.880972
skew  0.000000  0.000000
print (features_df[features_columns].apply(['mean', 'std', 'skew']))
             B         C
mean  4.500000  5.500000
std   0.547723  2.880972
skew  0.000000  0.000000
The std function has a different default ddof in NumPy and pandas (ddof=0 in NumPy, ddof=1 in pandas), so the outputs differ.
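A short sketch of the ddof difference, using column B of the sample: passing `ddof` explicitly makes NumPy and pandas agree. The variable names are only for illustration.

```python
import numpy as np
import pandas as pd

x = pd.Series([4, 5, 4, 5, 5, 4])  # column B from the sample

# NumPy defaults to ddof=0 (population standard deviation).
np_std = np.std(x.to_numpy())                  # 0.5
# pandas defaults to ddof=1 (sample standard deviation).
pd_std = x.std()                               # ~0.547723

# Passing ddof explicitly makes the two libraries agree.
np_sample_std = np.std(x.to_numpy(), ddof=1)   # ~0.547723
pd_pop_std = x.std(ddof=0)                     # 0.5
print(np_std, pd_std, np_sample_std, pd_pop_std)
```

So inside the lambda, `np.std(x, ddof=1)` reproduces the `agg('std')` result, and `x.std(ddof=0)` reproduces the `np.std` result.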
Also, np.skew does not exist; calling it raises:
AttributeError: module 'numpy' has no attribute 'skew'
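Since NumPy has no skew function, one replacement inside a lambda is the pandas method `Series.skew`, which computes the same value as `agg('skew')`. The sketch below uses column D of the sample because, unlike B and C, it is not symmetric, so its skewness is non-zero. If SciPy is installed, `scipy.stats.skew` is another option; note that its default `bias=True` gives the biased estimator, so `bias=False` should be passed to match pandas.

```python
import pandas as pd

# Column D from the sample; it is asymmetric, so its skewness is non-zero.
d = pd.DataFrame({'D': [1, 3, 5, 7, 1, 0]})

# There is no np.skew; Series.skew is the pandas replacement.
via_lambda = d.apply(lambda x: x.skew())   # Series indexed by column name
via_string = d.agg('skew')                 # same statistic requested by name
print(via_lambda['D'], via_string['D'])
```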