Apply multiple functions to DataFrame columns at once

Time: 2017-07-31 10:01:38

Tags: python pandas dataframe

How can I apply multiple functions to a DataFrame?

I would like to do something like this:

features_df[features_columns].apply(lambda x: np.mean(x), lambda x: np.std(x), lambda x: np.skew(x))

Thanks.

1 answer:

Answer 0: (score: 4)

I think you need DataFrame.aggregate (pandas 0.20.0+) or DataFrame.apply:

features_df[features_columns].agg(lambda x: pd.Series([np.mean(x),np.std(x)]))

features_df[features_columns].apply(lambda x: pd.Series([np.mean(x),np.std(x)]))

df = features_df[features_columns].agg(['mean', 'std', 'skew'])

df = features_df[features_columns].apply(['mean', 'std', 'skew'])
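The lambda versions return rows labeled 0 and 1, which makes the result harder to read than the string-based version. A minimal sketch of one way to label the rows, assuming a helper named summarize (hypothetical, not part of the original answer) and a made-up 'range' statistic added only for illustration:

```python
import pandas as pd

features_df = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                            'C': [7, 8, 9, 4, 2, 3]})

def summarize(col):
    # Hypothetical helper: build the Series from a dict so each result
    # row gets a name instead of the default positions 0, 1, ...
    return pd.Series({'mean': col.mean(),
                      'std': col.std(),                 # pandas default, ddof=1
                      'range': col.max() - col.min()})  # illustrative custom statistic

# apply calls summarize once per column and stacks the returned Series side by side.
summary = features_df.apply(summarize)
print(summary)
```

The dict keys become the row index of the result, so custom statistics can sit next to built-in ones under readable names.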

Sample:

features_df = pd.DataFrame({'A':list('abcdef'),
                           'B':[4,5,4,5,5,4],
                           'C':[7,8,9,4,2,3],
                           'D':[1,3,5,7,1,0],
                           'E':[5,3,6,9,2,4],
                           'F':list('aaabbb')})

print (features_df)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b

features_columns = ['B','C']


print (features_df[features_columns].agg(lambda x: pd.Series([np.mean(x),np.std(x)])))
     B         C
0  4.5  5.500000
1  0.5  2.629956

print (features_df[features_columns].apply(lambda x: pd.Series([np.mean(x),np.std(x)])))
     B         C
0  4.5  5.500000
1  0.5  2.629956

print (features_df[features_columns].agg(['mean', 'std', 'skew']))
             B         C
mean  4.500000  5.500000
std   0.547723  2.880972
skew  0.000000  0.000000

print (features_df[features_columns].apply(['mean', 'std', 'skew']))
             B         C
mean  4.500000  5.500000
std   0.547723  2.880972
skew  0.000000  0.000000

The std function has a different default ddof in numpy (ddof=0) and in pandas (ddof=1), which is why the outputs differ.
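The ddof difference can be checked directly on column C from the sample. This is a small sketch rather than part of the original answer; the variable names are chosen only for illustration:

```python
import numpy as np
import pandas as pd

values = pd.Series([7, 8, 9, 4, 2, 3])  # column C from the sample

# On a plain ndarray, NumPy's default ddof=0 applies (divide by n).
population_std = np.std(values.to_numpy())

# pandas defaults to ddof=1 (divide by n - 1).
sample_std = values.std()

# Passing ddof explicitly makes the two libraries agree.
numpy_sample_std = np.std(values.to_numpy(), ddof=1)
pandas_population_std = values.std(ddof=0)

print(population_std, sample_std)
print(numpy_sample_std, pandas_population_std)
```

Setting ddof explicitly in whichever call is used avoids depending on either library's default.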

Also, np.skew raises:

AttributeError: module 'numpy' has no attribute 'skew'
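Since numpy has no skew function, one option is to rely on the pandas method instead, either by name or inside a lambda. A minimal sketch using the two numeric sample columns (the variable names are illustrative):

```python
import pandas as pd

features_df = pd.DataFrame({'B': [4, 5, 4, 5, 5, 4],
                            'C': [7, 8, 9, 4, 2, 3]})

# The string 'skew' is resolved to the pandas method, so no NumPy function is needed.
skew_by_name = features_df.agg(['skew'])

# Equivalent callable form, usable wherever a lambda is required.
skew_by_method = features_df.apply(lambda col: col.skew())

print(skew_by_name)
print(skew_by_method)
```

scipy.stats.skew is another possibility if SciPy is installed, but its default is the biased estimator, while pandas applies a bias correction, so the two can give different values on asymmetric data.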