I want to calculate some features for a collection of time series (or columns, if you like).
I know I can use pandas.DataFrame.agg
for that, but I can't find a way to give custom names to the resulting columns/rows of the DataFrame.
The code below does what I want:
Note: This is just an example. I know I can pass
['sum', 'std', 'mean']
etc. to agg, but I'd like to do this for arbitrary aggregation functions.
import pandas as pd
import numpy as np
n_series = 5
n_time_samples = 10
data = np.random.rand(n_time_samples, n_series)
columns = ['s{:d}'.format(i) for i in range(n_series)]
df = pd.DataFrame(data, columns=columns)
df.agg([lambda x: x.mean(),
        lambda x: x.std()], axis=0).T
The result is a feature vector for each time series:
<lambda> <lambda>
s0 0.406411 0.330624
s1 0.446666 0.301839
s2 0.498958 0.159052
s3 0.613881 0.353684
s4 0.455623 0.287457
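The <lambda> headers appear because agg labels each result with the function's name, and every anonymous function carries the same placeholder __name__, even after it is bound to a variable. A quick check using only plain Python (the variable and function names here are illustrative):

```python
# Every lambda expression gets the same placeholder name, which is
# what shows up as the row/column label after agg.
mean_feature = lambda x: x.mean()
std_feature = lambda x: x.std()

names = [mean_feature.__name__, std_feature.__name__]
print(names)  # ['<lambda>', '<lambda>']

# A def statement records the name it was defined with instead.
def mean_named(x):
    return x.mean()

print(mean_named.__name__)  # mean_named
```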
However, I'd like to give the features proper names. Passing a dictionary to do that doesn't work:
# Throws KeyError
df.agg({'f1': lambda x: x.mean(),
        'f2': lambda x: x.std()}, axis=0).T
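The KeyError is expected: when agg receives a dictionary, it treats the keys as column labels and the values as the functions to apply to those columns, so 'f1' and 'f2' are looked up as columns that don't exist. A small sketch, assuming random data shaped like the question's (the exact error message varies between pandas versions):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5), columns=[f"s{i}" for i in range(5)])

# Dictionary keys are interpreted as column labels, not as output names.
raised = False
try:
    df.agg({"f1": "mean", "f2": "std"})
except KeyError as err:
    raised = True
    print("KeyError:", err)

# With existing column labels as keys, the dict form works: each listed
# column gets its own list of functions; missing combinations become NaN.
by_column = df.agg({"s0": ["mean", "std"], "s1": ["mean"]})
print(by_column)
```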
I know I can simply rename the columns of the result afterwards (e.g. by setting its columns attribute),
but I was wondering whether I can solve this using agg
alone.
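For comparison, the rename-afterwards route can be chained onto the agg call with set_axis, which returns a relabelled copy and relabels positionally. A sketch, assuming random data shaped like the question's; the function names and the labels f1/f2 are chosen for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5), columns=[f"s{i}" for i in range(5)])

def mean_feature(x):
    return x.mean()

def std_feature(x):
    return x.std()

# Aggregate first (rows = functions), transpose so each series is a row,
# then attach the desired feature names as the new column labels.
features = (
    df.agg([mean_feature, std_feature], axis=0)
      .T
      .set_axis(["f1", "f2"], axis=1)
)
print(features)
```

This still needs a second step, but it works for any list of callables as long as the label list is in the same order.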
As a side note, setting axis=1
also fails:
df.agg([lambda x: x.mean(),
        lambda x: x.std()], axis=1).T
This throws
TypeError: ("'list' object is not callable", 'occurred at index 0')
whereas transposing first works:
# Note transpose
df.T.agg([lambda x: x.mean(),
          lambda x: x.std()], axis=0).T
Why does the second version work when the first one fails?
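Note that the two calls do not reduce the same thing as the axis=0 call earlier: df.agg(..., axis=0) reduces each column (time series), while df.T.agg(..., axis=0) reduces each column of the transpose, i.e. each row (time sample) of df, which is what axis=1 is meant to do. A sketch of per-sample features via the transpose, assuming random data shaped like the question's and named functions in place of the lambdas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 5), columns=[f"s{i}" for i in range(5)])

def f1(x):
    return x.mean()

def f2(x):
    return x.std()

# Transposing turns each time sample (row) into a column, so axis=0 on
# df.T reduces across the five series at each time step.
per_sample = df.T.agg([f1, f2], axis=0).T
print(per_sample.shape)  # (10, 2)
```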
Answer (score: 0):
Here's one way: agg labels each result with the function's name, so use named functions instead of lambdas.
In [1023]: def f1(x):
      ...:     return x.mean()
      ...:
In [1024]: def f2(x):
      ...:     return x.std()
      ...:
In [1025]: df.agg([f1, f2], axis=0).T
Out[1025]:
f1 f2
s0 0.593445 0.282322
s1 0.554996 0.247396
s2 0.441740 0.321923
s3 0.379589 0.295618
s4 0.602647 0.259439
To use lambda
functions, set their __name__ attribute:
In [1042]: f1_ = lambda x: x.mean()
In [1043]: f2_ = lambda x: x.std()
In [1044]: f1_.__name__ = 'f1x'
In [1045]: f2_.__name__ = 'f2x'
In [1046]: df.agg([f1_, f2_], axis=0).T
Out[1046]:
f1x f2x
s0 0.593445 0.282322
s1 0.554996 0.247396
s2 0.441740 0.321923
s3 0.379589 0.295618
s4 0.602647 0.259439
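Building on the __name__ trick, a small wrapper gives roughly the dictionary syntax from the question. The helpers agg_named and _relabel are hypothetical (not part of pandas); they wrap each function instead of mutating its __name__, so the same function object can be reused under different labels. A sketch, assuming random data shaped like the question's:

```python
import numpy as np
import pandas as pd

def _relabel(func, label):
    """Hypothetical helper: wrap func in a new function named label."""
    def wrapper(x):
        return func(x)
    wrapper.__name__ = label  # agg reads this name for the result label
    return wrapper

def agg_named(df, funcs, axis=0):
    """Hypothetical helper: apply a {label: callable} dict via DataFrame.agg."""
    wrapped = [_relabel(func, label) for label, func in funcs.items()]
    return df.agg(wrapped, axis=axis)

df = pd.DataFrame(np.random.rand(10, 5), columns=[f"s{i}" for i in range(5)])
features = agg_named(df, {"f1": lambda x: x.mean(), "f2": lambda x: x.std()}).T
print(features)
```

Using a factory function (rather than defining wrapper inside the loop) makes each wrapper capture its own func, avoiding the late-binding pitfall of closures created in a loop.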