Question

I have a DataFrame of stock prices whose timestamps have microsecond resolution:

In [48]: fdf.iloc[:5]
Out[55]:
                                         #RIC    ...         Volume
Date-Time                                        ...
2019-03-05 09:30:06.283715885+08:00  .SSE100I    ...      8805000.0
2019-03-05 09:30:12.827067475+08:00  .SSE100I    ...      7843100.0
2019-03-05 09:30:18.388287730+08:00  .SSE100I    ...      7228800.0
2019-03-05 09:30:20.995625330+08:00  .SSE100I    ...      2471700.0
2019-03-05 09:30:25.450852863+08:00  .SSE100I    ...       929400.0

[5 rows x 7 columns]

In [56]: fdf.columns
Out[59]: Index(['#RIC', 'Domain', 'Date-Time', 'GMT Offset', 'Type', 'Price', 'Volume'], dtype='object')

I want to split this DataFrame into one-minute subsets and compute some statistics for each minute. This is the code I am trying:

def min_stats(df):
  import ipdb; ipdb.set_trace(context=7)

fdf.resample('T').apply(min_stats)

However, even though fdf has 7 columns, the df inside min_stats is a pd.Series containing only the first column, #RIC. How can I pass all the columns to df?

Answer 1

You can try Resampler.agg:

fdf.resample('T').agg(['min','max'])
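
As a minimal sketch of what this returns, the snippet below builds a small made-up tick table and aggregates it per minute. The `ticks` name, the prices and the two-minute span are assumptions for illustration, not the asker's data. It uses the `'1min'` alias, which means the same as `'T'`; recent pandas releases deprecate `'T'` in favour of `'min'`.

```python
import pandas as pd

# Hypothetical tick data: three trades in 09:30 and two in 09:31.
idx = pd.to_datetime([
    "2019-03-05 09:30:06.283715",
    "2019-03-05 09:30:12.827067",
    "2019-03-05 09:30:18.388287",
    "2019-03-05 09:31:20.995625",
    "2019-03-05 09:31:25.450852",
])
ticks = pd.DataFrame(
    {
        "Price": [10.0, 10.5, 9.8, 10.2, 10.4],
        "Volume": [8805000.0, 7843100.0, 7228800.0, 2471700.0, 929400.0],
    },
    index=idx,
)

# One row per minute; columns become a (column, statistic) MultiIndex.
per_minute = ticks.resample("1min").agg(["min", "max"])
print(per_minute)
```

Individual values can then be read with a tuple key, for example `per_minute[("Price", "min")]`.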

Alternatively, use GroupBy.agg together with Grouper; DataFrameGroupBy.describe works as well:

fdf.groupby(pd.Grouper(freq='T')).agg(['min','max'])

fdf.groupby(pd.Grouper(freq='T')).describe()

However, all non-numeric columns are excluded.
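
If you would rather make that column selection explicit than rely on pandas dropping columns for you, one option is to keep only the numeric columns before grouping. A sketch on made-up data follows; the `ticks` frame and its values are hypothetical, and `'1min'` is used as the equivalent of `'T'`.

```python
import pandas as pd

# Hypothetical tick data with one string column and two numeric columns.
idx = pd.to_datetime([
    "2019-03-05 09:30:06.283715",
    "2019-03-05 09:30:12.827067",
    "2019-03-05 09:30:18.388287",
    "2019-03-05 09:31:20.995625",
    "2019-03-05 09:31:25.450852",
])
ticks = pd.DataFrame(
    {
        "#RIC": [".SSE100I"] * 5,
        "Price": [10.0, 10.5, 9.8, 10.2, 10.4],
        "Volume": [8805000.0, 7843100.0, 7228800.0, 2471700.0, 929400.0],
    },
    index=idx,
)

# Keep only numeric columns, then summarise each one-minute bin.
numeric = ticks.select_dtypes(include="number")
desc = numeric.groupby(pd.Grouper(freq="1min")).describe()
print(desc)
```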

Your function should look something like this:

import ipdb; ipdb.set_trace(context=7)

def min_stats(x):
    # Show what pandas passes in for each one-minute bin
    print(x)


fdf.resample('T').apply(min_stats)

Or:

def min_stats(x):
    # Show what pandas passes in for each one-minute bin
    print(x)


fdf.groupby(pd.Grouper(freq='T')).apply(min_stats)
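
With the groupby version, the function receives each one-minute block as a DataFrame, so it can combine several columns at once. The sketch below is only an illustration: the `ticks` data and the three statistics (`n_ticks`, `vwap`, `price_range`) are made up here and are not part of the original question.

```python
import pandas as pd

# Hypothetical tick data: three trades in 09:30 and two in 09:31.
idx = pd.to_datetime([
    "2019-03-05 09:30:06.283715",
    "2019-03-05 09:30:12.827067",
    "2019-03-05 09:30:18.388287",
    "2019-03-05 09:31:20.995625",
    "2019-03-05 09:31:25.450852",
])
ticks = pd.DataFrame(
    {
        "Price": [10.0, 10.5, 9.8, 10.2, 10.4],
        "Volume": [8805000.0, 7843100.0, 7228800.0, 2471700.0, 929400.0],
    },
    index=idx,
)


def min_stats(group: pd.DataFrame) -> pd.Series:
    """Return illustrative per-minute statistics using several columns."""
    total_volume = group["Volume"].sum()
    if total_volume == 0:
        vwap = float("nan")  # no traded volume in this bin
    else:
        vwap = (group["Price"] * group["Volume"]).sum() / total_volume
    return pd.Series({
        "n_ticks": len(group),
        "vwap": vwap,
        "price_range": group["Price"].max() - group["Price"].min(),
    })


# Each returned Series becomes one row of the result, indexed by minute.
stats = ticks.groupby(pd.Grouper(freq="1min")).apply(min_stats)
print(stats)
```

Returning a Series from the function is a design choice: it gives one labelled column per statistic in the result.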
