Question

I have a pandas DataFrame of the following form:

                       timestap  price    bid    ask  volume
0       2014-06-04 12:11:03.058  21.11  41.12   0.00       0
1       2014-06-04 12:11:03.386  21.17  41.18   0.00       0
2       2014-06-04 12:11:03.435  21.20  41.21   0.00       0
3       2014-06-04 12:11:04.125  21.17  41.19   0.00       0
4       2014-06-04 12:11:04.245  21.16  41.17   0.00       0

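What I need to do:

Set timestap as the index
Resample the timestamps with groupby (timestamps should be grouped by second)
Show the first and last number of each column for the same date and time

The final DataFrame should look like this: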

                            price           bid         ask    volume
           timestap    min    max    min    max   min   max  min  max
2014-06-04 12:11:03  21.11  21.20  41.12  41.21  0.00  0.00    0    0
2014-06-04 12:11:04  21.16  21.17  41.17  41.19  0.00  0.00    0    0

What I have so far:

import pandas as pd
data = pd.read_csv('table.csv')
data.columns = ['timestap', 'bid', 'ask', 'price', 'volume']
data = data.set_index(data.time)
bydate = data.groupby(pd.TimeGrouper(freq='s'))

Something is wrong with my code, and I have no idea how to do the last task. Can you help me?

Answer 1

Use the agg function and pass it a list of aggregation functions, with either resample or pd.TimeGrouper:

# make sure the timestamp column is of date time type
df['timestap'] = pd.to_datetime(df['timestap'])

df.set_index('timestap').resample("s").agg(["min", "max"])
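
To try the resample call end to end, here is a minimal sketch that rebuilds the five sample rows from the question by hand instead of reading table.csv. The variable name result is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question (the column name 'timestap' is kept as-is)
df = pd.DataFrame({
    "timestap": [
        "2014-06-04 12:11:03.058",
        "2014-06-04 12:11:03.386",
        "2014-06-04 12:11:03.435",
        "2014-06-04 12:11:04.125",
        "2014-06-04 12:11:04.245",
    ],
    "price": [21.11, 21.17, 21.20, 21.17, 21.16],
    "bid": [41.12, 41.18, 41.21, 41.19, 41.17],
    "ask": [0.0, 0.0, 0.0, 0.0, 0.0],
    "volume": [0, 0, 0, 0, 0],
})

# Parse the strings into datetimes so the column can serve as a DatetimeIndex
df["timestap"] = pd.to_datetime(df["timestap"])

# One row per second; each original column gets a 'min' and a 'max' sub-column
result = df.set_index("timestap").resample("s").agg(["min", "max"])
print(result)
```

The result has a two-level column index (original column, then aggregation name), which matches the layout requested in the question. Note that resample also emits a row for any second with no data between the first and last timestamp; the sample has no such gap.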

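Or use groupby with a time-based grouper. pd.TimeGrouper was deprecated in pandas 0.21 and removed in 1.0; pd.Grouper is its replacement:

df.set_index('timestap').groupby(pd.Grouper(freq='s')).agg(['min', 'max'])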
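
The question's task list asks for the first and last value per second, while the calls above compute min and max. If first and last are what is actually wanted, the same pattern should work with different aggregation names. The following is a sketch under that assumption, using pd.Grouper (available in current pandas) and the sample rows from the question; the names df and first_last are illustrative.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "timestap": pd.to_datetime([
        "2014-06-04 12:11:03.058",
        "2014-06-04 12:11:03.386",
        "2014-06-04 12:11:03.435",
        "2014-06-04 12:11:04.125",
        "2014-06-04 12:11:04.245",
    ]),
    "price": [21.11, 21.17, 21.20, 21.17, 21.16],
    "bid": [41.12, 41.18, 41.21, 41.19, 41.17],
    "ask": [0.0, 0.0, 0.0, 0.0, 0.0],
    "volume": [0, 0, 0, 0, 0],
})

# Group the DatetimeIndex into one-second bins, then take the first and
# last observation in each bin (in row order) for every column
first_last = (
    df.set_index("timestap")
      .groupby(pd.Grouper(freq="s"))
      .agg(["first", "last"])
)
print(first_last)
```

For the second 12:11:04 this differs from min/max: the first price is 21.17 and the last is 21.16, whereas min/max would report 21.16 and 21.17.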
