How do I regroup a dataframe and accumulate a column's values?

Time: 2017-10-20 04:19:57

Tags: python pandas dataframe

I have a CSV file like this:

date                     price       volume
2017-10-17 01:00:11.031  51.91       1
2017-10-17 01:00:11.828  51.91       1
2017-10-17 01:00:12.640  51.91       1
2017-10-17 01:00:13.140  51.90      -9
2017-10-17 01:00:15.328  51.90      -5
2017-10-17 01:00:16.531  51.90       1
2017-10-17 01:00:16.531  51.89      -2
2017-10-17 01:00:19.937  51.90       1
2017-10-17 01:00:24.546  51.90       1
2017-10-17 01:00:25.250  51.90       1
2017-10-17 01:00:32.843  51.89      -9
2017-10-17 01:00:42.859  51.89      -5
2017-10-17 01:00:43.453  51.89      -1
2017-10-17 01:00:43.546  51.90       1
2017-10-17 01:00:45.953  51.90       7
...
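Here is a minimal sketch for loading data laid out like the sample above into a DataFrame. It assumes the columns are aligned with runs of two or more spaces, as shown. A genuinely comma-separated file would only need pd.read_csv(path, parse_dates=['date']). The inline SAMPLE string is a hypothetical stand-in for the real file, whose name isn't given.

```python
import io

import pandas as pd

# Hypothetical stand-in for the question's file: columns separated by 2+ spaces,
# so the single space inside each timestamp is kept as part of the date
SAMPLE = """\
date                     price       volume
2017-10-17 01:00:11.031  51.91       1
2017-10-17 01:00:11.828  51.91       1
2017-10-17 01:00:12.640  51.91       1
2017-10-17 01:00:13.140  51.90      -9
2017-10-17 01:00:15.328  51.90      -5
2017-10-17 01:00:16.531  51.90       1
2017-10-17 01:00:16.531  51.89      -2
2017-10-17 01:00:19.937  51.90       1
2017-10-17 01:00:24.546  51.90       1
2017-10-17 01:00:25.250  51.90       1
2017-10-17 01:00:32.843  51.89      -9
2017-10-17 01:00:42.859  51.89      -5
2017-10-17 01:00:43.453  51.89      -1
2017-10-17 01:00:43.546  51.90       1
2017-10-17 01:00:45.953  51.90       7
"""

# A regex separator requires the python parser engine;
# parse_dates makes 'date' a datetime64 column, which 5-minute grouping needs
df = pd.read_csv(io.StringIO(SAMPLE), sep=r'\s{2,}', engine='python',
                 parse_dates=['date'])
print(df.dtypes)
```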

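I want to build a dataframe that shows the cumulative volume at each price level for every 5-minute interval.

For example, if the highest and lowest prices between 2017-10-17 00:00 and 2017-10-17 00:05 were 51.21 and 51.11, the result would be: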

datetime                 price       pos_volume     neg_volume
2017-10-17 00:00         51.21       3              4
                         51.20       21             23
                         51.19       44             21
                         51.18       31             33
                         ...
                         51.14       14             21
                         51.13       30             29
                         51.12       2              3
                         51.11       5              1

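Positive and negative volume are kept in two separate columns.

I think I could do this with lots of conditional loops, but I'd like to know whether there is a simpler, more pythonic way. Thanks for reading!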

1 answer:

Answer 0 (score: 2)

You can use np.where to separate the positive and negative values, then build a pivot table whose index is a pd.Grouper on date with a 5-minute frequency plus price, and use count as the aggfunc (it ignores NaN values). This assumes date has already been parsed as a datetime column (for example with parse_dates=['date'] in pd.read_csv).

```python
import numpy as np
import pandas as pd
# Positive volumes go to pos_vol, negative ones to neg_vol (NaN elsewhere)
df['pos_vol'] = np.where(df['volume'] > 0, df['volume'], np.nan)
df['neg_vol'] = np.where(df['volume'] < 0, df['volume'], np.nan)
# 'count' skips NaN, so each column counts only the trades of its own sign
ndf = df.pivot_table(values=['pos_vol', 'neg_vol'],
                     index=[pd.Grouper(key='date', freq='5min'), 'price'],
                     aggfunc='count')
```

Output:

                           neg_vol  pos_vol
date                price
2017-10-17 01:00:00 51.89        4        0
                    51.90        2        6
                    51.91        0        3

For a sorted index (time ascending, price descending within each 5-minute bucket), you can sort the MultiIndex with sort_index, passing a separate ascending flag for each level.
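The snippet for the sorting step did not survive, so the following is a self-contained sketch that runs the answer's pipeline on the 15 sample rows and then sorts. The sort_index call, with level 0 (time) ascending and level 1 (price) descending, is an assumption chosen to reproduce the sorted output below. It is not necessarily what the answerer wrote.

```python
import numpy as np
import pandas as pd

# The 15 sample rows from the question
df = pd.DataFrame({
    'date': pd.to_datetime([
        '2017-10-17 01:00:11.031', '2017-10-17 01:00:11.828', '2017-10-17 01:00:12.640',
        '2017-10-17 01:00:13.140', '2017-10-17 01:00:15.328', '2017-10-17 01:00:16.531',
        '2017-10-17 01:00:16.531', '2017-10-17 01:00:19.937', '2017-10-17 01:00:24.546',
        '2017-10-17 01:00:25.250', '2017-10-17 01:00:32.843', '2017-10-17 01:00:42.859',
        '2017-10-17 01:00:43.453', '2017-10-17 01:00:43.546', '2017-10-17 01:00:45.953',
    ]),
    'price': [51.91, 51.91, 51.91, 51.90, 51.90, 51.90, 51.89, 51.90,
              51.90, 51.90, 51.89, 51.89, 51.89, 51.90, 51.90],
    'volume': [1, 1, 1, -9, -5, 1, -2, 1, 1, 1, -9, -5, -1, 1, 7],
})

# Answer's approach: split volume by sign, then count trades per 5-minute bucket and price
df['pos_vol'] = np.where(df['volume'] > 0, df['volume'], np.nan)
df['neg_vol'] = np.where(df['volume'] < 0, df['volume'], np.nan)
ndf = df.pivot_table(values=['pos_vol', 'neg_vol'],
                     index=[pd.Grouper(key='date', freq='5min'), 'price'],
                     aggfunc='count')

# Assumed sort step: time ascending (level 0), price descending (level 1)
ndf_sorted = ndf.sort_index(level=[0, 1], ascending=[True, False])
print(ndf_sorted)
```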

Output:

                          neg_vol  pos_vol
date                price                  
2017-10-17 01:00:00 51.91        0        3
                    51.90        2        6
                    51.89        4        0
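The answer counts trades at each price level. If you want the total volume traded in each direction instead, as the question's example suggests, a sketch using groupby with pd.Grouper and sum could look like the following. Two choices here are assumptions: reporting negative volume as a positive magnitude in neg_volume, which mirrors the question's expected output, and using clip instead of np.where.

```python
import pandas as pd

# The 15 sample rows from the question
df = pd.DataFrame({
    'date': pd.to_datetime([
        '2017-10-17 01:00:11.031', '2017-10-17 01:00:11.828', '2017-10-17 01:00:12.640',
        '2017-10-17 01:00:13.140', '2017-10-17 01:00:15.328', '2017-10-17 01:00:16.531',
        '2017-10-17 01:00:16.531', '2017-10-17 01:00:19.937', '2017-10-17 01:00:24.546',
        '2017-10-17 01:00:25.250', '2017-10-17 01:00:32.843', '2017-10-17 01:00:42.859',
        '2017-10-17 01:00:43.453', '2017-10-17 01:00:43.546', '2017-10-17 01:00:45.953',
    ]),
    'price': [51.91, 51.91, 51.91, 51.90, 51.90, 51.90, 51.89, 51.90,
              51.90, 51.90, 51.89, 51.89, 51.89, 51.90, 51.90],
    'volume': [1, 1, 1, -9, -5, 1, -2, 1, 1, 1, -9, -5, -1, 1, 7],
})

vol = df['volume']
summary = (
    df.assign(pos_volume=vol.clip(lower=0),      # positive volume, 0 otherwise
              neg_volume=(-vol).clip(lower=0))   # magnitude of negative volume, 0 otherwise
      .groupby([pd.Grouper(key='date', freq='5min'), 'price'])[['pos_volume', 'neg_volume']]
      .sum()
      .sort_index(level=[0, 1], ascending=[True, False])  # time ascending, price descending
)
print(summary)
```

For the 01:00 bucket of the sample rows, this gives pos_volume/neg_volume of 3/0 at 51.91, 12/14 at 51.90 and 0/17 at 51.89. Price levels with no trades in a bucket do not appear in the result.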