根据列的值连续性,按行拆分/分组pandas DataFrame

时间:2017-04-21 18:28:21

标签: python pandas numpy dataframe stocks

如果价格超过移动平均线(由"趋势"列中的更改给出),我需要一些帮助来创建不同的行组。我将通过例子来解释它。以下是我的数据:

            close        avg      diff  trend
date                                        
2017-02-22  13.78  13.578652  0.201348     1
2017-02-23  13.80  13.580854  0.219146     1
2017-02-24  13.67  13.581741  0.088259     1
2017-03-01  13.65  13.582421  0.067579     1
2017-03-02  13.67  13.583292  0.086708     1
2017-03-03  13.60  13.583458  0.016542     1
2017-03-06  13.40  13.581633 -0.181633    -1
2017-03-07  13.48  13.580621 -0.100621    -1
2017-03-08  13.25  13.577332 -0.327332    -1
2017-03-09  12.95  13.571090 -0.621090    -1
2017-03-10  13.40  13.569387 -0.169387    -1
2017-03-13  13.35  13.567204 -0.217204    -1
2017-03-14  13.19  13.563451 -0.373451    -1
2017-03-15  13.85  13.566302  0.283698     1
2017-03-16  13.91  13.569722  0.340278     1
2017-03-17  13.40  13.568033 -0.168033    -1
2017-03-21  13.19  13.565079 -0.375079    -1
2017-03-22  12.95  13.558959 -0.608959    -1
2017-03-23  12.82  13.551606 -0.731606    -1
2017-03-24  12.93  13.545421 -0.615421    -1
2017-03-27  12.90  13.538999 -0.638999    -1
2017-03-28  13.20  13.535626 -0.335626    -1
2017-03-29  13.22  13.532485 -0.312485    -1
2017-03-30  13.20  13.529177 -0.329177    -1
2017-03-31  13.28  13.526698 -0.246698    -1
2017-04-03  13.27  13.524143 -0.254143    -1
2017-04-04  13.20  13.520918 -0.320918    -1
2017-04-05  13.30  13.518720 -0.218720    -1
2017-04-06  13.73  13.520822  0.209178     1
2017-04-07  13.61  13.521710  0.088290     1
2017-04-10  13.78  13.524280  0.255720     1
2017-04-11  13.66  13.525630  0.134370     1

当趋势列等于1时,我想要不同的组,趋势的不同组等于-1。像这样:

            close        avg      diff  trend
date                                        
------- GROUP 1 for trend == 1: --------
2017-02-22  13.78  13.578652  0.201348     1
2017-02-23  13.80  13.580854  0.219146     1
2017-02-24  13.67  13.581741  0.088259     1
2017-03-01  13.65  13.582421  0.067579     1
2017-03-02  13.67  13.583292  0.086708     1
2017-03-03  13.60  13.583458  0.016542     1
------- GROUP 2 for trend == 1: --------
2017-03-15  13.85  13.566302  0.283698     1
2017-03-16  13.91  13.569722  0.340278     1
------- GROUP 3 for trend == 1: --------
2017-04-06  13.73  13.520822  0.209178     1
2017-04-07  13.61  13.521710  0.088290     1
2017-04-10  13.78  13.524280  0.255720     1
2017-04-11  13.66  13.525630  0.134370     1

趋势== -1:

也一样
            close        avg      diff  trend
date                                        
------- GROUP 1 for trend == -1: --------
2017-03-06  13.40  13.581633 -0.181633    -1
2017-03-07  13.48  13.580621 -0.100621    -1
2017-03-08  13.25  13.577332 -0.327332    -1
2017-03-09  12.95  13.571090 -0.621090    -1
2017-03-10  13.40  13.569387 -0.169387    -1
2017-03-13  13.35  13.567204 -0.217204    -1
2017-03-14  13.19  13.563451 -0.373451    -1
------- GROUP 2 for trend == -1: --------
2017-03-17  13.40  13.568033 -0.168033    -1
2017-03-21  13.19  13.565079 -0.375079    -1
2017-03-22  12.95  13.558959 -0.608959    -1
2017-03-23  12.82  13.551606 -0.731606    -1
2017-03-24  12.93  13.545421 -0.615421    -1
2017-03-27  12.90  13.538999 -0.638999    -1
2017-03-28  13.20  13.535626 -0.335626    -1
2017-03-29  13.22  13.532485 -0.312485    -1
2017-03-30  13.20  13.529177 -0.329177    -1
2017-03-31  13.28  13.526698 -0.246698    -1
2017-04-03  13.27  13.524143 -0.254143    -1
2017-04-04  13.20  13.520918 -0.320918    -1
2017-04-05  13.30  13.518720 -0.218720    -1

任何有关如何实现这一目标的提示都将受到欢迎。如果有程序解决方案(不使用循环),我会很高兴。

1 个答案:

答案 0 :(得分:2)

使用shift()cumsum()以及bools等于1的事实,您可以构建一个检查连续性的组:

<强>代码:

df.groupby((df.trend != df.trend.shift()).cumsum())

测试代码:

df = pd.read_fwf(StringIO(u"""
    date         close        avg      diff  trend
    2017-03-01  13.65  13.582421  0.067579     1
    2017-03-02  13.67  13.583292  0.086708     1
    2017-03-03  13.60  13.583458  0.016542     1
    2017-03-06  13.40  13.581633 -0.181633    -1
    2017-03-07  13.48  13.580621 -0.100621    -1
    2017-03-08  13.25  13.577332 -0.327332    -1
    2017-03-09  12.95  13.571090 -0.621090     1
    2017-03-10  13.40  13.569387 -0.169387     1
    2017-03-13  13.35  13.567204 -0.217204    -1
    2017-03-14  13.19  13.563451 -0.373451    -1
    2017-03-15  13.85  13.566302  0.283698     1
    2017-03-16  13.91  13.569722  0.340278     1
    2017-03-17  13.40  13.568033 -0.168033     1"""),
                 header=1).set_index(['date'])

for group in df.groupby((df.trend != df.trend.shift()).cumsum()):
    print(group)

<强>结果:

(1,             close        avg      diff  trend
date                                         
2017-03-01  13.65  13.582421  0.067579      1
2017-03-02  13.67  13.583292  0.086708      1
2017-03-03  13.60  13.583458  0.016542      1)
(2,             close        avg      diff  trend
date                                         
2017-03-06  13.40  13.581633 -0.181633     -1
2017-03-07  13.48  13.580621 -0.100621     -1
2017-03-08  13.25  13.577332 -0.327332     -1)
(3,             close        avg      diff  trend
date                                         
2017-03-09  12.95  13.571090 -0.621090      1
2017-03-10  13.40  13.569387 -0.169387      1)
(4,             close        avg      diff  trend
date                                         
2017-03-13  13.35  13.567204 -0.217204     -1
2017-03-14  13.19  13.563451 -0.373451     -1)
(5,             close        avg      diff  trend
date                                         
2017-03-15  13.85  13.566302  0.283698      1
2017-03-16  13.91  13.569722  0.340278      1
2017-03-17  13.40  13.568033 -0.168033      1)