How to count increases of a cumulative maximum

Asked: 2019-04-04 17:08:04

Tags: python pandas

I have a column (price) whose value changes over time. From one row to the next, the value increases, decreases, or stays the same. I want to count the number of times the value reaches a new high.

So I added a column, currenthigh, that tracks the highest value seen so far. Then I added another column, currenthigh_prev, which is the currenthigh column shifted down by one row. That way I can compare the two values, current and previous: if currenthigh > currenthigh_prev, there is a new high, which gets recorded in newhighscount.
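(For reference, the helper columns described above could presumably be built roughly like this; a minimal sketch, assuming df already holds the data shown below and 'last' is the price column.)

# running high of the price so far, and the previous row's running high
df['currenthigh'] = df['last'].cummax()
df['currenthigh_shift'] = df['currenthigh'].shift(1)
df['newhighscount'] = 0  # counter column, still to be filled in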

I have been trying to use .cummax() for this:

df.loc[df['currenthigh'] > df['currenthigh_shift'], 'newhighscount'] = df['newhighscount'].cummax() + 1

This is what I expected:

              datetime      last  currenthigh  currenthigh_shift  **newhighscount** 
31 2019-04-02 07:57:33  389.8400       389.84                NaN              0 
32 2019-04-02 07:57:33  389.8400       389.84             389.84              0 
33 2019-04-02 07:57:33  389.8700       389.87             389.84              **1** 
34 2019-04-02 07:57:33  389.8800       389.88             389.87              **2** 
35 2019-04-02 07:57:33  389.9000       389.90             389.88              **3** 
36 2019-04-02 07:57:33  389.9600       389.96             389.90              **4** 
37 2019-04-02 07:57:35  389.9000       389.96             389.96              **4** 
38 2019-04-02 07:57:36  389.9000       389.96             389.96              **4** 
39 2019-04-02 08:00:00  389.3603       389.96             389.96              **4** 
40 2019-04-02 08:00:00  388.8500       389.96             389.96              **4** 
41 2019-04-02 08:00:00  390.0000       390.00             389.96              **5** 
42 2019-04-02 08:00:01  389.7452       390.00             390.00              **5** 
43 2019-04-02 08:00:01  389.4223       390.00             390.00              5 
44 2019-04-02 08:00:01  389.8000       390.00             390.00              5 

This is what I am getting instead:

              datetime      last  currenthigh  currenthigh_shift  newhighscount 
31 2019-04-02 07:57:33  389.8400       389.84                NaN              0 
32 2019-04-02 07:57:33  389.8400       389.84             389.84              0 
33 2019-04-02 07:57:33  389.8700       389.87             389.84              1 
34 2019-04-02 07:57:33  389.8800       389.88             389.87              1 
35 2019-04-02 07:57:33  389.9000       389.90             389.88              1 
36 2019-04-02 07:57:33  389.9600       389.96             389.90              1 
37 2019-04-02 07:57:35  389.9000       389.96             389.96              0 
38 2019-04-02 07:57:36  389.9000       389.96             389.96              0 
39 2019-04-02 08:00:00  389.3603       389.96             389.96              0 
40 2019-04-02 08:00:00  388.8500       389.96             389.96              0 
41 2019-04-02 08:00:00  390.0000       390.00             389.96              1 
42 2019-04-02 08:00:01  389.7452       390.00             390.00              0 
43 2019-04-02 08:00:01  389.4223       390.00             390.00              0 
44 2019-04-02 08:00:01  389.8000       390.00             390.00              0 

Basically, df['newhighscount'].cummax() does not seem to return anything useful.

3 Answers:

Answer 0 (score: 2)

df['newhighscount'] = df['last'].cummax().diff().gt(0).cumsum()

This takes the cumulative maximum of the 'last' column, computes its row-to-row difference (cummax_t - cummax_{t-1}), checks whether that difference is greater than zero, and counts how many times it is True.
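Broken into intermediate steps, the chain reads roughly like this (a sketch; the intermediate names are just for illustration):

running_high = df['last'].cummax()           # highest price seen so far
step = running_high.diff()                   # cummax_t - cummax_{t-1}; NaN on the first row
is_new_high = step.gt(0)                     # True exactly when the running high rises
df['newhighscount'] = is_new_high.cumsum()   # running count of new highs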

Answer 1 (score: 0)

You want to label the unique 'currenthigh' values. There are several ways to do that:

ngroup

df['NewCount'] = df.groupby('currenthigh', sort=False).ngroup()

rank

Because cummax is guaranteed to be monotonically non-decreasing, a dense rank works here as well.

df['NewCount'] = (df.currenthigh.rank(method='dense')-1).astype(int)

map

arr = df.currenthigh.unique()  # unique() preserves order of first appearance
df['NewCount'] = df.currenthigh.map({v: i for i, v in enumerate(arr)})

Output:

                         last  currenthigh  NewCount
datetime                                            
2019-04-02 07:57:33  389.8400       389.84         0
2019-04-02 07:57:33  389.8400       389.84         0
2019-04-02 07:57:33  389.8700       389.87         1
2019-04-02 07:57:33  389.8800       389.88         2
2019-04-02 07:57:33  389.9000       389.90         3
2019-04-02 07:57:33  389.9600       389.96         4
2019-04-02 07:57:35  389.9000       389.96         4
2019-04-02 07:57:36  389.9000       389.96         4
2019-04-02 08:00:00  389.3603       389.96         4
2019-04-02 08:00:00  388.8500       389.96         4
2019-04-02 08:00:00  390.0000       390.00         5
2019-04-02 08:00:01  389.7452       390.00         5
2019-04-02 08:00:01  389.4223       390.00         5
2019-04-02 08:00:01  389.8000       390.00         5
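All three variants label each distinct running high 0, 1, 2, ... in order of first appearance, so they can be cross-checked with something like this (a sketch using the columns above):

ngroup_labels = df.groupby('currenthigh', sort=False).ngroup()
rank_labels = (df['currenthigh'].rank(method='dense') - 1).astype(int)
assert (ngroup_labels == rank_labels).all()  # identical labels on this data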

Answer 2 (score: 0)

Edit: Given your data, the single command below is sufficient:

df['newhighscount'] = (df['currenthigh'] > df['currenthigh_shift']).astype(int).cumsum()

Original:
Your logic still works, although it is not as elegant as the other answers; it just needs a small twist.

In [983]: df
Out[983]:
               datetime      last  currenthigh  currenthigh_shift   newhighscount
31 2019-04-02  07:57:33  389.8400       389.84                NaN               0
32 2019-04-02  07:57:33  389.8400       389.84             389.84               0
33 2019-04-02  07:57:33  389.8700       389.87             389.84               0
34 2019-04-02  07:57:33  389.8800       389.88             389.87               0
35 2019-04-02  07:57:33  389.9000       389.90             389.88               0
36 2019-04-02  07:57:33  389.9600       389.96             389.90               0
37 2019-04-02  07:57:35  389.9000       389.96             389.96               0
38 2019-04-02  07:57:36  389.9000       389.96             389.96               0
39 2019-04-02  08:00:00  389.3603       389.96             389.96               0
40 2019-04-02  08:00:00  388.8500       389.96             389.96               0
41 2019-04-02  08:00:00  390.0000       390.00             389.96               0
42 2019-04-02  08:00:01  389.7452       390.00             390.00               0
43 2019-04-02  08:00:01  389.4223       390.00             390.00               0
44 2019-04-02  08:00:01  389.8000       390.00             390.00               0

In [985]: df.loc[df['currenthigh'] > df['currenthigh_shift'], 'newhighscount'] = (df['currenthigh'] > df['currenthigh_shift']).astype(int).cumsum()
In [989]: df['newhighscount'] = df['newhighscount'].cummax()
In [990]: df
Out[990]:
               datetime      last  currenthigh  currenthigh_shift  newhighscount
31 2019-04-02  07:57:33  389.8400       389.84                NaN              0
32 2019-04-02  07:57:33  389.8400       389.84             389.84              0
33 2019-04-02  07:57:33  389.8700       389.87             389.84              1
34 2019-04-02  07:57:33  389.8800       389.88             389.87              2
35 2019-04-02  07:57:33  389.9000       389.90             389.88              3
36 2019-04-02  07:57:33  389.9600       389.96             389.90              4
37 2019-04-02  07:57:35  389.9000       389.96             389.96              4
38 2019-04-02  07:57:36  389.9000       389.96             389.96              4
39 2019-04-02  08:00:00  389.3603       389.96             389.96              4
40 2019-04-02  08:00:00  388.8500       389.96             389.96              4
41 2019-04-02  08:00:00  390.0000       390.00             389.96              5
42 2019-04-02  08:00:01  389.7452       390.00             390.00              5
43 2019-04-02  08:00:01  389.4223       390.00             390.00              5
44 2019-04-02  08:00:01  389.8000       390.00             390.00              5
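
Outside the interactive session, the corrected two-step version condenses to something like this (a sketch, assuming newhighscount starts at 0 as in the transcript above):

# number the rows where a new high occurs, then carry the count forward
# with cummax so the flat rows keep the latest value
is_new_high = df['currenthigh'] > df['currenthigh_shift']
df.loc[is_new_high, 'newhighscount'] = is_new_high.astype(int).cumsum()
df['newhighscount'] = df['newhighscount'].cummax()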