How to count increases of a cumulative maximum

Asked: 2019-04-04 17:08:04

Tags: python pandas

I have a column (price) whose value changes over time. From one row to the next, the value increases, decreases, or stays the same. I want to count the number of times the value reaches a new high.

So I added a column, currenthigh, that tracks the highest value seen so far. Then I added another column, currenthigh_prev, which is the currenthigh column shifted down by one row. That way I can compare the two values, current and previous: if currenthigh > currenthigh_prev, there is a new high, which gets recorded in newhighscount.
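(For reference, the helper columns described above could presumably be built roughly like this; a minimal sketch, assuming df already holds the data shown below and 'last' is the price column.)

# running high of the price so far, and the previous row's running high
df['currenthigh'] = df['last'].cummax()
df['currenthigh_shift'] = df['currenthigh'].shift(1)
df['newhighscount'] = 0  # counter column, still to be filled in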

I have been trying to use .cummax() for this:

df.loc[df['currenthigh'] > df['currenthigh_shift'], 'newhighscount'] = df['newhighscount'].cummax() + 1

This is what I expected:

              datetime      last  currenthigh  currenthigh_shift  **newhighscount** 
31 2019-04-02 07:57:33  389.8400       389.84                NaN              0 
32 2019-04-02 07:57:33  389.8400       389.84             389.84              0 
33 2019-04-02 07:57:33  389.8700       389.87             389.84              **1** 
34 2019-04-02 07:57:33  389.8800       389.88             389.87              **2** 
35 2019-04-02 07:57:33  389.9000       389.90             389.88              **3** 
36 2019-04-02 07:57:33  389.9600       389.96             389.90              **4** 
37 2019-04-02 07:57:35  389.9000       389.96             389.96              **4** 
38 2019-04-02 07:57:36  389.9000       389.96             389.96              **4** 
39 2019-04-02 08:00:00  389.3603       389.96             389.96              **4** 
40 2019-04-02 08:00:00  388.8500       389.96             389.96              **4** 
41 2019-04-02 08:00:00  390.0000       390.00             389.96              **5** 
42 2019-04-02 08:00:01  389.7452       390.00             390.00              **5** 
43 2019-04-02 08:00:01  389.4223       390.00             390.00              5 
44 2019-04-02 08:00:01  389.8000       390.00             390.00              5 

This is what I am getting instead:

              datetime      last  currenthigh  currenthigh_shift  newhighscount 
31 2019-04-02 07:57:33  389.8400       389.84                NaN              0 
32 2019-04-02 07:57:33  389.8400       389.84             389.84              0 
33 2019-04-02 07:57:33  389.8700       389.87             389.84              1 
34 2019-04-02 07:57:33  389.8800       389.88             389.87              1 
35 2019-04-02 07:57:33  389.9000       389.90             389.88              1 
36 2019-04-02 07:57:33  389.9600       389.96             389.90              1 
37 2019-04-02 07:57:35  389.9000       389.96             389.96              0 
38 2019-04-02 07:57:36  389.9000       389.96             389.96              0 
39 2019-04-02 08:00:00  389.3603       389.96             389.96              0 
40 2019-04-02 08:00:00  388.8500       389.96             389.96              0 
41 2019-04-02 08:00:00  390.0000       390.00             389.96              1 
42 2019-04-02 08:00:01  389.7452       390.00             390.00              0 
43 2019-04-02 08:00:01  389.4223       390.00             390.00              0 
44 2019-04-02 08:00:01  389.8000       390.00             390.00              0 

Basically, df['newhighscount'].cummax() does not seem to return anything useful.

3 Answers:

Answer 0 (score: 2)

df['newhighscount'] = df['last'].cummax().diff().gt(0).cumsum()

This takes the cumulative maximum of the 'last' column, computes its row-to-row difference (cummax_t - cummax_{t-1}), checks whether that difference is greater than zero, and counts how many times it is True.
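Broken into intermediate steps, the chain reads roughly like this (a sketch; the intermediate names are just for illustration):

running_high = df['last'].cummax()           # highest price seen so far
step = running_high.diff()                   # cummax_t - cummax_{t-1}; NaN on the first row
is_new_high = step.gt(0)                     # True exactly when the running high rises
df['newhighscount'] = is_new_high.cumsum()   # running count of new highs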

Answer 1 (score: 0)

You want to label the unique 'currenthigh' values. There are several ways to do that:

ngroup

df['NewCount'] = df.groupby('currenthigh', sort=False).ngroup()

rank

Because cummax is guaranteed to be monotonically non-decreasing, a dense rank works here as well.

df['NewCount'] = (df.currenthigh.rank(method='dense')-1).astype(int)

map

arr = df.currenthigh.unique()  # unique() preserves order of first appearance
df['NewCount'] = df.currenthigh.map({v: i for i, v in enumerate(arr)})

Output:

                         last  currenthigh  NewCount
datetime                                            
2019-04-02 07:57:33  389.8400       389.84         0
2019-04-02 07:57:33  389.8400       389.84         0
2019-04-02 07:57:33  389.8700       389.87         1
2019-04-02 07:57:33  389.8800       389.88         2
2019-04-02 07:57:33  389.9000       389.90         3
2019-04-02 07:57:33  389.9600       389.96         4
2019-04-02 07:57:35  389.9000       389.96         4
2019-04-02 07:57:36  389.9000       389.96         4
2019-04-02 08:00:00  389.3603       389.96         4
2019-04-02 08:00:00  388.8500       389.96         4
2019-04-02 08:00:00  390.0000       390.00         5
2019-04-02 08:00:01  389.7452       390.00         5
2019-04-02 08:00:01  389.4223       390.00         5
2019-04-02 08:00:01  389.8000       390.00         5
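All three variants label each distinct running high 0, 1, 2, ... in order of first appearance, so they can be cross-checked with something like this (a sketch using the columns above):

ngroup_labels = df.groupby('currenthigh', sort=False).ngroup()
rank_labels = (df['currenthigh'].rank(method='dense') - 1).astype(int)
assert (ngroup_labels == rank_labels).all()  # identical labels on this data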

Answer 2 (score: 0)

Edit: Given your data, the single command below is sufficient:

df['newhighscount'] = (df['currenthigh'] > df['currenthigh_shift']).astype(int).cumsum()

Original:
Your logic still works, although it is not as elegant as the other answers; it just needs a small twist.

In [983]: df
Out[983]:
               datetime      last  currenthigh  currenthigh_shift   newhighscount
31 2019-04-02  07:57:33  389.8400       389.84                NaN               0
32 2019-04-02  07:57:33  389.8400       389.84             389.84               0
33 2019-04-02  07:57:33  389.8700       389.87             389.84               0
34 2019-04-02  07:57:33  389.8800       389.88             389.87               0
35 2019-04-02  07:57:33  389.9000       389.90             389.88               0
36 2019-04-02  07:57:33  389.9600       389.96             389.90               0
37 2019-04-02  07:57:35  389.9000       389.96             389.96               0
38 2019-04-02  07:57:36  389.9000       389.96             389.96               0
39 2019-04-02  08:00:00  389.3603       389.96             389.96               0
40 2019-04-02  08:00:00  388.8500       389.96             389.96               0
41 2019-04-02  08:00:00  390.0000       390.00             389.96               0
42 2019-04-02  08:00:01  389.7452       390.00             390.00               0
43 2019-04-02  08:00:01  389.4223       390.00             390.00               0
44 2019-04-02  08:00:01  389.8000       390.00             390.00               0

In [985]: df.loc[df['currenthigh'] > df['currenthigh_shift'], 'newhighscount'] = (df['currenthigh'] > df['currenthigh_shift']).astype(int).cumsum()
In [989]: df['newhighscount'] = df['newhighscount'].cummax()
In [990]: df
Out[990]:
               datetime      last  currenthigh  currenthigh_shift  newhighscount
31 2019-04-02  07:57:33  389.8400       389.84                NaN              0
32 2019-04-02  07:57:33  389.8400       389.84             389.84              0
33 2019-04-02  07:57:33  389.8700       389.87             389.84              1
34 2019-04-02  07:57:33  389.8800       389.88             389.87              2
35 2019-04-02  07:57:33  389.9000       389.90             389.88              3
36 2019-04-02  07:57:33  389.9600       389.96             389.90              4
37 2019-04-02  07:57:35  389.9000       389.96             389.96              4
38 2019-04-02  07:57:36  389.9000       389.96             389.96              4
39 2019-04-02  08:00:00  389.3603       389.96             389.96              4
40 2019-04-02  08:00:00  388.8500       389.96             389.96              4
41 2019-04-02  08:00:00  390.0000       390.00             389.96              5
42 2019-04-02  08:00:01  389.7452       390.00             390.00              5
43 2019-04-02  08:00:01  389.4223       390.00             390.00              5
44 2019-04-02  08:00:01  389.8000       390.00             390.00              5
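
Outside the interactive session, the corrected two-step version condenses to something like this (a sketch, assuming newhighscount starts at 0 as in the transcript above):

# number the rows where a new high occurs, then carry the count forward
# with cummax so the flat rows keep the latest value
is_new_high = df['currenthigh'] > df['currenthigh_shift']
df.loc[is_new_high, 'newhighscount'] = is_new_high.astype(int).cumsum()
df['newhighscount'] = df['newhighscount'].cummax()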