Capping values based on a sporadic trigger

Time: 2015-06-27 22:13:28

Tags: python pandas triggers

I want to mention up front that this question is very close to question #30990147:

Capping values after a trigger level in a different variable _after GroupBy

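The difference is that in this case the variable that triggers the cap does not keep triggering as we move further away from the center of the data (a can turn positive again further out, yet those rows must still be capped). In question 30990147, once the trigger fired, all subsequent values also triggered the same cap.

Here I need to look for the trigger in both directions, starting from min(dist). In the example below, the trigger is the first negative value of a on either side of min(dist). For the first City/date group, min(dist) is at index = 7.

I manually added the columns is_capped and new_b to illustrate where the trigger levels occur.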

In [6]:

df
Out[6]:
       City       date  dist     a   b is_capped  new_b
0   Chicago  5/25/2015  6.55  0.10  36      True     37
1   Chicago  5/25/2015  3.93  0.16  21      True     37
2   Chicago  5/25/2015  3.27  0.06  32      True     37
3   Chicago  5/25/2015  2.62 -0.28  35      True     37
4   Chicago  5/25/2015  1.96  0.09  37     False     37
5   Chicago  5/25/2015  1.31  0.04  39     False     39
6   Chicago  5/25/2015  0.65  0.02  34     False     34
7   Chicago  5/25/2015  0.03  0.09  23     False     23
8   Chicago  5/25/2015  0.58  0.03  36     False     36
9   Chicago  5/25/2015  1.16  0.06  35     False     35
10  Chicago  5/25/2015  2.31  0.05  36     False     36
11  Chicago  5/25/2015  2.89 -0.41  20      True     36
12  Chicago  5/25/2015  3.47 -0.38  35      True     36
13  Chicago  6/16/2015  6.55  0.30  36      True     37
14  Chicago  6/16/2015  3.93  0.16  21      True     37
15  Chicago  6/16/2015  3.27  0.06  32      True     37
16  Chicago  6/16/2015  2.62 -0.28  35      True     37
17  Chicago  6/16/2015  1.96  0.09  37     False     37
18  Chicago  6/16/2015  1.31  0.04  39     False     39
19  Chicago  6/16/2015  0.65  0.02  34     False     34
20  Chicago  6/16/2015  0.03  0.09  23     False     23
21  Chicago  6/16/2015  0.58  0.03  36     False     36
22  Chicago  6/16/2015  1.16  0.06  35     False     35
23  Chicago  6/16/2015  2.31  0.05  36     False     36
24  Chicago  6/16/2015  2.89 -0.41  20      True     36
25  Chicago  6/16/2015  3.47 -0.38  35      True     36
26      NYC  2/22/2015  6.55  0.10  36      True     37
27      NYC  2/22/2015  3.93  0.16  21      True     37
28      NYC  2/22/2015  3.27  0.06  32      True     37
29      NYC  2/22/2015  2.62 -0.28  35      True     37
30      NYC  2/22/2015  1.96  0.09  37     False     37
31      NYC  2/22/2015  1.31  0.04  39     False     39
32      NYC  2/22/2015  0.65  0.02  34     False     34
33      NYC  2/22/2015  0.03  0.09  23     False     23
34      NYC  2/22/2015  0.58  0.03  36     False     36
35      NYC  2/22/2015  1.16  0.06  35     False     35
36      NYC  2/22/2015  2.31  0.05  36     False     36
37      NYC  2/22/2015  2.89 -0.41  20      True     36
38      NYC  2/22/2015  3.47 -0.38  35      True     36
39      NYC   5/5/2015  6.55  0.30  36      True     37
40      NYC   5/5/2015  3.93  0.16  21      True     37
41      NYC   5/5/2015  3.27  0.06  32      True     37
42      NYC   5/5/2015  2.62 -0.28  35      True     37
43      NYC   5/5/2015  1.96  0.09  37     False     37
44      NYC   5/5/2015  1.31  0.04  39     False     39
45      NYC   5/5/2015  0.65  0.02  34     False     34
46      NYC   5/5/2015  0.03  0.09  23     False     23
47      NYC   5/5/2015  0.58  0.03  36     False     36
48      NYC   5/5/2015  1.16  0.06  35     False     35
49      NYC   5/5/2015  2.31  0.05  36     False     36
50      NYC   5/5/2015  2.89 -0.41  20      True     36
51      NYC   5/5/2015  3.47 -0.38  35      True     36

I then group the data as follows:

gb = df.groupby(['City','date'])

Everything seems fine:

In [6]:
gb.City.count()

Out[6]:
City     date     
Chicago  5/25/2015    13
         6/16/2015    13
NYC      2/22/2015    13
         5/5/2015     13
Name: City, dtype: int64

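What I need is to look on both sides of min(dist) for the first occurrence of a < 0. In the first group (Chicago, 5/25/2015), the downside trigger occurs at index = 3, so every value further out on that side (by abs(dist)) gets the same b level as the row just before the trigger (index = 4). The same thing happens moving up from min(dist): the upside trigger is at index = 11, and the new_b values from there on are all set to the value of b at index = 10.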
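To make the rule concrete, here is a minimal sketch that only locates the two triggers for the first group. The values are copied from the table above; the names `centre`, `first_trigger`, `up_trigger` and `down_trigger` are illustrative and not part of the original question.

```python
import pandas as pd

# First group (Chicago, 5/25/2015), copied from the table in the question.
dist = pd.Series([6.55, 3.93, 3.27, 2.62, 1.96, 1.31, 0.65,
                  0.03, 0.58, 1.16, 2.31, 2.89, 3.47])
a = pd.Series([0.10, 0.16, 0.06, -0.28, 0.09, 0.04, 0.02,
               0.09, 0.03, 0.06, 0.05, -0.41, -0.38])
b = pd.Series([36, 21, 32, 35, 37, 39, 34, 23, 36, 35, 36, 20, 35])

# Position of min(dist); the index is 0..12, so position and label coincide.
centre = int(dist.to_numpy().argmin())


def first_trigger(side_a):
    """Return the label of the first negative value, or None if there is none.

    side_a must be ordered walking away from min(dist).
    """
    negative = side_a[side_a < 0]
    return negative.index[0] if len(negative) else None


up_trigger = first_trigger(a.iloc[centre:])        # walk towards the last row
down_trigger = first_trigger(a.iloc[centre::-1])   # walk towards the first row

print(centre, down_trigger, up_trigger)
# The cap on each side is the b value one step closer to min(dist).
print(b[down_trigger + 1], b[up_trigger - 1])
```

The `+ 1` and `- 1` steps rely on the default integer index and on a trigger existing on both sides; they are only meant to illustrate where the cap values in the table come from.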

Also, there need not be a trigger at all.
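As a quick illustration of the no-trigger case, the sketch below uses a small made-up side (these values are hypothetical, not taken from the data above) in which a never goes negative. With a cumulative-sum mask, nothing is capped and new_b simply equals b.

```python
import pandas as pd

# Hypothetical side of a group in which a never goes negative.
a = pd.Series([0.10, 0.16, 0.06, 0.09])
b = pd.Series([36, 21, 32, 37])

# The running count of negatives stays at 0, so the mask is False everywhere.
is_capped = (a < 0).cumsum().astype(bool)

# Blank out capped rows (none here), then carry the last kept b forward.
new_b = b.where(~is_capped).ffill()

print(is_capped.tolist())
print(new_b.tolist())
```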

Thanks for your help

John

1 answer:

Answer 0: (score: 1)

import pandas as pd
import numpy as np
import io  # I use py3.4

# your data
raw_data = ',City,date,dist,a,b\n0,Chicago,5/25/2015,6.55,0.1,36\n1,Chicago,5/25/2015,3.93,0.16,21\n2,Chicago,5/25/2015,3.27,0.06,32\n3,Chicago,5/25/2015,2.62,-0.28,35\n4,Chicago,5/25/2015,1.96,0.09,37\n5,Chicago,5/25/2015,1.31,0.04,39\n6,Chicago,5/25/2015,0.65,0.02,34\n7,Chicago,5/25/2015,0.03,0.09,23\n8,Chicago,5/25/2015,0.58,0.03,36\n9,Chicago,5/25/2015,1.16,0.06,35\n10,Chicago,5/25/2015,2.31,0.05,36\n11,Chicago,5/25/2015,2.89,-0.41,20\n12,Chicago,5/25/2015,3.47,-0.38,35\n13,Chicago,6/16/2015,6.55,0.3,36\n14,Chicago,6/16/2015,3.93,0.16,21\n15,Chicago,6/16/2015,3.27,0.06,32\n16,Chicago,6/16/2015,2.62,-0.28,35\n17,Chicago,6/16/2015,1.96,0.09,37\n18,Chicago,6/16/2015,1.31,0.04,39\n19,Chicago,6/16/2015,0.65,0.02,34\n20,Chicago,6/16/2015,0.03,0.09,23\n21,Chicago,6/16/2015,0.58,0.03,36\n22,Chicago,6/16/2015,1.16,0.06,35\n23,Chicago,6/16/2015,2.31,0.05,36\n24,Chicago,6/16/2015,2.89,-0.41,20\n25,Chicago,6/16/2015,3.47,-0.38,35\n26,NYC,2/22/2015,6.55,0.1,36\n27,NYC,2/22/2015,3.93,0.16,21\n28,NYC,2/22/2015,3.27,0.06,32\n29,NYC,2/22/2015,2.62,-0.28,35\n30,NYC,2/22/2015,1.96,0.09,37\n31,NYC,2/22/2015,1.31,0.04,39\n32,NYC,2/22/2015,0.65,0.02,34\n33,NYC,2/22/2015,0.03,0.09,23\n34,NYC,2/22/2015,0.58,0.03,36\n35,NYC,2/22/2015,1.16,0.06,35\n36,NYC,2/22/2015,2.31,0.05,36\n37,NYC,2/22/2015,2.89,-0.41,20\n38,NYC,2/22/2015,3.47,-0.38,35\n39,NYC,5/5/2015,6.55,0.3,36\n40,NYC,5/5/2015,3.93,0.16,21\n41,NYC,5/5/2015,3.27,0.06,32\n42,NYC,5/5/2015,2.62,-0.28,35\n43,NYC,5/5/2015,1.96,0.09,37\n44,NYC,5/5/2015,1.31,0.04,39\n45,NYC,5/5/2015,0.65,0.02,34\n46,NYC,5/5/2015,0.03,0.09,23\n47,NYC,5/5/2015,0.58,0.03,36\n48,NYC,5/5/2015,1.16,0.06,35\n49,NYC,5/5/2015,2.31,0.05,36\n50,NYC,5/5/2015,2.89,-0.41,20\n51,NYC,5/5/2015,3.47,-0.38,35\n'

df = pd.read_csv(io.StringIO(raw_data), index_col=[0])

   Out[105]: 
          City       date  dist     a   b
   0   Chicago  5/25/2015  6.55  0.10  36
   1   Chicago  5/25/2015  3.93  0.16  21
   2   Chicago  5/25/2015  3.27  0.06  32
   3   Chicago  5/25/2015  2.62 -0.28  35
   4   Chicago  5/25/2015  1.96  0.09  37
   5   Chicago  5/25/2015  1.31  0.04  39
   6   Chicago  5/25/2015  0.65  0.02  34
   7   Chicago  5/25/2015  0.03  0.09  23
   8   Chicago  5/25/2015  0.58  0.03  36
   9   Chicago  5/25/2015  1.16  0.06  35
   10  Chicago  5/25/2015  2.31  0.05  36
   11  Chicago  5/25/2015  2.89 -0.41  20
   12  Chicago  5/25/2015  3.47 -0.38  35
   13  Chicago  6/16/2015  6.55  0.30  36
   14  Chicago  6/16/2015  3.93  0.16  21
   15  Chicago  6/16/2015  3.27  0.06  32
   16  Chicago  6/16/2015  2.62 -0.28  35
   17  Chicago  6/16/2015  1.96  0.09  37
   18  Chicago  6/16/2015  1.31  0.04  39
   19  Chicago  6/16/2015  0.65  0.02  34
   20  Chicago  6/16/2015  0.03  0.09  23
   21  Chicago  6/16/2015  0.58  0.03  36
   22  Chicago  6/16/2015  1.16  0.06  35
   23  Chicago  6/16/2015  2.31  0.05  36
   24  Chicago  6/16/2015  2.89 -0.41  20
   25  Chicago  6/16/2015  3.47 -0.38  35
   26      NYC  2/22/2015  6.55  0.10  36
   27      NYC  2/22/2015  3.93  0.16  21
   28      NYC  2/22/2015  3.27  0.06  32
   29      NYC  2/22/2015  2.62 -0.28  35
   30      NYC  2/22/2015  1.96  0.09  37
   31      NYC  2/22/2015  1.31  0.04  39
   32      NYC  2/22/2015  0.65  0.02  34
   33      NYC  2/22/2015  0.03  0.09  23
   34      NYC  2/22/2015  0.58  0.03  36
   35      NYC  2/22/2015  1.16  0.06  35
   36      NYC  2/22/2015  2.31  0.05  36
   37      NYC  2/22/2015  2.89 -0.41  20
   38      NYC  2/22/2015  3.47 -0.38  35
   39      NYC   5/5/2015  6.55  0.30  36
   40      NYC   5/5/2015  3.93  0.16  21
   41      NYC   5/5/2015  3.27  0.06  32
   42      NYC   5/5/2015  2.62 -0.28  35
   43      NYC   5/5/2015  1.96  0.09  37
   44      NYC   5/5/2015  1.31  0.04  39
   45      NYC   5/5/2015  0.65  0.02  34
   46      NYC   5/5/2015  0.03  0.09  23
   47      NYC   5/5/2015  0.58  0.03  36
   48      NYC   5/5/2015  1.16  0.06  35
   49      NYC   5/5/2015  2.31  0.05  36
   50      NYC   5/5/2015  2.89 -0.41  20
   51      NYC   5/5/2015  3.47 -0.38  35


def custom_func(group):
    # positional location of min-dist within the group
    # (Series.argmin returns a position in current pandas, so slice with iloc)
    min_pos = group['dist'].to_numpy().argmin()
    # processing the side before min-dist
    # ==================================================
    # reverse the order: walk from min-dist back to the first row
    temp1 = group.iloc[min_pos::-1].copy()
    # get the first negative trigger, use the cumsum trick:
    # the running count of negatives is non-zero from the first trigger onwards
    temp1['is_capped'] = (temp1['a'] < 0).cumsum().astype(bool)
    # keep b where not capped, then carry the last uncapped b forward
    # (new_b becomes float because of the intermediate NaN)
    temp1['new_b'] = temp1['b'].where(~temp1['is_capped']).ffill()
    # processing the side after min-dist
    # ==================================================
    # keep the order: walk from min-dist to the last row
    temp2 = group.iloc[min_pos:].copy()
    # get the first negative trigger, use the cumsum trick
    temp2['is_capped'] = (temp2['a'] < 0).cumsum().astype(bool)
    temp2['new_b'] = temp2['b'].where(~temp2['is_capped']).ffill()
    # combine, min-dist row is duplicated, so drop it from temp2
    # ==================================================
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    res = pd.concat([temp1.iloc[::-1], temp2.iloc[1:]])
    return res[['dist', 'a', 'b', 'is_capped', 'new_b']]

result = df.groupby(['City', 'date']).apply(custom_func).reset_index(level=['City', 'date'])
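Since both halves apply the same logic, one possible variation is to factor it into a helper. The sketch below is not part of the original answer: `cap_side` and `cap_both_sides` are hypothetical names. It runs on the first group only, so it can be checked against the is_capped and new_b columns that were added by hand in the question.

```python
import pandas as pd


def cap_side(side):
    """Cap b along one side; rows must be ordered walking away from min(dist)."""
    out = side.copy()
    # True from the first negative a onwards, even if a turns positive again.
    out['is_capped'] = (out['a'] < 0).cumsum().astype(bool)
    # Blank out capped rows, then carry the last uncapped b forward.
    out['new_b'] = out['b'].where(~out['is_capped']).ffill()
    return out


def cap_both_sides(group):
    pos = group['dist'].to_numpy().argmin()
    # Walk backwards from min(dist), then restore the original row order.
    before = cap_side(group.iloc[pos::-1]).iloc[::-1]
    # Walk forwards from min(dist); drop the min(dist) row, already in `before`.
    after = cap_side(group.iloc[pos:]).iloc[1:]
    return pd.concat([before, after])


# First group (Chicago, 5/25/2015), copied from the question.
group = pd.DataFrame({
    'dist': [6.55, 3.93, 3.27, 2.62, 1.96, 1.31, 0.65,
             0.03, 0.58, 1.16, 2.31, 2.89, 3.47],
    'a': [0.10, 0.16, 0.06, -0.28, 0.09, 0.04, 0.02,
          0.09, 0.03, 0.06, 0.05, -0.41, -0.38],
    'b': [36, 21, 32, 35, 37, 39, 34, 23, 36, 35, 36, 20, 35],
})

capped = cap_both_sides(group)
print(capped)
```

The helper should also work per group through `df.groupby(['City', 'date']).apply(cap_both_sides)`, although how the grouping columns appear in the result depends on the pandas version.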

Out[394]: 
       City       date  dist     a   b is_capped  new_b
0   Chicago  5/25/2015  6.55  0.10  36      True     37
1   Chicago  5/25/2015  3.93  0.16  21      True     37
2   Chicago  5/25/2015  3.27  0.06  32      True     37
3   Chicago  5/25/2015  2.62 -0.28  35      True     37
4   Chicago  5/25/2015  1.96  0.09  37     False     37
5   Chicago  5/25/2015  1.31  0.04  39     False     39
6   Chicago  5/25/2015  0.65  0.02  34     False     34
7   Chicago  5/25/2015  0.03  0.09  23     False     23
8   Chicago  5/25/2015  0.58  0.03  36     False     36
9   Chicago  5/25/2015  1.16  0.06  35     False     35
10  Chicago  5/25/2015  2.31  0.05  36     False     36
11  Chicago  5/25/2015  2.89 -0.41  20      True     36
12  Chicago  5/25/2015  3.47 -0.38  35      True     36
13  Chicago  6/16/2015  6.55  0.30  36      True     37
14  Chicago  6/16/2015  3.93  0.16  21      True     37
15  Chicago  6/16/2015  3.27  0.06  32      True     37
16  Chicago  6/16/2015  2.62 -0.28  35      True     37
17  Chicago  6/16/2015  1.96  0.09  37     False     37
18  Chicago  6/16/2015  1.31  0.04  39     False     39
19  Chicago  6/16/2015  0.65  0.02  34     False     34
20  Chicago  6/16/2015  0.03  0.09  23     False     23
21  Chicago  6/16/2015  0.58  0.03  36     False     36
22  Chicago  6/16/2015  1.16  0.06  35     False     35
23  Chicago  6/16/2015  2.31  0.05  36     False     36
24  Chicago  6/16/2015  2.89 -0.41  20      True     36
25  Chicago  6/16/2015  3.47 -0.38  35      True     36
26      NYC  2/22/2015  6.55  0.10  36      True     37
27      NYC  2/22/2015  3.93  0.16  21      True     37
28      NYC  2/22/2015  3.27  0.06  32      True     37
29      NYC  2/22/2015  2.62 -0.28  35      True     37
30      NYC  2/22/2015  1.96  0.09  37     False     37
31      NYC  2/22/2015  1.31  0.04  39     False     39
32      NYC  2/22/2015  0.65  0.02  34     False     34
33      NYC  2/22/2015  0.03  0.09  23     False     23
34      NYC  2/22/2015  0.58  0.03  36     False     36
35      NYC  2/22/2015  1.16  0.06  35     False     35
36      NYC  2/22/2015  2.31  0.05  36     False     36
37      NYC  2/22/2015  2.89 -0.41  20      True     36
38      NYC  2/22/2015  3.47 -0.38  35      True     36
39      NYC   5/5/2015  6.55  0.30  36      True     37
40      NYC   5/5/2015  3.93  0.16  21      True     37
41      NYC   5/5/2015  3.27  0.06  32      True     37
42      NYC   5/5/2015  2.62 -0.28  35      True     37
43      NYC   5/5/2015  1.96  0.09  37     False     37
44      NYC   5/5/2015  1.31  0.04  39     False     39
45      NYC   5/5/2015  0.65  0.02  34     False     34
46      NYC   5/5/2015  0.03  0.09  23     False     23
47      NYC   5/5/2015  0.58  0.03  36     False     36
48      NYC   5/5/2015  1.16  0.06  35     False     35
49      NYC   5/5/2015  2.31  0.05  36     False     36
50      NYC   5/5/2015  2.89 -0.41  20      True     36
51      NYC   5/5/2015  3.47 -0.38  35      True     36