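I want to mention up front that this question is very close to question #30990147:
Capping values after a trigger level in a different variable _after GroupBy
The difference is that here the variable that triggers the capping does not keep triggering as we move further from the center of the data. In question 30990147, once the trigger fired, all subsequent values also triggered the same cap.
In this case I need to check for the trigger in both directions from min(dist). In the example below, the trigger is the first negative value of a on either side of min(dist). min(dist) is at index = 7 in the first City/date group.
I manually added the columns is_capped and new_b to illustrate where the trigger levels occur.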
In [6]:
df
Out[6]:
City date dist a b is_capped new_b
0 Chicago 5/25/2015 6.55 0.10 36 True 37
1 Chicago 5/25/2015 3.93 0.16 21 True 37
2 Chicago 5/25/2015 3.27 0.06 32 True 37
3 Chicago 5/25/2015 2.62 -0.28 35 True 37
4 Chicago 5/25/2015 1.96 0.09 37 False 37
5 Chicago 5/25/2015 1.31 0.04 39 False 39
6 Chicago 5/25/2015 0.65 0.02 34 False 34
7 Chicago 5/25/2015 0.03 0.09 23 False 23
8 Chicago 5/25/2015 0.58 0.03 36 False 36
9 Chicago 5/25/2015 1.16 0.06 35 False 35
10 Chicago 5/25/2015 2.31 0.05 36 False 36
11 Chicago 5/25/2015 2.89 -0.41 20 True 36
12 Chicago 5/25/2015 3.47 -0.38 35 True 36
13 Chicago 6/16/2015 6.55 0.30 36 True 37
14 Chicago 6/16/2015 3.93 0.16 21 True 37
15 Chicago 6/16/2015 3.27 0.06 32 True 37
16 Chicago 6/16/2015 2.62 -0.28 35 True 37
17 Chicago 6/16/2015 1.96 0.09 37 False 37
18 Chicago 6/16/2015 1.31 0.04 39 False 39
19 Chicago 6/16/2015 0.65 0.02 34 False 34
20 Chicago 6/16/2015 0.03 0.09 23 False 23
21 Chicago 6/16/2015 0.58 0.03 36 False 36
22 Chicago 6/16/2015 1.16 0.06 35 False 35
23 Chicago 6/16/2015 2.31 0.05 36 False 36
24 Chicago 6/16/2015 2.89 -0.41 20 True 36
25 Chicago 6/16/2015 3.47 -0.38 35 True 36
26 NYC 2/22/2015 6.55 0.10 36 True 37
27 NYC 2/22/2015 3.93 0.16 21 True 37
28 NYC 2/22/2015 3.27 0.06 32 True 37
29 NYC 2/22/2015 2.62 -0.28 35 True 37
30 NYC 2/22/2015 1.96 0.09 37 False 37
31 NYC 2/22/2015 1.31 0.04 39 False 39
32 NYC 2/22/2015 0.65 0.02 34 False 34
33 NYC 2/22/2015 0.03 0.09 23 False 23
34 NYC 2/22/2015 0.58 0.03 36 False 36
35 NYC 2/22/2015 1.16 0.06 35 False 35
36 NYC 2/22/2015 2.31 0.05 36 False 36
37 NYC 2/22/2015 2.89 -0.41 20 True 36
38 NYC 2/22/2015 3.47 -0.38 35 True 36
39 NYC 5/5/2015 6.55 0.30 36 True 37
40 NYC 5/5/2015 3.93 0.16 21 True 37
41 NYC 5/5/2015 3.27 0.06 32 True 37
42 NYC 5/5/2015 2.62 -0.28 35 True 37
43 NYC 5/5/2015 1.96 0.09 37 False 37
44 NYC 5/5/2015 1.31 0.04 39 False 39
45 NYC 5/5/2015 0.65 0.02 34 False 34
46 NYC 5/5/2015 0.03 0.09 23 False 23
47 NYC 5/5/2015 0.58 0.03 36 False 36
48 NYC 5/5/2015 1.16 0.06 35 False 35
49 NYC 5/5/2015 2.31 0.05 36 False 36
50 NYC 5/5/2015 2.89 -0.41 20 True 36
51 NYC 5/5/2015 3.47 -0.38 35 True 36
Then I group the data as follows:
gb = df.groupby(['City','date'])
Everything seems fine:
In [6]:
gb.City.count()
Out[6]:
City date
Chicago 5/25/2015 13
6/16/2015 13
NYC 2/22/2015 13
5/5/2015 13
Name: City, dtype: int64
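What I need is to look on both sides of min(dist) for the first occurrence of a < 0. In the first group (Chicago, 5/25/2015), the downward trigger occurs at index = 3, so all values from there on down (by abs(dist)) get the same b level as the value just before the trigger (index = 4). The same thing happens going up from min(dist): the upward trigger is at index = 11, and the new_b values further up are all set to the b value at index = 10.
Also, a trigger need not occur at all.
Thanks for your help,
John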
Answer 0 (score: 1)
import pandas as pd
import numpy as np
import io # I use py3.4
# your data
raw_data = ',City,date,dist,a,b\n0,Chicago,5/25/2015,6.55,0.1,36\n1,Chicago,5/25/2015,3.93,0.16,21\n2,Chicago,5/25/2015,3.27,0.06,32\n3,Chicago,5/25/2015,2.62,-0.28,35\n4,Chicago,5/25/2015,1.96,0.09,37\n5,Chicago,5/25/2015,1.31,0.04,39\n6,Chicago,5/25/2015,0.65,0.02,34\n7,Chicago,5/25/2015,0.03,0.09,23\n8,Chicago,5/25/2015,0.58,0.03,36\n9,Chicago,5/25/2015,1.16,0.06,35\n10,Chicago,5/25/2015,2.31,0.05,36\n11,Chicago,5/25/2015,2.89,-0.41,20\n12,Chicago,5/25/2015,3.47,-0.38,35\n13,Chicago,6/16/2015,6.55,0.3,36\n14,Chicago,6/16/2015,3.93,0.16,21\n15,Chicago,6/16/2015,3.27,0.06,32\n16,Chicago,6/16/2015,2.62,-0.28,35\n17,Chicago,6/16/2015,1.96,0.09,37\n18,Chicago,6/16/2015,1.31,0.04,39\n19,Chicago,6/16/2015,0.65,0.02,34\n20,Chicago,6/16/2015,0.03,0.09,23\n21,Chicago,6/16/2015,0.58,0.03,36\n22,Chicago,6/16/2015,1.16,0.06,35\n23,Chicago,6/16/2015,2.31,0.05,36\n24,Chicago,6/16/2015,2.89,-0.41,20\n25,Chicago,6/16/2015,3.47,-0.38,35\n26,NYC,2/22/2015,6.55,0.1,36\n27,NYC,2/22/2015,3.93,0.16,21\n28,NYC,2/22/2015,3.27,0.06,32\n29,NYC,2/22/2015,2.62,-0.28,35\n30,NYC,2/22/2015,1.96,0.09,37\n31,NYC,2/22/2015,1.31,0.04,39\n32,NYC,2/22/2015,0.65,0.02,34\n33,NYC,2/22/2015,0.03,0.09,23\n34,NYC,2/22/2015,0.58,0.03,36\n35,NYC,2/22/2015,1.16,0.06,35\n36,NYC,2/22/2015,2.31,0.05,36\n37,NYC,2/22/2015,2.89,-0.41,20\n38,NYC,2/22/2015,3.47,-0.38,35\n39,NYC,5/5/2015,6.55,0.3,36\n40,NYC,5/5/2015,3.93,0.16,21\n41,NYC,5/5/2015,3.27,0.06,32\n42,NYC,5/5/2015,2.62,-0.28,35\n43,NYC,5/5/2015,1.96,0.09,37\n44,NYC,5/5/2015,1.31,0.04,39\n45,NYC,5/5/2015,0.65,0.02,34\n46,NYC,5/5/2015,0.03,0.09,23\n47,NYC,5/5/2015,0.58,0.03,36\n48,NYC,5/5/2015,1.16,0.06,35\n49,NYC,5/5/2015,2.31,0.05,36\n50,NYC,5/5/2015,2.89,-0.41,20\n51,NYC,5/5/2015,3.47,-0.38,35\n'
df = pd.read_csv(io.StringIO(raw_data), index_col=[0])
Out[105]:
City date dist a b
0 Chicago 5/25/2015 6.55 0.10 36
1 Chicago 5/25/2015 3.93 0.16 21
2 Chicago 5/25/2015 3.27 0.06 32
3 Chicago 5/25/2015 2.62 -0.28 35
4 Chicago 5/25/2015 1.96 0.09 37
5 Chicago 5/25/2015 1.31 0.04 39
6 Chicago 5/25/2015 0.65 0.02 34
7 Chicago 5/25/2015 0.03 0.09 23
8 Chicago 5/25/2015 0.58 0.03 36
9 Chicago 5/25/2015 1.16 0.06 35
10 Chicago 5/25/2015 2.31 0.05 36
11 Chicago 5/25/2015 2.89 -0.41 20
12 Chicago 5/25/2015 3.47 -0.38 35
13 Chicago 6/16/2015 6.55 0.30 36
14 Chicago 6/16/2015 3.93 0.16 21
15 Chicago 6/16/2015 3.27 0.06 32
16 Chicago 6/16/2015 2.62 -0.28 35
17 Chicago 6/16/2015 1.96 0.09 37
18 Chicago 6/16/2015 1.31 0.04 39
19 Chicago 6/16/2015 0.65 0.02 34
20 Chicago 6/16/2015 0.03 0.09 23
21 Chicago 6/16/2015 0.58 0.03 36
22 Chicago 6/16/2015 1.16 0.06 35
23 Chicago 6/16/2015 2.31 0.05 36
24 Chicago 6/16/2015 2.89 -0.41 20
25 Chicago 6/16/2015 3.47 -0.38 35
26 NYC 2/22/2015 6.55 0.10 36
27 NYC 2/22/2015 3.93 0.16 21
28 NYC 2/22/2015 3.27 0.06 32
29 NYC 2/22/2015 2.62 -0.28 35
30 NYC 2/22/2015 1.96 0.09 37
31 NYC 2/22/2015 1.31 0.04 39
32 NYC 2/22/2015 0.65 0.02 34
33 NYC 2/22/2015 0.03 0.09 23
34 NYC 2/22/2015 0.58 0.03 36
35 NYC 2/22/2015 1.16 0.06 35
36 NYC 2/22/2015 2.31 0.05 36
37 NYC 2/22/2015 2.89 -0.41 20
38 NYC 2/22/2015 3.47 -0.38 35
39 NYC 5/5/2015 6.55 0.30 36
40 NYC 5/5/2015 3.93 0.16 21
41 NYC 5/5/2015 3.27 0.06 32
42 NYC 5/5/2015 2.62 -0.28 35
43 NYC 5/5/2015 1.96 0.09 37
44 NYC 5/5/2015 1.31 0.04 39
45 NYC 5/5/2015 0.65 0.02 34
46 NYC 5/5/2015 0.03 0.09 23
47 NYC 5/5/2015 0.58 0.03 36
48 NYC 5/5/2015 1.16 0.06 35
49 NYC 5/5/2015 2.31 0.05 36
50 NYC 5/5/2015 2.89 -0.41 20
51 NYC 5/5/2015 3.47 -0.38 35
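The function below relies on what its comments call the cumsum trick. As a minimal sketch of that idea on its own, assuming a toy pair of Series (the values here are hypothetical and already ordered outward from the min(dist) row), the running count of negatives stays at 0 until the first trigger and is at least 1 from then on, so casting it to bool flags the trigger row and everything beyond it, even if a later value of a is positive again:

```python
import pandas as pd

# Hypothetical values, ordered outward from the min(dist) row.
a = pd.Series([0.09, 0.03, 0.06, 0.05, -0.41, 0.38])
b = pd.Series([23, 36, 35, 36, 20, 35])

# Running count of negatives: 0 before the first trigger, >= 1 from it onward.
is_capped = (a < 0).cumsum().astype(bool)

# Blank out b where capped, then carry the last uncapped value forward.
new_b = b.where(~is_capped).ffill()

print(pd.DataFrame({"a": a, "b": b, "is_capped": is_capped, "new_b": new_b}))
```

Note that new_b comes out as float, because the intermediate step holds NaN in the capped rows.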
def custom_func(group):
    # index label of the row with the smallest dist
    # (idxmin returns the label; argmin returns a position in current pandas)
    min_idx = group.dist.idxmin()
    # rows before min_idx
    # ==================================================
    # reverse the order so the rows run from min_idx back to the first row
    temp1 = group.loc[:min_idx].iloc[::-1].copy()
    # flag the first negative trigger and everything after it (cumsum trick)
    temp1['is_capped'] = (temp1.a < 0).astype(int).cumsum().astype(bool)
    # keep b only where not capped, then carry the last uncapped b forward
    temp1['new_b'] = temp1.b[~temp1.is_capped]
    temp1 = temp1.ffill()
    # rows after min_idx
    # ==================================================
    # already in outward order: from min_idx to the last row
    temp2 = group.loc[min_idx:].copy()
    # flag the first negative trigger and everything after it (cumsum trick)
    temp2['is_capped'] = (temp2.a < 0).astype(int).cumsum().astype(bool)
    temp2['new_b'] = temp2.b[~temp2.is_capped]
    temp2 = temp2.ffill()
    # combine; the min_idx row is in both halves, so drop it from temp2
    # ==================================================
    res = pd.concat([temp1.iloc[::-1], temp2.iloc[1:]])
    return res[['dist', 'a', 'b', 'is_capped', 'new_b']]

result = df.groupby(['City', 'date']).apply(custom_func).reset_index(level=['City', 'date'])
Out[394]:
City date dist a b is_capped new_b
0 Chicago 5/25/2015 6.55 0.10 36 True 37
1 Chicago 5/25/2015 3.93 0.16 21 True 37
2 Chicago 5/25/2015 3.27 0.06 32 True 37
3 Chicago 5/25/2015 2.62 -0.28 35 True 37
4 Chicago 5/25/2015 1.96 0.09 37 False 37
5 Chicago 5/25/2015 1.31 0.04 39 False 39
6 Chicago 5/25/2015 0.65 0.02 34 False 34
7 Chicago 5/25/2015 0.03 0.09 23 False 23
8 Chicago 5/25/2015 0.58 0.03 36 False 36
9 Chicago 5/25/2015 1.16 0.06 35 False 35
10 Chicago 5/25/2015 2.31 0.05 36 False 36
11 Chicago 5/25/2015 2.89 -0.41 20 True 36
12 Chicago 5/25/2015 3.47 -0.38 35 True 36
13 Chicago 6/16/2015 6.55 0.30 36 True 37
14 Chicago 6/16/2015 3.93 0.16 21 True 37
15 Chicago 6/16/2015 3.27 0.06 32 True 37
16 Chicago 6/16/2015 2.62 -0.28 35 True 37
17 Chicago 6/16/2015 1.96 0.09 37 False 37
18 Chicago 6/16/2015 1.31 0.04 39 False 39
19 Chicago 6/16/2015 0.65 0.02 34 False 34
20 Chicago 6/16/2015 0.03 0.09 23 False 23
21 Chicago 6/16/2015 0.58 0.03 36 False 36
22 Chicago 6/16/2015 1.16 0.06 35 False 35
23 Chicago 6/16/2015 2.31 0.05 36 False 36
24 Chicago 6/16/2015 2.89 -0.41 20 True 36
25 Chicago 6/16/2015 3.47 -0.38 35 True 36
26 NYC 2/22/2015 6.55 0.10 36 True 37
27 NYC 2/22/2015 3.93 0.16 21 True 37
28 NYC 2/22/2015 3.27 0.06 32 True 37
29 NYC 2/22/2015 2.62 -0.28 35 True 37
30 NYC 2/22/2015 1.96 0.09 37 False 37
31 NYC 2/22/2015 1.31 0.04 39 False 39
32 NYC 2/22/2015 0.65 0.02 34 False 34
33 NYC 2/22/2015 0.03 0.09 23 False 23
34 NYC 2/22/2015 0.58 0.03 36 False 36
35 NYC 2/22/2015 1.16 0.06 35 False 35
36 NYC 2/22/2015 2.31 0.05 36 False 36
37 NYC 2/22/2015 2.89 -0.41 20 True 36
38 NYC 2/22/2015 3.47 -0.38 35 True 36
39 NYC 5/5/2015 6.55 0.30 36 True 37
40 NYC 5/5/2015 3.93 0.16 21 True 37
41 NYC 5/5/2015 3.27 0.06 32 True 37
42 NYC 5/5/2015 2.62 -0.28 35 True 37
43 NYC 5/5/2015 1.96 0.09 37 False 37
44 NYC 5/5/2015 1.31 0.04 39 False 39
45 NYC 5/5/2015 0.65 0.02 34 False 34
46 NYC 5/5/2015 0.03 0.09 23 False 23
47 NYC 5/5/2015 0.58 0.03 36 False 36
48 NYC 5/5/2015 1.16 0.06 35 False 35
49 NYC 5/5/2015 2.31 0.05 36 False 36
50 NYC 5/5/2015 2.89 -0.41 20 True 36
51 NYC 5/5/2015 3.47 -0.38 35 True 36
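The question also notes that a trigger need not occur at all. A minimal sketch for checking that case on a single group, assuming a hypothetical positional helper named cap_group (not part of the answer above) and a copy of the first Chicago group with every a made positive:

```python
import pandas as pd

def cap_group(group):
    """Hypothetical helper: cap b outward from the min(dist) row of one group."""
    pos = int(group["dist"].to_numpy().argmin())  # position of min(dist)
    flagged = []
    # Both sides start at the min(dist) row and run outward from it.
    for side in (group.iloc[pos::-1], group.iloc[pos:]):
        capped = (side["a"] < 0).cumsum().astype(bool)
        flagged.append(pd.DataFrame({
            "is_capped": capped,
            "new_b": side["b"].where(~capped).ffill(),
        }))
    both = pd.concat(flagged)
    both = both[~both.index.duplicated()]  # the min(dist) row appears twice
    return group.join(both)  # assumes the group's index is unique

# First City/date group from the question.
demo = pd.DataFrame({
    "dist": [6.55, 3.93, 3.27, 2.62, 1.96, 1.31, 0.65, 0.03, 0.58, 1.16, 2.31, 2.89, 3.47],
    "a": [0.10, 0.16, 0.06, -0.28, 0.09, 0.04, 0.02, 0.09, 0.03, 0.06, 0.05, -0.41, -0.38],
    "b": [36, 21, 32, 35, 37, 39, 34, 23, 36, 35, 36, 20, 35],
})

with_trigger = cap_group(demo)
no_trigger = cap_group(demo.assign(a=demo["a"].abs()))  # no negative a anywhere

print(with_trigger)
print(no_trigger)
```

With no negative a, the running count never leaves 0, so is_capped stays False everywhere and new_b simply equals b.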