I have this data frame:
A
0 -2
1 0
2 2
3 2
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 2
13 2
14 2
15 2
16 2
17 3
18 2
19 0
20 2
21 2
22 2
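Its plot looks like this:
I want to threshold the data based on the length of each sequence, so that part B is flattened because its length is less than 3, like this: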
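To state the requirement precisely, here is a minimal plain-Python sketch. It assumes that a "sequence" is a maximal run of consecutive positive values and that "flattening" means replacing that run with zeros. The function name flatten_short_runs is hypothetical and does not come from the question.

```python
from itertools import groupby


def flatten_short_runs(values, min_len=3):
    """Return a copy of values in which every run of consecutive positive
    values shorter than min_len is replaced by zeros."""
    out = []
    # groupby splits the list into maximal runs sharing the same key,
    # i.e. alternating positive / non-positive stretches.
    for is_positive, run in groupby(values, key=lambda v: v > 0):
        run = list(run)
        if is_positive and len(run) < min_len:
            out.extend([0] * len(run))  # flatten the short positive run
        else:
            out.extend(run)             # keep long runs and non-positive values
    return out


# Column A from the question.
data = [-2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 3, 2, 0, 2, 2, 2]
print(flatten_short_runs(data))
```

Only the two-element run at positions 2 and 3 is flattened. The run at positions 20 to 22 has length exactly 3, so it is kept.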
Answer 0 (score: 1)
OK, first let's create a data frame:
import pandas as pd

df = pd.DataFrame([-2,0,2,2,0,0,0,0,0,0,0,0,2,2,2,2,2,3,2,0,2,2,2,0,3,3,0])
df.columns = ['A']
df
As a sanity check, I appended a few extra values (0, 3, 3, 0) at the end, which gives us:
A
0 -2
1 0
2 2
3 2
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 2
13 2
14 2
15 2
16 2
17 3
18 2
19 0
20 2
21 2
22 2
23 0
24 3
25 3
26 0
Now we have to work out which elements need to be set to zero:
prev = None           # index of the most recent zero
flag = 0              # becomes 1 once a nonzero value follows that zero
terminationLst = []
for val, i in zip(df['A'], df.index):
    if val == 0:
        # A zero closes the current nonzero run. If fewer than 3 elements lie
        # between this zero and the previous one (i - prev <= 3), mark them.
        if prev is not None and flag == 1 and i - prev <= 3:
            terminationLst.append([x for x in range(prev + 1, i)])
        prev = i      # always remember the latest zero
        flag = 0
    elif prev is not None:  # nonzero element after having seen a zero
        flag = 1
print(terminationLst)
This gives us the indices of the elements that need to be set to zero: [[2, 3], [24, 25]]
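To cross-check these index lists without a Python loop, here is a NumPy sketch that finds run boundaries from the edges of a boolean mask. The helper name short_nonzero_runs is hypothetical. Like the loop, it assumes that only nonzero runs with a zero on each side are candidates, so the leading -2 (which has no zero before it) is ignored.

```python
import numpy as np


def short_nonzero_runs(values, min_len=3):
    """Index lists of nonzero runs shorter than min_len that have a zero on both sides."""
    nonzero = np.asarray(values) != 0
    # Pad with False so every run has a rising edge (+1) and a falling edge (-1).
    padded = np.concatenate(([False], nonzero, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # first index of each run
    ends = np.flatnonzero(edges == -1)    # one past the last index of each run
    n = len(nonzero)
    return [
        list(range(s, e))
        for s, e in zip(starts, ends)
        # s > 0 and e < n mean the run is bounded by zeros on both sides
        if e - s < min_len and s > 0 and e < n
    ]


values = [-2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 3, 2, 0, 2, 2, 2, 0, 3, 3, 0]
print(short_nonzero_runs(values))
```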
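Now we just need to set them to zero. Because df has the default 0..n-1 index, these positions can be used directly as .loc labels, which writes into df itself rather than into a possible copy:
for elem in terminationLst:
    df.loc[elem, 'A'] = 0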
Now the data frame becomes:
A
0 -2
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 2
13 2
14 2
15 2
16 2
17 3
18 2
19 0
20 2
21 2
22 2
23 0
24 0
25 0
26 0
If you have trouble understanding any particular part, leave a comment below.
Answer 1 (score: 1)
An alternative solution without a for loop (using the df from @anand_v.singh's answer):
positive_mask = df>0
sequence_groups = positive_mask.astype(int).diff(1).fillna(0).abs().cumsum().squeeze()
sequence_size = positive_mask.groupby(sequence_groups).transform(len)
df_extended = pd.concat([df, positive_mask, sequence_groups, sequence_size], axis=1)
df_extended.columns = ['value', 'is_positive', 'sequence_group', 'sequence_size']
df_extended
value is_positive sequence_group sequence_size
0 -2 False 0.0 2
1 0 False 0.0 2
2 2 True 1.0 2
3 2 True 1.0 2
4 0 False 2.0 8
5 0 False 2.0 8
6 0 False 2.0 8
7 0 False 2.0 8
8 0 False 2.0 8
9 0 False 2.0 8
10 0 False 2.0 8
11 0 False 2.0 8
12 2 True 3.0 7
13 2 True 3.0 7
14 2 True 3.0 7
15 2 True 3.0 7
16 2 True 3.0 7
17 3 True 3.0 7
18 2 True 3.0 7
19 0 False 4.0 1
20 2 True 5.0 3
21 2 True 5.0 3
22 2 True 5.0 3
23 0 False 6.0 1
24 3 True 7.0 2
25 3 True 7.0 2
26 0 False 8.0 1
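The sequence_group labels come from a common trick. A boolean mask changes value exactly at run boundaries, so the absolute first difference is 1 there and 0 elsewhere, and its cumulative sum gives each run its own number. Here is a small illustration on a made-up six-value Series (the data is for illustration only):

```python
import pandas as pd

s = pd.Series([0, 5, 5, 0, 0, 7])
is_positive = s > 0
# |diff| is 1 wherever is_positive flips; cumsum turns those flips into run labels.
run_id = is_positive.astype(int).diff().fillna(0).abs().cumsum()
# Broadcast each run's length back to every element of that run.
run_len = is_positive.groupby(run_id).transform(len)
print(pd.DataFrame({'value': s, 'is_positive': is_positive,
                    'run_id': run_id, 'run_len': run_len}))
```

The labels are floats because diff() produces a NaN in the first row. That is harmless for grouping.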
flat_mask = (df_extended.sequence_size < 3) & (df_extended.is_positive)
df_extended.loc[flat_mask, 'value'] = 0
df_extended.value.plot()
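If this needs to be applied more than once, the same steps can be wrapped in a function. This is a sketch only: flatten_short_positive_runs and its min_len parameter are hypothetical names. Unlike the code above, it returns a new Series and leaves the input unchanged. As in this answer, zeros and negative values both count as "not positive".

```python
import pandas as pd


def flatten_short_positive_runs(s: pd.Series, min_len: int = 3) -> pd.Series:
    """Return a copy of s in which every run of consecutive positive values
    shorter than min_len is set to 0; zeros and negatives are left unchanged."""
    is_positive = s > 0
    # Same run-labelling trick as sequence_groups above.
    run_id = is_positive.astype(int).diff().fillna(0).abs().cumsum()
    # Length of the run that each element belongs to.
    run_len = is_positive.groupby(run_id).transform(len)
    out = s.copy()
    out.loc[is_positive & (run_len < min_len)] = 0
    return out


df = pd.DataFrame({'A': [-2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 2,
                         2, 2, 3, 2, 0, 2, 2, 2, 0, 3, 3, 0]})
df['A_flat'] = flatten_short_positive_runs(df['A'])
```

Keeping the result in a separate column makes it easy to plot the original and flattened series together, for example with df[['A', 'A_flat']].plot().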