I'm trying to count consecutive zero values and have been stuck on this for a few hours.
I have a DataFrame like this:
Day | ID | Values
-------------------
1 | aa | 0
1 | aa | 0
1 | aa | 0
1 | aa | 0
1 | aa | 2.5
1 | aa | 2.3
1 | aa | 0
1 | aa | 0
1 | aa | 0
2 | aa | 0
2 | aa | 0
2 | aa | 2.3
2 | aa | 0
1 | bb | 0
1 | bb | 0
1 | bb | 0
1 | bb | 0
1 | bb | 3.5
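To make the example reproducible, here is a minimal sketch that builds the sample frame above. It assumes the same column names (`Day`, `ID`, `Values`) and keeps the rows in the order shown, because the run detection further down depends on that order.

```python
import pandas as pd

# Sample data from the table above, in the same row order
df = pd.DataFrame({
    "Day": [1] * 9 + [2] * 4 + [1] * 5,
    "ID": ["aa"] * 13 + ["bb"] * 5,
    "Values": [0, 0, 0, 0, 2.5, 2.3, 0, 0, 0,   # ID aa, Day 1
               0, 0, 2.3, 0,                    # ID aa, Day 2
               0, 0, 0, 0, 3.5],                # ID bb, Day 1
})
print(df)
```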
I want to count the consecutive zeros like this:
Day | ID | Values | consec_zeros
--------------------------------------
1 | aa | 0 | 0
1 | aa | 0 | 1
1 | aa | 0 | 2
1 | aa | 0 | 3
1 | aa | 2.5 | 4 # --> there were 4 consecutive 0s
1 | aa | 2.3 | 0 # the 2.5 already broke the run of zeros
1 | aa | 0 | 0
1 | aa | 0 | 1
1 | aa | 0 | 2
2 | aa | 0 | 0 # no 0s before this in Day 2
2 | aa | 0 | 1
2 | aa | 2.3 | 2
2 | aa | 0 | 0
1 | bb | 0 | 0 # --> no 0s before this in ID 'bb'
1 | bb | 0 | 1
1 | bb | 0 | 2
1 | bb | 0 | 3
1 | bb | 3.5 | 4
What I tried:
g = df['Values'].ne(df['Values'].shift(1)).cumsum()
counts = df.groupby(['ID','Day',g])['Values'].transform('size')
df['consec_zeros'] = np.where(df['Values'].eq(0), counts, 0)
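A quick check on the first (ID='aa', Day=1) group suggests where this attempt goes wrong: `transform('size')` writes the total length of each run onto every row of that run, whereas the desired column needs a running position inside the run (which `cumcount()` gives). The sketch below uses a hypothetical single-group frame taken from the first nine rows and renames the run id to `run` only to avoid a name clash with the `Values` column.

```python
import numpy as np
import pandas as pd

# First (ID='aa', Day=1) group from the question, for illustration only
df = pd.DataFrame({
    "Day": [1] * 9,
    "ID": ["aa"] * 9,
    "Values": [0, 0, 0, 0, 2.5, 2.3, 0, 0, 0],
})

# Run id: increments whenever the value changes from the previous row
run = df["Values"].ne(df["Values"].shift(1)).cumsum().rename("run")
grouped = df.groupby(["ID", "Day", run])["Values"]

# transform('size') repeats the total run length on every row of the run
size_counts = grouped.transform("size")
# cumcount() numbers the rows inside each run: 0, 1, 2, ...
running = grouped.cumcount()

print(np.where(df["Values"].eq(0), size_counts, 0))  # [4 4 4 4 0 0 3 3 3]
print(np.where(df["Values"].eq(0), running, 0))      # [0 1 2 3 0 0 0 1 2]
```

The running version already matches the zero rows of the expected output; what is still missing is carrying the run length (4) onto the non-zero row that ends the run.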
Since I'm new to this, please help and point out what I'm doing wrong.
Thanks in advance.
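Answer (score: 3)
The main problem is carrying the count over to the first non-zero value that follows a run of zeros. To be able to tell the first value of each run apart, my solution adds 1 to the counter from GroupBy.cumcount: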
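To illustrate why the +1 matters, here is a minimal sketch of the first steps on the first (ID='aa', Day=1) group only; the series name `s` is hypothetical, and the per-(ID, Day) grouping is dropped because there is only one group here.

```python
import numpy as np
import pandas as pd

# Values of the first (ID='aa', Day=1) group, for illustration only
s = pd.Series([0, 0, 0, 0, 2.5, 2.3, 0, 0, 0])

run = s.ne(s.shift(1)).cumsum()
# Zeros get 1, 2, 3, ... within their run; non-zeros get 0
counts = pd.Series(np.where(s.eq(0), s.groupby(run).cumcount() + 1, 0))
print(counts.tolist())  # [1, 2, 3, 4, 0, 0, 1, 2, 3]

# Without "+ 1" the first zero of a run would also be 0 and could not be
# told apart from a non-zero row; with it, 0 marks exactly the non-zero rows.
a = counts.mask(counts.eq(0))
# Forward-fill only one step, so only the row right after a run gets its length
filled = a.ffill(limit=1)
print(filled.tolist())  # [1.0, 2.0, 3.0, 4.0, 4.0, nan, 1.0, 2.0, 3.0]
```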
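import numpy as np

# label each run of equal consecutive values
g = df['Values'].ne(df['Values'].shift(1)).cumsum()
# number rows inside each run starting at 1, so 0 is left free for non-zeros
counts = df.groupby(['ID','Day',g])['Values'].cumcount() + 1
df['consec_zeros'] = np.where(df['Values'].eq(0), counts, 0)
# replace 0 with NaN (these are the non-zero rows)
a = df['consec_zeros'].mask(df['consec_zeros'].eq(0))
# forward-fill at most one missing value per (ID, Day) group and add 1,
# then subtract 1 from every row to get 0-based counts
df['consec_zeros'] = (np.where(a.isna(),
                               a.groupby([df['ID'],df['Day']]).ffill(limit=1) + 1,
                               df['consec_zeros']) - 1)
# non-zeros not directly preceded by a zero are still NaN; set them to 0
df['consec_zeros'] = df['consec_zeros'].fillna(0).astype(int)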
print (df)
Day ID Values consec_zeros
0 1 aa 0.0 0
1 1 aa 0.0 1
2 1 aa 0.0 2
3 1 aa 0.0 3
4 1 aa 2.5 4
5 1 aa 2.3 0
6 1 aa 0.0 0
7 1 aa 0.0 1
8 1 aa 0.0 2
9 2 aa 0.0 0
10 2 aa 0.0 1
11 2 aa 2.3 2
12 2 aa 0.0 0
13 1 bb 0.0 0
14 1 bb 0.0 1
15 1 bb 0.0 2
16 1 bb 0.0 3
17 1 bb 3.5 4
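For comparison, a different formulation (not the answer's method, just an equivalent reading of the expected output): every value in `consec_zeros` equals the number of zeros immediately preceding that row within its (ID, Day) group. That can be computed from a shifted "previous row is zero" flag and a cumulative sum per block. The helper name `count_preceding_zeros` is hypothetical, and the sketch assumes the rows are already in the order shown in the question.

```python
import pandas as pd


def count_preceding_zeros(df: pd.DataFrame) -> pd.Series:
    """Count the zeros immediately before each row within its (ID, Day) group."""
    is_zero = df["Values"].eq(0).astype(int)
    # 1 if the previous row in the same (ID, Day) group is zero, else 0;
    # the first row of every group has no predecessor (NaN), so it gets 0.
    prev_zero = (is_zero.groupby([df["ID"], df["Day"]])
                 .shift(1).fillna(0).astype(int))
    # Start a new block at every row whose predecessor is not zero...
    block = prev_zero.eq(0).cumsum()
    # ...and count the zero predecessors accumulated inside each block.
    return prev_zero.groupby(block).cumsum()


# Sample data from the question
df = pd.DataFrame({
    "Day": [1] * 9 + [2] * 4 + [1] * 5,
    "ID": ["aa"] * 13 + ["bb"] * 5,
    "Values": [0, 0, 0, 0, 2.5, 2.3, 0, 0, 0,
               0, 0, 2.3, 0,
               0, 0, 0, 0, 3.5],
})
df["consec_zeros"] = count_preceding_zeros(df)
print(df)
```

This avoids the NaN masking and the +1/-1 bookkeeping, at the cost of relying on the "zeros before this row" interpretation holding for every row type.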