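Finding consecutive values in pandas under certain conditions

Date: 2019-06-13 11:41:56

Tags: python pandas dataframe

I am trying to count consecutive zero values and have been stuck on this problem for hours.

I have a DataFrame like this: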

Day  |  ID  |  Values
-------------------
1    |  aa  |    0
1    |  aa  |    0
1    |  aa  |    0
1    |  aa  |    0
1    |  aa  |    2.5
1    |  aa  |    2.3
1    |  aa  |    0
1    |  aa  |    0
1    |  aa  |    0
2    |  aa  |    0
2    |  aa  |    0
2    |  aa  |    2.3
2    |  aa  |    0
1    |  bb  |    0
1    |  bb  |    0
1    |  bb  |    0
1    |  bb  |    0
1    |  bb  |    3.5
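
To make the example reproducible, the table above can be built roughly as follows. This is only a sketch: the column names and values are copied from the table, and the name `df` matches the one used in the code further down.

```python
import pandas as pd

# Sample data copied from the table above: 9 rows for (aa, Day 1),
# 4 rows for (aa, Day 2), and 5 rows for (bb, Day 1).
df = pd.DataFrame({
    "Day": [1] * 9 + [2] * 4 + [1] * 5,
    "ID": ["aa"] * 13 + ["bb"] * 5,
    "Values": [0, 0, 0, 0, 2.5, 2.3, 0, 0, 0,
               0, 0, 2.3, 0,
               0, 0, 0, 0, 3.5],
})
print(df.shape)
```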

I want to count the consecutive zeros like this:

Day  |  ID  |  Values   | consec_zeros
--------------------------------------
1    |  aa  |    0      |      0
1    |  aa  |    0      |      1
1    |  aa  |    0      |      2
1    |  aa  |    0      |      3
1    |  aa  |    2.5    |      4      # --> 4 consecutive 0s came before this row
1    |  aa  |    2.3    |      0      # the 2.5 above already broke the run of 0s
1    |  aa  |    0      |      0
1    |  aa  |    0      |      1
1    |  aa  |    0      |      2      
2    |  aa  |    0      |      0      # no 0s before this in Day 2
2    |  aa  |    0      |      1
2    |  aa  |    2.3    |      2
2    |  aa  |    0      |      0
1    |  bb  |    0      |      0     # --> no 0s before this in ID 'bb'
1    |  bb  |    0      |      1
1    |  bb  |    0      |      2
1    |  bb  |    0      |      3
1    |  bb  |    3.5    |      4

What I tried:

g = df['Values'].ne(df['Values'].shift(1)).cumsum()
counts = df.groupby(['ID','Day',g])['Values'].transform('size')
df['consec_zeros'] = np.where(df['Values'].eq(0), counts, 0)

I am new to this, so please help and point out what I am doing wrong.

Thanks in advance

1 answer:

Answer 0: (score: 3)
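
For reference, running the attempt above on the sample data shows where it diverges from the desired column. This is a sketch that rebuilds the sample table itself; the names `run_id` and `attempt` are illustrative only. `transform('size')` stamps every row of a run with the total length of that run, so it yields neither a running count nor a value on the first non-zero row.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Day": [1] * 9 + [2] * 4 + [1] * 5,
    "ID": ["aa"] * 13 + ["bb"] * 5,
    "Values": [0, 0, 0, 0, 2.5, 2.3, 0, 0, 0,
               0, 0, 2.3, 0,
               0, 0, 0, 0, 3.5],
})

# A new run id starts whenever the value differs from the previous row.
run_id = df["Values"].ne(df["Values"].shift(1)).cumsum().rename("run_id")

# 'size' returns the total length of each run, repeated on every row of it.
counts = df.groupby(["ID", "Day", run_id])["Values"].transform("size")
attempt = np.where(df["Values"].eq(0), counts, 0)

print(attempt.tolist())
# [4, 4, 4, 4, 0, 0, 3, 3, 3, 2, 2, 0, 1, 4, 4, 4, 4, 0]
```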

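The main change is to use GroupBy.cumcount for the running counter and to carry that counter over to the first non-zero value after each run of zeros. In my solution I also add 1 to the counter, so that the first zero of a run (counter 0) can be told apart from the non-zero rows: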

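import numpy as np
import pandas as pd

# df is the DataFrame from the question
# a new run id starts whenever the value differs from the previous row
g = df['Values'].ne(df['Values'].shift(1)).cumsum()
# 1-based position of each row inside its run, per ID and Day
counts = df.groupby(['ID','Day',g])['Values'].cumcount() + 1
# keep the position for zero rows only; non-zero rows get 0
df['consec_zeros'] = np.where(df['Values'].eq(0), counts, 0)

# replace the 0s (non-zero rows) with NaN
a = df['consec_zeros'].mask(df['consec_zeros'].eq(0))
# forward fill at most 1 missing value per group and add 1 to it,
# then subtract 1 everywhere to get back to a 0-based counter
df['consec_zeros'] = (np.where(a.isna(), 
                               a.groupby([df['ID'],df['Day']]).ffill(limit=1) + 1, 
                               df['consec_zeros']) - 1)
# rows that were not filled are still NaN: set them to 0
df['consec_zeros'] = df['consec_zeros'].fillna(0).astype(int)
print(df)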
    Day  ID  Values  consec_zeros
0     1  aa     0.0             0
1     1  aa     0.0             1
2     1  aa     0.0             2
3     1  aa     0.0             3
4     1  aa     2.5             4
5     1  aa     2.3             0
6     1  aa     0.0             0
7     1  aa     0.0             1
8     1  aa     0.0             2
9     2  aa     0.0             0
10    2  aa     0.0             1
11    2  aa     2.3             2
12    2  aa     0.0             0
13    1  bb     0.0             0
14    1  bb     0.0             1
15    1  bb     0.0             2
16    1  bb     0.0             3
17    1  bb     3.5             4
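
The steps above can also be wrapped in a small self-contained function. The following is a sketch, not part of the original answer: the function name `consec_zeros`, its parameters, and the intermediate names are hypothetical, and it is a slight rearrangement of the same idea (position within the run via `cumcount`, then `ffill(limit=1)` to carry the last count onto the first non-zero row).

```python
import numpy as np
import pandas as pd


def consec_zeros(df, value_col="Values", keys=("ID", "Day")):
    """Hypothetical helper: running count of zeros per group, carried
    over to the first non-zero row that follows a run of zeros."""
    keys = list(keys)
    is_zero = df[value_col].eq(0)
    # A new run id starts whenever the value differs from the previous row.
    run_id = df[value_col].ne(df[value_col].shift()).cumsum().rename("run_id")
    # 1-based position inside each run; NaN on non-zero rows.
    pos = df.groupby(keys + [run_id])[value_col].cumcount() + 1
    pos = pos.where(is_zero)
    # Carry the last zero's position onto the next row only, within each group.
    carried = pos.groupby([df[k] for k in keys]).ffill(limit=1)
    # Zero rows: 0-based position. Non-zero rows: carried count, if any.
    out = np.where(is_zero, pos - 1, carried)
    return pd.Series(out, index=df.index).fillna(0).astype(int)


# Sample data copied from the question's table.
df = pd.DataFrame({
    "Day": [1] * 9 + [2] * 4 + [1] * 5,
    "ID": ["aa"] * 13 + ["bb"] * 5,
    "Values": [0, 0, 0, 0, 2.5, 2.3, 0, 0, 0,
               0, 0, 2.3, 0,
               0, 0, 0, 0, 3.5],
})
result = consec_zeros(df)
print(result.tolist())
# [0, 1, 2, 3, 4, 0, 0, 1, 2, 0, 1, 2, 0, 0, 1, 2, 3, 4]
```

Because `ffill` is limited to one row per group, only the first non-zero row after a run receives the count; any further non-zero rows fall back to 0, as in the desired output.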