Find the count of consecutive NaNs in Python?

Time: 2019-04-18 07:10:37

Tags: python pandas

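I have a DataFrame with missing values. I want to find the length of each run of consecutive missing values and count how many runs fall into each length range. Below are sample data and the result I expect.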

Sample data

Timestamp            X
2018-01-02 00:00:00  6
2018-01-02 00:05:00  6
2018-01-02 00:10:00  4
2018-01-02 00:15:00  nan
2018-01-02 00:20:00  nan
2018-01-02 00:25:00  3
2018-01-02 00:30:00  4
2018-01-02 00:35:00  nan
2018-01-02 00:40:00  nan
2018-01-02 00:45:00  nan
2018-01-02 00:50:00  nan
2018-01-02 00:55:00  nan
2018-01-02 01:00:00  nan
2018-01-02 01:05:00  2
2018-01-02 01:10:00  4
2018-01-02 01:15:00  6
2018-01-02 01:20:00  6
2018-01-02 01:25:00  nan
2018-01-02 01:30:00  nan
2018-01-02 01:35:00  6
2018-01-02 01:40:00  nan
2018-01-02 01:45:00  nan
2018-01-02 01:50:00  6
2018-01-02 01:55:00  6
2018-01-02 02:00:00  nan
2018-01-02 02:05:00  nan
2018-01-02 02:10:00  nan
2018-01-02 02:15:00  3
2018-01-02 02:20:00  4

Expected result

Consecutive missing 
values range                Cases
0-2                          3
3-5                          1
6 and above                  1

1 answer:

Answer 0 (score: 2)

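First use the solution from Identifying consecutive NaN's with pandas to get the length of each NaN run, then filter out the 0 values, bin the lengths with cut, and finally count the values per bin with GroupBy.size: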

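import numpy as np
import pandas as pd

# Each non-NaN value starts a new group; summing isna() gives the NaN run length that follows it
s = df.X.isna().groupby(df.X.notna().cumsum()).sum()
# Drop groups with no NaN at all
s = s[s!=0]

# Bin run lengths into (0, 2], (2, 5], (5, inf) and count the runs in each bin
b = pd.cut(s, bins=[0, 2, 5, np.inf], labels=['0-2','3-5','6 and above'])
out = b.groupby(b).size().reset_index(name='Cases')
print(out)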
             X  Cases
0          0-2      3
1          3-5      1
2  6 and above      1
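
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the question's sample data and runs the same steps. The variable names (values, key, runs, binned) are my own, the 5-minute date_range is an assumption based on the timestamps shown, and observed=False is passed explicitly only to keep all three bins in the output and avoid a warning on newer pandas versions.

```python
import numpy as np
import pandas as pd

nan = np.nan
# X values copied from the question's sample data (29 rows)
values = [6, 6, 4, nan, nan, 3, 4, nan, nan, nan, nan, nan, nan,
          2, 4, 6, 6, nan, nan, 6, nan, nan, 6, 6, nan, nan, nan, 3, 4]

# Assumption: readings are evenly spaced every 5 minutes, as in the question
index = pd.date_range("2018-01-02 00:00:00", periods=len(values), freq="5min")
df = pd.DataFrame({"Timestamp": index, "X": values})

# The cumulative count of non-NaN values stays constant across a NaN run,
# so every run shares a group key with the valid value just before it
key = df["X"].notna().cumsum()

# Number of NaNs in each group, i.e. the length of each run
runs = df["X"].isna().groupby(key).sum()
runs = runs[runs != 0]  # keep only groups that contain at least one NaN

# pd.cut uses right-closed intervals by default: (0, 2], (2, 5], (5, inf]
binned = pd.cut(runs, bins=[0, 2, 5, np.inf],
                labels=["0-2", "3-5", "6 and above"])

# observed=False keeps every category, including bins that end up empty
out = binned.groupby(binned, observed=False).size().reset_index(name="Cases")
print(out)
```

On this data the run lengths are 2, 6, 2, 2 and 3, which gives the counts 3, 1 and 1 shown above. Because the lowest bin is (0, 2], the label '0-2' really covers runs of length 1 or 2.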