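I have a DataFrame with missing values. I want to find the lengths of the runs of consecutive missing values and count how many runs fall into each length range. Below are sample data and the result I expect.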
Sample data
Timestamp X
2018-01-02 00:00:00 6
2018-01-02 00:05:00 6
2018-01-02 00:10:00 4
2018-01-02 00:15:00 nan
2018-01-02 00:20:00 nan
2018-01-02 00:25:00 3
2018-01-02 00:30:00 4
2018-01-02 00:35:00 nan
2018-01-02 00:40:00 nan
2018-01-02 00:45:00 nan
2018-01-02 00:50:00 nan
2018-01-02 00:55:00 nan
2018-01-02 01:00:00 nan
2018-01-02 01:05:00 2
2018-01-02 01:10:00 4
2018-01-02 01:15:00 6
2018-01-02 01:20:00 6
2018-01-02 01:25:00 nan
2018-01-02 01:30:00 nan
2018-01-02 01:35:00 6
2018-01-02 01:40:00 nan
2018-01-02 01:45:00 nan
2018-01-02 01:50:00 6
2018-01-02 01:55:00 6
2018-01-02 02:00:00 nan
2018-01-02 02:05:00 nan
2018-01-02 02:10:00 nan
2018-01-02 02:15:00 3
2018-01-02 02:20:00 4
Expected result
Consecutive missing values range   Cases
0-2                                3
3-5                                1
6 and above                        1
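To make the example reproducible, the sample data above can be built as a DataFrame named `df` with a `Timestamp` column and a value column `X`, which is the shape the answer below assumes. This is a minimal sketch that assumes the readings are evenly spaced every 5 minutes, as they are in the sample.

```python
import numpy as np
import pandas as pd

# Assumption: one reading every 5 minutes, 00:00 to 02:20 (29 rows)
timestamps = pd.date_range('2018-01-02 00:00:00', periods=29, freq='5min')
values = [6, 6, 4, np.nan, np.nan, 3, 4,
          np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
          2, 4, 6, 6, np.nan, np.nan, 6, np.nan, np.nan, 6, 6,
          np.nan, np.nan, np.nan, 3, 4]
df = pd.DataFrame({'Timestamp': timestamps, 'X': values})
```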
Answer
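First compute the length of each run of consecutive NaNs using the solution from Identifying consecutive NaN's with pandas, then filter out the 0 values (non-missing positions not followed by any NaN), bin the run lengths with cut, and finally count the runs in each bin with GroupBy.size: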
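import numpy as np
import pandas as pd
# length of the NaN run after each non-NaN value (0 if none follows)
s = df.X.isna().groupby(df.X.notna().cumsum()).sum()
s = s[s != 0]  # keep only actual runs of missing values
# bins are right-closed: (0, 2], (2, 5], (5, inf]
b = pd.cut(s, bins=[0, 2, 5, np.inf], labels=['0-2', '3-5', '6 and above'])
out = b.groupby(b, observed=False).size().reset_index(name='Cases')
print(out)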
             X  Cases
0          0-2      3
1          3-5      1
2  6 and above      1
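The same steps can be wrapped in a small reusable helper, which also makes the intermediate run lengths visible. The grouping key `notna().cumsum()` increases by one at every non-missing value, so each run of NaNs shares the key of the value just before it, and summing `isna()` per key gives that run's length. `nan_run_counts` is a hypothetical name used only for illustration; the bins passed to `pd.cut` are right-closed, so a run of exactly 2 lands in '0-2' and a run of exactly 5 in '3-5'.

```python
import numpy as np
import pandas as pd


def nan_run_counts(series, bins, labels):
    """Hypothetical helper: count runs of consecutive NaNs, binned by length."""
    # Each non-NaN value bumps the key; NaNs after it share that key
    keys = series.notna().cumsum()
    # Number of NaNs per key = length of the run following that value
    runs = series.isna().groupby(keys).sum()
    runs = runs[runs != 0]  # drop keys with no missing values
    binned = pd.cut(runs, bins=bins, labels=labels)
    # observed=False keeps every bin, even ones with zero runs
    counts = binned.groupby(binned, observed=False).size()
    return runs, counts


x = pd.Series([6, 6, 4, np.nan, np.nan, 3, 4,
               np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
               2, 4, 6, 6, np.nan, np.nan, 6, np.nan, np.nan, 6, 6,
               np.nan, np.nan, np.nan, 3, 4])
runs, counts = nan_run_counts(x, [0, 2, 5, np.inf], ['0-2', '3-5', '6 and above'])
print(runs.tolist())    # [2, 6, 2, 2, 3]
print(counts.tolist())  # [3, 1, 1]
```

A run at the very start of the series is also handled: it gets key 0, because no non-missing value precedes it.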