Generate a list of missing time intervals

Time: 2018-07-12 11:15:16

Tags: python pandas

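I have a Pandas Series in which missing data is represented by NaNs, and I would like to get a rough idea of how long the data is missing and how many times.

An example: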

10:01    1.23
10:02    2.23
10:03    nan
10:04    nan
10:05    nan
10:06    6.23
10:07    nan
10:08    nan
10:09    9.23

The desired output would then be a list like this:

missing = [['10:03', '10:05'], ['10:07', '10:08']]
N_missing = 2
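To experiment with the answers below, the example data can be rebuilt as a pandas Series. This is a sketch that assumes the times are plain strings in the index; the question does not say whether they are strings or timestamps.

```python
import numpy as np
import pandas as pd

# Rebuild the Series from the question: time labels as the index,
# NaN marking the missing samples (string index is an assumption).
times = ["10:01", "10:02", "10:03", "10:04", "10:05",
         "10:06", "10:07", "10:08", "10:09"]
values = [1.23, 2.23, np.nan, np.nan, np.nan, 6.23, np.nan, np.nan, 9.23]
s = pd.Series(values, index=times)

# Number of missing samples (not the number of gaps)
print(s.isna().sum())
```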

2 answers:

Answer 0 (score: 3)

Use:

import pandas as pd

#create a DataFrame from the Series s (time labels in column A, values in column B)
df = s.reset_index()
df.columns = ['A','B']

#boolean mask, True where the value is not NaN, kept in a variable for reuse
m = df['B'].notnull()
#the cumulative sum of the mask gives each run of consecutive NaNs its own group label
df.index = m.cumsum()

#keep only the NaN rows, take the first and last time of each group, convert to a list
missing = df[~m.values].groupby(level=0)['A'].agg(['first','last']).values.tolist()
print (missing)
[['10:03', '10:05'], ['10:07', '10:08']]

#the number of gaps is the length of the outer list
N_missing = len(missing)
print (N_missing)
2

Details:

print (df[~m.values])
       A   B
B           
2  10:03 NaN
2  10:04 NaN
2  10:05 NaN
3  10:07 NaN
3  10:08 NaN
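The grouping works because m.cumsum() increments only on non-NaN rows, so every NaN in a run inherits the label of the last valid row before it. A minimal sketch on the example data, assuming a string time index; labels is an illustrative name:

```python
import numpy as np
import pandas as pd

times = ["10:01", "10:02", "10:03", "10:04", "10:05",
         "10:06", "10:07", "10:08", "10:09"]
values = [1.23, 2.23, np.nan, np.nan, np.nan, 6.23, np.nan, np.nan, 9.23]
s = pd.Series(values, index=times)

m = s.notnull()
labels = m.cumsum()  # 1, 2, 2, 2, 2, 3, 3, 3, 4

# Labels of the NaN rows only: each gap shares one label
print(labels[~m].tolist())  # [2, 2, 2, 3, 3]

# Number of distinct labels among NaN rows = number of gaps
print(labels[~m].nunique())  # 2
```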

A similar solution working directly on the Series:

m = s.notnull()
cum = m.cumsum()
missing = s[~m.values].index.to_series().groupby(cum).agg(['first','last']).values.tolist()
print (missing)
[['10:03', '10:05'], ['10:07', '10:08']]

N_missing = len(missing)
print (N_missing)
2
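The question also asks how long the data is missing. A sketch extending the Series solution with a per-gap count of missing samples, assuming the same string-indexed Series and one sample per minute (so the count equals minutes missing); gaps and lengths are hypothetical names:

```python
import numpy as np
import pandas as pd

# Example data from the question (string time index assumed)
times = ["10:01", "10:02", "10:03", "10:04", "10:05",
         "10:06", "10:07", "10:08", "10:09"]
values = [1.23, 2.23, np.nan, np.nan, np.nan, 6.23, np.nan, np.nan, 9.23]
s = pd.Series(values, index=times)

m = s.notnull()
cum = m.cumsum()  # consecutive NaNs share one label

# Group the time labels of the NaN rows by run label; 'count' is the number
# of missing samples in each run (the time labels themselves are never null)
gaps = s[~m].index.to_series().groupby(cum[~m]).agg(['first', 'last', 'count'])
print(gaps)

missing = gaps[['first', 'last']].values.tolist()
lengths = gaps['count'].tolist()  # samples missing per gap
```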

Answer 1 (score: 0)

If the DataFrame columns are named date and value:

import numpy as np

missing = df[np.isnan(df['value'])]
no_missing = len(missing)

missing:

date     value
10:03    NaN
10:04    NaN
10:05    NaN
10:07    NaN
10:08    NaN

no_missing:

5
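This counts missing rows (5) rather than gaps (2). A sketch of a different technique from Answer 0's cumsum grouping: compare each NaN flag with its neighbours via shift to find where each gap starts and ends. It assumes the same string-indexed Series; starts, ends and n_missing are illustrative names.

```python
import numpy as np
import pandas as pd

times = ["10:01", "10:02", "10:03", "10:04", "10:05",
         "10:06", "10:07", "10:08", "10:09"]
values = [1.23, 2.23, np.nan, np.nan, np.nan, 6.23, np.nan, np.nan, 9.23]
s = pd.Series(values, index=times)

isna = s.isna()
# A gap starts at a NaN whose previous sample is not NaN (or at the first row)
starts = isna & ~isna.shift(1, fill_value=False)
# A gap ends at a NaN whose next sample is not NaN (or at the last row)
ends = isna & ~isna.shift(-1, fill_value=False)

missing = [[a, b] for a, b in zip(s.index[starts.to_numpy()],
                                  s.index[ends.to_numpy()])]
n_missing = int(starts.sum())

print(missing)    # [['10:03', '10:05'], ['10:07', '10:08']]
print(n_missing)  # 2
```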